Holden Karau, Mika Kimmins
Scaling Python with Dask: From Data Science to Machine Learning
- Condition: Brand new
- UK Delivery times: Usually arrives within 2 - 3 working days
- UK Shipping: Fee starts at £2.39. Subject to product weight & dimension
Bulk ordering. Want 15 or more copies? Get a personalised quote and bigger discounts. Learn more about bulk orders.
- More about Scaling Python with Dask: From Data Science to Machine Learning
Dask is an open-source library for parallel computing in Python that makes it easy to parallelize PyData libraries, including NumPy, pandas, and scikit-learn. It is popular among industry experts and academics and is used by organizations such as Walmart, Capital One, Harvard Medical School, and NASA. This book explains how to use Dask for batch data parallel processing, key distributed system concepts, methods for using Dask with higher-level APIs and building blocks, and how to work with integrated libraries such as scikit-learn, pandas, and PyTorch.
Format: Paperback / softback
Length: 202 pages
Publication date: 01 August 2023
Publisher: O'Reilly Media
Modern systems with multi-core CPUs and GPUs are capable of parallel computing, yet many scientific Python tools were not designed to take full advantage of that hardware. This concise resource shows data scientists and Python programmers how the Dask open-source library for parallel computing provides APIs that simplify the parallelization of popular PyData libraries such as NumPy, pandas, and scikit-learn. Written by Holden Karau and Mika Kimmins, this practical guide demonstrates how to run Dask computations locally and in the cloud for heavier workloads. It also explains why Dask is popular among industry experts and academics and why it has been adopted by organizations such as Walmart, Capital One, Harvard Medical School, and NASA.
What Dask Is:
Dask is an open-source Python library for parallel computing. It breaks a computation into a graph of tasks and runs those tasks in parallel, either on the cores of a single machine or across a cluster, which makes it practical to work with large datasets. Dask hides much of the complexity of distributed computing behind a high-level interface that integrates with existing Python libraries.
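As a rough illustration of the task-graph idea described above, here is a minimal sketch using `dask.delayed`. The functions `square` and `total` are hypothetical examples chosen for this listing, not code from the book.

```python
from dask import delayed


@delayed
def square(x):
    # Each call becomes one task in the graph instead of running immediately.
    return x * x


@delayed
def total(values):
    # This task depends on every square() task passed to it.
    return sum(values)


# Building the graph is lazy: nothing has been computed yet.
squares = [square(i) for i in range(5)]
result = total(squares)

# compute() executes the graph; independent tasks may run in parallel.
answer = result.compute()
print(answer)
```

The `square` tasks do not depend on one another, so the scheduler is free to run them concurrently before combining the results.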
Where You Can Use Dask:
Dask is used in data analysis, machine learning, scientific computing, and high-performance computing. Data scientists and Python programmers can use it to speed up computations by distributing the workload across multiple cores, machines, or GPUs.
How Dask Compares with Other Tools:
Dask stands out from other parallel computing libraries in several ways. Firstly, it provides a flexible and scalable approach to parallelization. Dask allows users to easily scale their computations from local systems to the cloud, making it suitable for both small-scale and large-scale projects. Additionally, Dask offers a high-level interface that simplifies the process of parallelizing code, making it accessible to a wide range of Python programmers.
How to Use Dask for Batch Data Parallel Processing:
Dask excels at batch data parallel processing, which involves executing computations on large datasets in parallel. It provides a convenient way to distribute the workload across multiple machines or GPUs, enabling efficient processing of large datasets. Dask supports various parallelization strategies, including task scheduling, data partitioning, and distributed data processing, allowing users to tailor their computations to their specific needs.
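The data partitioning mentioned above can be sketched with Dask Array, which splits one large array into chunks and processes them in parallel. The array size and chunk size below are arbitrary values picked for illustration.

```python
import numpy as np
import dask.array as da

# An in-memory NumPy array stands in for a larger dataset (illustrative only).
data = np.arange(1_000_000, dtype=np.float64)

# Partition the array into 10 chunks of 100,000 elements each.
x = da.from_array(data, chunks=100_000)

# The mean is computed per chunk and then combined into one result.
mean = x.mean().compute()
print(x.numblocks, mean)
```

Chunk size is a tuning choice: chunks that are too small add scheduling overhead, while chunks that are too large limit parallelism and can exhaust memory.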
Key Distributed System Concepts for Working with Dask:
Understanding distributed system concepts is crucial when working with Dask. Dask executes computations through a scheduler: a single-machine scheduler that uses local threads or processes, or Dask Distributed, which coordinates workers across multiple machines. Users need to be familiar with concepts such as cluster management, resource allocation, and communication between workers to use Dask effectively.
Methods for Using Dask with Higher-Level APIs and Building Blocks:
Dask offers higher-level APIs and lower-level building blocks that fit alongside existing Python libraries and frameworks. Collections such as Dask DataFrame and Dask Array mirror the pandas and NumPy APIs, and Dask-ML follows scikit-learn's estimator API, so users can parallelize familiar code with few changes.
How to Work with Integrated Libraries Such as scikit-learn, pandas, and PyTorch:
Dask works with popular libraries such as scikit-learn, pandas, and PyTorch. Users can apply Dask's parallelization to speed up the training and inference of machine learning models, as well as the analysis of large datasets with pandas-style DataFrames. Dask also provides parallel counterparts for parts of these libraries, such as Dask DataFrame for pandas and Dask-ML for scikit-learn.
How to Use Dask with GPUs:
GPUs (Graphics Processing Units) have become increasingly popular for parallel computing because of their high computational throughput. Dask supports GPU computations by working with GPU-accelerated libraries such as CuPy and RAPIDS cuDF, so users can scale GPU workloads across multiple devices or machines.
In conclusion, Dask is a powerful open-source library for parallel computing that empowers data scientists and Python programmers to leverage the full potential of modern systems for efficient and scalable computations. With its flexible and scalable approach, high-level interface, and integration with popular libraries and frameworks, Dask is becoming increasingly popular among industry experts and academics. Whether you are working on data analysis, machine learning, scientific computing, or high-performance computing, Dask can help you accelerate your computations and achieve better results.
Weight: 400g
Dimension: 176 x 234 x 17 (mm)
ISBN-13: 9781098119874
UK and International shipping information
UK Delivery and returns information:
- Delivery within 2 - 3 days when ordering in the UK.
- Shipping fee for UK customers from £2.39. Fully tracked shipping service available.
- Returns policy: Return within 30 days of receipt for full refund.
International deliveries:
Shulph Ink now ships to Australia, Belgium, Canada, France, Germany, Ireland, Italy, India, Luxembourg, Saudi Arabia, Singapore, Spain, Netherlands, New Zealand, United Arab Emirates, United States of America.
- Delivery times: within 5 - 10 days for international orders.
- Shipping fee: charges vary for overseas orders. Only tracked services are available for most international orders. Some countries have untracked shipping options.
- Customs charges: If ordering to addresses outside the United Kingdom, you may incur additional customs and duties fees during local delivery.
