{"product_id":"scaling-python-with-dask-from-data-science-to-machine-learning-9781098119874","title":"Scaling Python with Dask: From Data Science to Machine Learning","description":"\u003cp\u003e\u003c\/p\u003e\u003cblockquote\u003e\n\u003cbr\u003eDask is an open-source library for parallel computing in Python that makes it easy to parallelize PyData libraries, including NumPy, pandas, and scikit-learn. It is popular among industry experts and academics and is used by organizations such as Walmart, Capital One, Harvard Medical School, and NASA. This book explains how to use Dask for batch data parallel processing, key distributed system concepts, methods for using Dask with higher-level APIs and building blocks, and how to work with integrated libraries such as scikit-learn, pandas, and PyTorch. \u003c\/blockquote\u003e\u003cp\u003e\u003cstrong\u003eFormat\u003c\/strong\u003e: Paperback \/ softback\u003cbr\u003e\u003cstrong\u003eLength\u003c\/strong\u003e: 202 pages\u003cbr\u003e\u003cstrong\u003ePublication date\u003c\/strong\u003e: 01 August 2023\u003cbr\u003e\u003cstrong\u003ePublisher\u003c\/strong\u003e: O'Reilly Media\u003cbr\u003e\u003c\/p\u003e \u003cp\u003e\u003cbr\u003eModern systems, equipped with multi-core CPUs and GPUs, possess the capability for parallel computing, yet many scientific Python tools lack the design to fully harness this potential. This concise yet comprehensive resource aims to empower data scientists and Python programmers with the knowledge of how the Dask open-source library for parallel computing offers APIs that simplify the parallelization of popular PyData libraries such as NumPy, pandas, and scikit-learn. Authored by Holden Karau and Mika Kimmins, this practical guide demonstrates how to utilize Dask computations both locally and on the cloud, catering to heavier workloads. 
By delving into the reasons behind Dask's popularity among industry experts and academics, as well as its adoption by organizations like Walmart, Capital One, Harvard Medical School, and NASA, this book provides valuable insights into the realm of distributed computing with Dask.\u003cbr\u003e\u003cbr\u003eWhat Dask Is:\u003cbr\u003eDask is a powerful open-source library for parallel computing that leverages the distributed computing capabilities of modern systems. It provides a high-level interface for building and executing parallel computations efficiently on large datasets. Dask simplifies the process of parallelizing Python code by abstracting away the complexities of distributed computing and providing a seamless integration with existing Python libraries.\u003cbr\u003e\u003cbr\u003eWhere You Can Use Dask:\u003cbr\u003eDask is widely applicable across various domains and industries. It finds use in data analysis, machine learning, scientific computing, and high-performance computing. Data scientists and Python programmers can leverage Dask to accelerate their computations by distributing the workload across multiple cores or GPUs, leading to significant speedups and improved performance.\u003cbr\u003e\u003cbr\u003eHow Dask Compares with Other Tools:\u003cbr\u003eDask stands out from other parallel computing libraries in several ways. Firstly, it provides a flexible and scalable approach to parallelization. Dask allows users to easily scale their computations from local systems to the cloud, making it suitable for both small-scale and large-scale projects. Additionally, Dask offers a high-level interface that simplifies the process of parallelizing code, making it accessible to a wide range of Python programmers.\u003cbr\u003e\u003cbr\u003eHow to Use Dask for Batch Data Parallel Processing:\u003cbr\u003eDask excels at batch data parallel processing, which involves executing computations on large datasets in parallel. 
It provides a convenient way to distribute the workload across multiple machines or GPUs, enabling efficient processing of large datasets. Dask supports various parallelization strategies, including task scheduling, data partitioning, and distributed data processing, allowing users to tailor their computations to their specific needs.\u003cbr\u003e\u003cbr\u003eKey Distributed System Concepts for Working with Dask:\u003cbr\u003eUnderstanding distributed system concepts is crucial when working with Dask. Dask uses its own distributed scheduler, Dask Distributed, to execute computations across multiple machines. Users need to familiarize themselves with concepts such as cluster management, resource allocation, and communication protocols to use Dask effectively.\u003cbr\u003e\u003cbr\u003eMethods for Using Dask with Higher-Level APIs and Building Blocks:\u003cbr\u003eDask offers higher-level APIs and building blocks that make it easier to work with existing Python libraries and frameworks. Users can leverage Dask's parallelization capabilities through familiar interfaces, such as pandas-style DataFrames or scikit-learn's Estimator API, which provide convenient ways to perform parallel computations.\u003cbr\u003e\u003cbr\u003eHow to Work with Integrated Libraries Such as scikit-learn, pandas, and PyTorch:\u003cbr\u003eDask integrates with popular libraries such as scikit-learn, pandas, and PyTorch. Users can leverage Dask's parallelization capabilities to speed up the training and inference of machine learning models, as well as the analysis of large datasets using pandas DataFrames. 
Dask also offers parallel counterparts to parts of these libraries, such as Dask DataFrame for pandas and Dask-ML for scikit-learn, so that familiar code can run in parallel.\u003cbr\u003e\u003cbr\u003eHow to Use Dask with GPUs:\u003cbr\u003eGPUs (Graphics Processing Units) have become increasingly popular for parallel computing due to their high computational power and parallel processing capabilities. Dask supports GPU-enabled computations by working with GPU-accelerated libraries such as CuPy and RAPIDS cuDF. Users can leverage Dask's GPU support to accelerate their computations and achieve faster results.\u003cbr\u003e\u003cbr\u003eIn conclusion, Dask is a powerful open-source library for parallel computing that empowers data scientists and Python programmers to leverage the full potential of modern systems for efficient and scalable computations. With its flexible and scalable approach, high-level interface, and integration with popular libraries and frameworks, Dask is becoming increasingly popular among industry experts and academics. Whether you are working on data analysis, machine learning, scientific computing, or high-performance computing, Dask can help you accelerate your computations and achieve better results.\u003c\/p\u003e\u003cp\u003e\u003cstrong\u003eWeight\u003c\/strong\u003e: 400g\u003cbr\u003e\u003cstrong\u003eDimension\u003c\/strong\u003e: 176 x 234 x 17 (mm)\u003cbr\u003e\u003cstrong\u003eISBN-13\u003c\/strong\u003e: 9781098119874\u003c\/p\u003e","brand":"Holden Karau,Mika Kimmins","offers":[{"title":"Paperback \/ softback","offer_id":44523051254010,"sku":"9781098119874","price":45.68,"currency_code":"GBP","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0522\/4297\/2845\/products\/1692978328259_book.jpg?v=1693207468","url":"https:\/\/shulphink.com\/products\/scaling-python-with-dask-from-data-science-to-machine-learning-9781098119874","provider":"Shulph Ink","version":"1.0","type":"link"}