Data Science at Scale with Python and Dask
YOU SAVE £5.50
- Condition: Brand new
- UK Delivery times: Usually arrives within 2 - 3 working days
- UK Shipping: Fee starts at £2.39, subject to product weight and dimensions
- More about Data Science at Scale with Python and Dask
Dask simplifies the process of ingesting, filtering, and transforming data, reducing or eliminating the need for a heavyweight framework like Spark. Data Science at Scale with Python and Dask teaches readers how to build distributed data projects that can handle huge amounts of data. It introduces Dask DataFrames and teaches helpful code patterns to streamline analysis.
Format: Paperback / softback
Length: 296 pages
Publication date: 04 October 2019
Publisher: Manning Publications
Large datasets tend to be distributed, non-uniform, and prone to change, which makes them challenging to ingest, filter, and transform efficiently. Dask, a Python library, simplifies the handling of large datasets by providing a lightweight and scalable framework.
"Data Science at Scale with Python and Dask" is a guide that teaches readers how to build distributed data projects capable of handling massive amounts of data. The book introduces Dask DataFrames, a tool for working with structured data, and provides practical code patterns to streamline data analysis.
Key Features:
Working with Large Structured Datasets: The book covers the fundamentals of working with large structured datasets, including data partitioning, shuffling, and distributed computing. It shows how Dask DataFrames, a flexible and efficient data structure, can be used to handle large datasets.
Writing DataFrames: Readers learn how to create and manipulate DataFrames using Dask's powerful data manipulation capabilities. They explore techniques such as data filtering, aggregation, transformation, and joining, which are essential for data analysis and machine learning tasks.
Cleaning and Visualizing DataFrames: The book provides practical guidance on cleaning and visualizing DataFrames to ensure data quality and facilitate data exploration. It covers techniques such as data validation, missing value handling, data transformation, and visualization using popular libraries like Pandas and Matplotlib.
Machine Learning with Dask-ML: Dask integrates seamlessly with machine learning libraries such as Scikit-Learn, allowing users to perform distributed machine learning tasks efficiently. Readers learn how to use Dask-ML to train machine learning models on large datasets and evaluate their performance.
Working with Bags and Arrays: Dask provides Bags and Arrays, collections whose elements can be distributed across multiple machines. Readers learn how to use them to handle complex data structures and perform parallel computations.
Written for Data Engineers and Scientists with Python Experience: The book is designed for data engineers and scientists with experience using Python. It assumes a basic understanding of the PyData stack (Pandas, NumPy, and Scikit-Learn), which will be helpful in leveraging Dask's capabilities.
No Experience with Low-Level Parallelism Required: Dask simplifies the process of parallel computing by abstracting away low-level parallelism details. Readers do not need to have prior experience with low-level parallelism to effectively use Dask.
About the Technology:
Dask is a self-contained, easily extensible library designed to query, stream, filter, and consolidate huge datasets. It uses distributed computing to handle large-scale data processing tasks efficiently.
About the Author:
Jesse Daniel, the author of "Data Science at Scale with Python and Dask," has five years of experience writing applications in Python, including three years working with the PyData stack. Jesse joined the faculty of the University of Denver in 2016 as an adjunct professor of business information and analytics, where he currently teaches a Python for Data Science course.
In conclusion, "Data Science at Scale with Python and Dask" is a valuable resource for data engineers, scientists, and anyone interested in building distributed data projects. It provides comprehensive coverage of working with large structured datasets, writing DataFrames, cleaning and visualizing data, machine learning with Dask-ML, and working with Bags and Arrays. With its practical approach and emphasis on scalability, Dask simplifies the process of handling large datasets and enables efficient data analysis and machine learning tasks. Whether you are working with structured data or dealing with complex data structures, this book will help you unlock the full potential of large datasets with Python and Dask.
Weight: 558g
Dimension: 235 x 190 x 20 (mm)
ISBN-13: 9781617295607
UK and International shipping information
UK Delivery and returns information:
- Delivery within 2 - 3 days when ordering in the UK.
- Shipping fee for UK customers from £2.39. Fully tracked shipping service available.
- Returns policy: Return within 30 days of receipt for full refund.
International deliveries:
Shulph Ink now ships to Australia, Belgium, Canada, France, Germany, Ireland, Italy, India, Luxembourg, Saudi Arabia, Singapore, Spain, Netherlands, New Zealand, United Arab Emirates, United States of America.
- Delivery times: within 5 - 10 days for international orders.
- Shipping fee: charges vary for overseas orders. Only tracked services are available for most international orders. Some countries have untracked shipping options.
- Customs charges: Orders to addresses outside the United Kingdom may incur additional customs charges and duties on local delivery.