
Abdelaziz Testas

Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn


💎 Earn 187 Points (£1.87) on this item.

Low Stock: Only 2 copies remaining
Regular price £44.99 GBP. Sale price £37.47 GBP.
Taxes included. Shipping calculated at checkout.

YOU SAVE £7.52

  • Condition: Brand new
  • UK Delivery times: Usually arrives within 2 - 3 working days
  • UK Shipping: Fee starts at £2.39, subject to product weight and dimensions

Bulk ordering. Want 15 or more copies? Get a personalised quote and bigger discounts.

More about Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn


This book provides a roadmap for data scientists transitioning from pandas and scikit-learn to PySpark for handling vast amounts of data and achieving faster data processing times. It covers translating Python code, preprocessing large volumes of data, building and training machine learning models, and evaluating algorithms using PySpark. It is designed for data scientists, data engineers, and machine learning practitioners who have some familiarity with Python but are new to distributed machine learning and the PySpark framework.

Format: Paperback / softback
Length: 490 pages
Publication date: 24 November 2023
Publisher: Apress


Distributed Machine Learning with PySpark is a comprehensive guide for data scientists looking to migrate from small data libraries like pandas and scikit-learn to big data processing and machine learning with PySpark. This book provides a roadmap to facilitate this transition, leveraging the similarities in syntax, functionality, and interoperability between these tools.

In Chapter 1, the book introduces the foundational concepts of distributed machine learning and PySpark. It covers topics such as Spark clusters, RDDs, and Spark SQL, which are essential for handling large amounts of data. The chapter also highlights the advantages of using PySpark for data processing, including its scalability, fault tolerance, and performance.

Chapter 2 delves into the differences between PySpark, scikit-learn, and pandas. It explains how PySpark differs from traditional data processing frameworks and highlights its strengths in handling large-scale data processing and machine learning tasks. The chapter also provides an overview of the key features and functionalities of PySpark, such as its resilient distributed dataset (RDD), functional programming API, and machine learning libraries.

Chapter 3 focuses on translating Python code from pandas and scikit-learn to PySpark. It provides step-by-step instructions on how to preprocess large volumes of data using PySpark, including data cleaning, feature extraction, and transformation. The chapter also covers building, training, testing, and evaluating popular machine learning algorithms such as linear and logistic regression, decision trees, random forests, support vector machines, Naïve Bayes, and neural networks.

Chapter 4 compares pipelines in PySpark and scikit-learn. It explains how the two tools differ in their approach to data processing and machine learning tasks. The chapter also provides examples of how to combine PySpark and scikit-learn pipelines to build scalable ML data pipelines.

Chapter 5 covers advanced topics in distributed machine learning, such as distributed training, distributed data processing, and streaming data processing. It provides insights into how to optimize the performance of PySpark applications and handle real-time data processing.

Chapter 6 concludes the book by discussing the future of distributed machine learning and PySpark. It highlights the ongoing development and advancements in these tools and provides recommendations for future practitioners.

Who This Book Is For:

Distributed Machine Learning with PySpark is designed for data scientists, data engineers, and machine learning practitioners who have some familiarity with Python but are new to distributed machine learning and the PySpark framework. The book assumes a basic understanding of Python programming and mathematics, but it provides comprehensive explanations and examples to help readers grasp the concepts and apply them effectively.

In conclusion, Distributed Machine Learning with PySpark is a valuable resource for data scientists looking to migrate from small data libraries to big data processing and machine learning with PySpark. Readers will master the fundamentals of supervised learning, unsupervised learning, NLP, and recommender systems; understand the differences between PySpark, scikit-learn, and pandas; and perform linear regression, logistic regression, and decision tree regression with pandas, scikit-learn, and PySpark. In doing so, they will gain the skills necessary to apply these methods using PySpark, the industry standard for building scalable ML data pipelines.

Weight: 964g
Dimensions: 254 x 178 mm
ISBN-13: 9781484297506
Edition number: 1st ed.


UK and International shipping information

UK Delivery and returns information:

  • Delivery within 2 - 3 days when ordering in the UK.
  • Shipping fee for UK customers from £2.39. Fully tracked shipping service available.
  • Returns policy: Return within 30 days of receipt for full refund.

International deliveries:

Shulph Ink now ships to Australia, Belgium, Canada, France, Germany, Ireland, Italy, India, Luxembourg, Saudi Arabia, Singapore, Spain, Netherlands, New Zealand, United Arab Emirates, and the United States of America.

  • Delivery times: within 5 - 10 days for international orders.
  • Shipping fee: charges vary for overseas orders. Only tracked services are available for most international orders. Some countries have untracked shipping options.
  • Customs charges: If you order for delivery to an address outside the United Kingdom, you may incur additional customs and duties fees during local delivery.