Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark
YOU SAVE £19.84
- Condition: Brand new
- UK Delivery times: Usually arrives within 2 - 3 working days
- UK Shipping: Fee starts at £2.39. Subject to product weight & dimension
- More about Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark
Apache Spark is a cluster-computing framework that provides speed, ease of use, sophisticated analytics, and multilanguage support. This hands-on guide teaches practical algorithms and examples using PySpark, including ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each recipe includes PySpark algorithms and shell scripts.
Format: Paperback / softback
Length: 500 pages
Publication date: 26 April 2022
Publisher: O'Reilly Media, Inc, USA
Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support have made it an essential tool for data engineers and data scientists. This hands-on guide introduces beginners to Spark with practical algorithms and examples using PySpark. Author Mahmoud Parsian shows readers how to solve data problems with a range of Spark transformations and algorithms.
Chapter by chapter, Parsian shows how to tackle data problems including extract, transform, and load (ETL), design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each recipe includes PySpark algorithms run through the PySpark driver and shell scripts.
Readers will learn how Spark transformations and reductions such as reduceByKey(), combineByKey(), and mapPartitions() work, use data partitioning techniques to optimize queries, and build and apply models using PySpark design patterns. The book also covers motif-finding algorithms for graph data, analysis of graph data with the GraphFrames API, and the application of PySpark algorithms to clinical and genomics data.
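For readers unfamiliar with these reductions, the following plain-Python sketch illustrates what reduceByKey(), combineByKey(), and mapPartitions() compute. It is not taken from the book and does not use Spark: the helper names (`map_partitions`, `combine_by_key`, `reduce_by_key`) and the sample data are hypothetical, and a list of lists stands in for a partitioned dataset.

```python
# Plain-Python illustration only; no Spark required. All names are hypothetical.

def map_partitions(partitions, func):
    """Call func once per partition: it receives an iterator over that
    partition's records and returns an iterable of output records."""
    return [list(func(iter(part))) for part in partitions]


def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Aggregate (key, value) pairs in two steps, in the style of combineByKey()."""

    def combine_locally(records):
        # Step 1: inside one partition, build one combiner per key.
        combiners = {}
        for key, value in records:
            if key in combiners:
                combiners[key] = merge_value(combiners[key], value)
            else:
                combiners[key] = create_combiner(value)
        return combiners.items()

    # Step 2: merge the per-partition combiners that share a key.
    merged = {}
    for part in map_partitions(partitions, combine_locally):
        for key, combiner in part:
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged


def reduce_by_key(partitions, func):
    """reduceByKey()-style aggregation: the combiner is the value itself,
    so one function merges both values and partial results."""
    return combine_by_key(partitions, lambda v: v, func, func)


# Two "partitions" of (key, value) pairs; the data is made up.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5)],
]

totals = reduce_by_key(partitions, lambda x, y: x + y)

# Per-key mean needs a (sum, count) combiner, which a plain reduce cannot express.
sum_count = combine_by_key(
    partitions,
    lambda v: (v, 1),
    lambda c, v: (c[0] + v, c[1] + 1),
    lambda c1, c2: (c1[0] + c2[0], c1[1] + c2[1]),
)
means = {key: s / n for key, (s, n) in sum_count.items()}

# One output record per partition: the sum of that partition's values.
partition_sums = map_partitions(partitions, lambda it: [sum(v for _, v in it)])
```

In PySpark itself the corresponding calls are methods on an RDD of key-value pairs; the sketch only mirrors their local-then-global aggregation pattern.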
The book also emphasizes practical data design patterns for building efficient, scalable data pipelines, and covers feature engineering in machine learning algorithms, data validation, and performance optimization. It is aimed at anyone who wants hands-on knowledge of Apache Spark for real-world data engineering and data science work.
Dimension: 232 x 178 (mm)
ISBN-13: 9781492082385
UK and International shipping information
UK Delivery and returns information:
- Delivery within 2 - 3 days when ordering in the UK.
- Shipping fee for UK customers from £2.39. Fully tracked shipping service available.
- Returns policy: Return within 30 days of receipt for full refund.
International deliveries:
Shulph Ink now ships to Australia, Belgium, Canada, France, Germany, Ireland, Italy, India, Luxembourg, Saudi Arabia, Singapore, Spain, Netherlands, New Zealand, United Arab Emirates, United States of America.
- Delivery times: within 5 - 10 days for international orders.
- Shipping fee: charges vary for overseas orders. Only tracked services are available for most international orders. Some countries have untracked shipping options.
- Customs charges: Orders delivered to addresses outside the United Kingdom may incur additional customs duties and fees on local delivery.