Hadoop and Spark Performance for the Enterprise

Distributed computing environments such as Hadoop and Spark are busy places. Multiple users submit jobs with different resource needs to an environment where resource limits are fixed and high-priority jobs cannot be distinguished from low-priority ones. This constant resource contention creates complex performance problems for organizations: overprovisioned resources, late jobs, and missed SLAs. How can distributed processing environments evolve to ensure Quality of Service (QoS)? In this report, Andy Oram explores QoS: what it is, why it matters to organizations and professionals, and how it can be implemented in the data pipeline. By looking at how operating systems and data warehouses have developed over the years, Oram offers a glimpse of what QoS for distributed processing could look like. He examines two systems, Quasar and Pepperdata, that aim to improve performance in distributed processing environments through predictive profiling and real-time assessment of resource allocation.

22 printed pages