Designing Data-Intensive Applications

esandrew has quoted 4 years ago
However, the downside of approach 2 is that posting a tweet now requires a lot of extra work. On average, a tweet is delivered to about 75 followers, so 4.6k tweets per second become 345k writes per second to the home timeline caches. But this average hides the fact that the number of followers per user varies wildly, and some users have over 30 million followers. This means that a single tweet may result in over 30 million writes to home timelines! Doing this in a timely manner—Twitter tries to deliver tweets to followers within five seconds—is a significant challenge. In the example of Twitter, the distribution of followers per user (maybe weighted by how often those users tweet) is a key load parameter for discussing scalability, since it determines the fan-out load. Your application may have very different characteristics, but you can apply similar principles to reasoning about its load.
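To make the fan-out arithmetic concrete, here is a minimal Python sketch of approach 2 (fan-out on write). It assumes hypothetical in-memory dictionaries in place of Twitter's real follower graph and home timeline caches. `follow`, `post_tweet` and `timeline_writes_per_second` are illustrative names, not a real API.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical in-memory stand-ins (not Twitter's real data structures):
# who follows whom, and one cached home timeline per user.
followers: defaultdict[str, list[str]] = defaultdict(list)
home_timelines: defaultdict[str, list[str]] = defaultdict(list)


def follow(follower: str, followee: str) -> None:
    """Record that `follower` follows `followee`."""
    followers[followee].append(follower)


def post_tweet(author: str, text: str) -> int:
    """Fan-out on write: push the tweet into every follower's home timeline
    cache at posting time. Returns the number of cache writes performed."""
    for follower in followers[author]:
        home_timelines[follower].append(text)
    return len(followers[author])


def timeline_writes_per_second(tweets_per_second: int, avg_fanout: int) -> int:
    """Average cache-write load implied by the posting rate and mean fan-out."""
    return tweets_per_second * avg_fanout


if __name__ == "__main__":
    follow("bob", "alice")
    follow("carol", "alice")
    print(post_tweet("alice", "hello"))          # 2 writes for one tweet
    print(home_timelines["carol"])               # ['hello']
    print(timeline_writes_per_second(4_600, 75)) # 345000
```

The cost of `post_tweet` grows with the author's follower count, so an account with over 30 million followers turns one call into over 30 million appends. The average rate hides that skew, which is why the follower distribution, not the mean, is the load parameter worth tracking.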

Hyeonsoo Shin has quoted 4 days ago
Operability
Make it easy for operations teams to keep the system running smoothly.
Simplicity
Make it easy for new engineers to understand the system, by removing as much complexity as possible from the system. (Note this is not the same as simplicity of the user interface.)
Evolvability
Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change. Also known as extensibility, modifiability, or plasticity.

Hyeonsoo Shin has quoted 17 days ago
A fault is usually defined as one component of the system deviating from its spec, whereas a failure is when the system as a whole stops providing the required service to the user.
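The fault/failure distinction can be illustrated with a minimal Python sketch of a read that falls back across replicas. `read_with_fallback`, `ServiceFailure` and the two replica functions are hypothetical names; the sketch only shows that one crashed component (a fault) need not stop the request from being served (a failure).

```python
from __future__ import annotations

from typing import Callable, Sequence


class ServiceFailure(Exception):
    """The system as a whole could not provide the required service."""


def read_with_fallback(replicas: Sequence[Callable[[str], str]], key: str) -> str:
    """Ask each replica in turn and return the first successful answer.

    An exception from one replica is a fault (a component deviating from
    its spec); it only becomes a failure if every replica faults.
    """
    faults: list[Exception] = []
    for replica in replicas:
        try:
            return replica(key)
        except Exception as exc:  # tolerate the fault and try the next replica
            faults.append(exc)
    raise ServiceFailure(f"all {len(replicas)} replicas faulted: {faults!r}")


def healthy_replica(key: str) -> str:
    return f"value-for-{key}"


def crashed_replica(key: str) -> str:
    raise ConnectionError("replica unreachable")


if __name__ == "__main__":
    # One faulty component, yet the caller still gets an answer.
    print(read_with_fallback([crashed_replica, healthy_replica], "user:42"))
```

This only tolerates faults that surface as exceptions. A replica that silently returns wrong data is also a fault, and catching it would need something extra, such as comparing answers from several replicas.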

Hyeonsoo Shin has quoted 21 days ago
We call an application data-intensive if data is its primary challenge—the quantity of data, the complexity of data, or the speed at which it is changing—as opposed to compute-intensive, where CPU cycles are the bottleneck.

Samson Mwathi has quoted last year
Many applications today are data-intensive, as opposed to compute-intensive.

b9449300348 has quoted 2 years ago
CPU clock speeds are barely increasing, but multi-core processors are standard

Peter Gazaryan has quoted 2 years ago
A data-intensive application is typically built from standard building blocks that provide commonly needed functionality. For example, many applications need to:

Store data so that they, or another application, can find it again later (databases)

Remember the result of an expensive operation, to speed up reads (caches)

Allow users to search data by keyword or filter it in various ways (search indexes)

Send a message to another process, to be handled asynchronously (stream processing)

Periodically crunch a large amount of accumulated data (batch processing)
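The caching building block ("remember the result of an expensive operation, to speed up reads") can be illustrated with a minimal read-through cache in Python. `ReadThroughCache`, its `load` callback and the `ttl_seconds` expiry policy are hypothetical choices for illustration; real caches such as memcached or Redis add eviction, size limits and concurrency control that this sketch omits.

```python
from __future__ import annotations

import time
from typing import Callable


class ReadThroughCache:
    """Remember results of an expensive lookup for `ttl_seconds`."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float) -> None:
        self._load = load  # the expensive operation, e.g. a database query
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1  # fresh cached value: skip the expensive call
            return entry[1]
        self.misses += 1  # missing or expired: recompute and remember it
        value = self._load(key)
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    slow_db = {"user:1": "Alice"}  # hypothetical backing store
    cache = ReadThroughCache(lambda k: slow_db[k], ttl_seconds=60)
    cache.get("user:1")
    cache.get("user:1")
    print(cache.hits, cache.misses)  # 1 1
```

The time-to-live bounds how stale a remembered result can get; choosing it is an application decision about how much staleness the reads can tolerate.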

exordiumexordium has quoted 4 years ago
The currently trendy style of application development involves breaking down functionality into a set of services that communicate via synchronous network requests such as REST APIs (see “Dataflow Through Services: REST and RPC”). The advantage of such a service-oriented architecture over a single monolithic application is primarily organizational scalability through loose coupling: different teams can work on different services, which reduces coordination effort between teams (as long as the services can be deployed and updated independently).

esandrew has quoted 4 years ago
There is no quick solution to the problem of systematic faults in software. Lots of small things can help: carefully thinking about assumptions and interactions in the system; thorough testing; process isolation; allowing processes to crash and restart; measuring, monitoring, and analyzing system behavior in production.
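One of the listed mitigations, allowing processes to crash and restart, can be sketched as a small supervisor loop in Python. `supervise`, `max_restarts` and `base_delay` are hypothetical names, and the exponential backoff is an assumed policy; real supervisors (for example systemd, Kubernetes restart policies or Erlang/OTP supervisors) do much more. Running the worker as a child process via `subprocess.run` also gives the process isolation mentioned in the quote.

```python
from __future__ import annotations

import subprocess
import sys
import time
from typing import Sequence


def supervise(cmd: Sequence[str], max_restarts: int = 3, base_delay: float = 0.1) -> int:
    """Run `cmd` as a child process and restart it whenever it exits non-zero.

    Returns how many restarts were needed before a clean exit (code 0).
    Raises RuntimeError once the child has crashed more than `max_restarts` times.
    """
    restarts = 0
    while True:
        # A separate process: a crash in the worker cannot corrupt the supervisor.
        returncode = subprocess.run(list(cmd)).returncode
        if returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"{list(cmd)!r} crashed {restarts + 1} times; giving up")
        restarts += 1
        # Exponential backoff so a crash loop does not hammer shared resources.
        time.sleep(base_delay * 2 ** (restarts - 1))


if __name__ == "__main__":
    worker = [sys.executable, "-c", "print('working')"]
    print(supervise(worker))  # 0: the worker exited cleanly on the first run
```

Giving up after a bounded number of restarts is deliberate: a worker that crashes on every start (a systematic fault) should surface to monitoring rather than loop forever.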

esandrew has quoted 4 years ago
Sometimes, when discussing scalable data systems, people make comments along the lines of, “You’re not Google or Amazon. Stop worrying about scale and just use a relational database.” There is truth in that statement: building for scale that you don’t need is wasted effort and may lock you into an inflexible design. In effect, it is a form of premature optimization. However, it’s also important to choose the right tool for the job, and different technologies each have their own strengths and weaknesses.