Today I Learned - Rocky Kev

TIL Netflix's System Design

This post has some great designs: "How To Build Recommendation Algorithms And System Designs".

Netflix's System Design

Netflix operates in two clouds:

1 - Open Connect (Netflix’s custom global CDN).

Everything that happens after you hit play is handled by Open Connect. Open Connect stores Netflix video in different locations throughout the world.

2 - AWS

Anything that doesn’t involve serving video is handled in AWS.

Both clouds must work together without error to deliver endless hours of customer-pleasing video.

How Netflix loads a movie/video:

Netflix converts each video into formats that work for your device. This is called transcoding or encoding.

It looks at a bunch of variables (your screen size, your network, your device) and picks the right video file.

Netflix creates approximately 1,200 files for every movie!
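To make the "picks the right video" step concrete, here is a minimal sketch of how a player might choose one file out of many. The `Rendition` fields, the five-entry catalogue, and the selection rule (highest resolution that fits, then lowest bitrate) are all my own assumptions for illustration, not Netflix's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # bandwidth needed to stream it smoothly
    codec: str


# Hypothetical catalogue for one title (the real one has ~1,200 files).
RENDITIONS = [
    Rendition(480, 1_500, "h264"),
    Rendition(720, 3_000, "h264"),
    Rendition(1080, 5_000, "h264"),
    Rendition(1080, 3_500, "hevc"),
    Rendition(2160, 12_000, "hevc"),
]


def pick_rendition(screen_height, bandwidth_kbps, supported_codecs):
    """Return the best file this device can decode, display and download."""
    playable = [r for r in RENDITIONS if r.codec in supported_codecs]
    if not playable:
        return None
    candidates = [
        r for r in playable
        if r.height <= screen_height and r.bitrate_kbps <= bandwidth_kbps
    ]
    if not candidates:
        # Nothing fits the network: fall back to the cheapest playable file.
        return min(playable, key=lambda r: r.bitrate_kbps)
    # Prefer the highest resolution, and among equals the lowest bitrate.
    return max(candidates, key=lambda r: (r.height, -r.bitrate_kbps))
```

A 1080p device on a 4 Mbps connection that supports both codecs would get the 1080p HEVC file here, while an H.264-only device on the same connection would drop to 720p.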

Use of Amazon Web Services (AWS)

All non-video requests are handled by servers in AWS, e.g. login, recommendations, the home page, user history, billing, and customer support. That covers everything up to the moment you click the play button on a video.

Netflix uses Amazon's Elastic Load Balancer (ELB) service to route traffic to its front-end services. ELBs are set up such that load is balanced across zones first, then across instances, because the ELB is a two-tier load balancing scheme.
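A toy version of "zones first, then instances" helps me picture the two tiers. This sketch uses plain round-robin at both levels; the class name, zone names, and instance ids are made up, and it is not how ELB is implemented internally.

```python
from itertools import cycle


class TwoTierBalancer:
    """Round-robin across zones, then round-robin inside the chosen zone."""

    def __init__(self, zones):
        # zones: dict mapping zone name -> non-empty list of instance ids
        self._zone_cycle = cycle(sorted(zones))
        self._instance_cycles = {
            zone: cycle(instances) for zone, instances in zones.items()
        }

    def route(self):
        zone = next(self._zone_cycle)                    # tier 1: pick a zone
        instance = next(self._instance_cycles[zone])     # tier 2: pick an instance
        return zone, instance


balancer = TwoTierBalancer({"zone-a": ["a1", "a2"], "zone-b": ["b1"]})
first_four = [balancer.route() for _ in range(4)]
```

Note the side effect of balancing zones first: `zone-b` has a single instance, so `b1` receives half of all traffic while `a1` and `a2` split the other half.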

Microservices

Netflix uses a microservices architecture to power all of the APIs needed for its applications and web apps. Each API calls other microservices for the data it needs and then returns the complete response.
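Here is a small sketch of that fan-out pattern: one API endpoint calls several services concurrently and stitches the results together. The three `fetch_*` functions are stand-ins I invented; real services would be network calls.

```python
import asyncio


# Hypothetical downstream microservices (stubs standing in for network calls).
async def fetch_profile(user_id):
    await asyncio.sleep(0)
    return {"name": f"user-{user_id}"}


async def fetch_recommendations(user_id):
    await asyncio.sleep(0)
    return ["title-a", "title-b"]


async def fetch_history(user_id):
    await asyncio.sleep(0)
    return ["title-z"]


async def home_page(user_id):
    """Call the downstream services concurrently and merge their answers."""
    profile, recommendations, history = await asyncio.gather(
        fetch_profile(user_id),
        fetch_recommendations(user_id),
        fetch_history(user_id),
    )
    return {
        "profile": profile,
        "recommendations": recommendations,
        "continue_watching": history,
    }


response = asyncio.run(home_page(7))
```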

For reliability, Netflix uses Hystrix

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.
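The best-known pattern behind "stop cascading failure" is the circuit breaker. Below is a minimal sketch of that idea in Python. Hystrix itself is a Java library with a different and much richer API, so treat the class, its parameters, and the fallback style here as my own simplification.

```python
import time


class CircuitBreaker:
    """After repeated failures, stop calling the remote system for a while."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    @property
    def is_open(self):
        if self._opened_at is None:
            return False
        # Once the timeout has passed, let one trial call through (half-open).
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, func, fallback):
        if self.is_open:
            return fallback()  # fail fast without touching the remote system
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get the fallback immediately instead of piling up behind a service that is already struggling.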

As of today (per the Hystrix project README):

Hystrix is no longer in active development, and is currently in maintenance mode.

Hystrix (at version 1.5.18) is stable enough to meet the needs of Netflix for our existing applications. Meanwhile, our focus has shifted towards more adaptive implementations that react to an application’s real time performance rather than pre-configured settings (for example, through adaptive concurrency limits). For the cases where something like Hystrix makes sense, we intend to continue using Hystrix for existing applications, and to leverage open and active projects like resilience4j for new internal projects. We are beginning to recommend others do the same.

Design Goal - Stateless

One of the major design goals of the Netflix architecture is stateless services.

These services are designed so that any service instance can serve any request in a timely fashion, so if a server fails it's not a big deal. In the failure case, requests can be routed to another service instance, and a new node can be spun up automatically to replace the failed one.
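A tiny sketch of why statelessness makes failure cheap: since no instance holds session data, routing is just "pick any node", and replacing a dead node needs no data migration. The `StatelessPool` class and node names are hypothetical.

```python
class StatelessPool:
    """A pool where every node is interchangeable."""

    def __init__(self, size):
        self._next_id = 0
        self.nodes = []
        for _ in range(size):
            self._spawn()

    def _spawn(self):
        self._next_id += 1
        name = f"node-{self._next_id}"
        self.nodes.append(name)
        return name

    def route(self, request_id):
        # No session affinity: any node can serve any request.
        return self.nodes[request_id % len(self.nodes)]

    def mark_failed(self, name):
        # Drop the dead node and spin up a replacement; nothing to restore.
        self.nodes.remove(name)
        return self._spawn()


pool = StatelessPool(3)
before = pool.route(0)
replacement = pool.mark_failed("node-1")
after = pool.route(0)
```

The same request id lands on a different node after the failure, and that is fine precisely because the nodes keep no per-user state.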

Database

MySQL on EC2 was ultimately the choice for the billing/user info use case. Netflix ran MySQL with the InnoDB engine on large EC2 instances. They even had a master-master-like setup that used a "synchronous replication protocol": a write operation on the primary node is considered completed only after both the local and remote writes have been confirmed.
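The "completed only after both writes are confirmed" rule can be sketched with two in-memory nodes. This is only an illustration of the acknowledgement rule; the `Node` class and the undo-on-failure step are my assumptions, and real MySQL replication involves transaction logs rather than dictionaries.

```python
_MISSING = object()


class Node:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.rows = {}

    def apply(self, key, value):
        """Store the row and confirm, or refuse if the node is down."""
        if not self.available:
            return False
        self.rows[key] = value
        return True


def synchronous_write(primary, replica, key, value):
    """Acknowledge the write only if both the local and remote copies confirm."""
    previous = primary.rows.get(key, _MISSING)
    if not primary.apply(key, value):
        return False
    if not replica.apply(key, value):
        # Remote write was not confirmed: undo the local write, report failure.
        if previous is _MISSING:
            del primary.rows[key]
        else:
            primary.rows[key] = previous
        return False
    return True
```

The trade-off is latency: every write waits for the remote node, which is presumably acceptable for billing data where losing a confirmed write would be worse than a slower write.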

This is interesting:

The read traffic from ETL jobs was diverted to the read replica, sparing the primary database from heavy ETL batch processing. If the primary MySQL database fails, a failover is performed to the secondary node that was being replicated in synchronous mode. Once the secondary node takes over the primary role, the Route 53 DNS entry for the database host is changed to point to the new primary.
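Both ideas in that passage (ETL reads go to the replica, failover is a DNS repoint) fit in a short sketch. A plain dict stands in for the Route 53 record here, and the hostname and node names are hypothetical.

```python
DB_HOSTNAME = "billing-db.example.internal"  # hypothetical DNS name


class BillingCluster:
    def __init__(self, primary, secondary, read_replica):
        self.primary = primary
        self.secondary = secondary
        self.read_replica = read_replica
        # Stand-in for the DNS record that applications resolve.
        self.dns = {DB_HOSTNAME: primary}

    def host_for(self, workload):
        # Heavy ETL batch reads never touch the primary.
        if workload == "etl":
            return self.read_replica
        return self.dns[DB_HOSTNAME]

    def fail_over(self):
        """Promote the synchronously replicated secondary and repoint DNS."""
        if self.secondary is None:
            raise RuntimeError("no secondary available to promote")
        self.primary, self.secondary = self.secondary, None
        self.dns[DB_HOSTNAME] = self.primary
        return self.primary


cluster = BillingCluster("mysql-1", "mysql-2", "mysql-ro")
```

Applications keep connecting to the same hostname throughout, so they pick up the new primary without any configuration change.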

System Monitoring

Netflix pushes all of its events to processing pipelines:

~500 billion events and ~1.3 PB per day
~8 million events and ~24 GB per second during peak hours
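A quick back-of-the-envelope check puts those numbers in perspective. The only assumption here is decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 500e9        # ~500 billion events
bytes_per_day = 1.3e15        # ~1.3 PB, assuming decimal petabytes
peak_events_per_sec = 8e6     # ~8 million events
peak_bytes_per_sec = 24e9     # ~24 GB, assuming decimal gigabytes

avg_events_per_sec = events_per_day / SECONDS_PER_DAY       # ≈ 5.79 million
avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY         # ≈ 15.05 GB
avg_event_size = bytes_per_day / events_per_day             # ≈ 2,600 bytes
peak_event_size = peak_bytes_per_sec / peak_events_per_sec  # ≈ 3,000 bytes
peak_to_avg = peak_events_per_sec / avg_events_per_sec      # ≈ 1.38

print(f"average: {avg_events_per_sec / 1e6:.2f}M events/s, "
      f"{avg_bytes_per_sec / 1e9:.2f} GB/s")
print(f"peak is {peak_to_avg:.2f}x the average event rate")
```

So the daily totals work out to roughly 5.8 million events per second on average, the peak is about 1.4 times that, and a typical event is a few kilobytes.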

What kind of events??

Read more here: https://medium.com/@narengowda/netflix-system-design-dbec30fede8d

