TIL Netflix's method of providing internal tooling at scale

POSTED ON: Oct 12, 2021

Netflix had a problem. Before any program could be deployed, it had to pass a rigorous security checklist.

Adding to the complexity, many of the checklist items themselves had a variety of different options to fulfill them (“new apps do this, but legacy apps do that”; “Java apps should use this approach, but Ruby apps should try one of these four things”… yes, there were flowcharts inside checklists).

Many applications at Netflix share the same set of security rules.

So they built a security tool called Wall-E, which sits in front of an application and automagically makes those security decisions for it.

The challenge was team adoption! Teams would have to opt in to Wall-E, and security conversations tend to get deferred.

During our initial consultations, it was clear that developers preferred prioritizing product work over security or infrastructure improvements. Our meetings usually ended like this: “Security suggested we talk to you, and we like the idea of improving our security posture, but we have product goals to meet. Let’s talk again next quarter”.

What helped is that each team at Netflix creates a version-controlled YAML file describing its application.

Originally this was intended as a simplified and developer-friendly way to help collect domain names and some routing rules into a versionable package, but we quickly realized we had stumbled into a powerful model: we were harvesting developer intent.
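The post doesn't show what that file looks like, so here is a rough sketch of the idea in Python rather than the real thing. The field names (`domains`, `routes`, `internet_facing`) and the rules in `derive_security_plan` are my own assumptions for illustration, not Wall-E's actual format or logic. The point is only that once intent is declared as data, security defaults can be computed from it instead of being worked out from a checklist.

```python
from dataclasses import dataclass, field


@dataclass
class AppIntent:
    """Hypothetical stand-in for one team's version-controlled config file."""
    name: str
    domains: list[str]
    # Maps a path prefix to a backend name (assumed shape, not Netflix's).
    routes: dict[str, str] = field(default_factory=dict)
    internet_facing: bool = False


def derive_security_plan(intent: AppIntent) -> dict[str, object]:
    """Turn declared intent into security defaults (illustrative rules only)."""
    return {
        "app": intent.name,
        # Assumed rule: every declared domain is served over TLS.
        "tls_domains": sorted(intent.domains),
        # Assumed rule: internet-facing apps must authenticate users.
        "require_authentication": intent.internet_facing,
        # Assumed rule: only internet-facing apps get their routes protected.
        "protected_routes": sorted(intent.routes) if intent.internet_facing else [],
    }


intent = AppIntent(
    name="example-app",
    domains=["example-app.example.com"],
    routes={"/api": "backend-api", "/": "backend-web"},
    internet_facing=True,
)
plan = derive_security_plan(intent)
print(plan)
```

Because the input is plain declared data, the same file could later be written by another tool instead of a developer, which is where the story ends up.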

Knowing the developer intent let the security team tell a better story about WHY a team should use Wall-E.

Teams started to notice.

For a typical paved road application with no unusual security complications, a team could go from “git init” to a production-ready, fully authenticated, internet accessible application in a little less than 10 minutes.

If a team chose not to use Wall-E during setup, they were prompted to explain why, and that feedback was used to make adjustments.

Wall-E now fronts over 350 applications, and is adding roughly 3 new production applications (mostly internet-facing) per week.

Even better: once the tool was that prevalent, they pushed on to automate it and make it invisible.

In 2019, essentially 100% of the Wall-E app configuration was done manually by developers. In 2021, that interaction has changed dramatically: now more than 50% of app configuration in Wall-E is done by automated tools (which are acting on higher-level abstractions on behalf of developers).

The Show Must Go On: Securing Netflix Studios At Scale
