YouTube System Architecture

Written by WWCode HQ

Mie Haga

April 23, 2023 is World Book Day! I would like to introduce you to System Design Interview – An Insider's Guide by Alex Xu. This is a great book for every software engineer who wants to get started with system design.

I decided to write a blog post about this book because I learned the basics and the real-world application through this book. I would recommend this to everyone who is interested in getting started with system architecture.

The author, Alex Xu, has made two chapters freely available on his website, so please check out his articles as well if you are interested!

In this blog, I would like to introduce you to the YouTube system design I learned from the book, plus my own research to dig a little deeper into it! Among all the system design examples in the book, I chose to write about YouTube because I am a big fan of the platform. Furthermore, the YouTube system architecture is a great example for going over the basics of system design.

In this blog, we will focus on two features of the YouTube platform:

  • Uploading a video

  • Watching a video

Although System Design Interview – An Insider's Guide is meant to help you prepare for a system design interview, I will emphasize the system design concepts rather than the interview prep.


Contents:

  • Uploading a Video
      • What happens when you upload a video on YouTube?
      • The flow of an actual video
      • Object storage vs databases
      • Speeding up the upload to object storage (S3)
      • The flow of metadata of the video provided by the user
  • Watching a Video
      • CDN (Content Delivery Network)
      • Load Balancer (LB)
      • Database Replication
  • Replicating across geographical locations
      • How do we recover quickly when the data center fails?


Uploading a Video

What happens when you upload a video on YouTube?

There are two major flows in a video upload:

  • the flow of an actual video

  • the flow of metadata of the video provided by the user

The flow of an actual video

You have a video file, puppy.mp4, and you want to upload it to the YouTube platform so that many people can watch your puppy video.

You upload the video through YouTube, and the uploaded video is sent to object storage. An example of object storage is AWS S3. In fact, YouTube does not use S3 (S3 is Amazon's product; YouTube is under Google/Alphabet), and Google has its own distributed object storage. I am more familiar with AWS, so I will stick with S3 as the example.

Object storage vs databases

Why do we store the video in object storage (like S3) rather than in a database (MySQL, MongoDB, etc.)? Storing large binary files, such as videos or images, in a database is widely considered bad practice for many reasons: slower queries, poor scalability, and so on (for more detail, check out this resource).

Speeding up the upload to object storage (S3)

AWS provides multipart upload and the Transfer Acceleration service to help developers increase object upload performance. With multipart upload, upload speed increases by splitting a large video into smaller chunks and sending them to object storage in parallel. Transfer Acceleration provides a more consistent network experience with lower latency. YouTube is part of Google, so in reality it has its own ways to achieve this performance increase, but I believe the same concepts apply. If you are interested in learning more, there is an amazing article with more information.
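The multipart idea can be sketched in plain Python. The chunk size and the stand-in upload loop below are illustrative assumptions; in practice the S3 API (e.g. via boto3) manages the parts for you.

```python
# Illustrative sketch of multipart upload: split a large file into
# fixed-size parts that can be sent to object storage in parallel.
# The chunk size and the pretend "upload" are assumptions for demonstration.

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per part

def split_into_parts(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a video file's bytes into parts for parallel upload."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def upload_multipart(data: bytes) -> int:
    """Pretend-upload each part; returns the number of parts sent."""
    parts = split_into_parts(data)
    for part_number, part in enumerate(parts, start=1):
        # In a real system each part would be uploaded concurrently
        # (conceptually like s3.upload_part(..., PartNumber=part_number)).
        pass
    return len(parts)

# A 20 MiB "video" splits into three parts: 8 + 8 + 4 MiB.
video = b"\x00" * (20 * 1024 * 1024)
print(upload_multipart(video))  # 3
```

Because the parts are independent, they can be uploaded over separate connections at the same time, which is where the speedup comes from.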

Now that puppy.mp4 is stored in object storage, the video is sent to a transcoding server. The primary purpose of the transcoding server is to convert the video from one format into others, which makes it easier for YouTube to provide a better video-watching experience for users.

In brief, some of the reasons video transcoding is done:

  • reducing a large video to a smaller size

  • encoding a video format (e.g. mp4) into many other formats

  • producing multiple resolution types

(In this blog, we will not go into what video transcoding is or how it works, since we want to focus on the breadth of the architecture.)

After transcoding, the transcoded video data is sent to another object storage (e.g. S3) and saved.

As soon as transcoding is done, the transcoding server sends a request to another server, the completion server, which handles the metadata of the transcoded video and any post-processing required after transcoding. The completion server then saves the metadata to a database.

Here, we will choose a MySQL database to store the metadata. In reality, YouTube uses a database called Vitess, which was created to solve YouTube's unique scalability challenges.

A NoSQL database has its own advantages as well. In particular, when we deal with a large number of videos, an efficient way to access video details, such as title and metadata, is key-based access: in a Video table, for example, each video's details are associated with its unique key.

The flow of metadata of the video provided by the user

Alongside the flow of the actual video file, there is another flow for the video's metadata. This metadata includes user data, file name, size, format, timestamp, etc. The user request carrying the metadata is sent to an API server, which processes the data and saves it to the database.
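As a rough sketch of what the API server might persist, here is a minimal metadata table using SQLite in place of the MySQL/Vitess database from the text; the table and column names are assumptions for illustration.

```python
# Minimal sketch of the video-metadata store the API server writes to.
# SQLite stands in for MySQL/Vitess; the schema is an illustrative assumption.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE video_metadata (
        video_id    TEXT PRIMARY KEY,
        user_id     TEXT NOT NULL,
        file_name   TEXT,
        size_bytes  INTEGER,
        format      TEXT,
        uploaded_at TEXT
    )
""")

# The API server saves the user-provided metadata on upload.
conn.execute(
    "INSERT INTO video_metadata VALUES (?, ?, ?, ?, ?, ?)",
    ("v123", "u42", "puppy.mp4", 20_971_520, "mp4", "2023-04-23T10:00:00Z"),
)

# Later reads fetch the details by the video's unique key.
row = conn.execute(
    "SELECT file_name, format FROM video_metadata WHERE video_id = ?",
    ("v123",),
).fetchone()
print(row)  # ('puppy.mp4', 'mp4')
```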

Watching a Video

Now that you are done uploading your puppy video to YouTube, how do other users watch it?

On our YouTube system, when Alice watches the video, the entire video is not downloaded to her laptop, yet she can still watch it. How does this work?

The puppy video is split into chunks of a few seconds each, and the chunks are sent to Alice one by one. While Alice is watching one chunk, the next chunk is sent to her. If the connection between Alice and our YouTube system is good, the system sends a high-quality video chunk; if the connection is bad, a lower-quality chunk is sent. This way, Alice can continue watching the puppy video without interruption, at the sacrifice of video quality. This is called adaptive bitrate streaming.
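The quality-switching logic can be sketched as follows; the rendition ladder and bandwidth thresholds are illustrative assumptions, not YouTube's actual values.

```python
# Sketch of adaptive bitrate selection: given the measured bandwidth,
# the player picks the highest-quality rendition the connection can sustain.
# The rendition ladder below is an illustrative assumption.

# (resolution, required bandwidth in kbit/s), best quality first
RENDITIONS = [
    ("1080p", 5000),
    ("720p", 2500),
    ("480p", 1000),
    ("360p", 500),
]

def pick_rendition(bandwidth_kbps: float) -> str:
    """Return the best resolution that fits the measured bandwidth."""
    for resolution, required in RENDITIONS:
        if bandwidth_kbps >= required:
            return resolution
    return RENDITIONS[-1][0]  # fall back to the lowest quality

print(pick_rendition(6000))  # 1080p
print(pick_rendition(1200))  # 480p
print(pick_rendition(100))   # 360p
```

The player re-measures bandwidth as each chunk arrives and re-runs this selection, which is why quality can change mid-video.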

The next question is, where does the video data come from?

Let's say you have a friend, Alice, who wants to watch the video. She can access the video directly through video servers, which obtain the puppy video from object storage.

Alice watching the puppy video from the video server is a perfectly working solution. However, if the video server is located in the US and Alice lives in Japan, she has to fetch the video data from a very distant server, and it takes a long time to load the video. User-experience-wise, this is not great. A CDN (Content Delivery Network) solves this problem.

CDN (Content Delivery Network)

A few examples of CDNs are Amazon CloudFront, Cloudflare, and Akamai. In reality, YouTube has its own global network, Google Cloud CDN.

Saving the video in a CDN near where Alice lives solves this video-loading latency problem.

Let's add a CDN to our YouTube architecture.

One side note here: storing all videos in the CDN, while it reduces latency, is very expensive. We do not need fast loading times for every video uploaded to YouTube. Therefore, for a more cost-optimal architecture, you want to store only popular, frequently accessed videos in the CDN and serve the rest from the video servers.
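This popularity-based routing can be sketched as a simple rule; the view-count threshold is an arbitrary assumption for illustration.

```python
# Sketch of the cost optimization above: serve frequently watched videos
# from the CDN and the long tail from the video servers.
# The threshold value is an illustrative assumption.

CDN_VIEW_THRESHOLD = 10_000

def video_source(view_count: int) -> str:
    """Decide where a video should be served from, based on popularity."""
    return "cdn" if view_count >= CDN_VIEW_THRESHOLD else "video-server"

print(video_source(2_000_000))  # cdn
print(video_source(37))         # video-server
```

A real system would use a smarter signal than raw view count (e.g. recent request rate per region), but the trade-off is the same: pay CDN costs only where the latency win matters.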

At this point, the big picture of our YouTube architecture looks like this:

Furthermore, a CDN can get expensive since it is hosted all across the globe. You can reduce the cost by restricting the number of locations where you provide your content.

Load Balancer (LB)

So far, our architecture is good for a limited number of users.

Large-scale systems like YouTube serve millions of daily users. Suppose our system has 5 million daily users: our current architecture cannot support that much load, because a single API server is not enough to handle it.

We will solve this problem by introducing a load balancer. Some examples of load balancers are Amazon Elastic Load Balancing and NGINX.

Let's apply a load balancer to our YouTube backend system.

In the section The flow of metadata of the video provided by the user, we talked about the flow of metadata associated with the video. Let’s recap the flow diagram.

Let’s zoom into this part.

The diagram below is the resulting flow of metadata upload after introducing the load balancer.

The load balancer helps avoid a single point of failure (SPOF). Even if one API server goes down, requests are routed to the remaining healthy API servers.

On top of API server replication and a load balancer, the API servers should be stateless. Stateless means that user state is not saved on a specific API server, so even if a single server fails, all the users who had been connecting to the failed server can be directed to other healthy servers. If you are interested, watch this video about stateless vs. stateful applications.
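A minimal round-robin load balancer can be sketched as below; because the API servers are stateless, any healthy server can take any request. The server names are made up for illustration.

```python
# Round-robin load balancer sketch: cycle through the API servers,
# skipping any that have been marked unhealthy. With stateless servers,
# any healthy server can serve any user's request.
from itertools import cycle

class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        """Record that a health check failed for this server."""
        self.healthy.discard(server)

    def route(self):
        """Return the next healthy server in round-robin order."""
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy API servers")

lb = LoadBalancer(["api-1", "api-2", "api-3"])
print(lb.route())  # api-1
lb.mark_down("api-2")
print(lb.route())  # api-3 (api-2 is skipped)
```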

Database Replication

In fact, as we introduce a load balancer to our YouTube architecture, another problem arises. As more and more users use our service, the database also has to be scaled horizontally to meet the increasing QPS (queries per second) demand. Many API servers access a single database, but a single database does not have the resources to handle all the incoming transactions. We will fix this issue with database replication.

Let's apply database replication to our YouTube backend system.
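Read/write splitting under replication can be sketched like this; the hostnames and the write-operation list are illustrative assumptions (writes go to the primary, reads are spread across replicas).

```python
# Sketch of query routing under primary-replica replication: writes go
# to the primary, reads are load-balanced across replicas. Hostnames
# are made up for illustration.
import random

PRIMARY = "db-primary"
REPLICAS = ["db-replica-1", "db-replica-2", "db-replica-3"]

def pick_database(operation: str) -> str:
    """Send writes to the primary and reads to a random replica."""
    if operation in ("INSERT", "UPDATE", "DELETE"):
        return PRIMARY
    return random.choice(REPLICAS)

print(pick_database("INSERT"))  # db-primary
print(pick_database("SELECT"))  # one of the replicas
```

Since most traffic on a metadata store is reads (everyone watching a video reads its title and details), spreading reads across replicas removes most of the load from the primary.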

Furthermore, you can scale the database by sharding.
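A minimal sharding sketch, assuming the video id is the shard key and a made-up shard count:

```python
# Sketch of sharding: the video_id deterministically selects which
# database shard holds its metadata, so each shard carries only a
# fraction of the load. Shard count and hashing scheme are assumptions.
import hashlib

NUM_SHARDS = 4

def shard_for(video_id: str) -> int:
    """Map a video id deterministically to one of NUM_SHARDS shards."""
    digest = hashlib.md5(video_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shard = shard_for("v123")
print(f"metadata for v123 lives on shard {shard}")
```

Note that with plain modulo hashing, changing the shard count remaps most keys; production systems typically use consistent hashing or a lookup service to avoid mass data movement when resharding.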


Now we have successfully scaled our YouTube architecture to serve metadata uploads to millions of users. In this section, we will discuss how to increase the performance of reading the metadata saved in the databases: caching.

Examples of caches are Amazon ElastiCache, Redis, and Memcached.

Let's apply a cache to our YouTube backend system.
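The usual pattern here is cache-aside, which can be sketched with a plain dict standing in for Redis or Memcached; the fake database contents are assumptions for illustration.

```python
# Cache-aside sketch: check the cache first, fall back to the database
# on a miss, then populate the cache so the next read is fast.
# A dict stands in for Redis/Memcached; the fake database is illustrative.

cache = {}
database = {"v123": {"title": "puppy.mp4", "views": 42}}

def get_metadata(video_id):
    """Read video metadata, serving repeat reads from the cache."""
    if video_id in cache:            # cache hit: no database query
        return cache[video_id]
    record = database.get(video_id)  # cache miss: read the database
    if record is not None:
        cache[video_id] = record     # populate the cache for next time
    return record

get_metadata("v123")    # first read hits the database
print("v123" in cache)  # True -- later reads are served from the cache
```

In a real deployment, cached entries would also carry a TTL (expiry) so stale metadata, such as an edited title, eventually refreshes from the database.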

Replicating across geographical locations

We architected the YouTube system and hosted it in a specific region; let's say it is the US West region.

Our YouTube service is perfectly fine hosted in the US West region. However, what if the data centers (you can think of these as servers) that host our service in the US West region fail? Then our users cannot watch videos on our YouTube service until the data centers recover, which severely impacts our business!

How do we recover quickly when the data center fails?

The answer is replication. We can geographically replicate our YouTube system across the globe (e.g., hosting a replica of our service in the US East region). This way, even if the US West region fails, the US East region is still available, and user request traffic is routed to the data center in the US East region.

For example, AWS Global Accelerator is made specifically to achieve this. In reality, again, Google has its own global network infrastructure that achieves the same thing.

Replication across many geographical locations improves our YouTube application's performance as well. Imagine two users: user1 in the US West region and user2 in the US East region. user1's request should be routed to the data center in the US West region, and user2's request to the data center in the US East region, because the farther a user is from the server, the longer it takes to send a request and get a response back. Here is a reference explaining that network latency increases the farther you are from the server.
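The nearest-region routing described above can be sketched with a toy latency table; the regions and latency numbers are illustrative assumptions.

```python
# Sketch of geo-routing: each user is sent to the lowest-latency data
# center that is currently available, which also gives us failover.
# The latency table below is a simplified assumption for illustration.

# user region -> {data center region: round-trip latency in ms}
LATENCY_MS = {
    "us-west": {"us-west": 10, "us-east": 70},
    "us-east": {"us-west": 70, "us-east": 10},
}

def route_request(user_region: str, available: set) -> str:
    """Pick the lowest-latency available data center for this user."""
    latencies = LATENCY_MS[user_region]
    return min(available, key=lambda dc: latencies[dc])

print(route_request("us-east", {"us-west", "us-east"}))  # us-east
# If us-east fails, its users fail over to us-west:
print(route_request("us-east", {"us-west"}))             # us-west
```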

This is the end of the blog, and I hope it helps you get started on your system design journey! Please feel free to send me a direct message on Slack with any questions or if you would like further discussion!


References

  • System Design Interview – An Insider's Guide, Alex Xu

  • What is adaptive bitrate streaming?

  • Why Storing Files in the Database Is Considered Bad Practice

  • AWS Global Accelerator

  • What Is Network Latency, Common Causes, and Best Ways to Reduce It

  • Stateful vs Stateless Applications

  • Uploading and copying objects using multipart upload

  • S3 Transfer Acceleration