Video streaming platforms are engineering marvels. Both the backend and the frontend are difficult to design and develop because they involve video encoding, adaptive streaming, content delivery networks, user interface design, and more. Some popular real-life video streaming platforms are:
- YouTube
- Netflix
- Amazon Prime
- Disney+ Hotstar
- JioCinema
Understanding the Problem and Establishing the Design Scope
A video streaming platform offers many functionalities, such as intuitive navigation, responsive layouts, genre-categorized lists, recommendation lists, video search, and autoplay on user interactions such as hover. This article focuses mainly on two features: uploading videos and watching videos.
Before starting the design of a video streaming platform, the technical architect should ask the product manager or client the following questions:
- What are the expected daily active users?
- What is the expected average daily time spent on the product?
- Are international users supported?
- What are the supported video resolutions?
- Is encryption required?
- Is there a file size limit for videos?
- Will we leverage existing cloud infrastructure provided by Amazon, Google, or Microsoft, or build everything from scratch?
- Which devices are supported (mobiles, browsers, smart TVs, etc.)?
The answers to these questions help the architect choose suitable infrastructure at an optimal cost. For example, let us assume that the following information is collected from the above analysis:
- 5 million daily active users (DAU).
- Each user is expected to watch 5 videos per day on average.
- 10% of active users are expected to upload 1 video per day.
- The average video size is expected to be 300 MB
From the above analysis, the architect can deduce the following insights:
- Total daily storage space needed: 5 million * 10% * 300 MB = 150 TB.
- Total data downloaded daily: 5 million * 5 videos * 300 MB = 7,500 TB (7.5 PB).
Let’s say that the architect wants to use the AWS CDN service (Amazon CloudFront) to serve videos to users faster. Before adopting a CDN, the architect needs to estimate how much it will cost the organization. Let’s assume that 100% of traffic is served from the United States, where the average CDN data transfer cost is $0.02 per GB. The daily CDN cost is then about 7,500,000 GB * $0.02 = $150,000. This rough estimate shows that serving videos from a CDN is expensive. Cloud providers are often willing to lower CDN prices significantly for large customers, but the cost remains substantial.
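The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constants are the assumptions listed above, decimal units are used (1 TB = 1,000,000 MB), and the function names are made up for illustration.

```python
# Assumptions taken from the example figures in the text.
DAU = 5_000_000                 # daily active users
VIDEOS_WATCHED_PER_USER = 5     # average videos watched per user per day
UPLOADER_FRACTION = 0.10        # share of users uploading 1 video per day
AVG_VIDEO_SIZE_MB = 300
CDN_COST_PER_GB_USD = 0.02      # assumed average US data transfer price

MB_PER_GB = 1_000               # decimal units
MB_PER_TB = 1_000_000


def daily_storage_tb() -> float:
    """New storage needed per day for uploaded videos, in TB."""
    return DAU * UPLOADER_FRACTION * AVG_VIDEO_SIZE_MB / MB_PER_TB


def daily_egress_tb() -> float:
    """Data streamed to viewers per day, in TB."""
    return DAU * VIDEOS_WATCHED_PER_USER * AVG_VIDEO_SIZE_MB / MB_PER_TB


def daily_cdn_cost_usd() -> float:
    """Daily CDN bill if all egress is served from the CDN."""
    egress_gb = DAU * VIDEOS_WATCHED_PER_USER * AVG_VIDEO_SIZE_MB / MB_PER_GB
    return egress_gb * CDN_COST_PER_GB_USD


if __name__ == "__main__":
    print(f"storage per day: {daily_storage_tb():,.0f} TB")
    print(f"egress per day:  {daily_egress_tb():,.0f} TB")
    print(f"CDN cost per day: ${daily_cdn_cost_usd():,.0f}")
```

Changing any single constant, for example the share of traffic served from the CDN, shows quickly how sensitive the bill is to that assumption.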
Video Upload
Video uploading can be divided into 2 sub-parts:
- Upload the actual video
- Update video metadata. Metadata contains information such as video URL, size, resolution, format, user info, etc.
Flow A: Upload the actual video
The figure below shows the high-level design for video uploading.
It consists of the following components:
- Client: It can be a web browser, smartphone app, or smart TV app.
- Load Balancer: A load balancer evenly distributes requests among API servers.
- Metadata DB: Video metadata is stored in Metadata DB. It is sharded and replicated to meet high availability and performance requirements.
- Metadata Cache: For better performance video metadata is cached.
- Storage: BLOB storage system to store original videos.
- Transcoding servers: Video transcoding is also called video encoding. It is the process of converting a video into other formats (MPEG, HLS, etc.) so that each device and bandwidth level gets the best stream it can handle.
- Transcoded storage: It is BLOB storage that stores transcoded video files.
- CDN: Videos are cached in the CDN and are streamed to clients only from the CDN.
- Completion Queue: Message queue that stores information about video transcoding completion events.
- Completion Handler: It pulls completion events from the completion queue and updates the Metadata cache and Metadata Database.
Let us understand the stepwise interaction of the above components:
1. Videos are uploaded to the original storage.
2. Transcoding servers fetch videos from the original storage and start transcoding.
3. After transcoding is completed, the following two steps are executed simultaneously:
- 3a. Transcoded videos are sent to transcoded storage.
- 3a.1. Transcoded videos are distributed to CDN.
- 3b. Transcoding completion events are queued in the completion queue.
- 3b.1. The completion handler runs a pool of workers that continuously pull event data from the queue.
- 3b.1.a. and 3b.1.b. The completion handler updates the metadata database and the metadata cache when video transcoding is complete.
4. API servers inform the client that the video is successfully uploaded and is ready for streaming.
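The interaction between the transcoding servers, the completion queue, and the completion handler can be sketched in a few lines of Python. This is a toy, in-memory illustration: plain dictionaries stand in for BLOB storage, the metadata DB, and the cache, `queue.Queue` stands in for a real message queue, and all function and field names are hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CompletionEvent:
    video_id: str
    renditions: List[str]


def transcode(video_id: str,
              original_storage: Dict[str, bytes],
              transcoded_storage: Dict[str, bytes],
              completion_queue: "queue.Queue[CompletionEvent]") -> None:
    """Steps 2 and 3: fetch the original, store renditions, emit an event."""
    raw = original_storage[video_id]
    renditions = ["360p", "720p"]  # assumed output ladder
    for name in renditions:
        # Placeholder for real encoding: tag the bytes with the rendition name.
        transcoded_storage[f"{video_id}/{name}"] = name.encode() + b":" + raw
    completion_queue.put(CompletionEvent(video_id, renditions))


def run_completion_handler(completion_queue: "queue.Queue[CompletionEvent]",
                           metadata_db: Dict[str, dict],
                           metadata_cache: Dict[str, dict]) -> int:
    """Step 3b: drain the queue and mark videos as ready in DB and cache."""
    handled = 0
    while True:
        try:
            event = completion_queue.get_nowait()
        except queue.Empty:
            return handled
        record = metadata_db.setdefault(event.video_id, {})
        record["status"] = "ready"
        record["renditions"] = event.renditions
        metadata_cache[event.video_id] = dict(record)
        handled += 1
```

In a production system the handler would run as long-lived workers and the CDN distribution step would follow the write to transcoded storage; both are left out here to keep the sketch short.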
Flow B: Update video metadata
While a file is being uploaded to the original storage, the client simultaneously sends a request to update the video metadata, as shown in the diagram below. The request contains video metadata, including file name, size, format, etc. API servers update the metadata cache and the metadata database.
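As a rough illustration of Flow B, the sketch below models a metadata record and a handler that writes it to both the database and the cache. The field names, the `status` default, and the dictionary-backed stores are assumptions made for this example, not a prescribed schema.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class VideoMetadata:
    video_id: str
    file_name: str
    size_bytes: int
    format: str
    status: str = "uploading"  # assumed lifecycle field


def update_metadata(meta: VideoMetadata,
                    metadata_db: Dict[str, dict],
                    metadata_cache: Dict[str, dict]) -> dict:
    """Write the record to the database first, then refresh the cache."""
    record = asdict(meta)
    metadata_db[meta.video_id] = record
    metadata_cache[meta.video_id] = dict(record)  # separate copy for the cache
    return record
```

Writing to the database before the cache is one possible ordering; it keeps the database as the source of truth if the cache write fails.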
Video Streaming
Whenever you watch a video, it usually starts playing immediately and you do not wait until the whole video is downloaded. Downloading means the whole video is copied to your device, while streaming means your device continuously receives the video in small pieces from a remote source. When you watch streaming videos, your client loads a little bit of data at a time so you can watch videos immediately and continuously.
Before diving deep into video streaming, it is important to understand the following terminologies:
1. Streaming:
It is the process of continuously transmitting data, typically audio or video, over a network in real-time. It allows users to access and view content without having to download it fully before playback.
2. Buffering:
It is the process of preloading a portion of the video ahead of the current playback position so that playback can continue without interruption even when the network connection is slow or unstable.
3. Bandwidth:
It is the maximum amount of data that can be transmitted over a network connection in a given amount of time. It helps in determining the quality and speed of video streaming. Higher bandwidth allows smoother playback and faster loading time.
4. Resolution:
It is the number of pixels displayed on the screen, determining the clarity and quality of the image. Higher resolutions, such as 4K or HD, provide sharper and more detailed visuals but require faster internet speeds and more bandwidth to stream smoothly.
5. Bitrate:
It is the amount of video data encoded or transmitted per second, usually expressed in kbps or Mbps. A higher bitrate generally means better quality but requires more bandwidth to stream without buffering.
6. Frame rate:
It is the number of frames displayed in a second. A video is a sequence of images that are rendered one after the other to create motion. A frame rate of around 24 fps or higher is typically used for smooth playback.
7. Codec:
It is software (an algorithm) or a hardware device that compresses and decompresses digital media. It is in charge of encoding the video and audio data into a format that the streaming platform can easily transmit and decode. Popular video codecs are H.264, H.265, VP9, and AV1.
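To make the relationship between bitrate, bandwidth, and data volume concrete, here is a small sketch. It assumes a constant bitrate and decimal units; the `headroom` parameter is an illustrative assumption that leaves part of the bandwidth unused as a safety margin.

```python
def stream_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate data volume in megabytes for a constant-bitrate stream."""
    bits = bitrate_kbps * 1_000 * duration_s
    return bits / 8 / 1_000_000  # bits -> bytes -> megabytes


def can_stream(bitrate_kbps: float, bandwidth_kbps: float,
               headroom: float = 0.8) -> bool:
    """True if the stream fits within a fraction of the available bandwidth."""
    return bitrate_kbps <= bandwidth_kbps * headroom


if __name__ == "__main__":
    # A 10-minute video at 5,000 kbps transfers roughly 375 MB.
    print(stream_size_mb(5_000, 600))
    # The same stream does not fit comfortably on a 6,000 kbps connection.
    print(can_stream(5_000, 6_000))
```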
Adaptive bitrate streaming
All the modern video streaming giants use adaptive bitrate streaming (ABR) to provide the best possible streaming experience to their users, irrespective of network bandwidth and device capabilities. The ABR flow is as follows:
- When a video is uploaded, it is split into manageable segments, each of which is encoded at several different quality levels. These encoded segments are stored in the “Transcoded storage” that we saw in the previous section.
- The video player on the client side measures the available bandwidth and CPU capacity in real time and requests the highest-quality segment that can be played smoothly without much buffering.
- As a result, if the viewer’s internet speed changes, the video quality is adjusted automatically to prevent interruptions to playback.
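The player-side decision described above can be sketched as a simple selection over a bitrate ladder. This is a minimal illustration, not how any particular player implements ABR: the ladder values, the `safety_factor`, and the function name are assumptions, and only bandwidth is considered (CPU capacity is ignored).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical bitrate ladder; real ladders are tuned per title and codec.
LADDER: List[Rendition] = [
    Rendition("240p", 400),
    Rendition("480p", 1_200),
    Rendition("720p", 2_800),
    Rendition("1080p", 5_000),
]


def pick_rendition(measured_kbps: float,
                   ladder: List[Rendition] = LADDER,
                   safety_factor: float = 0.8) -> Rendition:
    """Pick the highest rendition that fits within a share of the bandwidth."""
    budget = measured_kbps * safety_factor
    affordable = [r for r in ladder if r.bitrate_kbps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest quality rather than stalling.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)
```

A player would call such a function before requesting each segment, so a drop in measured bandwidth leads to a lower rendition for the next segment rather than a stall.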
Streaming Protocols
Streaming Protocols are a standardized way to control data transfer for video streaming. The most widely used streaming protocols are as follows:
Apple HTTP Live Streaming (HLS):
- HLS operates on the concept of segmented delivery, breaking up the video content into manageable chunks and sending them over HTTP. This allows for adaptive bitrate streaming (ABR).
- Apple developed HLS to deliver live and on-demand video content to Apple devices, including iPhones, iPads, and Apple TVs, and it was soon adopted on non-Apple platforms as well. On Apple devices, HLS content is protected with the proprietary Apple FairPlay DRM for digital rights management and secure video delivery.
- HLS also stands out for its low-latency mode, which uses fragmented MP4 files and the HTTP/2 protocol. This brings latency down to a few seconds, which makes it well suited to streaming live events such as sports or news broadcasts with minimal delay.
MPEG Dynamic Adaptive Streaming Over HTTP (DASH):
- Like HLS, DASH breaks the video content into smaller chunks and sends them over HTTP. It uses fragmented MP4 and MPEG-2 Transport Stream (MPEG stands for Moving Picture Experts Group) as its container formats.
- Developed by MPEG, it is a vendor-neutral, open-standard streaming protocol that supports a wide range of devices and platforms. It can stream video content in various video compression formats, including H.264, H.265, and VP9.
Some other popular streaming protocols are Microsoft Smooth Streaming and Adobe HTTP Dynamic Streaming (HDS).
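To show how segmented delivery and ABR meet in practice, the sketch below builds a minimal HLS master playlist that lists one variant stream per quality level. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags with `BANDWIDTH` and `RESOLUTION` attributes come from the HLS specification; the helper function, the dictionary keys, and the URIs are assumptions for illustration.

```python
from typing import Dict, List


def build_master_playlist(variants: List[Dict]) -> str:
    """Return an HLS master playlist with one entry per variant stream."""
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(
            "#EXT-X-STREAM-INF:"
            f"BANDWIDTH={v['bandwidth_bps']},"
            f"RESOLUTION={v['width']}x{v['height']}"
        )
        # The URI of the media playlist for this variant follows its tag.
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_playlist([
        {"bandwidth_bps": 1_200_000, "width": 854, "height": 480,
         "uri": "480p/index.m3u8"},
        {"bandwidth_bps": 5_000_000, "width": 1920, "height": 1080,
         "uri": "1080p/index.m3u8"},
    ]))
```

The player reads this playlist once, then switches between the listed media playlists as its bandwidth estimate changes.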
Video Playback (or Video Player)
The below image shows the available options on the Netflix video player.
Let’s list down the main features that the above video playback provides:
- Playback Control: Option to play, stop, pause, seek, or scrub through the video.
- Audio Controls: Option to adjust the volume and switch the audio language.
- Subtitle Controls: Option to enable or disable subtitles and change the subtitle language.
- Skip Controls: Option to skip forward or backward by a fixed number of seconds.
- PIP mode: Option to switch to picture-in-picture mode.
- Seek Control: A progress bar for seeking, with a thumbnail preview shown for each moment.
HTML5 Native Video Player Limitation
To develop a video player with all the above features listed, would HTML5 Native Video Player be a good choice? No, it’s not. Let me tell you why:
- Limited streaming support. The native player favors progressive video download over adaptive streaming. In progressive download, a single video file is fetched linearly at one fixed quality, irrespective of the user’s available network bandwidth.
- The native video player UI is browser-specific, and each browser renders the playback controls differently, so it is hard to customize and keep consistent.
There are many HTML5 Video players available that provide adaptive bitrate streaming support along with other features:
- Video.js: It is free and open-source. It supports both the HLS and DASH streaming protocols. It supports multiple languages and subtitles. It is easily themeable and extendable with plugins. It is used by many high-profile organizations such as IGN, Tumblr, LinkedIn, and The Guardian for their video needs.
- JW Player: It is an end-to-end solution, not just a video player. It handles both video upload and video streaming.
- Bitmovin: It is a video player and analytics platform. It supports the HLS, DASH, and Microsoft Smooth Streaming protocols. It supports multiple video codecs, subtitles, and both server-side and client-side ad insertion. It is trusted by media giants such as BBC, RTL, and DAZN.
- Shaka Player: It is an open-source JavaScript library that supports DASH and HLS adaptive streaming protocols.
Optimizations
At this point, you ought to have a good understanding of the video uploading flow and video streaming flow. Next, we will refine the system with optimizations, including speed, safety, and cost-saving.
Speed Optimizations
- Don’t upload a video as a single file. Upload it in multi-parts or chunks.
- Set up multiple upload centers across the globe. To achieve this, we use the CDN as an upload center.
- We saw the following file upload flow in the “Video Upload: Flow A” section.
Let’s magnify this flow as follows:
The diagram above shows what happens inside the “Transcoding server”. Once the whole video is downloaded, the “Download module” passes its output to the “Encoding module”. Once the whole video is encoded, the encoded video chunks are uploaded to the Transcoded storage. From the Transcoded storage, the encoded video chunks are served to the client via the CDN. In this architecture, the “Download module”, “Encoding module”, and “Upload module” are tightly coupled: each depends on the output of the previous module and cannot proceed without it. We can optimize this architecture by introducing message queues between the Download, Encoding, and Upload modules.
For example, let’s consider the “Encoding module”.
- Before the message queue is introduced, the encoding module must wait for the output of the download module.
- After the message queue is introduced, the encoding module does not need to wait for the output of the download module anymore. If there are events in the message queue, the encoding module can execute those jobs in parallel.
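The decoupling described above can be sketched with in-process queues and threads. This is a toy model under stated assumptions: `queue.Queue` stands in for a real message broker, each stage is a single worker thread, and the string transformations are placeholders for actual download, encoding, and upload work.

```python
import queue
import threading
from typing import Callable, List, Optional

SENTINEL = None  # marks the end of the stream of chunks


def stage(work: Callable, inbox: queue.Queue,
          outbox: Optional[queue.Queue]) -> None:
    """Consume items from inbox, process them, and forward results."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            if outbox is not None:
                outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        result = work(item)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(chunk_ids: List[str]) -> List[str]:
    download_q: queue.Queue = queue.Queue()
    encode_q: queue.Queue = queue.Queue()
    upload_q: queue.Queue = queue.Queue()
    uploaded: List[str] = []

    workers = [
        threading.Thread(target=stage, args=(
            lambda c: f"downloaded:{c}", download_q, encode_q)),
        threading.Thread(target=stage, args=(
            lambda c: c.replace("downloaded", "encoded"), encode_q, upload_q)),
        threading.Thread(target=stage, args=(uploaded.append, upload_q, None)),
    ]
    for w in workers:
        w.start()
    for chunk_id in chunk_ids:
        download_q.put(chunk_id)
    download_q.put(SENTINEL)
    for w in workers:
        w.join()
    return uploaded
```

Because each stage only reads from its own queue, the encoder can start on the first chunk while later chunks are still being downloaded, and more workers can be added per stage without changing the others.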
Safety Optimizations
- To ensure only authorized users upload videos to the right location, we introduce pre-signed URLs as shown below.
The upload flow needs to be updated as follows:
1. The client makes an HTTP request to the API servers to fetch a pre-signed URL, which grants access permission to the object identified in the URL. The term pre-signed URL is used for uploading files to Amazon S3. Other cloud service providers might use a different name. For instance, Microsoft Azure Blob Storage supports the same feature but calls it “Shared Access Signature”.
2. API servers respond with a pre-signed URL.
3. Once the client receives the response, it uploads the video using the pre-signed URL.
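The idea behind a pre-signed URL can be illustrated with a generic HMAC-signed URL. This sketch is not the Amazon S3 signing algorithm or the Azure SAS format; it only shows the principle that the server signs the object key and an expiry time with a secret, and the storage side verifies the signature before accepting the upload. The secret, host name, and function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; a real key is never hard-coded


def _sign(object_key: str, expires_at: int) -> str:
    message = f"PUT\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_presigned_url(base_url: str, object_key: str, expires_in_s: int,
                       now: Optional[int] = None) -> str:
    """API server side: issue a time-limited upload URL for one object."""
    now = int(time.time()) if now is None else now
    expires_at = now + expires_in_s
    query = urlencode({"expires": expires_at,
                       "signature": _sign(object_key, expires_at)})
    return f"{base_url}/{object_key}?{query}"


def verify_presigned_url(url: str, now: Optional[int] = None) -> bool:
    """Storage side: accept only unexpired URLs with a valid signature."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    object_key = parsed.path.lstrip("/")
    return hmac.compare_digest(signature, _sign(object_key, expires_at))
```

Because the object key is part of the signed message, a client cannot reuse the URL to upload to a different location, which is the property the upload flow relies on.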
- Many content makers are reluctant to post videos online because they fear their original videos will be stolen. To protect copyrighted videos, we can adopt one of the following three safety options:
- Digital rights management (DRM) systems: Three major DRM systems are Apple FairPlay, Google Widevine, and Microsoft PlayReady.
- AES encryption: You can encrypt a video and configure an authorization policy. The encrypted video will be decrypted upon playback. This ensures that only authorized users can watch an encrypted video.
- Visual watermarking: This is an image overlay on top of your video that contains identifying information for your video. It can be your company logo or company name.
Cost-Saving Optimizations
In the “Understanding the problem and establishing the design scope” section, we learned that CDN involves a substantial cost, but at the same time, it is a crucial component of video upload and streaming flow. Based on this observation, we can implement the following optimizations:
- Only serve the most popular videos from the CDN and serve other videos from high-capacity video storage servers.
- For less popular content, we may not need to store many encoded video versions. Short videos can be encoded on-demand.
- Some videos are popular only in certain regions. There is no need to distribute these videos to other regions.
- Build your own CDN like Netflix and partner with Internet Service Providers (ISPs). Building your CDN is a giant project; however, this could make sense for large streaming companies. An ISP can be Comcast, AT&T, Verizon, or other internet providers. ISPs are located all around the world and are close to users. By partnering with ISPs, you can improve the viewing experience and reduce the bandwidth charges.
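The first optimization, serving only the most popular videos from the CDN, can be sketched as a split over view counts. This is a simplified illustration: the `traffic_share` threshold of 80% and the function name are assumptions, and a real system would also weigh video size, region, and recency.

```python
from typing import Dict, List, Tuple


def split_by_popularity(view_counts: Dict[str, int],
                        traffic_share: float = 0.8
                        ) -> Tuple[List[str], List[str]]:
    """Return (cdn_videos, origin_videos).

    Videos are ranked by views and assigned to the CDN until the selected
    set covers the requested share of total views; the rest stay on the
    origin video servers.
    """
    total = sum(view_counts.values())
    ranked = sorted(view_counts, key=view_counts.get, reverse=True)
    cdn: List[str] = []
    origin: List[str] = []
    covered = 0
    for video_id in ranked:
        if total > 0 and covered < traffic_share * total:
            cdn.append(video_id)
            covered += view_counts[video_id]
        else:
            origin.append(video_id)
    return cdn, origin
```

With a long-tail view distribution, a small number of videos typically ends up in the CDN set while the remaining catalog is served from cheaper storage.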
How HashStudioz Will Help You Develop an On-Demand Video Streaming System Design
At HashStudioz, we specialize in crafting robust and scalable On-Demand Video Streaming Solutions tailored to your unique needs. Our expert team combines industry knowledge with cutting-edge technology to deliver a seamless user experience. Here’s how we can assist you:
- Comprehensive System Architecture: We design a scalable architecture that supports millions of users and optimizes performance, including CDN integration for efficient content delivery.
- Custom Video Encoding and Transcoding Solutions: Our team implements adaptive bitrate streaming, ensuring optimal video quality across varying bandwidths and devices. We utilize advanced codecs and transcoding methods to enhance playback.
- User-Friendly Interface Design: We create intuitive user interfaces that enhance navigation and engagement, providing features like personalized recommendations, easy search functionality, and seamless playback controls.
- Cloud Infrastructure Integration: Leveraging cloud services from AWS, Google Cloud, or Azure, we ensure that your platform is reliable and cost-effective, enabling you to scale effortlessly as your user base grows.
- Robust Security Features: We implement industry-standard security measures, including DRM, AES encryption, and visual watermarking, to protect your content from unauthorized access and piracy.
- Video Analytics and Insights: Our solutions include powerful analytics tools to track user behavior, monitor performance metrics, and gain insights into viewer preferences, allowing for continuous improvement and optimization.
- Cross-Platform Compatibility: We ensure that your video streaming platform is accessible on various devices—smartphones, tablets, smart TVs, and browsers—delivering a consistent experience across all platforms.
- Post-Launch Support and Maintenance: Our commitment doesn’t end at deployment. We offer ongoing support, regular updates, and maintenance to ensure your platform remains cutting-edge and fully operational.
On-Demand App Development: Tailored Solutions for Your Business
In addition to our expertise in video streaming system design, HashStudioz is also a leader in On-Demand App Development. We understand that the on-demand economy is rapidly evolving, and businesses need adaptable solutions to meet consumer demands. Here’s how we can assist you in this domain:
- Custom App Development: We create bespoke on-demand applications tailored to various industries, including food delivery, ride-sharing, and home services. Our apps are designed to provide seamless user experiences and efficient service delivery.
- Real-Time Features: Our development process focuses on integrating real-time functionalities such as live tracking, instant notifications, and chat features to enhance user engagement and satisfaction.
- Scalable Infrastructure: We design scalable solutions that can handle fluctuating user demands, ensuring that your app performs optimally even during peak usage times.
- User-Centric Design: Our team emphasizes intuitive UI/UX design, making it easy for users to navigate and utilize the app’s features effectively, which is crucial for retaining customers in a competitive market.
- Integrated Payment Solutions: We implement secure and versatile payment gateways to facilitate smooth transactions, offering multiple payment options to cater to diverse user preferences.
- Data-Driven Insights: Our apps include analytics features that provide valuable insights into user behavior and preferences, allowing you to refine your services and marketing strategies continuously.
- Cross-Platform Development: We ensure your on-demand app is available on multiple platforms, including iOS, Android, and web, reaching a broader audience and maximizing your market presence.
- Post-Launch Support: Our commitment extends beyond development; we offer maintenance, updates, and support to ensure your app remains functional, secure, and up-to-date with the latest features.