Netflix: Microservices Cloud Architecture Design Analysis

June 12, 2021

iauro Team

Contributing immensely to the global software solutions ecosystem. DevOps | Microservices | Microfrontend | DesignThinking


In a few recent blog posts, we explained why it’s important for applications to adopt a four-tier application architecture that is developed and deployed as a set of microservices. It is becoming increasingly apparent that if you keep using the development processes and application architectures that worked well a decade ago, you simply won’t be able to move fast enough to capture and hold the interest of mobile users, who can choose from an ever-growing number of applications.

Switching to a microservices architecture presents exciting opportunities for businesses. System architects and developers gain unprecedented control and speed in delivering innovative new web experiences to their customers. But at such a breathtaking pace, it can feel like there isn’t much room for error. In the real world, you cannot stop developing and deploying your applications while you retool your processes. You know that future success depends on the move to a microservices architecture, but how do you actually make it?

Several early adopters of microservices are now generously sharing their experience, not only as open-source code but also in conference presentations and blog posts. Netflix is a prime example.

This post outlines best practices for defining and designing a microservices architecture.

Optimize for speed, not efficiency

If you ask developers whether a slow development process is better, no one will ever say yes. Neither management nor clients ever complain that your development cycle is too fast for them. The need for speed isn’t limited to tech companies: as software becomes ubiquitous in the Internet of Things (in cars, appliances, and sensors, as well as mobile devices), companies that haven’t done software before are now realizing that their success depends on the ability to do it well.

Netflix decided early on to optimize for speed. This applies in particular to the tools in your software development process, so that you can respond quickly to what your customers want or, better yet, create innovative web experiences that attract customers. Speed means learning about your customers and giving them what they want at a faster pace than your competitors. By the time competitors are ready to challenge you in a particular area, you will have moved on to the next set of improvements.

This approach turns the usual paradigm of optimizing for efficiency upside down. Efficiency usually means trying to control the overall flow of the development process to avoid duplicated effort and mistakes, in order to keep costs down. Typically, you focus on saving money rather than looking for opportunities to increase revenue.

Efficiency becomes secondary: you pursue it only as long as you satisfy the constraint of not slowing down. Move fast first, then make the business more efficient without giving up that speed.

Make sure your assumptions are correct

Many large companies that are successful in their market (call them incumbents) find themselves being overtaken by more nimble, usually smaller, organizations (disruptors) that react much more quickly to changes in consumer behavior. Large size is not necessarily the root of the problem; Netflix, for example, is no longer a small company. The main reason incumbents struggle is that they operate on business assumptions that are no longer true.

Of course, you have to make assumptions when formulating a business model, and it then makes sense to optimize your business practices around them. The danger is that you stick to assumptions after they are no longer true, which means that you are optimizing the wrong thing. That is when you become vulnerable to disruptors who make the right assumptions and optimize for the current business climate.

As examples, consider the following assumptions, which prevail in many established companies. The sections below take a closer look at them and describe the approach taken by Netflix.

  • Computing power is expensive. This was true when increasing computing power required capital expenditure on hardware. See the “Hosting Infrastructure in the Cloud” section.
  • Process prevents problems. In many companies, the standard response to something going wrong is to add a preventive step to a procedure.


Hosting Infrastructure in the Cloud

In the past, the only way to increase your computing power was to buy hardware. You then made money by putting this expensive resource to good use solving customer problems.

The advent of cloud computing has almost completely invalidated this assumption. Now you can buy the amount of capacity you need when you need it, and pay only for the time you actually use it. The new assumption to make is that (virtual) machines are ephemeral: you can create and destroy them with a click of a button or an API call, without any negotiation with other departments in your company.

One way to think about this change is that the self-service cloud makes previously impossible things instant. All Netflix engineers are based in California, but they manage a worldwide infrastructure. The cloud allows them to experiment and determine, for example, whether adding servers at a specific location improves performance. Let’s say they notice a problem with video delivery in Brazil. They can easily set up 100 cloud server instances in a Brazilian city within a couple of hours. If, after a week, they find that the difference in delivery speed and reliability isn’t big enough to justify the cost of the additional servers, they can shut them down as quickly and easily as they created them.

Such an experiment would be so costly with traditional infrastructure that you would never attempt it. You would need to hire an agent in the city to coordinate the project, find a data center, comply with Brazilian government regulations, ship machines to Brazil, and so on. It would be six months before you could run the test and find that increasing your local capacity didn’t improve your delivery speed.
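To make the pay-per-use arithmetic of the Brazil experiment concrete, here is a minimal sketch. The instance count and duration come from the example above; the hourly price is a purely hypothetical figure chosen for illustration, not a real cloud price, and the class name is our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudExperiment:
    """A short-lived capacity experiment billed per instance-hour."""
    instances: int
    days: int
    hourly_rate_usd: float  # hypothetical on-demand price per instance-hour

    def instance_hours(self) -> int:
        # Every instance runs around the clock for the whole experiment.
        return self.instances * self.days * 24

    def cost_usd(self) -> float:
        # You pay only for the hours the instances actually exist.
        return self.instance_hours() * self.hourly_rate_usd


# 100 instances for one week, at an assumed $0.10 per instance-hour.
brazil_test = CloudExperiment(instances=100, days=7, hourly_rate_usd=0.10)
print(brazil_test.instance_hours())       # 16800 instance-hours
print(round(brazil_test.cost_usd(), 2))   # total cost at the assumed rate
```

Whatever the real hourly rate is, the cost is bounded by the length of the experiment: once the instances are shut down, the spending stops, which is what makes a one-week test worth trying.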

Create a culture of high freedom and high responsibility with less process

Earlier, we noted that many companies create rules and processes to prevent problems. When someone makes a mistake, they add a rule to the HR manual that says, in effect, “Don’t do that again.” If you read some HR manuals from this perspective, you can extract a historical record of everything that has gone wrong at the company. Likewise, when something goes wrong during development, the usual reaction is to add a new step to the procedure. The fundamental problem with piling up preventive measures is that over time you build up complex processes, a kind of “scar tissue”, that slow you down.

Netflix doesn’t have an HR manual. There is one guideline: “Act in the best interests of Netflix.” The idea is that if an employee cannot figure out how to interpret that guideline in a given situation, they do not have enough judgment to work there. If you don’t trust the judgment of the people on your team, you should ask why you hired them. It is true that you sometimes have to fire people for overstepping the bounds. Overall, though, a high level of mutual trust between team members and within the company as a whole becomes a strong bonding force.

Replace siloed teams with microservices teams

Most software development organizations are divided into separate, siloed groups with no overlap of personnel between them. A typical software development project begins with a product manager meeting with user and developer groups to discuss ideas for new features. Once an idea is implemented in code, the code is handed over to the QA and database administration teams and discussed in yet more meetings.

Communication with the system, network, and SAN administrators is often done through tickets. The whole process is usually slow and burdened with overhead. Some companies try to speed things up by creating small startup-style teams that manage the development process from start to finish. In other cases, a team that joins through an acquisition continues to operate as an independent unit within the acquiring company. But if small teams are still doing monolithic delivery, there are usually still handoffs between the individuals or groups responsible for different functions. The process suffers from the same problems as monolithic delivery in larger companies: it just isn’t very efficient or flexible.

Adopt continuous delivery

A siloed team organization is usually combined with a monolithic delivery model in which an integrated, feature-rich application is released as a single unit (often with a version number) on a regular schedule. Most software development teams initially use this model because it is relatively simple and works reasonably well with a small number of developers (say 50 or fewer). However, as the team grows, it becomes a real problem when you find a bug in one developer’s code during QA or production testing, and 99 other developers are blocked from release until the bug is fixed.

In 2009, Netflix adopted a continuous delivery model, which fits perfectly with a microservices architecture. Each microservice is a separate product feature that can be updated independently of other microservices and on its own schedule. Finding a bug in one microservice does not affect the release schedule of the others. Continuous delivery depends on packaging microservices in standard containers. Netflix originally used Amazon Machine Images (AMIs), which made it possible to roll out an update to a test or production environment in about 10 minutes. With Docker, this time is reduced even further, in some cases to a few seconds.
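The difference between the two release models can be sketched in a few lines. This is an illustrative toy model, not a description of Netflix’s actual tooling: the Service class, the service names, and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    """A deployable unit with a flag for a known, unfixed bug."""
    name: str
    has_open_bug: bool = False


def releasable_monolith(services: List[Service]) -> List[str]:
    """Monolithic delivery: everything ships together, so one bug blocks all."""
    if any(s.has_open_bug for s in services):
        return []
    return [s.name for s in services]


def releasable_microservices(services: List[Service]) -> List[str]:
    """Continuous delivery: each service ships on its own schedule."""
    return [s.name for s in services if not s.has_open_bug]


# Hypothetical services; only "search" has a bug found during QA.
services = [
    Service("signup"),
    Service("search", has_open_bug=True),
    Service("playback"),
]

print(releasable_monolith(services))       # nothing can ship
print(releasable_microservices(services))  # the healthy services still ship
```

In the monolithic case the bug in one component holds back every other team’s work; in the microservices case only the affected service waits for the fix.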

At Netflix, the conceptual framework for continuous development and delivery is the Observe-Orient-Decide-Act (OODA) cycle.

Observe refers to examining your current situation to find places where you can innovate. You want your company culture to implicitly allow anyone who sees an opportunity to start a project and pursue it. For example, you might notice that many people abandon the registration process on your website when they reach a certain step. You can start a project to find out the cause and fix the problem.

Orient refers to analyzing metrics in order to understand the causes of what you observed in the previous step. This often involves analyzing large amounts of unstructured data such as log files, which is commonly referred to as big data analytics. The answers you are looking for are not yet in your business intelligence database. You explore data that no one has looked at before and ask questions that no one has asked before.
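As a small illustration of the Orient step applied to the registration example, the sketch below scans raw log lines to find where users drop out of a sign-up funnel. Everything here is an assumption made for the example: the log format ("user_id step"), the step names, and the function names are hypothetical, and real log analysis would run over far more data with dedicated tooling.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical registration funnel, in order.
STEPS = ["landing", "email", "plan", "payment", "done"]


def users_per_step(log_lines: Iterable[str]) -> List[int]:
    """Count how many distinct users reached each funnel step."""
    furthest = {}  # user_id -> index of the furthest step seen
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2 or parts[1] not in STEPS:
            continue  # skip lines that do not match the assumed format
        user, step = parts
        furthest[user] = max(furthest.get(user, -1), STEPS.index(step))
    counts = Counter()
    for idx in furthest.values():
        # Reaching step idx implies passing through every earlier step.
        for i in range(idx + 1):
            counts[STEPS[i]] += 1
    return [counts[s] for s in STEPS]


def biggest_drop(counts: List[int]) -> Tuple[str, str, int]:
    """Return (from_step, to_step, users_lost) for the largest drop-off."""
    lost, i = max((counts[i] - counts[i + 1], i) for i in range(len(counts) - 1))
    return STEPS[i], STEPS[i + 1], lost


logs = [
    "u1 landing", "u1 email", "u1 plan", "u1 payment", "u1 done",
    "u2 landing", "u2 email", "u2 plan",
    "u3 landing", "u3 email", "u3 plan",
    "u4 landing", "u4 email",
    "garbage line here",
]

print(users_per_step(logs))                # [4, 4, 3, 1, 1]
print(biggest_drop(users_per_step(logs)))  # ('plan', 'payment', 2)
```

In this made-up data the largest loss is between choosing a plan and entering payment details, which is the kind of finding that would feed the Decide step.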

Decide refers to developing and executing a project plan. The culture of the company is an important factor at this stage. As discussed earlier, in a culture of high freedom and high responsibility, you don’t need to get management approval before you start making changes. You share your plan, but you don’t need to ask permission.

Act refers to testing your solution and putting it into production. You deploy a microservice that includes your incremental feature into the cloud, where it is automatically put into an A/B test and compared side by side with the previous solution for as long as it takes to collect data showing whether your approach is better. Cooperating microservices are not disrupted, and customers do not see your changes unless they are selected for the test. If your solution is better, you deploy it to production. It doesn’t have to be a big improvement. If the number of clients of your microservice is large enough, even an improvement of a fraction of a percent (in response time, say) can be statistically significant, and the cumulative effect of many small changes over time can be large.
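The point about sample size can be illustrated with a standard two-proportion z-test. This is a sketch under stated assumptions, not the test Netflix uses: the metric (a conversion rate rather than response time), the sample sizes, and the rates are hypothetical numbers picked to show the effect.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    success_a: int, n_a: int, success_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or 100%")
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal distribution: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Same lift (10.0% -> 10.5%) measured on two hypothetical audience sizes.
z_small, p_small = two_proportion_z_test(100, 1_000, 105, 1_000)
z_large, p_large = two_proportion_z_test(100_000, 1_000_000, 105_000, 1_000_000)

print(f"1,000 users per arm:     z={z_small:.2f}, significant={p_small < 0.05}")
print(f"1,000,000 users per arm: z={z_large:.2f}, significant={p_large < 0.05}")
```

With a thousand users per arm the half-point lift is indistinguishable from noise, while with a million users per arm the same lift is clearly significant, which is why small improvements are worth shipping at large scale.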

How iauro Can Help You

iauro believes that adopting an architecture in which applications are developed and deployed as a set of microservices is critical to future success. iauro provides an application delivery platform that offers the performance, reliability, and scalability that users expect. Moving to a single software tool for web serving, load balancing, and content caching makes it easier to fully adopt a microservices-based architecture. Our approach allows developers to define and control the delivery of their microservices while respecting the standards and best practices introduced by the product team.

