August 30, 2019

Serverless versus Containers: A Real-World Case Study of Building a Microservice

By Alban Diquet

Data Theorem

Google claims it does “most of the hard work to protect” the security of containers within its cloud, while Amazon claims “serverless is one of the hottest design patterns in the cloud today, allowing you to focus on building and innovating, rather than worrying about the heavy lifting of server and OS operations.” Software engineering teams often need to decide whether a serverless or a container architecture will better suit their needs. As the head of engineering at Data Theorem, I encourage our teams to evaluate and leverage new technologies when they can make our products better and more secure. As a software security company, we place special emphasis on the security aspects of the technologies we use and build ourselves.

Many articles online claim that containers provide more flexibility, especially when migrating legacy services to the cloud, while serverless is preferred when you need speed of development, scaling, and lower runtime cost.

We determined that the best way to confirm or refute this conventional wisdom was to put serverless and container architectures through a head-to-head test on a real use case: building a microservice. We wanted to evaluate five key considerations: ease of use, cost, scalability, security, and time to market. This article describes our experience and summarizes what we learned from the trial.

Our use case for the architecture comparison

We decided to test a serverless architecture using Google Cloud Functions and a container architecture using Google Kubernetes Engine to build a microservice that would become a component of our commercial software product, Brand Protect. We intended to compare the five considerations mentioned above and choose the architecture that best meets the needs of our specific use case.

Data Theorem’s product, Brand Protect, looks for cloned versions of our customers’ mobile applications that have been illegitimately placed in alternative app stores around the world. The customers didn’t put their mobile apps in these stores and want them removed. Our product finds these illegitimate placements using a variety of machine-learning techniques, takes a screenshot of each fraudulent and unauthorized app listing, and sends that evidence in an automated request to take down the application.

In that context, the microservice we designed for our comparison needed to browse to a given app store URL, take a screenshot, and store that image somewhere as part of our machine-learning-based processing to determine whether an automated take-down request is necessary. Scalability of this process is a key criterion because we perform these actions hundreds of thousands of times a day, and as more customers use our product, the microservice must scale with that growth.

Our experience with a serverless architecture

The first approach we tried was the serverless architecture, as shown in Figure 1.

Figure 1: The serverless architecture of our test microservice

Following the flow from the left, the service running on Google Cloud Function receives a URL to browse to, captures a screenshot of the page, and stores the image on Google Cloud Storage. Once this is done, the service publishes results to a PubSub queue so that a follow-on service will be notified that the image is ready.
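To make the flow concrete, the whole function fits in a few dozen lines. Here is a minimal sketch in Python, assuming pyppeteer drives the headless Chrome instance; the bucket and topic names are hypothetical, not the ones we used:

```python
# Minimal sketch of the screenshot function, assuming a Python runtime
# with pyppeteer driving headless Chrome. All names are hypothetical.
import asyncio
from google.cloud import pubsub_v1, storage
from pyppeteer import launch

BUCKET = "brandprotect-screenshots"                    # hypothetical
TOPIC = "projects/my-project/topics/screenshot-ready"  # hypothetical

async def capture(url: str) -> bytes:
    # Browse to the URL with headless Chrome and grab a screenshot.
    browser = await launch(args=["--no-sandbox"])
    page = await browser.newPage()
    await page.goto(url)
    image = await page.screenshot()
    await browser.close()
    return image

def take_screenshot(request):
    # HTTP-triggered Cloud Function entry point.
    url = request.get_json()["url"]
    image = asyncio.get_event_loop().run_until_complete(capture(url))

    # Store the image on Google Cloud Storage.
    blob = storage.Client().bucket(BUCKET).blob(f"{abs(hash(url))}.png")
    blob.upload_from_string(image, content_type="image/png")

    # Notify the follow-on service via Pub/Sub.
    pubsub_v1.PublisherClient().publish(TOPIC, data=blob.name.encode())
    return "OK"
```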

We found that it takes about two minutes to deploy every component of this architecture, and about 20 lines of code to configure everything. Our initial impression was that it’s very simple to get started with serverless: very easy to design and deploy. In terms of infrastructure or configuration, there wasn’t much we needed to know or do. Of course, the downside is that this scenario creates vendor lock-in. We can’t just take the architecture and move it to another cloud; it’s totally tied to Google Cloud.

Next up, a container architecture

After the serverless prototype, we turned our efforts to testing a container strategy for the same microservice. The architecture is shown in Figure 2.

Figure 2: The container architecture of our test microservice

For this test, we stayed on Google Cloud and used their container orchestration offering, Google Kubernetes Engine (GKE). As before, the service receives the URLs that we want to browse to and screenshot. The browsing process runs in a container and saves the screenshot to a virtual disk that is automatically mounted on the worker instance running the container. It then publishes the results to a queue, in this case a Redis list.
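A minimal sketch of the containerized worker loop, reusing the same capture logic as the serverless sketch; the Redis hostname, list names, and mount path are hypothetical:

```python
# Minimal sketch of the containerized worker. The Redis hostname,
# list names, and mount path are hypothetical.
import asyncio
import redis
from pyppeteer import launch

QUEUE = "urls-to-browse"       # input list of URLs
RESULTS = "screenshots-ready"  # output list for the follow-on service
DISK = "/mnt/screenshots"      # virtual disk mounted into the container

async def capture(url: str) -> bytes:
    # Same headless-Chrome capture as in the serverless sketch.
    browser = await launch(args=["--no-sandbox"])
    page = await browser.newPage()
    await page.goto(url)
    image = await page.screenshot()
    await browser.close()
    return image

r = redis.Redis(host="redis-service")  # hypothetical in-cluster Redis

while True:
    # Block until a URL is pushed onto the input list.
    _, raw = r.blpop(QUEUE)
    url = raw.decode("utf-8")
    image = asyncio.get_event_loop().run_until_complete(capture(url))

    # Save the screenshot to the mounted virtual disk.
    path = f"{DISK}/{abs(hash(url))}.png"
    with open(path, "wb") as f:
        f.write(image)

    # Publish the result to the Redis list.
    r.rpush(RESULTS, path)
```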

Building and setting this up took about 10 minutes to deploy and required about 200 lines of code for the deployment and for configuring each node of the containers: five times the deployment time and ten times the code of the serverless approach. Initially, we were a bit overwhelmed by how many options we had and how many things we could customize and tweak. It’s impressive to be able to control every single container, every VM, how much RAM, how much disk space, the network layout, and so on.

What’s more, there is no vendor lock-in with this approach. Nothing in this architecture is specific to Google Cloud, so everything can be picked up and put into some other cloud.

A more in-depth analysis 

To give our comparison a more thorough look, we built a real-world test scenario running 200,000 screenshot tasks per day. Each task was memory-intensive because we had to run Chrome as a headless browser, which meant designing our virtual server architecture with a lot of RAM.

For the serverless architecture, we ran only one task per function, and for the container architecture, only one task per container. This gave each task enough RAM to run the browser and take a screenshot.

Also, it’s important to note that we spread the workload throughout the day, sending about 5,000 tasks every hour instead of creating one big spike of tasks all at once. We could do this because the microservice had no hard real-time requirements: the screenshots are not needed right away. This pattern mirrors any kind of background batch processing and is not specific to our test case. Spreading the work kept the average utilization of the container cluster high enough to avoid idle capacity, which leads to inefficiency and higher costs. The pacing itself is trivial to implement, as the sketch below shows.
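A minimal sketch of the drip-feed pacing; the publish() helper is a hypothetical stand-in for whatever enqueues a task (a Pub/Sub publish or Redis rpush, as in the sketches above):

```python
# Minimal sketch of drip-feeding ~5,000 tasks per hour instead of
# sending one large spike. publish() is a hypothetical stand-in for
# whatever enqueues a task.
import time

TASKS_PER_HOUR = 5000
INTERVAL = 3600 / TASKS_PER_HOUR  # roughly one task every 0.72 seconds

def drip_feed(urls, publish):
    for url in urls:
        publish(url)
        time.sleep(INTERVAL)
```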

Our lessons on scaling

The first thing we looked at was scaling. A big advantage of serverless is that it scales down to zero instances, so you don’t pay for anything when there’s no work going on. It also scales up very quickly when there is work to do. But while one promise of serverless is that you don’t need to manage the scaling, we quickly realized that you do, because of what happens in the real world: external dependencies and APIs may not scale the way serverless does. For example, by default, Google Cloud Functions will scale up to 1,000 instances, so you can run 1,000 tasks concurrently. But if your function interacts with an external server, API, or database, that external resource may not scale as well and will start returning errors.

In our case, we were accessing a website, and having a thousand functions crawling that website at once was going to overload it and cause an unintentional denial of service (DoS) attack. In addition, Google Cloud has per-project quotas on specific operations, such as opening a connection. Once you have a lot of Cloud Function instances running, it’s not hard to hit these quotas, at which point all subsequent Cloud Function invocations fail. Overall, we learned that with serverless, even though scaling is simple at first, someone eventually has to manage it; one simple mitigation is sketched below. The good news is that global scalability with serverless is better than anything we have seen in private data centers or anywhere else in public cloud infrastructure.
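One way to keep serverless fan-out from overwhelming a fragile downstream site is to cap concurrency yourself on the client side. A minimal sketch, with an illustrative limit of 50 (Google Cloud Functions also supports a deployment-time cap on the number of instances):

```python
# Minimal sketch: bound concurrent requests to an external site with a
# semaphore so it never sees the full 1,000-instance fan-out. The limit
# and the URLs are illustrative.
import asyncio
import aiohttp

MAX_CONCURRENT = 50

async def fetch_all(urls):
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def fetch(session, url):
        async with sem:  # at most MAX_CONCURRENT requests in flight
            async with session.get(url) as resp:
                return await resp.read()

    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))
```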

With the container architecture and Kubernetes Engine, some VMs are always running. Figuring out the scaling took a lot more work: we needed to determine the number of VMs and the amount of RAM, but we also had far more control and options over how our microservice should scale. We learned that with Kubernetes Engine, you have to deploy specific applications or containers to measure how busy the cluster is, and this also consumes resources. Figuring out the scaling with containers is more work at the beginning, but the cost is more predictable because you know how far it will scale. That said, the level of effort to sustain this architecture is non-trivial.

  • Advantage: Serverless

Cost

The cost results surprised us: the serverless approach was 10 times more expensive than the container approach. We ended up paying about $20 a day with Google Cloud Functions, compared to about $2 a day for the container architecture. That’s a significant difference.
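As a back-of-the-envelope check using only the figures above, the per-screenshot cost works out to about a hundredth of a cent for serverless versus a thousandth of a cent for containers:

```python
# Back-of-the-envelope per-task cost, using only the figures above.
TASKS_PER_DAY = 200_000

serverless = 20 / TASKS_PER_DAY  # $0.000100 per screenshot (~$20/day)
containers = 2 / TASKS_PER_DAY   # $0.000010 per screenshot (~$2/day)
```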

Figure 3: The cost per day for our use case application

Of course, these amounts only represent the GCP bill; there are also hidden costs that are higher with the GKE architecture, such as needing an engineer to manage and monitor the cluster, perform maintenance, and so on. Your mileage may vary depending on the type of application you build. For example, if you have a time-sensitive or real-time application with peaks and valleys of workload, we suspect the cost advantage tips in favor of serverless. With our batch-processing approach for this specific use case, however, the tuning we could do with the container-based architecture allowed us to save more.

  • Advantage: Containers

Security

The diagram below illustrates most of what you need to know about the differences between the approaches as far as security is concerned. Security needs to be added for the areas shown in pink; these are not considered the responsibility of cloud providers like Amazon, Google, or Microsoft.

Figure 4: Building security into our microservice

With serverless, you need to secure only the application code. The infrastructure is taken care of by the cloud provider (in this specific case Google Cloud, but the same would apply to AWS Lambda if using that service).

Much more work needs to be done to apply security to containers, and there are many articles available on this subject. For example, container images can have vulnerabilities, so you have to update your images frequently and make sure they are secure. The network between the nodes needs to be encrypted with TLS or something similar. The VMs themselves must be secured to prevent an attacker from taking advantage of them, and the guest operating systems (OSes) that run inside those VMs have to be hardened and patched as an ongoing security practice. The container runtime (Kubernetes, the master node, etc.) is maintained by Google, but again, your mileage may vary as you move from one cloud provider to another.

With serverless, the most critical attack surface is the application programming interfaces (APIs) exposed within the application code, along with all the databases and cloud resources that are dynamically attached to support the serverless app at runtime. The good news is that there are innovative new services that help automate this type of security, such as Data Theorem’s API Discover and API Inspect offerings, which are specifically designed to help with these challenges. From our perspective, the level of effort and potential cost to secure the infrastructure tiers of containers is significantly higher for most use cases in comparison to serverless.

  • Advantage: Serverless

Summary

After running our experiment pitting serverless against containers for our real-world use case, we concluded that for the microservice we were building, the serverless Google Cloud Functions implementation was better suited to our needs. It is much more convenient: easy to set up, secure, scale, deploy, and iterate on. Those conveniences outweigh the cost disadvantage for us because, as a startup, we need to deliver our products quickly and get customers using them as soon as possible. We are willing to pay more for the time-to-market advantage and reduced complexity at the infrastructure management layer.

With that said, the 10x cost difference we saw made us realize that we should consider using Google Kubernetes Engine for some of our projects where the GCP bill may rise due to the pricing of Cloud Functions. Switching to Google Kubernetes Engine would require significant re-architecting of the project, and the maintenance cost and overhead would be higher, but portions of the cloud bill could be reduced.

Figure 5: The serverless architecture suits our needs

As the saying goes, “Your mileage may vary.” Both serverless and containers are exciting technologies that offer significant benefits to the discipline of software engineering. I encourage you to do your own comparison using your own use case to learn how (or even if) they fit into your software development program. If you have any questions about our experiment or would like to get a free cloud security report, please inquire here.
