Where to Scale Your Workloads

Season of Scale

Stephanie Wong
Google Cloud - Community

--

“Season of Scale” is a blog and video series to help enterprises and developers build scale and resilience into their design patterns. In this series, we walk you through patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises.

In Season 1, we’re covering Infrastructure Automation and High Availability:

  1. Patterns for scalable and resilient applications
  2. Infrastructure as code
  3. Immutable infrastructure
  4. Where to scale your workloads (this article)
  5. Globally autoscaling web services
  6. High Availability (Autohealing & Auto updates)

In this article, I’ll walk you through the compute options for scaling your workloads.

Check out the video

Review

In the last article we learned about immutable infrastructure. It’s a key operational best practice for creating scalable environments that don’t put you at risk of configuration drift. If you’re rolling out changes to your operating system, application, or anything else, you want those changes tracked, rolled out through a CI/CD pipeline, and applied with certainty to fresh, immutable instances.

Now Critter Junction is facing a new dilemma: deciding where to run their new game app on Google Cloud. Each application has its own infrastructure and scaling requirements, and GCP offers a multitude of compute options. So how do you balance performance, flexibility, cost efficiency, and language support?

The Layout App

With more users than ever, and with their new immutable infrastructure, Critter Junction was in a great place to launch a new app. They wanted a companion game that used the same database but let players build and share house layouts from their inventory, codenamed The Layout App.

With a new app launch, scale and unpredictability are always big challenges: you never know exactly how many players will show up on launch day. The team needed a solution on Google Cloud that could automatically handle any amount of load. They already knew they wanted the layout app to scale independently from their web servers, but they had lots of options.

One solution is to run the layout app on a separate set of VMs. Using Google’s global load balancer, you can send traffic to backend managed instance groups (MIGs), which automatically scale a set of identical machines based on metrics like CPU utilization, so the app could easily handle more traffic across regions.
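As a rough sketch, here’s what creating an autoscaled MIG can look like with the google-cloud-compute Python client. The project, zone, template, and resource names below are placeholders, and the CPU target and size limits are illustrative, not Critter Junction’s real configuration.

```python
# Sketch: a managed instance group that autoscales on CPU utilization.
# Requires: pip install google-cloud-compute
from google.cloud import compute_v1

PROJECT, ZONE = "critter-junction-demo", "us-central1-a"  # placeholders

# The MIG stamps out identical VMs from an (already built) immutable
# instance template. Both insert calls return long-running operations
# you would normally wait on before proceeding.
mig = compute_v1.InstanceGroupManager(
    name="layout-app-mig",
    base_instance_name="layout-app",
    instance_template=f"projects/{PROJECT}/global/instanceTemplates/layout-app-v1",
    target_size=2,
)
compute_v1.InstanceGroupManagersClient().insert(
    project=PROJECT, zone=ZONE, instance_group_manager_resource=mig
)

# Attach an autoscaler: keep between 2 and 10 VMs, targeting 60% average CPU.
autoscaler = compute_v1.Autoscaler(
    name="layout-app-autoscaler",
    target=f"projects/{PROJECT}/zones/{ZONE}/instanceGroupManagers/layout-app-mig",
    autoscaling_policy=compute_v1.AutoscalingPolicy(
        min_num_replicas=2,
        max_num_replicas=10,
        cpu_utilization=compute_v1.AutoscalingPolicyCpuUtilization(
            utilization_target=0.6
        ),
    ),
)
compute_v1.AutoscalersClient().insert(
    project=PROJECT, zone=ZONE, autoscaler_resource=autoscaler
)
```

Point the load balancer’s backend service at the MIG, and VMs are added or removed as average CPU drifts from the target.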

Containerization

But it turned out the Layout App didn’t need direct access to the operating system. And the team wanted their apps to be lightweight and portable across environments, so they decided to containerize it. Containers are isolated packages holding just what’s needed to run an application. They’re a great way to build a cloud-native app, but you still need something to serve them.

Two choices for serving containers are managed instance groups running Container-Optimized OS, or Google Kubernetes Engine (GKE). Both solutions automatically scale up as demand increases, adding new machines as needed and removing them when demand is low. This can increase availability at peak times while keeping your costs manageable.

Either would work to run their app and meet their unpredictable scaling needs, but the team wanted further abstraction from the infrastructure. They decided they didn’t want up-front provisioning, and they wanted the ability to scale down to zero resources.

While a MIG must keep a minimum of one instance, you can specify a minimum of zero nodes for a GKE node pool (an idle node pool can scale down completely). But at least one node must always be available in the cluster to run system Pods.
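For illustration, here’s roughly how a scale-to-zero node pool can be defined with the google-cloud-container Python client. The project, location, cluster, and pool names are placeholders, and the sizing is just an example.

```python
# Sketch: a GKE node pool whose autoscaler may remove every node when idle.
# Requires: pip install google-cloud-container
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()

node_pool = container_v1.NodePool(
    name="layout-app-pool",
    initial_node_count=1,
    autoscaling=container_v1.NodePoolAutoscaling(
        enabled=True,
        min_node_count=0,   # an idle pool can scale away completely
        max_node_count=10,
    ),
)

client.create_node_pool(
    parent="projects/critter-junction-demo/locations/us-central1"
           "/clusters/layout-cluster",
    node_pool=node_pool,
)
```

Another node pool in the cluster still has to hold the system Pods, so the cluster as a whole never quite reaches zero.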

Serverless

It was time for the team to look at serverless technology. Serverless offerings almost completely abstract away the infrastructure and orchestration. Unlike the previous options, they handle automatic resource scaling with virtually no configuration or server provisioning; the platform runs your application so you can focus on your code. Google Cloud has three main options for running serverless workloads.

Cloud Functions

First up, Cloud Functions is great for snippets of code and small, single-purpose applications. Functions scale up by creating new instances as demand rises, and each instance handles a single request at a time. You can scale down to zero and cap the maximum number of instances for full flexibility. It’s perfect for gluing together different cloud services, like calling an API on a schedule or sending notifications, but it was too simple for running the layout app.
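To give a feel for how small that unit is, here’s a minimal HTTP function written against the open-source Functions Framework for Python. The function name and notification logic are made up for the example.

```python
# Sketch: a single-purpose HTTP Cloud Function.
# Requires: pip install functions-framework
import functions_framework

@functions_framework.http
def notify(request):
    # `request` is a Flask request; the return value becomes the HTTP response.
    player = request.args.get("player", "unknown")
    return f"Notification queued for {player}\n", 200
```

Cloud Functions spins up more instances of this function as requests arrive and tears them all down when traffic stops, with no servers for you to manage.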

App Engine

So they looked to App Engine, which runs containers and custom web applications in a serverless way, automatically creating and shutting down instances as traffic fluctuates. You can tune automatic scaling to your app’s needs; for example, scaling can be based on CPU utilization, throughput utilization, or maximum concurrent requests.
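As a sketch, an App Engine standard service is just a web app plus an app.yaml. The handler below is a stand-in, and the scaling values in the comment are illustrative, not recommendations.

```python
# Sketch: main.py for an App Engine standard service (Python runtime).
# Requires: pip install flask
# Scaling is configured in the accompanying app.yaml, for example:
#
#   runtime: python311
#   automatic_scaling:
#     target_cpu_utilization: 0.65
#     target_throughput_utilization: 0.7
#     max_concurrent_requests: 40
#
# App Engine adds and removes instances to keep those targets met.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "layout service on App Engine\n"
```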

This could run their containerized app without a problem, but the team wanted to make sure they could run their app in multiple clouds down the road and didn’t want to re-architect.

Cloud Run

Finally, there’s Cloud Run, which is built on Knative, a Kubernetes-based platform to build, deploy, and manage modern serverless workloads. With Cloud Run, Kubernetes management is taken care of behind the scenes. Each deployment is automatically scaled to the number of instances needed to handle all incoming requests, and it can scale from zero up to the maximum you specify. Because the app team expected traffic spikes, Cloud Run let them keep a number of idle instances warm to minimize cold starts. And if they later wanted to port to other clouds, they could run the same application anywhere Knative runs.
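Concretely, all Cloud Run asks of a container is a stateless HTTP server listening on the port passed in the PORT environment variable. Here’s a stdlib-only sketch of that contract; the service name and response body are placeholders.

```python
# Sketch: the HTTP contract a Cloud Run container fulfills (stdlib only).
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class LayoutHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"layout service on Cloud Run\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Cloud Run injects PORT; default to 8080 for local runs.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), LayoutHandler).serve_forever()
```

Package that in a container image and deploy it, and Cloud Run handles the request-driven scaling, including the minimum (idle) and maximum instance counts you set on the service.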

Given their needs for container portability, fast scaling, and minimal infrastructure overhead, the team decided on Cloud Run and was able to launch faster than they thought possible! While each compute option I covered has autoscaling capabilities built in, you can decide which of these solutions works best for the requirements of your app, giving you the power to choose and scale.

Stay tuned for our next piece on autoscaling web services. And remember, always be architecting.
