Route-Based Scaling on Kubernetes for Rails Monoliths

Joe Ferris

Diagram of Kubernetes resources for path-based scaling

Opinions on monoliths, microservices, and modularization vary widely throughout the Rails community, but one thing is certain: we still encounter a lot of monoliths out there. This is a comfortable style for many developers when building Rails applications, but some discomfort can arise when deploying and scaling those applications.

Rails applications frequently need to handle widely varying traffic patterns:

  • User requests to static, HTML-based pages
  • Sign up, sign in, and user-centric workflows
  • API requests from mobile and other front end applications
  • Calls to webhooks from integrated services
  • Backend administration pages

These pages may need to interact with the same data and business rules, but the shape of the traffic will be very different. Backend pages used by administrators will be requested less frequently than public pages, but may need to look at larger batches of data or data from older time periods. Some pages may be served mostly from caches, whereas others write to the database in real time.

Optimizing Rails Containers

There are a number of tweaks you can make to the configuration for a Rails application which will make it respond better to certain types of traffic:

  • If your application runs large queries, it may run better with more memory.
  • If your application does a lot of rendering, it may run better with more CPU.
  • If your application is mostly waiting on IO from queries or API calls, running with more threads may help.
  • If response times vary widely, you may want to set a strict timeout to prevent long responses from holding up your web processes.
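Several of these knobs can be driven by environment variables so that the same container image boots with different settings per audience. A minimal config/puma.rb sketch along those lines, using the conventional RAILS_MAX_THREADS and WEB_CONCURRENCY variable names:

```ruby
# config/puma.rb — a minimal sketch; defaults are illustrative.
# The same image can then be tuned per audience via env vars.
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

# More workers favor CPU-heavy rendering; more threads favor IO-bound waits.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
preload_app!

port ENV.fetch("PORT", 3000)
```

With this in place, an IO-bound API deployment might set RAILS_MAX_THREADS higher, while a rendering-heavy deployment raises WEB_CONCURRENCY instead.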

However, these settings are global and are configured when booting an application. If you’re using the same application to handle requests with different characteristics, you may be forced to choose a setting which improves response times for one action while degrading response times for another.

In addition, it may be convenient to split up the configuration and integrations for an application:

  • For an application with many endpoints, an APM tool like New Relic or Skylight could be overwhelming. Sending different application names to these services based on the subsection of the site can make it easier to see at a glance how you’re serving each audience.
  • Similarly, breaking down error tracking tools like Sentry or Airbrake by audience might make it easier to quickly find relevant errors when you know a particular part of the site is affected.
  • When adopting a reliability practice like SRE or committing to SLAs, it can be complicated to set up alarms and monitoring for a large Rails monolith.
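As a concrete example of the first point, the New Relic Ruby agent reads its application name from the NEW_RELIC_APP_NAME environment variable, so each audience's pods can report under their own name. A fragment of what that might look like in a per-audience container spec (the value is illustrative):

```yaml
# Fragment of a per-audience deployment's container spec.
# NEW_RELIC_APP_NAME is read by the New Relic agent at boot,
# so each audience reports to APM under its own name.
env:
- name: NEW_RELIC_APP_NAME
  value: rails-app-admin
```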

Is there a way to deploy an application so that we can tailor the configuration and computational resources for each audience without breaking a Rails monolith into smaller services?

Routes to the Rescue

The first piece of the puzzle is to create a reliable way of knowing which traffic corresponds to which audience. Fortunately, Rails applications tend to mostly follow RESTful routes, and many applications already have route prefixes set up for different parts of the application using namespace. If your routes aren’t namespaced based on audience, that’s your first move:

Rails.application.routes.draw do
  namespace :admin do
    # ...
  end
  namespace :api do
    # ...
  end
  namespace :webhooks do
    # ...
  end
end

This means that all requests for the backend admin area will have URIs starting with /admin, all API requests will start with /api, and so on.
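The ingress we set up later will match these prefixes element-wise: /admin and /admin/users belong to the admin audience, but /administrators does not. A small plain-Ruby sketch of that matching logic (names are illustrative, not part of Rails or Kubernetes):

```ruby
# Hypothetical helper mirroring the element-wise prefix matching that a
# Kubernetes ingress performs with pathType: Prefix.
AUDIENCES = {
  "/admin" => :admin,
  "/api" => :api,
  "/webhooks" => :webhooks
}.freeze

def audience_for(path)
  AUDIENCES.each do |prefix, audience|
    # A prefix matches the path itself or any sub-path, but not a
    # partial path element such as "/apiary" for "/api".
    return audience if path == prefix || path.start_with?("#{prefix}/")
  end
  :default
end
```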

Ingress

Web applications on Kubernetes will use an ingress controller or API gateway to route HTTP requests to specific services running in the cluster. For a monolith, your ingress resource is usually pretty simple:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rails-app
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rails-app
            port:
              name: http

This ingress resource routes all traffic to the same backend service.

Separating Services

A typical Rails application on Kubernetes will define a deployment and a service.

The service finds pods running in the cluster that are capable of responding to requests for that service, while the deployment is responsible for launching and updating those pods.

For a Rails monolith, you would typically have a single service and deployment. Your manifests might look something like this:

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: rails-app
    app.kubernetes.io/component: puma
  name: rails-app
spec:
  ports:
  - name: http
    port: 3000
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: rails-app
    app.kubernetes.io/component: puma
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-app
  # ...
spec:
  # ...
  template:
    # ...
    spec:
      containers:
      - name: main
        image: example.io/mycompany/image:tag

However, we want to configure the pods differently based on the audience. This will mean making a deployment for each audience so that we can override the configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-app-admin
  # ...
spec:
  # ...
  template:
    # ...
    metadata:
      labels:
        app.kubernetes.io/name: rails-app
        app.kubernetes.io/component: puma-admin
    spec:
      containers:
      - name: main
        image: example.io/mycompany/image:tag
        env:
        # Example: allow a longer timeout for admin requests
        - name: RACK_TIMEOUT
          value: '30'
        # Customize pod size (memory, CPU) per audience
        resources:
          requests:
            cpu: 100m
            memory: 1024Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-app-api
  # ...
spec:
  # ...
  template:
    # ...
    metadata:
      labels:
        app.kubernetes.io/name: rails-app
        app.kubernetes.io/component: puma-api
    spec:
      containers:
      - name: main
        image: example.io/mycompany/image:tag
        env:
        - name: RACK_TIMEOUT
          value: '5'
        resources:
          requests:
            cpu: 1000m
            memory: 128Mi
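How the RACK_TIMEOUT variable is consumed depends on your application. One possibility, assuming the rack-timeout gem, is an initializer that wires the per-audience value from the manifests above into the middleware stack:

```ruby
# config/initializers/rack_timeout.rb — a sketch assuming the rack-timeout
# gem; the variable name matches the deployment manifests above.
Rails.application.config.middleware.insert_before(
  Rack::Runtime,
  Rack::Timeout,
  # Admin pods boot with a generous timeout, API pods with a strict one.
  service_timeout: Integer(ENV.fetch("RACK_TIMEOUT", 15))
)
```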

Next, we configure services for each audience:

apiVersion: v1
kind: Service
metadata:
  name: rails-app-admin
  # ...
spec:
  # ...
  selector:
    app.kubernetes.io/name: rails-app
    app.kubernetes.io/component: puma-admin
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: rails-app-api
  # ...
spec:
  # ...
  selector:
    app.kubernetes.io/name: rails-app
    app.kubernetes.io/component: puma-api
  type: ClusterIP

Finally, we can modify our ingress resource to direct traffic to different services based on a path prefix:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rails-app
spec:
  rules:
  - http:
      paths:
      - path: /admin
        pathType: Prefix
        backend:
          service:
            name: rails-app-admin
            port:
              name: http
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: rails-app-api
            port:
              name: http
      # You may want this fallback route if you have un-namespaced paths
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rails-app-marketing
            port:
              name: http

Now /admin requests will be routed to pods where the Rails application has been tailored to the type of traffic generated by that audience.

What’s Next?

Writing these manifests by hand is error-prone and repetitive, so you’ll likely want to use a tool like Kustomize or Helm to generate the manifests for each audience.
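With Kustomize, for instance, you could keep a shared base and one overlay per audience. A hypothetical overlay (paths and names are illustrative):

```yaml
# overlays/api/kustomization.yaml — a hypothetical per-audience overlay
# that renames the shared resources and patches the timeout.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
nameSuffix: -api
patches:
- target:
    kind: Deployment
    name: rails-app
  patch: |-
    - op: replace
      path: /spec/template/spec/containers/0/env/0/value
      value: "5"
```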

Once you’ve separated your application into path-based audiences, there are other options for optimization depending on your use case:

  • You can set up horizontal pod autoscalers to automatically scale pods up and down based on the traffic for each audience
  • If you use vertical pod autoscalers, the cluster will right-size the CPU and memory requests for each audience based on what the Rails application requires
  • You can set different thresholds for alarms using tools like Prometheus so that each audience gets its own SLA
  • You can update the deployments for each pod using different CI/CD pipelines if desired, making it possible to roll out changes for one audience without affecting the others
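As a sketch of the first option, a horizontal pod autoscaler can target one audience's deployment independently of the others (the replica counts and utilization threshold are illustrative):

```yaml
# HPA scaling only the API pods; other audiences keep their own policies.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rails-app-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rails-app-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```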