Dec 2, 2019 6 min read kubernetes

Single-Node Patterns in Distributed Systems

Intro

The potential for rapid, viral growth of a service means that every application has to be built to scale nearly instantly in response to user demand.

As a starting point, the value that patterns for distributed systems offer is the opportunity to figuratively stand on the shoulders of giants. It’s rarely the case that the problems we solve or the systems we build are truly unique.

What is a pattern for a distributed system?

There are plenty of instructions out there that will tell you how to install specific distributed systems (such as a NoSQL database). But when I speak of patterns, I’m referring to general blueprints for organizing distributed systems, without mandating any specific technology or application choices. The purpose of a pattern is to provide general advice or structure to guide your design. The hope is that such patterns will guide your thinking and also be generally applicable to a wide variety of applications and environments.

Learning about patterns for distributed system development is like learning about any other best practice in computer programming, and it brings three main benefits:

1. Patterns let us build on other people's experience, accelerating our understanding of distributed systems.

2. Patterns provide a shared vocabulary that enables us to understand each other quickly.

Let's see an example of point 2. Imagine that we are both using the same object to build our house. I call that object a “Foo” while you call it a “Bar.” How long will we spend arguing about the value of a Foo versus that of a Bar, or trying to explain the differing properties of Foo and Bar, until we figure out that we’re speaking about the same object?

Likewise, once we both know the sidecar pattern, we no longer have to spend time defining what it means to be a sidecar and can instead jump immediately to how the concept can be used to solve a particular problem. “If we just use a sidecar” … “Yeah, and I know just the container we can use for that.”

3. Patterns enable reusable components.

This example leads to the third value of patterns: the construction of reusable components. Implementing these patterns as container images with HTTP-based interfaces also means they can be reused across many different programming languages.

Moreover, a shared code base gets enough usage to quickly surface its bugs and weaknesses.

Unfortunately, distributed system design continues to be more of a black art practiced by wizards than a science applied by laypeople.

Single-Node Patterns

We know the goal is to run applications on many different machines, but we will start from the patterns that live on a single node.

In these patterns the atomic element is no longer the single container but a group of containers (spoiler: in Kubernetes, a group of tightly coupled containers scheduled together on the same node is called a Pod).

Why split up an application on a single machine into a group of containers?

resource isolation;
small, focused pieces for each team to own;
reusable modules that can be used by many teams;
separation of concerns;
more reliable rollouts (and rollbacks).

In contrast to multi-node, distributed patterns, all of these patterns assume tight dependencies among all of the containers in the pattern.

Let's look at single-node patterns through examples!

Sidecar pattern

Example 1: adding HTTPS to a Legacy Service.

Consider, for example, an application whose source code was built with an old version of the company’s build system, which no longer works. This application exposes its service only over plain HTTP. Containerizing this HTTP application is simple enough: the existing binary can run in a container based on an old Linux distribution.

Instead of adding HTTPS support to the old application (which would be challenging), it is faster to add an NGINX sidecar container that terminates SSL and forwards the decrypted requests to the legacy service.

The SSL proxy and the legacy HTTP service must share the same network namespace, so that the proxy can reach the legacy service on localhost; in Kubernetes, all containers in a Pod share one network namespace. Ideally, the legacy service listens only on localhost, so that it cannot be reached over plain HTTP from outside the Pod.

You can generally use the sidecar pattern to adapt legacy applications whose original source code you no longer want to modify.

Example 2: Dynamic Configuration.

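New applications can be written with the expectation that configuration is a dynamic property that should be obtained using a cloud API, but adapting and updating an existing application can be significantly more challenging.

A sidecar bridges the gap: the legacy application keeps reading its configuration from a local file, while a Config Manager Sidecar fetches the configuration from the cloud API and writes it to that file, on a volume shared by the two containers. We can implement the reload step simply by sending a SIGHUP signal from the Config Manager Sidecar to the application whenever it refreshes the config file.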
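A minimal Python sketch of one refresh cycle of such a config manager follows. The function names are hypothetical, fetch_config is a stand-in for the call to the cloud configuration API, and the config file path and application PID are assumptions supplied by the caller. For the sidecar to signal a process in another container, the containers must share a process namespace (in Kubernetes, shareProcessNamespace: true in the Pod spec).

```python
import os
import signal
import time
from typing import Callable


def refresh_config(fetch_config: Callable[[], str], config_path: str, app_pid: int) -> bool:
    """Run one refresh cycle; return True if the application was signalled.

    fetch_config is a hypothetical stand-in for the cloud configuration API.
    """
    new_config = fetch_config()
    try:
        with open(config_path) as f:
            current = f.read()
    except FileNotFoundError:
        current = None
    if new_config == current:
        return False  # unchanged: don't make the application reload for nothing

    # Write to a temporary file and rename it, so the application never
    # reads a half-written configuration.
    tmp_path = config_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(new_config)
    os.replace(tmp_path, config_path)

    # Tell the legacy application to reload its configuration file.
    os.kill(app_pid, signal.SIGHUP)
    return True


def run(fetch_config: Callable[[], str], config_path: str, app_pid: int,
        interval_s: float = 30.0) -> None:
    """Sidecar main loop: poll the configuration source every interval_s seconds."""
    while True:
        refresh_config(fetch_config, config_path, app_pid)
        time.sleep(interval_s)
```

This assumes the legacy application treats SIGHUP as "reload configuration", as many Unix daemons do. If it installs no handler, SIGHUP's default action terminates the process, so the application would have to be restarted instead.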

Sidecar pattern goals: modularity and reusability

Building modular, reusable sidecars, just like achieving modularity in high-quality software development, requires focus and discipline. In particular, you need to focus on three areas:

1. Parameterizing your containers

Consider your container as a function in your program. How many parameters does it have? Each parameter represents an input that can customize a generic container to a specific situation.

docker run -e=PROXY_PORT=8080 -e=CERT_PATH=/path/to/cert.crt <image>
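Inside the container, those parameters arrive as environment variables. Here is a minimal Python sketch, assuming the proxy reads PROXY_PORT and CERT_PATH at startup; the ProxyConfig type and load_config function are hypothetical, and the 8000 default mirrors the ENV default used in the Dockerfile example of this article. Validating parameters at startup makes a misconfigured container fail fast with a clear message.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProxyConfig:
    proxy_port: int  # port on localhost to redirect traffic to
    cert_path: str   # path of the TLS certificate inside the container


def load_config(env: Mapping[str, str] = os.environ) -> ProxyConfig:
    """Read the container's parameters, applying defaults and validation."""
    port_text = env.get("PROXY_PORT", "8000")
    try:
        port = int(port_text)
    except ValueError:
        raise ValueError(f"PROXY_PORT must be an integer, got {port_text!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"PROXY_PORT out of range: {port}")

    cert_path = env.get("CERT_PATH")
    if not cert_path:
        raise ValueError("CERT_PATH is required")
    return ProxyConfig(proxy_port=port, cert_path=cert_path)
```

Passing the mapping explicitly (instead of always reading os.environ) keeps the function easy to test with different parameter sets.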

2. Creating the API surface of your container

3. Documenting the operation of your container

As with software libraries, the key to building something truly useful is explaining how to use it. A good place to write this documentation is the container’s Dockerfile, right next to the instructions it describes.

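For example, you can document the PROXY_PORT parameter, which indicates the port on localhost to redirect traffic to, with a comment next to its default value:

# The port on localhost to redirect traffic to.
ENV PROXY_PORT=8000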

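Other Dockerfile instructions that help document a container are EXPOSE, which declares the ports the container listens on, and LABEL, which attaches key-value metadata to the image. If you don't know them well, I suggest reading the Dockerfile reference (https://docs.docker.com/engine/reference/builder/).

For label names, you can follow the schema established by the Label Schema project (https://github.com/label-schema/label-schema.org), which has since been deprecated in favor of the OCI image annotations (org.opencontainers.image.*).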


Ambassadors

An ambassador container is a broker that manages the interactions between the application container and the rest of the world.

Let's look at some examples of the ambassador pattern.

Example 1: Using an Ambassador to Shard a Service.

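As the figure below shows, when a service is sharded, some logic has to decide which shard each request should go to. An ambassador container can own this logic: the application keeps talking to a single endpoint on localhost, and the ambassador routes each request to the right shard.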
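To make that logic concrete, here is a minimal Python sketch of the shard-selection step an ambassador performs, using a consistent-hash ring. The shard addresses reuse the DNS names from the hands-on example with Redis's default port 6379; the ShardRing class, the MD5-based hash and the 100 virtual nodes per shard are illustrative choices, not twemproxy's exact algorithm (twemproxy lets you configure the hash function and key distribution).

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


class ShardRing:
    """Consistent-hash ring mapping each key to one shard address."""

    def __init__(self, shards: Sequence[str], vnodes: int = 100) -> None:
        if not shards:
            raise ValueError("need at least one shard")
        # Place each shard on the ring many times ("virtual nodes") so keys
        # spread evenly across shards.
        ring: List[Tuple[int, str]] = []
        for shard in shards:
            for i in range(vnodes):
                ring.append((self._hash(f"{shard}#{i}"), shard))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of MD5 as an integer: stable across processes,
        # unlike Python's built-in hash() for strings.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


ring = ShardRing([
    "sharded-redis-0.redis:6379",
    "sharded-redis-1.redis:6379",
    "sharded-redis-2.redis:6379",
])
```

Calling ring.shard_for("user:42") always returns the same address for the same key. Consistent hashing (rather than hash(key) % N) means that adding or removing a shard only moves the keys that belonged to it, instead of reshuffling almost every key.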

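Hands-on code.

After deploying the Redis shards as a StatefulSet named sharded-redis (three replicas behind a headless Service named redis), you should now have DNS entries for sharded-redis-0.redis, sharded-redis-1.redis and sharded-redis-2.redis.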

We can use these names to configure twemproxy. Twemproxy is a lightweight, highly performant proxy for memcached and Redis, which was originally developed by Twitter and is open source and available on GitHub. We can configure twemproxy to point to the replicas we created by using the following configuration:

Example 2: Using an Ambassador for Service Brokering

Example 3: Using an Ambassador to Do Experimentation or Request Splitting

In many production systems, it is advantageous to be able to perform request splitting, where some fraction of all requests are not serviced by the main production service but rather are redirected to a different implementation of the service. Most often, this is used to perform experiments with new, beta versions of the service to determine if the new version of the software is reliable or comparable in performance to the currently deployed version.
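A minimal Python sketch of the splitting decision an experimentation ambassador makes. The backend names web-prod and web-beta, and the choose_backend function, are hypothetical. Hashing a user ID (instead of picking randomly per request) keeps each user on the same version for the whole experiment.

```python
import hashlib


def choose_backend(user_id: str, experiment_fraction: float,
                   production: str = "web-prod", experiment: str = "web-beta") -> str:
    """Send a stable fraction of users to the experimental backend."""
    if not 0.0 <= experiment_fraction <= 1.0:
        raise ValueError("experiment_fraction must be between 0 and 1")
    # Hash the user ID into one of 10,000 buckets; the same user always
    # lands in the same bucket, hence on the same backend.
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return experiment if bucket < experiment_fraction * 10_000 else production
```

With experiment_fraction=0.1, roughly 10% of users are routed to web-beta; raising the fraction gradually turns the experiment into a progressive rollout.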

Adapter Pattern

Example 1: Using Prometheus for Monitoring

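Prometheus expects every monitored container to expose a specific metrics API: an HTTP endpoint (by default /metrics) that returns metrics in the Prometheus text format. An adapter container can translate the metrics an application already produces into that format, without touching the application.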
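A minimal Python sketch of the translation step such an adapter performs. It assumes the application's statistics have already been read into a flat dictionary (how the adapter obtains them depends on the application); the to_prometheus function and the legacyapp prefix are hypothetical.

```python
from typing import Mapping


def to_prometheus(stats: Mapping[str, float], prefix: str = "legacyapp") -> str:
    """Render flat numeric stats in the Prometheus text exposition format."""
    lines = []
    for name in sorted(stats):
        # Prometheus metric names may only use [a-zA-Z0-9_:]; replace anything else.
        safe = "".join(c if (c.isascii() and c.isalnum()) or c == "_" else "_" for c in name)
        metric = f"{prefix}_{safe}"
        # Every stat is exposed as a gauge here; a real adapter would choose
        # the right metric type (counter, gauge, ...) per metric.
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {float(stats[name])}")
    return "\n".join(lines) + "\n"
```

The adapter would then serve this string over HTTP at /metrics with the content type text/plain; version=0.0.4, and Prometheus would scrape the adapter instead of the application.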

Example 2: Normalizing Different Logging Formats

Heterogeneous applications rarely follow a common logging standard.

Most log managers (e.g. Graylog) expect each log entry on a single line, but a Java stack trace spans multiple lines. Moreover, different libraries print timestamps to stdout in different formats.
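A minimal Python sketch of a log-normalizing adapter that folds multi-line entries such as Java stack traces into single JSON lines. It assumes (hypothetically) that every new log entry starts with an ISO-like timestamp such as 2019-12-02 10:15:30; any other line is treated as a continuation of the previous entry. Normalizing timestamp formats would be a similar per-entry transformation.

```python
import json
import re
from typing import Iterable, Iterator, List

# Assumption: a new entry starts with a timestamp like "2019-12-02 10:15:30".
# Stack frames, "Caused by: ..." lines and the exception line are continuations.
ENTRY_START = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def normalize(lines: Iterable[str]) -> Iterator[str]:
    """Group multi-line entries and emit each one as a single JSON line."""
    entry: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if entry and not ENTRY_START.match(line):
            entry.append(line)  # continuation: keep it in the current entry
            continue
        if entry:
            # json.dumps escapes the embedded newlines, so the output is one line.
            yield json.dumps({"message": "\n".join(entry)})
        entry = [line]
    if entry:
        yield json.dumps({"message": "\n".join(entry)})
```

Running it as `for record in normalize(sys.stdin): print(record)` turns the application's raw output into one JSON record per line that a log manager can ingest directly.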

Now imagine that we have to monitor Redis. Redis has a powerful SLOWLOG command that lists the queries whose execution exceeded a configurable time threshold. However, SLOWLOG is a command you run against the Redis server: the slow log is kept in memory and is not written to the container's output, so a log collector never sees it.

To fix this limitation, we can create a new container, "redis-log-adapter", that runs the SLOWLOG command every N seconds (N being a parameter) and prints the output to stdout in the right format.
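A minimal Python sketch of the core of such a redis-log-adapter. The fetch callable is a hypothetical stand-in for the SLOWLOG GET call, and the entry layout (id, start_time, duration in microseconds, command) is assumed to match the dictionaries returned by redis-py's slowlog_get(). Remembering the last printed ID ensures each slow query is logged only once, even though SLOWLOG returns the same entries on every poll.

```python
import json
import time
from typing import Callable, Dict, Iterable, List


def format_entry(entry: Dict) -> str:
    """Turn one SLOWLOG entry into a single-line JSON log record."""
    command = entry["command"]
    if isinstance(command, bytes):
        command = command.decode("utf-8", errors="replace")
    return json.dumps({
        "source": "redis-slowlog",
        "id": entry["id"],
        "start_time": entry["start_time"],  # Unix timestamp, in seconds
        "duration_us": entry["duration"],   # execution time, in microseconds
        "command": command,
    })


def poll_once(fetch: Callable[[], Iterable[Dict]], last_id: int,
              emit: Callable[[str], None] = print) -> int:
    """Emit entries newer than last_id, oldest first; return the newest ID seen."""
    new: List[Dict] = sorted((e for e in fetch() if e["id"] > last_id),
                             key=lambda e: e["id"])
    for entry in new:
        emit(format_entry(entry))
    return new[-1]["id"] if new else last_id


def run(fetch: Callable[[], Iterable[Dict]], interval_s: float) -> None:
    """Adapter main loop: interval_s is the N-seconds container parameter."""
    last_id = -1
    while True:
        last_id = poll_once(fetch, last_id)
        time.sleep(interval_s)
```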

TIP: instead of doing these tricks by hand, you can use Fluentd (https://github.com/fluent/fluentd), a popular open source log collector that supports this kind of log transformation.

Example 3: Adding a Health Monitor on a database container

Consider the task of monitoring the health of an off-the-shelf database container, for example MySQL. We don't want to modify the MySQL image just to expose an HTTP API that checks the database status by running a specific query. Yet with such an API, Kubernetes could, for example, use a liveness probe to check the database status and restart it if needed.

By now, the answer should be familiar: use an adapter container!
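A minimal Python sketch of such a health-check adapter using only the standard library. The check callable is hypothetical: in a real adapter it would connect to MySQL on localhost and run a simple query such as SELECT 1 (for example with a MySQL client library), returning True on success. The /healthz path and port 8080 are assumptions too.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_health_handler(check: Callable[[], bool]):
    """Build a request handler whose /healthz reflects the result of check()."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/healthz":
                self.send_error(404)
                return
            try:
                healthy = bool(check())
            except Exception:
                healthy = False  # e.g. the database refused the connection
            body = b"ok\n" if healthy else b"unhealthy\n"
            # Kubernetes HTTP probes treat 200-399 as success, anything else as failure.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep probe traffic out of the adapter's logs

    return HealthHandler


def serve(check: Callable[[], bool], port: int = 8080) -> None:
    """Run the adapter's HTTP server until the container is stopped."""
    ThreadingHTTPServer(("", port), make_health_handler(check)).serve_forever()
```

The liveness probe should be declared on the MySQL container itself (httpGet with path /healthz and port 8080): since containers in a Pod share the network namespace, the probe reaches the adapter, and a failed probe restarts the MySQL container rather than the adapter.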