Building a Service Mesh with Envoy
Published: September 30, 2019
A service mesh is the communication layer in a microservices setup: all requests to and from each of the services go through the mesh. Also known as an infrastructure layer, the service mesh makes communication between services reliable and secure.
Each service has its own proxy (a sidecar), and all the proxies together form the service mesh. The sidecars handle communication between services, which means all traffic goes through the mesh, and this transparent layer can now control how services interact.
The service mesh provides observability, service discovery and load balancing through components controlled by APIs.
In effect, if a service wants to call another service, it does not call the destination service directly. The request is routed to the local proxy, and the proxy routes it to the destination service. This means the service instance is not aware of the outside world; it is only aware of the local proxy.
Services with local proxies
The Thoughtworks Technology Radar, a biannual document that assesses the risks and rewards of existing and nascent technologies, describes it as follows: “A service mesh offers consistent discovery, security, tracing, monitoring and failure handling without the need for a shared asset such as an API gateway or ESB. A typical implementation involves lightweight reverse-proxy processes deployed alongside each service process, perhaps in a separate container.”
When talking about the service mesh, one inevitably hears of the ‘sidecar’: a proxy available to every instance of a service. Each sidecar takes care of one instance of one service. We will talk about sidecars in a little more detail further into the article.
The sidecar
What can a service mesh deliver?
All that a service mesh can deliver
More and more organizations are moving to a microservices architecture, and such organizations require all of the above-mentioned capabilities that a service mesh can deliver. A decoupled approach, which removes the need to bake these capabilities into each service as a library or custom code, is a clear winner.
Why Envoy?
Envoy isn’t the only choice when building a service mesh; other proxies like NGINX, Traefik and more are perfectly suitable. I have chosen Envoy, a high-performance proxy written in C++, because I prefer Envoy’s light footprint, powerful routing, observability and extensibility.
Let’s begin by building a service mesh setup with three services. Here’s a graphic representation of what we are trying to build:
Services setup with sidecar proxies
Front Envoy
Front Envoy is the edge proxy in our setup, where we usually carry out TLS termination, authentication, request header generation and more.
Here’s a look at the Front Envoy configuration:
```yaml
---
admin:
  access_log_path: "/tmp/admin_access.log"
  address:
    socket_address:
      address: "127.0.0.1"
      port_value: 9901
static_resources:
  listeners:
    - name: "http_listener"
      address:
        socket_address:
          address: "0.0.0.0"
          port_value: 80
      filter_chains:
        - filters:
            - name: "envoy.http_connection_manager"
              config:
                stat_prefix: "ingress"
                route_config:
                  name: "local_route"
                  virtual_hosts:
                    - name: "http-route"
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: "service_a"
                http_filters:
                  - name: "envoy.router"
  clusters:
    - name: "service_a"
      connect_timeout: "0.25s"
      type: "strict_dns"
      lb_policy: "ROUND_ROBIN"
      hosts:
        - socket_address:
            address: "service_a_envoy"
            port_value: 8786
```
Envoy configuration mainly consists of:
- Listeners
- Routes
- Clusters
- Endpoints
Listeners
One or more listeners can run in a single Envoy instance. The listeners section specifies the address and port that the current listener binds to (0.0.0.0 on port 80). Each listener can also have one or more network filters. These filters enable routing, TLS termination, traffic shifting and similar activities. “envoy.http_connection_manager”, employed here, is one of Envoy’s built-in filters; Envoy has several others.
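For instance, a listener that should simply forward raw TCP to an upstream could use the built-in “envoy.tcp_proxy” filter instead of the HTTP connection manager. A minimal sketch of such a listener entry (to go under listeners:); the listener name and port are invented for the example:

```yaml
- name: "tcp_listener"
  address:
    socket_address:
      address: "0.0.0.0"
      port_value: 9000
  filter_chains:
    - filters:
        - name: "envoy.tcp_proxy"     # L4 proxying, no HTTP routing involved
          config:
            stat_prefix: "tcp"
            cluster: "service_a"      # every connection is forwarded to this cluster
```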
Routes
The route_config section configures the route specification for the filter. It specifies the domains for which we should accept requests, and a route matcher that matches each request against the configured rules and forwards it to the appropriate cluster.
Clusters
Clusters are specifications for the upstream services to which Envoy routes traffic. The clusters section defines “service_a”, the only upstream that Front Envoy will talk to. “connect_timeout” is the time limit for establishing a connection to the upstream service before returning a 503.
There is usually more than one instance of Service A, and Envoy supports multiple load balancing algorithms to route traffic across them. In our example, we use a simple round robin algorithm.
Endpoints
“hosts” specifies the instances of Service A to which we want to route traffic; in our case, we have only one. Note that we do not talk to Service A directly: the host here is “service_a_envoy”, an instance of Service A’s Envoy sidecar, which in turn routes to the local Service A instance.
We could also use a service name that returns all the instances of Service A, like a headless service in Kubernetes; in other words, we are carrying out client-side load balancing. Envoy caches all the hosts of Service A and refreshes the host list every 5 seconds.
Envoy also supports both active and passive health checking. If we want active health checks, we configure them in the cluster configuration section.
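For example, an active HTTP health check for the service_a cluster might look like this minimal sketch; the /healthz path and the timing values are assumptions, not part of the original configuration:

```yaml
clusters:
  - name: "service_a"
    connect_timeout: "0.25s"
    type: "strict_dns"
    dns_refresh_rate: "5s"          # how often the strict_dns host list is refreshed
    lb_policy: "ROUND_ROBIN"
    hosts:
      - socket_address:
          address: "service_a_envoy"
          port_value: 8786
    health_checks:
      - timeout: "1s"               # per-probe timeout
        interval: "5s"              # probe every 5 seconds
        unhealthy_threshold: 3      # mark a host unhealthy after 3 failures
        healthy_threshold: 2        # mark it healthy again after 2 successes
        http_health_check:
          path: "/healthz"          # assumed health endpoint on the upstream
```

With this in place, hosts that fail the probe are taken out of the round robin rotation until they recover.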
Others
The admin block at the top configures the admin server, which helps in viewing configurations and stats, changing log levels, and more. “static_resources” indicates that all of the configuration below it is loaded manually; we will discuss how to do this dynamically a little later in this article.
While there are many more configuration options than the ones described here, our aim is not to go through them all but to get started with a minimal configuration.
Service A
Here is the Envoy configuration for Service A:

```yaml
admin:
  access_log_path: "/tmp/admin_access.log"
  address:
    socket_address:
      address: "127.0.0.1"
      port_value: 9901
static_resources:
  listeners:
    - name: "service-a-svc-http-listener"
      address:
        socket_address:
          address: "0.0.0.0"
          port_value: 8786
      filter_chains:
        - filters:
            - name: "envoy.http_connection_manager"
              config:
                stat_prefix: "ingress"
                codec_type: "AUTO"
                route_config:
                  name: "service-a-svc-http-route"
                  virtual_hosts:
                    - name: "service-a-svc-http-route"
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: "service_a"
                http_filters:
                  - name: "envoy.router"
    - name: "service-b-svc-http-listener"
      address:
        socket_address:
          address: "0.0.0.0"
          port_value: 8788
      filter_chains:
        - filters:
            - name: "envoy.http_connection_manager"
              config:
                stat_prefix: "egress"
                codec_type: "AUTO"
                route_config:
                  name: "service-b-svc-http-route"
                  virtual_hosts:
                    - name: "service-b-svc-http-route"
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: "service_b"
                http_filters:
                  - name: "envoy.router"
    - name: "service-c-svc-http-listener"
      address:
        socket_address:
          address: "0.0.0.0"
          port_value: 8791
      filter_chains:
        - filters:
            - name: "envoy.http_connection_manager"
              config:
                stat_prefix: "egress"
                codec_type: "AUTO"
                route_config:
                  name: "service-c-svc-http-route"
                  virtual_hosts:
                    - name: "service-c-svc-http-route"
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: "service_c"
                http_filters:
                  - name: "envoy.router"
  clusters:
    - name: "service_a"
      connect_timeout: "0.25s"
      type: "strict_dns"
      lb_policy: "ROUND_ROBIN"
      hosts:
        - socket_address:
            address: "service_a"
            port_value: 8081
    - name: "service_b"
      connect_timeout: "0.25s"
      type: "strict_dns"
      lb_policy: "ROUND_ROBIN"
      hosts:
        - socket_address:
            address: "service_b_envoy"
            port_value: 8789
    - name: "service_c"
      connect_timeout: "0.25s"
      type: "strict_dns"
      lb_policy: "ROUND_ROBIN"
      hosts:
        - socket_address:
            address: "service_c_envoy"
            port_value: 8790
```
The first listener (“service-a-svc-http-listener”, on port 8786) routes incoming traffic to the actual Service A instance; the corresponding “service_a” cluster at the bottom of the file points to the local instance on port 8081.
Service A also talks to Service B and Service C, which is why there are two more listeners and two more clusters. In our example, we have a separate egress listener for each of our upstreams (localhost, Service B and Service C). An alternative approach would be a single listener that routes to any of the upstreams based on the URL or headers, as sketched below.
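A minimal sketch of that single-listener alternative, as a listener entry under listeners:; the /service_b and /service_c path prefixes, the listener name and the port are assumptions for illustration, not part of the example services:

```yaml
- name: "egress-http-listener"
  address:
    socket_address:
      address: "0.0.0.0"
      port_value: 8788
  filter_chains:
    - filters:
        - name: "envoy.http_connection_manager"
          config:
            stat_prefix: "egress"
            codec_type: "AUTO"
            route_config:
              name: "egress-route"
              virtual_hosts:
                - name: "egress-route"
                  domains:
                    - "*"
                  routes:
                    - match:
                        prefix: "/service_b"   # requests meant for Service B
                      route:
                        cluster: "service_b"
                    - match:
                        prefix: "/service_c"   # requests meant for Service C
                      route:
                        cluster: "service_c"
            http_filters:
              - name: "envoy.router"
```

Routes are matched in order, so more specific prefixes should be listed before any catch-all “/” route.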
Service B & Service C
Service B and Service C sit at the leaf level and do not talk to any upstream other than the local service instance, so their configuration is simple. Here is the Envoy configuration for Service B (Service C’s follows the same pattern):

```yaml
admin:
  access_log_path: "/tmp/admin_access.log"
  address:
    socket_address:
      address: "127.0.0.1"
      port_value: 9901
static_resources:
  listeners:
    - name: "service-b-svc-http-listener"
      address:
        socket_address:
          address: "0.0.0.0"
          port_value: 8789
      filter_chains:
        - filters:
            - name: "envoy.http_connection_manager"
              config:
                stat_prefix: "ingress"
                codec_type: "AUTO"
                route_config:
                  name: "service-b-svc-http-route"
                  virtual_hosts:
                    - name: "service-b-svc-http-route"
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: "service_b"
                http_filters:
                  - name: "envoy.router"
  clusters:
    - name: "service_b"
      connect_timeout: "0.25s"
      type: "strict_dns"
      lb_policy: "ROUND_ROBIN"
      hosts:
        - socket_address:
            address: "service_b"
            port_value: 8082
```

As you can see, there is nothing complicated here: just a single listener and a single cluster.

Once the configurations are done, we can deploy this setup to Kubernetes or test it with docker-compose. You could run docker-compose build and docker-compose up and hit localhost:8080 to see a request pass through all the services and Envoy proxies successfully. We could also use the logs to verify the same.

Envoy xDS

We provided configurations to each of the sidecars, and, depending on the service, the configuration varied between them. While we could hand-craft and manage the sidecar configurations manually, at least for the initial two or three services, it becomes difficult as the number of services grows. Also, every time a sidecar configuration changes, you have to restart the Envoy instance for the changes to take effect.

As discussed earlier, we can avoid this manual configuration and load all the components, Clusters (CDS), Endpoints (EDS), Listeners (LDS) and Routes (RDS), from an API server. Each sidecar then talks to the API server and receives its configuration; when new configurations are pushed to the API server, they are automatically reflected in the Envoy instances, avoiding a restart.

Here is a little more information about dynamic configurations, and here is a sample xDS server that you could use.
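To give an idea of what switching to dynamic configuration involves, here is a minimal bootstrap sketch that pulls listeners and clusters from an xDS management server over gRPC. The xds_server host name and the 18000 port are assumptions for illustration, not part of the setup described above:

```yaml
admin:
  access_log_path: "/tmp/admin_access.log"
  address:
    socket_address:
      address: "127.0.0.1"
      port_value: 9901
dynamic_resources:
  lds_config:                        # listeners (and their routes) come from the xDS server
    api_config_source:
      api_type: GRPC
      grpc_services:
        - envoy_grpc:
            cluster_name: "xds_cluster"
  cds_config:                        # clusters (and their endpoints) come from the xDS server
    api_config_source:
      api_type: GRPC
      grpc_services:
        - envoy_grpc:
            cluster_name: "xds_cluster"
static_resources:
  clusters:
    - name: "xds_cluster"            # the only static piece: how to reach the management server
      connect_timeout: "0.25s"
      type: "strict_dns"
      lb_policy: "ROUND_ROBIN"
      http2_protocol_options: {}     # the xDS APIs are served over gRPC, which needs HTTP/2
      hosts:
        - socket_address:
            address: "xds_server"
            port_value: 18000
```

With a bootstrap like this, only the file above stays on disk; everything else is pushed by the management server and picked up without restarting Envoy.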
Kubernetes

This section looks at what it would take to implement the setup discussed above in Kubernetes.

Single service with Envoy sidecar

So, changes will be needed in:

- Pod
- Service

Pod

While the Pod spec here defines only one container, a Pod can, by definition, hold one or more containers. To run a sidecar proxy with each of our service instances, we add the Envoy container to every Pod. The service container then talks to the outside world through the Envoy container over localhost.

This is what the deployment file would look like:

```yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: servicea
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: servicea
    spec:
      containers:
        - name: servicea
          image: dnivra26/servicea:0.6
          ports:
            - containerPort: 8081
              name: svc-port
              protocol: TCP
        - name: envoy
          image: envoyproxy/envoy:latest
          ports:
            - containerPort: 9901
              protocol: TCP
              name: envoy-admin
            - containerPort: 8786
              protocol: TCP
              name: envoy-web
          volumeMounts:
            - name: envoy-config-volume
              mountPath: /etc/envoy-config/
          command: ["/usr/local/bin/envoy"]
          args: ["-c", "/etc/envoy-config/config.yaml", "--v2-config-only", "-l", "info", "--service-cluster", "servicea", "--service-node", "servicea", "--log-format", "[METADATA][%Y-%m-%d %T.%e][%t][%l][%n] %v"]
      volumes:
        - name: envoy-config-volume
          configMap:
            name: sidecar-config
            items:
              - key: envoy-config
                path: config.yaml
```

The containers section now has the Envoy sidecar added to it, and we mount our Envoy configuration file from a ConfigMap via the volumeMounts and volumes sections.

Service

Kubernetes Services take care of maintaining the list of Pod endpoints they can route traffic to. kube-proxy usually handles load balancing between those Pod endpoints, but in our case we are doing client-side load balancing, so we do not want kube-proxy to load balance for us. Instead, we want to extract the list of Pod endpoints and load balance across them ourselves, which is what a “headless service” gives us: it returns the list of endpoints rather than a single cluster IP.
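This is roughly what it would look like; a minimal sketch, assuming the app: servicea selector and Envoy’s 8786 listener port from the Deployment above (the original manifest may differ in its details):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: servicea
spec:
  clusterIP: None          # headless: DNS returns the individual Pod IPs
  selector:
    app: servicea          # matches the Pods created by the Deployment above
  ports:
    - name: envoy-web
      port: 8786           # Envoy's listener port, not the app's 8081
      targetPort: 8786
```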
There are two things to note here. One, it is clusterIP: None that makes the Service headless; two, we are not mapping the Kubernetes Service port to the app’s port but to Envoy’s listener port, which means traffic goes to Envoy first. With this in place, Kubernetes works perfectly as well.
In this post we saw how to build a service mesh using Envoy proxy. We have a setup where all the communication happens through the mesh.
Now the mesh not only has a lot of data about the traffic but also control over it. Upcoming posts will delve into how one can leverage this for observability, release management and more. Also, in the current setup Envoy’s configuration is managed manually; in later posts we will look at using a control plane like Istio to manage the configuration.
You can find all the discussed configurations and code here.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.