My raw notes on Consul — Beginner’s guide
Consul is a full-featured service mesh solution that solves the networking and security challenges of operating microservices and cloud infrastructure. It is a distributed system designed to run on a cluster of nodes, each of which can be a physical server, cloud instance, virtual machine, or container.
This article records my raw notes from learning Consul as a beginner. I’ll record some common concepts and experiment with Consul in development mode to try its functionality.
Concepts
The set of nodes Consul runs on is called a datacenter. Consul has two agent modes in the datacenter, client agent and server agent.
A client is a lightweight process that registers services, runs health checks, and forwards queries to servers. So, a client agent must be running on every node in the Consul datacenter.
A server is responsible for maintaining Consul’s state, including information about other Consul servers and clients, what services are available for discovery, and which services are allowed to access other services.
Each datacenter must have at least one server. However, to ensure that Consul’s state is preserved even if a server fails, we should run three or five servers in production (and no more than five). An odd number of servers strikes a balance between performance and failure tolerance.
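To see why three or five is the usual recommendation, here is a minimal sketch of the majority arithmetic. It assumes the servers need a majority (a quorum) to agree on state, which is how Consul's consensus works; the function names are my own.

```python
def quorum(servers: int) -> int:
    """Smallest majority of a datacenter with `servers` server agents."""
    if servers < 1:
        raise ValueError("a datacenter needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a majority is still available."""
    return servers - quorum(servers)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

Going from three to four servers raises the quorum but not the failure tolerance, which is why even numbers add cost without adding safety.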
Installation
You can install Consul manually on any operating system. The easiest way on macOS is Homebrew.
brew tap hashicorp/tap
brew install hashicorp/tap/consul
The first command installs the HashiCorp tap, a repository of all HashiCorp’s Homebrew packages. The second command installs Consul from hashicorp/tap/consul, which is a signed binary and is automatically updated with every new official release.
brew upgrade hashicorp/tap/consul
This command upgrades Consul to the latest version.
Experiment with Consul
Now run Consul in development mode (an in-memory server mode), which is neither secure nor scalable. Start the Consul agent this way to try most of Consul’s functionality without extra configuration.
Start the Consul Agent in Development Mode
consul agent -dev -node jerome
Use -node to specify the node name and avoid DNS failures on macOS. Consul uses the local machine's hostname as the default node name, and DNS queries to the node will not work if that name contains periods.
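As a small illustration of the period rule, the sketch below derives a period-free node name by keeping only the first label of a hostname. The helper name and the example hostname jerome.local are hypothetical, and the sketch only handles the period rule mentioned above, not any other naming constraint.

```python
def dns_safe_node_name(hostname: str) -> str:
    """Return the first label of `hostname`, which contains no periods."""
    label = hostname.split(".", 1)[0]
    if not label:
        raise ValueError(f"cannot derive a node name from {hostname!r}")
    return label


# "jerome.local" is a made-up hostname used only for illustration.
print(dns_safe_node_name("jerome.local"))  # prints: jerome
```

The result can then be passed to the agent with the -node flag.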
==> Starting Consul agent...
Version: '1.9.1'
Node ID: 'a4b4cee7-7e04-b4af-96e4-0a0032f98fff'
Node name: 'jerome'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
The partial logs above report that the Consul agent has started as a server. Now we can access 127.0.0.1:8500 through the browser to see the Consul UI.
➜ ~ consul members
Node Address Status Type Build Protocol DC Segment
jerome 127.0.0.1:8301 alive server 1.9.1 2 dc1 <all>
The command consul members lists the agents in the datacenter. For now there is only one member, our machine.
➜ ~ consul leave
Graceful leave complete
The command consul leave stops the Consul agent gracefully.
Register a Service
The most common way of registering services is to provide a service definition (another way is the HTTP API). Consul loads all configuration files in the configuration directory, which is conventionally named consul.d.
Now run the command mkdir ~/consul.d and write the following content to the file ~/consul.d/web.json.
{
  "service": {
    "name": "web",
    "tags": [
      "rails"
    ],
    "port": 80
  }
}
The web.json file pretends that a service named web is running on port 80. The optional tag can be used to find the service later on.
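If you prefer to generate definitions instead of writing them by hand, the sketch below builds the same structure with Python's json module. The helper service_definition is my own name, not part of Consul, and it only covers the name, port, and tags fields used here.

```python
import json


def service_definition(name, port, tags=None):
    """Build a dict shaped like the web.json service definition above."""
    service = {"name": name, "port": port}
    if tags:
        service["tags"] = list(tags)
    return {"service": service}


web = service_definition("web", 80, tags=["rails"])
# Write this text to ~/consul.d/web.json to get the same definition.
print(json.dumps(web, indent=2))
```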
cd ~; consul agent -dev -node jerome -enable-script-checks -config-dir=./consul.d
Consul does not expand the ~ symbol in the configuration path, so we change to the home directory first and use a relative path.
Now open localhost:8500 and we will see the service we defined. We never actually started a web service, but Consul can register services that aren’t running yet.
Query a Service
We can query services using either the DNS interface or HTTP API.
First, query the service using the DNS interface. The DNS name of a service registered with Consul is Name.service.consul, where Name is the registered service’s name. So, the DNS name of the service we want to query here is web.service.consul. The Consul DNS interface runs on port 8600 by default.
➜ ~ dig @127.0.0.1 -p 8600 web.service.consul SRV

; <<>> DiG 9.10.6 <<>> @127.0.0.1 -p 8600 web.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59121
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;web.service.consul. IN SRV

;; ANSWER SECTION:
web.service.consul. 0 IN SRV 1 1 80 jerome.node.dc1.consul.

;; ADDITIONAL SECTION:
jerome.node.dc1.consul. 0 IN A 127.0.0.1
jerome.node.dc1.consul. 0 IN TXT "consul-network-segment="

;; Query time: 1 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Wed Dec 30 14:14:56 CST 2020
;; MSG SIZE rcvd: 141
The SRV record says the web service is running on port 80 and exists on the node jerome.node.dc1.consul.
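To make the naming scheme and the SRV answer easier to read, here is a rough sketch that builds a service's DNS name and splits one answer line into its fields. The function and class names are my own, and the parser assumes the exact whitespace-separated layout that dig printed above.

```python
from typing import NamedTuple, Optional


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def service_dns_name(service: str, tag: Optional[str] = None, domain: str = "consul") -> str:
    """Build Name.service.consul, optionally prefixed with a tag."""
    prefix = f"{tag}." if tag else ""
    return f"{prefix}{service}.service.{domain}"


def parse_srv_line(line: str) -> SrvRecord:
    """Parse one SRV line from dig's ANSWER SECTION."""
    name, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if rr_class != "IN" or rr_type != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port), target)


record = parse_srv_line("web.service.consul. 0 IN SRV 1 1 80 jerome.node.dc1.consul.")
print(service_dns_name("web"), record.port, record.target)
```

The tag prefix is where the rails tag from the service definition becomes useful: rails.web.service.consul narrows the query to instances carrying that tag.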
Second, query the service using HTTP API.
➜ ~ curl http://localhost:8500/v1/catalog/service/web
We’ll see the service’s information in JSON format.
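The same query can be made from a script. This is a minimal sketch using only the standard library; the helper names are mine, and the sample entry is hand-written to resemble a catalog entry rather than copied from a real response, so treat its values as illustrative.

```python
import json
import urllib.request

CATALOG_URL = "http://localhost:8500/v1/catalog/service/{name}"


def fetch_catalog_service(name: str, timeout: float = 5.0) -> list:
    """Call the catalog endpoint; needs a Consul agent listening on port 8500."""
    with urllib.request.urlopen(CATALOG_URL.format(name=name), timeout=timeout) as resp:
        return json.load(resp)


def summarize(entries: list) -> list:
    """Keep only the fields needed to reach each service instance."""
    return [
        {
            "node": entry.get("Node"),
            # An empty ServiceAddress means the node's own address applies.
            "address": entry.get("ServiceAddress") or entry.get("Address"),
            "port": entry.get("ServicePort"),
            "tags": entry.get("ServiceTags") or [],
        }
        for entry in entries
    ]


# Hand-written sample for illustration; a real response has more fields.
sample = [
    {
        "Node": "jerome",
        "Address": "127.0.0.1",
        "ServiceName": "web",
        "ServiceAddress": "",
        "ServicePort": 80,
        "ServiceTags": ["rails"],
    }
]
print(summarize(sample))
```

With the development agent running, summarize(fetch_catalog_service("web")) would do the same against live data.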
Update a Service
Now we add a health check for the web service. Edit ~/consul.d/web.json as follows.
{
  "service": {
    "name": "web",
    "tags": [
      "rails"
    ],
    "port": 80,
    "check": {
      "args": [
        "curl",
        "localhost"
      ],
      "interval": "10s"
    }
  }
}
Run consul reload to make Consul aware of the new health check. Consul will then mark the web service as unhealthy: nothing is actually listening on port 80, so the curl check fails every time it runs.
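A script check is judged by the exit code of its command: 0 is passing, 1 is warning, and anything else is critical. The sketch below mimics that rule in Python. The function names and the timeout value are my own choices, and the real agent does more (scheduling, output capture) than this shows.

```python
import subprocess
import sys


def status_from_exit_code(code: int) -> str:
    """Map a check command's exit code to a health status."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    return "critical"


def run_script_check(args, timeout: float = 30.0) -> str:
    """Run the check command once and report the resulting status."""
    try:
        completed = subprocess.run(
            args,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Treating a missing or hung command as critical is an assumption here.
        return "critical"
    return status_from_exit_code(completed.returncode)


# Stand-in for a failing curl: a child process that exits with a non-zero code.
print(run_script_check([sys.executable, "-c", "raise SystemExit(7)"]))  # prints: critical
```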
Connect Services
Next, we will use sidecar proxies to connect services within Consul Service Mesh.
First, we start a Consul-unaware service, which is a basic echo service.
brew install socat
socat -v tcp-l:8181,fork exec:"/bin/cat"
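If socat is not available, a roughly equivalent echo service can be sketched with Python's standard library. This is not what the walkthrough uses; the class and function names are mine, and the demo binds to port 0 (an ephemeral port) so it does not clash with anything on 8181.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Send every received line straight back, like exec:"/bin/cat".
        for line in self.rfile:
            self.wfile.write(line)


class ThreadedEchoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True
    daemon_threads = True


def start_echo_server(host: str = "127.0.0.1", port: int = 0) -> ThreadedEchoServer:
    """Start the echo server in a background thread and return it."""
    server = ThreadedEchoServer((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def echo_once(host: str, port: int, message: bytes) -> bytes:
    """Send one newline-terminated message and read the echoed line."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(message)
        return conn.makefile("rb").readline()


demo = start_echo_server()
demo_host, demo_port = demo.server_address
print(echo_once(demo_host, demo_port, b"hello\n"))
demo.shutdown()
demo.server_close()
```

To stand in for socat in the rest of the walkthrough, the server would need to listen on port 8181 and stay running.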
We can use nc 127.0.0.1 8181 to verify that the service has started. Now add a file called socat.json to the ~/consul.d directory with the following content.
{
  "service": {
    "name": "socat",
    "port": 8181,
    "connect": {
      "sidecar_service": {}
    }
  }
}
Run consul reload to let Consul read the new configuration file.
There is a special field named connect. This configuration tells Consul to register a sidecar proxy for the process on a dynamically allocated port. Consul will not automatically start the sidecar proxy.
consul connect proxy -sidecar-for socat
The command above will start the sidecar proxy for the socat process.
Second, we connect the web and socat services. Update the web service definition as follows.
{
  "service": {
    "name": "web",
    "connect": {
      "sidecar_service": {
        "proxy": {
          "upstreams": [
            {
              "destination_name": "socat",
              "local_bind_port": 9191
            }
          ]
        }
      }
    }
  }
}
This definition adds a sidecar proxy configuration with an upstream: the destination service (socat) and a local port (9191). After another consul reload, Consul will register a sidecar proxy for the web service that establishes mTLS connections to socat.
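The nesting in that definition is easy to get wrong by hand, so here is a small sketch that adds an upstream to an existing definition. The helper with_upstream is my own, not a Consul API; it only fills in the two upstream fields shown above.

```python
import copy
import json


def with_upstream(definition: dict, destination: str, local_bind_port: int) -> dict:
    """Return a copy of `definition` with one more sidecar upstream."""
    updated = copy.deepcopy(definition)
    connect = updated["service"].setdefault("connect", {})
    sidecar = connect.setdefault("sidecar_service", {})
    upstreams = sidecar.setdefault("proxy", {}).setdefault("upstreams", [])
    upstreams.append(
        {"destination_name": destination, "local_bind_port": local_bind_port}
    )
    return updated


base = {"service": {"name": "web"}}
web_connect = with_upstream(base, "socat", 9191)
print(json.dumps(web_connect, indent=2))
```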
consul connect proxy -sidecar-for web
The command above will start the sidecar proxy for the web service.
nc 127.0.0.1 9191
The command above connects us to the socat service again, this time through port 9191.
The communication between the web and socat proxies is encrypted and authorized over a mutual TLS connection, while communication between each service and its sidecar proxy is unencrypted.
Intentions
Intentions define which services are allowed to communicate with which other services. In development mode, all communication between services is allowed by default.
consul intention create -deny web socat
The command above will create an intention that denies access from web to socat.
nc 127.0.0.1 9191
With the deny intention in place, the connection attempt above will fail.
consul intention delete web socat
The command above will delete the deny intention.
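The create, test, and delete sequence above can be modelled in a few lines. This is a deliberately simplified sketch: it only matches exact source and destination names and ignores wildcards and precedence, and the names used are my own.

```python
from typing import Dict, Tuple

# Maps (source, destination) to "allow" or "deny".
Intentions = Dict[Tuple[str, str], str]


def is_allowed(source: str, destination: str, intentions: Intentions,
               default_allow: bool = True) -> bool:
    """Decide whether `source` may connect to `destination`."""
    action = intentions.get((source, destination))
    if action is None:
        # No matching intention: fall back to the default (allow in dev mode).
        return default_allow
    return action == "allow"


intentions: Intentions = {}
before = is_allowed("web", "socat", intentions)

intentions[("web", "socat")] = "deny"   # like: consul intention create -deny web socat
during = is_allowed("web", "socat", intentions)

del intentions[("web", "socat")]        # like: consul intention delete web socat
after = is_allowed("web", "socat", intentions)

print(before, during, after)  # prints: True False True
```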
We have now finished the first step. Next, I’ll continue learning how to use Consul in production mode and how it works with other infrastructure tools, such as Kubernetes, Istio, and Envoy.