Wednesday, December 17, 2014

Dynamic DNS updates with nsupdate (new and improved!)

I blogged about this topic before. This post shows a slightly different way of using nsupdate remotely against a DNS server running BIND 9 in order to programmatically update DNS records. The scenario I am describing here involves an Ubuntu 12.04 DNS server running BIND 9 and an Ubuntu 12.04 client running nsupdate against the DNS server.

1) Run ddns-confgen, specifying /dev/urandom as the source of randomness and the name of the zone you want to dynamically update via nsupdate:

$ ddns-confgen -r /dev/urandom -z myzone.com

# To activate this key, place the following in named.conf, and
# in a separate keyfile on the system or systems from which nsupdate
# will be run:
key "ddns-key.myzone.com" {
algorithm hmac-sha256;
secret "1D1niZqRvT8pNDgyrJcuCiykOQCHUL33k8ZYzmQYe/0=";
};

# Then, in the "zone" definition statement for "myzone.com",
# place an "update-policy" statement like this one, adjusted as
# needed for your preferred permissions:
update-policy {
 grant ddns-key.myzone.com zonesub ANY;
};

# After the keyfile has been placed, the following command will
# execute nsupdate using this key:
nsupdate -k <keyfile>

2) Follow the instructions in the output of ddns-confgen (above). I actually named the key just ddns-key, since I was going to use it for all the zones on my DNS server. So I added this stanza to /etc/bind/named.conf on the DNS server:

key "ddns-key" {
algorithm hmac-sha256;
secret "1D1niZqRvT8pNDgyrJcuCiykOQCHUL33k8ZYzmQYe/0=";
};

3) Allow updates when the key ddns-key is used. In my case, I added the allow-update line below to all zones that I wanted to dynamically update, not only to myzone.com:

zone "myzone.com" {
        type master;
        file "/etc/bind/zones/myzone.com.db";
        allow-update { key "ddns-key"; };
};
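
Before restarting BIND, it's worth validating the changes with the named-checkconf and named-checkzone utilities that ship with BIND (the paths and zone name here are the ones from this example):

# named-checkconf /etc/bind/named.conf
# named-checkzone myzone.com /etc/bind/zones/myzone.com.db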

At this point I also restarted the bind9 service on my DNS server.

4) On the client box, create a text file containing nsupdate commands to be sent to the DNS server. In the example below, I want to dynamically add both an A record and a reverse DNS PTR record:

$ cat update_dns1.txt
server dns1.mycompany.com
debug yes
zone myzone.com
update add testnsupdate1.myzone.com 3600 A 10.10.2.221
show
send
zone 2.10.10.in-addr.arpa
update add 221.2.10.10.in-addr.arpa 3600 PTR testnsupdate1.myzone.com
show
send

Still on the client box, create a file containing the key stanza generated in step 1 (the secret must match the one configured on the DNS server):

$ cat ddns-key.txt
key "ddns-key" {
algorithm hmac-sha256;
secret "Wxp1uJv3SHT+R9rx96o6342KKNnjW8hjJTyxK2HYufg=";
};

5) Run nsupdate and feed it both the update_dns1.txt file containing the commands, and the ddns-key.txt file:

$ nsupdate -k ddns-key.txt -v update_dns1.txt

You should see some fairly verbose output, since the command file specifies 'debug yes'. At the same time, tail /var/log/syslog on the DNS server and make sure there are no errors.
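
To double-check that the records actually made it into the zone, a couple of quick dig queries against the DNS server should do it (names and IP from the example above); the first should return 10.10.2.221 and the second testnsupdate1.myzone.com:

$ dig @dns1.mycompany.com testnsupdate1.myzone.com A +short
$ dig @dns1.mycompany.com -x 10.10.2.221 +short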

In my case, there were some hurdles I had to overcome on the DNS server. The first one was that apparmor was installed and it wasn't allowing the creation of the journal files used to keep track of DDNS records. I saw lines like these in /var/log/syslog:

Dec 16 11:22:59 dns1 kernel: [49671335.189689] type=1400 audit(1418757779.712:12): apparmor="DENIED" operation="mknod" parent=1 profile="/usr/sbin/named" name="/etc/bind/zones/myzone.com.db.jnl" pid=31154 comm="named" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
Dec 16 11:22:59 dns1 kernel: [49671335.306304] type=1400 audit(1418757779.828:13): apparmor="DENIED" operation="mknod" parent=1 profile="/usr/sbin/named" name="/etc/bind/zones/rev.2.10.10.in-addr.arpa.jnl" pid=31153 comm="named" requested_mask="c" denied_mask="c" fsuid=107 ouid=107

To get past this issue, I disabled apparmor for named:

# ln -s /etc/apparmor.d/usr.sbin.named /etc/apparmor.d/disable/
# service apparmor restart

The next issue was an OS permission denied (nothing to do with apparmor) when trying to create the journal files in /etc/bind/zones:

Dec 16 11:30:54 dns1 named[32640]: /etc/bind/zones/myzone.com.db.jnl: create: permission denied
Dec 16 11:30:54 dns1 named[32640]: /etc/bind/zones/rev.2.10.10.in-addr.arpa.jnl: create: permission denied

I got past this issue by running

# chown -R bind:bind /etc/bind/zones

At this point everything worked as expected.
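
For completeness, deleting those records later works the same way, using 'update delete' commands in the nsupdate command file. Here is a sketch based on the records added above (the delete_dns1.txt filename is just an example):

$ cat delete_dns1.txt
server dns1.mycompany.com
zone myzone.com
update delete testnsupdate1.myzone.com A
send
zone 2.10.10.in-addr.arpa
update delete 221.2.10.10.in-addr.arpa PTR
send

$ nsupdate -k ddns-key.txt -v delete_dns1.txt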


Monday, November 17, 2014

Service discovery with consul and consul-template

I talked in the past about an "Ops Design Pattern: local haproxy talking to service layer". I described how we used a local haproxy on pretty much all nodes at a given layer of our infrastructure (webapp, API, e-commerce) to talk to services offered by the layer below it. So each webapp server has a local haproxy that talks to all API nodes it sends requests to. Similarly, each API node has a local haproxy that talks to all e-commerce nodes it needs info from.

This seemed like a good idea at the time, but it turns out it has a couple of annoying drawbacks:
  • each local haproxy runs health checks against N nodes, so if you have M nodes running haproxy, each of the N nodes will receive M health checks; if M and N are large, then you have a health check storm on your hands
  • to take a node out of a cluster at any given layer, we tag it as 'inactive' in Chef, then run chef-client on all nodes that run haproxy and talk to the inactive node at layers above it; this gets old pretty fast, especially when you're doing anything that might conflict with Chef and that the chef-client run might overwrite (I know, I know, you're not supposed to do anything of that nature, but we are all human :-)
For the second point, we are experimenting with haproxyctl so that we don't have to run chef-client on every node running haproxy. But it still feels like a heavy-handed approach.

If I were to do this again (which I might), I would still have an haproxy instance in front of our webapp servers, but for communicating from one layer of services to another I would use a proper service discovery tool, such as the granddaddy of them all, Apache ZooKeeper, or the newer kids on the block: etcd from CoreOS and consul from HashiCorp.

I settled on consul for now, so in this post I am going to show how you can use consul in conjunction with the recently released consul-template to discover services and to automate configuration changes. At the same time, I wanted to experiment a bit with Ansible as a configuration management tool. So the steps I'll describe were actually automated with Ansible, but I'll leave that for another blog post.

The scenario I am going to describe involves 2 haproxy instances, each pointing to 2 Wordpress servers running Apache, PHP and MySQL, with Varnish fronting the Wordpress application. One of the 2 Wordpress servers is considered primary as far as haproxy is concerned, and the other one is a backup server, which will only get requests if the primary server is down. All servers are running Ubuntu 12.04.

Install and run the consul agent on all nodes

The agent will run in server mode on the 2 haproxy nodes, and in client (non-server) mode on the 2 Wordpress nodes.

I first deployed consul to the 2 haproxy nodes. I used a modified version of the ansible-consul role from jivesoftware. The configuration file /etc/consul.conf for the first server (lb1) is:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "lb1",
  "server": true,
  "bind_addr": "10.0.0.1",
  "datacenter": "us-west-1b",
  "bootstrap": true,
  "rejoin_after_leave": true
}

(and similar for lb2, with only node_name and bind_addr changed to lb2 and 10.0.0.2 respectively)

The ansible-consul role also creates a consul user and group, and an upstart configuration file like this:

# cat /etc/init/consul.conf

# Consul Agent (Upstart unit)
description "Consul Agent"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec sudo -u consul -g consul /opt/consul/bin/consul agent -config-dir /etc/consul.d -config-file=/etc/consul.conf >> /var/log/consul 2>&1
respawn
respawn limit 10 10
kill timeout 10

To start/stop consul, I use:

# start consul
# stop consul

Note that "server" is set to true and "bootstrap" is also set to true, which means that each consul server will be the leader of a cluster with 1 member, itself. To join the 2 servers into a consul cluster, I did the following:
  • join lb1 to lb2: on lb1 run consul join 10.0.0.2
  • tail /var/log/consul on lb1, note messages complaining about both consul servers (lb1 and lb2) running in bootstrap mode
  • stop consul on lb1: stop consul
  • edit /etc/consul.conf on lb1 and set  "bootstrap": false
  • start consul on lb1: start consul
  • tail /var/log/consul on both lb1 and lb2; it should show no more errors
  • run consul info on both lb1 and lb2; the output should show server=true on both nodes, but leader=true only on lb2
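
Another quick sanity check at this point is to ask the HTTP API who the cluster leader is; both nodes should report lb2's address (the port in the response is Consul's internal server RPC port, 8300 by default):

$ curl http://localhost:8500/v1/status/leader
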
Next I ran the consul agent in regular non-server mode on the 2 Wordpress nodes. The configuration file /etc/consul.conf on node wordpress1 was:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "wordpress1",
  "server": false,
  "bind_addr": "10.0.1.1",
  "datacenter": "us-west-1b",
  "rejoin_after_leave": true
}

(and similar for wordpress2, with the node_name set to wordpress2 and bind_addr set to 10.0.1.2)

After starting up the agents via upstart, I joined them to lb2 (although they could be joined to any of the existing members of the cluster). I ran this on both wordpress1 and wordpress2:

# consul join 10.0.0.2

At this point, running consul members on any of the 4 nodes should show all 4 members of the cluster:

Node          Address         Status  Type    Build  Protocol
lb1           10.0.0.1:8301   alive   server  0.4.0  2
wordpress2    10.0.1.2:8301   alive   client  0.4.0  2
lb2           10.0.0.2:8301   alive   server  0.4.0  2
wordpress1    10.0.1.1:8301   alive   client  0.4.0  2

Install and run dnsmasq on all nodes

The ansible-consul role does this for you. Consul piggybacks on DNS resolution for service naming, and by default the domain names internal to Consul fall under the consul. domain. In my case this is configured in /etc/consul.conf via "domain": "consul."

The dnsmasq configuration file for consul is:

# cat /etc/dnsmasq.d/10-consul

server=/consul./127.0.0.1#8600

This causes dnsmasq to forward DNS queries for names under the consul. domain to a DNS server on 127.0.0.1 port 8600, which is the port the local consul agent listens on to provide DNS resolution.

To start/stop dnsmasq, use: service dnsmasq start | stop.

Now that dnsmasq is running, you can look up names that end in .node.consul from any member node of the consul cluster (there are 4 member nodes in my cluster, 2 servers and 2 agents). For example, I ran this on lb2:

$ dig wordpress1.node.consul

; <<>> DiG 9.8.1-P1 <<>> wordpress1.node.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2511
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;wordpress1.node.consul. IN A

;; ANSWER SECTION:
wordpress1.node.consul. 0 IN A 10.0.1.1

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 14 00:09:16 2014
;; MSG SIZE  rcvd: 76

Configure services and checks on consul agent nodes

Internal DNS resolution within the .consul domain becomes even more useful when nodes define services and checks. For example, the 2 Wordpress nodes run varnish and apache (on ports 80 and 443), so we can define 3 services as JSON files in /etc/consul.d. On wordpress1, which is our active/primary node in haproxy, I defined these services:

$ cat http_service.json
{
    "service": {
        "name": "http",
        "tags": ["primary"],
        "port":80,
        "check": {
                "id": "http_check",
                "name": "HTTP Health Check",
  "script": "curl -H 'Host=www.mydomain.com' http://localhost",
        "interval": "5s"
        }
    }
}

$ cat ssl_service.json
{
    "service": {
        "name": "ssl",
        "tags": ["primary"],
        "port":443,
        "check": {
                "id": "ssl_check",
                "name": "SSL Health Check",
  "script": "curl -k -H 'Host=www.mydomain.com' https://localhost:443",
        "interval": "5s"
        }
    }
}

$ cat varnish_service.json
{
    "service": {
        "name": "varnish",
        "tags": ["primary"],
        "port":6081 ,
        "check": {
                "id": "varnish_check",
                "name": "Varnish Health Check",
  "script": "curl http://localhost:6081",
        "interval": "5s"
        }
    }
}

Each service we defined has a name, a port and a check with its own ID, name, script that runs whenever the check is executed, and an interval that specifies how often the check is run. In the examples above I specified simple curl commands against the ports that these services are running on. Note also that each service has a list of tags associated with it. In my case, the services on wordpress1 have the tag "primary". The services defined on wordpress2 are identical to the ones on wordpress1 with the only difference being the tag, which on wordpress2 is "backup".
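
Once the agents pick up these definitions (they are read when the agent restarts, as described below), a quick way to see whether the checks are actually passing is the local agent's HTTP API; the /v1/agent/checks endpoint lists each check with its current Status (passing or critical):

$ curl http://localhost:8500/v1/agent/checks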

After restarting consul on wordpress1 and wordpress2, the following service-related DNS names are available for resolution on all nodes in the consul cluster (I am going to include only relevant portions of the dig output):

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.1
varnish.service.consul. 0 IN A 10.0.1.2

This name resolves in DNS round-robin fashion to the IP addresses of all nodes that are running the varnish service, regardless of their tags and regardless of the data centers that their nodes run in. In our case, it resolves to the IP addresses of wordpress1 and wordpress2.

Note that the IP address of a given node only appears in the DNS result set if the service running on that node has a passing health check. If the check fails, then consul's DNS service will not include the IP of the node in the result set. This is very important for the dynamic discovery of healthy services.
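
The same health-filtered view is also available over the HTTP API, which is handy for tools that don't want to go through DNS; the passing query parameter filters out instances whose checks are failing:

$ curl http://localhost:8500/v1/health/service/varnish?passing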

$ dig varnish.service.us-west-1b.consul

;; ANSWER SECTION:
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.2
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.1

If we include the data center (in our case us-west-1b) in the DNS name we query, then only the services running on nodes in that data center will be returned in the result set. In our case though, all nodes run in the us-west-1b data center, so this query returns, like the previous one, the IP addresses of wordpress1 and wordpress2. Note that the IPs can be returned in any order, because of DNS round-robin. In this case the IP of wordpress2 was first.

$ dig SRV varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress1.node.us-west-1b.consul.
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress2.node.us-west-1b.consul.

;; ADDITIONAL SECTION:
wordpress1.node.us-west-1b.consul. 0 IN A 10.0.1.1
wordpress2.node.us-west-1b.consul. 0 IN A 10.0.1.2

A useful feature of the consul DNS service is that it returns the port number that a given service runs on when queried for an SRV record. So this query returns the names and IPs of the nodes that the varnish service runs on, as well as the port number, which in this case is 6081. The application querying for the SRV record needs to interpret this extra piece of information, but this is very useful for the discovery of internal services that might run on non-standard port numbers.

$ dig primary.varnish.service.consul

;; ANSWER SECTION:
primary.varnish.service.consul. 0 IN A 10.0.1.1

$ dig backup.varnish.service.consul

;; ANSWER SECTION:
backup.varnish.service.consul. 0 IN A 10.0.1.2

The 2 DNS queries above show that it's possible to query a service by its tag, in our case 'primary' vs. 'backup'. The result set will contain the IP addresses of the nodes tagged with the specific tag and running the specific service we asked for. This feature will prove useful when dealing with consul-template in haproxy, as I'll show later in this post.

Load balance across services

It's easy now to see how an application can take advantage of the internal DNS service provided by consul and load balance across services. For example, an application that needs to load balance across the 2 varnish services on wordpress1 and wordpress2 would use varnish.service.consul as the DNS name it talks to when it needs to hit varnish. Every time this DNS name is resolved, a random node from wordpress1 and wordpress2 is returned via the DNS round-robin mechanism. If varnish were to run on a non-standard port number, the application would need to issue a DNS request for the SRV record in order to obtain the port number as well as the IP address to hit.

Note that this method of load balancing has health checks built in. If the varnish health check fails on one of the nodes providing the varnish service, that node's IP address will not be included in the DNS result set returned by the DNS query for that service.

Also note that the DNS query can be customized for the needs of the application, which can query for a specific data center, or a specific tag, as I showed in the examples above.

Force a node out of service

I am still looking for the best way to take nodes in and out of service for maintenance or other purposes. One way I found so far is to deregister a given service via the Consul HTTP API. Here is an example of a curl command that accomplishes that, executed on node wordpress1:

$ curl -v http://localhost:8500/v1/agent/service/deregister/varnish
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/agent/service/deregister/varnish HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 17 Nov 2014 19:01:06 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
* Closing connection #0

The effect of this command is that the varnish service on node wordpress1 is 'deregistered', which for my purposes means 'marked as down'. DNS queries for varnish.service.consul will only return the IP address of wordpress2:

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.2

We can also use the Consul HTTP API to verify that the varnish service does not appear in the list of active services on node wordpress1. We'll use the /agent/services API call and we'll save the output to a file called services.out, then we'll use the jq tool to pretty-print the output:

$ curl -v http://localhost:8500/v1/agent/services -o services.out

$ jq . <<< `cat services.out`
{
 "http": {
   "ID": "http",
   "Service": "http",
   "Tags": [
     "primary"
   ],
   "Port": 80
 },
 "ssl": {
   "ID": "ssl",
   "Service": "ssl",
   "Tags": [
     "primary"
   ],
   "Port": 443
 }
}

Note that only the http and ssl services are shown.

Force a node back in service

Again, I am still looking for the best way to mark a service as 'up' once it was marked as 'down'. One way would be to re-register the service via the Consul HTTP API, which requires sending a PUT request whose payload is a JSON definition of that service. Another way is to just restart the consul agent on the node in question, which re-registers the service that had been deregistered previously.
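
Here is a sketch of the first approach, based on the varnish service defined earlier. Note that, as far as I can tell, the register endpoint expects a flat JSON document with capitalized keys, not the "service" wrapper used in the agent's config files:

$ curl -X PUT http://localhost:8500/v1/agent/service/register -d '{
  "ID": "varnish",
  "Name": "varnish",
  "Tags": ["primary"],
  "Port": 6081,
  "Check": {"Script": "curl http://localhost:6081", "Interval": "5s"}
}'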

Install and configure consul-template

For the next few steps, I am going to show how to use consul-template in conjunction with consul for discovering services and configuring haproxy based on the discovered services.

I automated the installation and configuration of consul-template via an Ansible role that I put on Github, but I am going to discuss the main steps here. See also the instructions on the consul-template Github page.

In my Ansible role, I copy the consul-template binary to the target node (in my case the 2 haproxy nodes lb1 and lb2), then create a directory structure /opt/consul-template/{bin,config,templates}. The consul-template configuration file is /opt/consul-template/config/consul-template.cfg and it looks like this in my case:

$ cat config/consul-template.cfg
consul = "127.0.0.1:8500"

template {
  source = "/opt/consul-template/templates/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "service haproxy restart"
}

Note that consul-template needs to be able to talk to a consul agent, which in my case is the local agent listening on port 8500. The template that consul-template maintains is defined in another file, /opt/consul-template/templates/haproxy.ctmpl. What consul-template does is watch the services and keys referenced in that template; upon any change to them, it generates a new target file based on the template and copies it to the destination file, which in my case is the haproxy config file /etc/haproxy/haproxy.cfg. Finally, consul-template executes a command, which in my case restarts the haproxy service.
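
Before letting consul-template manage the real haproxy.cfg, it can be useful to render the template once to stdout without touching anything; consul-template supports -dry and -once flags for exactly this kind of test:

$ /opt/consul-template/bin/consul-template -config=/opt/consul-template/config/consul-template.cfg -dry -once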

Here is the actual template file for my haproxy config, which is written in the Go template format:

$ cat /opt/consul-template/templates/haproxy.ctmpl

global
  log 127.0.0.1   local0
  maxconn 4096
  user haproxy
  group haproxy

defaults
  log     global
  mode    http
  option  dontlognull
  retries 3
  option redispatch
  timeout connect 5s
  timeout client 50s
  timeout server 50s
  balance  roundrobin

# Set up application listeners here.

frontend http
  maxconn {{key "service/haproxy/maxconn"}}
  bind 0.0.0.0:80
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog
{{range service "primary.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}

frontend https
  maxconn            {{key "service/haproxy/maxconn"}}
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin
{{range service "primary.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}


To the trained eye, this looks like a regular haproxy configuration file, with the exception of the portions enclosed in double curly braces above. These are Go template snippets which rely on a couple of template functions exposed by consul-template above and beyond what the Go templating language offers. Specifically, the key function queries a key stored in the Consul key/value store and outputs the value associated with that key (or an empty string if the value doesn't exist). The service function queries a consul service by its DNS name and returns a result set used inside the range statement. The variables inside the result set can be inspected for properties such as Node, Address and Port, which correspond to the Consul service node name, IP address and port number for that particular service.

In my example above, I use the value of the key service/haproxy/maxconn as the value of maxconn. In the http-varnish backend, I used 2 service names, primary.varnish and backup.varnish, because I wanted to differentiate in haproxy.cfg between the primary server (wordpress1 in my case) and the backup server (wordpress2). In the ssl backend, I did the same but with the ssl service.

Everything so far would work fine with the exception of the key/value pair represented by the key service/haproxy/maxconn. To define that pair, I used the Consul key/value store API (this can be run on any member of the Consul cluster):

$ cat set_haproxy_maxconn.sh
#!/bin/bash

MAXCONN=4000

curl -X PUT -d "$MAXCONN" http://localhost:8500/v1/kv/service/haproxy/maxconn

To verify that the value was set, I used:

$ cat query_consul_kv.sh
#!/bin/bash

curl -v http://localhost:8500/v1/kv/?recurse

$ ./query_consul_kv.sh
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/kv/?recurse HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 30563
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Mon, 17 Nov 2014 23:01:07 GMT
< Content-Length: 118
<
* Connection #0 to host localhost left intact
* Closing connection #0
[{"CreateIndex":10995,"ModifyIndex":30563,"LockIndex":0,"Key":"service/haproxy/maxconn","Flags":0,"Value":"NDAwMA=="}]

At this point, everything is ready for starting up the consul-template service. On Ubuntu, I did it via this Upstart configuration file:

# cat /etc/init/consul-template.conf
# Consul Template (Upstart unit)
description "Consul Template"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec /opt/consul-template/bin/consul-template  -config=/opt/consul-template/config/consul-template.cfg >> /var/log/consul-template 2>&1

respawn
respawn limit 10 10
kill timeout 10

# start consul-template

Once consul-template starts, it will perform the actions corresponding to the functions defined in the template file /opt/consul-template/templates/haproxy.ctmpl. In my case, it will query Consul for the value of the key service/haproxy/maxconn and for information about the varnish and ssl services. It will then save the generated file to /etc/haproxy/haproxy.cfg and restart the haproxy service. The relevant snippets from haproxy.cfg are:

frontend http
  maxconn 4000
  bind 0.0.0.0:80
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog

    server wordpress1 10.0.1.1:6081 weight 1 check port 6081


    server wordpress2 10.0.1.2:6081 backup weight 1 check port 6081

and

frontend https
  maxconn            4000
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin

    server wordpress1 10.0.1.1:443 weight 1 check port 443


    server wordpress2 10.0.1.2:443 backup weight 1 check port 443

I've been running this as a test on lb2. I don't consider my setup quite production-ready because I don't have monitoring in place, and I also want to experiment with consul security tokens for better security. But this is a pattern that I think will work.






Wednesday, October 15, 2014

Testing CDN and geolocation with webpagetest.org

Assume you want to migrate example.mycompany.com to a new CDN provider. Eventually you'll have to point example.mycompany.com as a CNAME to a domain name handled by the CDN provider, let's call it example.cdnprovider.com. To test this setup before you put it in production, the usual way is to get an IP address corresponding to example.cdnprovider.com, then associate example.mycompany.com with that IP address in your local /etc/hosts file.

This works well for testing most of the functionality of your web site, but it doesn't work when you want to test geolocation-specific features such as displaying the currency based on the user's country of origin. For this, you can use a nifty feature from the amazing free service WebPageTest.

On the main page of WebPageTest, you can specify the test location from the dropdown. It contains a generous list of locations across the globe. To fake your DNS setting and point example.mycompany.com at example.cdnprovider.com, you can specify something like this in the Script tab:

setDNSName example.mycompany.com example.cdnprovider.com
navigate http://example.mycompany.com

This will effectively associate the page you want to test with the CDN provider's domain name, so you will hit the CDN first from the location you chose.

Monday, October 13, 2014

Watch the open files limit when running Riak

I was close to expressing my unbridled joy at how little hand-holding our Riak cluster needs, when we started to see strange increased latencies when hitting the cluster, on calls that should have been very fast. The health of the Riak nodes also seemed fine in terms of CPU, memory and disk. As usual, our good old friend the error log pointed us towards the solution. We saw entries like this in /var/log/riak/error.log:

2014-10-11 03:22:40.565 UTC [error] <0.12830.4607> CRASH REPORT Process <0.12830.4607> with 0 neighbours exited with reason: {error,accept_failed} in mochiweb_acceptor:init/3 line 34
2014-10-11 03:22:40.619 UTC [error] <0.168.0> {mochiweb_socket_server,310,{acceptor_error,{error,accept_failed}}}
2014-10-11 03:22:40.619 UTC [error] <0.12831.4607> application: mochiweb, "Accept failed error", "{error,emfile}"

A google search revealed that a possible cause of these errors is the dreaded open file descriptor limit, which is 1024 by default in Ubuntu.

To be perfectly honest, we had done almost no tuning on our Riak cluster, because it had been running so smoothly. But recently we started to throw more traffic at it, so issues with open file descriptors made sense. To fix it, we followed the advice in this Riak doc and created /etc/default/riak with the contents:

ulimit -n 65536
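
To confirm that the new limit actually applies to the running Riak process after a restart, one option is to check the limits of the Erlang VM in /proc (the beam.smp process name and the pgrep pattern are assumptions; adjust them to however Riak shows up in your process list):

$ grep 'open files' /proc/$(pgrep -f beam.smp | head -1)/limits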

We also took the opportunity to apply the networking-related kernel tuning recommendations from this other Riak tuning doc and added these lines to /etc/sysctl.conf:

net.ipv4.tcp_max_syn_backlog = 40000
net.core.somaxconn=4000
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_tw_reuse = 1

Then we ran sysctl -p to update the above values in the kernel. Finally we restarted our Riak nodes one at a time.

I am happy to report that ever since, we've had absolutely no issues with our Riak cluster.  I should also say we are running Riak 1.3, and I understand that Riak 2.0 has better tests in place for avoiding this issue.

I do want to give kudos to Basho for an amazingly robust piece of technology, whose only fault is that it gets you into the habit of ignoring it because it just works!

Thursday, October 02, 2014

A quick note on haproxy acl rules

I blogged in the past about haproxy acl rules we used for geolocation detection purposes. In that post, I referenced acl conditions that were met when traffic was coming from a non-US IP address. In that case, we were using a different haproxy backend. We had an issue recently when trying to introduce yet another backend for a given country. We added these acl conditions:

       acl acl_geoloc_akamai_true_client_ip_some_country req.hdr(X-Country-Akamai) -m str -i SOME_COUNTRY_CODE
       acl acl_geoloc_src_some_country req.hdr(X-Country-Src) -m str -i SOME_COUNTRY_CODE

We also added this use_backend rule:

      use_backend www_some_country-backend if acl_akamai_true_client_ip_header_exists acl_geoloc_akamai_true_client_ip_some_country or acl_geoloc_src_some_country

However, the backend www_some_country-backend was never chosen by haproxy, even though we could see traffic coming from IP addresses in SOME_COUNTRY_CODE.

The cause of this issue was that another use_backend rule (for non-US traffic) was firing before the new rule we added. I believe this is because this rule is more generic:

       use_backend www_row-backend if acl_akamai_true_client_ip_header_exists !acl_geoloc_akamai_true_client_ip_us or !acl_geoloc_src_us

The solution was to modify the use_backend rule for non-US traffic to fire only when the SOME_COUNTRY acl condition isn't met:

       use_backend www_row-backend if acl_akamai_true_client_ip_header_exists !acl_geoloc_akamai_true_client_ip_us !acl_geoloc_akamai_true_client_ip_some_country or !acl_geoloc_src_us !acl_geoloc_src_some_country

Maybe another solution would be to change the order of acls and use_backend rules. I couldn't find any good documentation on how this order affects what gets triggered when.

Wednesday, September 10, 2014

Booting a Raspberry Pi B+ with the Raspbian Debian Wheezy image

It took me a while to boot my brand new Raspberry Pi B+ with a usable Linux image. I chose the Raspbian Debian Wheezy image available on the downloads page of the official raspberrypi.org site. Here are the steps I needed:

1) Bought a micro SD card. Note: DO NOT get a regular SD card for the B+ because it will not fit in the SD card slot. You need a micro SD card.

2) Inserted the SD card via an SD USB adaptor in my MacBook Pro.

3) Went to the command line and ran df to see which volume the SD card was mounted as. In my case, it was /dev/disk1s1.

4) Unmounted the SD card. I initially tried 'sudo umount /dev/disk1s1' but the system told me to use 'diskutil unmount', so the command that worked for me was:

diskutil unmount /dev/disk1s1

5) Used dd to copy the Raspbian Debian Wheezy image (which I previously downloaded) per these instructions. Important note: the target of the dd command is /dev/disk1 and NOT /dev/disk1s1. I tried initially with the latter, and the Raspberry Pi wouldn't boot (one symptom that something was wrong, apart from the fact that nothing appeared on the monitor, was that the green light was solid and not flashing; a Google search revealed that one possible cause for that was a problem with the SD card). The dd command I used was:

dd if=2014-06-20-wheezy-raspbian.img of=/dev/disk1 bs=1m
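
As a side note, the raspberrypi.org downloads page publishes a checksum for each image, so it's worth verifying the file you downloaded before writing it to the card (shasum computes SHA-1 by default on OS X):

$ shasum 2014-06-20-wheezy-raspbian.img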

6) At this point, I inserted the micro SD card into the SD slot on the Raspberry Pi, then connected the Pi to a USB power cable, a monitor via an HDMI cable, a USB keyboard and a USB mouse. I was able to boot and change the password for the pi user. The sky is the limit next ;-)




Wednesday, August 20, 2014

Two lessons on haproxy checks and swap space

Let's assume you want to host a Wordpress site which is not going to get a lot of traffic. You want to use EC2 for this. You still want as much fault tolerance as you can get at a decent price, so you create an Elastic Load Balancer endpoint which points to 2 (smallish) EC2 instances running haproxy, with each haproxy instance pointing in turn to 2 (not-so-smallish) EC2 instances running Wordpress (Apache + MySQL). 

You choose to run haproxy behind the ELB because it gives you more flexibility in terms of load balancing algorithms, health checks, redirections etc. Within haproxy, one of the Wordpress servers is marked as a backup for the other, so it only gets hit by haproxy when the primary one goes down. On this secondary Wordpress instance you set up MySQL to be a slave of the primary instance's MySQL.

Here are two things (at least) that you need to make sure you have in this scenario:

1) Make sure you specify the httpchk option in haproxy.cfg, otherwise the primary server will not be marked as down even if Apache goes down. So you should have something like:

backend servers-http
  server s1 10.0.1.1:80 weight 1 maxconn 5000 check port 80
  server s2 10.0.1.2:80 backup weight 1 maxconn 5000 check port 80
  option httpchk GET /

2) Make sure you have swap space in case the memory on the Wordpress instances gets exhausted, in which case random processes will be killed by the OOM killer (and one of those processes can be mysqld). By default, there is no swap space when you spin up an Ubuntu EC2 instance. Here's how to set up a 2 GB swapfile:

dd if=/dev/zero of=/swapfile1 bs=1024 count=2097152
mkswap /swapfile1
chmod 0600 /swapfile1
swapon /swapfile1
echo "/swapfile1 swap swap defaults 0 0" >> /etc/fstab

I hope these two things will help you if you're not already doing them ;-)

Friday, August 15, 2014

Managing OpenStack security groups from the command line

I had an issue today where I couldn't connect to a particular OpenStack instance on port 443. I decided to inspect the security group it belongs to (let's call it myapp) from the command line:

# nova secgroup-list-rules myapp
+-------------+-----------+---------+------------+--------------+
| IP Protocol | From Port | To Port | IP Range   | Source Group |
+-------------+-----------+---------+------------+--------------+
| tcp         | 80        | 80      | 0.0.0.0/0  |              |
| tcp         | 443       | 443     | 0.0.0.0/24 |              |
+-------------+-----------+---------+------------+--------------+

Note that the IP range for port 443 is wrong. It should be all IPs and not a /24 network.

I proceeded to delete the wrong rule:

# nova secgroup-delete-rule myapp tcp 443 443 0.0.0.0/24                                                               
+-------------+-----------+---------+------------+--------------+
| IP Protocol | From Port | To Port | IP Range   | Source Group |
+-------------+-----------+---------+------------+--------------+
| tcp         | 443       | 443     | 0.0.0.0/24 |              |
+-------------+-----------+---------+------------+--------------+


Then I added back the correct rule:

 # nova secgroup-add-rule myapp tcp 443 443 0.0.0.0/0                                                                   
+-------------+-----------+---------+-----------+--------------+
| IP Protocol | From Port | To Port | IP Range  | Source Group |
+-------------+-----------+---------+-----------+--------------+
| tcp         | 443       | 443     | 0.0.0.0/0 |              |
+-------------+-----------+---------+-----------+--------------+

Finally, I verified that the rules are now correct:

# nova secgroup-list-rules myapp                                                                                       
+-------------+-----------+---------+-----------+--------------+
| IP Protocol | From Port | To Port | IP Range  | Source Group |
+-------------+-----------+---------+-----------+--------------+
| tcp         | 443       | 443     | 0.0.0.0/0 |              |
| tcp         | 80        | 80      | 0.0.0.0/0 |              |
+-------------+-----------+---------+-----------+--------------+

Of course, the real test was to see if I could now hit port 443 on my instance, and indeed I was able to.

Tuesday, July 22, 2014

Troubleshooting haproxy 502 errors related to malformed/large HTTP headers

We had a situation recently where our web application started to behave strangely. First nginx (which sits in front of the application) started to error out with messages of this type:

upstream sent too big header while reading response header from upstream

A quick Google search revealed that a fix for this is to bump up proxy_buffer_size in nginx.conf, for both http and https traffic, along these lines:

proxy_buffer_size   256k;
proxy_buffers   4 256k;
proxy_busy_buffers_size   256k;

Now nginx was happy when hit directly. However, haproxy was still erroring out with a 502 'bad gateway' return code and a termination state of PH. Here is a snippet from the haproxy log file:

Jul 22 21:27:13 127.0.0.1 haproxy[14317]: 172.16.38.57:53408 [22/Jul/2014:21:27:12.776] www-frontend www-backend/www2:80 1/0/1/-1/898 502 8396 - - PH-- 0/0/0/0/0 0/0 "GET /someurl HTTP/1.1"

Another Google search revealed that PH means that haproxy rejected the header from the backend because it was malformed.

At this point, an investigation into the web app did discover a loop in the code that kept adding elements to a cookie included in the response header.
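
If the large response headers had been legitimate rather than the result of a bug, the other knob to look at would have been haproxy's own response buffer size, controlled by the tune.bufsize (and tune.maxrewrite) global parameters; the values below are only illustrative:

global
  tune.bufsize 65536
  tune.maxrewrite 1024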

Anyway, I leave this here in the hope that somebody will stumble on it and benefit from it.
