Saturday, February 26, 2011

AWS CloudFormation is a provisioning tool, not a config mgmt tool

There's a lot of buzz on Twitter about how the recently announced AWS CloudFormation service spells the death of configuration management tools such as Puppet, Chef, cfengine or bcfg2. I happen to think that the opposite is true.

CloudFormation is a great way to provision what it calls a 'stack' in your EC2 infrastructure. A stack comprises several AWS resources such as EC2 instances, EBS volumes, Elastic Load Balancers, Elastic IPs, RDS databases, etc. Note that it was always possible to do this with your own homegrown tools by calling the APIs offered by these services in concert. What CloudFormation brings to the table is an easy way to describe the relationships between these resources via a JSON file, which it calls a template.
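
To give a flavor of the format, here's a stripped-down template sketch of my own (the AMI ID and instance type are placeholders, not taken from an actual AWS sample) declaring an EC2 instance plus an Elastic IP that references it:

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Hypothetical minimal stack: one EC2 instance plus an Elastic IP",
  "Resources": {
    "WebServer": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": "ami-12345678",
        "InstanceType": "m1.small"
      }
    },
    "WebServerIP": {
      "Type": "AWS::EC2::EIP",
      "Properties": {
        "InstanceId": { "Ref": "WebServer" }
      }
    }
  }
}

The "Ref" is where the relationship between the two resources is expressed -- CloudFormation knows to create the instance first, then attach the Elastic IP to it.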

Some people get tripped up by the inclusion of applications such as WordPress, Joomla or Redmine in the CloudFormation sample templates -- they think that CloudFormation deals with application deployments and configuration management. If you look closely at one of these sample templates, say the Joomla one, you'll see that all that happens is that a pre-baked AMI containing the Joomla installation is used when launching the EC2 instances included in the CloudFormation stack, and the UserData mechanism is used to pass certain values to the instance. They do add a nice feature here: you can reference attributes defined in other parts of the stack template, such as the DB endpoint address in this example:

"UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              ":",
              [
                {
                  "Ref": "JoomlaDBName"
                },
                {
                  "Ref": "JoomlaDBUser"
                },
                {
                  "Ref": "JoomlaDBPwd"
                },
                {
                  "Ref": "JoomlaDBPort"
                },
                {
                  "Fn::GetAtt": [
                    "JoomlaDB",
                    "Endpoint.Address"
                  ]
                },
                {
                  "Ref": "WebServerPort"
                },
                {
                  "Fn::GetAtt": [
                    "ElasticLoadBalancer",
                    "DNSName"
                  ]
                }
              ]
            ]
          }
        },
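
On the instance side, a boot script can then retrieve those colon-separated values from the metadata service and split them apart. A rough sketch (the variable names are mine, not from the sample template):

#!/bin/bash
# The metadata service returns the user data already base64-decoded.
USER_DATA=$(curl -s http://169.254.169.254/latest/user-data)
DB_NAME=$(echo "$USER_DATA" | cut -d: -f1)
DB_USER=$(echo "$USER_DATA" | cut -d: -f2)
DB_HOST=$(echo "$USER_DATA" | cut -d: -f5)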

However, all this was also possible before CloudFormation. You were always able to bake your own AMI containing your own application, and use the UserData mechanism to run whatever you want at instance creation time. Nothing new here. This is NOT configuration management. This will NOT replace the need for a solid deployment and configuration management tool. Why? Because rolling your own AMI results in an opaque 'black box' deployment. You need to document and version your pre-baked AMIs carefully, then develop a mechanism for associating an AMI ID with the list of packages installed on that AMI. If you think about it, you end up writing an asset management tool. And if you then need to deploy a new version of the application, you either bake a new AMI (painful) or reach for a real deployment/config mgmt tool to do it.

The alternative, which I espouse, is to start with a bare-bones AMI (I use the official Ubuntu AMIs provided by Canonical) and employ the UserData mechanism to bootstrap the installation of a configuration management client such as chef-client or the Puppet client. The newly created instance then 'phones home' to your central configuration management server (a Chef server or puppetmaster, for example) and finds out how to configure itself. The beauty of this approach is that the config mgmt server keeps track of the customizations made on the client. No need to document them separately -- just use the search functions provided by the config mgmt tool to find out which packages and applications have been installed on the client.
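
As a sketch of what this bootstrapping can look like with Puppet: you pass a script along these lines as UserData, and cloud-init on the Ubuntu AMIs executes it at first boot (the puppetmaster hostname is a placeholder for your own):

#!/bin/bash
# Install the Puppet client, then phone home to the puppetmaster,
# waiting up to 60 seconds for the certificate to be signed.
apt-get update
apt-get install -y puppet
puppet agent --server puppet.example.com --waitforcert 60 --onetime --verbose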

The bare-bones AMI + config mgmt approach does mean that EC2 instances take longer to get fully configured initially (as opposed to the pre-baked AMI technique), but the flexibility and control you gain over those instances are well worth it.

One other argument, which I almost don't need to make, is that the pre-baked AMI technique is very specific to EC2. You will have to reinvent the wheel if you want to deploy your infrastructure to a different cloud provider, or inside your private cloud or datacenter.

So... do continue to hone your skills at fully utilizing a good configuration management tool. It will serve you well, both in EC2 and in other environments.

Tuesday, February 22, 2011

Cheesecake project now on GitHub

I received a feature request for the Cheesecake project last week (thanks Joost Cassee!), so as an experiment I also put the code up on GitHub. Hopefully the 'social coding' aspect will kick in and more people will become interested in the project. One can dream.

HAProxy monitoring with Nagios and Munin

HAProxy is one of the most widely used (if not THE most widely used) software load balancing solutions out there. I definitely recommend it if you're looking for a very solid and very fast piece of software for your load balancing needs. I blogged about it before, but here I want to describe ways to monitor it with Nagios (for alerting purposes) and Munin (for resource graphing purposes).

HAProxy Nagios plugin

Near the top of the Google results for 'haproxy nagios plugin' is this message to the haproxy mailing list from Jean-Christophe Toussaint, which contains links to a Nagios plugin he wrote for checking HAProxy. This plugin is what I ended up using. It's a Perl script which needs the Nagios::Plugin CPAN module installed. Once that's done, drop check_haproxy.pl into your Nagios libexec directory, then configure it to check the HAProxy stats with a command line similar to this:

/usr/local/nagios/libexec/check_haproxy.pl -u 'http://your.haproxy.server.ip:8000/haproxy;csv' -U hauser -P hapasswd

This assumes that you have HAProxy configured to output its statistics on port 8000. I have these lines in /etc/haproxy/haproxy.cfg:
# status page.
listen stats 0.0.0.0:8000
    mode http
    stats enable
    stats uri /haproxy
    stats realm HAProxy
    stats auth hauser:hapasswd

Note that the Nagios plugin actually requests the stats in CSV format. The output of the plugin is something like:

HAPROXY OK -  cluster1 (Active: 60/60) cluster2 (Active: 169/169) | t=0.131051s;2;10;0; sess_cluster1=0sessions;;;0;20000 sess_cluster2=78sessions;;;0;20000

It shows the active clusters in your HAProxy configuration (e.g. cluster2), together with the number of backends that are UP out of the total number of backends for that cluster (e.g. 169/169), as well as the number of active sessions for each cluster. If any backend is DOWN, the check status is CRITICAL and you'll get a Nagios alert.
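
To wire the plugin into Nagios, you'd define a command and a service along these lines (the host name and the generic-service template below are assumptions -- adapt them to your setup; also note that semicolons start comments in Nagios config files, so the ';csv' part of the URL is smuggled in via a $USERn$ macro defined in resource.cfg):

# in resource.cfg: $USER8$=;
define command {
    command_name    check_haproxy
    command_line    $USER1$/check_haproxy.pl -u 'http://$HOSTADDRESS$:8000/haproxy$USER8$csv' -U $ARG1$ -P $ARG2$
}

define service {
    use                     generic-service
    host_name               lb1
    service_description     HAProxy backends
    check_command           check_haproxy!hauser!hapasswd
}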

HAProxy Munin plugins

Another Google search, this time for HAProxy and Munin, reveals another message to the haproxy mailing list with links to 4 Munin plugins written by Bart van der Schans:

- haproxy_check_duration: monitors the duration of the health checks per server
- haproxy_errors: monitors the rate of 5xx response headers per backend
- haproxy_sessions: monitors the rate of (TCP) sessions per backend
- haproxy_volume: monitors the bps in and out per backend

I downloaded the plugins, dropped them into /usr/share/munin/plugins, symlinked them into /etc/munin/plugins, and added this stanza to /etc/munin/plugin-conf.d/munin-node:

[haproxy*]
user haproxy
env.socket /var/lib/haproxy/stats.socket
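
For reference, the symlinking step boils down to something like:

cd /etc/munin/plugins
for p in haproxy_check_duration haproxy_errors haproxy_sessions haproxy_volume; do
    ln -s /usr/share/munin/plugins/$p .
done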

However, note that for the plugins to work properly you need two things:

1) Configure HAProxy to use a socket that can be queried for stats. I did this by adding these lines to the global section in my haproxy.cfg file:

chroot /var/lib/haproxy
user haproxy
group haproxy
stats socket /var/lib/haproxy/stats.socket uid 1002 gid 1002

(where in my case 1002 is the uid of the haproxy user, and 1002 the gid of the haproxy group)
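
If you're not sure what those values are on your system, 'id haproxy' will show them (the output below is just an example):

# id haproxy
uid=1002(haproxy) gid=1002(haproxy) groups=1002(haproxy)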

After doing 'service haproxy reload', you can check that the socket stats work as expected by doing something like this (assuming you have socat installed):

echo 'show stat' | socat unix-connect:/var/lib/haproxy/stats.socket stdio

This should output the HAProxy stats in CSV format.
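
The first line of that output is a header listing the field names, which starts with something like:

# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,...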

2) Edit the 4 plugins and change the 'exit 1' statement at the top of each plugin to 'exit 0', so that the dispatch block looks like this:

if ( $ARGV[0] eq "autoconf" ) {
    print_autoconf();
    exit 0;
} elsif ( $ARGV[0] eq "config" ) {
    print_config();
    exit 0;
} elsif ( $ARGV[0] eq "dump" ) {
    dump_stats();
    exit 0;
} else {
    print_values();
    exit 0;
}

If you don't do this, the plugins will exit with code 1 even in the case of success, which munin-node interprets as an error. Consequently, you will scratch your head wondering why no haproxy-related links and graphs are showing up on your Munin stats page.

Once you do all this, do 'service munin-node reload' on the node running the HAProxy Munin plugins, then check that the plugins are working as expected by cd-ing into the /etc/munin/plugins directory and running each plugin through the 'munin-run' utility. For example:

# munin-run haproxy_sessions 
cluster2.value 146761052
cluster1.value 0

That's it. These plugins make it fairly easy for you to get more peace of mind and a better sleep at night. Although it's well known that in #devops we don't sleep that much anyway...
