Showing posts from 2010

Using libcloud to manage instances across multiple cloud providers

More and more organizations are moving to ‘the cloud’ these days. In most cases, using ‘the cloud’ means buying compute and storage capacity from a public cloud vendor such as Amazon, Rackspace, GoGrid, Linode, etc. I believe that the next step in cloud usage will be deploying instances across multiple cloud providers, mainly for high availability, but also for performance reasons (for example if a specific provider has a presence in a geographical region closer to your user base).

All cloud vendors offer APIs for accessing their services -- if they don’t, they’re not a genuine cloud vendor in my book at least. The onus is on you as a system administrator to learn how to use these APIs, which can vary wildly from one provider to another. Enter libcloud, a Python-based package that offers a unified interface to various cloud provider APIs. The list of supported vendors is impressive, and more are added all the time. Libcloud was started by Cloudkick but has since migrated to the Apache …

A Fabric script for striping EBS volumes

Here's a short Fabric script which might be useful to people who need to stripe EBS volumes in Amazon EC2. Striping is recommended if you want to improve the I/O of your EBS-based volumes. However, striping won't help if one of the member EBS volumes goes AWOL or suffers performance issues. In any case, here's the Fabric script:

import commands from fabric.api import * # Globals env.project='EBSSTRIPING' env.user = 'myuser' DEVICES = [ "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg", ] VOL_SIZE = 400 # GB # Tasks def install(): install_packages() create_raid0() create_lvm() mkfs_mount_lvm() def install_packages(): run('DEBIAN_FRONTEND=noninteractive apt-get -y install mdadm') run('apt-get -y install lvm2') run('modprobe dm-mod') def create_raid0(): cmd = 'mdadm --create /dev/md0 --level=0 --chunk=256 --raid-devices=4 ' for de…

Automated deployments with LittleChef

See my post on the Sysadvent blog.

Working with Chef attributes

It took me a while to really get how to use Chef attributes. It's fairly easy to understand what they are and where they are referenced in recipes, but it's not so clear from the documentation how and where to override them. Here's a quick note to clarify this.

A Chef attribute can be seen as a variable that:

1) gets initialized to a default value in cookbooks/mycookbook/attributes/default.rb


default[:mycookbook][:swapfilesize] = '10485760'
default[:mycookbook][:tornado_version] = '1.1'
default[:mycookbook][:haproxy_version] = '1.4.8'
default[:mycookbook][:nginx_version] = '0.8.20'

2) gets used in cookbook recipes such as cookbooks.mycookbook/recipes/default.rb or any other myrecipefile.rb in the recipes directory; the syntax for using the attribute's value is of the form #{node[:mycookbook][:attribute_name]}

Example of using the haproxy_version attribute in a recipe called haproxy.rb:

# install haproxy from source
haproxy = &q…

How to whip your infrastructure into shape

It's easy, just follow these steps:

Step 0. If you're fortunate enough to participate in the design of your infrastructure (as opposed to being thrown at the deep end and having to maintain some 'legacy' one), then try to aim for horizontal scalability. It's easier to scale out than to scale up, and failures in this mode will hopefully impact a smaller percentage of your users.

Step 1. Configure a good monitoring and alerting system

This is the single most important thing you need to do for your infrastructure. It's also a great way to learn a new infrastructure that you need to maintain.

I talked about different types of monitoring in another blog post. My preferred approach is to have 2 monitoring systems in place:
an internal monitoring system which I use to check the health of individual servers/devicesan external monitoring system used to check the behavior of the application/web site as a regular user would.
My preferred internal monitoring/alerting system i…

MySQL load balancing with HAProxy

In an earlier blog post I was advising people to use HAProxy 1.4 and above if they need MySQL load balancing with health checks. It turns out that I didn't have much luck with that solution either. HAProxy shines when it load balances HTTP traffic, and its health checks are really meant to be run over HTTP and not plain TCP. So the solution I found was to have a small HTTP Web service (which I wrote using tornado) listening on a configurable port on each  MySQL node.

For the health check, the Web service connects via MySQLdb to the MySQL instance running on a given port and issues a 'show databases' command. For more in-depth checking you can obviously run fancier SQL statements.

The code for my small tornado server is here. The default port it listens on is 31337.

Now on the HAProxy side I have a "listen" section for each collection of MySQL nodes that I want to load balance. Example:
listen mysql-m0 mode tcp option httpchk GET /mysqlchk/?port=3…

Introducing project "Overmind"

Overmind is the brainchild of Miquel Torres. In its current version, released today, Overmind is what is sometimes called a 'controller fabric' for managing cloud instances, based on libcloud. However, Miquel's Roadmap for the project is very ambitious, and includes things like automated configuration management and monitoring for the instances launched and managed via Overmind.

A little bit of history: Miquel contacted me via email in late July because he read my blog post on "Automated deployment systems: push vs. pull" and he was interested in collaborating on a queue-based deployment/config management system. The first step in such a system is to actually deploy the instances you need configured. Hence the need for something like Overmind.

I'm sure you're asking yourself -- why do these guys wanted to roll their own system? Why not use something like OpenStack? Note in late July OpenStack had only just been announced, and to this day (mid-October 2010…

Getting detailed I/O stats with Munin

Ever since Vladimir Vuksan pointed me to his Ganglia script for getting detailed disk stats, I've been looking for something similar for Munin. The iostat and iostat_ios Munin plugins, which are enabled by default when you install Munin, do show disk stats across all devices detected on the system. I wanted more in-depth stats per device though. In my case, the devices I'm interested in are actually Amazon EBS volumes mounted on my database servers.

I finally figured out how to achieve this, using the diskstat_ Munin plugin which gets installed by default when you install munin-node.

If you run

/usr/share/munin/plugins/diskstat_ suggest

you will see the various symlinks you can create for the devices available on your server.

In my case, I have 2 EBS volumes on each of my database servers, mounted as /dev/sdm and /dev/sdn. I created the following symlinks for /dev/sdm (and similar for /dev/sdn):

ln -snf /usr/share/munin/plugins/diskstat_ /etc/munin/plugins/diskstat_latency_sdm
ln -sn…

Quick note on installing and configuring Ganglia

I decided to give Ganglia a try to see if I like its metric visualizations and its plugins better than Munin's. I am still in the very early stages of evaluating it. However, I already banged my head against the wall trying to understand how to configure it properly. Here are some quick notes:

1) You can split your servers into clusters for ease of metric aggregation.

2) Each node in a cluster needs to run gmond. In Ubuntu, you can do 'apt-get install ganglia-monitoring' to install it. The config file is in /etc/ganglia/gmond.conf. More on the config file in a minute.

3) Each node in a cluster can send its metrics to a designated node via UDP.

4) One server in your infrastructure can be configured as both the overall metric collection server, and as the web front-end. This server needs to run gmetad, which in Ubuntu can be installed via 'apt-get install gmetad'. Its config file is /etc/gmetad.conf.

Note that you can have a tree of gmetad nodes, with the root of the…

Managing Rackspace CloudFiles with python-cloudfiles

I've started to use Rackspace CloudFiles as an alternate storage for database backups. I have the backups now on various EBS volumes in Amazon EC2, AND in CloudFiles, so that should be good enough for Disaster Recovery purposes, one would hope ;-)

I found the documentation for the python-cloudfiles package a bit lacking, so here's a quick post that walks through the common scenarios you encounter when managing CloudFiles containers and objects. I am not interested in the CDN aspect of CloudFiles for my purposes, so for that you'll need to dig on your own.

A CloudFiles container is similar to an Amazon S3 bucket, with one important difference: a container name cannot contain slashes, so you won't be able to mimic a file system hierarchy in CloudFiles the way you can do it in S3. A CloudFiles container, similar to an S3 bucket, contains objects -- which for CloudFiles have a max. size of 5 GB. So the CloudFiles storage landscape consists of 2 levels: a first level of con…

MySQL InnoDB hot backups and restores with Percona XtraBackup

I blogged a while ago about MySQL fault-tolerance and disaster recovery techniques. At that time I was experimenting with the non-free InnoDB Hot Backup product. In the mean time I discovered Percona's XtraBackup (thanks Robin!). Here's how I tested XtraBackup for doing a hot backup and a restore of a MySQL database running Percona XtraDB (XtraBackup works with vanilla InnoDB too).

First of all, I use the following Percona .deb packages on a 64-bit Ubuntu Lucid EC2 instance:

# dpkg -l | grep percona
ii libpercona-xtradb-client-dev 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database development files
ii libpercona-xtradb-client16 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database client library
ii percona-xtradb-client-5.1 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database client binaries
ii percona-xtradb-common 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database common files (e.g. /etc
ii percona-xtradb-server-5.1 …

Poor man's MySQL disaster recovery in EC2 using EBS volumes

First of all, I want to emphasize that this is NOT a disaster recovery strategy I recommend. However, in a pinch, it might save your ass. Here's the scenario I have:

2 m1.large instances running Ubuntu 10.04 64-bit and the Percona XtraDB MySQL builds (for the record, the exact version I'm using is "Server version: 5.1.43-60.jaunty.11-log (Percona SQL Server (GPL), XtraDB 9.1, Revision 60")I'll call the 2 servers db101 and db201each server is running 2 MySQL instances -- I'll call them m1 and m2instance m1 on db101 and instance m1 on db201 are set up in master-master replication (and similar for instance m2)the DATADIR for m1 is /var/lib/mysql/m1 on each server; that file system is mounted from an EBS volume (and similar for m2)the configuration files for m1 are in /etc/mysql1 on each server -- that directory was initially a copy of the Ubuntu /etc/mysql configuration directory, which I then customized (and similar for m2)the init.d script for m1 is in /etc/ini…

Visualizing MySQL metrics with the munin-mysql plugin

Munin is a great tool for resource visualization. Sometimes though installing a 3rd party Munin plugin is not as straightforward as you would like. I have been struggling a bit with one such plugin, munin-mysql, so I thought I'd spell it out for my future reference. My particular scenario is running multiple MySQL instances on various port numbers (3306 and up) on the same machine. I wanted to graph in particular the various InnoDB metrics that munin-mysql supports. I installed the plugin on various Ubuntu flavors such as Jaunty and Lucid.

Here are the steps:

1) Install 2 pre-requisite Perl modules for munin-mysql: IPC-ShareLite and Cache-Cache

2) git clone

3) cd munin-mysql; edit Makefile and point PLUGIN_DIR to the directory where your munin plugins reside (if you installed Munin on Ubuntu via apt-get, that directory is /usr/share/munin/plugins)

4) make install --> this will copy the mysql_ Perl script to PLUGIN_DIR, and the mysql_.conf file to…

MySQL and AppArmor on Ubuntu

This is just a quick post that I hope will save some people some headache when they try to customize their MySQL setup on Ubuntu. I've spent some quality time with this problem over the weekend. I tried in vain for hours to have MySQL read its configuration files from a non-default location on an Ubuntu 9.04 server, only to figure out that it was all AppArmor's fault.

My ultimate goal was to run multiple instances of MySQL on the same host. In the past I achieved this with MySQL Sandbox, but this time I wanted to use MySQL installed from Debian packages and not from a tarball of the binary distribution, and MySQL Sandbox has some issues with that.

Here's what I did: I copied /etc/mysql to /etc/mysql0, then I edited /etc/mysql0/my.cnf and modified the location of the socket file, the pid file and the datadir to non-default locations. Then I tried to run:

/usr/bin/mysqld_safe --defaults-file=/etc/mysql0/my.cnf

At this point, /var/log/daemon.log showed this error:


What automated deployment/config mgmt tools do you use?

I posted this question yesterday as a quick tweet. I got a bunch of answers already that I'll include here, but feel free to add your answers as comments to this post too. Or reply to @griggheo on Twitter.

I started by saying I have 2 favorite tools: Fabric for pushing app state (pure Python) and Chef for pulling/bootstraping OS/package state (pure Ruby). For more discussions on push vs. pull deployment tools, see this post of mine.

Here are the replies I got on Twitter so far:

@keyist : Fabric and Chef for me as well. use Fabric to automate uploading cookbooks+json and run chef-solo on server
@vvuksan : mcollective for control, puppet for config mgmt/OS config. Some reasons why outlined here

@RackerHacker : There is another solution besides ssh and for loops? :-P

@chris_mahan : libcloud, to pop debian stable on cloud instance, fabric to set root passwd, install python2.6.5, apt-get nginx php django fapws.

@alfredodeza : I'm biased since I wrote it, but I use Pac…

Bootstrapping EC2 instances with Chef

This is the third installment of my Chef post series (read the first and the second). This time I'll show how to use the Ubuntu EC2 instance bootstrap mechanism in conjunction with Chef and have the instance configure itself at launch time. I had a similar post last year, in which I was accomplishing a similar thing with puppet.

Why Chef this time, you ask? Although I am a Python guy, I prefer learning a smattering of Ruby rather than a proprietary DSL for configuration management. Also, when I upgraded my EC2 instances to the latest Ubuntu Lucid AMIs, puppet stopped working, so I was almost forced to look into Chef -- and I've liked what I've seen so far. I don't want to bad-mouth puppet though, I recommend you look into both if you need a good configuration management/deployment tool.

Here is a high-level view of the bootstrapping procedure I'm using:

1) You create Chef roles and tie them to cookbooks and recipes that you want executed on machines which will be asso…

Tracking and visualizing mail logs with MongoDB and gviz_api

To me, nothing beats a nice dashboard for keeping track of how your infrastructure and your application are doing. At Evite, sending mail is a core part of our business. One thing we need to ensure is that our mail servers are busily humming away, sending mail out to our users. To this end, I built a quick outgoing email tracking tool using MongoDB and pymongo, and I also put together a dashboard visualization of that data using the Google Visualization API via the gviz_api Python module.

Tracking outgoing email from the mail logs with pymongo

Mail logs are sent to a centralized syslog. I have a simple Python script that tails the common mail log file every 5 minutes, counts the lines that conform to a specific regular expression (looking for a specific msgid pattern), then inserts that count into a MongoDB database. Here's the snippet of code that does that:

import datetime
from pymongo import Connection

conn = Connection(host="")
db = conn.logs
maillogs = db…