Tuesday, December 16, 2008

Some issues when restoring files using duplicity

I blogged a while back about how to do incremental encrypted backups to S3 using duplicity. I've been testing the restore procedure for some of my S3 backups, and I had a problem with the way duplicity deals with temporary directories and files it creates during the restore.

By default, duplicity will use the system default temporary directory, which on Unix is usually /tmp. If you have insufficient disk space in /tmp for the files you're trying to restore from S3, the restore operation will eventually fail with "IOError: [Errno 28] No space left on device".

One thing you can do is create a directory on a partition with plenty of disk space, and point duplicity at it with the --tempdir command-line option. Something like: /usr/local/bin/duplicity --tempdir=/lotsofspace/temp
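
For example, a full restore command with the temporary directory override might look something like this (the bucket name and local paths are placeholders; duplicity picks up the AWS credentials and the GPG passphrase from the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and PASSPHRASE environment variables):

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export PASSPHRASE=your_gpg_passphrase
/usr/local/bin/duplicity --tempdir=/lotsofspace/temp s3+http://mybucket/mybackups /lotsofspace/restored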

However, it turns out that this is not sufficient. There's still a call to os.tmpfile() buried in the patchdir.py module installed by duplicity. Consequently, duplicity will still try to create temporary files in /tmp, and the restore operation will still fail. As a workaround, I solved the issue in a brute-force kind of way by editing /usr/local/lib/python2.5/site-packages/duplicity/patchdir.py (the path is obviously dependent on your Python installation directory) and replacing the line:

tempfp = os.tmpfile()

with the line:

tempfp, filename = tempdir.default().mkstemp_file()

(I also needed to import tempdir at the top of patchdir.py; tempdir is duplicity's own module for temporary file and directory management -- I guess the author of duplicity simply forgot to replace this call to os.tmpfile() with the proper tempdir methods such as mkstemp_file.)
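
For reference, after the edit the relevant pieces of patchdir.py look roughly like this (the exact location varies with the duplicity version, and in some versions the import may need to be written as 'from duplicity import tempdir'):

import tempdir   # duplicity's own temporary file/directory management module
...
# was: tempfp = os.tmpfile()
tempfp, filename = tempdir.default().mkstemp_file()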

This solved the issue. I'll try to file a bug report with the duplicity author.

Friday, December 12, 2008

Working with Amazon EC2 regions

Now that Amazon offers EC2 instances based in data centers in Europe, there is one more variable that you need to take into account when using the EC2 API: the concept of 'region'. Right now there are 2 regions to choose from: us-east-1 (based of course in the US on the East Coast), and the new region eu-west-1 based in Western Europe. Knowing Amazon, they will probably launch data centers in other regions across the globe -- Asia, South America, etc.

Each region has several availability zones. You can see the current ones in this nice article from the AWS Developer Zone. The default region is us-east-1, with 3 availability zones (us-east-1a, 1b and 1c). If you don't specify a region when you call an EC2 API tool, the tool will query the default region. That's why I was baffled when I tried to launch a new AMI in Europe: I was calling 'ec2-describe-availability-zones' and it was returning only the US zones. After reading the article I mentioned, I realized I needed 2 versions of my scripts: the old one deals with the default US-based region, and the new one deals with the Europe region by adding '--region eu-west-1' to all EC2 API calls (you need the latest version of the EC2 API tools from here).

You can list the zones available in a given region by running:

# ec2-describe-availability-zones --region eu-west-1
AVAILABILITYZONE eu-west-1a available eu-west-1
AVAILABILITYZONE eu-west-1b available eu-west-1

Note that all AWS resources that you manage belong to a given region. So if you want to launch an AMI in Europe, you have to create a keypair in Europe, a security group in Europe, find available AMIs in Europe, and launch a given AMI in Europe. As I said, all this is accomplished by adding '--region eu-west-1' to all EC2 API calls in your scripts.
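
For example, the Europe-specific versions of the usual calls look something like this (the keypair, security group and AMI names below are made up; the important part is the '--region eu-west-1'):

# ec2-describe-images -o amazon --region eu-west-1
# ec2-add-keypair mysite-eu.keypair --region eu-west-1
# ec2-add-group mysite-eu -d "Security group for the EU instances" --region eu-west-1
# ec2-run-instances ami-xxxxxxxx -k mysite-eu.keypair --instance-type m1.small -z eu-west-1a --region eu-west-1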

Another thing to note is that the regions are separated in terms of internal DNS too. While you can reach instances within the same region by their internal DNS names, this doesn't work across regions. You need to use the external DNS name of an instance in Europe if you want to ssh into it from an instance in the US (and you also need to allow the external IP of the US instance to access port 22 in the security group of the European instance.)
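
As an illustration (the security group name and IP address are placeholders), opening port 22 on the European security group to a single US instance's external IP looks like this:

# ec2-authorize mysite-eu -P tcp -p 22 -s 1.2.3.4/32 --region eu-west-1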

All this introduces more headaches from a management/automation point of view, but the benefits obviously outweigh the cost. You get low latency for your European customers, and you get more disaster recovery options.

Thursday, December 11, 2008

Deploying EC2 instances from the command line

I've been doing a lot of work with EC2 instances lately, and I wrote some simple wrappers on top of the EC2 API tools provided by Amazon. These tools are Java-based, and I intend to rewrite my utility scripts in Python using the boto library, but for now I'm taking the easy way out by using what Amazon already provides.

After downloading and unpacking the EC2 API tools, you need to set the following environment variables in your .bash_profile file:
export EC2_HOME=/path/to/where/you/unpacked/the/tools/api
export EC2_PRIVATE_KEY=/path/to/pem/file/containing/your/ec2/private/key
export EC2_CERT=/path/to/pem/file/containing/your/ec2/cert
You also need to add $EC2_HOME/bin to your PATH, so the command-line tools can be found by your scripts.
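
The PATH addition is just one more line in .bash_profile; since the tools are Java-based, JAVA_HOME also needs to point to a Java installation (the JAVA_HOME path below is just an example):

export JAVA_HOME=/usr/java/default
export PATH=$PATH:$EC2_HOME/bin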

At this point, you should be ready to run for example:
# ec2-describe-images -o amazon
which lists the AMIs available from Amazon.

If you manage more than a handful of EC2 instances, it quickly becomes hard to keep track of them. When you look at them, for example using the Firefox Elasticfox extension, it's very hard to tell which is which. One solution I found is to create a separate keypair for each instance, and give the keypair a name that specifies the purpose of that instance (for example mysite-db01). This way, you can eyeball the list of instances in Elasticfox and make sense of them.

So the very first step for me in launching and deploying a new AMI is to create a new keypair, using the ec2-add-keypair API call. Here's what I have, in a script called create_keypair.sh:
# cat create_keypair.sh
#!/bin/bash

KEYNAME=$1

if [ -z "$KEYNAME" ]
then
    echo "You must specify a key name"
    exit 1
fi

ec2-add-keypair $KEYNAME.keypair > ~/.ssh/$KEYNAME.pem
chmod 600 ~/.ssh/$KEYNAME.pem

Now I have a pem file called $KEYNAME.pem containing my private key, and Amazon has my public key called $KEYNAME.keypair.
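
Usage is simply (the key name here is just an example):

# ./create_keypair.sh mysite-db01

This leaves the private key in ~/.ssh/mysite-db01.pem and registers a keypair named mysite-db01.keypair with Amazon.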

The next step is to launch an 'm1.small' instance (the smallest instance type EC2 offers) from an AMI whose ID I know in advance (a 32-bit Fedora Core 8 image from Amazon, with an AMI ID of ami-5647a33f), using the key I just created. My script calls the ec2-run-instances API.
# cat launch_ami_small.sh
#!/bin/bash

KEYNAME=$1

if [ -z "$KEYNAME" ]
then
    echo "You must specify a key name"
    exit 1
fi

# We launch a Fedora Core 8 32-bit AMI from Amazon
ec2-run-instances ami-5647a33f -k $KEYNAME.keypair --instance-type m1.small -z us-east-1a

Note that the script makes some assumptions -- for example, that I want the instance to run in the us-east-1a availability zone. You can obviously add command-line parameters for the availability zone and for the instance type (which I intend to do when I rewrite this in Python).
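
To launch an instance with the keypair created earlier (again, the key name is just an example):

# ./launch_ami_small.sh mysite-db01

The output of ec2-run-instances includes an INSTANCE line containing the new instance ID (i-xxxxxxxx), which you'll need in the following steps.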

Next, I create an EBS volume which I will attach to the AMI I just launched. My create_volume.sh script takes an optional argument which specifies the size in GB of the volume (and otherwise sets it to 50 GB):
# cat create_volume.sh
#!/bin/bash

SIZE=$1
if [ -z "$SIZE" ]
then
    SIZE=50
fi

ec2-create-volume -s $SIZE -z us-east-1a

The volume should be created in the same availability zone as the instance you intend to attach it to -- in my case, us-east-1a.
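
For example, to create a 100 GB volume:

# ./create_volume.sh 100

The VOLUME line in the output contains the new volume ID (vol-xxxxxxxx).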

My next step is to attach the volume to the instance I just launched. For this, I need to specify the instance ID and the volume ID -- both values are returned in the output of the calls to ec2-run-instances and ec2-create-volume respectively.

Here is my script:

# cat attach_volume_to_ami.sh
#!/bin/bash

VOLUME_ID=$1
INSTANCE_ID=$2

if [ -z "$VOLUME_ID" ] || [ -z "$INSTANCE_ID" ]
then
    echo "You must specify a volume ID followed by an instance ID"
    exit 1
fi

ec2-attach-volume $VOLUME_ID -i $INSTANCE_ID -d /dev/sdh

This attaches the volume I just created to the instance I launched and makes it available as /dev/sdh.
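
Putting the two IDs together (both are placeholders here):

# ./attach_volume_to_ami.sh vol-xxxxxxxx i-xxxxxxxx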

The next script I use does a lot of stuff. It connects to the new AMI via ssh and performs a series of commands:
* format the EBS volume /dev/sdh as an ext3 file system
* mount /dev/sdh as /var2, and copy the contents of /var to /var2
* move /var to /var.orig, create new /var
* unmount /var2 and re-mount /dev/sdh as /var
* add an entry to /etc/fstab mounting /dev/sdh as /var, so that the mount happens again upon reboot

Before connecting via ssh to the new AMI, I need to know its internal DNS name or IP address. I use ec2-describe-instances to list all my running instances, then I copy and paste the internal DNS name of my newly launched one (which I can isolate because I know the keypair name it runs with).

Here is the script which formats and mounts the new EBS volume:

# cat format_mount_ebs_as_var_on_ami.sh
#!/bin/bash

AMI=$1
KEYNAME=$2

if [ -z "$AMI" ] || [ -z "$KEYNAME" ]
then
    echo "You must specify an AMI DNS name or IP followed by a keypair name"
    exit 1
fi

CMD='mkdir /var2; mkfs.ext3 /dev/sdh; mount -t ext3 /dev/sdh /var2;
mv /var/* /var2/; mv /var /var.orig; mkdir /var; umount /var2;
echo "/dev/sdh /var ext3 defaults 0 0" >> /etc/fstab; mount /var'

ssh -i ~/.ssh/$KEYNAME.pem root@$AMI "$CMD"

The effect is that /var is now mapped to a persistent EBS volume. So if I install MySQL for example, the /var/lib/mysql directory (where the data resides by default in Fedora/CentOS) will automatically be persistent. All this is done without interactively logging in to the new instance, so it can be easily scripted as part of a larger deployment procedure.
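
A sample invocation, using the instance's internal DNS name and the keypair name from earlier (both placeholders):

# ./format_mount_ebs_as_var_on_ami.sh domU-12-31-39-xx-xx-xx.compute-1.internal mysite-db01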

That's about it for the bare-bones stuff you have to do. I purposely kept my scripts simple, since I use them mostly to remember which EC2 API tools I need to run. They don't do much command-line option parsing or error checking, but they do their job.

If you run scripts similar to what I have, you should have at this point a running AMI with a 50 GB EBS volume mounted as /var. Total running time of all these scripts -- 5 minutes at most.

As soon as I have a nicer Python script which will do all this and more, I'll post it here.
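
In the meantime, here is a rough sketch of what the boto-based version of the launch/attach steps might look like -- not the finished script, just the same sequence the shell scripts above go through, with the same hard-coded AMI ID, zone and volume size (boto reads the AWS credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables):

#!/usr/bin/env python
# Rough sketch: create a keypair, launch an instance, create and attach an EBS volume.
import os
import sys
import time
import boto

keyname = sys.argv[1]                     # e.g. mysite-db01
zone = 'us-east-1a'
ami_id = 'ami-5647a33f'                   # 32-bit Fedora Core 8 AMI from Amazon
volume_size = 50                          # GB

conn = boto.connect_ec2()

# Create the keypair and save the private key (same naming scheme as create_keypair.sh).
kp = conn.create_key_pair(keyname + '.keypair')
pem_path = os.path.expanduser('~/.ssh/%s.pem' % keyname)
f = open(pem_path, 'w')
f.write(kp.material)
f.close()
os.chmod(pem_path, 0600)

# Launch the instance and wait for it to come up.
reservation = conn.run_instances(ami_id, key_name=keyname + '.keypair',
                                 instance_type='m1.small', placement=zone)
instance = reservation.instances[0]
while instance.state != 'running':
    time.sleep(10)
    instance.update()
print 'Launched %s (%s)' % (instance.id, instance.private_dns_name)

# Create the EBS volume in the same zone, wait for it, and attach it as /dev/sdh.
volume = conn.create_volume(volume_size, zone)
while volume.status != 'available':
    time.sleep(5)
    volume.update()
conn.attach_volume(volume.id, instance.id, '/dev/sdh')
print 'Attached %s to %s as /dev/sdh' % (volume.id, instance.id)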

Thursday, December 04, 2008

New job at OpenX

I meant to post this for a while, but haven't had the time, because...well, it's a new job, so I've been quite swamped. I started 2 weeks ago as a system engineer at OpenX, a company based in Pasadena, whose main product is an Open Source ad server. I am part of the 'black ops' team, and my main task for now is to help with deploying and scaling the OpenX Hosted service within Amazon EC2 -- which is just one of several cloud computing providers that OpenX uses (another one is AppNexus for example).

Lots of Python involved in this, lots of automation, lots of testing, so all this makes me really happy :-)

Here is some of the stuff I've been working on, which I intend to post about in more detail as time permits:

* command-line provisioning of EC2 instances
* automating the deployment of the OpenX application and its pre-requisites
* load balancing in EC2 using HAProxy
* monitoring with Hyperic
* working with S3-backed file systems

I'll also start working soon with slack, a system developed at Google for automatic provisioning of files via the interesting concept of 'roles'. It's in the same family as cfengine or puppet, but simpler to use and with a powerful inheritance concept applied to roles.

All in all, it's been a fun and intense 2 weeks :-)
