Tuesday, August 31, 2010

Poor man's MySQL disaster recovery in EC2 using EBS volumes

First of all, I want to emphasize that this is NOT a disaster recovery strategy I recommend. However, in a pinch, it might save your ass. Here's the scenario I have:

  • 2 m1.large instances running Ubuntu 10.04 64-bit and the Percona XtraDB MySQL builds (for the record, the exact version I'm using is "Server version: 5.1.43-60.jaunty.11-log (Percona SQL Server (GPL), XtraDB 9.1, Revision 60")
  • I'll call the 2 servers db101 and db201
  • each server is running 2 MySQL instances -- I'll call them m1 and m2
  • instance m1 on db101 and instance m1 on db201 are set up in master-master replication (and similar for instance m2)
  • the DATADIR for m1 is /var/lib/mysql/m1 on each server; that file system is mounted from an EBS volume (and similar for m2)
  • the configuration files for m1 are in /etc/mysql1 on each server -- that directory was initially a copy of the Ubuntu /etc/mysql configuration directory, which I then customized (and similar for m2)
  • the init.d script for m1 is in /etc/init.d/mysql1 (similar for m2)
What I tested:
  • I took a snapshot of each of the 2 EBS volumes associated with each of the DB servers (4 snapshots in all)
  • I terminated the 2 m1.large instances
  • I launched 2 m1.xlarge instances and installed the same Percona distribution (this was done via a Chef recipe at instance launch time); I'll call the 2 new instances xdb101 and xdb102
  • I pushed the configuration files for m1 and m2, as well as the init.d scripts (this was done via fabric)
  • I created new volumes from the EBS snapshots (note that these volumes can be created in any EC2 availability zone)
  • On xdb101, I attached the 2 volumes created from the EBS snapshots on db101; I specified /dev/sdm and /dev/sdn as the device names (similar on xdb201)
  • On xdb101, I created /var/lib/mysql/m1 and mounted /dev/sdm there; I also created /var/lib/mysql/m2 and mounted /dev/sdn there (similar on xdb201)
  • At this point, the DATADIR directories for both m1 and m2 are populated with 'live files' from the moment when I took the EBS snapshot
  • I made sure syslog-ng accepts UDP traffic from localhost (by default it doesn't); this is because by default in Ubuntu mysql log messages are sent to syslog --> to do this, I ensured that "udp(ip(127.0.0.1) port(514));" appears in the "source s_all" entry in /etc/syslog-ng/syslog-ng.conf
At this point, I started up the first MySQL instance on xdb101 via "/etc/init.d/mysql1 start". This script most likely will show [fail] on the console, because MySQL will not start up normally. If you look in /var/log/syslog, you'll see entries similar to:

Aug 31 18:03:21 xdb101 mysqld: 100831 18:03:21 [Note] Plugin 'FEDERATED' is disabled.
Aug 31 18:03:21 xdb101 mysqld: InnoDB: The InnoDB memory heap is disabled
Aug 31 18:03:21 xdb101 mysqld: InnoDB: Mutexes and rw_locks use GCC atomic builtins
Aug 31 18:03:22 xdb101 mysqld: 100831 18:03:22  InnoDB: highest supported file format is Barracuda.
Aug 31 18:03:23 xdb101 mysqld: InnoDB: The log sequence number in ibdata files does not match
Aug 31 18:03:23 xdb101 mysqld: InnoDB: the log sequence number in the ib_logfiles!
Aug 31 18:03:23 xdb101 mysqld: 100831 18:03:23  InnoDB: Database was not shut down normally!
Aug 31 18:03:23 xdb101 mysqld: InnoDB: Starting crash recovery.

If you wait a bit longer (and if you're lucky), you'll see entries similar to:

Aug 31 18:04:20 xdb101 mysqld: InnoDB: Restoring possible half-written data pages from the doublewrite
Aug 31 18:04:20 xdb101 mysqld: InnoDB: buffer...
Aug 31 18:04:24 xdb101 mysqld: InnoDB: In a MySQL replication slave the last master binlog file
Aug 31 18:04:24 xdb101 mysqld: InnoDB: position 0 15200672, file name mysql-bin.000015
Aug 31 18:04:24 xdb101 mysqld: InnoDB: and relay log file
Aug 31 18:04:24 xdb101 mysqld: InnoDB: position 0 15200817, file name ./mysqld-relay-bin.000042
Aug 31 18:04:24 xdb101 mysqld: InnoDB: Last MySQL binlog file position 0 17490532, file name /var/lib/mysql/m1/mysql-bin.000002
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 InnoDB Plugin 1.0.6-9.1 started; log sequence number 1844705956
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Recovering after a crash using /var/lib/mysql/m1/mysql-bin
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Starting crash recovery...
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Crash recovery finished.

At this point, you can do "/etc/init.d/mysql1 restart" just to make sure that both stopping and starting that instance work as expected. Repeat for instance m2, and also repeat on server xdb201.

So....IF you are lucky and the InnoDB crash recovery process did its job, you should have 2 functional MySQL instances one each of xdb101 and xdb201. I tested this with several pairs of servers and it worked for me every time, but I hasten to say that YMMV, so DO NOT bet on this as your disaster recovery strategy!

At this point I still had to re-establish the master-master replication between m1 on xdb101 and m1 on xdb201 (and similar for m2). 

When I initially set up this replication between the original m1.large servers, I used something like this on both db101 and db201:

CHANGE MASTER TO MASTER_HOST='master1', MASTER_PORT=3306, MASTER_USER='masteruser', MASTER_PASSWORD='xxxxxx';"

The trick for me is that master1 points to db201 in db101's /etc/hosts, and vice-versa.

On the newly created xdb101 and xdb201, there are no entries for master1 in /etc/hosts, so replication is broken. Which is a good thing initially, because you want to have the MySQL instances on each server be brought back up without throwing replication into the mix.

Once I added an entry for master1 in xdb101's /etc/hosts pointing to xdb201, and did the same on xdb201, I did a 'stop slave; start slave; show slave status\G' on the m1 instance on each server. In all cases I tested, one of the slaves was showing everything OK, while the other one was complaining about   not being able to read from the master's log file. This was fairly simply to fix. Let's assume xdb101 is the one complaining. I did the following:
  • on xdb201, I ran 'show master status\G' and noted the file name (for example "mysql-bin.000017") and the file position (for example 106)
  • on xdb101, I ran the following command: "stop slave; change master to master_log_file='mysql-bin.000017', master_log_pos=106; start slave;"
  • not a 'show slave status\G' on xdb101 should show everything back to normal
Some lessons:
  • take periodic snapshots of your EBS volumes (at least 1/day)
  • for a true disaster recovery strategy, use at least mysqldump to dump your DB to disk periodically, or something more advanced such as Percona XtraBackup; I recommend dumping the DB to an EBS volume and taking periodic snapshots of that volume
  • the procedure I detailed above is handy when you want to grow your instance 'vertically' -- for example I went from m1.large to m1.xlarge

Friday, August 20, 2010

Visualizing MySQL metrics with the munin-mysql plugin

Munin is a great tool for resource visualization. Sometimes though installing a 3rd party Munin plugin is not as straightforward as you would like. I have been struggling a bit with one such plugin, munin-mysql, so I thought I'd spell it out for my future reference. My particular scenario is running multiple MySQL instances on various port numbers (3306 and up) on the same machine. I wanted to graph in particular the various InnoDB metrics that munin-mysql supports. I installed the plugin on various Ubuntu flavors such as Jaunty and Lucid.

Here are the steps:

1) Install 2 pre-requisite Perl modules for munin-mysql: IPC-ShareLite and Cache-Cache

2) git clone http://github.com/kjellm/munin-mysql

3) cd munin-mysql; edit Makefile and point PLUGIN_DIR to the directory where your munin plugins reside (if you installed Munin on Ubuntu via apt-get, that directory is /usr/share/munin/plugins)

4) make install --> this will copy the mysql_ Perl script to PLUGIN_DIR, and the mysql_.conf file to /etc/munin/plugin-conf.d

5) Edit /etc/munin/plugin-conf.d/mysql_.conf and customize it with your specific MySQL information.

For example, if you run 2 MySQL instances on ports 3306 and 3307, you could have something like this in mysql_.conf:


[mysql_3306_*]
env.mysqlconnection DBI:mysql:mysql;host=127.0.0.1;port=3306
env.mysqluser myuser1
env.mysqlpassword mypassword1

[mysql_3307_*]
env.mysqlconnection DBI:mysql:mysql;host=127.0.0.1;port=3307
env.mysqluser myuser2
env.mysqlpassword mypassword2

6) Run "/usr/share/munin/plugins/mysql_ suggest" to see what metrics are supported by the plugin. Then proceed to create symlinks in /etc/munin/plugins, adding the port number and the metric name as the suffix.

For example, to track InnoDB I/O metrics for the MySQL instance running on port 3306, you would create this symlink:

ln -s /usr/share/munin/plugins/mysql_ /etc/munin/plugins/mysql_3306_innodb_io

(replace 3306 with 3307 to track this metric for the other MySQL instance running on port 3307)

Of course, it's easy to automate this by a simple shell script.

7) Restart munin-node and wait 10-15 minutes for the munin master to receive the information about the new metrics.

Important! If you need to troubleshoot this plugin (and any Munin plugin), do not make the mistake of simply running the plugin script directly in the shell. If you do this, it will not read the configuration file(s) correctly, and it will most probably fail. Instead, what you need to do is to follow the "Debugging Munin plugins" documentation, and run the plugin through the munin-run utility. For example:


# munin-run mysql_3306_innodb_io
ib_io_read.value 34
ib_io_write.value 57870
ib_io_log.value 8325
ib_io_fsync.value 55476

One more thing: you should probably automate all these above steps. I have most of it automated via a fabric script. The only thing I do by hand is to create the appropriate symlinks for the specific port numbers I have on each server.

That's it! Enjoy staring for hours at your brand new MySQL metrics!


Monday, August 16, 2010

MySQL and AppArmor on Ubuntu

This is just a quick post that I hope will save some people some headache when they try to customize their MySQL setup on Ubuntu. I've spent some quality time with this problem over the weekend. I tried in vain for hours to have MySQL read its configuration files from a non-default location on an Ubuntu 9.04 server, only to figure out that it was all AppArmor's fault.

My ultimate goal was to run multiple instances of MySQL on the same host. In the past I achieved this with MySQL Sandbox, but this time I wanted to use MySQL installed from Debian packages and not from a tarball of the binary distribution, and MySQL Sandbox has some issues with that.

Here's what I did: I copied /etc/mysql to /etc/mysql0, then I edited /etc/mysql0/my.cnf and modified the location of the socket file, the pid file and the datadir to non-default locations. Then I tried to run:

/usr/bin/mysqld_safe --defaults-file=/etc/mysql0/my.cnf

At this point, /var/log/daemon.log showed this error:

mysqld[25133]: Could not open required defaults file: /etc/mysql0/my.cnf
mysqld[25133]: Fatal error in defaults handling. Program aborted

It took me as I said a few hours trying all kinds of crazy things until I noticed lines like these in /var/log/syslog:

kernel: [18593519.090601] type=1503 audit(1281847667.413:22): operation="inode_permission" requested_mask="::r" denied_mask="::r" fsuid=0 name="/etc/mysql0/my.cnf"
 pid=4884 profile="/usr/sbin/mysqld"

This made me realize it's AppArmor preventing mysqld from opening non-default files. I don't need AppArmor on my servers, so I just stopped it with 'service apparmor stop' and chkconfig-ed it off....at which point every customization I had started to work perfectly.

At least 2 lessons:

1) when you see mysterious, hair-pulling errors, check security-related processes on your server: iptables, AppArmor, SELinux etc.

2) check all log files in /var/log -- I was focused on daemon.log and didn't notice the errors in syslog quickly enough

Google didn't help when I searched for "mysqld Could not open required defaults file". I couldn't find any reference to AppArmor, only to file permissions.

Tuesday, August 03, 2010

What automated deployment/config mgmt tools do you use?

I posted this question yesterday as a quick tweet. I got a bunch of answers already that I'll include here, but feel free to add your answers as comments to this post too. Or reply to @griggheo on Twitter.

I started by saying I have 2 favorite tools: Fabric for pushing app state (pure Python) and Chef for pulling/bootstraping OS/package state (pure Ruby). For more discussions on push vs. pull deployment tools, see this post of mine.

Here are the replies I got on Twitter so far:

@keyist : Fabric and Chef for me as well. use Fabric to automate uploading cookbooks+json and run chef-solo on server

@vvuksan : mcollective for control, puppet for config mgmt/OS config. Some reasons why outlined here http://j.mp/cAKarI

@RackerHacker : There is another solution besides ssh and for loops? :-P

@chris_mahan : libcloud, to pop debian stable on cloud instance, fabric to set root passwd, install python2.6.5, apt-get nginx php django fapws.

@alfredodeza : I'm biased since I wrote it, but I use Pacha: http://code.google.com/p/pacha

@tcdavis : Fabric for remote calls; distribute/pip for python packaging and deps; Makefile to eliminate any repetition.

@competentgirl : Puppet for its expandability and integration w/ other tools (ie svn)

@yashh : Fabric. bundle assets, push to bunch of servers and restart in a shot.. love it

@bitprophet : I'm biased but I use Fab scripts for everything. Was turned off by daemons/declarative/etc aspects of Chef/Puppet style systems.

@kumar303 : @bitprophet we use Cap only because we do 2-way communication with remote SSH procs. Been meaning to look at patching Fab for this

Update with more Twitter replies:

@lt_kije : cfengine on work cluster (HPC/HTC), radmind on personal systems (OpenBSD)

@zenmatt : check out devstructure, built on puppet, more natural workflow for configuring servers.

@almadcz : paver for building, fabric or debian repo for deployment, buildbot for forgetting about it.

@geo_kollias: I use fabric for everything as @bitprophet, but i am thinking about making use of @hpk42 's execnet soon.