Thursday, June 28, 2012

A sweep through my Instapaper for June 2012


I'm not sure if I'll do this every month, but it does seem like a good way of recapitulating the last month in terms of interesting blog posts and articles that came my way. So here's my list for the month of June 2012:
  • Latency numbers every programmer should know -- from cache references to intercontinental network latency, some numbers that will help you do those back-of-the-envelope calculations when you need to speed things up in your infrastructure
  • Cynic -- test harness by Ruslan Spivak for simulating remote HTTP service behavior, useful when you want to see how your application reacts to various failures when interacting with 3rd party services
  • Amazon S3 performance tips and tricks -- some best practices for getting the maximum performance out of S3 from Doug Grismore, Director of Storage Operations at AWS
  • How to stop sucking and be awesome instead -- Jeff Atwood advises you to embrace failure, ship often, listen to feedback, and more importantly work on stuff that matters
  • Examining file system latency in production (PDF) -- Brendan Gregg from Joyent goes into more detail than you ever wanted regarding disk-level I/O latency vs file system I/O (dtrace is mentioned obviously)
  • Openstack in real life -- Paul Guth from Cloudscaling describes that elusive animal: a real life deployment of Openstack (a welcome alternative to the usual press-release-driven development of Openstack)
  • Peaches and pecans -- Theo Schlossnagle talks about the balance needed between doing challenging things that you are good at on one hand, and doing things that may seem boring but make you grow in unexpected ways on the other hand
  • What Facebook knows -- they pretty much know everything about you, and they want to use it to make money (but then you knew that already)
  • ACM Turing Centenary celebration -- Dave Pacheco reviews a celebration that gathered some of the brightest minds in Computer Science; great bits and pieces disseminated throughout this post, such as Ken Thompson's disappointment at the complexity of a modern Linux system
  • Embracing risk in career decisions -- it boils down to 'listen to your heart'
  • Flexible indexing in Hadoop via Elephant Twin -- Dmitriy Ryaboy from Twitter talks about a new tool that can be used to create indexes in Hadoop in order to speed up queries (and can also be integrated with Pig)
  • The interesting thing about cutting costs -- just an example of the mind-boggling posts that Simon Wardley consistently comes up with; I highly recommend his blog for those of you interested in long-term business and technology strategy and vision
  • Building websites with science -- another great post from Etsy regarding good and bad ways to do data science, and some caveats regarding A/B testing
  • 100 most influential programming books -- a good list from Stack Overflow; curious if there's anybody who read all of them?
  • Building resilient user experiences -- another gem from Mike Brittain at Etsy on how to offer a good user experience even in the face of backend errors; your mantra should be 'never say no to your customers' money'
  • Nobody ever got fired for using Hadoop on a cluster (PDF) -- interesting point of view from a team at Microsoft Research on how many 'Big Data' datasets can actually fit in (generous amounts of) RAM and how this can impact the architecture of your data analytics infrastructure
  • 9 beliefs of remarkably successful people -- I usually don't like lists of '10 things that will change your life' but this one is pretty good


Monday, June 25, 2012

Installing and using sysbench on Joyent SmartOS

If you read Percona's MySQL Performance blog (and if you run MySQL in production, you should!), then you know that one of their favorite load testing tools is sysbench. As it turns out, it's not trivial to install this tool, especially when you have to install from source, for example on Solaris-based systems such as the Joyent SmartOS machines. Here's what I did to get it to work.

Download source distribution for sysbench

I downloaded the latest version of sysbench (0.4.12) from the Sourceforge download page for the project.

Compile and install sysbench

If you launch a SmartOS machine in the Joyent cloud, you'll find out very quickly that it's lacking tools you've come to take for granted when dealing with Ubuntu or Fedora. In this case, you need to install compilers and linkers such as gcc and gmake. Fortunately, SmartOS has its own package installer called pkgin, so this is not too bad.

To see what packages are available if you know the tool you want to install, you can run the 'pkgin available' command and grep for the tool name:

# pkgin available | grep gcc
gcc-compiler-4.5.2 GNU Compiler Collection 4.5
gcc-runtime-4.5.2 GNU Compiler Collection 4.5 Runtime libs
gcc-tools-0 Subset of binutils needed for GCC


To install gcc, I ran:

# pkgin install gcc-compiler-4.5.2 gcc-runtime-4.5.2 gcc-tools-0

Similarly, I installed gmake and automake:

# pkgin install gmake automake

When I ran ./configure for sysbench, I got hit with errors of the form

../libtool: line 838: X--tag=CC: command not found 
../libtool: line 871: libtool: ignoring unknown tag : command not found 
../libtool: line 838: X--mode=link: command not found
etc

A quick Google search revealed this life-saving blog post which made things work. So first of all I ran ./autogen.sh, got hit with more errors, and edited configure.ac per the blog post -- basically I commented out this line in configure.ac:

#AC_PROG_LIBTOOL

And added this line:

AC_PROG_RANLIB

Now running ./autogen.sh produces no errors.

At this point I was ready to run ./configure again. However, if you want to run sysbench against a MySQL server, you need to specify MySQL header and library files when you run ./configure. This also means that you need to install some MySQL client package in order to satisfy those dependencies. If you install sysbench on a Joyent Percona SmartMachine, those packages are already there. On a plain SmartOS machine, you need to run:

# pkgin install mysql-client-5.0.92

At this point, on a Percona SmartMachine you have MySQL header files in /opt/local/include/mysql and MySQL libraries in /local/lib. On a vanilla SmartOS machine, the MySQL header files are in /opt/local/include/mysql and the MySQL libraries are in /opt/local/lib/mysql. So the configure command line will be as follows.

On a Percona SmartMachine:

# ./configure --with-mysql-includes=/opt/local/include/mysql --with-mysql-libs=/local/lib/

On a vanilla SmartOS machine where you installed mysql-client:

# ./configure --with-mysql-includes=/opt/local/include/mysql --with-mysql-libs=/opt/local/lib/mysql

Now you're ready to run the usual commands:

# make; make install

If everything goes well, the sysbench binary will be in /usr/local/bin. That directory is not in the default PATH on SmartOS, so you need to add it to your PATH environment variable in .bashrc or .bash_profile.
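Concretely, the PATH change looks like the snippet below (a minimal sketch; adjust the startup file name to whichever one your shell actually reads):

```shell
# Add sysbench's install location to the PATH; append this line to
# ~/.bashrc or ~/.bash_profile on the SmartOS machine.
export PATH="$PATH:/usr/local/bin"
```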

On a vanilla SmartOS machine, I also had issues when trying to run the sysbench tool -- I got an error message of the type 'ld.so.1: sysbench: fatal: libmysqlclient.so.18: open failed: No such file or directory'.

To get past this, I had to do two things: 

1) Add "export LD_LIBRARY_PATH=/opt/local/lib/mysql" to .bashrc

2) symlink from the existing shared library file libmysqlclient.so.15 to libmysqlclient.so.18:

# ln -s /opt/local/lib/mysql/libmysqlclient.so.15 /opt/local/lib/mysql/libmysqlclient.so.18

If you've followed along this far, your reward is that you'll finally be able to run sysbench with no errors.

Running sysbench

It is recommended that you run sysbench from a remote host against your MySQL server, so that no resources on the server get taken by sysbench itself. I used two phases in my sysbench tests: a prepare phase, where the table I tested against was created by sysbench, and the proper load testing phase. For the prepare phase, I ran:

# sysbench --test=oltp --mysql-host=remotehost --mysql-user=mydbadmin --mysql-db=mydb --mysql-password=mypass --mysql-table-engine=innodb --oltp-table-size=1000000 --oltp-table-name=millionRowsA prepare

where
  • remotehost is the host running MySQL server
  • mydb is a database I created on the MySQL server
  • mydbadmin/mypass are the user name and password for a user which I granted all permissions for on the mydb database (with a MySQL statement like "GRANT ALL ON mydb.* TO 'mydbadmin'@'remoteip' IDENTIFIED BY 'mypass'" where remoteip is the IP address of the host I was running sysbench from)
This command will create a table called millionRowsA with 1 million rows, using InnoDB as the storage engine.
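The grant step in the last bullet can be scripted so it's easy to replay; this is a sketch using the names from the post (mydb, mydbadmin, mypass), where 'remoteip' stands for the IP of the host you'll run sysbench from:

```shell
# Write the grant statement to a file so it can be replayed on the
# MySQL server; substitute your own database, user, password and IP.
cat > grant.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS mydb;
GRANT ALL ON mydb.* TO 'mydbadmin'@'remoteip' IDENTIFIED BY 'mypass';
FLUSH PRIVILEGES;
SQL

# then, on the MySQL server:
# mysql -u root -p < grant.sql
```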

To perform a load test against this table, I ran:

# sysbench --test=oltp --mysql-host=remotehost --mysql-user=mydbadmin --mysql-db=mydb --mysql-password=mypass --mysql-table-engine=innodb --oltp-table-size=1000000 --oltp-table-name=millionRowsA --num-threads=16 run

This will run an OLTP-type test using 16 threads. Per the sysbench documentation, an OLTP-type test will perform advanced transactional operations against the test database, thus mimicking real-life scenarios to the best of its ability.
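Since you'll typically want to repeat the run at several concurrency levels (I used 16 and 32 threads in my own tests), it helps to wrap the invocation in a small function. This sketch echoes the command instead of executing it, so a dry run shows exactly what would be launched; drop the leading 'echo' to run it for real:

```shell
# Wrap the sysbench OLTP run so it can be repeated at several thread
# counts against the table created in the prepare phase.
run_oltp() {
  threads=$1
  echo sysbench --test=oltp --mysql-host=remotehost --mysql-user=mydbadmin \
    --mysql-db=mydb --mysql-password=mypass --mysql-table-engine=innodb \
    --oltp-table-size=1000000 --oltp-table-name=millionRowsA \
    --num-threads=$threads run
}

for t in 8 16 32; do
  run_oltp $t
done
```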

I would like to stress one thing at this point: I am not a big believer in benchmarks. Most of the time I find that they do not even remotely manage to model real-life scenarios that you see in production. In fact, there is nothing like production traffic to stress-test a component of your infrastructure, which is why techniques such as dark launching are so important. But benchmarks do give you at least a starting point for a conversation with your peers or your vendors about specific issues you might find. However, it's important to consider them starting points and not end points. Ovais Tariq from Percona agrees with me in a recent blog post on 'DROP TABLE and stalls':

I would also like to point out one thing about benchmarks – we have been always advising people to look beyond average performance numbers because they almost never really matter in production, it is not a question if average performance is bad but what stalls and pileups you have.

So far, my initial runs of sysbench against a Percona SmartMachine with 16 GB of RAM and against an EC2 m1.xlarge instance running Percona (with RAID0 across ephemeral disks, no EBS) show pretty similar results. No huge advantage either way. (I tried 16 and 32 threads against 1 million row tables and 10 million row tables). One advantage of EC2 is that it's a known ecosystem and I can run Ubuntu. I am working with Joyent on maybe further tuning the Percona SmartMachine to squeeze more performance out of it.

Thursday, June 21, 2012

Using the Joyent Cloud API

Here are some notes I took while doing some initial experiments with provisioning machines in the Joyent Cloud. I used their CloudAPI directly, although in the future I also want to try the libcloud Joyent driver. The promise of the Joyent Cloud 'SmartMachines' is that they are really Solaris zones running on a SmartOS host, and that gives you more performance (especially I/O performance) than regular virtual machines such as the ones offered by most cloud vendors. I have yet to fully verify this performance increase, but it's next on my TODO list.

Installing the Joyent CloudAPI tools


I did the following on an Ubuntu 10.04 server:

  • installed node.js -- I downloaded it in tar.gz format from http://nodejs.org/dist/v0.6.19/node-v0.6.19.tar.gz then I ran the usual './configure; make; make install'
  • installed the Joyent smartdc node package by running 'npm install smartdc -g'
  • created new ssh RSA keypair: id_rsa_joyentapi (private key) and id_rsa_joyentapi.pub (public key)
  • ran the sdc-setup utility, pointing it to the US-EAST-1 region:
# sdc-setup https://us-east-1.api.joyentcloud.com
Username (login): (root) myjoyentusername
Password:
The following keys exist in SmartDataCenter:
   [1] grig
Would you like to use an existing key? (yes) no
SSH public key: (/root/.ssh/id_rsa.pub) /root/.ssh/id_rsa_joyentapi.pub

If you set these environment variables, your life will be easier:
export SDC_CLI_URL=https://us-east-1.api.joyentcloud.com
export SDC_CLI_ACCOUNT=myjoyentusername
export SDC_CLI_KEY_ID=id_rsa_joyentapi
export SDC_CLI_IDENTITY=/root/.ssh/id_rsa_joyentapi


  • added recommended environment variables (above) to .bash_profile, sourced the file
Using the Joyent CloudAPI tools

At this point I was able to use the various 'sdc' commands included in the Joyent CloudAPI toolset. For example, to list the available Joyent datacenters, I used sdc-listdatacenters:

# sdc-listdatacenters
{
 "us-east-1": "https://us-east-1.api.joyentcloud.com",
 "us-west-1": "https://us-west-1.api.joyentcloud.com",
 "us-sw-1": "https://us-sw-1.api.joyentcloud.com"
}


To list the operating system images available for provisioning, I used sdc-listdatasets (the following is just an excerpt of its output):

# sdc-listdatasets
[
 {
"id": "988c2f4e-4314-11e1-8dc3-2bc6d58f4be2",
"urn": "sdc:sdc:centos-5.7:1.2.1",
"name": "centos-5.7",
"os": "linux",
"type": "virtualmachine",
"description": "Centos 5.7 VM 1.2.1",
"default": false,
"requirements": {},
"version": "1.2.1",
"created": "2012-02-14T05:53:49+00:00"
 },
 {
"id": "e4cd7b9e-4330-11e1-81cf-3bb50a972bda",
"urn": "sdc:sdc:centos-6:1.0.1",
"name": "centos-6",
"os": "linux",
"type": "virtualmachine",
"description": "Centos 6 VM 1.0.1",
"default": false,
"requirements": {},
"version": "1.0.1",
"created": "2012-02-15T20:04:18+00:00"
 },
  {
"id": "a9380908-ea0e-11e0-aeee-4ba794c83c33",
"urn": "sdc:sdc:percona:1.0.7",
"name": "percona",
"os": "smartos",
"type": "smartmachine",
"description": "Percona SmartMachine",
"default": false,
"requirements": {},
"version": "1.0.7",
"created": "2012-02-13T19:24:17+00:00"
 },
etc

To list the machine sizes available for provisioning, I used sdc-listpackages (again, this is just an excerpt of its output):

# sdc-listpackages
[
 {
"name": "Large 16GB",
"memory": 16384,
"disk": 491520,
"vcpus": 3,
"swap": 32768,Cloud Analytics API
"default": false
 },
 {
"name": "XL 32GB",
"memory": 32768,
"disk": 778240,
"vcpus": 4,
"swap": 65536,
"default": false
 },
 {
"name": "XXL 48GB",
"memory": 49152,
"disk": 1048576,
"vcpus": 8,
"swap": 98304,
"default": false
 },
 {
"name": "Small 1GB",
"memory": 1024,
"disk": 30720,
"vcpus": 1,
"swap": 2048,
"default": true
 },
etc

Provisioning and terminating machines

To provision a machine, you use sdc-createmachine and pass it the 'urn' field of the dataset (OS) you want, and the package name for the size you want. Example:

# sdc-createmachine --dataset sdc:sdc:percona:1.3.9 --package "Large 16GB"
{
 "id": "7ccc739e-c323-497a-88df-898dc358ea40",
 "name": "a0e7314",
 "type": "smartmachine",
 "state": "provisioning",
 "dataset": "sdc:sdc:percona:1.3.9",
 "ips": [
"A.B.C.D",
"X.Y.Z.W"
 ],
 "memory": 16384,
 "disk": 491520,
 "metadata": {
"credentials": {
  "root": "",
  "admin": "",
  "mysql": ""
}
 },
 "created": "2012-06-07T17:55:29+00:00",
 "updated": "2012-06-07T17:55:30+00:00"
}

The above command provisions a Joyent SmartMachine running the Percona distribution of MySQL in the 'large' size, with 16 GB RAM. Note that the output of the command contains the external IP of the provisioned machine (A.B.C.D) and also its internal IP (X.Y.Z.W). The output also contains the passwords for the root, admin and mysql accounts, in the metadata field.

Here's another example for provisioning a machine running Ubuntu 10.04 in the 'small' size (1 GB RAM). You can also specify a machine name when you provision it:

# sdc-createmachine --dataset sdc:sdc:ubuntu-10.04:1.0.1 --package "Small 1GB" --name ggtest
{
 "id": "dc856044-7895-4a52-bfee-35b404061920",
 "name": "ggtest",
 "type": "virtualmachine",
 "state": "provisioning",
 "dataset": "sdc:sdc:ubuntu-10.04:1.0.1",
 "ips": [
"A1.B1.C1.D1",
"X1.Y1.Z1.W1"
 ],
 "memory": 1024,
 "disk": 30720,
 "metadata": {
"root_authorized_keys": ""
 },
 "created": "2012-06-07T19:28:19+00:00",
 "updated": "2012-06-07T19:28:19+00:00"
}

For an Ubuntu machine, the 'metadata' field contains the list of authorized ssh keys (which I removed from my example above). Also, note that the Ubuntu machine is of type 'virtualmachine' (so a regular KVM virtual instance) as opposed to the Percona Smart Machine, which is of type 'smartmachine' and is actually a Solaris zone within a SmartOS physical host.

To list your provisioned machines, you use sdc-listmachines:

# sdc-listmachines
[
 {
"id": "36b50e4c-88d2-4588-a974-11195fac000b",
"name": "db01",
"type": "smartmachine",
"state": "running",
"dataset": "sdc:sdc:percona:1.3.9",
"ips": [
  "A.B.C.D",
  "X.Y.Z.W"
],
"memory": 16384,
"disk": 491520,
"metadata": {},
"created": "2012-06-04T18:03:18+00:00",
"updated": "2012-06-07T00:39:20+00:00"
 },

  {

    "id": "dc856044-7895-4a52-bfee-35b404061920",
    "name": "ggtest",
    "type": "virtualmachine",
    "state": "running",
    "dataset": "sdc:sdc:ubuntu-10.04:1.0.1",
    "ips": [
      "A1.B1.C1.D1",
      "X1.Y1.Z1.W1"
    ],
    "memory": 1024,
    "disk": 30720,
    "metadata": {
      "root_authorized_keys": ""
    },
    "created": "2012-06-07T19:30:29+00:00",
    "updated": "2012-06-07T19:30:38+00:00"
  },

]

Note that immediately after provisioning a machine, its state (as indicated by the 'state' field in the output of sdc-listmachines) will be 'provisioning'. The state will change to 'running' once the provisioning process is done. At that point you should be able to ssh into the machine using the private key you created when installing the CloudAPI tools.
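Rather than eyeballing the output, you can poll for the state change from a script. This is a sketch that assumes sdc-getmachine (part of the smartdc toolset) and scrapes the JSON with sed -- the 'json' npm tool would be more robust:

```shell
# Extract the "state" field for a machine id from sdc-getmachine output.
machine_state() {
  sdc-getmachine "$1" | sed -n 's/.*"state": *"\([a-z]*\)".*/\1/p'
}

# Block until the machine leaves 'provisioning' and reaches 'running'.
wait_for_running() {
  until [ "$(machine_state "$1")" = "running" ]; do
    echo "still provisioning..."
    sleep 10
  done
  echo "machine $1 is running"
}

# usage:
# wait_for_running 7ccc739e-c323-497a-88df-898dc358ea40
```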

To terminate a machine, you first need to stop it via sdc-stopmachine, then delete it via sdc-deletemachine. Both of these tools take the id of the machine as a parameter. If you try to delete a machine without first stopping it, or without waiting long enough for the machine to reach the 'stopped' state, you will get a message similar to 'Requested transition is not acceptable due to current resource state'.
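The stop/wait/delete dance can be wrapped in one helper; a sketch, again assuming sdc-getmachine is available for checking the state:

```shell
# Stop a machine, wait until it is actually stopped, then delete it.
teardown_machine() {
  id=$1
  sdc-stopmachine "$id"
  # deleting too early fails with "Requested transition is not
  # acceptable due to current resource state", so poll first
  until sdc-getmachine "$id" | grep -q '"state": *"stopped"'; do
    sleep 5
  done
  sdc-deletemachine "$id"
}

# usage:
# teardown_machine dc856044-7895-4a52-bfee-35b404061920
```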

Bootstrapping a machine with user data

In my opinion, a cloud API for provisioning instances/machines is only useful if it offers a bootstrapping mechanism for running user-specified scripts upon the first run. This would enable an integration with configuration management tools such as Chef or Puppet. Fortunately, the Joyent CloudAPI does support this bootstrapping via its Metadata API.
For a quick example of a customized bootstrapping action, I changed the hostname of an Ubuntu machine and also added it to /etc/hostname. This is a toy example. In a real-life situation, you would instead download a script from one of your servers and run it in order to install whatever initial packages you need, then to configure the machine as a Chef or Puppet client, etc. In any case, you need to actually spell out the commands you need the machine to run during its initial provisioning boot process. You do that by defining the metadata 'user-script' variable:

# sdc-createmachine --dataset sdc:sdc:ubuntu-10.04:1.0.1 --package "Small 1GB" --name ggtest2 --metadata user-script='hostname ggtest2; echo ggtest2 > /etc/hostname'
{
 "id": "379c0cad-35bf-462a-b680-fc091c74061f",
 "name": "ggtest2",
 "type": "virtualmachine",
 "state": "provisioning",
 "dataset": "sdc:sdc:ubuntu-10.04:1.0.1",
 "ips": [
"A2.B2.C2.D2",
"X2.Y2.Z2.W2"
 ],
 "memory": 1024,
 "disk": 30720,
 "metadata": {
"user-script": "hostname ggtest2; echo ggtest2 > /etc/hostname",
"root_authorized_keys": ""
 },
 "created": "2012-06-08T23:17:44+00:00",
 "updated": "2012-06-08T23:17:44+00:00"
}

Note that the metadata field now contains the user-script variable that I specified.
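For anything beyond a one-liner, it's cleaner to keep the bootstrap script in a file and feed it to --metadata. A sketch (the hostname 'web01' and the config URL are made up; a real script would fetch and run your own Chef/Puppet setup):

```shell
# Keep the user-script in a file instead of inlining it on the
# command line; everything in it runs on the machine's first boot.
cat > bootstrap.sh <<'EOF'
#!/bin/sh
hostname web01
echo web01 > /etc/hostname
# fetch and run your real configuration script, e.g.:
# curl -s http://config.example.com/bootstrap | sh
EOF

# sdc-createmachine --dataset sdc:sdc:ubuntu-10.04:1.0.1 \
#   --package "Small 1GB" --name web01 \
#   --metadata user-script="$(cat bootstrap.sh)"
```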

Collecting performance metrics with Joyent Cloud Analytics

The Joyent Cloud Analytics API lets you define metrics that you want to query for on your machines in the Joyent cloud. Those metrics are also graphed on the Web UI dashboard as you define them, which is a nice touch. For now there aren't that many such metrics available, but I hope their number will increase.

Joyent uses a specific nomenclature for the Analytics API. Here are some definitions, verbatim from their documentation (CA means Cloud Analytics):

metric is any quantity that can be instrumented using CA. For examples:

  • Disk I/O operations
  • Kernel thread executions
  • TCP connections established
  • MySQL queries
  • HTTP server operations
  • System load average


When you want to actually gather data for a metric, you create an instrumentation. The instrumentation specifies:
  • which metric to collect
  • an optional predicate based on the metric's fields (e.g., only collect data from certain hosts, or data for certain operations)
  • an optional decomposition based on the metric's fields (e.g., break down the results by server hostname)
  • how frequently to aggregate data (e.g., every second, every hour, etc.)
  • how much data to keep (e.g., 10 minutes' worth, 6 months' worth, etc.)
  • other configuration options
To get started with this API, you need to first see what analytics/metrics are available. You do that by calling sdc-describeanalytics (what follows is just a fragment of the output):

# sdc-describeanalytics
 "metrics": [
{
  "module": "cpu",
  "stat": "thread_samples",
  "label": "thread samples",
  "interval": "interval",
  "fields": [
    "zonename",
    "pid",
    "execname",
    "psargs",
    "ppid",
    "pexecname",
    "ppsargs",
    "subsecond"
  ],
  "unit": "samples"
},
{
  "module": "cpu",
  "stat": "thread_executions",
  "label": "thread executions",
  "interval": "interval",
  "fields": [
    "zonename",
    "pid",
    "execname",
    "psargs",
    "ppid",
    "pexecname",
    "ppsargs",
    "leavereason",
    "runtime",
    "subsecond"
  ],
etc

You can create instrumentations either via the Web UI (go to the Analytics tab) or via the command line API.

Here's an example of creating an instrumentation for file system logical operations via the sdc-createinstrumentation API:


# sdc-createinstrumentation -m fs -s logical_ops

{
  "module": "fs",
  "stat": "logical_ops",
  "predicate": {},
  "decomposition": [],
  "value-dimension": 1,
  "value-arity": "scalar",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340228876662,
  "value-scope": "interval",
  "id": "17",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/17/value/raw",
      "name": "value_raw"
    }
  ]
}

To list the instrumentations you have created so far, you use sdc-listinstrumentations:


# sdc-listinstrumentations

[
  {
    "module": "fs",
    "stat": "logical_ops",
    "predicate": {},
    "decomposition": [],
    "value-dimension": 1,
    "value-arity": "scalar",
    "enabled": true,
    "retention-time": 600,
    "idle-max": 3600,
    "transformations": {},
    "nsources": 2,/
    "granularity": 1,
    "persist-data": false,
    "crtime": 1340228876662,
    "value-scope": "interval",
    "id": "17",
    "uris": [
      {
        "uri": "/myjoyentusername/analytics/instrumentations/17/value/raw",
        "name": "value_raw"
      }
    ]
  }
]

To retrieve the actual metrics captured by a given instrumentation, call sdc-getinstrumentation and pass it the instrumentation id:

# sdc-getinstrumentation -v 17
{
  "value": 1248,
  "transformations": {},
  "start_time": 1340229361,
  "duration": 1,
  "end_time": 1340229362,
  "nsources": 2,
  "minreporting": 2,
  "requested_start_time": 1340229361,
  "requested_duration": 1,
  "requested_end_time": 1340229362
}

You can see how this can be easily integrated with something like Graphite in order to keep historical information about these metrics.
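As a sketch of that integration: Graphite's plaintext protocol takes lines of the form "metric value timestamp" on port 2003, so a cron job could poll an instrumentation and pipe it over. The metric name, instrumentation id 17 and the Graphite hostname are made up, and the sed-based JSON parsing is a shortcut (the 'json' npm tool would be more robust):

```shell
# Turn an instrumentation's scalar value into one line of Graphite's
# plaintext protocol: "<metric> <value> <unix timestamp>".
metric_line() {
  value=$(sdc-getinstrumentation -v "$1" | \
    sed -n 's/.*"value": *\([0-9][0-9]*\).*/\1/p')
  echo "joyent.fs.logical_ops $value $(date +%s)"
}

# e.g. run from cron every minute:
# metric_line 17 | nc graphite.example.com 2003
```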

You can dig deeper into a specific metric by decomposing it by different fields, such as the application name. For example, to see filesystem logical operations by application name, you would call:


# sdc-createinstrumentation -m fs -s logical_ops --decomposition execname

{
  "module": "fs",
  "stat": "logical_ops",
  "predicate": {},
  "decomposition": [
    "execname"
  ],
  "value-dimension": 2,
  "value-arity": "discrete-decomposition",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340231734049,
  "value-scope": "interval",
  "id": "18",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/18/value/raw",
      "name": "value_raw"
    }
  ]
}

Now if you retrieve the value for this instrumentation, you see several values in the output, one value per application that performs file system logical operations:



# sdc-getinstrumentation -v 18
{
  "value": {
    "grep": 4,
    "ksh93": 5,
    "cron": 7,
    "gawk": 15,
    "svc.startd": 2,
    "mysqld": 163,
    "nscd": 27,
    "top": 159
  },
  "transformations": {},
  "start_time": 1340231762,
  "duration": 1,
  "end_time": 1340231763,
  "nsources": 2,
  "minreporting": 2,
  "requested_start_time": 1340231762,
  "requested_duration": 1,
  "requested_end_time": 1340231763
}

Another useful technique is to isolate metrics pertaining to a specific host (or 'zonename' in Joyent parlance). For this, you need to specify a predicate that will filter only the host with a specific id (you can see the id of a host when you call sdc-listmachines). Here's an example that captures the CPU wait time for a Percona SmartMachine which I provisioned earlier:


# sdc-createinstrumentation -m cpu -s waittime -p '{"eq": ["zonename","36b50e4c-88d2-4588-a974-11195fac000b"]}'

{
  "module": "cpu",
  "stat": "waittime",
  "predicate": {
    "eq": [
      "zonename",
      "36b50e4c-88d2-4588-a974-11195fac000b"
    ]
  },
  "decomposition": [],
  "value-dimension": 1,
  "value-arity": "scalar",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340232271092,
  "value-scope": "interval",
  "id": "19",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/19/value/raw",
      "name": "value_raw"
    }
  ]
}

You can combine decomposition with predicates. For example, here's how to create an instrumentation for CPU usage time decomposed by CPU mode (user, kernel):

# sdc-createinstrumentation -m cpu -s usage -n cpumode -p '{"eq": ["zonename","36b50e4c-88d2-4588-a974-11195fac000b"]}'
{
  "module": "cpu",
  "stat": "usage",
  "predicate": {
    "eq": [
      "zonename",
      "36b50e4c-88d2-4588-a974-11195fac000b"
    ]
  },
  "decomposition": [
    "cpumode"
  ],
  "value-dimension": 2,
  "value-arity": "discrete-decomposition",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340232361944,
  "value-scope": "point",
  "id": "20",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/20/value/raw",
      "name": "value_raw"
    }
  ]
}

Now when you retrieve the values for this instrumentation, you can see them separated by CPU mode:

# sdc-getinstrumentation -v 20
{
  "value": {
    "kernel": 24,
    "user": 28
  },
  "transformations": {},
  "start_time": 1340232390,
  "duration": 1,
  "end_time": 1340232391,
  "nsources": 2,
  "minreporting": 2,
  "requested_start_time": 1340232390,
  "requested_duration": 1,
  "requested_end_time": 1340232391
}


Finally, here's a MySQL-specific instrumentation that you can create on a machine running MySQL, such as a Percona SmartMachine. This one is for capturing MySQL queries:

# sdc-createinstrumentation -m mysql -s queries -p '{"eq": ["zonename","36b50e4c-88d2-4588-a974-11195fac000b"]}'
{
  "module": "mysql",
  "stat": "queries",
  "predicate": {
    "eq": [
      "zonename",
      "36b50e4c-88d2-4588-a974-11195fac000b"
    ]
  },
  "decomposition": [],
  "value-dimension": 1,
  "value-arity": "scalar",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340232562361,
  "value-scope": "interval",
  "id": "22",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/22/value/raw",
      "name": "value_raw"
    }
  ]
}

Overall, I found the Joyent Cloud API and its associated Analytics API fairly easy to use, once I got past some nomenclature quirks. I also want to mention that the support I got from Joyent was very, very good. Replies to questions regarding some of the topics I discussed here were given promptly and knowledgeably. My next step is gauging the performance of MySQL on a SmartMachine, when compared to a similar-sized instance running in the Amazon EC2 cloud. Stay tuned.