Tuesday, October 13, 2015

Using golang to integrate Pingdom checks into a Cachet status page

Recently I've been looking for a decent open source status page system for our API endpoints. After looking around, I decided to give Cachet a try. Installing Cachet on an Ubuntu 14.04 box is not hard, but not trivial either; you need to spend a bit of time learning how to deploy a PHP/Laravel/Composer system. I used the official Cachet installation doc, as well as this top-notch Digital Ocean tutorial on installing nginx and php5-fpm on Ubuntu.

We already use Pingdom as an external monitoring system for our API endpoints, so it made sense to try integrating our existing Pingdom checks into Cachet. Pingdom support for 3rd party services is very limited though. They do offer a way to call a webhook as part of an alerting policy, but AFAICT it's not possible to customize that call with authentication information, which is what 3rd party APIs usually require.

I ended up rolling my own short and sweet golang program that combines russellcardullo's go-pingdom package with making Cachet API calls (it would be cool if somebody came up with a Cachet golang binding, and not hard, but I am punting on that for now.)

To start with, you need to create a Pingdom Application Key. It's not obvious how this is done. You need to go to my.pingdom.com, click the Sharing icon in the left vertical menu, then click the Pingdom API link, then click the Register Application button. At that point you give your application a name and you get a key. Not sure why they make it so hard.

You also need a Cachet API Token. This is available in the admin user's profile page.

The two main entities in Cachet are Components and Incidents. Components correspond roughly to the Web pages or API endpoints whose status you want to display. Incidents correspond to the results of Pingdom checks for a particular Web page or API endpoint.

I defined some constants in my golang code to make things easier to read:

const PINGDOM_USERNAME="xxxx"
const PINGDOM_PASS="xxxx"
const PINGDOM_API_KEY="xxxx"

const CACHET_API_KEY="xxxx"
const CACHET_INCIDENT_URL="https://status.mycompany.com/api/v1/incidents"
const CACHET_COMPONENT_URL="https://status.mycompany.com/api/v1/components"

const COMPONENT_OPERATIONAL = "1"
const COMPONENT_PERF_ISSUES = "2"
const COMPONENT_PARTIAL_OUTAGE = "3"
const COMPONENT_MAJOR_OUTAGE = "4"

const INCIDENT_SCHEDULED = "0"
const INCIDENT_INVESTIGATING = "1"
const INCIDENT_IDENTIFIED = "2"
const INCIDENT_WATCHING = "3"
const INCIDENT_FIXED = "4"

The component and incident status values above are described in the Cachet documentation.

Here is a code snippet that determines the status of a Cachet component:

func get_cachet_component_status(component_id string) string {
	url := fmt.Sprintf("%s/%s", CACHET_COMPONENT_URL, component_id)
	token := CACHET_API_KEY
	fmt.Printf("Sending request to %s\n", url)
	status_code, json_data := send_http_req_query_string("GET", url, token, nil)
	if status_code != 200 {
		return ""
	}
	component_status := get_nested_item_property(json_data, "data", "status")
	return component_status
}

I use a couple of helper functions that I found useful, swiped from an integration test suite I wrote. For making HTTP calls, I found the grequests package, which is a golang port of the Python requests library, extremely useful. Here is send_http_req_query_string which in this case makes a GET call to the Cachet API and also sets the X-Cachet-Token header for authentication purposes.

func send_http_req_query_string(req_type, url, token string, query_string map[string]string) (int, map[string]interface{}) {
    ro := &grequests.RequestOptions{}
    ro.Headers = map[string]string{"X-Cachet-Token":token}
    if query_string != nil {
        ro.Params = query_string
    }
    var resp *grequests.Response
    return _http_response(req_type, url, ro, resp)
}

This in turn calls _http_response:

func _http_response(req_type, url string, ro *grequests.RequestOptions, resp *grequests.Response) (int, map[string]interface{}) {
	var err error
	switch req_type {
	case "POST":
		resp, err = grequests.Post(url, ro)
	case "PUT":
		resp, err = grequests.Put(url, ro)
	case "PATCH":
		resp, err = grequests.Patch(url, ro)
	case "DELETE":
		resp, err = grequests.Delete(url, ro)
	case "GET":
		resp, err = grequests.Get(url, ro)
	default:
		fmt.Printf("HTTP method %s not recognized\n", req_type)
		return 0, nil
	}
	// Bail out if the request itself failed (DNS error, connection refused, timeout).
	if err != nil {
		fmt.Printf("HTTP %s request to %s failed: %v\n", req_type, url, err)
		return 0, nil
	}

	fmt.Printf("Sent HTTP %s request to: %s\n", req_type, url)
	var json_data map[string]interface{}
	status_code := resp.StatusCode
	// Decode the response body into a generic map.
	err = resp.JSON(&json_data)
	if err != nil {
		fmt.Println("Unable to coerce to JSON", err)
		return 0, nil
	}
	return status_code, json_data
}

One other function used in get_cachet_component_status is get_nested_item_property, which is useful for getting a value out of a nested JSON dictionary such as 'data' below:

{
    "data": {
        "id": 1,
        "name": "API",
        "description": "This is the Cachet API.",
        "link": "",
        "status": 1,
        "order": 0,
        "group_id": 0,
        "created_at": "2015-07-24 14:42:10",
        "updated_at": "2015-07-24 14:42:10",
        "deleted_at": null,
        "status_name": "Operational"
    }
}


Here is the get_nested_item_property function:

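func get_nested_item_property(json_data map[string]interface{}, item_name, item_property string) string {
	// Checked assertion: a missing or non-object item returns "" instead of panicking.
	item, ok := json_data[item_name].(map[string]interface{})
	if !ok || item[item_property] == nil {
		return ""
	}
	// JSON numbers decode to float64, so format the value rather than asserting string.
	return fmt.Sprint(item[item_property])
}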

This is a useful example of golang code which type-asserts a decoded JSON value to a map[string]interface{}, grabs another value out of that map and converts that value to a string.

Here is an example of a PUT call which updates the status of a Cachet component:

func update_cachet_component_status(component_id, status string) (int, map[string]interface{}) {
	url := fmt.Sprintf("%s/%s", CACHET_COMPONENT_URL, component_id)
	token := CACHET_API_KEY
	fmt.Printf("Sending request to %s\n", url)

	payload := map[string]string{
		"id":     component_id,
		"status": status,
	}

	status_code, json_data := send_http_req_json_body("PUT", url, token, payload)
	return status_code, json_data
}

This function calls another helper function useful for POST-ing or PUT-ing JSON payloads to an API endpoint. Here is that function, which also uses the grequests package:

func send_http_req_json_body(req_type, url, token string, json_body interface{}) (int, map[string]interface{}) {
    ro := &grequests.RequestOptions{}
    if json_body != nil {
        ro.JSON = json_body
    }
    ro.Headers = map[string]string{"X-Cachet-Token":token}
    var resp *grequests.Response
    return _http_response(req_type, url, ro, resp)
}

In order to get the details of a given Pingdom check, I used the go-pingdom package:

func get_pingdom_check_details(client *pingdom.Client, check_id int) (string, string, string) {
	check_details, err := client.Checks.Read(check_id)
	if err != nil {
		fmt.Println(err)
		return "", "", ""
	}
	/*
		fmt.Println("ID:", check_details.ID)
		fmt.Println("Name:", check_details.Name)
		fmt.Println("Resolution:", check_details.Resolution)
		fmt.Println("Created:", time.Unix(check_details.Created, 0))
		fmt.Println("Hostname:", check_details.Hostname)
		fmt.Println("Status:", check_details.Status)
		fmt.Println("LastErrorTime:", time.Unix(check_details.LastErrorTime, 0))
		fmt.Println("LastTestTime:", time.Unix(check_details.LastTestTime, 0))
		fmt.Println("LastResponseTime:", check_details.LastResponseTime)
		fmt.Println("Paused:", check_details.Paused)
		fmt.Println("ContactIds:", check_details.ContactIds)
	*/
	hostname := check_details.Hostname
	status := check_details.Status
	message := fmt.Sprintf("LastErrorTime: %v\nLastTestTime: %v\n",
		time.Unix(check_details.LastErrorTime, 0),
		time.Unix(check_details.LastTestTime, 0))
	return hostname, status, message
}

Note that I only return a few of the fields available in the go-pingdom CheckResponse struct (which is defined here).

The main function in my check_pingdom_and_post_cachet_status.go program does the following for a given mapping of Cachet components to Pingdom checks:

  • check status of Cachet component
  • check status of Pingdom check
  • if Pingdom check is down, create a new incident for the given component and set the status of the incident to INCIDENT_IDENTIFIED; also update the given component and set its status to COMPONENT_PARTIAL_OUTAGE
  • if Pingdom check is up
    • if the component status is already COMPONENT_OPERATIONAL, do nothing (we don't want consecutive 'healthy' status updates)
    • otherwise, create a new incident for the given component and set the status of the incident to INCIDENT_FIXED; also update the given component and set its status to COMPONENT_OPERATIONAL

Finally, I built a go binary (via 'go build') out of the 2 files I have: check_pingdom_and_post_cachet_status.go and utils.go.


I scheduled the resulting binary to run out of cron every 5 minutes.
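
The decision logic from the bullet list above can be sketched as a pure function, which keeps it easy to test without touching either API. This is not the code from the gist: the action type and the decide function are hypothetical, the "up" and "down" strings are assumed to be the status values Pingdom reports, and any other status (for example a paused check) is assumed to mean "do nothing".

```go
package main

import "fmt"

// Status values copied from the constants earlier in this post.
const (
	componentOperational   = "1"
	componentPartialOutage = "3"
	incidentIdentified     = "2"
	incidentFixed          = "4"
)

// action (hypothetical) describes what one cron run should do for a component.
type action struct {
	CreateIncident  bool
	IncidentStatus  string
	ComponentStatus string
}

// decide maps a Pingdom check status and the current Cachet component
// status to the incident and component updates described in the list.
func decide(pingdomStatus, componentStatus string) action {
	switch {
	case pingdomStatus == "down":
		// Check is failing: open an incident and mark a partial outage.
		return action{true, incidentIdentified, componentPartialOutage}
	case pingdomStatus == "up" && componentStatus != componentOperational:
		// Check recovered: record the fix and restore the component.
		return action{true, incidentFixed, componentOperational}
	default:
		// Already healthy, or a status this sketch does not handle.
		return action{}
	}
}

func main() {
	fmt.Printf("%+v\n", decide("down", componentOperational))
	fmt.Printf("%+v\n", decide("up", componentPartialOutage))
	fmt.Printf("%+v\n", decide("up", componentOperational))
}
```

Note that with this logic a check that stays down produces a new incident on every run, which matches the list as written.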

For the curious, here is a gist with the code for check_pingdom_and_post_cachet_status.go and utils.go.

