Friday, February 12, 2016

Running a static website with Hugo on Google Cloud Storage

I've played a bit with Hugo, the static web site generator written in golang that has been getting a lot of good press lately. At the suggestion of my colleague Warren Runk, I also experimented with hosting the static files generated by Hugo on Google Cloud Storage (GCS). That way there is no need for launching any instances that would serve those files. You can achieve this by using AWS S3 as well of course.

Notes on GCS setup


You first need to sign up for a Google Cloud Platform (GCP) account. You get a 30-day free trial with a new account. Once you are logged into the Google Cloud console, you need to create a new project. Let's call it my-gcs-hugo-project.

You also need to create a bucket in GCS. If you want to serve your site automatically out of this bucket, you need to give the bucket the same name as your site. Let's assume you call the bucket hugotest.mydomain.com. You will have to verify that you own mydomain.com either by creating a special CNAME in the DNS zone file for mydomain.com pointing to google.com, or by adding a special META tag to the HTML file served at hugotest.mydomain.com (you can achieve the latter by temporarily CNAME-ing hugotest to www.mydomain.com and adding the META tag to the home page for www).

If you need to automate deployments to GCS, it's a good idea to create a GCP Service Account. Click on the 'hamburger' menu in the upper left of the GCP console, then go to Permissions, then Service Accounts. Create a new service account and download its private key in JSON format (the key will be called something like my-gcs-hugo-project-a37b5acd7bc5.json).

Let's say your service account is called my-gcp-service-account1. The account will automatically be assigned an email address similar to my-gcp-service-account1@my-gcs-hugo-project.iam.gserviceaccount.com.
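The naming pattern above can be sketched as a small shell function. This is only an illustration based on the example address in this post; service_account_email is a hypothetical helper, not part of gcloud.

```shell
# service_account_email is a hypothetical helper (not a gcloud command).
# It builds the address pattern shown above from a name and a project ID.
service_account_email() {
    # $1 = service account name, $2 = GCP project ID
    printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Example with the values used in this post:
#   service_account_email my-gcp-service-account1 my-gcs-hugo-project
```

This is the same address gcloud reports as the active account once the service account is activated.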

I wanted to be able to deploy the static files generated by Hugo to GCS using Jenkins. So I followed these steps on the Jenkins server as the user running the Jenkins process (user jenkins in my case):

1) Installed the Google Cloud SDK



$ wget https://dl.google.com/dl/cloudsdk/channels/rapid/google-cloud-sdk.tar.gz
$ tar xvfz google-cloud-sdk.tar.gz
$ cd google-cloud-sdk/
$ ./install.sh

$ source ~/.bashrc

$ which gcloud
/var/lib/jenkins/google-cloud-sdk/bin/gcloud


2) Copied the service account's private key my-gcs-hugo-project-a37b5acd7bc5.json to the .ssh directory of the jenkins user.

3) Activated the service account using the gcloud command-line utility (still as user jenkins)

$ gcloud auth activate-service-account --key-file .ssh/my-gcs-hugo-project-a37b5acd7bc5.json
Activated service account credentials for: [my-gcp-service-account1@my-gcs-hugo-project.iam.gserviceaccount.com]

4) Set the current GCP project to my-gcs-hugo-project


$ gcloud config set project my-gcs-hugo-project

$ gcloud config list
Your active configuration is: [default]

[core]
account = my-gcp-service-account1@my-gcs-hugo-project.iam.gserviceaccount.com
disable_usage_reporting = True
project = my-gcs-hugo-project

5) Configured GCS via the gsutil command-line utility (this may actually be redundant since we already configured the project with gcloud, but I leave it here in case you encounter issues with using just gcloud)

$ gsutil config -e
It looks like you are trying to run "/var/lib/jenkins/google-cloud-sdk/bin/bootstrapping/gsutil.py config".
The "config" command is no longer needed with the Cloud SDK.
To authenticate, run: gcloud auth login
Really run this command? (y/N) y
Backing up existing config file "/var/lib/jenkins/.boto" to "/var/lib/jenkins/.boto.bak"...
This command will create a boto config file at /var/lib/jenkins/.boto
containing your credentials, based on your responses to the following
questions.
What is the full path to your private key file? /var/lib/jenkins/.ssh/my-gcs-hugo-project-a37b5acd7bc5.json

Please navigate your browser to https://cloud.google.com/console#/project,
then find the project you will use, and copy the Project ID string from the
second column. Older projects do not have Project ID strings. For such projects,
click the project and then copy the Project Number listed under that project.

What is your project-id? my-gcs-hugo-project

Boto config file "/var/lib/jenkins/.boto" created. If you need to use
a proxy to access the Internet please see the instructions in that
file.

6) Added the service account created above as an Owner for the bucket hugotest.mydomain.com

7) Copied a test file from the local file system of the Jenkins server to the bucket hugotest.mydomain.com (still logged in as user jenkins), then listed all files in the bucket, then removed the test file

$ gsutil cp test.go gs://hugotest.mydomain.com/
Copying file://test.go [Content-Type=application/octet-stream]...
Uploading   gs://hugotest.mydomain.com/test.go:             951 B/951 B

$ gsutil ls gs://hugotest.mydomain.com/
gs://hugotest.mydomain.com/test.go

$ gsutil rm gs://hugotest.mydomain.com/test.go
Removing gs://hugotest.mydomain.com/test.go...

8) Created a Jenkins job for uploading all static files for a given website to GCS

Assuming all these static files are checked in to GitHub, the Jenkins job will first check them out, then do something like this (where TARGET is the value selected from a Jenkins multiple-choice dropdown for this job):

BUCKETNAME=$TARGET

# upload all files and disable caching (for testing purposes)
gsutil -h "Cache-Control:private" cp -r * gs://$BUCKETNAME/

# set read permissions for allUsers
# (assumes file names without whitespace)
for file in $(find . -type f); do
    # remove first dot from file name
    file=${file#"."}
    gsutil acl ch -u allUsers:R "gs://${BUCKETNAME}${file}"
done

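The first gsutil command does a recursive copy (cp -r *) of all files to the bucket. This will preserve the directory structure of the website. For testing purposes, the gsutil command also sets the Cache-Control header on all files to private, which keeps shared caches (proxies and Google's edge caches) from storing the files. Browsers may still cache them, so use a value such as no-cache if you need every request to be revalidated.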

The second gsutil command is executed once for each file that was uploaded, and it sets the ACL on the corresponding object so that the object has Read (R) permissions for allUsers (by default only owners and other specifically assigned users have Read permissions). This is because we want to serve a public website out of our GCS bucket.
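The path handling in that loop can be looked at in isolation. The sketch below uses a hypothetical helper, gcs_url_for, that turns a path printed by find into the URL of the matching object; it only builds the string and does not call gsutil.

```shell
# gcs_url_for is a hypothetical helper mirroring the loop in the Jenkins job:
# it maps a path printed by `find .` to the URL of the uploaded object.
gcs_url_for() {
    # $1 = bucket name, $2 = local path such as ./css/nav.css
    bucket=$1
    path=${2#.}    # strip the leading dot, leaving /css/nav.css
    printf 'gs://%s%s\n' "$bucket" "$path"
}

# Example:
#   gcs_url_for hugotest.mydomain.com ./css/nav.css
```

Because find prints every path with a leading ./, stripping only the dot leaves the slash that separates the bucket name from the object name.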

At this point, you should be able to hit hugotest.mydomain.com in a browser and see your static site in all its glory.

Notes on Hugo setup


I've only dabbled in Hugo in the last couple of weeks, so these are very introductory-type notes.

Installing Hugo on OSX and creating a new Hugo site

$ brew update && brew install hugo
$ mkdir hugo-sites
$ cd hugo-sites
$ hugo new site hugotest.mydomain.com
$ git clone --recursive https://github.com/spf13/hugoThemes themes
$ cd hugotest.mydomain.com
$ ln -s ../themes .

At this point you have a skeleton directory structure created by Hugo (via the hugo new site command) under the directory hugotest.mydomain.com:

$ ls
archetypes  config.toml content     data        layouts     static themes

(note that we symlinked the themes directory into the hugotest.mydomain.com directory to avoid duplication)

Configuring your Hugo site and choosing a theme

One file you will need to pay a lot of attention to is the site configuration file config.toml. The default content of this file is deceptively simple:

$ cat config.toml
baseurl = "http://replace-this-with-your-hugo-site.com/"
languageCode = "en-us"
title = "My New Hugo Site"

Before you do anything more, you need to decide on a theme for your site. Browse the Hugo Themes page and find something you like. Let's assume you choose the Casper theme. You will need to become familiar with the customizations that the theme offers. Here are some customizations I made in config.toml, going by the examples on the Casper theme web page:

$ cat config.toml
baseurl = "http://hugotest.mydomain.com/"
languageCode = "en-us"
title = "My Speedy Test Site"
newContentEditor = "vim"

theme = "casper"
canonifyurls = true

[params]
  description = "Serving static sites at the speed of light"
  cover = "images/header.jpg"
  logo = "images/mylogo.png"
  # set true if you are not proud of using Hugo (true will hide the footer note "Proudly published with HUGO.....")
  hideHUGOSupport = false

#  author = "Valère JEANTET"
#  authorlocation = "Paris, France"
#  authorwebsite = "http://vjeantet.fr"
#  bio= "my bio"
#  googleAnalyticsUserID = "UA-79101-12"
#  # Optional RSS-Link, if not provided it defaults to the standard index.xml
#  RSSLink = "http://feeds.feedburner.com/..."
#  githubName = "vjeantet"
#  twitterName = "vjeantet"
  # facebookName = ""
  # linkedinName = ""

I left most of the Casper-specific options commented out and only specified a cover image, a logo and a description. 

Creating a new page

If you want blog-style posts to appear on your home page, create a new page with Hugo under a directory called post (some themes want this directory to be named post and others want it posts, so check what the theme expects). 

Let's assume you want to create a page called hello-world.md (I haven't even mentioned this so far, but Hugo deals by default with Markdown pages, so you will need to brush up a bit on your Markdown skills). You would run:

$ hugo new post/hello-world.md

This creates the post directory under the content directory, creates a file called hello-world.md in content/post, and opens up the file for editing in the editor you specified as the value for newContentEditor in config.toml (vim in my case). The default contents of the md file are specific to the theme you used. For Casper, here is what I get by default:

+++
author = ""
comments = true
date = "2016-02-12T11:54:32-08:00"
draft = false
image = ""
menu = ""
share = true
slug = "post-title"
tags = ["tag1", "tag2"]
title = "hello world"

+++


Now add some content to that file and save it. Note that the draft property is set to false by the Casper theme. Other themes set it to true, in which case it would not be published by Hugo by default. The slug property is set by Casper to "post-title" by default. I changed it to "hello-world". I also changed the tags list to only contain one tag I called "blog".
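Since the draft flag decides whether a page gets published, it can be handy to check it from a script. Here is a minimal sketch, assuming TOML front matter like the one above; is_draft is a hypothetical helper, not a Hugo command.

```shell
# is_draft is a hypothetical helper: it succeeds (exit status 0) when a
# content file contains a `draft = true` line, as in TOML front matter.
is_draft() {
    # $1 = path to a content file, e.g. content/post/hello-world.md
    grep -Eq '^draft[[:space:]]*=[[:space:]]*true' "$1"
}

# Example:
#   is_draft content/post/hello-world.md && echo "will be skipped by hugo"
```

To render drafts anyway while you are still writing, hugo accepts the --buildDrafts flag.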

At this point, you can run the hugo command by itself, and it will take the files it finds under content, static, and its other subdirectories, turn them into html/js/css/font files and save them in a directory called public:

$ hugo
0 draft content
0 future content
1 pages created
3 paginator pages created
1 tags created
0 categories created
in 55 ms

$ find public
public
public/404.html
public/css
public/css/nav.css
public/css/screen.css
public/fonts
public/fonts/example.html
public/fonts/genericons.css
public/fonts/Genericons.eot
public/fonts/Genericons.svg
public/fonts/Genericons.ttf
public/fonts/Genericons.woff
public/index.html
public/index.xml
public/js
public/js/index.js
public/js/jquery.fitvids.js
public/js/jquery.js
public/page
public/page/1
public/page/1/index.html
public/post
public/post/hello-world
public/post/hello-world/index.html
public/post/index.html
public/post/index.xml
public/post/page
public/post/page/1
public/post/page/1/index.html
public/sitemap.xml
public/tags
public/tags/blog
public/tags/blog/index.html
public/tags/blog/index.xml
public/tags/blog/page
public/tags/blog/page/1
public/tags/blog/page/1/index.html

That's quite a number of files and directories created by hugo. Most of them are boilerplate coming from the theme. Our hello-world.md file was turned into a directory called hello-world under public/post, with an index.html file dropped in it. Note that the hello-world directory is named after the slug property in the hello-world.md file, not after the file name.
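The mapping from a content file to its output location can be written down as a one-liner. The sketch below only reproduces what the listing above shows; output_path is a hypothetical helper, and real sites can change the layout through permalink settings, which it ignores.

```shell
# output_path is a hypothetical helper reproducing the layout seen in the
# `find public` listing: section + slug -> public/<section>/<slug>/index.html.
output_path() {
    # $1 = section (directory under content), $2 = slug from the front matter
    printf 'public/%s/%s/index.html\n' "$1" "$2"
}

# Example:
#   output_path post hello-world
```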

Serving the site locally with Hugo

Hugo makes it very easy to check your site locally. Just run

$ hugo server
0 draft content
0 future content
1 pages created
3 paginator pages created
1 tags created
0 categories created
in 35 ms
Watching for changes in /Users/grig.gheorghiu/mycode/hugo-sites/hugotest.mydomain.com/{data,content,layouts,static,themes}
Serving pages from memory
Web Server is available at http://localhost:1313/ (bind address 127.0.0.1)
Press Ctrl+C to stop


Now if you browse to http://localhost:1313 you should see the home page of your new site, rendered with the Casper theme.


Not bad for a few minutes of work.

For other types of content, such as static pages not displayed on the home page, you can create Markdown files in a pages directory:

$ hugo new pages/static1.md
+++
author = ""
comments = true
date = "2016-02-12T12:24:26-08:00"
draft = false
image = ""
menu = "main"
share = true
slug = "static1"
tags = ["tag1", "tag2"]
title = "static1"

+++

Static page 1.

Note that the menu property value is "main" in this case. This tells the Casper theme to create a link to this page in the main drop-down menu available on the home page.

If you run hugo server again, you should see the menu available in the upper right corner, and a link to static1 when you click on the menu.




To deploy your site to GCS, S3 or regular servers, you need to upload the files and directories under the public directory. It's that simple.

I'll stop here with my Hugo notes. DigitalOcean has a great tutorial on installing and running Hugo on Ubuntu 14.04.




Thursday, February 11, 2016

Some notes on Ansible playbooks and roles

Some quick notes I jotted down while documenting our Ansible setup. Maybe they will be helpful for people new to Ansible.

Ansible playbooks and roles


Playbooks are YAML files that specify which roles are applied to hosts of a certain type.

Example: api-servers.yml

$ cat api-servers.yml
---

- hosts: api
  sudo: yes
  roles:
    - base
    - tuning
    - postfix
    - monitoring
    - nginx
    - api
    - logstash-forwarder

This says that for each host in the api group we will run tasks defined in the roles listed above.

Example of a role: the base role is one that (in our case) is applied to all hosts. Here is its directory/file structure:

roles/base
roles/base/defaults
roles/base/defaults/main.yml
roles/base/files
roles/base/files/newrelic
roles/base/files/newrelic/newrelic-sysmond_2.0.2.111_amd64.deb
roles/base/files/pubkeys
roles/base/files/pubkeys/id_rsa.pub.jenkins
roles/base/files/rsyslog
roles/base/files/rsyslog/50-default.conf
roles/base/files/rsyslog/60-papertrail.conf
roles/base/files/rsyslog/papertrail-bundle.pem
roles/base/files/sudoers.d
roles/base/files/sudoers.d/10-admin-users
roles/base/handlers
roles/base/handlers/main.yml
roles/base/meta
roles/base/meta/main.yml
roles/base/README.md
roles/base/tasks
roles/base/tasks/install.yml
roles/base/tasks/main.yml
roles/base/tasks/newrelic.yml
roles/base/tasks/papertrail.yml
roles/base/tasks/users.yml
roles/base/templates
roles/base/templates/hostname.j2
roles/base/templates/nrsysmond.cfg.j2
roles/base/vars
roles/base/vars/main.yml

An Ansible role has the following important sub-directories:

defaults - contains the main.yml file which defines default values for variables used throughout other role files; note that the role’s files are checked in to GitHub, so these values shouldn’t contain secrets such as passwords, API keys etc. For those types of variables, use group_vars or host_vars files which will be discussed below.

files - contains static files that are copied over by ansible tasks to remote hosts

handlers - contains the main.yml file which defines actions such as stopping/starting/restarting services such as nginx, rsyslog etc.

meta - metadata about the role; things like author, description etc.

tasks - the meat and potatoes of ansible, contains one or more files that specify the actions to be taken on the host that is being configured; the main.yml file includes all the other task files that get executed

Here are 2 examples of task files, one for configuring rsyslog to send logs to Papertrail and the other for installing the newrelic agent:

$ cat tasks/papertrail.yml
- name: copy papertrail pem certificate file to /etc
  copy: >
    src=rsyslog/{{item}}
    dest=/etc/{{item}}
  with_items:
    - papertrail-bundle.pem

- name: copy rsyslog config files for papertrail integration
  copy: >
    src=rsyslog/{{item}}
    dest=/etc/rsyslog.d/{{item}}
  with_items:
    - 50-default.conf
    - 60-papertrail.conf
  notify:
    - restart rsyslog

$ cat tasks/newrelic.yml
- name: copy newrelic debian package
  copy: >
    src=newrelic/{{newrelic_deb_pkg}}
    dest=/opt/{{newrelic_deb_pkg}}

- name: install newrelic debian package
  apt: deb=/opt/{{newrelic_deb_pkg}}

- name: configure newrelic with proper license key
  template: >
    src=nrsysmond.cfg.j2
    dest=/etc/newrelic/nrsysmond.cfg
    owner=newrelic
    group=newrelic
    mode=0640
  notify:
    - restart newrelic

templates - contains Jinja2 templates with variables that get their values from defaults/main.yml or from group_vars or host_vars files. One special variable that we use (and is not defined in these files, but instead is predefined by Ansible) is inventory_hostname which points to the hostname of the target being configured. For example, here is the template for a hostname file which will be dropped into /etc/hostname on the target:

$ cat roles/base/templates/hostname.j2
{{ inventory_hostname }}
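To get a feel for what the template step produces, here is a rough stand-in that does the substitution with sed. It is only a sketch: render_hostname is a hypothetical helper, and it handles just this one variable, whereas Ansible's template module runs a full Jinja2 engine.

```shell
# render_hostname is a hypothetical stand-in for the template step:
# it replaces {{ inventory_hostname }} in a template read from stdin.
render_hostname() {
    # $1 = inventory hostname of the target being configured
    sed "s/{{ *inventory_hostname *}}/$1/g"
}

# Example:
#   render_hostname api01 < roles/base/templates/hostname.j2
```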

Once you have a playbook and a role, there are a few more files you need to take care of:

  • hosts/myhosts - this is an INI-type file which defines groups of hosts. For example the following snippet of this file defines 2 groups called api and magento.

[api]
api01 ansible_ssh_host=api01.mydomain.co
api02 ansible_ssh_host=api02.mydomain.co

[magento]
mgto ansible_ssh_host=mgto.mydomain.co

The api-servers.yml playbook file referenced at the beginning of this document sets the hosts variable to the api group, so all Ansible tasks will get run against the hosts included in that group. In the hosts/myhosts file above, these hosts are api01 and api02.

  • group_vars/somegroupname - this is where variables with ‘secret’ values get defined for a specific group called somegroupname. The group_vars directory is not checked into GitHub. somegroupname needs to exactly correspond to the group defined in hosts/myhosts.

Example:

$ cat group_vars/api
ses_smtp_endpoint: email-smtp.us-west-2.amazonaws.com
ses_smtp_port: 587
ses_smtp_username: some_username
ses_smtp_password: some_password
datadog_api_key: some_api_key
. . . other variables (DB credentials etc)


  • host_vars/somehostname - this is where variables with ‘secret’ values get defined for a specific host called somehostname. The host_vars directory is not checked into GitHub. somehostname needs to exactly correspond to a host defined in hosts/myhosts.

Example:

$ cat host_vars/api02
insert_sample_data: false

This overrides the insert_sample_data variable and sets it to false only for the host called api02. This could also be used for differentiating between a DB master and slave for example.
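The override rule can be illustrated with a small lookup function. This is a sketch under assumptions: lookup_var is a hypothetical helper, the variable is assumed to also be defined in group_vars, and the files are assumed to be flat "key: value" lists like the examples above. Ansible itself resolves many more sources than these two.

```shell
# lookup_var is a hypothetical helper mimicking one precedence rule:
# a value in host_vars/<host> wins over the same key in group_vars/<group>.
# Run it from the directory that contains host_vars and group_vars.
lookup_var() {
    # $1 = variable name, $2 = host name, $3 = group name
    for f in "host_vars/$2" "group_vars/$3"; do
        if [ -f "$f" ] && grep -q "^$1:" "$f"; then
            # print the value of the first matching line and stop looking
            sed -n "s/^$1:[[:space:]]*//p" "$f" | head -n 1
            return 0
        fi
    done
    return 1    # not defined in either file
}

# Example:
#   lookup_var insert_sample_data api02 api
```

A host without its own host_vars file simply falls through to the group value.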

Tying it all together

First you need to have ansible installed on your local machine. I used:

$ pip install ansible

To execute a playbook for a given hosts file against all api servers, you would run:

$ ansible-playbook -i hosts/myhosts api-servers.yml

The name that ties together the hosts/myhosts file, the api-servers.yml file and the group_vars/groupname file is in this case api.
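Before running a playbook it helps to see which hosts a group name resolves to. The sketch below parses the INI-style inventory with awk; hosts_in_group is a hypothetical helper and it ignores inventory features such as child groups and host ranges.

```shell
# hosts_in_group is a hypothetical helper that prints the host names listed
# under one [group] section of an INI-style inventory file.
hosts_in_group() {
    # $1 = inventory file, $2 = group name
    awk -v group="$2" '
        /^\[/ { in_group = ($0 == "[" group "]"); next }  # section header
        in_group && NF { print $1 }                        # first field is the host
    ' "$1"
}

# Example:
#   hosts_in_group hosts/myhosts api
```

Ansible can report the same information itself: adding --list-hosts to the ansible-playbook command prints the matching hosts without running any tasks.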

You need to make sure you have the desired values for that group in these 3 files:
  • hosts/myhosts: make sure you have the desired hosts under the [api] group
  • api-servers.yml: make sure you have the desired roles for hosts in the api group
  • group_vars/api: make sure you have the desired values for variables that will be applied to the hosts in the api group

Launching a new api instance in EC2

I blogged about this here.

Updating an existing api instance


Make sure the instance hostname is the only hostname in the [api] group in the hosts/myhosts file. Then run:

$ ansible-playbook -i hosts/myhosts api-servers.yml