Random stuff

Laying out roles, inventories and playbooks

2015-07-02T00:00:00+00:00

(revised 20181110 per @theenglishway suggestions)

I have been writing playbooks for quite a while now. Along the way, I went through various stages and used different ways to lay out Ansible files. After going down this trial and error path, I finally came up with something I will stick to.

I am not saying that this is the be-all and end-all of Ansible file layouts, but maybe it will fast forward you to a saner file layout, and you’ll be able to move on from there. This post will probably help you if you are new to Ansible and trying to figure out what to put where. I hope it will prove useful if you have some Ansible experience too.

Some terminology

In this post, I will mostly talk about 3 things: roles, inventories and playbooks. Other items do exist (plays, tasks, …) but those 3 elements shape the big picture of the layout.

Roles

A role is a collection of tasks and templates (among other things, but those are the most common) focused on one very specific goal. For instance, you can have a role that installs nginx, another that deploys ssh keys for admins, etc…

The nginx role will install and configure nginx. Nothing else. It won’t create DNS entries, trim logs, add an FTP server or anything. It just installs nginx. Period.

Inventories

An inventory is a list of hosts, optionally assembled into groups, on which you will run Ansible playbooks. Ansible automatically puts all defined hosts in the aptly named group all.

For instance, you could have hosts www1 and www2, assembled in group webservers, and later reference the group or individual hosts, depending on your needs.

Inventories can also come with variables applied to hosts or groups (including all).
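As a rough illustration, here is what such a static inventory could look like in Ansible's YAML inventory format. The hosts and the webservers group come from the example above; the two variables (ntp_server, http_port) are hypothetical names used only to show where group variables can live.

```yaml
# Static inventory sketch (YAML format)
all:
  vars:
    # Hypothetical variable applied to every host
    ntp_server: ntp.example.org
  children:
    webservers:
      hosts:
        www1:
        www2:
      vars:
        # Hypothetical variable applied to the webservers group only
        http_port: 80
```

With this file, both webservers and www1 are valid targets for a play.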

Inventories can be dynamic. If the inventory file is executable, Ansible will run it and use its output as the inventory (note that, in this case, the format is not the same as static inventory).

You can of course have multiple inventories, segregated from each other. We will take advantage of this later on.

Playbooks

The last piece of the puzzle is the playbook. The playbook is the pivot between an inventory and roles. This is where you basically tell Ansible: please install roles foo, bar and baz on machines alice, bob and charlie.
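A minimal sketch of that sentence as a playbook, assuming that hosts alice, bob and charlie exist in the inventory and that roles foo, bar and baz can be found in the roles path:

```yaml
# Apply roles foo, bar and baz to alice, bob and charlie
- hosts:
    - alice
    - bob
    - charlie
  roles:
    - foo
    - bar
    - baz
```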

Role layout

Role layout is pretty well documented on the Ansible website. A role contains several directories. All directories are optional except tasks. For each directory, the entry point is main.yml. Thus, the only compulsory file in a role is tasks/main.yml.

ansible-foobar/
├── defaults
│   └── main.yml
├── files
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   ├── check_vars.yml
│   ├── foobar.yml
│   └── main.yml
└── templates
    └── foobar.conf.j2

Let’s briefly cover the layout and see the function of each file and directory.

`defaults/main.yml`

This file contains defaults for variables used in the role. I encourage you to define every variable used in your role, for several reasons:

this file will be a nice and always up to date reference list of the configuration settings in your role
having configured variables will prevent your role failing in an uncontrolled way (more on this later).

If some of these variables are used in templates to generate config files, I highly encourage you to use your target OS defaults. The principle of least surprise should apply here.

Best practice is to use pseudo-namespacing for your role’s variables (e.g. for role foobar, all variables should begin with foobar_) to avoid collisions with other roles.
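As a sketch, a defaults/main.yml for the foobar role could look like this. The variable names and values are hypothetical; the only point is the foobar_ prefix and the fact that every tunable has a documented default.

```yaml
# defaults/main.yml for the example role "foobar"
# All names below are hypothetical and share the foobar_ prefix.

# Port the service listens on
foobar_port: 8080

# System user running the service
foobar_user: foobar

# Log verbosity written to the generated config file
foobar_log_level: info
```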

`files/`

This directory holds files that do not require Jinja interpolation, and can be copied as-is on the remote nodes.

`handlers/main.yml`

This is where you define handlers that get notified by tasks. Handlers are just standard tasks. You can use include in this file if you want to separate handlers (for different OS versions, for instance), but try to keep the number of files as low as possible so you don’t end up hunting down stuff everywhere.

If your handler restarts any service, you have to make sure that the service config file is valid before attempting to restart it. Some daemons allow this (e.g. nginx, haproxy, apache). If your service does not, provide some fallback mechanism. You don’t want your playbook to screw up your running system because you typoed a configuration variable. See the validate option in the template module.
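Here is a minimal sketch of this pattern for nginx. It assumes the template is a complete, standalone nginx configuration (the check may fail if the file relies on relative includes); the file names and paths are illustrative.

```yaml
# In a task file: only install the config if nginx accepts it
- name: Deploy nginx configuration
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    # %s is replaced by the path of the temporary rendered file
    validate: nginx -t -c %s
  notify: restart nginx

# In handlers/main.yml: runs only if the task above reported a change
- name: restart nginx
  service:
    name: nginx
    state: restarted
```

If validation fails, the task fails, the destination file is left untouched and the handler is never notified.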

`meta/main.yml`

This metadata has (AFAIK) only two variables:

galaxy_info: meta information for Galaxy about your role. You just don’t need this if you don’t intend to push your role to Galaxy. For details on the format, see TODO: find ref
dependencies: what roles this role depends on.

The latter is of utmost importance, and setting it right deserves a blog post of its own. Until then, the rule of thumb to remember is to only include compulsory role dependencies for the target host.

This means that adding nginx dependency in a php-fpm role sounds perfectly reasonable¹. However, adding a mysql dependency to your web application role is not, because mysql can be deployed on another server.

A note 3 years later: I do not use dependencies anymore. I had issues regarding role’s defaults variables behavior. Also, the playbook is the main focus area when building infrastructure code. Having explicit dependencies in the playbook is the way to go. No weird or hard to track magic.
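For illustration, explicit dependencies in a playbook simply mean listing the roles in the order they are needed. The role names and the webservers group below are hypothetical:

```yaml
# Dependencies are visible at a glance: nginx first, then php-fpm, then the app
- hosts: webservers
  roles:
    - nginx
    - php-fpm
    - myapp
```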

`tasks/main.yml`

This file is the tasks entry point. However, it should be mostly empty. Why ? Because you want to use Ansible tags. Tags are a great way to limit task execution for an Ansible run, where only tagged tasks are run.

For instance, in a playbook that deploys your application, you could choose to run only tasks regarding nginx.

The problem is that tagging every task in main.yml would be cumbersome, error prone, and clutter the code unnecessarily.

The best way to tag all your tasks is to include your real task file from tasks/main.yml and tag the whole file:

- import_tasks: foobar.yml
  tags:
    - foobar

Here, I name the real task file foobar.yml with the same name as the role (quite handy with find or locate; no need to guess which main.yml you are looking for) and apply the tag foobar to all tasks in the role.

You can repeat this if you have a big list of tasks and want to split them in several files. You could, for instance, separate configuration and installation matters, and add another specific tag for each of them:

- import_tasks: foobar-install.yml
  tags:
    - foobar
    - foobar:install

- import_tasks: foobar-config.yml
  tags:
    - foobar
    - foobar:config

Here I added two tags to the installation part (foobar and foobar:install), and two for the configuration part (foobar and foobar:config).

Note that the : between, for instance, foobar and config has no meaning. Ansible treats tags as dumb strings. It is just a personal convention (Redis-like) for refining tags.

With this setup, you could run only the configuration part of your role by issuing:

ansible-playbook playbook.yml -t foobar:config

The -t and -l combination is a very powerful weapon to target a specific host with a precise change (think of this as pointing to a matrix cell, targeting a host (i.e. row) and a tag (i.e. column)).

A word of caution

Do not overdo tags: most of the time, this is YAGNI (You Ain’t Gonna Need It). Create a tag if you’re gonna need it. It can be hard to mentally predict what will happen if you do too much. Beware of the never tag, which will skip tasks unless you explicitly use another tag.

For instance:

- import_tasks: foobar-uninstall.yml
  tags:
    - never
    - foobar

will execute the tasks in foobar-uninstall.yml only if the foobar tag (or never itself) is explicitly requested on the command line.

`tasks/check_vars.yml`

I use this file to ensure that required variables are defined.

#
# Checking that required variables are set
#
- name: Checking that required variables are set
  fail: msg="{{ item }} is not defined"
  when: item not in vars
  loop:
    - foobar_database
    - foobar_deploy_user

Then, include this file in tasks/main.yml:

- import_tasks: check_vars.yml
  tags:
    - foobar
    - foobar:check
    - check

- import_tasks: foobar.yml
  tags:
    - foobar
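As an alternative sketch (not what I use above), the same check can be written with the assert module, which lists the conditions explicitly instead of looping over names. The variable names are the same as in check_vars.yml:

```yaml
# Alternative check_vars.yml using assert
- name: Checking that required variables are set
  assert:
    that:
      - foobar_database is defined
      - foobar_deploy_user is defined
    msg: "foobar_database and foobar_deploy_user must be defined"
```

The tradeoff is that the message is less precise about which variable is missing.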

`templates/*`

This is the place where templates (i.e. files with interpolated variables) go. While this is not necessary, I often reference them using a relative path like so:

- name: Template foo
  template:
    src: ../templates/foo.conf.j2
    dest: /some/place/in/the/node/filesystem/foo.conf

The goal of using relative path is to be able to hit gf in Vim and open the file directly. You can get rid of that and just use src: foo.conf.j2. It is just a readability/convenience tradeoff.

The file name I use is the intended filename at the destination, appended with .j2 so it is clear that it is a Jinja2 template, and easier to search (find or locate).

Some folks like to replicate the destination hierarchy (e.g. src: etc/sysconfig/network-scripts/ifcfg-ethx.cfg.j2). This is a matter of taste, but personally I don’t see the point of having those deep hierarchies in the role if the naming is correct.

`vars/main.yml`

It is sometimes difficult to grok the difference between vars/main.yml and defaults/main.yml. After all, they both contain variable assignments.

I do not always use a vars/main.yml, but when I do, I put “constant-like” variables in it. These are variables that are not intended to be overridden.

For instance the github repository for a particular piece of code (e.g. your web application) will certainly go there. However, the version you want to deploy won’t.

All in all, it is just a mechanism to move those values out of the task files, which helps readability and ties them to the role life cycle.
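A small sketch of the split, using hypothetical variable names and a placeholder repository URL:

```yaml
# vars/main.yml: constant-like values, not meant to be overridden
foobar_repository: https://github.com/you/foobar.git

# defaults/main.yml: values that inventories are expected to override
foobar_version: master
```

The two assignments live in two different files; they are shown together here only to contrast them.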

Inventories and playbook layout

A playbook glues together roles and inventories. Thus playbooks depend on roles and inventories. But while you have mechanisms to list roles requirements in a playbook, you don’t have any for inventories.

Since the playbook can not live without the targeted inventories I include my inventories in my playbooks.

playbook-foobar/
├── ansible.cfg
├── requirements.txt
├── roles/
│   └── requirements.yml
├── inventories
│   ├── development
│   │   ├── group_vars
│   │   │   └── all
│   │   └── hosts
│   ├── integration
│   │   ├── group_vars
│   │   │   └── all
│   │   └── hosts
│   └── production
│       ├── group_vars
│       │   └── all
│       └── hosts
├── site.yml
└── playbooks
    ├── 10_database.yml
    └── 20_stuff.yml

`ansible.cfg`, `roles/` and `roles/requirements.yml`

This file controls Ansible’s behaviour. You can have one in /etc/ansible or as a personal dotfile (~/.ansible.cfg). Adding an ansible.cfg file in the playbook root will ensure that the required settings for the playbook to run are really there. The precedence order for Ansible config files is²:

ANSIBLE_CONFIG (an environment variable pointing to a file)
ansible.cfg (in the current directory)
.ansible.cfg (in the home directory)
/etc/ansible/ansible.cfg

Ansible will use the first config file found.

In this config file, I always set at least two options:

[defaults]
hostfile = ./inventories/development
roles_path = ./roles:/some/path/to/roles/repos

The first one (hostfile, named inventory in recent Ansible versions) sets which inventory Ansible will use by default. More explanations will come below.

The second one sets the path where Ansible will look for roles. I typically set two directories here (separated by :, like shell’s PATH):

the first directory will be used by ansible-galaxy to install imported roles. I set it to ./roles but the name doesn’t matter. Don’t forget to add the directory content (except requirements.yml) to your playbook’s .gitignore like so:

!/roles
/roles/*
!/roles/requirements.yml

sometimes I add a second directory that points to my roles development directory path

The advantages of this setup are twofold: first, you have a dedicated path, ignored by your SCM, where you will download roles. The roles will be searched there first. Secondly, if a role is not found, it will be searched in your role development directory. This lets you hack on your roles while writing a playbook. You don’t need to go through a commit/push/install cycle when you are coding your roles for this playbook.

Role dependencies for your playbook are listed in roles/requirements.yml and can be installed with ansible-galaxy install -r roles/requirements.yml:

# Role on galaxy
- you.rolename
# Public role on github
- name: role-public
  src: https://github.com/erasme/role-public.git
# Private role on github
- name: role-private
  src: git+ssh://git@github.com/you/role-private.git

`inventories/`

This directory holds all inventories you want to apply your playbook to. The most common pattern is to use per-environment inventories: one for development, one for integration, another for production, etc…

Of course, the hostfile variable in ansible.cfg should point to development to avoid accidentally messing with production. Executing the playbook on non-development inventories will force you to use the -i option, which is a good safety measure.

While you can define variables in groups (in group_vars) and hosts (host_vars), you should stuff as many variables as possible in group_vars/all. The rationale is that it is much easier to find a variable when a single file is involved. Variables scattered across a dozen files are not manageable.

And when you’ll want to create an additional inventory (e.g. create production from development), it will be much easier to change a single file and set the variables to proper values than to do the same in several files.
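As a sketch, a development group_vars/all could gather everything the roles need in one place. The variable names reuse the foobar examples from this post; the values are made up:

```yaml
# inventories/development/group_vars/all
# One file, one place to look for (and change) every setting.
foobar_environment: development
foobar_database: foobar_dev
foobar_deploy_user: deploy
```

Creating the production inventory then boils down to copying this file and changing the values.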

Note that group_vars/all can be a directory containing several files. I usually split variables into a clear text file (group_vars/all/all) and a ciphered one (group_vars/all/all_secret) using the transparent vaulting techniques described in this post.

Note 3 years later: Ansible now allows you to vault a single variable in an inventory. Use it !

Here is a handy bash alias that encrypts text selected with the mouse:

alias vault_clip_crypt='echo Passing $(xclip -rmlastnl -o) to ansible-vault && echo -n "$(xclip -rmlastnl -o)" | ansible-vault encrypt_string'

`site.yml` and `playbooks/`

This directory contains the playbooks themselves. I always create a “master” playbook called site.yml in the playbook root directory, which includes all other playbooks located in playbooks/.

I prefix playbooks with a number (Basic style) so I get a sense of the order in which playbooks will be executed just by looking at the content of playbooks/.

For instance:

#!/usr/bin/env ansible-playbook

- import_playbook: playbooks/10_database.yml
- import_playbook: playbooks/20_stuff.yml

The rationale is to have a single entry point, which makes it easy to use ansible-pull if needed (note that ansible-pull looks for local.yml by default, so site.yml has to be passed explicitly). The other point is to split the playbook into related parts.

For instance, you could have a playbook that takes care of setting up the database, another that will set the OS level stuff (e.g. ssh keys, firewalling, …), another one that takes care of deploying your web application, etc… When needed, you can use all the playbooks at once with site.yml, or just focus on a specific problem by running the appropriate playbook (no need to run the ssh-key setup if you’re just deploying the latest version of your web application).
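One of these focused playbooks could be as small as the sketch below. The databases group and the postgresql role are hypothetical names:

```yaml
# playbooks/10_database.yml
# Only deals with the database tier; other concerns live in other playbooks.
- hosts: databases
  become: true
  roles:
    - postgresql
```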

The shebang line at the top of the file (#!/usr/bin/env ansible-playbook) will make the playbook directly executable (adjust ansible-playbook path and chmod +x the playbook file). You can still pass additional ansible-playbook parameters if required.

`requirements.txt`

This file contains the result of a pip freeze. I now only use pip under virtualenv to install ansible and required modules. It makes it really easy to switch ansible (and even python) version between projects.

So when someone needs to work on this project, the workflow is simple:

git clone http://github.com/some/playbook-repos
cd playbook-repos
mkvirtualenv playbook-repos --no-site-packages
pip install -r requirements.txt
ansible-galaxy install -r roles/requirements.yml

and you’re good to go.

Layout Antipatterns

When I started using Ansible, I accumulated several antipatterns at the same time: trying to encompass all my infrastructure in a single inventory containing per-host fine grained variables, used in a single playbook, without using any role.

While this sounds feasible, it is doomed to failure unless you manage a very small infrastructure. Let’s zoom in briefly on each mistake.

Trying to encompass all your infrastructure in one playbook

It is tempting to aim for a one-liner that will magically deploy all your infrastructure in one shot. This gives you some bragging rights at your next meetup, and feels like the ultimate sysadmin masterpiece.

However, it has many drawbacks:

it will be slow: do you really want to run a playbook over dozens of tasks or roles (or more), just to change an entry in /etc/hosts ? Yes, there are workarounds for this, but they will require some command line magic and a lot of thinking.
it mixes bananas and apples: you should strive for separation of concerns in your playbooks if you want to be able to read them (and, as a consequence, maintain them).

As a consequence, your infrastructure code will be unnecessarily hard to test and maintain.

Per-host fine grained variables

This is a corollary of the previous antipattern: when you try to encompass your whole infrastructure, you start to think in terms of inheritance, variable overriding and refining.

And while doing this, you add considerable complexity to your inventories. It is very hard to track down variable definitions when you override them in group_vars/some_group, group_vars/all, host_vars/machine, role defaults, …

Now this can get even worse when you use the hash_behaviour: merge Ansible configuration setting: it introduces more confusion, and makes your Ansible work potentially unshareable with people using hash_behaviour: replace. Since I am guilty of this one, it is time to make some apologies. Sorry folks. Michael DeHaan did not like it, and he was right.

Single playbook

A single playbook relates to the first sin again, but also applies to more focused playbooks where you only deploy one thing. Splitting your playbooks along logically related roles will speed up your deployments. Again, why run ssh key distribution, storage cluster deployment, web stack, middlewares and application when you just change the color of a button in your web app ?

Split your playbook into related parts that reflect your stack architecture. They will be faster and easier to use.

No roles (tasks only)

Well, this is obvious. Even if you don’t want to share, make roles and strive for code reuse. Reused code will save you time of course, but it is also battle tested since it is used more frequently.

Tasks-only playbook can be used for a quick hit and run, solving a transient problem that doesn’t offer any code reuse opportunities.

I also try to avoid mixing tasks with roles in playbooks: this hurts the abstraction level you manage to build using roles. When thinking in terms of roles, you don’t need to think about the nitty gritty details of the roles when reading your playbooks. If your roles are thoroughly tested, you can read your infrastructure in seconds. Add tasks to the mix, and you lose this superpower.

Yes, you could separate your application server (e.g. php-fpm) and put it on a different machine than your webserver, it all depends on your local context. ↩
http://docs.ansible.com/intro_configuration.html ↩

Laying out roles, inventories and playbooks was originally published by Michel Blanc at Random stuff on July 02, 2015.

Decoupling your Ansible roles

2015-06-27T00:00:00+00:00

Having tightly coupled roles is the best way to have a hard time maintaining roles and playbooks, and live in fear of changing anything in them.

Here is a journey into role decoupling.

The problem

Let’s say we have a role (myapp) that depends on the php-fpm role. In the php-fpm role, we want to display errors in HTML output depending on the application running environment (e.g. always display unless we’re running in production environment).

The application running environment is available in myapp_environment.

First idea

The first idea that comes to mind is to change the php.ini according to myapp_environment, like so:

{% if myapp_environment == "production" %}
display_errors = Off
{% else %}
display_errors = On
{% endif %}

The problem with this approach is that the php-fpm role now needs myapp_environment to be defined, which is quite absurd.

So instead, you could rename the variable environment, and use this in both roles (myapp and php-fpm). This is better, but not much. The problem with this approach is that a plain environment variable is not linked (by its name) to any role, and this can lead to great confusion if it is used and set in many roles or in different places in the inventory.

Another try

So the best way is to have two variables, php_fpm_environment and myapp_environment, which makes them meaningful. But now how can I sync them together ?

One way is to match them in your inventory, like so :

# Somewhere in inventory
myapp_environment: "production"
php_fpm_environment: "{{ myapp_environment }}"

However, this has some drawbacks. For instance, we are still talking about php_fpm_environment and, while not a big deal, it has no php-fpm meaning per se and it is not obvious what this variable does.

Also, in the php.ini template, we will still have to test against the string “production” to set display_errors. Testing against a string set somewhere else is quite dangerous. What if the production name for the app is “live” instead ? Our php-fpm role is broken now.

Some progress

We could go a better way: let’s call the variable php_fpm_display_errors (more meaningful) and make it a boolean. We now can do this :

{% if php_fpm_display_errors %}
display_errors = On
{% else %}
display_errors = Off
{% endif %}

and somewhere in inventory:

myapp_environment: "production"
# Display errors everywhere except in production
php_fpm_display_errors: "{{ myapp_environment != 'production' }}"

Streamlining our solution

Well, this is better now. But it is not perfect. The inventory is more verbose than required and handles something that it shouldn’t have to take care of. It is also quite easy to forget to add it to the inventory and end up with errors showing in production.

By moving this logic away from the inventory, and directly in the role dependencies, this configuration setting becomes completely transparent, and we get rid of redundancy. We just have to add the following lines in myapp/meta/main.yml:

dependencies:
  - role: role-php-fpm
    # Display errors everywhere except in production
    php_fpm_display_errors: "{{ myapp_environment != 'production' }}"

Now, the php-fpm role is completely decoupled from the myapp role, and the production setting is completely transparent to the role user. Setting myapp_environment is enough to have the depending role set variables accordingly. You don’t even have to be aware of the myapp role dependency. If you swap, let’s say, nginx/php-fpm for apache/php, you just have to change the role dependency and have no impact on your inventory. If you want to name your production environment “live”, you can do so by changing meta/main.yml and not touching anything else.

Keeping roles decoupled is the best way to have manageable and reusable roles. Try to make them self sufficient, and avoid cross variables or even worse, group names in roles !
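If you would rather not use meta dependencies at all, the same mapping can be sketched at the playbook level with role parameters. This is only an illustration: the webservers group is hypothetical, and the condition follows the stated goal of displaying errors everywhere except in production.

```yaml
# The playbook, not the role, wires myapp's environment to php-fpm's setting
- hosts: webservers
  roles:
    - role: role-php-fpm
      php_fpm_display_errors: "{{ myapp_environment != 'production' }}"
    - role: myapp
```

The php-fpm role still knows nothing about myapp; the coupling lives in a single, visible place.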

Decoupling your Ansible roles was originally published by Michel Blanc at Random stuff on June 27, 2015.

Transparent encryption with ansible vault revisited

2015-05-25T00:00:00-00:00

Doing it the wrong way

Last attempt to make ansible vault encryption/decryption transparent wasn’t quite right. Decrypting files after commit wasn’t a good idea as Raphael Campardou noticed.

In search of a better idea, I eventually realized that hooks were not the right place to do it: yes, you can guard against committing files that should be encrypted, but hacking around hooks to build a crypt/decrypt pipeline is doomed to failure.

Doing it better

While looking for alternate ways, I remembered I hacked around with git filters back in the day to see clear-text diffs for OpenOffice files.

Git lets you apply smudge, clean and textconv filters to files, which are applied this way:

filter/smudge: after checkout, reads the blob from STDIN and outputs the worktree file to STDOUT
filter/clean: converts the worktree file to blob upon check in
diff/textconv: applied before diffing files

So, for our needs, smudge and textconv are good places to decrypt, while clean is the place to encrypt.

Implementation

The implementation requires writing the 3 filters (smudge, clean, textconv) and configuring your git repos to use the filters.

Those filters should be executable.

As we did in last post, we will use a .vault_password file in the project root directory containing the vault key (don’t forget to add it to your .gitignore file !). The filters fail if the file is not present.

Smudge

The problem that came up when writing the smudge & clean filters is that the blob content is fed on STDIN, and ansible-vault can only encrypt/decrypt files in-place.

So we have to write the blob in a temporary file. While this is not really a problem for the smudge filter, it is for the clean filter since the temporary file contains the clear-text version of the file. The temp file is created with restricted permissions, but you’ve been warned.

Smudge’s filter job is simple:

write STDIN content to temp file
decrypt the temp file and swallow the output in a variable (using ansible-vault view after setting the PAGER to cat)
if the file was a vault encrypted file, display the variable, else, bail out.

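#!/bin/sh

# The vault key is required; fail if it is missing
if [ ! -r '.vault_password' ]; then
  exit 1
fi

# ansible-vault only works on files, so dump the blob (STDIN) to a temp file
tmp=`mktemp`
cat > "$tmp"

export PAGER='cat'
CONTENT=`ansible-vault view "$tmp" --vault-password-file=.vault_password 2> /dev/null`
# The temp file is no longer needed, whatever the outcome
rm "$tmp"

if echo "$CONTENT" | grep 'ERROR: data is not encrypted' > /dev/null; then
  # Report on STDERR so the messages do not end up in the worktree file
  echo "Looks like one file was committed clear text" >&2
  echo "Please fix this before continuing !" >&2
  exit 1
else
  # Quoting preserves the line breaks of the decrypted content
  echo "$CONTENT"
fi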

As you guessed, ansible-vault does not output errors on STDERR but on STDOUT.

Clean

The clean filter works almost the same way:

write STDIN to a temp file
encrypt the temp file in place
write the temp file to STDOUT

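#!/bin/sh

# The vault key is required; fail if it is missing
if [ ! -r '.vault_password' ]; then
  exit 1
fi

# Dump the clear-text blob (STDIN) to a temp file
tmp=`mktemp`
cat > "$tmp"

# Encrypt in place, then emit the encrypted content on STDOUT
ansible-vault encrypt "$tmp" --vault-password-file=.vault_password > /dev/null 2>&1

cat "$tmp"
rm "$tmp"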

This one was quite easy. We could also use modelines, by encrypting only if “vault: true” is present in the 4 first lines. This way, we could apply the filters to all the files. However I ditched the idea for performance reasons (see below).

Diff filter

The filter works like the smudge filter except that it uses the file name passed as a parameter.

#!/bin/sh

if [ ! -r '.vault_password' ]; then
  exit 1
fi

export PAGER='cat'
CONTENT=`ansible-vault view "$1" --vault-password-file=.vault_password 2> /dev/null`

if echo "$CONTENT" | grep 'ERROR: data is not encrypted' > /dev/null; then
  cat "$1"
else
  echo "$CONTENT"
fi

Git configuration

Attributes

Now that the various filters are out and chmoded +x, we need to set up our git repos to use them.

For this, we need to tell git on which files we want to apply the filters, using a .gitattributes file in our project top directory.

The following .gitattributes file

*_vault* filter=vault diff=vault

will run filters on repository blobs/files that match *_vault*.

I initially intended to run the filters on all files, using modelines. However, performance was really bad, so I ended up removing the full wildcard (*) and restricting filter selection to specific files. You can repeat the line ad nauseam if you want to catch multiple file globs.

Gitconfig

I put my filters in ~/.bin/, but the location doesn’t matter. You can even add them to the project and commit them, so everyone has them.

The following section needs to be added to the project’s .git/config file:

[filter "vault"]
  smudge = ~/.bin/smudge_vault
  clean  = ~/.bin/clean_vault

[diff "vault"]
  textconv = ~/.bin/diff_vault

Test

Adding a file that matches a glob in .gitattributes should now trigger transparent encryption.

Here is a sample transcript.

Big fat warning

The git cat-file part is not here for decoration. At least the first time, ensure that encryption works.

The filters

Transparent encryption with ansible vault revisited was originally published by Michel Blanc at Random stuff on May 26, 2015.

Transparent encryption/decryption with ansible vault

2015-05-25T00:00:00-00:00

Big Fat Warning

THIS FILE IS LEFT HERE FOR REFERENCE

However, the method described here is WRONG. Check out next post instead !

Pain points

ansible-vault is handy. You can encrypt your stuff before committing it so your private stuff (AWS/DigitalOcean/… keys, passwords, …) doesn’t end up world-readable on GitHub.

However, it is too easy to decrypt your stuff, forget about it, and commit it without encrypting it back. It is also quite tedious to ansible-vault encrypt/decrypt all day long.

Solution

Raphael Campardou proposed a nice solution to prevent committing unencrypted ansible vault files.

In his solution, you have to name your files *_vault.yml so they get busted by a pre-commit hook if they are not currently encrypted.

This is nice: by naming your files appropriately, you can not commit them unless they are ansible-vault crypted beforehand.

I extended his idea so it can apply to any file in an Ansible repository, with very little configuration, and added a post-commit hook so files get transparently decrypted after being committed.

Transparent encryption/decryption

The goal is simple: automagically encrypt the proper files before commit, commit them, then decrypt them afterwards so we can hack again without any manual intervention. All this with minimal configuration.

Marking file for encryption

The central trick is to find a way to mark a file for encryption. Modelines (a.k.a. emacs local variable lines) to the rescue.

To tell git hooks that a file requires encryption, we’ll add this line to the top of the file (or on line 2 if the file already has a shebang line) :

# -*- vault: true; -*-

Any file having vault: true in a modeline is set to require encryption before commit.

The icing on the cake is that you can use this modeline to set the filetype too[1], and help your editor to find out the proper file content, which is quite handy with some files not ending in yml:

# -*- mode: yaml; vault: true; -*-

This is supported out of the box by vim and Emacs. If you use SublimeText, you can use the STEmacsModelines package.

Using the hooks

The pre-commit hook will encrypt files marked with vault: true. If a .vault_password_hooks file is present in the project root directory, it will be used as the password.

If this file doesn’t exist, you’ll be prompted for an encryption password and this password will be saved in .vault_password_hooks, in your project’s root.

If .vault_password_hooks is listed in .gitignore, this file will persist and you won’t be asked for a password anymore for encryption as well as for decryption. Otherwise, .vault_password_hooks will be erased after encryption to avoid committing the file.

After committing the files, the post-commit hook will use the same password to decrypt the previously encrypted files.

TL;DR: add .vault_password_hooks to your .gitignore, add # -*- vault: true; -*- to files that require encryption and you’re set.

You end up with a workflow where your files are transparently encrypted before commit and decrypted after.

Hooks

Put the hooks in .git/hooks/ and don’t forget to make them executable (chmod +x {pre,post}-commit).

Transparent encryption/decryption with ansible vault was originally published by Michel Blanc at Random stuff on May 25, 2015.

Making dynamic inventory usable with Ansible and Digital Ocean

2015-05-11T00:00:00-00:00

The problem

You’ve been there too. Spinning up droplets on DigitalOcean with Ansible and using a dynamic inventory script is quite a pain.

Most approaches use the digital_ocean ansible module in playbooks to spin up droplets, along with the digital_ocean.py dynamic inventory script, using this kind of workflow:

define your droplets in a YAML file (optionally with size, region, etc…)
create a playbook that will loop over the droplet list (with_items or equivalent) and spin up each droplet
dynamically add the started droplets to the inventory

This approach has many drawbacks, and, to be honest, is not really usable.

Slooooooow

First, it is damn slow. Droplet creation is serialized. Since digital_ocean waits for the droplet to come up, and since DO itself advertises ‘Start your droplet in 55 seconds !’, you can do the math. Starting a single droplet already takes a while, so spinning up your multi-tier, fault-tolerant, distributed architecture will take ages.

You probably can use async + poll to spin up the droplets. I didn’t try and don’t know where this would lead. But you’d still face the other issues.

Naming

Your droplets won’t have real names. They will be known by their IPs. Sure, if you use the name parameter during creation, you might be able to use it, but at best, this will be a group name.

You could also use add_host in your bootstrapping script, but this is a run-time hack, so forget about setting variables in host_vars.

Since droplets are mostly nameless, grouping them is hard. Sure, you can do it at run time with add_host too, but you won’t be able to leverage group_vars.

Anyway, all those run-time naming hacks will force you to loop over all your droplet definitions, hit the DO API to make sure they’re alive, then loop over the API responses to add hosts and groups EVERY time you execute a playbook.

localhost is forced in

Spinning up instances on DO requires running the digital_ocean module as a local_action or with delegate_to: localhost. This means that you are bound to declare localhost in your inventory. This is a real pain, since it makes the all group mostly unusable, unless you change all your playbook hosts definitions from hosts: all to hosts: all:!localhost. Pretty bad for readability.

Let’s stop here, there are already enough reasons to look for an alternate way. There are probably other cons, and certainly pros too, for the dynamic approach, but I feel that this way of doing it is barely usable for serious, repeatable stuff.

Alternate approach

In the end, we would like to work as we do with on-prem hardware: have a static inventory.

The idea is to create this static inventory first, and then use a bootstrapping script that will use this inventory as a contract to apply on DigitalOcean.

The script will list all hosts in your inventory (using ansible --list-hosts), and parallelize droplet creation on DigitalOcean.

When all droplets are created, it will create a complementary inventory file in your inventory directory containing hosts with their respective IPs.

At this point, you have a perfectly static inventory, and can run your ansible playbook normally, without hitting external APIs (serialized), without naming problems, … Things are just normal, fast and reliable, without edge cases introduced by dynamic inventories.

Using this approach on an 8-droplet setup, the time to set up instances went from 9’33” down to 1’56”, and the time to destroy instances went from 0’55” down to 0’3” (see demo below). Of course, the more droplets, the bigger the gain.

And these are just create/destroy gains. You also benefit from the static inventory for all your lifecycle playbook runs, since you never hit the DO API and don’t have to build the inventory at run time, which is always slower despite the inventory cache.
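The speed-up comes from starting every droplet at once and then waiting for all of them. The sketch below shows that fan-out pattern in Ruby with threads. It is only an illustration: create_droplet is a stand-in that just sleeps, not a real DigitalOcean call, and the host names are made up.

```ruby
# Stand-in for the real API call; sleeping simulates waiting for a droplet.
def create_droplet(name)
  sleep 0.2
  "#{name} ready"
end

hosts = %w[www1 www2 www3 db1]

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# Start every creation at once, then wait for all of them (Thread#value joins).
results = hosts.map { |host| Thread.new { create_droplet(host) } }.map(&:value)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results
puts format("elapsed: %.1fs for %d droplets", elapsed, hosts.size)
```

Run serially, the four simulated creations would add up to four times the single wait; in parallel, the total stays close to the slowest one.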

Example

Assuming you have an inventory directory in inventories/devel/, containing a hosts file, you can spin up your droplets like this:

do_boot.sh inventories/devel/

When you’re finished with your infrastructure, call the same command with the deleted parameter:

do_boot.sh inventories/devel/ deleted

That’s all.

The script has defaults regarding droplet size, region, image and ssh key. You can change the defaults in the script to something that suits you, and override these defaults per droplet in your inventory:

[www]
www1
www2
www3 do_region=2

[database]
db1 do_size=62 do_image=12345

[redis]
redis1

[elastic]
elastic1 do_size=60
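The override rule can be pictured with a small Ruby sketch. This is not how the bootstrapping script does it (the script is written in bash); droplet_settings is a hypothetical helper, and the default values are simply copied from the script's defaults.

```ruby
# Defaults copied from the script; change them to suit your account.
DEFAULTS = {
  "size" => "66", "region" => "5", "image" => "9801950", "key" => "785648"
}.freeze

# Hypothetical helper: collect the do_<name>=<digits> pairs found on an
# inventory line and lay them over the defaults.
def droplet_settings(inventory_line)
  overrides = inventory_line.scan(/\bdo_(\w+)=(\d+)/).to_h
  DEFAULTS.merge(overrides)
end

p droplet_settings("db1 do_size=62 do_image=12345")
p droplet_settings("redis1")
```

A host with no do_ variables, like redis1, simply gets the defaults.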

Spinning up and down 8 droplets in 2’15”

Script

You can grab the script in this gist.

UPDATE: if you run Ansible v2.0+, use this script instead. It uses the new DigitalOcean API (v2.0 too). You just need to set DO_API_TOKEN.

#!/bin/bash
#
# Change defaults below
# ---------------------
#
# Digital Ocean default values
# You can override them using do_something in your inventory file
# Example:
# 
# [www]
# www1 do_size=62 do_image=12345
# ...
#
# If you don't override in your inventory, the defaults below will apply
DEFAULT_SIZE=66       # 512mb (override with do_size)
DEFAULT_REGION=5      # ams2 (override with do_region)
DEFAULT_IMAGE=9801950 # Ubuntu 14.04 x64 (override with do_image)
DEFAULT_KEY=785648    # SSH key, change this ! (override with do_key)

# localhost entry for temporary inventory
# This is a temp inventory generated to start the DO droplets
# You might want to change ansible_python_interpreter
LOCALHOST_ENTRY="localhost ansible_python_interpreter=/usr/bin/python2" 

# Set state to present by default
STATE=${2:-"present"}

# digital_ocean module command to use
# name, size, region, image and key will be filled automatically
COMMAND="state=$STATE command=droplet private_networking=yes unique_name=yes"
# ---------------------

function bail_out {
  echo "$1"
  echo -e "Usage: $0 <inventory_directory> [present|deleted]\n"
  echo -e "\tinventory_directory: the directory containing the inventory goal (compulsory)"
  echo -e "\tpresent: the droplet will be created if it doesn't exist (default)"
  echo -e "\tdeleted: the droplet will be destroyed if it exists"
  exit 1
}

# Check that inventory is a directory
# We need this since we generate a complementary inventory with IP addresses for hosts
INVENTORY=$1
[[ -d "$INVENTORY" ]] || bail_out "Inventory does not exist, is not a directory, or is not set"
# -n is true when the variable is set and non-empty
[[ -n "$DO_CLIENT_ID" ]] || bail_out "DO_CLIENT_ID not set"
[[ -n "$DO_API_KEY" ]]   || bail_out "DO_API_KEY not set"

# Get a list of hosts from inventory dir
HOSTS=$(ansible -i "$INVENTORY" --list-hosts all | awk '{ print $1 }' | tr '\n' ' ')

# Clean up previously generated inventory (-f: no error if it is missing)
rm -f "$INVENTORY/generated"

# Creating temporary inventory with only localhost in it
TEMP_INVENTORY=$(mktemp)
echo "Creating temporary inventory in $TEMP_INVENTORY"
echo "$LOCALHOST_ENTRY" > "$TEMP_INVENTORY"

# Create droplets in //
for i in $HOSTS; do 
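  # grep -w matches whole words only (www1 must not match www10);
  # [0-9]* replaces \d*, which sed does not treat as a digit class
  SIZE=$(grep -w "$i" "$INVENTORY/hosts" | grep do_size | sed -e 's/.*do_size=\([0-9]*\).*/\1/')
  REGION=$(grep -w "$i" "$INVENTORY/hosts" | grep do_region | sed -e 's/.*do_region=\([0-9]*\).*/\1/')
  IMAGE=$(grep -w "$i" "$INVENTORY/hosts" | grep do_image | sed -e 's/.*do_image=\([0-9]*\).*/\1/')
  KEY=$(grep -w "$i" "$INVENTORY/hosts" | grep do_key | sed -e 's/.*do_key=\([0-9]*\).*/\1/')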

  SIZE=${SIZE:-$DEFAULT_SIZE}
  REGION=${REGION:-$DEFAULT_REGION}
  IMAGE=${IMAGE:-$DEFAULT_IMAGE}
  KEY=${KEY:-$DEFAULT_KEY}

  if [ "${STATE}" == "present" ]; then
    echo "Creating $i of size $SIZE using image $IMAGE in region $REGION with key $KEY"
  else
    echo "Deleting $i"
  fi
  # echo " => $COMMAND name=$i size_id=$SIZE image_id=$IMAGE region_id=$REGION ssh_key_ids=$KEY"
  ansible localhost -c local -i $TEMP_INVENTORY -m digital_ocean \
    -a "$COMMAND name=$i size_id=$SIZE image_id=$IMAGE region_id=$REGION ssh_key_ids=$KEY" &
done

wait

# Now do it again to fill up complementary inventory
if [ "${STATE}" == "present" ]; then
  for i in $HOSTS; do 
    echo Checking droplet $i
    IP=$(ansible localhost -c local -i $TEMP_INVENTORY -m digital_ocean -a "state=present command=droplet unique_name=yes name=$i" | grep "\"ip_address" | awk '{ print $2 }' | cut -f2 -d'"')
    echo "$i ansible_ssh_host=$IP" >> $INVENTORY/generated
  done
fi

echo "All done !"

Making dynamic inventory usable with Ansible and Digital Ocean was originally published by Michel Blanc at Random stuff on May 03, 2015.

Testing Ansible roles, part 2

2015-03-15T00:00:00+00:00

Now that we have created our basic role in part 1, we need to set up a Vagrant machine and some tooling to run our tests.

Creating the Vagrant machine

Vagrantfile

To spin up a Vagrant machine, we need to create a Vagrantfile. We’ll create it in our role’s top directory:

Vagrant.configure(2) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.define "nginx" do |nginx|
  end
  config.vm.provision "shell",
    :path => "vagrant_specs.sh",
    :upload_path => "/home/vagrant/specs",
    # change role name below
    :args => "--install ansible-nginx"
end

You can change config.vm.box to another Vagrant box that better suits your needs, but keep in mind that RoleSpec is strongly Debian/Ubuntu oriented. We’ll provision this machine with a shell script (not with Ansible, so we don’t end up in an Inception-style situation).

Provisioning script

The provisioning script, vagrant_specs.sh, serves two purposes:

it takes care of installing RoleSpec and setting up the test directory when called with --install. This happens only at Vagrant provisioning time (e.g. vagrant up or vagrant provision)
it can be called to run the test suite; to make invocation easier, it installs itself as /usr/local/bin/specs

Create the vagrant_specs.sh with the following content:

#!/bin/bash
#
# Vagrant provisioning script
#
# Usage for provisioning VM & running (in Vagrantfile):
# 
# script.sh --install <role>
#
# e.g. : 
# script.sh --install ansible-nginx
# 
# Usage for running only (from host):
#
# vagrant ssh -c specs
#
if [ "x$1" == "x--install" ]; then
  mv ~vagrant/specs /usr/local/bin/specs
  chmod 755 /usr/local/bin/specs
  sudo apt-get install -qqy git
  su vagrant -c 'git clone --depth 1 https://github.com/nickjj/rolespec'
  cd ~vagrant/rolespec && make install
  su vagrant -c 'rolespec -i ~/testdir'
  su vagrant -c "ln -s /vagrant/ ~/testdir/roles/$2"
  su vagrant -c "ln -s /vagrant/tests/$2/ ~/testdir/tests/"
  exit
fi

cd ~vagrant/testdir && rolespec -r $(ls roles) "$*"

and make it executable (chmod +x vagrant_specs.sh).

Running the Vagrant box

Now, let’s check this ! It might take a while if you don’t already have the vagrant image on your box:

$ vagrant up
Bringing machine 'nginx' up with 'virtualbox' provider...
==> nginx: Importing base box 'ubuntu/trusty64'...
==> nginx: Matching MAC address for NAT networking...
==> nginx: Checking if box 'ubuntu/trusty64' is up to date...
==> nginx: Setting the name of the VM: ansible-nginx_nginx_1426331325901_88232
...
==> nginx: Cloning into 'rolespec'...
==> nginx: Installing RoleSpec scripts in /usr/local/bin ...
==> nginx: Installing RoleSpec libs in /usr/local/lib/rolespec ...
==> nginx: Initialized new RoleSpec directory in /home/vagrant/testdir
$

Creating tests

We’re almost done. Only two files left to create. First, RoleSpec needs an inventory. Nothing fancy here: we just need to create an inventory file with a single host, placeholder_fqdn, and RoleSpec will take care of the rest:

$ echo "placeholder_fqdn" > tests/ansible-nginx/inventory/hosts

Writing the test file

And finally, we need a test file, where we can check if our playbook works. We can check the syntax, the idempotency, the resulting templates, etc…

This test file is simply a bash script, in which we include some RoleSpec files to get access to its DSL.

Let’s start with a simple one, and create tests/ansible-nginx/test with the following content:

#!/bin/bash
# -*- bash -*-

# This gives you access to the custom DSL
. "${ROLESPEC_LIB}/main"

# Install a specific version of Ansible
install_ansible "v1.8.3"

# Check syntax first, and then that the playbook runs
assert_playbook_runs

# Check that the playbook is idempotent
assert_playbook_idempotent

Don’t forget to make the test file executable (chmod +x tests/ansible-nginx/test).

Running tests

Our simple tests are set up. To run them, we need to execute /usr/local/bin/specs inside the Vagrant machine.

vagrant ssh -c 'specs'

RoleSpec will then download Ansible (version 1.8.3 since this is what we asked for), install it, and run our test case.

As you can see in the recording, RoleSpec:

installs Ansible (ROLESPEC: [Install Ansible - v1.8.3])
executes the playbook with assert_playbook_runs (TEST: [Run playbook syntax check] and TEST: [Run playbook])
checks that the playbook is idempotent with assert_playbook_idempotent (TEST: [Re-run playbook])

Pretty neat !

Running tests faster

There is one downside though: it takes almost 3 minutes to run. However, you can speed up subsequent runs as long as you don’t have to change the Ansible version: since Ansible is already installed, there is no need to install it again every time. Using the -p option will run in playbook mode, which means it will only run assert_playbook_runs test.

vagrant ssh -c 'specs -p'

Only 25 seconds: we cut the runtime by a factor of about six. Not bad.

Local continuous integration

Now that we have reasonable playbook test run time, we can add local continuous integration to our setup. We will use Guard for this.

Assuming you have a Ruby environment set up, just install the guard and guard-shell gems.

gem install guard guard-shell --no-ri --no-rdoc

Then create a Guardfile in the role’s top directory, with the following content:

# -- -*- mode: ruby; -*-
guard :shell do
  watch(%r{^(?!tests).*/.*\.yml$}) do |m|
    puts "#{m[0]} changed - running tests"
    system('vagrant ssh -c "specs -p"')
  end
end

This file asks guard to execute vagrant ssh -c "specs -p" every time it detects a change in a file ending with .yml in the project’s subdirectories. Note that we excluded the tests directory, since RoleSpec generates a test.yml playbook file somewhere in it at run time. If we don’t exclude it from the guard watch, the tests will loop forever.
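To see which paths the watch pattern reacts to, you can try it in plain Ruby, without Guard. The regular expression is the one from the Guardfile; the sample paths are made up for illustration.

```ruby
# Same pattern as in the Guardfile: .yml files in subdirectories,
# except anything whose path starts with "tests".
WATCHED = %r{^(?!tests).*/.*\.yml$}

SAMPLE_PATHS = %w[
  tasks/main.yml
  defaults/main.yml
  tests/ansible-nginx/test.yml
  templates/nginx.conf.j2
  site.yml
].freeze

SAMPLE_PATHS.each do |path|
  verdict = WATCHED.match?(path) ? "triggers a run" : "ignored"
  puts format("%-30s %s", path, verdict)
end
```

Note that a .yml file sitting directly in the role’s top directory does not match either, since the pattern requires a / in the path.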

Now run guard, change a file (e.g. touch tasks/main.yml), and see what happens.

In the next part, we will add some more tests, and see what we can do with RoleSpec.

Testing Ansible roles, part 2 was originally published by Michel Blanc at Random stuff on March 15, 2015.

Testing Ansible roles, part 1

2015-03-14T00:00:00+00:00

RoleSpec does a great job of helping you test your roles. It is maintained and used primarily to test the DebOps role suite by the fine folks hanging out in the #debops IRC channel. RoleSpec handles all the boilerplate needed to run tests (installing the right version of Ansible, adjusting paths, taking care of the inventory, wrapping your role in a playbook, …) and provides a simple DSL to write tests.

However, in its current state, RoleSpec is mostly intended to run a test suite on Travis, and this test suite lives separately from your role.

I personally prefer to keep my role tests alongside the Ansible role, in a tests directory.

We’ll see below how to achieve this with RoleSpec and Vagrant. We’ll also use Guard to continuously test our role while writing it.

A simple role

Let’s start by creating a simple nginx role:

mkdir -p ansible-nginx/{defaults,handlers,tasks,templates,tests/ansible-nginx/inventory}

The tests directory will be used for our tests later.

If you already have a role that you want to convert, create the tests/ansible-nginx/ directory and skip straight to part 2.

Defaults

In defaults/main.yml, we’ll declare a few default values for our role. We won’t do much in our role, just install nginx and set a few variables, so let’s keep this simple:

nginx_root: /var/lib/nginx/
nginx_worker_connections: 1024
nginx_ie8_support: yes
nginx_port: 80

Handlers

For the handlers part, handlers/main.yml will contain a basic restart handler, followed by a port check for good measure:

- name: Restart nginx
  action: service name=nginx state=restarted
  notify: Check nginx

- name: Check nginx
  wait_for: port={{ nginx_port }} delay=5 timeout=10

Tasks

Now the task part. I always put my tasks in a separate file, and include this file from main.yml. This trick will allow you to set a tag for the whole included file, like so:

- include: nginx.yml tags=nginx

And then, in nginx.yml, put the real tasks:

- name: Adds nginx ppa
  apt_repository:
    repo=ppa:nginx/stable

- name: Adds PPA key
  apt_key: 
    url=http://keyserver.ubuntu.com:11371/pks/lookup?op=get&search=0x00A6F0A3C300EE8C
    state=present

- name: Installs nginx
  apt:
    pkg=nginx-full
    state=latest

- name: Writes nginx.conf
  template: 
    src="../templates/nginx.conf.j2"
    dest=/etc/nginx/nginx.conf
    validate='nginx -tc %s'
  notify:
  - Restart nginx

- name: Replaces nginx default server
  template:
    src="../templates/default.j2"
    dest=/etc/nginx/sites-available/default
  notify:
    - Restart nginx

Templates

We just need to add 2 templates, and our role will be ready. The first one is the main nginx.conf.j2 file:

user www-data;
worker_processes {{ ansible_processor_count }};

pid         /var/run/nginx.pid;

events {
    worker_connections {{ nginx_worker_connections }};
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##
    sendfile    on;
    tcp_nopush  on;
    tcp_nodelay on;

    # SSL stuff
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

{% if nginx_ie8_support %}
    ssl_ciphers "ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4";
{% else %}
    ssl_ciphers "EECDH+ECDSA+AESGCM:EECDH+aRSA+AESGCM:EECDH+ECDSA+SHA384:EECDH+ECDSA+SHA256:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH+aRSA+RC4:EECDH:EDH+aRSA:!RC4:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXPORT:!PSK:!SRP:!DSS";
{% endif %}

    ssl_session_cache shared:SSL:32m;
    ssl_buffer_size 8k;
    ssl_session_timeout 10m;

    keepalive_timeout     65;
    types_hash_max_size 2048;

    server_tokens off;

    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    ##
    # Logging Settings
    ##
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##
    gzip on;
    gzip_disable "msie6";
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_types  application/javascript 
                application/json 
                application/x-javascript 
                application/xml 
                application/xml+rss 
                image/svg+xml
                text/css 
                text/plain
                text/xml 
                text/javascript;

    ##
    # If HTTPS, then set a variable so it can be passed along.
    ##
    map $scheme $server_https {
        default off;
        https on;
    }

    ##
    # Virtual Host Configs
    ##
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

The file is a bit long, but it just contains basic settings. Note that we’re aligning the number of worker processes to the number of processors reported by Ansible for the host.

We are also switching cipher suites depending on whether we want to support IE8 or not.

Then, we just add a default virtualhost on our server:

server {
  listen {{ nginx_port }};

  root {{ nginx_root }};
  index index.html index.htm;

  # Make site accessible from http://localhost/
  server_name _;

  location / {
    try_files $uri $uri/ /index.php?q=$uri&$args;
  }

  error_page 404 /404.html;

  # redirect server error pages to the static page /50x.html
  #
  error_page 500 502 503 504 /50x.html;
  location = /50x.html {
    root /usr/share/nginx/html/;
  }

  # deny access to .htaccess files, if Apache's document root
  # concurs with nginx's one
  #
  location ~ /\.ht {
    deny all;
  }
}

Our role is now ready. We can now set up the tooling for our tests, as explained in part 2.

Testing Ansible roles, part 1 was originally published by Michel Blanc at Random stuff on March 14, 2015.

Invalidating REDIS cache from Ruby

2015-03-01T00:00:00-00:00

REDIS as Padrino cache

Using REDIS as an application cache is very handy. You can easily use it in, say, Padrino like this:

module MyApp
  class App < Padrino::Application
    enable :caching
    set :cache, Padrino::Cache.new( :Redis, 
                                    :host => ENV['REDIS_SERVER'],
                                    :port => ENV['REDIS_PORT'],
                                    :db => 0)
  end
end

MyApp::App.controllers :test do
  get :index, :cache => true do
    cache_key { current_account.email + ":test:index" }
    expires 3600
    @title = "Some page with expensively computed values"
    render 'test/index'
  end
end

Note that we have a specific cache entry for each user (current_account.email). For instance, a user with email foo@bar.com will have this entry cached at foo@bar.com:test:index

Cache invalidation

Now, sometimes you need to expire the cache forcibly. For instance, if you know you’ve changed something in the database and don’t want stale data to be served, you can invalidate the cache manually. Or maybe you want to invalidate a complete user cache at login time.

However, this is not easy in our case, since we want to remove all entries matching *:test:index (or foo@bar.com:* if we want to completely wipe out the user cache).

The first idea that comes to mind is to use the Redis KEYS command that can accept globs to match key names, like KEYS foo@bar.com:*.

But in the documentation¹, you’ll find a big fat warning about KEYS:

Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don’t use KEYS in your regular application code. If you’re looking for a way to find keys in a subset of your keyspace, consider using SCAN or sets.

Scary as it sounds.

Cursors to the rescue

REDIS has shipped a nice but not very well-known feature since v2.8: SCAN. The SCAN command is a cursor-based iterator. You give it a key pattern, and every time you call it, it returns the next batch of matching keys along with a cursor to pass to the next call. A returned cursor of 0 means the iteration is complete.
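To make the cursor contract concrete, here is a toy Ruby sketch that needs no Redis server. FakeRedis is an invented in-memory stand-in: its scan only imitates the shape of the real call (a cursor in, a [next_cursor, keys] pair out), and File.fnmatch is used as an approximation of Redis glob matching.

```ruby
# Invented in-memory stand-in for a Redis client (not part of redis-rb).
class FakeRedis
  def initialize(keys, page_size = 2)
    @keys = keys
    @page_size = page_size
  end

  # Looks at one page of the keyspace per call and returns
  # [next_cursor, matching_keys]; a next_cursor of "0" means done.
  def scan(cursor, match:)
    start = cursor.to_i
    page = @keys[start, @page_size] || []
    next_start = start + @page_size
    next_cursor = next_start >= @keys.size ? "0" : next_start.to_s
    [next_cursor, page.select { |key| File.fnmatch(match, key) }]
  end
end

# Keep calling scan until the returned cursor is back to "0".
def keys_matching(redis, pattern)
  cursor = "0"
  found = []
  loop do
    cursor, keys = redis.scan(cursor, match: pattern)
    found.concat(keys)
    break if cursor == "0"
  end
  found
end

redis = FakeRedis.new(%w[
  foo@bar.com:test:index baz@qux.org:test:index
  foo@bar.com:home:index baz@qux.org:home:index
  foo@bar.com:profile
])
puts keys_matching(redis, "foo@bar.com:*")
```

A call may return no keys at all while the cursor is still non-zero, which is why the loop tests the cursor and not the key list.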

Here is a piece of code that can invalidate keys matching a wildcard from Padrino:

MyApp::App.controllers :test do
  define_method :invalidate_cache_like do |wildcard|
    r = Redis.new(:host => ENV['REDIS_SERVER'], :port => ENV['REDIS_PORT'])
    cursor = nil

    while cursor != "0" do
      cursor, keys = r.scan(cursor || 0, match: wildcard)

      keys.each do |k|
        r.del(k)
      end
    end
  end
end

You can now easily invalidate all cache entries matching a wildcard by calling invalidate_cache_like.

For instance, at user login, you could call:

invalidate_cache_like "#{current_account.email}:test:index"

and the user cache is now cleared.

Benchmarking

Let’s play with Benchmark a bit to compare SCAN and KEYS performance on a moderately sized database. While we’re at it, we’ll also check these commands using redis and hiredis drivers, to see if it makes any difference.

I used the following piece of code for that:

#!/usr/bin/env ruby

require 'hiredis'
require 'em-synchrony'
require 'redis'
require 'benchmark'

def build_cache(redis)
  ('aaaa'..'zzzz').each do |s|
    redis.set(s, 1)
  end
end

def invalidate_cache_cursor(redis, wildcard)
  cursor = nil

  while cursor != "0" do
    cursor, keys = redis.scan(cursor||0, { match: wildcard})

    keys.each do |k|
      redis.del(k)
    end
  end
end

def invalidate_cache_keys(redis, wildcard)
  redis.keys(wildcard).each do |k|
    redis.del(k)
  end
end

hiredis = Redis.new(:driver => :hiredis)
redis = Redis.new()

Benchmark.bm(22) do |x|
  [:ruby, :hiredis].each do |d|
    r = Redis.new(:driver => d)
    build_cache(r)
    x.report("looping (#{d}):") {
      ('aaa'..'aaz').each do |l|
        invalidate_cache_keys(r, "#{l}*")
      end
    }
    build_cache(r)
    x.report("scanning (#{d}):") {
      ('aaa'..'aaz').each do |l|
        invalidate_cache_cursor(r, "#{l}*")
      end
    }
  end
end

After a few minutes running, I got those surprising results:

$ ./redis-expire-wildcard.rb 
                            user     system      total        real
looping (ruby):          0.040000   0.010000   0.050000 (  1.059056)
scanning (ruby):        49.000000  11.490000  60.490000 ( 61.113561)
looping (hiredis):       0.020000   0.010000   0.030000 (  1.073681)
scanning (hiredis):     19.680000  12.880000  32.560000 ( 44.972220)

First, there is not much improvement using hiredis over the plain ruby driver when looping in our case. This makes sense, since we loop only 26 times here and the hiredis performance benefit doesn’t show with so few commands (hiredis does a much better job if you change the tested range so that more commands are issued).

Second, using SCAN here is much slower than using KEYS !

So why use SCAN instead of KEYS ? The problem with KEYS is that it will block your server while retrieving all the keys. The cursor based approach will return small chunks of keys and won’t block the server for the time of a whole key scan.

However, handling cursor based expiration can be tricky in a web application. Since it takes so much longer (but is friendlier to Redis), you might have to handle it in a separate task from your application process (in Sidekiq for instance).

It all depends on your app. You can start by simply using KEYS, but keep in mind that cursors will be needed if usage or concurrent traffic rises, and monitor your Redis statistics to spot that moment.

http://www.padrinorb.com/ ↩

Invalidating REDIS cache from Ruby was originally published by Michel Blanc at Random stuff on March 01, 2015.

Restoring Arch bootloader for the future self

2015-02-14T00:00:00+00:00

Grab latest Arch, create a bootable key (do this before you’re doomed)¹.
Press F2 at boot, change the boot order to start on the key in UEFI mode
Boot on Arch, then

Cross fingers…

In case you need to reinstall all the things²:

sudo pacman -Sy `yaourt -Q | grep -v '^aur'| grep -v '^local' | cut -f2 -d'/' | awk '{ print $1 }'`

Multisystem is pretty handy for this. You can put several OSes on the key, and choose what to boot. http://liveusb.info/dotclear/ ↩
if you don’t use yaourt, remove the grep part ↩

Restoring Arch bootloader for the future self was originally published by Michel Blanc at Random stuff on February 14, 2015.
