Test-driven Ops – Part 1 – QA Containers

In this series, I'll talk about how we empowered the QA role and leveraged testing practices to make the dev team's lives easier and our software and infrastructure more robust.

Why?

We use gitflow as our branching model and GitHub to host our repositories. Code review is done in pull requests, and QA verification used to be done post-merge. This inevitably let bugs contaminate our develop branch, which in turn made releases harder and harder to push out as the debt piled up. Because each sprint had a certain theme, stories overlapped in the code they touched, so bugs were hard to isolate, and complex, irreproducible bugs sometimes surfaced in the QA environment. QA Engineers also had to wait until the end of a sprint for all code to be merged, then work in intense bursts before a release to verify every feature, which was a very inefficient use of an already bottlenecked resource. Once we were aware of this, we knew something had to change. We were already running the full unit test suites on pull requests at this point, but we wanted more. That is how the idea of QA Containers came about.

Empowering QA

To keep bugs out of the develop branch, and thus keep it always releasable just like our master, we had to move the QA verification step to pre-merge, and we wanted QA Engineers to be the authority on feature merges. This required deploying the code in each pull request to its own environment for both automated and manual tests. From a risk management perspective, creating an AWS EC2 instance every time we verified a feature could have been problematic because of idle and unused VMs. We therefore decided to use LXC to create lightweight VMs inside a statically provisioned EC2 instance. These resources were managed by a Python cron job, which released a container's resources after its pull request was merged.
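A minimal sketch of the decision such a cleanup job might make, assuming one container per pull request. The original script is not shown here, so the names (`Container`, `containers_to_release`) and the state strings are hypothetical, and the actual calls to GitHub and LXC are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str        # e.g. "exmp-113" (hypothetical naming scheme)
    pr_number: int   # pull request the container was created for


def containers_to_release(containers, pr_states):
    """Return the containers whose pull request has been merged.

    pr_states maps a pull request number to a state string such as
    "open" or "merged". How that mapping is fetched is outside this
    sketch. Pull requests missing from the mapping are treated as open,
    so their containers are kept.
    """
    return [
        c for c in containers
        if pr_states.get(c.pr_number, "open") == "merged"
    ]


if __name__ == "__main__":
    running = [Container("exmp-113", 41), Container("exmp-120", 42)]
    states = {41: "merged", 42: "open"}
    for container in containers_to_release(running, states):
        # A real job would stop and destroy the container here.
        print(f"release {container.name}")
```

Keeping the decision in a pure function makes it easy to test without touching any real containers. A production version would probably also want to handle pull requests that are closed without being merged.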

Original and Altered Flow of Development

A Step by Step Example

To make clear what this process change gives us, let's go over an example scenario step by step.

Say the ticket name of our example feature is EXMP-113. Following gitflow conventions, its development would be carried out on the feature/EXMP-113 branch.

  1. When the feature is deemed sufficiently developed, the developer creates a pull request.
  2. A Jenkins job that executes the unit tests is triggered.
  3. Code review is carried out.
  4. "Deploy this please" is typed in the pull request comments.
  5. A Jenkins job creates an on-demand LXC container and deploys the pull request merge result.
  6. When this is successful, Jenkins posts the resulting URL back to the pull request comments (e.g. http://exmp-113.qa.mycompany.int).
  7. The QA Engineer executes whatever tests are deemed important.
  8. The QA Engineer merges the pull request, or gives the ticket back to the developer if a bug is spotted.
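
Steps 4 to 6 can be sketched as two small helpers: one that recognizes the trigger comment, and one that derives the container URL from the branch name. This is only an illustration. The function names, the case-insensitive match, and the rule that the hostname is the lowercased ticket name are assumptions inferred from the example URL, not the original Jenkins job.

```python
TRIGGER_PHRASE = "deploy this please"
QA_DOMAIN = "qa.mycompany.int"  # taken from the example URL in step 6


def is_deploy_request(comment: str) -> bool:
    """Return True if a pull request comment asks for a QA container."""
    return TRIGGER_PHRASE in comment.lower()


def qa_url(branch: str) -> str:
    """Map a gitflow feature branch to its QA container URL.

    Assumes the hostname is simply the lowercased ticket name.
    """
    prefix = "feature/"
    if not branch.startswith(prefix):
        raise ValueError(f"not a feature branch: {branch!r}")
    ticket = branch[len(prefix):]
    if not ticket:
        raise ValueError("branch has no ticket name")
    return f"http://{ticket.lower()}.{QA_DOMAIN}"


if __name__ == "__main__":
    if is_deploy_request("Deploy this please"):
        print(qa_url("feature/EXMP-113"))
```

For the example ticket, `qa_url("feature/EXMP-113")` yields `http://exmp-113.qa.mycompany.int`, the URL Jenkins would post back to the pull request.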

Real Life Example

Results

This initial step greatly improved production stability. The number of hotfixes required dropped significantly, since it was impossible (barring ninja developers merging their own pull requests) to have an untested feature even on the develop branch. It also increased QA throughput and bugfix speed, since bugs were spotted in isolated branches, which narrowed the scope to a single patch set.

This approach was so effective and natural in a gitflow system that, in almost all the other projects we worked on, both QA Engineers and Developers demanded some implementation of it. This first application was only a naive, tailored implementation. It could have been generalized, with further, less obvious benefits that I will discuss in the upcoming post.