Right away, I will admit: CircleCI is not my favorite CI/CD tool. There has been an explosion of new-generation tooling that isn't all hype in this product space which has presented improved ergonomics, functionality, and pricing for developers compared to CircleCI.
However, sometimes, the correct choice isn't what we wan
t but what we have.
For my current engagement, the client had dozens of repositories successfully validating pull requests and deploying changes into their respective environments using CircleCI.
I was tasked with setting up a new repository and change management processes to support an enterprise-ready data engineering and machine learning platform. Given the client's longstanding familiarity with CircleCI as a platform and the task at hand, a monorepo seemed a suitable choice for encouraging code sharing and enforcing a consistent set of practices across business units.
And so, dear reader, I have identified and navigated all the foot-guns and false-starts so that you may learn from my begrudging, grumbling hours spent accomplishing this task using my not-favorite CI/CD tool.
The project structure
To begin, let's imagine a repository with the following structure:
$ tree -a myproject
myproject
├── .python-version
├── __init__.py
├── common
│ ├── common
│ │ └── __init__.py
│ ├── poetry.lock
│ └── pyproject.toml
├── poetry.lock
├── poetry.toml
├── pyproject.toml
├── subproject_one
│ ├── Dockerfile
│ ├── poetry.lock
│ ├── pyproject.toml
│ └── subproject_one
│ └── __init__.py
└── subproject_two
├── Dockerfile
├── poetry.lock
├── pyproject.toml
└── subproject_two
└── __init__.py
Two projects (subproject_one
, subproject_two
) are independently deployable services, both of which consume a common
package of library-level code.
I wanted to orchestrate a CI/CD pipeline such that merging changes to our main
branch would automatically deploy to staging and that deploying changes to a prod
branch would automatically deploy to production.
Further, I had build and validation steps that were common to all three directories and build, validation, and deployment steps that were unique to each directory.
Nothing exotic here - for example, I might run a linter across all files and I might build and push an image for subproject_one
to a particular Docker repository that is different from subproject_two
.
Don't fight their APIs
My initial inclination was to create three configuration files. One for tasks that might be common across all projects, for example, running the tests across a project or validating that the code is properly formatted. And two others, each corresponding to our subprojects, where we could place logic specific to those projects.
I recognized that this was not the official recommendation but attempted this (for a time) anyway. Orbs (CircleCI's word for packages) bundle functionality to 1) filter based on paths and 2) invoke a "continuation" of a pipeline in order to run another file. These can be stitched together to create separate files for each project, and I did this for a time. However, in retrospect, I would not recommend it, and I migrated away from this approach. It was finicky, error-prone, and difficult to maintain. You live and learn, right?
CircleCI's dynamic configuration prescribes creating two files: a config.yml
, where we can author jobs common to all projects and invoke our project-based workflows, and a continuation_config.yml
, where we can author our project-based jobs and workflows.
You may be wondering: but won't that become a huge mess of a file? Particularly if many subprojects are present in our monorepo, one file containing many mixed concerns would make most software engineers eager to refactor.
Well, you're right.
It could become a huge mess of a file.
But! I have identified a few techniques we can use to keep it modular, DRY (don't-repeat-yourself), and maintainable.
So where does that leave us?
First, we need a directory for our CircleCI configuration files and some proprietary setup:
├── .circleci
│ ├── config.yml
│ └── continue_config.yml
Your config.yml
file must include a setup: true
block alongside some CircleCI-specific configuration.
From there, we can move on to the aforementioned techniques.
Use the path-filtering orb, Luke
CircleCI's path filtering orb provides functionality for continuing a pipeline based on the paths of changed files.
Using the mapping
parameter, we can pass values to variables in our continuation configuration for usage in workflow when
clauses, providing a mechanism for triggering particular workflows.
In practice, this will look like:
version: 2.1
setup: true
orbs:
path-filtering: circleci/path-filtering@1.0.0
jobs:
validate-source-code:
steps: ...
workflows:
always-run:
jobs:
- validate-source-code
- path-filtering/filter:
name: check-updated-files
mapping: |
common/.* run-common-workflow true
subproject_one/.* run-subproject-one-workflow true
subproject_two/.* run-subproject-two-workflow true
base-revision: main
config-path: .circleci/continue_config.yml
---
parameters:
run-common-workflow:
type: boolean
default: false
run-subproject-one-workflow:
type: boolean
default: false
run-subproject-two-workflow:
type: boolean
default: false
---
workflows:
subproject-one:
when:
or:
- equal: [true, << pipeline.parameters.run-subproject-one-workflow >>]
- equal: [true, << pipeline.parameters.run-common-workflow >>]
jobs: ...
subproject-two:
when:
or:
- equal: [true, << pipeline.parameters.run-subproject-two-workflow >>]
- equal: [true, << pipeline.parameters.run-common-workflow >>]
jobs: ...
Notably, this provided the flexibility to run all workflows when a change occurred in common
and only run a particular workflow when changes occurred in its subdirectory.
Keep things DRY with the tooling available
YAML isn't a programming language, but it is a declarative configuration language with not-often explored advanced features.
Some of my favorite features to use are anchors, aliases, and merge keys.
Combined, they allow us to author re-usable snippets in our CircleCI template (and most yaml
documents in general):
common_settings: &common_settings
executor:
name: python/default
tag: 3.10.8
subproject_one_common_settings: &subproject_one_common_settings
working_directory: ~/myproject/subproject_one
<<: *common_settings
...
jobs:
subproject-one-validate:
<<: *subproject_one_common_settings
steps:
- myproject-checkout
- install-acme-cli
- validate
So, if you have repeated snippets of orchestration (and you likely do, given you're working in a monorepo), creating a common block of configuration, anchoring it, and then using that anchoring via aliases and merge keys allow us to write it once and run it everywhere, DRYing up your configuration file.
Use filters for branch-based logic
I am more familiar with the GitHub Actions style workflow triggers to invoke particular workflows based on branch conditions. CircleCI offers similar functionality via filters. For our example project, I wanted to create three different workflows based on branching.
First, for every pull request and merge, I wanted to run some common tasks (such as validating the change has no syntax errors).
Second, when a change was merged to main
, and had no git tag, I wanted to deploy it to a staging environment.
Third, when a change was merged to prod
and had a tag of the form v$.$.$
(such asv1.0.0
), I wanted to deploy it to the production environment.
In practice, this looks like:
stg-filters: &stg-filters
filters:
branches:
only: main
tags:
ignore: /.*/
prod-filters: &prod-filters
filters:
branches:
only: prod
tags:
only: /^v.*/
...
workflows:
subproject-one:
jobs:
- subproject-one-validate
- subproject-one-deploy-stg:
requires:
- subproject-one-validate
<<: *stg-filters
- subproject-one-deploy-prod:
requires:
- subproject-one-validate
<<: *prod-filters
Combined with the aforementioned anchoring, aliasing, and merge keys, we can compose a common set of branch-based rules to use in our workflows for each subproject included in our monorepo.
Don't be afraid to offload complex logic into scripts
If you're struggling to fit a complicated step into your job or workflow declarations, offload that logic into a script. This can be authored with bash, or even your favorite programming language, for example:
#!/usr/bin/env python
import os
NAME = os.environ["NAME"]
print(f"Hello, {NAME}!")
jobs:
- run:
name: Invoke your complicated logic
command: ~/myproject/.circleci/my_script.sh
environment:
NAME: Bob
For my purposes, this was helpful to orchestrate a sequence of steps that required the usage of an API client given my deployment target did not have a CircleCI orb available. I know I would rather debug a python script than a hobbling of bash in a CI configuration file when it (inevitably) breaks.
RTFM!
This sounds naive, but consulting the official documentation for a CircleCI configuration file proved to be the best source of information while exploring the tools available. Further, it informed me of what options were available to me and provided brief examples for their implementation.
Googling for answers tended to lead to outdated community answers. And using ChatGPT for CircleCI was often flat-out wrong. So, in this instance, doing things the old-fashioned way paid the most dividends.