Building Technoblather

February 08, 2023

What is this and why am I here?

First and foremost: I'll skip any references to ontology with a subtitle like that.

If you've ever worked with me, you've likely listened to me whinge about our (developers') lack of conscientiousness towards how the software we develop is operated. Are the logs any good? Do we expose any metrics? Is the code performant under load? And so forth. Anecdotally, most companies that employ developers make it someone else's problem to deal with whatever the developers produce. Unsurprisingly, that means we generally do a terrible job of it, and as a matter of professional practice, we ought to do better.

People smarter than me have identified this as a problem (and you can probably tell I agree with them). That recognition has given rise to a new approach to software development team practices called DevOps. Ignore the fact that colloquially DevOps has become a catch-all job role for operations, system administration, platform development, cloud development, and whatever you call writing pipelines (yaml configurator?). Ideally, a team practicing this methodology will have developers work across the entire application lifecycle, from development and test to deployment to operations, and develop a range of skills not limited to a single function [1].

Panacea! This sounds great and is a lofty goal for any technical team to achieve. In pursuit of developing the capability to walk-the-talk, I recently attained a Solutions Architect – Associate certification with Amazon and have an eye towards attaining the DevOps Engineer - Professional certification this year.

And so, this brings us to the main topic: I wanted a project that would both crystallize some of what I learned while studying for and taking the aforementioned certification exam, and let me engage in open learning via blogging as a public journal of my professional development. So, with the intention of using AWS services and industry-standard tooling, I made a blog.

Ok - so you made a blog

Not just any blog though - a good blog. But what does that mean? Blogs are meant to be read, and as such we'll want to optimize for SEO. From a technical perspective then, we should prioritize solutions that consider:

  • Performance: a blog should be really, really fast to deliver content.
  • Accessibility: a blog should be accessible to all users and should be highly available.
  • Trustworthiness: a blog should encrypt traffic, not spam users, not engage in dark-pattern UI distractions, and so on.
  • Machine-crawlability: a blog should include all the metadata search engines expect and be crawlable to get higher search rankings.

Furthermore, I wanted to utilize "best practices" from a DevOps perspective with my technical decisions. And so, to facilitate this:

  • For the blog engine, I chose Gatsby. Architecting the blog as static content means it's easy to cache and removes the need for any server-side infrastructure.
  • For operating the blog, I chose AWS. This is the industry standard for operating services in the cloud.
  • For configuring the infrastructure, I chose Terraform as an infrastructure-as-code solution. Likewise, the industry standard IaC solution.
  • To deploy the blog, I opted to use Github Actions to set up continuous integration and deployment pipelines. Github is already being used for version control, so Actions is a natural choice.

High level technoblather

Before delving into the details of the terraform declarations, I'd like to give an overview of the services used and how they relate to one another:

  • IAM for defining groups and policies to operate the solution.
  • Route53 for DNS management.
  • Certificate Manager for provisioning an SSL certificate.
  • Cloudfront for distributing and caching the blog.
  • S3 for storing terraform state and blog content.
  • SNS for publishing events related to operating the blog.
  • Cloudwatch for acting on events (e.g., alerting).

This will look familiar if you've ever hosted a static website via S3. Everything detailed in this blog post could just as well be applied to a single page application, e.g. a React app.

Visualized, this looks like:

technoblather's infrastructure diagram

Getting into the nitty-gritty

Now, let's get into the details. We'll begin by explaining the IAM user administering this account, move on to the services used from the front to the back of web traffic, and end with a brief overview of how the pipelines are set up. You can find all code referenced in this github repository.

Creating an IAM user for the project

You should never use your root account for provisioning resources for a project and instead embrace applying least-privilege permissions to an IAM user, group, or role [2]. For this project, I opted to create a separate IAM user with the following Amazon managed policies attached via my AdministratorAccess account.

  • AmazonS3FullAccess
  • CloudWatchFullAccess
  • CloudFrontFullAccess
  • AmazonSNSFullAccess
  • AmazonRoute53FullAccess
  • AWSCertificateManagerFullAccess

This isn't in the terraform declarations because I needed it prior to writing the terraform declarations (i.e., while I was figuring out how to do all this). In other words, a chicken-and-egg problem. In retrospect, I could have used an AdministratorAccess-designated account to create this IAM user via terraform and then assumed the created IAM user for all subsequent commands.
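
For the curious, a minimal sketch of what that retroactive approach could look like. This isn't in technoblather's repository and the user name is hypothetical; the policy names are the Amazon managed policies listed above:

# Hypothetical bootstrap declaration, applied with an AdministratorAccess profile.
resource "aws_iam_user" "technoblather" {
  name = "technoblather"
}

locals {
  managed_policies = [
    "AmazonS3FullAccess",
    "CloudWatchFullAccess",
    "CloudFrontFullAccess",
    "AmazonSNSFullAccess",
    "AmazonRoute53FullAccess",
    "AWSCertificateManagerFullAccess",
  ]
}

# Attach each managed policy to the user via its well-known ARN.
resource "aws_iam_user_policy_attachment" "technoblather" {
  for_each   = toset(local.managed_policies)
  user       = aws_iam_user.technoblather.name
  policy_arn = "arn:aws:iam::aws:policy/${each.value}"
}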

Authoring the infrastructure

Prepare yourself for lots of HCL.

We'll start with the module entry point (main.tf), the provider declarations (aws.tf), and the terraform variables (variables.tf).

# main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.16"
    }
  }

  backend "s3" {
    bucket = "technoblather-terraform"
    key    = "tfstate"
    region = "ca-central-1"
  }

  required_version = ">= 1.2.0"
}

I opted to use ca-central-1 everywhere I could, since it's the region closest to me. You'll see shortly that this led to a few gotchas related to certificate management and Cloudfront.

# aws.tf

provider "aws" {
  region                   = "ca-central-1"
  shared_credentials_files = ["~/.aws/credentials"]
  profile                  = "technoblather"
}

provider "aws" {
  alias  = "acm_provider"
  region = "us-east-1"
}

Here's the aforementioned solution to those gotchas: the latter aws provider block specifies us-east-1 for services that aren't operable outside that region [3]. In short, an ACM certificate used with Cloudfront must be issued in us-east-1.

# variables.tf

variable "domain_name" {
  type        = string
  description = "The domain name for the website."
}

variable "bucket_name" {
  type        = string
  description = "The name of the bucket without the www. prefix. Normally domain_name."
}

variable "common_tags" {
  description = "Common tags you want applied to all components."
}

variable "alert_emails" {
  type        = list(string)
  description = "A list of emails for alerting via cloudwatch alarms."
}

No surprises here. One good practice I adopted was applying a uniform set of tags to every resource, since it makes winnowing down billing for this project much easier, e.g.:

common_tags = {
  Project = "technoblather"
}
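
For illustration, the variables can be supplied via a terraform.tfvars file along these lines - the email address is a placeholder, and bucket_name follows the "normally domain_name" convention noted above:

# terraform.tfvars (illustrative values)
domain_name = "technoblather.ca"
bucket_name = "technoblather.ca"

common_tags = {
  Project = "technoblather"
}

alert_emails = ["alerts@example.com"]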

Route53 and Certificate Manager

Moving on to DNS. The vast majority of my career has been in development, so DNS and SSL are topics where I've historically skimmed just enough of a how-to tutorial to accomplish my goal and moved on. I'm trying to make amends here - I now operate a few hobby projects (sole user) and know the difference between an A record and a CNAME record. I'm also smart (lazy) enough to delegate certificate renewal to an automated process or service.

# route53.tf

resource "aws_route53_zone" "main" {
  name = var.domain_name
  tags = var.common_tags
}

resource "aws_route53_record" "root-a" {
  zone_id = aws_route53_zone.main.zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.root_s3_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.root_s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "www-a" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "www.${var.domain_name}"
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.www_s3_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.www_s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "main" {
  for_each = {
  for dvo in aws_acm_certificate.ssl_certificate.domain_validation_options : dvo.domain_name => {
    name   = dvo.resource_record_name
    record = dvo.resource_record_value
    type   = dvo.resource_record_type
  }
  }
  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = aws_route53_zone.main.zone_id
}

Here, we have two declarations for A records (www.technoblather.ca and technoblather.ca), which let Route53 answer requests for our domain and route them to the appropriate service (Cloudfront). The third declaration creates the DNS records that ACM uses to validate the SSL certificate.

You'll notice a recurring duplication between a www resource and a root resource as we proceed down the service stack. This is because I needed two buckets in order to handle url redirection [4].

I didn't want a request to technoblather.ca to result in a 404, so root functions as a redirect to my www bucket.

Moreover, having a canonical url is a good practice [5], so two A records pointing to one Cloudfront distribution wasn't tenable.

# acm.tf

resource "aws_acm_certificate" "ssl_certificate" {
  provider                  = aws.acm_provider
  domain_name               = var.domain_name
  subject_alternative_names = ["*.${var.domain_name}"]
  validation_method         = "DNS"

  tags = var.common_tags

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_acm_certificate_validation" "cert_validation" {
  provider                = aws.acm_provider
  certificate_arn         = aws_acm_certificate.ssl_certificate.arn
  validation_record_fqdns = [for record in aws_route53_record.main : record.fqdn]
}

Here we're setting up SSL for our domain name. This is almost verbatim copied from the terraform documentation. DNS validation is the recommended approach and was easier than setting up email validation, which would have required receiving mail for my domain. Note the usage of aws.acm_provider to set the SSL certificate up via us-east-1. Validating the domain took some time (nearly an hour), which had me wondering whether I had misconfigured something.

Cloudfront

Cloudfront is a highly available, globally distributed CDN which we can leverage to serve our content with low latency. We use it to cache the content of our blog through a distribution, specifying the origin (S3), security parameters (e.g. SSL enforcement), geographic restrictions, and caching behaviours [6].

# cloudfront.tf

resource "aws_cloudfront_distribution" "www_s3_distribution" {

  origin {
    domain_name              = aws_s3_bucket.www_bucket.bucket_regional_domain_name
    origin_access_control_id = aws_cloudfront_origin_access_control.www_s3_origin_access_control.id
    origin_id                = "S3-www.${var.bucket_name}"
  }

  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "index.html"

  aliases = ["www.${var.domain_name}"]

  custom_error_response {
    error_caching_min_ttl = 0
    error_code            = 404
    response_code         = 200
    response_page_path    = "/404.html"
  }

  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-www.${var.bucket_name}"

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }


    function_association {
      event_type   = "viewer-request"
      function_arn = aws_cloudfront_function.add_index_cloudfront_function.arn
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 31536000
    default_ttl            = 31536000
    max_ttl                = 31536000
    compress               = true
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }

  tags = var.common_tags
}

resource "aws_cloudfront_origin_access_control" "www_s3_origin_access_control" {
  name                              = "www-s3-origin-access-control"
  description                       = ""
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

# Cloudfront S3 for redirect to www.
resource "aws_cloudfront_distribution" "root_s3_distribution" {
  origin {
    domain_name              = aws_s3_bucket.root_bucket.website_endpoint
    origin_id                = "S3-.${var.bucket_name}"

    # https://stackoverflow.com/a/55042824/4198382
    custom_origin_config {
      http_port              = "80"
      https_port             = "443"
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }
  }


  enabled         = true
  is_ipv6_enabled = true

  aliases = [var.domain_name]

  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-.${var.bucket_name}"

    forwarded_values {
      query_string = true

      cookies {
        forward = "none"
      }

      headers = ["Origin"]
    }

    viewer_protocol_policy = "allow-all"
    min_ttl                = 0
    default_ttl            = 86400
    max_ttl                = 31536000
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }

  tags = var.common_tags
}

resource "aws_cloudfront_function" "add_index_cloudfront_function" {
  name    = "add_index"
  runtime = "cloudfront-js-1.0"
  comment = ""
  publish = true
  code    = file("functions/wwwAddIndex.js")
}
// wwwAddIndex.js

function handler(event) {
  var request = event.request
  var uri = request.uri

  // Check whether the URI is missing a file name.
  if (uri.endsWith("/")) {
    request.uri += "index.html"
  }
  // Check whether the URI is missing a file extension.
  else if (!uri.includes(".")) {
    request.uri += "/index.html"
  }

  return request
}

Of note here: the origin is our S3 bucket, our root object is index.html, we restrict access to the S3 bucket by specifying an aws_cloudfront_origin_access_control such that only Cloudfront can access it, we enforce SSL, and we set up the cache properties (TTL, compression).

So, this means that all requests must be routed through Cloudfront - accessing the S3 bucket via the bucket url will result in a rejected request. HTTP requests will be redirected to HTTPS. Content is cached for as long as possible - the only time we want the cache to be invalidated is when a new deployment occurs (you'll see later that the CI/CD pipeline invalidates the distribution when deploying changes).

Note the differences between the www distribution and the root distribution. The www will be serving our content whereas the root serves as a redirect to the www.

A major "gotcha" was figuring out how to handle requests to a private bucket for files and directories that weren't the index.html (i.e., the blog root). I ended up having to utilize a Cloudfront Function to rewrite requests for url paths, appending an index.html to the end of them.

So, www.technoblather.ca/some-blog-post becomes www.technoblather.ca/some-blog-post/index.html behind the scenes, matching Gatsby's output build structure (and what's hosted in S3 as a result).

Via Google's Lighthouse tool, this configuration scores 100/100 for performance.

technoblather's lighthouse score

Not too shabby.

S3

Blog content is hosted in S3. One bucket contains the build output from Gatsby; the other serves as a redirect to the aforementioned bucket. Of note, the content bucket is private and only accessible to our specified Cloudfront distribution (the redirect bucket, as you'll see, is a public website endpoint). Versioning is also enabled in case of any "oopses" (disaster recovery, fat-fingering a delete).

# www_bucket.tf

resource "aws_s3_bucket" "www_bucket" {
  bucket = "www.${var.bucket_name}"
  tags   = var.common_tags
}

resource "aws_s3_bucket_acl" "blog_acl" {
  bucket = aws_s3_bucket.www_bucket.id
  acl    = "private"
}

resource "aws_s3_bucket_versioning" "blog_versioning" {
  bucket = aws_s3_bucket.www_bucket.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_policy" "blog_policy" {
  bucket = aws_s3_bucket.www_bucket.id
  policy = templatefile("templates/s3-private-policy.json", { bucket = "www.${var.bucket_name}", cloudfront_arn = aws_cloudfront_distribution.www_s3_distribution.arn })
}

resource "aws_s3_bucket_cors_configuration" "blog_cors_configuration" {
  bucket = aws_s3_bucket.www_bucket.id

  cors_rule {
    allowed_headers = ["Authorization", "Content-Length"]
    allowed_methods = ["GET", "POST"]
    allowed_origins = ["https://www.${var.domain_name}"]
    max_age_seconds = 3000
  }
}
# root_bucket.tf

resource "aws_s3_bucket" "root_bucket" {
  bucket = var.bucket_name
  tags   = var.common_tags
}

resource "aws_s3_bucket_policy" "root_blog_policy" {
  bucket = aws_s3_bucket.root_bucket.id
  policy = templatefile("templates/s3-public-policy.json", { bucket = var.bucket_name })
}

resource "aws_s3_bucket_acl" "root_acl" {
  bucket = aws_s3_bucket.root_bucket.id
  acl    = "public-read"
}

resource "aws_s3_bucket_website_configuration" "root_blog_website_configuration" {
  bucket = aws_s3_bucket.root_bucket.id
  redirect_all_requests_to {
    host_name = "www.${var.domain_name}"
    protocol  = "https"
  }
}
// s3-private-policy.json

{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "AllowCloudFrontServicePrincipalReadOnly",
    "Effect": "Allow",
    "Principal": {
      "Service": "cloudfront.amazonaws.com"
    },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::${bucket}/*",
    "Condition": {
      "StringEquals": {
        "AWS:SourceArn": "${cloudfront_arn}"
      }
    }
  }
}
// s3-public-policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}

The root bucket has a much more permissive policy given it serves as a redirect to the www bucket. Also note the bucket acl is public-read versus private and that the website configuration includes a redirect_all_requests_to block.

Cloudwatch Alarms and SNS

In the interest of improving my capabilities around operating software, I wanted to make sure I had some form of monitoring and alerting in place (DevOps - right?). Cloudfront publishes a set of metrics by default which can be consumed by Cloudwatch [7]. So, I figured I should send myself an email if my blog ever begins returning a 500 error (i.e., something unexpected has gone wrong and my content is no longer available).

# cloudwatch.tf

resource "aws_cloudwatch_metric_alarm" "blog_broken_alarm" {
  provider            = aws.acm_provider
  alarm_name          = "blog-broken-alarm"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "1"
  metric_name         = "5xxErrorRate"
  namespace           = "AWS/CloudFront"
  period              = "60"
  statistic           = "Average"
  threshold           = "1"
  treat_missing_data  = "notBreaching"

  alarm_description = "Monitors whenever average error rate exceeds 1%"
  alarm_actions     = [aws_sns_topic.technoblather_sns_topic_500_error_threshold_exceeded.arn]

  dimensions = {
    DistributionId = aws_cloudfront_distribution.www_s3_distribution.id
    Region         = "Global"
  }

  tags = var.common_tags
}
# sns.tf

resource "aws_sns_topic" "technoblather_sns_topic_500_error_threshold_exceeded" {
  provider = aws.acm_provider
  name     = "technoblather-500-error-threshold-exceeded"
  delivery_policy = jsonencode({
    "http" : {
      "defaultHealthyRetryPolicy" : {
        "minDelayTarget" : 20,
        "maxDelayTarget" : 20,
        "numRetries" : 3,
        "numMaxDelayRetries" : 0,
        "numNoDelayRetries" : 0,
        "numMinDelayRetries" : 0,
        "backoffFunction" : "linear"
      },
      "disableSubscriptionOverrides" : false,
      "defaultThrottlePolicy" : {
        "maxReceivesPerSecond" : 1
      }
    }
  })
  tags = var.common_tags
}

resource "aws_sns_topic_subscription" "technoblather_sns_topic_500_error_threshold_exceeded_email_subscription" {
  provider  = aws.acm_provider
  count     = length(var.alert_emails)
  topic_arn = aws_sns_topic.technoblather_sns_topic_500_error_threshold_exceeded.arn
  protocol  = "email"
  endpoint  = var.alert_emails[count.index]
}

The 5xxErrorRate metric triggers the alarm if the average error rate is >= 1% over a one-minute period. In plain language: if the blog ever starts serving errors, I'll get an alarm. One gotcha I had while setting this up was needing to explicitly configure treat_missing_data to notBreaching, meaning no data is treated as OK rather than INSUFFICIENT_DATA - in our case, no data is a good thing. Also note the reuse of aws.acm_provider: Cloudfront publishes its metrics in us-east-1, so the alarm and SNS topic have to live there too.

The alarm triggers an event to be published to an SNS topic. That topic has a subscription resulting in an email send being triggered.

So, whenever a 5xx happens on a request to the blog, the addresses in alert_emails get an email. I was able to test this using the set-alarm-state command from the aws CLI.
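
If you'd like to test a similar setup, the invocation looks roughly like this - the alarm name comes from the declaration above, and the region has to match where the alarm lives (us-east-1 here):

aws cloudwatch set-alarm-state \
  --alarm-name blog-broken-alarm \
  --state-value ALARM \
  --state-reason "Manually testing the alerting path" \
  --region us-east-1 \
  --profile technoblather

The alarm flips back to its real state on the next evaluation period, so this is a cheap way to confirm the SNS subscription emails actually arrive.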

Authoring the CI/CD pipeline

Here things became more familiar for me. There are two processes I wanted to automate:

  • Verifying that the changes pushed in a pull request are standardized.
  • Deploying the changes when they've been merged.

For the former, we run formatters and check that Gatsby can build without throwing an error.

For the latter, we build an output artifact via Gatsby, sync it to S3, and invalidate our Cloudfront distribution to break the cache such that our new content will be served.

# verify_pull_request.yml

name: Verify pull request

on:
  pull_request:
    branches: [main]

jobs:
  verify:
    name: Build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1

      - name: Setup Node
        uses: actions/setup-node@v3
        with:
          node-version: 18

      - name: Setup terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.1.7

      - name: Install node dependencies
        run: npm ci

      - name: Prettier fmt
        run: npm run format:check

      - name: Terraform fmt
        run: terraform fmt -check

      - name: Terraform Validate
        run: terraform validate -no-color

      - name: Gatsby build
        run: npm run build
# deploy_to_s3.yml

name: Deploy to s3 and refresh cloudfront

on:
  push:
    branches: [main]

jobs:
  deploy-to-s3:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout main
        uses: actions/checkout@v2

      - name: Setup Node
        uses: actions/setup-node@v3
        with:
          node-version: 18

      - name: Install node dependencies
        run: npm ci

      - name: Build production artifact
        run: npm run build

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: ${{ vars.RUNNER_AWS_ROLE_ID }}
          aws-region: ca-central-1

      - name: Deploy technoblather to S3
        run: |
          aws s3 sync public ${{ vars.BUCKET_ID }}

      - name: Invalidate Cloudfront cache
        run: |
          aws cloudfront create-invalidation --distribution-id ${{ vars.CLOUDFRONT_ID }} --paths "/*";

Notably, the AWS-related values (the role to assume, the bucket, and the distribution id) are plumbed through Github's repository settings as variables. I did have to set up an OIDC provider as per AWS's Github Actions documentation [8] to facilitate this in a secure way.
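
The OIDC pieces themselves aren't shown in the snippets above, so here's a minimal sketch of what that setup can look like in terraform - the role name and repository path are hypothetical, and you should verify the thumbprint against Github's current documentation:

# Hypothetical sketch of the Github Actions OIDC trust, not technoblather's exact code.
resource "aws_iam_openid_connect_provider" "github_actions" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # example value - verify before use
}

data "aws_iam_policy_document" "github_actions_assume_role" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github_actions.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Restrict the role to workflows running in a specific repository.
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:my-github-user/technoblather:*"] # hypothetical repository path
    }
  }
}

# A role like this is what a repository variable such as RUNNER_AWS_ROLE_ID would point at.
resource "aws_iam_role" "github_actions_deploy" {
  name               = "github-actions-deploy"
  assume_role_policy = data.aws_iam_policy_document.github_actions_assume_role.json
}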

So, what's next?

We've covered a large quantity of information in the two thousand or so words to this point. There are a number of next steps I have planned with the intention of validating more of my knowledge (and simply learning new things). While not exhaustive, these could include:

  • Setting up request logging in Cloudfront.
  • Creating more alarms for metrics published to Cloudwatch (e.g. p99 latency, cache misses).
  • Setting up DDoS protection with AWS WAF and/or AWS Shield.
  • Setting up a staging environment to validate changes before deploying to production.
  • Setting up a budget via AWS Budgets.
  • Setting up a "subscribe" feature for new posts published to the blog.
  • Handling 404 redirects correctly.

...and more.

I'll likely write posts on these topics as I make my way through them alongside my other hobby projects. Notably, I operate a weightlifting application and a goal-tracking application. Sign-ups for both are disabled (aside from my partner, who graciously humours me), but if you're really curious you can bother me for an account.

Final thoughts

Overall, going through the process of building and documenting my experience gave me the opportunity to validate my understanding and learn what I didn't understand. If you can't explain something, you don't understand it.

A static website is a good entry point for learning how to compose services with AWS given it's a relatively simple architecture and has many opportunities for expansion. Moreover, treating it like a live service meant I had to consider how to monitor, operate, and observe my architecture - all of which are critical components of developing and operating live services in a DevOps fashion.

Having this blog gives me the opportunity to share those 'Aha!' moments openly with the technical community and to hopefully help people out if they meander here via hyperlinks or search queries. Writing is something I used to do a lot of (postsecondary education: English, philosophy, computer science) and haven't done much of lately. It's a valuable skill to keep sharp.

I'm hopeful you learned something if you made it this far. If not, I'm hopeful you at least found something useful to copy and paste for whatever you're working on. Cheers all.

