Logbook - 2022-04-20

Image by Annie Ruygt

This post is a changelog, and it’s also about the challenges of generating a changelog at Fly.io—where we run your apps in VMs on our hardware around the world. It only takes a few minutes to try out the latest and greatest Fly.io!

Here’s a changelog covering our most recent activity (i.e. since we started compiling updates, a bit under two weeks ago):

  • [Feature] Added extra capacity in IAD.
  • [Feature] Backup regions (by far our most confusing misfeature) are now disabled by default. This should mean fewer people getting apps deployed in weird regions.
  • [Feature] Released flyctl 0.0.318 with NuxtJS app launch support.
  • [Feature] Created our NuxtJS launcher. NuxtJS is a huge framework that wraps VueJS with server-side rendering for SEO and faster initial load times. NuxtJS developers can really benefit from deploying apps closer to their users. This launcher uses a recommended Dockerfile from their docs, so no buildpack is needed.
  • [Feature] Completed a first version of flyctl support and documentation for RedwoodJS app deployment on Fly.io. This is a new framework with a lot of momentum behind it.
  • [Feature] Released flyctl v0.0.319 which updates some Fly.io API response handling to allow us to provide more informative errors without breaking functionality.
  • [Feature] Moved all the postgres operations in flyctl (except connect) from going through ssh to going through http, which increases their reliability since we had people reporting issues with commands like fly pg attach. (PR) (postgres-ha release v0.0.19)
  • [Feature] Took some time to clarify internal logic around timeouts and error handling. This is the first step towards documenting more errors (that currently appear as code “Undocumented”) and making it easier to diagnose the performance and behavior of requests going through fly-proxy.
  • [Feature] Refactored Trigger Failover, Restart, View PG Settings operations in the latest releases of postgres-ha and flyctl. This continues the move of Fly.io postgres operations from ssh to http. (postgres-ha v0.0.20), (flyctl v0.0.319)
  • [Feature] Just shipped a changelog fizz update type so we can start getting interesting stuff in front of users.
  • [Fix] Merged a fix for IP collision errors. Once this is deployed, users will see far fewer of these errors preventing their VMs from starting. Probably 0 errors.
  • [Fix] Fixed a state machine bug that broke builders in weird ways. Builder machines were occasionally getting into a state that didn’t allow a “start” to be issued. This caused fly deploy to hang for some organizations.
  • [Fix] Fixed “recent logs” on flyctl vm status.
  • [Fix] Deployed updates to make the flyctl release process more reliable. Released flyctl versions were getting stuck, so certain platforms weren’t always notified about the most up-to-date flyctl version.
  • [Docs] Documented the restart_limit option for TCP and HTTP health checks configured in fly.toml. This was missing from the docs, and there were two different plausible behaviours. (Docs)

How Do We Make a Changelog Happen at Fly.io?

We want to tell you about every interesting thing we’re doing, from adding a new option to a flyctl command, to speeding up Docker image pulls, to generating more enlightening error messages. How do we collect updates like this from a distributed company that’s grown from 7 people to 26 in the past 8 months? Turns out, it’s not easy. Here’s why it’s hard for us.

The job is basically:

  1. Capture all the work that should get an entry.
  2. Formulate entries so users see what we did and why it’s interesting.

In the spirit of exploration, we tried having one person compile a changelog by looking at all the status updates and git commits over the span of a few days. This helped crystallize some challenges for us; specifically:

  • Commits don’t always mean something interesting got finished.
  • Something interesting getting done doesn’t always mean anybody wrote a status update about it.
  • We tend to write commit messages and status updates from a technical point of view, not always including the context needed to judge how the work impacts users.

Since it’s not practical to delegate discovery and translation of everyone’s work to a team of changelog artisans, we’d better all just write good updates for our own work.

This means we all have to get good at the following:

  1. Writing changelog items that (a) make it obvious what we did, and (b) explain why it matters to users.
  2. Recognizing that the thing we just finished does matter to users, so we should write an update (if it really doesn’t, why did we do it?). This can be really tricky!

On that second point: we do a tremendous amount of work to improve apps that already work on Fly.io, and most of this work is practically invisible.

You may have noticed that in today’s list we have new deployment support for two frameworks. We are definitely pedal-to-the-metal on making it easy to deploy all kinds of apps on Fly.io, and it’s easy to write an update for new features. But there’s also furious activity on improving the platform for all the apps running on us, and this is important and interesting, and we should showcase it, even if it’s a harder blurb to write.

If everyone’s writing their own updates, this brings up its own minor issue: the mechanics of collecting all the changelog entries in one place. Fortunately, we’re nerds (or at least Kurt is), so we’ve automated this part. When we have an update to emit, we tell it to a Slack bot that passes it to an app (that Kurt wrote), which in turn collates our updates into feeds ripe for the copypasting. It doesn’t solve the hardest problems, but it’s pretty damn cool.
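
As a rough illustration of the collating step only (this is not the actual app, whose internals aren't described here), a sketch in Python might group submitted entries by tag and render them in the bullet style used at the top of this post. The Entry type, the TAG_ORDER list, and the function names are all hypothetical, and how entries arrive from the Slack bot is left out entirely.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical display order, mirroring the tags used in this post's list.
TAG_ORDER = ["Feature", "Fix", "Docs"]


@dataclass
class Entry:
    tag: str   # e.g. "Feature", "Fix", "Docs"
    text: str  # the human-written update


def collate(entries):
    """Group entries by tag, keeping submission order within each tag."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.tag].append(entry)
    # Known tags come first in TAG_ORDER; unknown tags follow alphabetically.
    tags = [t for t in TAG_ORDER if t in grouped]
    tags += sorted(t for t in grouped if t not in TAG_ORDER)
    return [(t, grouped[t]) for t in tags]


def render(entries):
    """Render collated entries as a plain-text bullet list."""
    lines = []
    for tag, items in collate(entries):
        for item in items:
            lines.append(f"  • [{tag}] {item.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    updates = [
        Entry("Fix", "Fixed a state machine bug that broke builders."),
        Entry("Feature", "Added extra capacity in IAD."),
        Entry("Docs", "Documented the restart_limit option."),
    ]
    print(render(updates))
```

Grouping is the easy part, which matches the point above: the sketch does nothing to help anyone write an entry that explains why the work matters to users.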

We’ll be making a special effort to cultivate good changelog habits so we can bring you all the interesting things.