If you're working at a large company and downtime is extremely expensive, this checklist is a good guide. Otherwise, if you have good test coverage, you can get by with something simpler. It's super rare to have a breaking change in Go.<p>We do quarterly upgrades of all services in a monorepo (about 20-30). The steps are basically this:<p>- Upgrade all dependencies to their latest versions, fixing build and test breaks (I read release notes for Go, but not for dependencies)<p>- Look for deprecated packages and replace them<p>- Upgrade all toolchains, including CI/CD containers, go.mod, etc.<p>- Run all tests<p>- Deploy to the test environment and make sure everything is green<p>- Deploy to staging and do some sanity checks<p>- Deploy to prod, keeping an eye on metrics for an hour or two<p>We're on k8s and the state of all clusters (i.e. which images are running) is tracked in git, so a rollback is just git revert + apply.<p>In practice, after about four years of this, we've seen maybe a dozen build breaks, and I can only remember one regression caused by a breaking change in a library[1].<p>[1] <a href="https://github.com/golang/go/issues/24211">https://github.com/golang/go/issues/24211</a>
Out of curiosity, were you dealing with microservices defined within a monorepo, or microservices each in their own repo? The steps here:<p>> Build your binaries with the new version. Go through the build errors if any.<p>> Run all the unit tests with the new version. Go through the test failures.<p>are a lot easier in a monorepo.<p>Separately, I've experienced frequent breaking changes in the golangci-lint configuration file. I can't point to a specific instance of this happening, but one thing I'd suggest is pinning your version of golangci-lint in development and in CI rather than using "latest".<p>Golang's backwards compatibility and simplified toolchain are two of my favorite things about it. Bumping go.mod and downloading the new version of Go is usually all it takes!
I’ll add an item that is not yet on our checklist but has already bitten us several times: check your code generation. Since code generation is so popular in the Go ecosystem, we’ve got 5 or 6 different codegen tools that update on various timelines. Twice now we’ve gone through a checklist similar to this article, patted ourselves on the back, and a week later found out no one can regenerate any code.
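For what it's worth, here is a minimal sketch of the kind of check that could catch this in CI. It assumes the generators are wired up through `go generate`, that generated files are committed, and that the working tree is clean before the check runs. The names `gencheck`, `checkGenerated`, and `changedPaths` are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// changedPaths extracts file paths from `git status --porcelain` output,
// where each line is a two-character status, a space, then the path.
func changedPaths(porcelain string) []string {
	var paths []string
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) > 3 {
			paths = append(paths, line[3:])
		}
	}
	return paths
}

// checkGenerated reruns every generator under repoDir and reports files that
// differ afterwards. Assumption: the working tree was clean beforehand.
func checkGenerated(repoDir string) ([]string, error) {
	gen := exec.Command("go", "generate", "./...")
	gen.Dir = repoDir
	if out, err := gen.CombinedOutput(); err != nil {
		// This is the "no one can regenerate any code" failure mode.
		return nil, fmt.Errorf("go generate failed: %v\n%s", err, out)
	}
	status := exec.Command("git", "status", "--porcelain")
	status.Dir = repoDir
	out, err := status.Output()
	if err != nil {
		return nil, fmt.Errorf("git status failed: %w", err)
	}
	return changedPaths(string(out)), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: gencheck <repo-dir>")
		return
	}
	drift, err := checkGenerated(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(drift) > 0 {
		fmt.Fprintf(os.Stderr, "generated files changed after regenerating:\n%s\n", strings.Join(drift, "\n"))
		os.Exit(1)
	}
	fmt.Println("codegen OK")
}
```

Running something like this as part of the upgrade branch surfaces both cases: generators that no longer run at all, and generators whose output silently changed.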
Another suggestion: if your monorepo's service packaging is sufficiently uniform, build every service against both Go versions, package both binaries into the deploy artifact, and install a feature flag that lets you select which binary to boot when the service starts. This also lets you canary an arbitrary percentage of the fleet with the new Go version, and you can execute a version rollback by redeploying (without needing to revert any commits).
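A rough sketch of the selection logic for a launcher like that, in case it helps. Everything specific here is an assumption: the binary paths are hypothetical, the `CANARY_PERCENT` environment variable stands in for whatever feature-flag system you actually use, and the hostname is used as the stable instance ID.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
)

// Hypothetical paths: both builds ship in the same deploy artifact.
const (
	stableBinary = "/app/bin/service-go-old"
	canaryBinary = "/app/bin/service-go-new"
)

// pickBinary hashes a stable instance ID into a bucket in [0, 100) and
// returns the canary binary when the bucket is below canaryPercent.
// The same instance always lands in the same bucket, so restarts do not
// flip it between toolchains unless the percentage changes.
func pickBinary(instanceID string, canaryPercent int) string {
	h := fnv.New32a()
	h.Write([]byte(instanceID))
	if int(h.Sum32()%100) < canaryPercent {
		return canaryBinary
	}
	return stableBinary
}

func main() {
	// CANARY_PERCENT is a stand-in for a real feature-flag lookup.
	percent, err := strconv.Atoi(os.Getenv("CANARY_PERCENT"))
	if err != nil {
		percent = 0 // unset or malformed flag: stay on the old toolchain
	}
	host, _ := os.Hostname()
	// A real launcher would now exec the chosen binary (for example with
	// syscall.Exec on Unix); this sketch only prints the decision.
	fmt.Println("would boot:", pickBinary(host, percent))
}
```

Setting the percentage to 0 is the rollback path, and 100 completes the rollout. Hashing the instance ID rather than picking randomly keeps the canary set stable across restarts.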
> Update go.mod.<p>Be careful after you do this. Go changed its for-loop semantics in Go 1.22. When you raise the go version in go.mod from below 1.22 to 1.22 or later, your Go code may break: <a href="https://go101.org/blog/2024-03-01-for-loop-semantic-changes-in-go-1.22.html" rel="nofollow">https://go101.org/blog/2024-03-01-for-loop-semantic-changes-...</a> (It's long. A short summary of the important points is here: <a href="https://github.com/golang/go/issues/66156">https://github.com/golang/go/issues/66156</a>)<p>As of Go 1.24, the Go team has not published a tool that identifies every case broken by this change, so you may need to review the code by eye.
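To make the change concrete, here is a small sketch (the function names are mine, not from the linked posts). Which result `captureShared` gives depends on the language version the file is compiled with, which inside a module comes from the `go` line in go.mod.

```go
package main

import "fmt"

// captureShared closes over the loop variable directly. Compiled as Go 1.21
// or lower, all three closures share one i and each returns 3. Compiled as
// Go 1.22 or higher, every iteration gets its own i and they return 0, 1, 2.
func captureShared() []int {
	var fns []func() int
	for i := 0; i < 3; i++ {
		fns = append(fns, func() int { return i })
	}
	return call(fns)
}

// captureCopied makes the per-iteration copy explicit, so it returns
// 0, 1, 2 under either semantics.
func captureCopied() []int {
	var fns []func() int
	for i := 0; i < 3; i++ {
		i := i // explicit per-iteration copy
		fns = append(fns, func() int { return i })
	}
	return call(fns)
}

// call invokes each closure after the loop has finished.
func call(fns []func() int) []int {
	out := make([]int, 0, len(fns))
	for _, f := range fns {
		out = append(out, f())
	}
	return out
}

func main() {
	fmt.Println("shared:", captureShared())
	fmt.Println("copied:", captureCopied())
}
```

Most code gets less buggy under the new semantics, but anything that relied on the loop variable being shared across iterations (closures or pointers that observe later updates to it) changes behavior without a compile error. Those are the cases to look for by hand.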
> With the introduction of generics at 1.18, many linters lacked support for generics for months. We delayed the upgrade due to this issue.<p>I wouldn't plan on using a new feature in production in the release that introduced it. Why would you plan to be using generics on day one?<p>> There was talk of trying to solve this issue in the upstream ourselves.<p>Was there a genuine business case that would make Lyft more profit if they used generics? If not then why would you even consider this?<p>> Fortunately, by the time we seriously started exploring this option, linter support was added and go 1.19 was also released. We eventually upgraded directly to 1.19 from 1.17 but we were around 10 months late.<p>You weren't late. You were precisely on time. This is some odd project mentality.