In 2022 I hacked together a hybrid setup for @SuperSeriousBot: keep GitHub’s managed x86 runners, bolt on my own arm64 box over SSH, and let buildx juggle them both. It was janky, but it delivered a 10x speedup over emulating arm64 locally.
Now that GitHub ships first-party arm64 runners, the obvious question: can I ditch the self-hosted machine and still smash my multi-arch build times?
Short answer: yes. The old workflow took 10 minutes 20 seconds. The new one lands at 2 minutes 45 seconds, with no extra hardware to babysit.
Where things stood
To support `linux/amd64` + `linux/arm64` images, I previously leaned on QEMU emulation and a single Buildx invocation. It looked like this:
```yaml
name: Release

on:
  push:
    branches:
      - "master"

jobs:
  publish:
    name: Build Docker Image
    runs-on: ubuntu-latest
    permissions:
      packages: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        id: builder
        uses: docker/setup-buildx-action@v3
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          file: ./Dockerfile
          push: true
          tags: |
            ghcr.io/obviyus/gotm-remix:${{ github.sha }}
            ghcr.io/obviyus/gotm-remix:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
```
It got the job done, but every arm64 layer had to run under emulation. Even with aggressive caching (the `cache-to` export hit 1.2 GB at one point), each release still idled for ten minutes while QEMU ground through Bun + Remix builds.
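For comparison's sake, the emulated path is easy to reproduce locally; a minimal sketch, assuming Docker with Buildx, the `tonistiigi/binfmt` helper image to register QEMU handlers, and a throwaway tag:

```bash
# One-time: register QEMU binfmt handlers for arm64 on an x86 host
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Build the arm64 image under emulation; every RUN instruction in the
# Dockerfile executes through QEMU's software translation layer
docker buildx build --platform linux/arm64 -t gotm-remix:arm64-local .
```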
Enter GitHub’s arm64 runners
GitHub now offers ARM-powered hosts via `runs-on: ubuntu-24.04-arm`. That means I can schedule a real `linux/arm64` job without self-hosting or SSH tunnels. Pair that with the existing x86 fleet and we can parallelise the build matrix properly.
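A quick way to convince yourself the label lands on native hardware is to print the machine architecture; a throwaway sketch (hypothetical workflow, nothing from the real setup):

```yaml
name: Arch check
on: workflow_dispatch
jobs:
  arch-check:
    runs-on: ubuntu-24.04-arm
    steps:
      - run: uname -m   # should print "aarch64" on a native arm64 host
```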
The upgraded workflow splits the heavy lifting into two stages: build each architecture on native metal, then stitch manifests together.
```yaml
name: Release

on:
  push:
    branches:
      - master

env:
  IMAGE: ghcr.io/obviyus/gotm-remix

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        include:
          - platform: linux/amd64
            runner: ubuntu-latest
            artifact: linux-amd64
          - platform: linux/arm64
            runner: ubuntu-24.04-arm
            artifact: linux-arm64
    runs-on: ${{ matrix.runner }}
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v5
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/setup-buildx-action@v3
      - name: Build & push by digest
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          platforms: ${{ matrix.platform }}
          outputs: type=image,name=${{ env.IMAGE }},push-by-digest=true,name-canonical=true,push=true
          cache-from: type=gha,scope=${{ github.ref_name }}-gotm-remix
          cache-to: type=gha,mode=max,scope=${{ github.ref_name }}-gotm-remix
          provenance: mode=max
          sbom: true
      - name: Export digest
        run: |
          mkdir -p ${{ runner.temp }}/digests
          echo "${{ steps.build.outputs.digest }}" | sed 's/^sha256://' | xargs -I{} touch "${{ runner.temp }}/digests/{}"
      - uses: actions/upload-artifact@v4
        with:
          name: digests-${{ matrix.artifact }}
          path: ${{ runner.temp }}/digests/*
          retention-days: 1

  merge:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/setup-buildx-action@v3
      - uses: actions/download-artifact@v4
        with:
          path: ${{ runner.temp }}/digests
          pattern: digests-*
          merge-multiple: true
      - name: Create and push manifest
        working-directory: ${{ runner.temp }}/digests
        run: |
          docker buildx imagetools create \
            -t $IMAGE:latest \
            -t $IMAGE:${{ github.sha }} \
            $(printf "$IMAGE@sha256:%s " *)
      - name: Inspect
        run: docker buildx imagetools inspect $IMAGE:latest
```
Why this is faster
- Native arm64 silicon — Bun's compiler and Vite's asset pipeline execute on Neoverse cores instead of QEMU's TCG interpreter, so every syscall and file watch stays in-kernel rather than jumping through software translation.
- Parallel execution — BuildKit instances run on separate runners, letting `docker/build-push-action` fan out the layer graph; pushes complete once per architecture with no cross-platform coordination inside a single builder.
- Digest-first publishing — `push-by-digest=true` writes OCI payloads once per platform and defers manifest creation; the merge step replays `imagetools create` against cached digests, so a cache miss on arm64 no longer forces a full multi-arch upload (a standalone sketch follows this list).
- Scoped cache — cache exporters are keyed to `${{ github.ref_name }}-gotm-remix`, keeping branch builds sandboxed and retaining the hot path for `master` instead of thrashing a global cache namespace.
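To make the digest-first flow concrete, here is roughly the same sequence as plain CLI calls outside Actions; a sketch assuming a logged-in `ghcr.io` session, with `<amd64-digest>` and `<arm64-digest>` as placeholders for the values each build prints:

```bash
IMAGE=ghcr.io/obviyus/gotm-remix

# Push each platform's layers and config by digest only; no tag exists yet
docker buildx build --platform linux/amd64 \
  --output "type=image,name=$IMAGE,push-by-digest=true,name-canonical=true,push=true" .
docker buildx build --platform linux/arm64 \
  --output "type=image,name=$IMAGE,push-by-digest=true,name-canonical=true,push=true" .

# Stitch the two per-platform digests into a single tagged manifest list
docker buildx imagetools create -t "$IMAGE:latest" \
  "$IMAGE@sha256:<amd64-digest>" \
  "$IMAGE@sha256:<arm64-digest>"
```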
The end result is a 3.7x improvement (for this specific case!) versus the already-optimised 2022 setup—without the maintenance overhead of a bespoke runner.
Caveats worth noting
- The arm64 pool is still smaller than the x86 fleet. Queue times have been fine so far, but I expect longer waits around big releases or US daytime.
- `docker/build-push-action@v6` leans on newer BuildKit features (SBOM and provenance attestations) that older registries and tooling might not expect; there's a sketch of the opt-outs after this list.
- The final artifact juggling looks awkward, but it keeps things clean: each job uploads just the digest string, so the merge job only needs a login to the target registry and the digests; it doesn't need the per-arch builders or local images.
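If those attestations upset an older registry or downstream tooling, both have explicit opt-outs on the same action; a minimal sketch of the relevant inputs:

```yaml
- uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    provenance: false   # skip the provenance attestation
    sbom: false         # skip SBOM generation
```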