
Debugging a Cloudflare Routing Blackhole

📅 2025-07-30

⌛ 8 days ago

📖 3 min read


TL;DR: Users reported that two Cloudflare-proxied hostnames would just hang, while the same app on a non-proxied domain worked great. The culprit (my best guess) was a routing/peering blackhole on the ISP -> Cloudflare path for a specific Cloudflare anycast pool. Not DNS!

Workaround: Switch to a DNS-only (non-proxied, direct) hostname.

Update: Cloudflare is currently investigating. According to this CF Community thread, it was broken on Jio yesterday; that got fixed today, and then it broke for Airtel.

The Setup

Users started reporting that the sites kept timing out. I was completely unable to replicate this. I asked friends and colleagues, but none of them had issues.

Two doors, same house:

Users could not reach app.alpha.example (proxied through Cloudflare), while app.beta.example (DNS only, straight to the origin) worked just fine. Other apps on the same box (:3001) worked perfectly.

The Symptom

So either my server hates orange clouds, or the path to Cloudflare is cursed. What made this infinitely more complex is that it was intermittent. Out of nowhere, it started working for a subset of users but broke for another.

First Try: DNS is Boring (and That’s Good)

dig +short A app.alpha.example
# → 104.21.48.1 104.21.80.1 104.21.96.1 104.21.112.1 104.21.16.1 104.21.32.1 104.21.64.1

dig +short AAAA app.alpha.example
# → 2606:4700:3030::6815:6001 ... ::6815:5001

Resolves fine. Shockingly, it wasn’t DNS. The proxied domain lands in 104.21.0.0/16. The direct one hits my origin.
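To check the "lands in 104.21.0.0/16" claim mechanically instead of eyeballing the dig output, a small CIDR membership test is enough. This is a minimal sketch: `ip_to_int` and `in_cidr` are hypothetical helper names, it handles IPv4 only, and it assumes a shell with 64-bit arithmetic (bash and dash both qualify).

```shell
#!/bin/sh
# Hypothetical helpers, not from the original post. IPv4 only.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit 0) if IPv4 address $1 is inside CIDR block $2.
in_cidr() (
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")   # part before the slash
  bits=${2#*/}                 # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Usage (needs dig and network access):
# for ip in $(dig +short A app.alpha.example); do
#   in_cidr "$ip" 104.21.0.0/16 && echo "$ip is inside 104.21.0.0/16"
# done
```

Every A record from the dig output above passes this check against 104.21.0.0/16, which is how you can tell at a glance that the hostname is being answered by the proxy rather than the origin.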

Second Try: Pinning the Edge IPs

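for ip in $(dig +short A app.alpha.example); do
  echo "== testing $ip =="
  # --resolve pins the hostname to one edge IP, so SNI and Host stay correct.
  # Capture the body first: piping curl straight into head hides curl's exit
  # status, so the "(timeout)" branch would never run.
  if out=$(curl -sSvk --connect-timeout 6 \
      --resolve "app.alpha.example:443:$ip" \
      https://app.alpha.example/cdn-cgi/trace); then
    echo "$out" | head -n 6
  else
    echo "(timeout or connect failure)"
  fi
done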
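When running a loop like this across many users or networks, it helps to tell "timed out" apart from "refused" or "TLS broke", because a blackhole typically shows up as a timeout rather than an active refusal. The sketch below maps a few documented curl exit codes to rough labels. `classify_curl_exit` and `probe_edge` are hypothetical helper names, and the labels are my own shorthand, not curl's wording.

```shell
#!/bin/sh
# Hypothetical helpers, not from the original post.

# Map a curl exit status to a rough failure class.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "tcp: failed to connect" ;;
    28) echo "timeout: no answer within the limit" ;;
    35) echo "tls: handshake failed" ;;
    *)  echo "other: curl exit $1" ;;
  esac
}

# Probe one edge IP for a hostname and print "<ip> <class>".
# Runs in a subshell so host/ip/rc stay local.
probe_edge() (
  host=$1 ip=$2 rc=0
  curl -sSk -o /dev/null --connect-timeout 6 \
    --resolve "$host:443:$ip" "https://$host/cdn-cgi/trace" 2>/dev/null || rc=$?
  printf '%s %s\n' "$ip" "$(classify_curl_exit "$rc")"
)

# Usage (needs network access):
# probe_edge app.alpha.example 104.21.96.1
```

A run where some edge IPs report a timeout while others report ok would point at the path to specific addresses rather than at the origin.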

The Smoking Gun: mtr on TCP/443

sudo mtr -rwzc 100 -T -P 443 104.21.96.1

Result:

   Host                 Loss%   Snt   Last   Avg  Best  Wrst StDev
1. 192.168.31.1         0.0%   100    3.1   4.5   2.5  12.6   1.6
2. 192.168.1.1          0.0%   100    5.8   4.9   3.1  11.2   1.3
3. <ISP access>         0.0%   100    9.1  10.8   5.2  46.1   7.3
4. <ISP core>           0.0%   100    8.3  11.5   4.6  50.7   7.6
5. *                  100%   100     -     -     -     -     -

Hop 5: the abyss. Exactly where the ISP should hand traffic to Cloudflare (AS13335). Instead, the packets vanish.
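One caveat when reading reports like this: a hop showing 100% loss in the middle of a trace is often just a router that refuses to answer probes. It only means "blackhole" when nothing after it answers either. The sketch below finds the hop where the trailing run of silence starts. `first_dead_hop` is a hypothetical helper name, and the parser assumes hop lines begin with `N.` and carry loss as the first field ending in `%`, as in the report above.

```shell
#!/bin/sh
# Hypothetical helper, not from the original post.

# Read an mtr-style report on stdin and print the hop number where the
# trailing run of 100%-loss hops begins. Prints nothing if the last hop answers.
first_dead_hop() {
  awk '
    $1 ~ /^[0-9]+\./ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^[0-9.]+%$/) {          # first field that looks like a loss value
          loss = $i; sub(/%/, "", loss)
          hop = $1;  sub(/\..*/, "", hop) # "5." or "5.|--" becomes "5"
          if (loss + 0 < 100) dead = ""        # this hop answered: reset
          else if (dead == "") dead = hop      # start of a silent run
          break
        }
      }
    }
    END { if (dead != "") print dead }'
}

# Usage (needs mtr and network access):
# sudo mtr -rwzc 100 -T -P 443 104.21.96.1 | first_dead_hop
```

Fed the report above, it prints 5, the first hop past the ISP core.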

“But What About Argo?”

… is what I asked myself. Nope. Argo optimizes edge ⇄ origin. This failure happens before the edge handshake. If your SYNs don’t land, Argo cannot save you.
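To see which stage a request actually reaches, curl's `--write-out` timers can help: `time_connect` marks the end of the TCP handshake and `time_appconnect` the end of the TLS handshake. The sketch below interprets those two numbers. `stage_reached` is a hypothetical helper name, and it leans on an assumption worth verifying on your curl version: that a stage which never completed is reported as 0.

```shell
#!/bin/sh
# Hypothetical helper, not from the original post.
# Assumption: curl reports 0 for a timer whose stage never completed.

# $1 = time_connect, $2 = time_appconnect (seconds, as printed by curl -w).
stage_reached() {
  awk -v c="$1" -v t="$2" 'BEGIN {
    if (c + 0 == 0)      print "no TCP connection"
    else if (t + 0 == 0) print "TCP connected, TLS handshake incomplete"
    else                 print "TCP and TLS completed"
  }'
}

# Usage (needs network access):
# timings=$(curl -sk -o /dev/null --connect-timeout 6 \
#   -w '%{time_connect} %{time_appconnect}' \
#   https://app.alpha.example/cdn-cgi/trace)
# stage_reached $timings
```

If this reports "no TCP connection" for a proxied hostname, the request never reached the Cloudflare edge at all, so nothing that optimizes the edge-to-origin leg can be in play.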

Takeaways (and a checklist for next time)

I had become quite a fan of using Cloudflare Tunnels lately. Alas, back to good old nginx I go.