TL;DR: Users reported that two Cloudflare proxied hostnames would just hang, while the same app on a non-proxied domain worked great. The culprit (my best guess) was a routing/peering blackhole on the ISP -> Cloudflare path for a specific Cloudflare anycast pool. Not DNS!
Workaround: Switch to a DNS-only (non-proxied, direct) hostname.
Update: Cloudflare is currently investigating. According to this CF Community thread, it was broken on Jio yesterday but they fixed it today and broke it for Airtel.
The Setup
Users started reporting that the sites kept timing out. I was completely unable to replicate this. I asked friends and colleagues, but none of them had issues.
Two doors, same house:
- app.alpha.example → Cloudflare proxied (via Tunnel) → Bun server on :3002
- app.beta.example → direct (Nginx, no CF proxy) → same app
Users could not access app.alpha.example while app.beta.example worked just fine. Other apps on the same box (:3001) worked perfectly.
The Symptom
- On one ISP: Cloudflare proxied hostname(s) → infinite timeout.
- On VPN / other ISPs: flawless.
- On the same ISP, direct hostname: flawless.
So either my server hates orange clouds, or the path to Cloudflare is cursed. What made this infinitely more complex is that it was intermittent. Out of nowhere, it started working for a subset of users but broke for another.
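The three observations above boil down to a small decision table. Here is a minimal sketch of it in bash; the function name `triage`, its ok/fail arguments, and the wording of the verdicts are all my own illustration, not output from any tool.

```shell
#!/usr/bin/env bash
# triage PROXIED_ON_ISP DIRECT_ON_ISP PROXIED_ELSEWHERE
# Hypothetical helper: each argument is "ok" or "fail".
# Prints a one-line suspicion based on the combination.
triage() {
  local proxied_isp=$1 direct_isp=$2 proxied_else=$3
  case "$proxied_isp/$direct_isp/$proxied_else" in
    ok/ok/ok)     echo "no fault observed" ;;
    fail/ok/ok)   echo "suspect the ISP -> Cloudflare path" ;;
    fail/ok/fail) echo "suspect Cloudflare config, tunnel, or proxy" ;;
    */fail/*)     echo "suspect the origin or the ISP itself" ;;
    *)            echo "inconclusive, gather more data" ;;
  esac
}
```

My situation was `triage fail ok ok`: proxied broken on one ISP, direct fine on that ISP, proxied fine everywhere else. The intermittency means you should rerun the checks a few times before trusting any single verdict.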
First Try: DNS is Boring (and That’s Good)
dig +short A app.alpha.example
# → 104.21.48.1 104.21.80.1 104.21.96.1 104.21.112.1 104.21.16.1 104.21.32.1 104.21.64.1
dig +short AAAA app.alpha.example
# → 2606:4700:3030::6815:6001 ... ::6815:5001
Resolves fine. Shockingly, it wasn’t DNS. The proxied domain lands in 104.21.0.0/16. The direct one hits my origin.
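If you would rather not eyeball the prefix, the "lands in 104.21.0.0/16" check can be scripted. This is a rough sketch in pure bash, IPv4 only; `ip_to_int` and `in_cidr` are hypothetical helper names I made up for it, and it does no input validation.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 falls inside CIDR block $2, else 1.
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")   # network part, before the slash
  bits=${2#*/}                 # prefix length, after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Usage (needs network access, so left commented out):
# for ip in $(dig +short A app.alpha.example); do
#   if in_cidr "$ip" 104.21.0.0/16; then echo "$ip: in 104.21.0.0/16"; fi
# done
```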
Second Try: Pinning the Edge IPs
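for ip in $(dig +short A app.alpha.example); do
  echo "== testing $ip =="
  # Capture curl's output so the fallback reacts to curl's exit status, not head's.
  if out=$(curl -sSvk --connect-timeout 6 \
      --resolve "app.alpha.example:443:$ip" \
      https://app.alpha.example/cdn-cgi/trace); then
    echo "$out" | head -n 6
  else
    echo "(timeout)"
  fi
done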
- On the broken ISP: every 104.21.x.x IP → timeout.
- On VPN: same IPs → normal colo=... traces.
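The /cdn-cgi/trace endpoint answers with plain key=value lines, so pulling out the colo (or any other field) is a one-liner. A small sketch, where `trace_field` is a hypothetical helper name of mine:

```shell
#!/usr/bin/env bash
# Print the value of key $1 from key=value lines read on stdin.
# Prints nothing if the key is absent.
trace_field() {
  awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Usage (needs network access, so left commented out):
# curl -s --connect-timeout 6 https://app.alpha.example/cdn-cgi/trace | trace_field colo
```

Combined with the pinned curl loop, this shows which Cloudflare data center each edge IP lands you in when the connection does succeed. An empty result for every IP on one network, and a sensible colo on another, points at the path rather than the app.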
The Smoking Gun: mtr on TCP/443
# -T makes mtr probe with TCP SYNs to the port given by -P (the default is ICMP echo)
sudo mtr -T -P 443 -rwzc 100 104.21.96.1
Result:
Hop  Host           Loss%  Snt  Last   Avg  Best  Wrst  StDev
1.   192.168.31.1    0.0%  100   3.1   4.5   2.5  12.6    1.6
2.   192.168.1.1     0.0%  100   5.8   4.9   3.1  11.2    1.3
3.   <ISP access>    0.0%  100   9.1  10.8   5.2  46.1    7.3
4.   <ISP core>      0.0%  100   8.3  11.5   4.6  50.7    7.6
5.   *               100%  100     -     -     -     -      -
Hop 5: the abyss. Exactly where the ISP should hand traffic to Cloudflare (AS13335). Instead, the packets vanish.
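Spotting the abyss can be automated too. The sketch below reads an mtr report on stdin and prints the hop where the trailing run of 100% loss begins, or nothing if the final hop still answers. It assumes mtr's default report columns (Loss%, Snt, Last, Avg, Best, Wrst, StDev as the last seven fields); `first_dead_hop` is a hypothetical name, and a custom column order would break it.

```shell
#!/usr/bin/env bash
# Read an mtr report on stdin. Print the number of the first hop in the
# trailing run of 100%-loss hops; print nothing if the last hop responds.
first_dead_hop() {
  awk '
    # Hop lines start with "N." and carry at least hop, host and 7 stat columns.
    $1 ~ /^[0-9]+\./ && NF >= 9 {
      hop = $1; sub(/\..*$/, "", hop)   # "5." or "5.|--" -> "5"
      loss = $(NF - 6) + 0              # "100%" or "100.0" -> 100
      n++; hops[n] = hop; dead[n] = (loss >= 100)
    }
    END {
      first = ""
      # Walk backwards from the last hop while hops are fully lossy.
      for (i = n; i >= 1 && dead[i]; i--) first = hops[i]
      if (first != "") print first
    }
  '
}

# Usage (needs root and network access, so left commented out):
# sudo mtr -T -P 443 -rwc 100 104.21.96.1 | first_dead_hop
```

A silent hop in the middle of a path is often just a router that does not answer probes, which is why the sketch only counts loss that continues all the way to the end of the report.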
“But What About Argo?”
… is what I asked myself. Nope. Argo optimizes edge → origin. This failure happens before the edge handshake. If your SYNs don’t land, Argo cannot save you.
Takeaways (and a checklist for next time)
- If a CF proxied site dies on one ISP, suspect routing.
- Prove it fast: curl --resolve + mtr -T -P 443.
- Keep users happy: DNS-only fallback.
- Don’t blame your tunnel/server when it is a hop 5 blackhole.
I had become quite a fan of using Cloudflare Tunnels lately. Alas, back to good old nginx I go.