Getting random connection resets (TCP RST) from Fastly

I’ve created this small repo to reproduce the problem:

Here are the logs from that program:

./connection_test --host="https://www.fastly.com"
Will attempt to connect 100 times to https://www.fastly.com
Get "https://www.fastly.com": read tcp 10.10.10.47:60884->151.101.193.57:443: read: connection reset by peer
Get "https://www.fastly.com": read tcp 10.10.10.47:60888->151.101.193.57:443: read: connection reset by peer
Get "https://www.fastly.com": read tcp 10.10.10.47:60890->151.101.193.57:443: read: connection reset by peer
Test complete.

I’ve also made a web-based test, though I’ve noticed it’s a bit flakier: https://aaomidi.github.io/connection_test/

I’ve set a custom User-Agent in that CLI utility to make it easier to pinpoint where the problem occurs:

req.Header.Set("User-Agent", "https://github.com/aaomidi/connection_test")

I’m wondering if this is another case of the issue described here: https://mailman.nanog.org/pipermail/nanog/2018-September/096871.html

The IP your connections fail against is anycast, so this could very well be a similar problem with unstable ECMP.
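To make the ECMP idea concrete: a router doing equal-cost multipath typically hashes each packet’s 5-tuple and takes the result modulo the number of next hops, so all packets of one connection normally follow one path. The Go sketch below is a simplified model, not any router’s real behaviour: FNV-1a stands in for a vendor-specific hash, and `nextHop` and `remapped` are hypothetical names. It shows that when the set of next hops changes, many existing flows get moved to a different path. With anycast, that new path may end at a different edge site that has no state for the connection, which would typically answer with an RST.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// flow is a TCP/IP 5-tuple, the fields ECMP hashing commonly uses.
type flow struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
	proto            uint8 // 6 = TCP
}

// nextHop picks one of n equal-cost next hops for f with hash-modulo-n.
// Assumption: FNV-1a is only a stand-in for a router's real hash and seed.
func nextHop(f flow, n int) int {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s|%s|%d|%d|%d", f.srcIP, f.dstIP, f.srcPort, f.dstPort, f.proto)
	return int(h.Sum32() % uint32(n))
}

// remapped counts how many flows would switch paths if the number of
// equal-cost next hops changed from before to after (e.g. a link flapping).
// A flow that switches mid-connection may reach an anycast site that never
// saw its handshake.
func remapped(flows []flow, before, after int) int {
	moved := 0
	for _, f := range flows {
		if nextHop(f, before) != nextHop(f, after) {
			moved++
		}
	}
	return moved
}
```

Plain hash-modulo-n is the simplest scheme; many routers offer resilient (consistent) hashing precisely to limit how many flows move when a next hop is added or removed.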

You could try a TCP traceroute to see what that shows, something like:

mtr -rwbzc 100 -T -P 443 151.101.193.57
HOST: Amirs-MacBook-Pro.local                                      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    10.10.10.1                                            0.0%   100    2.1   2.7   1.9   5.6   0.7
  2. AS40898  199.38.69.2                                           0.0%   100    4.0   7.7   3.1  66.5  10.8
        161.199.180.3
     AS40898  161.199.180.3
  3. AS13536  66-152-105-13.static.firstlight.net (66.152.105.13)   0.0%   100    8.4   8.0   5.9  11.4   1.2
        66-109-46-9.static.firstlight.net (66.109.46.9)
     AS13536  66-109-46-9.static.firstlight.net (66.109.46.9)
  4. AS???    be2.albnypscr1.ip.firstlight.net (66.109.52.57)       0.0%   100    6.7   8.4   5.8  16.0   1.8
        be30.bnghnyhecr1.ip.firstlight.net (66.152.98.225)
     AS13536  be30.bnghnyhecr1.ip.firstlight.net (66.152.98.225)
  5. AS13536  be24.albynypsbr2.ip.firstlight.net (66.152.97.41)     2.0%   100  3010. 825.6   6.1 5010. 1179.1
        be23.nycmnyqobr1.ip.firstlight.net (66.152.97.49)
     AS13536  be23.nycmnyqobr1.ip.firstlight.net (66.152.97.49)
  6. AS46887  eqix-ny1-1.fastly.com (198.32.118.104)                0.0%   100    9.6  10.6   9.2  13.6   1.0
        fastly-1.nyiix.net (198.32.160.22)
     AS???    fastly-1.nyiix.net (198.32.160.22)
  7. AS54113  151.101.193.57                                        0.0%   100   10.4  10.6   8.8  25.1   2.0

I’ve also noticed this happening with another CDN: Microsoft’s Edge Network.

I’m guessing this lends more credence to anycast handling being the culprit here, and I think it comes back to First Light Fiber. I’m not sure what the best way to reach them is going to be…

The multipathing actually starts inside OEConnect (AS40898), so the problem could be there too.
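In the mtr report above, a hop with more than one address listed under it usually indicates load balancing (for example ECMP) or a path change at that TTL. A minimal Go sketch that lists such hops along with the reported loss, assuming the report layout shown above (hop lines start with `N.`, extra addresses on the lines beneath them); `parseMtrReport` and `multipathHops` are hypothetical helpers, not part of mtr.

```go
package main

import (
	"bufio"
	"regexp"
	"strconv"
	"strings"
)

var (
	hopLine = regexp.MustCompile(`^\s*(\d+)\.\s`)              // "  5. AS13536 ..."
	ipv4    = regexp.MustCompile(`\b\d{1,3}(?:\.\d{1,3}){3}\b`) // dotted IPv4 address
	lossPct = regexp.MustCompile(`(\d+(?:\.\d+)?)%`)            // "2.0%"
)

// hop holds every distinct address that answered at one TTL plus its loss.
type hop struct {
	num   int
	addrs []string
	loss  float64
}

func contains(xs []string, s string) bool {
	for _, x := range xs {
		if x == s {
			return true
		}
	}
	return false
}

// parseMtrReport treats lines without a hop number as extra addresses for
// the preceding hop, which is how the report above shows multipath.
func parseMtrReport(report string) []hop {
	var hops []hop
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		line := sc.Text()
		if m := hopLine.FindStringSubmatch(line); m != nil {
			n, _ := strconv.Atoi(m[1])
			h := hop{num: n}
			if ip := ipv4.FindString(line); ip != "" {
				h.addrs = append(h.addrs, ip)
			}
			if l := lossPct.FindStringSubmatch(line); l != nil {
				h.loss, _ = strconv.ParseFloat(l[1], 64)
			}
			hops = append(hops, h)
			continue
		}
		if len(hops) == 0 {
			continue // the "HOST:" header precedes the first hop
		}
		cur := &hops[len(hops)-1]
		for _, ip := range ipv4.FindAllString(line, -1) {
			if !contains(cur.addrs, ip) {
				cur.addrs = append(cur.addrs, ip)
			}
		}
	}
	return hops
}

// multipathHops returns the hop numbers where more than one address answered.
func multipathHops(hops []hop) []int {
	var out []int
	for _, h := range hops {
		if len(h.addrs) > 1 {
			out = append(out, h.num)
		}
	}
	return out
}
```

Fed the report above, this flags hops 2 through 6, with hop 2 (AS40898) being the first, which matches the observation that the multipathing starts inside OEConnect.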

I will try to reach out to their NOC referencing this discussion.

Perhaps you can contact their support too?

OEConnect has mostly just been forwarding these issues to First Light Fiber, but I agree the problem could be starting there as well.

I’ve reached out to a few other stub ASes that use AS13536 as their only upstream to see whether they’ve noticed this too. If they’re not seeing similar problems, I’ll push back a bit more on OEConnect.

Cheers! The internet really is held together with glue sometimes 😅

My ISP has solved this problem. Cheers all!