Requests with absolute URI return 404


#1

Requests with absolute URI in the HTTP request line return 404, but when the request line has a path it works correctly.

Here is an example of a request against a Fastly service with an Amazon S3 backend that works correctly:

GET /registry.ets.gz HTTP/1.1
Host: repo.hex.pm

When an HTTP client makes a request through a proxy (without using CONNECT), it puts an absolute URI in the request line to tell the proxy which server to connect to. The proxy then forwards the request line without rewriting it from an absolute URI to a path. Example of an absolute URI:

GET https://repo.hex.pm/registry.ets.gz HTTP/1.1
Host: repo.hex.pm
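To make the difference between the two request lines concrete, here is a minimal Python sketch that builds both forms from the same URL. The function name `build_request` and the `via_proxy` flag are made up for illustration; this is not taken from any particular HTTP client.

```python
from urllib.parse import urlsplit


def build_request(url: str, via_proxy: bool) -> str:
    """Return the request line and Host header for a GET of `url`.

    Hypothetical helper: `via_proxy` selects the absolute-URI form,
    otherwise the path-only form is used.
    """
    parts = urlsplit(url)
    # The path-only form must not be empty, so fall back to "/".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Absolute URI when talking to a proxy, path when talking to the server.
    target = url if via_proxy else path
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


if __name__ == "__main__":
    url = "https://repo.hex.pm/registry.ets.gz"
    print(build_request(url, via_proxy=False))
    print(build_request(url, via_proxy=True))
```

Both requests carry the same Host header; only the request line differs.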

Unfortunately, Fastly always returns 404 in these cases. RFC 2616 says that servers must handle absolute URIs:

To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.
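As a rough illustration of what accepting the absoluteURI form could look like on the server side, the Python sketch below reduces either form of request target to a host and a path. The function name `normalize_target` is hypothetical, and this is not how Fastly or S3 implement it. It follows the rule in RFC 2616 section 5.2 that the host in an absolute URI takes precedence over the Host header.

```python
from urllib.parse import urlsplit


def normalize_target(target: str, host_header: str) -> tuple[str, str]:
    """Return (host, path) for a request target in either form.

    Hypothetical helper for illustration only.
    """
    # Path form: the host comes from the Host header.
    if target.startswith("/"):
        return host_header, target
    parts = urlsplit(target)
    # Absolute form: the host comes from the URI; the Host header is ignored.
    if parts.scheme in ("http", "https") and parts.netloc:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        return parts.netloc, path
    raise ValueError(f"unsupported request target: {target!r}")


if __name__ == "__main__":
    print(normalize_target("/registry.ets.gz", "repo.hex.pm"))
    print(normalize_target("https://repo.hex.pm/registry.ets.gz", "repo.hex.pm"))
```

With this kind of normalization, both example requests in this post resolve to the same host and path.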

It’s not an issue with Amazon, because they correctly handle requests with an absolute URI:

GET http://s3.amazonaws.com/s3.hex.pm/registry.ets.gz HTTP/1.1
Host: s3.amazonaws.com

#2

Hi Eric,

First off, can you elaborate on the circumstances where this would happen? Because we’ve not seen any browsers or other clients behave this way (they all tend to CONNECT.) So it would be good for us to know the details around cases where it does happen.

Second, a change has been made to include support for absolute URLs that start with https instead of http. It should be rolled out in the next week or so.

Third, if this is a problem that you’re running into that needs urgent fixing, you can use either this snippet of VCL:

    if (req.url ~ "(?i)^https://") {
      set req.url = regsub(req.url, "(?i)^https://[^/]*","");
    }

Or you can add a Header object in the UI, under Configure > Content, and use the following settings:

Name: something descriptive like "strip absolute URLs"
Type: Request
Action: Regex
Destination: url
Source: req.url
Regex: (?i)^https://[^/]*
Substitution: empty

After adding it, you can then add a Condition to it that reads: req.url ~ "(?i)^https://"
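To see what this rewrite does to different URLs before deploying it, the same pattern can be tried outside Fastly. The Python sketch below only emulates the substitution; `strip_absolute_https` is a made-up name, and Python's `re` module is not the regex engine Fastly uses, although this particular pattern is simple enough that the two should agree.

```python
import re

# Same pattern as in the VCL snippet and the Header object above.
ABSOLUTE_HTTPS = re.compile(r"(?i)^https://[^/]*")


def strip_absolute_https(url: str) -> str:
    """Remove a leading https://<hostname> from a URL, leaving the path.

    Hypothetical helper that emulates the rewrite; like regsub, it
    replaces only the first match.
    """
    return ABSOLUTE_HTTPS.sub("", url, count=1)


if __name__ == "__main__":
    for url in (
        "https://repo.hex.pm/registry.ets.gz",
        "HTTPS://repo.hex.pm/registry.ets.gz",
        "/registry.ets.gz",
        "http://repo.hex.pm/registry.ets.gz",
    ):
        print(repr(url), "->", repr(strip_absolute_https(url)))
```

Note that the pattern leaves `http://` URLs and plain paths untouched, and that an absolute URL with no path at all (for example `https://repo.hex.pm`) is reduced to an empty string.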

I hope this helps, and I’m very curious as to the answer to my question.

Cheers,

Doc


#3

Hi,

The header configuration you suggested works great and resolves our issue, thank you. It seems an absolute URI starting with http:// always works, even without any special rules in my configuration. Why is that?

Second, a change has been made to include support for absolute URLs that start with https instead of http. It should be rolled out in the next week or so.

That is good news. Once the change is rolled out, can I remove the header configuration?

First off, can you elaborate on the circumstances where this would happen? Because we’ve not seen any browsers or other clients behave this way (they all tend to CONNECT.) So it would be good for us to know the details around cases where it does happen.

The client we are using, httpc from Erlang/OTP, uses CONNECT when proxying https connections, but even then it uses an absolute URI in the subsequent request. I haven’t found anything in the HTTP RFCs that indicates that this is wrong, but from googling around it seems to be uncommon behaviour. I am discussing with the OTP team whether it can be changed to use a path in the request after the CONNECT. (For reference: https://github.com/erlang/otp/pull/1052)


#4

We have code for removing the http://<hostname> from URLs in our master VCL, and since this is the first time we ran into a client that didn’t use CONNECT properly, we never saw any reason to strip https://<hostname>.

Yes, once we’ve rolled out our change, you’ll be able to remove the workaround.

Good to know! Thank you!


#5

For some definitions of “properly”. :)