How to convey the content type of a resource with a URI

url

I would like to use URIs to represent the different files we can use in our system. To know which module should parse a file, it would be great if I could somehow encode the content type of the resource the URI points to, so that the resource can be handed to the proper parsing module.

I was thinking of extending the scheme part to convey this. For example, file+csv:///path/to/file would point to a CSV file, while file+caffe:///path/to/directory would point to a directory with a Caffe model and its parameters, and so on. I have a limited set of types I want to support, so does this seem like a reasonable approach?

But is there some other standard way?

Best Answer

URLs by themselves are very protocol-agnostic. They do not specify much more than a common syntax and basic semantics. A URL generally describes how to find something, but not what you'll find there.

It is the job of a particular protocol such as HTTP to indicate the content type. Some resources do not have a meaningful content type, for example mailto: URLs. The FTP protocol has no concept of MIME types, but merely distinguishes textual files, binary files, and directories (specified as a ;type=<typecode> parameter in an FTP URL). Regarding file URLs, RFC 1738 Uniform Resource Locators (URL) notes:

The file URL scheme is unusual in that it does not specify an Internet protocol or access method for such files; as such, its utility in network protocols between hosts is limited.

RFC 8089 The "file" URI Scheme concurs:

The file URI scheme is not coupled with a specific protocol nor with a specific media type [RFC6838].

So most URL schemes do not allow you to include the content type in the URL, and there is no scheme-agnostic mechanism to do that.
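To illustrate how protocol-specific such a mechanism is, here is a minimal sketch that reads the ;type=<typecode> parameter from an FTP URL using Python's standard urllib.parse. The function name ftp_typecode and the example host are made up for illustration; the three type codes are the ones RFC 1738 lists for FTP URLs.

```python
from urllib.parse import urlparse

# Type codes that RFC 1738 defines for FTP URLs. Note that they describe
# a transfer mode, not a media type.
FTP_TYPECODES = {"a": "text (ASCII)", "i": "binary (image)", "d": "directory listing"}


def ftp_typecode(url):
    """Return the ;type=<typecode> value of an FTP URL, or None if absent.

    Hypothetical helper, not part of any standard API.
    """
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("not an FTP URL: " + url)
    # urlparse exposes the ";..." suffix of the last path segment as .params
    name, _, value = parts.params.partition("=")
    if name.lower() == "type" and value:
        return value.lower()
    return None


code = ftp_typecode("ftp://example.com/pub/readme.txt;type=a")
print(code, FTP_TYPECODES.get(code))
```

Nothing comparable exists for file URLs, which is why any type information there has to come from a convention of your own.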

You can of course develop your own non-standard URL scheme that consists of MIME type + transport. It would be best not to put the type into the scheme name: I'd consider a design such as example:text/csv:file:///path/to/file.
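A minimal sketch of how such a composite URL could be taken apart, assuming the layout example:<media type>:<inner URL> and an inner URL in the three-slash file:///path/to/file form. The prefix example: and the function name split_typed_url are placeholders rather than anything standardized, and media type parameters such as charset are not handled.

```python
from urllib.parse import urlsplit

PREFIX = "example:"  # hypothetical wrapper scheme, not a registered one


def split_typed_url(typed_url):
    """Split 'example:<media type>:<inner URL>' into (media_type, inner_parts)."""
    if not typed_url.startswith(PREFIX):
        raise ValueError("missing %r prefix: %s" % (PREFIX, typed_url))
    rest = typed_url[len(PREFIX):]
    # A media type name cannot contain ':', so the first colon ends the type.
    media_type, sep, inner = rest.partition(":")
    if not sep or "/" not in media_type or not inner:
        raise ValueError("expected <type>/<subtype>:<url>, got: " + rest)
    return media_type, urlsplit(inner)


media_type, inner = split_typed_url("example:text/csv:file:///path/to/file")
print(media_type, inner.scheme, inner.path)
```

Because the inner URL stays an ordinary URL, the transport (file, http, and so on) and the media type can be varied independently, and the media type can then select the parsing module.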

Alternatively, you could store the type in a query parameter of a file URL, although the file URI syntax as defined by the RFC does not have query parameters. This may also lead to problems with some implementations on Windows systems. On the other hand, query parameters on file URLs are ignored by parsers that use the WHATWG's generic URL parsing algorithm.
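A minimal sketch of the query parameter variant, using Python's urllib.parse. The parameter name type and the function name path_and_type are assumptions chosen for illustration; no specification defines them.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def path_and_type(file_url, default=None):
    """Return (url_path, media_type) for a file URL with an optional ?type=...

    'type' is a hypothetical, application-defined parameter name.
    """
    parts = urlsplit(file_url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: " + file_url)
    values = parse_qs(parts.query).get("type")
    media_type = values[0] if values else default
    # The path is kept separate from the query, so code that only needs
    # the location can simply disregard the type hint.
    return unquote(parts.path), media_type


print(path_and_type("file:///path/to/file?type=text/csv"))
print(path_and_type("file:///path/to/file"))
```

One detail to watch: parse_qs decodes a literal + as a space, so a type such as image/svg+xml would have to be written with %2B in the URL.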
