An Open Letter to Web Developers on Percent-Encoding
Dear web developers,
Please understand the difference between the encoding called for in application/x-www-form-urlencoded encoded data [1] and the encoding called for in URIs (and how that encoding applies to URIs in the http/https scheme).
In particular, note the difference in how spaces are treated. The former encodes them as +, the latter as %20.
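A minimal sketch of that difference, using Python's standard urllib.parse module as a stand-in for whatever language powers your site. The sample string is purely illustrative.

```python
from urllib.parse import quote, quote_plus

text = "hello world"  # illustrative input, not from any real form

# Form-style encoding (application/x-www-form-urlencoded): space becomes "+"
form_encoded = quote_plus(text)

# Generic URI percent-encoding: space becomes "%20"
uri_encoded = quote(text)

print(form_encoded)  # hello+world
print(uri_encoded)   # hello%20world
```

Both functions encode a literal plus sign as %2B, so the ambiguity only bites when a raw + reaches a decoder unencoded.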
Understand also how your web server processes different pieces of URLs. Pay particular attention to your web server’s treatment of plus signs. PHP, for example, interprets “$_GET” data (the data drawn from the query component of the URI) as though it were application/x-www-form-urlencoded and converts those plus signs to spaces [2].
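The same split exists on the decoding side. As a rough illustration in Python (not PHP), parse_qs decodes a query string form-style and turns pluses into spaces, much like the $_GET behavior described above, while unquote leaves them alone. The query string here is a made-up example.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

raw_query = "q=a+b"  # hypothetical query component with an unencoded plus

# Form-style query parsing: "+" is decoded as a space
parsed = parse_qs(raw_query)

# Decoding just the value, two ways
raw_value = raw_query.split("=", 1)[1]
form_decoded = unquote_plus(raw_value)  # treats "+" as an encoded space
uri_decoded = unquote(raw_value)        # leaves "+" as a literal plus

print(parsed)        # {'q': ['a b']}
print(form_decoded)  # a b
print(uri_decoded)   # a+b
```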
Now that you know that the different encoding methods encode spaces differently and that the different decoding methods decode pluses differently, make sure you’re being consistent.
Because when I type into your website my email address with its plus sign in it, I expect things to work.
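Here is a small sketch of how that goes wrong, and how consistent encoding avoids it. The address and parameter name are hypothetical, and Python's parse_qs stands in for a server that decodes the query form-style.

```python
from urllib.parse import parse_qs, quote

email = "user+tag@example.com"  # hypothetical address containing a plus sign

# Inconsistent: the raw address is pasted into the query without encoding,
# then decoded form-style, so the plus silently becomes a space.
mangled = parse_qs("email=" + email)["email"][0]

# Consistent: percent-encode the value first, so the plus travels as %2B
# and survives form-style decoding.
encoded = quote(email, safe="")
preserved = parse_qs("email=" + encoded)["email"][0]

print(mangled)    # user tag@example.com
print(encoded)    # user%2Btag%40example.com
print(preserved)  # user+tag@example.com
```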
Love,
mdawaffe
[1] More details in HTML5’s definition of the application/x-www-form-urlencoded encoding algorithm.
[2] See the difference between urlencode()/urldecode() and rawurlencode()/rawurldecode(). parse_str() uses urldecode().
Decoding the query component of the URI according to application/x-www-form-urlencoded is the logical thing to do, since the web server can’t tell the difference between a GET request generated by a form and one that isn’t (user agents are not supposed to set a “Content-Type: application/x-www-form-urlencoded” header when submitting a form over http/https via the GET method). But I can’t tell if converting pluses in the query part of the URI into spaces is required; I can’t find anything in the definition of the http URI scheme that says that plus signs are “delimiters” in the query component of the URI. So I believe it to be an implementation-specific choice and therefore up for grabs depending on how you power your web server (PHP, Ruby, .NET, etc.).