blogwaffe

Dear web developers,

Please understand the difference between the encoding called for in application/x-www-form-urlencoded encoded data [1] and the encoding called for in URIs (and how that encoding applies to URIs in the http/https scheme).

In particular, note the difference in how spaces are treated. The former encodes them as +, the later as %20.

Understand also how your web server processes different pieces of URLs. Pay particular attention to your web server’s treatment of plus signs. PHP, for example, interprets “$_GET” data (the data drawn from the query component of the URI) as though it were application/x-www-form-urlencoded and converts those plus signs to spaces [2].

Now that you know that the different encoding methods encode spaces differently and that the different decoding methods decode pluses differently, make sure you’re being consistent.

Because when I type into your website my email address with its plus sign in it, I expect things to work.

Love,
mdawaffe

[1] More details in HTML5‘s definition of the application/x-www-form-urlencoded encoding algorithm. ↑

[2] See the difference between urlencode()/urldecode() and rawurlencode()/rawurldecode(). parse_str() uses urldecode().

Decoding the query component of the URI according to application/x-www-form-urlencoded is the logical thing to do, since the web server can’t tell the difference between a GET request generated by a form and one that isn’t (since user agents are not supposed to set a “Content-Type: application/x-www-form-urlencoded” header when submitting a form over http/https via the GET method), but I can’t tell if converting pluses in the query part of the URI into spaces is required; I can’t find anything in the definition of the http URI scheme that says that plus signs are “delimiters” in the query component of the URI. So I believe it to be an implementation-specific choice and therefore up for grabs depending on how you power your web server (PHP, ruby, .NET, etc.). ↑

02.07.2010

An Open Letter to Web Developers on Percent-Encoding

internal

external

Recent Discussion

Search

interweb