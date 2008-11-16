Nicholas Gruen received an email today from a distraught reader who couldn’t sign up because his name is Irish. Consequently his email contains an apostrophe.

Apostrophes are perfectly legal characters in email addresses. But WordPress, for reasons known only to the bozos who write it, doesn’t use any well-tested or well-known email address validator. They rolled their own, quite painfully incomplete, “validator”.

Tucked away in wp-includes/formatting.php is a function is_email(). It looks like this:



function is_email($user_email) {
    $chars = "/^([a-z0-9+_]|\\-|\\.)+@(([a-z0-9_]|\\-)+\\.)+[a-z]{2,6}\$/i";
    if (strpos($user_email, '@') !== false && strpos($user_email, '.') !== false) {
        if (preg_match($chars, $user_email)) {
            return true;
        } else {
            return false;
        }
    } else {
        return false;
    }
}

The important line is this:

$chars = "/^([a-z0-9+_]|\\-|\\.)+@(([a-z0-9_]|\\-)+\\.)+[a-z]{2,6}\$/i";

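In it we see a regular expression encoding of what the WordPress team thinks is a legitimate email address. It’s laughably incomplete: the part before the @ may contain only letters, digits, +, _, - and ., so an address containing an apostrophe never matches.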
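To see the failure concretely, here is a minimal sketch that runs the same pattern against two addresses. The function name and both sample addresses are made up for illustration, and the pattern is written in single quotes, which should be equivalent to the double-quoted original once PHP has processed the escapes.

```php
<?php
// Hypothetical helper: applies the pattern from is_email() and nothing else.
function wp_style_is_email($address) {
    // Same pattern as the $chars line, written as a single-quoted string.
    $chars = '/^([a-z0-9+_]|\-|\.)+@(([a-z0-9_]|\-)+\.)+[a-z]{2,6}$/i';
    return preg_match($chars, $address) === 1;
}

// Illustrative addresses only; neither belongs to a real person.
$samples = array(
    "nicholas@example.com",
    "o'reilly@example.com",
);

foreach ($samples as $address) {
    $verdict = wp_style_is_email($address) ? 'accepted' : 'rejected';
    echo $address, ' => ', $verdict, "\n";
}
```

The second address is turned away purely because of the apostrophe; nothing else about it differs from the first.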

It turns out that the regular expression to fully check a legitimate email address looks like this:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?
[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?
[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n
)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".
\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".
\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?
[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]



That’s from a Perl module called Mail::RFC822::Address, which fully validates to the standard (RFC 822, which is subtle and complex[1]), rather than just making something up.

[1] How complex? Very complex, actually. A commenter at Reddit points out that this regular expression was generated programmatically, and even it can’t account for cases where an email address has more than six levels of nested parentheses. But I’d back it against WordPress’s function any day. Another commenter at Reddit points out that email addresses are not technically parseable using regular expressions at all; another Perl module, RFC::RFC822::Address, uses a recursive parser to check email addresses.
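PHP itself has shipped a reusable check for a while. As a rough sketch of what leaning on existing code could look like, assuming PHP 5.2 or later (where the bundled filter extension provides filter_var()), something like the following would do. The wrapper name is hypothetical, this is not what WordPress does, and the built-in filter is not a full RFC 822 parser either.

```php
<?php
// Hypothetical wrapper around PHP's built-in filter extension.
// Assumes PHP 5.2+; not taken from WordPress.
function is_email_builtin($address) {
    // filter_var() returns the filtered address on success, false on failure.
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

// Illustrative inputs only.
var_dump(is_email_builtin("o'reilly@example.com"));
var_dump(is_email_builtin("not an address"));
```

The point is less this particular function than the habit: the checking logic lives in code that other people already maintain and test.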

And this is part of what separates PHP projects, like WordPress, from the world of sane programming. PHP as a language deters sensible composition and reuse because it lacks things like a sensible module system or namespaces; and so every project winds up reinventing the same damn wheels over and over again. WordPress is a particularly bad example as it seems to be highly allergic to sensible inclusions. The life cycle of a WordPress software nightmare goes like this:

1. Problem X emerges. It has already been solved by Y.
2. WordPress implementors are told about Y but decide to write Z instead.
3. It turns out that Z does not conform to the standard, or has bugs, or overlooks a lot of corner cases.
4. More patches are released every time WordPress is updated, but soon the WordPress designers get bored and go back to rewriting the admin interface again. Z languishes.
5. Either the standard gets updated or Z is found to be riddled with exploitable bugs.
6. Years after first being told to use the already mature, proved, tested, bugfixed and available-the-entire-time Y, WordPress developers integrate Y. It’s big news.
7. Problem A emerges. It has already been solved by B…

Of course we are stuck with it. WordPress is the most recent example of the triumph of Worse-is-Better software and I for one am used to its warts. Like an abused spouse, I am too afraid to go elsewhere in case I have to go through another cycle of pain.

But seriously. This glitch has been on the bugtracker in two different entries for more than a year. Both are marked to be fixed in 2.9, which is more than six months away, when it would take about 30 seconds with Google to do a better job. So if you have a Celtic surname like O’Reilly, O’Malley or O’Hannesey, you might just be out O’luck. And if you use WordPress, you’re shit out of luck no matter what your name is.