Post by Ihe Onwuka
Post by David Carlisle
No, just that if you are writing vocabulary-specific regexes you need
to use vocabulary-specific regex terms. If I'm looking for words
in English I tend to use [a-z], even if some people try to sneak
accents into cafe or naive :-)
Well, mine is not a regional-vocabulary scenario. The backtick
appears in a title which is used to create a URL, which (I believe)
will not tolerate such characters.
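For the title-to-URL case, one option is to whitelist the ASCII characters you actually want rather than rely on \w. The sketch below is a minimal illustration, not anything from the thread: `title_to_slug` is a hypothetical helper, and the choice of [A-Za-z0-9] plus "-" as the allowed set is an assumption. The same pattern would also work with XPath 2.0's replace().

```python
import re

def title_to_slug(title: str) -> str:
    """Hypothetical helper: turn a title into a URL-friendly slug.

    Assumption: only ASCII letters and digits are kept; every run of any
    other character (backtick, spaces, punctuation, accented letters)
    becomes a single '-'.
    """
    # [A-Za-z0-9] is spelled out explicitly; \w would let through accented
    # letters, Greek, Cyrillic, and so on.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title)
    return slug.strip("-").lower()

print(title_to_slug("Don`t Panic: A Café Guide"))  # don-t-panic-a-caf-guide
```

Note that accented letters are dropped, not transliterated, so "Café" becomes "caf". If that matters, percent-encoding the title is the alternative.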
Well, then the grave accent is the least of your concerns with \w.
URI letters are defined as ALPHA (%41-%5A and %61-%7A), i.e. [a-zA-Z], so
that doesn't allow accented letters, or Greek, or Cyrillic, or tens of
thousands of other characters included in \w.
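To make the gap concrete, here is a small comparison in Python. It is only a sketch of the XPath/XML Schema definition, not a real XPath engine. That definition makes \w "everything except punctuation (P), separator (Z) and other (C)" characters. The grave accent is a modifier symbol (category Sk), so XPath's \w matches it, while URI ALPHA does not. `xsd_word_char` is a hypothetical name, and results may differ slightly across Unicode versions.

```python
import re
import unicodedata

# URI ALPHA: %41-%5A and %61-%7A, i.e. [A-Za-z]
URI_ALPHA = re.compile(r"[A-Za-z]")

def xsd_word_char(ch: str) -> bool:
    """Sketch of XML Schema / XPath \\w: any character whose Unicode
    general category does not start with P, Z or C."""
    return unicodedata.category(ch)[0] not in "PZC"

for ch in ["a", "é", "λ", "ж", "`", "-"]:
    # columns: character, matches URI ALPHA?, matches XPath-style \w?
    print(repr(ch), bool(URI_ALPHA.fullmatch(ch)), xsd_word_char(ch))
```

Only "a" passes both tests. The backtick and the non-ASCII letters pass the \w-style test, and "-" (punctuation) passes neither.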
Of course, most user-facing systems such as HTML or XML allow a much
wider set of characters in href attributes and SYSTEM identifiers, and
leave it to the system to %-encode them according to the somewhat arcane
URI rules; cf. IRI or LEIRI syntax.
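The %-encoding step can be sketched with Python's standard urllib.parse.quote. It encodes the text as UTF-8 and then %-encodes every byte outside the unreserved set (letters, digits, "_.-~"). This is an assumption-laden illustration for a single path segment only. Passing safe="" also encodes reserved characters such as "/", so it is stricter than the IRI-to-URI mapping, which leaves reserved characters alone. `to_uri_segment` is a hypothetical name.

```python
from urllib.parse import quote

def to_uri_segment(text: str) -> str:
    """Hypothetical helper: %-encode text for use as one URI path segment.

    quote() encodes to UTF-8 first; safe="" means even '/' is encoded,
    so only ASCII letters, digits and '_.-~' survive unchanged.
    """
    return quote(text, safe="")

print(to_uri_segment("café naïve`s"))  # caf%C3%A9%20na%C3%AFve%60s
```

With this approach the backtick and accented letters stay in the title and are encoded instead of being stripped.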