Better option... just strip all non-digit characters on input (except 'x' and leading '+' signs), taking care because of the British tendency to write numbers in the non-standard form +44 (0) ...
when asked to use the international prefix (in that specific case, you should discard the (0)
entirely).
Then, you end up with values like:
12345678901
12345678901x1234
345678901x1234
12344678901
12345678901
12345678901
12345678901
+4112345678
+441234567890
Then when you display, reformat to your hearts content. e.g.
1 (234) 567-8901
1 (234) 567-8901 x1234
re.match
is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^
in the pattern.
As the re.match documentation says:
If zero or more characters at the
beginning of string match the regular expression pattern, return a
corresponding MatchObject
instance.
Return None
if the string does not
match the pattern; note that this is
different from a zero-length match.
Note: If you want to locate a match
anywhere in string, use search()
instead.
re.search
searches the entire string, as the documentation says:
Scan through string looking for a
location where the regular expression
pattern produces a match, and return a
corresponding MatchObject
instance.
Return None
if no position in the
string matches the pattern; note that
this is different from finding a
zero-length match at some point in the
string.
So if you need to match at the beginning of the string, or to match the entire string use match
. It is faster. Otherwise use search
.
The documentation has a specific section for match
vs. search
that also covers multiline strings:
Python offers two different primitive
operations based on regular
expressions: match
checks for a match
only at the beginning of the string,
while search
checks for a match
anywhere in the string (this is what
Perl does by default).
Note that match
may differ from search
even when using a regular expression
beginning with '^'
: '^'
matches only
at the start of the string, or in
MULTILINE
mode also immediately
following a newline. The “match
”
operation succeeds only if the pattern
matches at the start of the string
regardless of mode, or at the starting
position given by the optional pos
argument regardless of whether a
newline precedes it.
Now, enough talk. Time to see some example code:
# example code:
string_with_newlines = """something
someotherthing"""
import re
print re.match('some', string_with_newlines) # matches
print re.match('someother',
string_with_newlines) # won't match
print re.match('^someother', string_with_newlines,
re.MULTILINE) # also won't match
print re.search('someother',
string_with_newlines) # finds something
print re.search('^someother', string_with_newlines,
re.MULTILINE) # also finds something
m = re.compile('thing$', re.MULTILINE)
print m.match(string_with_newlines) # no match
print m.match(string_with_newlines, pos=4) # matches
print m.search(string_with_newlines,
re.MULTILINE) # also matches
Best Answer
The unicode CLDR contains the postal code regex for each country. (158 regex's in total!)
core.zip
from http://unicode.org/Public/cldr/26.0.1/common/supplemental/postalCodeData.xml
from the unzipped content (direct content: common/supplemental/postalCodeData.xml)Google also has a web service with per-country address formatting information, including postal codes, here - http://i18napis.appspot.com/address (I found that link via http://unicode.org/review/pri180/ )
Edit
Here a copy of postalCodeData.xml regex :