Regular expression to categorize the parts of a service address

parsing regex vb.net

The app I am writing deals with utility service addresses, and right now I am forcing the user to know enough to separate the parts of the address and put them in the appropriate fields before adding to the database. It has to be done this way for sorting purposes because a straight alphabetical sort isn't always right when there is a pre-direction in the address. For example, right now if the user wanted to put in the service address 123 N Main St, they would enter it as:

  • Street Number = 123
  • Pre-direction = N
  • Street Name = Main
  • Street Type = St

I've tried to separate this address into its parts by using the Split function and iterating through each part. What I have so far is below:

Public Shared Function ParseServiceAddress(ByVal Address As String) As String()
        'this assumes a valid address - 101 N Main St South
        Dim strResult(4) As String  '0=st_num, 1=predir, 2=st_name, 3=st_type, 4=postdir (upper bound 4 = 5 slots)
        Dim strParts() As String
        Dim strSep() As Char = {Char.Parse(" ")}
        Dim i As Integer
        Dim j As Integer = 0
        Address = Address.Trim()
        strParts = Address.Split(strSep)  'split using spaces
        For i = 0 To strParts.GetUpperBound(0)
            If Integer.TryParse(strParts(i), j) Then
                'this is a number, is it the house number?
                If i = 0 Then
                    'we know this is the house number
                    strResult(0) = strParts(i)
                Else
                    'part of the street name
                    strResult(2) = strResult(2) & " " & strParts(i)
                End If
            Else
                Select Case strParts(i).ToUpper()
                    Case "TH", "ND"
                        'know this is part of the street name
                        strResult(2) = strResult(2) & strParts(i)
                    Case "NORTH", "SOUTH", "EAST", "WEST", "N", "S", "E", "W"
                        'is this a predirection?
                        If i = 1 Then
                            strResult(1) = strParts(i)
                        ElseIf i = strParts.GetUpperBound(0) Then
                            'this is the post direction
                            strResult(4) = strParts(i)
                        Else
                            'part of the name
                            strResult(2) = strResult(2) & " " & strParts(i)
                        End If
                    Case Else
                        If i = strParts.GetUpperBound(0) Then
                            'street type
                            strResult(3) = strParts(i)
                        Else
                            'part of the street name
                            strResult(2) = strResult(2) & " " & strParts(i)
                        End If
                End Select
            End If
        Next i
        Return strResult
    End Function

I've found this method to be cumbersome, slow, and even totally wrong when given a wonky address. I'm wondering if what I'm trying to do here would be a good application for a regular expression? Admittedly I've never used regex in anything before and am a total newbie in that regard.

Thank you in advance for any help. 🙂

Edit – Seems more and more like I'm going to need a parser and not just regex. Does anyone know of any good address parser libraries in .NET? Writing our own is just not in the cards right now, and would be sent to the back burner if it came to that.

Best Answer

I don't have a set of addresses to (easily) test against, but here is something to try at least. It may be too permissive in places and too restrictive in others, but you should be able to tweak it. You'll almost certainly need to adjust the list of pre-directions, since every one you want to accept has to be listed explicitly in the pattern. Also, be sure to set your regex options to be case-insensitive.

^(?<StreetNumber>[0-9]+)\s*(?<Predirection>(n)|(s)|(e)|(w)|(north)|(south)|(east)|(west))?\s+(?<StreetName>[a-z0-9 '.-]+)\s+(?<StreetType>[a-z.]+)$

In reality though, it would probably be better to delegate this to an address parser if possible, like the one NoahD suggested. You'll have to do some digging to find something for .NET probably, but if you can't find anything, then I would go with a regular expression for sure.

edit: do'h, \s, not /s

edit: changed regex for more semantic grouping. You can access the group values like so:

// requires: using System.Text.RegularExpressions;
string address = "123 n main st";
Regex regex = new Regex("insert the regex above here", RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(address);

foreach (Match match in matches)
{
    // Groups[...] returns a Group; .Value is the matched text
    string streetNumber = match.Groups["StreetNumber"].Value;
    string predirection = match.Groups["Predirection"].Value;  // empty string if no pre-direction
    string streetName = match.Groups["StreetName"].Value;
    string streetType = match.Groups["StreetType"].Value;
}
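
For reference, here is a minimal, self-contained sketch of the same idea that can be compiled and run as-is. The class name AddressParserSketch and the Parse method are hypothetical names made up for this example, and the pattern is a slightly tidied variant of the regex above rather than the exact one (the pre-direction alternatives are collapsed into a single group and the hyphen is placed last in the character class so it is treated literally). It assumes the last word is always the street type, so a post-direction such as "101 N Main St South" is not handled and would need extra work.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressParserSketch
{
    // Pattern assembled piece by piece so each part can carry a comment.
    private const string Pattern =
        @"^(?<StreetNumber>[0-9]+)\s+" +                            // leading house number
        @"(?:(?<Predirection>north|south|east|west|[nsew])\s+)?" +  // optional pre-direction
        @"(?<StreetName>[a-z0-9 '.-]+)\s+" +                        // street name (may contain spaces)
        @"(?<StreetType>[a-z.]+)$";                                 // final word is taken as the street type

    private static readonly Regex AddressRegex =
        new Regex(Pattern, RegexOptions.IgnoreCase);

    // Returns { number, predirection, name, type }, or null when the
    // address does not fit the pattern. A missing pre-direction is "".
    public static string[] Parse(string address)
    {
        Match match = AddressRegex.Match(address.Trim());
        if (!match.Success)
        {
            return null;
        }

        return new[]
        {
            match.Groups["StreetNumber"].Value,
            match.Groups["Predirection"].Value,
            match.Groups["StreetName"].Value,
            match.Groups["StreetType"].Value
        };
    }

    public static void Main()
    {
        string[] parts = Parse("123 n main st");
        Console.WriteLine(string.Join("|", parts));  // prints 123|n|main|st
    }
}
```

Because the pre-direction group is optional, a name that merely starts with N, S, E or W (for example "456 Nelson Ave") still parses: the regex engine backtracks and leaves the pre-direction empty. Anything that returns null is a candidate for manual entry or a real address parser.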