Python – Abort HTMLParser processing in Python

htmlparsingpython

When using the HTMLParser class in Python, is it possible to abort processing within a handle_* function? Early in the processing, I get all the data I need, so it seems like a waste to continue processing. There's an example below of extracting the meta description for a document.

from HTMLParser import HTMLParser

class MyParser(HTMLParser):

    def handle_start(self, tag, attrs):
        in_meta = False
        if tag == 'meta':
          for attr in attrs:
              if attr[0].lower() == 'name' and attr[1].lower() == 'description':
                  in_meta = True
              if attr[0].lower() == 'content':
                  print(attr[1])
                  # Would like to tell the parser to stop now,
                  # since I have all the data that I need

Best Answer

You can raise an exception and wrap your .feed() call in a try block.

You can also call self.reset() when you decide, that you are done (I have not actually tried it, but according to documentation "Reset the instance. Loses all unprocessed data.", - this is precisely what you need).