Javascript – What’s a good tool to screen-scrape with Javascript support?

javascript, screen-scraping

Is there a good test suite or tool set that can automate website navigation — with Javascript support — and collect the HTML from the pages?

Of course I can scrape straight HTML with BeautifulSoup. But this does me no good for sites that require Javascript. 🙂

Best Answer

You could use Selenium or Watir to drive a real browser.

There are also some JavaScript-based headless browsers:

  • PhantomJS is a headless WebKit browser.
    • pjscrape is a scraping framework based on PhantomJS and jQuery.
    • CasperJS is a navigation scripting & testing utility based on PhantomJS, if you need to do a little more than point at URLs to be scraped.
  • Zombie for Node.js

Personally, I'm most familiar with Selenium, which supports writing automation scripts in a good number of languages and has more mature tooling. One example is the excellent Selenium IDE extension for Firefox, which can be used to write and run test cases and can export test scripts to many languages.
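
Since the question mentions BeautifulSoup, here is a rough Python sketch of the Selenium approach. It assumes the `selenium` package and a local Firefox are installed. The helper names `collect_html` and `scrape_with_firefox` are made up for illustration; only `webdriver.Firefox()`, `driver.get()`, `driver.page_source` and `driver.quit()` come from Selenium itself. Treat it as a starting point rather than a finished scraper.

```python
def collect_html(driver, urls):
    """Visit each URL with a WebDriver-like object and return {url: rendered HTML}."""
    pages = {}
    for url in urls:
        driver.get(url)                  # the browser loads the page and runs its scripts
        pages[url] = driver.page_source  # the DOM as the browser currently sees it
    return pages


def scrape_with_firefox(urls):
    """Hypothetical helper: drive a local Firefox through Selenium."""
    # Imported lazily so collect_html has no hard dependency on selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        return collect_html(driver, urls)
    finally:
        driver.quit()  # always close the browser, even if a page fails to load
```

Calling `scrape_with_firefox(["https://example.com/"])` should return a dict mapping that URL to its rendered HTML, which you can then hand to BeautifulSoup as usual. Pages that load content asynchronously may need an explicit wait before `page_source` is read.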