Python – Newbie Q about Scrapy pipeline.py

Tags: python, scrapy, web-crawler

I am studying the Scrapy tutorial. To test the process I created a new project with these files:

See my post in Scrapy group for links to scripts, I cannot post more than 1 link here.

The spider runs well and scrapes the text between the title tags and puts it in FirmItem:

[whitecase.com] INFO: Passed FirmItem(title=[u'White & Case LLP - Lawyers - Rachel B. Wagner ']) 

But I am stuck in the pipeline process. I want to add this FirmItem to a CSV file so that I can then add it to the database.

I am new to Python and I am learning as I go along. I would appreciate it if someone gave me a clue about how to make pipelines.py work so that the scraped data is put into items.csv.

Thank you.

Best Answer

I think your specific question is addressed in the Scrapy tutorial.

It suggests, as others here have, using the csv module. Place the following in your pipelines.py file:

import csv

class CsvWriterPipeline(object):

    def open_spider(self, spider):
        # newline='' avoids blank rows on Windows under Python 3;
        # on Python 2, open with mode 'wb' instead
        self.file = open('items.csv', 'w', newline='')
        self.csvwriter = csv.writer(self.file)

    def process_item(self, item, spider):
        # note: recent Scrapy passes (item, spider); very old versions passed (domain, item)
        # item fields are lists, so take the first element of each
        self.csvwriter.writerow([item['title'][0], item['link'][0], item['desc'][0]])
        return item

    def close_spider(self, spider):
        self.file.close()
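If you want to sanity-check the row-writing logic before wiring it into a full crawl, you can exercise the same csv calls on a hand-built item. This is just a sketch: the dict below mimics the shape of a scraped item (fields are lists, as Scrapy selectors return), and the file path is a throwaway temp location, not part of your project.

```python
import csv
import os
import tempfile

# A fake scraped item shaped like the FirmItem above (illustrative values only)
item = {
    'title': [u'White & Case LLP - Lawyers - Rachel B. Wagner '],
    'link': [u'http://example.com/lawyer'],
    'desc': [u'Profile page'],
}

# Write one row the same way the pipeline does
path = os.path.join(tempfile.mkdtemp(), 'items.csv')
with open(path, 'w', newline='') as f:
    csv.writer(f).writerow([item['title'][0], item['link'][0], item['desc'][0]])

# Read it back to confirm the row round-trips
with open(path, newline='') as f:
    rows = list(csv.reader(f))
print(rows[0])
```

If the printed row matches what the spider logged, the pipeline body itself is fine and any remaining problem is in how it is enabled.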

Don’t forget to enable the pipeline by adding it to the ITEM_PIPELINES setting in your settings.py. On recent Scrapy versions this is a dict mapping each pipeline path to an order value (0–1000):

ITEM_PIPELINES = {
    'dmoz.pipelines.CsvWriterPipeline': 300,
}

(Older Scrapy versions accepted a plain list: ITEM_PIPELINES = ['dmoz.pipelines.CsvWriterPipeline'].)

Adjust to suit the specifics of your project.
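Since your end goal is getting the rows into a database, here is a hedged sketch of that follow-up step using only the standard library: loading items.csv into SQLite. The table and column names are made up for illustration, and the example writes its own tiny items.csv so it is self-contained; in practice you would point it at the file the pipeline produced.

```python
import csv
import sqlite3

# Build a tiny items.csv so the example stands alone;
# normally this is the file your pipeline already wrote.
with open('items.csv', 'w', newline='') as f:
    csv.writer(f).writerow([
        u'White & Case LLP - Lawyers - Rachel B. Wagner ',
        u'http://example.com/lawyer',
        u'Profile page',
    ])

# ':memory:' keeps this demo throwaway; use a file path for a real database
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE firms (title TEXT, link TEXT, description TEXT)')

# csv.reader yields one sequence per row, which executemany inserts directly
with open('items.csv', newline='') as f:
    conn.executemany('INSERT INTO firms VALUES (?, ?, ?)', csv.reader(f))
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM firms').fetchone()[0]
print(count)
```

Once this works, you could also skip the CSV detour entirely and have process_item insert into the database directly, but going through the file first is a fine way to learn the pieces one at a time.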
