Python – how to implement nested item in scrapy

json, python, scrapy

I am scraping data with a complex hierarchical structure and need to export the result to JSON.

I defined the items as:

from scrapy.item import Field, Item

class FamilyItem(Item):
    name = Field()
    sons = Field()       # list of SonsItem

class SonsItem(Item):
    name = Field()
    grandsons = Field()  # list of GrandsonsItem

class GrandsonsItem(Item):
    name = Field()
    age = Field()
    weight = Field()
    sex = Field()

and when the spider finishes running, the printed item output looks like:

{'name': 'Jenny',
 'sons': [
     {'name': u'S1',
      'grandsons': [
          {'name': u'GS1',
           'age': 18,
           'weight': 50},
          {'name': u'GS2',
           'age': 19,
           'weight': 51}]}]}

but when I run scrapy crawl myscaper -o a.json, it always fails with an error saying the result "is not JSON serializable". When I copy and paste the item output into an IPython console and call json.dumps() on it, it works fine. So where is the problem? This is driving me nuts…

Best Answer

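When saving the nested items, wrap each one in a call to dict() so that the parent item holds plain dicts rather than Item objects. The printed output already looks like nested dicts, which is why pasting it into a console and calling json.dumps() works. For example: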

gs1 = GrandsonsItem()
gs1['name'] = 'GS1'
gs1['age'] = 18
gs1['weight'] = 50

gs2 = GrandsonsItem()
gs2['name'] = 'GS2'
gs2['age'] = 19
gs2['weight'] = 51

s1 = SonsItem()
s1['name'] = 'S1'
s1['grandsons'] = [dict(gs1), dict(gs2)]

jenny = FamilyItem()
jenny['name'] = 'Jenny'
jenny['sons'] = [dict(s1)]
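
If the nesting gets deeper, calling dict() by hand at every level becomes easy to forget. Below is a rough sketch of a recursive helper that does the same conversion in one pass. It assumes the items behave like mappings (as far as I know, Scrapy items do). The names to_plain and StandInItem are made up for this illustration and are not part of Scrapy; StandInItem only stands in for an Item so the snippet runs without Scrapy installed.

```python
import json
from collections.abc import Mapping


def to_plain(value):
    """Recursively turn mappings into dicts and lists/tuples into lists."""
    if isinstance(value, Mapping):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


class StandInItem(Mapping):
    """Hypothetical stand-in for an Item: a mapping that is not a dict."""

    def __init__(self, **fields):
        self._values = dict(fields)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


# Build the same hierarchy as above, without wrapping anything in dict().
gs1 = StandInItem(name='GS1', age=18, weight=50)
gs2 = StandInItem(name='GS2', age=19, weight=51)
s1 = StandInItem(name='S1', grandsons=[gs1, gs2])
jenny = StandInItem(name='Jenny', sons=[s1])

# Convert once at the top level, then serialize with the standard encoder.
plain = to_plain(jenny)
encoded = json.dumps(plain, sort_keys=True)
print(encoded)
```

In a spider you would presumably call something like to_plain() on the top-level item just before yielding it. Passing the unconverted stand-in object straight to json.dumps() raises a TypeError, much like the error in the question.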