Database Design for a Web-Based RSS Feed Aggregator

database-design

I am working on an open source application that allows users to add RSS feeds. All users of the site can read the content of those feeds, not just the feeds they added themselves.

Using PHP and the SimplePie library, I have created a simple app. I am now trying to implement category-based listing.

I have a list of RSS/Atom feeds in a database. Each post in a feed can have a category/label associated with it, and a single feed can have multiple posts falling under multiple categories.

Question:
I want to get only the posts of category "Science".

In the worst case, if I have 1000 RSS feeds, should I read the posts from every feed and check the category of each post?

Storing all the posts of every feed along with their categories does not seem like a good solution.

What would be the best database design for this?

The database schema is provided below. Nothing is fixed, and I am free to change the schema or the design.

http://sqlfiddle.com/#!2/2b519/1

NOTE: The fiddle is just for reference, so there is no need to worry about it. It is optional; I included it because I thought it might help others understand the problem.

Best Answer

Looking at your sample DDL in sqlfiddle, it seems you are trying to store categories about posts, but reckon it's a bad idea to store the posts themselves.

Given what you seem to be seeking, I think you will need a posts table with attributes like URL, category(ies), title, and teaser/details, with the feed table's feed-id or feed-url as a foreign key. This posts table can be indexed on a post-id, which could also serve as its primary key.
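A minimal sketch of such a posts table follows. It uses Python's built-in sqlite3 module only so that the example is self-contained; the same idea carries over to PHP and MySQL. The table and column names (`feeds`, `posts`, `feed_id`, `url`, `teaser`), the `save_post` helper, and the example.com URLs are all assumptions for illustration, not taken from the fiddle. The UNIQUE constraint on `url` is also an assumption: it lets the aggregator re-fetch a feed without storing the same post twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# Hypothetical schema: each post points back at the feed it came from.
conn.executescript("""
CREATE TABLE feeds (
    feed_id  INTEGER PRIMARY KEY,
    feed_url TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    feed_id INTEGER NOT NULL REFERENCES feeds(feed_id),
    url     TEXT NOT NULL UNIQUE,   -- assumed: a post is identified by its URL
    title   TEXT NOT NULL,
    teaser  TEXT
);
""")


def save_post(conn, feed_id, url, title, teaser):
    """Store a fetched post; a post whose URL is already stored is skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO posts (feed_id, url, title, teaser) "
        "VALUES (?, ?, ?, ?)",
        (feed_id, url, title, teaser),
    )


conn.execute(
    "INSERT INTO feeds (feed_id, feed_url) VALUES (1, 'https://example.com/rss')"
)
save_post(conn, 1, "https://example.com/a", "Post A", "First teaser")
save_post(conn, 1, "https://example.com/a", "Post A", "First teaser")  # re-fetch
save_post(conn, 1, "https://example.com/b", "Post B", "Second teaser")

post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
```

Storing only the URL, title, and a short teaser keeps each row small, which may ease the concern about storing every post in full.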

If you allow a given post to belong to more than one category, then a separate categories table makes sense, and category IDs can then serve as foreign keys in the category columns of posts.
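One common way to model a post that belongs to several categories is a junction table between posts and categories. That is a substitution for the "category columns" described above, not something the fiddle contains. The sketch below again uses Python's sqlite3 purely for self-containment; the names `categories`, `post_categories`, `posts_in_category`, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: post_categories links each post to each of its categories.
conn.executescript("""
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE post_categories (
    post_id     INTEGER NOT NULL REFERENCES posts(post_id),
    category_id INTEGER NOT NULL REFERENCES categories(category_id),
    PRIMARY KEY (post_id, category_id)
);
-- Index that supports looking up posts by category.
CREATE INDEX idx_post_categories_category
    ON post_categories (category_id, post_id);

INSERT INTO posts VALUES
    (1, 'Mars rover update'), (2, 'Election results'), (3, 'New particle found');
INSERT INTO categories VALUES (1, 'Science'), (2, 'Politics'), (3, 'Space');
INSERT INTO post_categories VALUES (1, 1), (1, 3), (2, 2), (3, 1);
""")


def posts_in_category(conn, name):
    """Return the titles of all stored posts tagged with the given category."""
    rows = conn.execute(
        """
        SELECT p.title
        FROM posts AS p
        JOIN post_categories AS pc ON pc.post_id = p.post_id
        JOIN categories AS c ON c.category_id = pc.category_id
        WHERE c.name = ?
        ORDER BY p.post_id
        """,
        (name,),
    )
    return [row[0] for row in rows.fetchall()]


science_titles = posts_in_category(conn, "Science")
```

With this layout, listing the "Science" posts is a single indexed query against stored rows, so none of the 1000 feeds has to be re-read at request time.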

Why do you reckon it's a bad idea to store all posts? I think that if you are unwilling to store individual posts, your options are limited to tagging entire feeds with categories, and that does not sound like what you are trying to do.

Finally, if the volume will grow rapidly and you are worried about indexing and performance, perhaps you should look into graph databases or other NoSQL databases as potentially faster alternatives.
