Database – graph or relational database

databasedatabase-design

I'm starting to think that a lot of my tables could be replaced by only a graph db:

For example:
I have 4 tables: accounts, votes, posts, relationships

but I can represent all these in a graph table with different edges,

NODE1 -> type of relation -> NODE2
account -> vote_+1 -> post
account -> wrote -> post
account -> friend -> account2

is there a difference of performance or other between them?

Best Answer

you can always represent relational data in graph form

the key is how are you using the data - mostly transactional updates, mostly graph-traversal queries?

if you have the time, do the conversion and profile your most common operations on both DBs

Related Solutions

Database – Why Relational Databases Don’t Support Nested Format Queries

C. J. Date goes into detail about this in Chapter 7 and Appendix B of SQL and Relational Theory. You're right, there's nothing in relational theory that prohibits an attribute's data type from being a relation itself, as long as it's the same relation type on every row. Your example would qualify.

But Date says structures like this are "usually--but not invariably--contraindicated" (i.e. a Bad Idea) because hierarchies of relations are asymmetric. For example, a transformation from nested structure to a familiar "flat" structure cannot always be reversed to recreate the nesting.

Queries, constraints, and updates are more complex, harder to write, and harder for the RDBMS to support if you allow relation-valued attributes (RVA's).

It also muddies database design principles, because the best hierarchy of relations isn't so clear. Should we design a relation of Suppliers with a nested RVA for parts supplied by a given Supplier? Or a relation of Parts with a nested RVA for suppliers who supply a given Part? Or store both, to make it easy to run different types of queries?

This is the same dilemma that results from the hierarchical database and the document-oriented database models. Eventually, the complexity and cost of accessing nested data structures drives designers to store data redundantly for easier lookup by different queries. The relational model discourages redundancy, so RVA's can work against the goals of relational modeling.

From what I understand (I have not used them), Rel and Dataphor are RDBMS projects that support relation-valued attributes.

Re comment from @dportas:

Structured types are part of SQL-99, and Oracle supports these. But they don't store multiple tuples in the nested table per row of the base table. The common example is an "address" attribute which appears to be a single column of the base table, but has further sub-columns for street, city, postal code, etc.

Nested tables are also supported by Oracle, and these do allow multiple tuples per row of the base table. But I am not aware that this is part of standard SQL. And keep in mind the conclusion of one blog: "I'll never use a nested table in a CREATE TABLE statement. You spend all of your time UN-NESTING them to make them useful again!"

Database Design – Managing Objects with Multiple States

Sounds like you have 3 major categories of data you are trying to store:

General job data (job id, job requester id, job receiver id etc)
State transitions (job started, job finished)
State-specific job data
(optional) job-related events (price changed, job receiver user reassigned etc)

The key is to separate event-like data from everything else.

Schema design

Here are some details:

1. General jobs table

All of the information that's NOT state-specific goes into the (let's say) "jobs" table. Auto-generated primary key: job_id

2. State transitions table

All information about state transitions goes into "job_state_transitions" table, which might have the following columns:

job_transition_id
job_id
created_at
from_state
to_state

Ideally this table is append-only. Nothing is every updated or removed here.

Using such a table, you can find out the latest status of any particular job by selecting the latest row for a given job_id from the job_transitions table. You can further denormalize that and introduce a "job_state" column, the contents of which are updated every time a new row is inserted in job_transition table (stored procedures might help here if that's your thing).

You can also do all sorts of analysis on state transitions, because the timing data is preserved (created_at is a date/time field that can help with that)

3. State-specific job data

All of the state-specific data goes into "[state]-jobs" tables. Primary key: some sequence id. Main index: job_transition_id

4. Optional event data

You can also introduce an "audit trail" table that allows you to keep track of various changes that users might request for each job, like your "change agreed upon price". This is a generalization of the state transition table: one main table that contains events and a one supplemental table for each event type (for example, price_changes table with job_id, created_at, from_price and to_price columns).

Scaling to millions of users

If the main "jobs" table grows to be unwieldy, you can shard it by job_id or requesting_user_id or something like that.

Likewise, events table should be append-only and can be rotated or purged of events related to jobs that have been finished.