Java Database – How to Store Collections of Linked Text Objects

database java

I'm writing a Java program that works as a presentation of a written document, which needs to be stored in a database (I'm using MySQL, but am open to suggestions of other DB types).

The Essay class is a collection of Paragraph objects (more specifically, Essay extends LinkedList<Paragraph>); each Paragraph consists primarily of an int ID and a String content. The Essay object is stored as its own table in the DB (with columns int paragraphId and Blob text), which is fairly straightforward, and when the program is initialized I run a simple query that creates a new Paragraph object for each row in the table.
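For reference, the two classes described above might look roughly like this. It is only a minimal sketch: the field names, the getters, and the addRow helper are assumptions made for illustration, not the actual code.

```java
import java.util.LinkedList;

// Hypothetical minimal version of the Paragraph class: an int ID plus String content.
class Paragraph {
    private final int id;
    private final String content;

    Paragraph(int id, String content) {
        this.id = id;
        this.content = content;
    }

    int getId() {
        return id;
    }

    String getContent() {
        return content;
    }
}

// Essay is an ordered collection of Paragraph objects, as described in the question.
class Essay extends LinkedList<Paragraph> {
    // Assumed helper: turns one table row (paragraphId, text) into one Paragraph.
    void addRow(int paragraphId, String text) {
        add(new Paragraph(paragraphId, text));
    }
}
```

Loading the table then amounts to calling addRow once per row returned by the query.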

Within each paragraph, certain hyperlinked words link to other Essay objects, which also need to be stored in the database. My question is how best to save them – the options I'm considering are:

  • Create a single additional DB table where each essay is stored on one row, with its paragraphs stored in a single text object separated by a delimiter; the program splits them into separate Paragraph objects as it pulls them from the DB. (This is fairly manageable because these linked essays are much shorter than the main essay.)
  • Create a single additional table, with a large number of content columns, storing each paragraph in a separate column. Then, when the data is pulled from the database, it creates a new essay for each row and a new paragraph for each column in that row, until it comes upon one that is null.
  • Dynamically create a separate DB table for each linked essay, setting it up identically to the primary one. This doesn't seem ideal because it would end up being a very large number of tables, and many of them would only have two or three paragraphs, making a new table seem wasteful.
  • Do a different arrangement altogether: Have one Paragraphs table, where all paragraphs from all essays are stored, and another table called Essays, which uses a SET datatype to store the IDs (foreign keys) of the paragraphs included in each essay.
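To make the last option concrete, here is a rough in-memory sketch of the two-table layout, with plain Java maps standing in for the Paragraphs and Essays tables. The class and method names are hypothetical. One deliberate difference from the description: the paragraph IDs are kept in an ordered list rather than a set, on the assumption that paragraph order matters. Note also that MySQL's SET type only holds up to 64 predefined members, so it may not be usable for this purpose in practice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the two tables in the last option.
class EssayStore {
    // "Paragraphs" table: paragraph ID -> content, shared by all essays.
    private final Map<Integer, String> paragraphs = new HashMap<>();
    // "Essays" table: essay ID -> ordered list of paragraph IDs (assumed ordered).
    private final Map<Integer, List<Integer>> essays = new HashMap<>();

    // Stores a paragraph and appends its ID to the given essay.
    void addParagraph(int essayId, int paragraphId, String content) {
        paragraphs.put(paragraphId, content);
        essays.computeIfAbsent(essayId, k -> new ArrayList<>()).add(paragraphId);
    }

    // Rebuilds one essay by looking up each referenced paragraph in order.
    List<String> loadEssay(int essayId) {
        List<String> result = new ArrayList<>();
        for (int paragraphId : essays.getOrDefault(essayId, Collections.<Integer>emptyList())) {
            result.add(paragraphs.get(paragraphId));
        }
        return result;
    }
}
```

Every essay, however short, lives in the same two structures, so no per-essay tables or spare columns are needed.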

I've simplified my actual use case here to make it easier to explain. I'm new to database programming, and am doing this largely as an exercise, so I'm looking for general tips about the pros/cons of these approaches, as well as any additional suggestions.

Best Answer

What you describe doesn't sound like a good fit for a SQL database.

Basically, what you have is a graph of documents. I would store those in either a document database (e.g. CouchDB or MongoDB; Cassandra and Redis are related NoSQL options, though strictly speaking they are a wide-column store and a key-value store respectively) or a graph database (e.g. Neo4j), depending on how you want to traverse / query the data. (Or maybe even both: store the documents in a document database and the graph structure in a graph database. That way you can traverse the graph in the graph database, which is what graph databases are good at, and then retrieve each document from the document database, which is what document databases are good at.)
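As a rough illustration of that split, here is an in-memory Java sketch with one map playing the role of the document store and another the role of the graph store. It uses no real database API; the class name, the method names, and the String essay IDs are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of "documents plus a link graph".
class EssayGraph {
    // Document side: essay ID -> that essay's paragraphs.
    private final Map<String, List<String>> documents = new HashMap<>();
    // Graph side: essay ID -> IDs of the essays it links to.
    private final Map<String, List<String>> links = new HashMap<>();

    void putEssay(String id, List<String> paragraphs) {
        documents.put(id, new ArrayList<>(paragraphs));
    }

    void link(String fromId, String toId) {
        links.computeIfAbsent(fromId, k -> new ArrayList<>()).add(toId);
    }

    // Document lookup by ID; returns null if the essay is unknown.
    List<String> getEssay(String id) {
        return documents.get(id);
    }

    // Breadth-first traversal of the link graph; the visited set guards against cycles.
    Set<String> reachableFrom(String startId) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startId);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!visited.add(current)) {
                continue; // already seen via another link
            }
            for (String next : links.getOrDefault(current, Collections.<String>emptyList())) {
                queue.add(next);
            }
        }
        return visited;
    }
}
```

A caller would first walk the links with reachableFrom and then fetch each essay's text with getEssay, which mirrors the "traverse in one store, retrieve from the other" idea.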

There are even databases such as ArangoDB which combine a graph-oriented, document-oriented and key-value-based storage model in a single database.

This idea of choosing a database model that fits the shape of your data instead of trying to awkwardly fit the shape of your data to match the (SQL) database model has been gathering a lot of attention over the past 10 years or so, and is known as NoSQL (Not Only SQL). Note that NoSQL doesn't mean you shouldn't use SQL. It means that you should use SQL when it is the right tool for the job, i.e. when your data is actually table-shaped and relational.
