Database Design – Designing a System to Archive Databases with Referential Integrity

database, database-design, design, java, oracle

I have a very large Oracle database, with many tables and millions of rows. I need to delete some of the rows end-to-end. By end-to-end I mean: to delete a row from a table, I must first delete all the corresponding records in its child tables, so that the parent row can then be deleted without any constraint violation errors.

As an example:

Table A depends on Table B and Table C (i.e., Table A has foreign key constraints referencing Table B and Table C).
Table B depends on Table C and Table D.

I want to delete rows in this specific order:

Table A <= Table B <= Table C <= Table D
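
To make the ordering concrete, here is a minimal sketch; every table, column, and bind-variable name in it is hypothetical and would come from the actual schema:

    -- Hypothetical sketch: the tables holding the foreign keys (the
    -- children) are cleared first, so the later parent deletes raise no
    -- ORA-02292 "integrity constraint violated - child record found" error.
    DELETE FROM table_a WHERE a_id = :a_id;  -- A references B and C, so A goes first
    DELETE FROM table_b WHERE b_id = :b_id;  -- B references C and D
    DELETE FROM table_c WHERE c_id = :c_id;
    DELETE FROM table_d WHERE d_id = :d_id;
    COMMIT;                                  -- one transaction per logical record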

I have come up with two approaches:

  1. Use a service API that takes the parent table's row ID, archives the whole record in JSON format, and then deletes the entries using the cascade option.

  2. Use delete-with-cascade together with triggers that save the deleted values (a trigger sketch follows this list).
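
For option 2, a row-level BEFORE DELETE trigger on each table can copy the old column values into the matching archive table as rows are removed. This is only a sketch: TABLE_A_ARCHIVED, its columns, and the ARCHIVED_AT timestamp are placeholders, not my real schema:

    CREATE OR REPLACE TRIGGER trg_table_a_archive
    BEFORE DELETE ON table_a
    FOR EACH ROW
    BEGIN
      -- Copy the row being deleted into the archive table.
      INSERT INTO table_a_archived (a_id, b_id, c_id, payload, archived_at)
      VALUES (:OLD.a_id, :OLD.b_id, :OLD.c_id, :OLD.payload, SYSTIMESTAMP);
    END;
    /

One such trigger would be needed for every table taking part in the cascade.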

Once the data is fetched, I would prefer to put it into matching archive tables (Table_A_archived, Table_B_archived, etc.), but writing it out to files would be fine too.
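
If the archive tables do not exist yet, one way to create them (an assumption on my part, not something I have settled on) is to clone each table's structure without copying any rows:

    -- WHERE 1 = 0 copies the column definitions but no data; foreign keys
    -- and indexes are deliberately not carried over, which is usually
    -- acceptable for archive tables.
    CREATE TABLE table_a_archived AS SELECT * FROM table_a WHERE 1 = 0;
    CREATE TABLE table_b_archived AS SELECT * FROM table_b WHERE 1 = 0;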

Any thoughts on these initial approaches would be really valuable. If anyone has done this sort of thing before, it would be great to learn from their experience.

Just FYI, this is a live database, so I can't do anything drastic (I can obviously test before pushing to prod :)). Also, deletion/archival is not done purely on the basis of time.

Thanks in advance!

Best Answer

Moving the data to archive tables will reduce the amount of data in the "live tables", but it won't reduce your overall database size. So if that is your goal, use files, which can be compressed easily if necessary.

However, if you want to be able to query your archive data later on, and database size is not really your problem, the archive tables may be the better choice.

The usage of triggers IMHO only makes sense if your archiving process has to happen synchronously with your production process (for example, whenever a DELETE command is issued against the live data, you want the related records archived immediately). If, on the other hand, your archiving process is designed to be asynchronous, perhaps executed during off-hours with a timestamp filter to identify the data to be archived, there is no need for triggers. Instead, you can design your archive process around set-based bulk deletes, which can be significantly faster than deleting one record at a time.
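
As a rough sketch of that set-based variant (table and column names are assumptions, and since the question says the criteria are not purely time-based, the WHERE clause would differ in practice):

    -- Copy the candidate rows into the archive, then remove them with a
    -- single set-based DELETE; both statements share one transaction.
    INSERT INTO table_a_archived
      SELECT * FROM table_a WHERE last_modified < :cutoff;

    DELETE FROM table_a
     WHERE last_modified < :cutoff;

    COMMIT;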
