Sql-server – SQL Server Database Design for Archiving Process

archiveschemasql serverstorage

I am currently working on a large project to move an MS Access application to SQL server. Our team has been having discussions on the best way to design a part of the new database so I figured I would reach out for feedback.

In the current MS Access design, we load data daily which is then processed in the UI. Since we are limited to the 2GB in size for Access, we do the following.

  1. Morning Import – data from previous day is deleted first, then new data is loaded
  2. During the day users process data
  3. End of day, we take database snapshot (basically direct copies of the Access DB which are stored for users to be able to go back in time for research)

In the new version, all data inserted into each table will have a Business Date assigned to it which is the date it was added to the table. Then on the front-end, the UI will have a search by date to get the records from that day.

Right now we are contemplating the following:

  1. Create 2 databases – one for the
    current day and then an Archive. The
    data that is normally deleted daily
    would be moved to the Archive DB
    each morning. For the front-end we
    would have to UNION the 2
    databases to get the final result
    set
  2. Create 2 schemas in the same database. One for the Current and
    then an Archive. Again we would
    have to union the data but it would
    be in the same database
  3. just leave the data in the current day tables, we wouldn't need
    a Union on the data for the UI
    searches.

Let me add that we are required to hold at least 6 years of data that is loaded to the database, so the table sizes will grow pretty large.

I am just trying to figure out the best approach to this sort of problem. I am sure we are not the only ones in this type of situation. I am open to any suggestions, thoughts on how we should proceed.

Best Answer

Each has possible pros and cons, and there may be an option you haven't considered. Here's some general input, hope it helps.

  1. Create 2 databases
    • Maintenance becomes slightly more complex, with 2 databases to keep in sync, 2 databases to manage backups for, check performance on, etc.
    • Front-end access becomes slightly more complex, where you need to UNION the data. I would recommend setting up a View to handle this, rather than coding the logic into your front-end.
    • Maintenance operations are more flexible: backups, reindexing, table locks, etc can all occur independently in the two databases, and you can potentially separate them onto different storage to gain performance benefits.
  2. Create 2 schemas
    • I don't see any pros to this approach that don't exist in the other two options
    • A con to this approach is that you make the access more complex, but you don't get any(?) additional maintenance or performance flexibility in return.
  3. Keep single DB, simple schema:
    • This has the advantage of simplicity. It's probably the best way to go if you don't know what you need in the long term, and if you don't have much experience in terms of SQL Server maintenance, design, etc
    • The main risk is that you end up with too much data in a single place, but this can probably be offset with proper indexing - for a database to get truly unmanageable you probably need to get into the hundreds of gigs
    • If you find you need to split things up for performance reasons at a later time, you can always implement option 1, or option 4, (or even 2 I guess) at a later time - with a view you can "hide" the implementation details from your front-end, and potentially even from your ETL processes (although you probably wouldn't want that)
  4. extra option Partitioning with multiple filegroups
    • You get to keep your data in one database, but have the benefits of keeping the data in different files, potentially in different places.
    • This is probably the best way to handle "swapping in" and "swapping out" with very large sets of data (something for the future, maybe)
    • You still need to create a view to consolidate the underlying data
    • This is still pretty complex, and not something you want to do lightly - probably something you'd look into if you found that you started handling unreasonably large amounts of data on a daily basis.

Random ref on partitioning that looks pretty good (sorry, haven't done a thorough search for reference materials, and don't have any handy): http://www.sqlservercentral.com/articles/partition/64740/

Related Topic