Java – Copy table having millions of records from one database to another – Spring Boot + Spring JDBC

javaspringspring-bootspring-jdbc

Working on a small example where have to copy millions of records from
teradata database to Oracle DB.

Environment: Spring Boot + Spring JDBC (jdbcTemplate) + Spring REST + Spring Scheduler + Maven + Oracle + Teradata

Using Spring JDBC's batchUpdate to insert data into target Database Oracle.

Using teradata's 'top 1000' in SQL query from source database.

fecthDataResults = repository.queryForList(
                "select top 1000 A, B, C, D, E from " + schemaName + ".V_X");

Querying from a View "V_X".

This View has 40 million records and spring boot application will choke if it runs.

Also inserting into 2 tables (Primary and Backup) in target Oracle DB.

Whats the best way to fetch and load/copy 40 million records making sure that copying was done successfully into 2 tables.

Spring Scheduler to schedule the batch copy at specified time/interval.
Spring Rest to invoke copying manually – both of which is achieved.

Any suggestions would be appreciated.

Thanks.

Best Answer

There are different ways you can solution this:

  1. Logstash Approach - Specify your source and destination data and load the data to both the destination DBs. It has cron support and the logstash can run based on the schedule. It is quite faster. You can specify how many rows you wanna fetch every time.

  2. Use an ETL tool. You can go with any of the open source versions if you do have the ecosystem in place. Talend is a good candidate where you can design your job and export as runnable Jar. You can schedule this by using any component of your choice.

  3. Spring Batch. Please refer this question. Spring RESTful web services - High volume data processing

  4. Spring Cloud Data Flow or Spring boot with a MQ as an intermediate store between your datasources. You may have to introduce message queues to handle failover, fallback mechanisms. Highly reliable and can implemented in a async manner.

My personal opinion is to go with Logstash. If you feel any of the above solutions make sense. I can elaborate them if you want.

Related Topic