It depends on how you want these web services to scale. For example, if your 4 GET calls are called 1 million more times per day than the 1 PUT request, then you'll want to split them out so you can load balance them more easily.
Your problem and technology stack are very similar to a project I am working on as an application architect right now, so I am going to give you my best advice on how to proceed given the information and constraints you have provided.
Your instincts are correct: the best choice for this project is Spring Batch or something similar to it. What you are effectively doing is exactly batch processing, and your attempts to introduce multithreading and to avoid running out of memory during processing are handled easily in Spring Batch. From my perspective it sounds like your client had a poorly designed application for the intended features, and you were asked to clean up the mess, but not at the expense of a rewrite.
So I am not saying that you must use Spring Batch, but I want to give you some context as to why Spring Batch is the best choice. This will help you design your approach properly.
Readers, Processors and Writers
The idea behind Readers is to read in a subset of the data to be processed. This can typically be done however you are reading the XML file now; the reader keeps track of its position in the file and creates objects for the processor.
The processor will perform any business or integration logic that you might have.
The writer can use a tool like Hibernate to write out individual records to the relational database.
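For illustration, the pattern above can be sketched in plain Java. The interface shapes mirror Spring Batch's ItemReader, ItemProcessor and ItemWriter, but the names and the stub reader below are my own, not Spring API:

```java
import java.util.Iterator;
import java.util.List;

// Minimal stand-ins for Spring Batch's ItemReader, ItemProcessor and
// ItemWriter interfaces; illustrative only.
interface Reader<T> {
    T read(); // return null when the input is exhausted, as ItemReader.read() does
}

interface Processor<I, O> {
    O process(I item); // per-record business or integration logic
}

interface Writer<T> {
    void write(List<? extends T> items); // e.g. delegate to a Hibernate session
}

// A reader that tracks its position in the input, creating one object per call.
class StubXmlReader implements Reader<String> {
    private final Iterator<String> records;

    StubXmlReader(List<String> records) {
        this.records = records.iterator();
    }

    @Override
    public String read() {
        return records.hasNext() ? records.next() : null;
    }
}
```

In Spring Batch you would normally use a provided reader (e.g. a StAX-based XML reader) rather than writing your own, but the contract is the same: hand back one item at a time until the input is exhausted.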
Chunking and Transactions
A chunk of data is just a subset of data objects that you read in, process, and write out in a single contiguous transaction. If the transaction runs all the way through, then it is clearly okay to commit to the database. In the event of an exception, you will want to define exception behaviour that rolls back the transaction at the database level and properly logs which chunk of records failed to complete successfully. Perhaps as part of this rollback behaviour you want to include some notification behaviour to email a support group to look at the problem. Utilizing a transaction framework through Spring + JTA is the best approach.
Realistically though you can't have a discussion about what to do when there is an exception without looking at your business requirements (or as I suspect, perhaps the lack of business requirements from your client here). Defining what happens when some records don't process isn't something we can tell you, it is something that has to be addressed in your business requirements or else it is a gap.
Regardless of how you approach your rollback behaviour, 1 GB of data in a single file is too much for a single transaction, and it would be wasteful to throw away all of the processing that went into that file because of what might amount to an unexpected character in some arbitrary record.
- You want to chunk your input data to a reasonable size such that there is adequate memory for all currently processing files at the same time.
- You want each chunk to be individually transacted so that once it completes you will not have to revisit those records again.
- You want to process these files one chunk at a time at first, and only if you are not reaching your desired performance metrics should you consider a multithreaded or distributed approach.
- You want to log which chunk you are currently processing in some kind of metadata table in the database, and if a chunk fails, your exception rollback behaviour should record in the database which chunk of the job failed.
- If a chunk fails on a file, you should stop processing altogether until the problem is identified and fixed. This might require human involvement, so you probably need to consider support functionality to restart a failed job where it left off.
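The list above can be sketched as a plain-Java loop. Spring Batch implements this machinery for you; every name here (the chunk size, the metadata list, the sample failure rule) is illustrative, not Spring Batch API:

```java
import java.util.ArrayList;
import java.util.List;

// A plain-Java sketch of the chunk-oriented loop Spring Batch runs for you.
class ChunkedJob {
    static final int CHUNK_SIZE = 2; // illustrative; tune to your memory budget

    // Stands in for a job-metadata table in the database.
    final List<String> metadata = new ArrayList<>();

    /** Processes records chunk by chunk. A failed chunk is rolled back (here,
     *  simply discarded) and logged, and processing stops so support can
     *  restart the job from that chunk once the problem is fixed. */
    int run(List<String> records, int startChunk) {
        int chunkIndex = startChunk;
        for (int i = startChunk * CHUNK_SIZE; i < records.size(); i += CHUNK_SIZE, chunkIndex++) {
            List<String> chunk = records.subList(i, Math.min(i + CHUNK_SIZE, records.size()));
            List<String> written = new ArrayList<>(); // stands in for the open transaction
            try {
                for (String record : chunk) {
                    written.add(process(record));
                }
                commit(written, chunkIndex); // one transaction per chunk
            } catch (RuntimeException e) {
                metadata.add("chunk " + chunkIndex + " FAILED: " + e.getMessage());
                return chunkIndex; // stop; restart later from this chunk
            }
        }
        return -1; // all chunks committed
    }

    String process(String record) {
        if (record.isEmpty()) {
            throw new RuntimeException("unexpected empty record");
        }
        return record.toUpperCase();
    }

    void commit(List<String> chunk, int chunkIndex) {
        metadata.add("chunk " + chunkIndex + " committed (" + chunk.size() + " records)");
    }
}
```

Note that the `startChunk` parameter is what makes the restart behaviour possible: support fixes the bad record, then reruns the job from the chunk recorded in the metadata table rather than from the beginning of the file.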
Performance and Scaling
This is hard for me to help with since I don't know where the file comes from, how the file-processing job is invoked, or what non-functional requirements you have around performance. My advice here, of course, is that the safe bet is to start by processing in individual transactions in a single-threaded way. Multithreading, or even introducing parallel processing and distributed computing, could be very complicated if you try to roll your own. Frameworks like Spring Batch help you manage this if you need it, but there is a good chance you won't if the client did not give any strict performance requirements. Your concerns about deadlocking the database and staying within the memory constraints of your server are alleviated by handling this in a single-threaded way.
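To make the "single threaded first" advice concrete, here is a hedged sketch: the same chunk work run sequentially, with a thread pool held in reserve for when measurements actually demand it. None of this is Spring Batch API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: start sequentially, keep parallelism as a later option.
class ChunkRunner {

    // The safe starting point: one chunk after another, bounded memory, and
    // no competing transactions that could deadlock the database.
    static void runSequentially(List<Runnable> chunks) {
        chunks.forEach(Runnable::run);
    }

    // Only reach for this once single-threaded throughput is measurably short
    // of a stated non-functional requirement.
    static void runInParallel(List<Runnable> chunks, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        chunks.forEach(pool::submit);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The point of structuring it this way is that the chunk work itself does not change; only the runner does, so you can defer the concurrency decision until you have real measurements.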
The Maven file structure may help with this

In essence the Spring configuration files (which can have any name, by the way, not just the generic `applicationContext.xml`) are treated as classpath resources and filed under `src/main/resources`. During the build process, these are then copied into the `WEB-INF/classes` directory, which is the normal place for these files to end up.

Variations include an additional `spring` directory (e.g. `src/main/resources/spring`) to separate the Spring contexts from other resources dedicated to application frameworks. You may wish to split the application contexts into dedicated layers, and so on.
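As a purely illustrative example of layered context files (the file names below are a convention I am assuming, not something Spring mandates), a `web.xml` fragment might pull them in like this:

```xml
<!-- web.xml: load the layered context files from the classpath.
     File names are illustrative, not required by Spring. -->
<context-param>
    <param-name>contextConfigLocation</param-name>
    <param-value>
        classpath:spring/applicationContext-persistence.xml
        classpath:spring/applicationContext-service.xml
        classpath:spring/applicationContext-web.xml
    </param-value>
</context-param>
```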
What about different environments like dev/test/production?
Typically, your Spring configuration should pick up the environment configuration from its, ahem, environment. Usually this means using JNDI, JDBC, environment variables or external properties files to provide the necessary configuration. I list those in order of preference since JNDI is generally easier to administer than external properties files in a controlled production cluster.
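For example (the JNDI name here is illustrative), a production context might look up its DataSource through Spring's `jee` namespace rather than embedding credentials in the file:

```xml
<!-- Requires the jee namespace to be declared on the <beans> root element.
     The JNDI name is an assumption for illustration. -->
<jee:jndi-lookup id="dataSource" jndi-name="java:comp/env/jdbc/appDS"/>
```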
In the case of integration testing you may need to use a "test-only" Spring configuration file. This would contain special contexts that use test beans or configuration. These would be present under `src/test/resources` and may have a `test-` prefix to make sure that developers are aware of their purpose. A typical use would be to provide a non-JNDI DataSource, perhaps targeting an HSQLDB database during the build's automated tests, which would be referenced within the test case.

However, in general the majority of your Spring context files should not need specialised modification as they move between tiers. It should be the case that the same build artifact (e.g. a WAR file) is used in dev/test/production, just with different credentials.
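A hedged sketch of such a test-only file (the bean definition is real Spring JDBC, but the file name and database name are my own) might provide an in-memory HSQLDB DataSource like this:

```xml
<!-- src/test/resources/test-datasource.xml: a non-JNDI DataSource for
     build-time integration tests. Sits inside the usual <beans> root. -->
<bean id="dataSource"
      class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="org.hsqldb.jdbcDriver"/>
    <property name="url" value="jdbc:hsqldb:mem:testdb"/>
    <property name="username" value="sa"/>
    <property name="password" value=""/>
</bean>
```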