What you're referring to is called internationalization (often abbreviated as i18n). In desktop Java it's accomplished by creating properties files for each locale, then using resource bundles to fetch strings from those files as needed (see this tutorial). It's not universally used, because there is some setup involved, and it's shorter to type `"greetings"` than `messages.getString("greetings")`. Usually you only see it when people actually need their program translated into different languages.
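For reference, a minimal sketch of that setup (the bundle base name `messages` and the key `greetings` are just examples). Given a `messages_fr.properties` file on the classpath containing `greetings=Bonjour`, you'd load it like this:

    import java.util.Locale;
    import java.util.ResourceBundle;

    public class Greeter {
        public static void main(String[] args) {
            // Loads messages_fr.properties for French, falling back to
            // messages.properties if no locale-specific file exists.
            ResourceBundle messages =
                    ResourceBundle.getBundle("messages", Locale.FRENCH);
            System.out.println(messages.getString("greetings")); // Bonjour
        }
    }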
You can hard-code strings in Android too, if you want, but Google emphasized using the XML resources from the start in all their documentation and tutorials, and set up the tools to make it easier.
> I cannot be sure that each CSV file has only been processed once...
You might want to tackle this part first. If I'm reading it right, the crux of your problem isn't individual duplicate transactions (you mentioned "I know for sure that there are no duplicate records in each CSV file"), but preventing the same file from being processed twice.
Hence, consider adding some state logic to your Java application that tracks whether a file has been processed, by computing and storing a checksum for each file, e.g. its MD5 hash. When you encounter a matching checksum, there's a good chance the file has been processed before. You can then perform further verification, such as inspecting the number of lines or other unique identifiers of the file.
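A minimal sketch of that idea (the class and method names are mine, and a real implementation would persist the checksums rather than keep them in memory):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.HashSet;
    import java.util.Set;

    public class ProcessedFileTracker {
        // Checksums of files we've already processed; in practice you'd
        // persist this set instead of holding it in memory.
        private final Set<String> seenChecksums = new HashSet<>();

        public boolean isLikelyProcessed(Path csvFile)
                throws IOException, NoSuchAlgorithmException {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(Files.readAllBytes(csvFile));

            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            // add() returns false if the checksum was already present,
            // so it doubles as the membership check.
            return !seenChecksums.add(hex.toString());
        }
    }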
Extending this idea further: if the same transaction can appear across different CSV files, then your only other option, besides updating the database schema to handle duplicate records properly, is to store all processed transactions locally within your Java application. If there can be multiple instances of your application (either on the same computer or across a network), you'll need yet another centralized database to handle this, or some distributed data grid... and by then, the better option is still to go back to the drawing board and improve your existing database schema.
Edit
To flip things around: if changing the database schema to handle duplicates nicely is entirely off the table, the other consideration is to evaluate how much data your Java application needs to process at any given time, and how fast the connection between your application and the database is.
On the lower end, say your application processes only 10 records per file, averaging one file an hour, and the network connection is very good, almost as good as a locally-hosted database. In that case, I don't think there's much of a performance impact in querying all the existing records.
On the extreme end, your application is expected to read thousand-line transaction files every 10 seconds over an extremely bad network connection, say one where querying all the records takes a minute. In that case, processing the files quickly is the bigger concern, and that's probably your strongest argument for modifying the database schema. :)
So, assuming all is fine in the lower-end case, what would be an efficient way of comparing a relatively large data set against a smaller input set for duplicates? I'd suggest unmarshalling the XML payload you receive into a `HashSet`. I also hope you have a `Transaction` domain class with properly implemented `hashCode()` and `equals()` methods. A potential Java 8 solution would then be:
    // assuming your database records are unmarshalled into currentSet
    inputSet.stream()
            .filter(v -> !currentSet.contains(v))
            .forEach(v -> { /* v is a new record to send to the database */ });
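For completeness, here's what a minimal `Transaction` might look like; the fields are assumptions for illustration, and real code would hash whatever uniquely identifies a transaction:

    import java.util.Objects;

    public class Transaction {
        private final String id;        // assumed fields, purely for
        private final long amountCents; // illustration

        public Transaction(String id, long amountCents) {
            this.id = id;
            this.amountCents = amountCents;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Transaction)) return false;
            Transaction other = (Transaction) o;
            return amountCents == other.amountCents
                    && Objects.equals(id, other.id);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, amountCents);
        }
    }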
Also, the elephant in the room: concurrent insertions. Will there be any? If so, how do you intend to handle them?
Best Answer
If you use a JAX-RS implementation like Jersey, you get much improved parameter handling and can map the parameter directly to a `List`.
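For example, here's a sketch of a JAX-RS resource method (the resource path and parameter name are made up), where repeated query parameters like `?id=1&id=2` are bound straight to a `List<Long>`:

    import java.util.List;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.QueryParam;

    @Path("/transactions")
    public class TransactionResource {
        // JAX-RS collects every occurrence of the "id" query parameter
        // into the list and converts each value to Long for you.
        @GET
        public String byIds(@QueryParam("id") List<Long> ids) {
            return "received " + ids.size() + " ids";
        }
    }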
If you don't want to go that far, you could also pull tricks like using Guava's `Collections2.transform` with a `Function<String, Long>` on an `Arrays.asList` view of the array. Not the most compact approach, but each piece is reusable, so it wouldn't be so bad.
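That might look something like this (a sketch, assuming the raw values arrive as a `String[]`):

    import com.google.common.base.Function;
    import com.google.common.collect.Collections2;
    import java.util.Arrays;
    import java.util.Collection;

    public class IdParsing {
        public static void main(String[] args) {
            String[] raw = {"1", "2", "3"};
            // Collections2.transform returns a lazy, transformed view of
            // the Arrays.asList view -- nothing is copied up front.
            Collection<Long> ids = Collections2.transform(
                    Arrays.asList(raw),
                    new Function<String, Long>() {
                        @Override
                        public Long apply(String input) {
                            return Long.valueOf(input);
                        }
                    });
            System.out.println(ids); // [1, 2, 3]
        }
    }

The `Function` can be pulled out into a reusable constant, which is where the "each piece is reusable" part comes in.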