Java Design – Creating Better OO Code in Poorly Designed Relational Databases

Tags: database-design, java, object-oriented-design, web-development

I am writing a Java web application that consists mainly of a set of similar pages, each with several tables and a filter that applies to those tables. The data in these tables comes from an SQL database.

I am using myBatis as the ORM, which may not be the best choice in my case, since the database is poorly designed and myBatis is a rather database-oriented tool.

I am finding that I write a lot of duplicate code: because of the database's poor design, queries for similar things end up looking very different, so I cannot easily parameterise them. This propagates into my code, and instead of populating the rows and columns of a table with a simple loop I have code like:

    getAData(p1, …, pi);
    getBData(p1, …, pi);
    getCData(p1, …, pi);
    getDData(p1, …, pi);

And this soon explodes when we have different tables with different columns.
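For illustration, the duplication ends up looking roughly like this with myBatis mappers (the mapper, row classes, table and column names below are simplified and hypothetical, not taken from the actual project):

    import java.util.List;
    import org.apache.ibatis.annotations.Param;
    import org.apache.ibatis.annotations.Select;

    // Simplified stub row classes; in reality each table has its own oddly named columns.
    class TableARow { public String col1, col2, col3; }
    class TableBRow { public String fieldX, fieldY; }

    public interface CallReportMapper {

        @Select("SELECT col1, col2, col3 FROM table_a WHERE col4 = #{from} AND col5 = #{to}")
        List<TableARow> getAData(@Param("from") String from, @Param("to") String to);

        // A "similar" query that cannot share the same SQL, parameters or row type.
        @Select("SELECT b.field_x AS fieldX, b.field_y AS fieldY FROM table_b b "
              + "JOIN table_c c ON b.id = c.ref WHERE c.kind = 'CALL' AND b.field_z = #{from}")
        List<TableBRow> getBData(@Param("from") String from, @Param("to") String to);

        // ...and so on for getCData, getDData, for every page and every table
    }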

What adds to the complexity is that I'm using Wicket, which is, in effect, a mapping of objects to the HTML elements in the page. So my Java code becomes an adapter between the database and the front end, which has me writing a lot of wiring and boilerplate code with some logic intermingled in it.

Would the correct solution be to wrap the ORM mappers with an extra layer that presents a more homogeneous interface to the database, or is there a better way to deal with this spaghetti code I'm writing?

EDIT: More info about the database

The database holds mainly phone call information. The poor design consists of:

  • Tables with an artificial ID as primary key that has nothing to do with the domain.
  • No unique constraints, triggers, checks, or foreign keys whatsoever.
  • Fields with a generic name that mean different things for different records.
  • Records that can be categorised only by joining against other tables under varying conditions.
  • Columns that should be numbers or dates but are stored as strings.

To sum it up, a messy / lazy design all around.

Best Answer

Object orientation is valuable precisely because these types of scenarios arise: it gives you tools to design reasonable abstractions that let you encapsulate complexity.

The real question here is, where do you encapsulate that complexity?

So let me step back a moment and speak to what 'complexity' I'm referring to here. Your problem (as I understand it; correct me if I'm wrong) is a persistence model which is not an effectively usable model for the tasks you need to complete with the data. It may be effective and usable for other tasks, but not for your tasks.

So what do we do when we have data that does not present a good model for our means?

Translate. It's the only thing you can do. That translation is the 'complexity' I refer to above. So now that we accept we're going to translate the model, we need to decide on a couple of factors.

Do we need to translate in both directions? Are both directions going to be translated the same way, as in:

    (Tbl A, Tbl B) -> Obj X    (read)
    Obj X -> (Tbl A, Tbl B)    (write)

or do insertion/update/delete activities represent a different type of object, such that you read data as Obj X but insert/update it from Obj Y? Which of these two ways you want to go, or whether no update/insert/delete is possible at all, is an important factor in where you put the translation.
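As a concrete sketch of that decision (all type names here are hypothetical, invented for illustration): a read-side translator can build one domain object from several raw rows, while the write side, if you need it at all, may accept a different object entirely.

    import java.util.List;

    // Read side: several raw persistence rows -> one usable domain object.
    interface CallReadTranslator {
        PhoneCall toDomain(TableARow a, List<TableBRow> b);
    }

    // Write side (only if you need it): the object you persist may not be
    // the same shape as the object you read.
    interface CallWriteTranslator {
        TableARow toTableA(PhoneCallUpdate update);
        List<TableBRow> toTableB(PhoneCallUpdate update);
    }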


Where do you translate?

Back to the first statement I made in this answer: OO allows you to encapsulate complexity, and my point here is that you not only should but must encapsulate that complexity if you want to keep it from leaking out and seeping into all of your code. At the same time, it's important to recognize that you can't have a perfect abstraction, so worry less about that than about having a very effective and usable one.

Again, your problem is: where do you put this complexity? Well, you have choices.

You can do it in the database using stored procedures. This has the drawback of often not playing very well with ORMs, but that's not always true. Stored procedures afford some benefits, often including performance. They can, however, require a lot of maintenance; it's up to you to analyze your particular scenario and decide whether that maintenance will be more or less than the other choices. I personally am very skilled with stored procedures, and that available talent reduces the overhead; never underestimate the value of making decisions based on what you do know. Sometimes a suboptimal solution is more optimal in practice than the "correct" one, because you or your team can create and maintain it better.
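For what it's worth, here is a minimal sketch of what the Java side of that can look like with plain JDBC; the procedure name get_call_report and its parameters are hypothetical, standing in for whatever procedure would hide the messy joins and string-to-date casts inside the database:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class CallReportDao {

        // Hypothetical procedure that does the ugly joins and casts inside the database.
        private static final String CALL_REPORT = "{call get_call_report(?, ?)}";

        public void printReport(String jdbcUrl, String from, String to) throws Exception {
            try (Connection con = DriverManager.getConnection(jdbcUrl);
                 CallableStatement cs = con.prepareCall(CALL_REPORT)) {
                cs.setString(1, from);
                cs.setString(2, to);
                try (ResultSet rs = cs.executeQuery()) {
                    while (rs.next()) {
                        // The procedure already returns clean, typed columns.
                        System.out.println(rs.getString("caller") + " -> " + rs.getTimestamp("call_start"));
                    }
                }
            }
        }
    }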

Another in-database option is views. Depending on your database server, these may be highly optimal, sub-optimal, or not even workable at all; one of the drawbacks can be query time, depending on what indexing options your database offers. Views become an even better choice if you never need to make any data modifications (insert/update/delete).
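If views are workable, the Java side can become pleasantly boring. A sketch, assuming a hypothetical view v_phone_call that already renames the generic columns and casts the string dates (the mapper and row class are likewise invented for illustration):

    import java.time.LocalDateTime;
    import java.util.List;
    import org.apache.ibatis.annotations.Param;
    import org.apache.ibatis.annotations.Select;

    // Minimal row class that the view maps onto.
    class PhoneCallViewRow {
        public String caller;
        public String callee;
        public LocalDateTime startedAt;
    }

    public interface PhoneCallViewMapper {

        // The view hides the column renaming, string-to-date casts and cross-table conditions.
        @Select("SELECT caller, callee, started_at AS startedAt FROM v_phone_call "
              + "WHERE started_at BETWEEN #{from} AND #{to}")
        List<PhoneCallViewRow> findBetween(@Param("from") LocalDateTime from,
                                           @Param("to") LocalDateTime to);
    }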

Stepping past the database, you have the old standby of the repository pattern. This is a time-tested approach which can be very effective. Its drawbacks tend to include boilerplate, though well-factored repositories can avoid some of it, and even when they do end up with an unfortunate amount of boilerplate, repositories tend to be simple code that's easy to understand and maintain while presenting a good API/abstraction. Repositories are also good for unit-testability, which you lose with the in-database options.
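A sketch of what such a repository might look like here, reusing the hypothetical CallReportMapper and TableARow from the question's example and exposing only a translated domain object:

    import java.util.ArrayList;
    import java.util.List;

    // The clean domain object the rest of the application sees.
    record PhoneCall(String caller, String callee, java.time.LocalDateTime startedAt) {}

    // The repository is the only place that knows about the messy mappers and rows.
    public class PhoneCallRepository {

        private final CallReportMapper mapper;   // hypothetical myBatis mapper from above

        public PhoneCallRepository(CallReportMapper mapper) {
            this.mapper = mapper;
        }

        public List<PhoneCall> findCalls(String from, String to) {
            List<PhoneCall> result = new ArrayList<>();
            for (TableARow row : mapper.getAData(from, to)) {
                // All the renaming, casting and cross-table categorising lives here.
                result.add(new PhoneCall(row.col1, row.col2, parseLegacyDate(row.col3)));
            }
            return result;
        }

        private java.time.LocalDateTime parseLegacyDate(String raw) {
            return java.time.LocalDateTime.parse(raw); // adjust the format to the real data
        }
    }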

There are mapping tools out there (AutoMapper in the .NET world, MapStruct or ModelMapper in Java, for example) that may make using an ORM workable, because they can do the translation between the ORM's database model and your usable models. Some of these tools can be tricky to maintain and understand, behaving more like magic, but they create a minimum of overhead code, and so less maintenance overhead once well understood.
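Purely for illustration, a MapStruct-style sketch (MapStruct is one such Java tool; the row and domain types are the hypothetical ones from the sketches above, and the date format is a guess at what the legacy strings might look like):

    import org.mapstruct.Mapper;
    import org.mapstruct.Mapping;
    import org.mapstruct.factory.Mappers;

    // MapStruct generates the implementation of this interface at compile time.
    @Mapper
    public interface PhoneCallMapping {

        PhoneCallMapping INSTANCE = Mappers.getMapper(PhoneCallMapping.class);

        // Rename the generic legacy columns and parse the string date.
        @Mapping(source = "col1", target = "caller")
        @Mapping(source = "col2", target = "callee")
        @Mapping(source = "col3", target = "startedAt", dateFormat = "yyyy-MM-dd HH:mm:ss")
        PhoneCall toDomain(TableARow row);
    }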

The further you step from the database, the more code has to deal with the untranslated persistence model, which is genuinely unpleasant. At that point you're talking about putting the translation layer in your UI, which it sounds like you may be doing now. This is generally a very bad idea, and it decays terribly over time.


Now let's start talking crazy.

The object is not the be-all and end-all of abstractions. A profusion of abstractions has been developed over the many years computer science has been studied, and even before then in mathematics. If we're going to get creative, let's talk about some of the known, well-studied abstractions available.

There's the actor model. This is an interesting approach because all you do is send messages to other code, which effectively delegates the work and so encapsulates the complexity away from the rest of your code. It could work like this: you send a message to an actor saying "I need Obj X sent to Y", and a receptacle waiting at location Y then processes Obj X. You could even send a message saying "I need Obj X with computations Y and Z applied to it", and then you don't even need to wait; the translation occurs on the other side of that message pass, and you can simply move on if you don't need to read its result. This may be a slight abuse of the actor model for your purposes; its main goal is making asynchrony and concurrency easy to handle well, but both of those are themselves just abstractions that can act as boundaries to encapsulate the complexity we're talking about here (or any complexity, for that matter).
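A minimal plain-Java sketch of that idea, with no actor framework, just a single worker thread that owns the translation; the message and repository types are the hypothetical ones from the earlier sketches:

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.LinkedBlockingQueue;

    public class TranslationActor {

        // Message type: "I need the calls between from and to; reply into this future."
        record GetCalls(String from, String to, CompletableFuture<List<PhoneCall>> reply) {}

        private final BlockingQueue<GetCalls> mailbox = new LinkedBlockingQueue<>();
        private final PhoneCallRepository repository;   // hypothetical, from the repository sketch

        public TranslationActor(PhoneCallRepository repository) {
            this.repository = repository;
            Thread worker = new Thread(this::run, "translation-actor");
            worker.setDaemon(true);
            worker.start();
        }

        // Callers just send a message and get a future back; they never see the raw model.
        public CompletableFuture<List<PhoneCall>> ask(String from, String to) {
            CompletableFuture<List<PhoneCall>> reply = new CompletableFuture<>();
            mailbox.add(new GetCalls(from, to, reply));
            return reply;
        }

        private void run() {
            while (true) {
                try {
                    GetCalls msg = mailbox.take();
                    try {
                        // All translation happens on this side of the message pass.
                        msg.reply().complete(repository.findCalls(msg.from(), msg.to()));
                    } catch (RuntimeException e) {
                        msg.reply().completeExceptionally(e);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }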

Another encapsulation boundary is the process boundary. Process boundaries can segregate complexity very effectively. You could implement the translation code as a web service communicating over plain HTTP, using SOAP, REST, or, if you really insist, your own protocol (not suggested); STOMP isn't a bad newer protocol either. Or use a normal daemon service with a system-local pipe for very fast communication, again using whichever protocol you choose. This approach actually has some pretty good benefits (a minimal sketch of such a service follows the drawbacks below):

  • You can have multiple translation processes running at the same time, supporting older and newer versions side by side: you update the translation service to publish an object model V2, and then separately, at a later point, update the consuming code to work with the new model.
  • You can do interesting things like pinning the process to a core for performance. You also gain a measure of security by making this the only process running with the privileges to touch that data.
  • A process boundary is a very strong boundary that stays fixed, ensuring minimal leakage of your abstraction over time: code in the translation space cannot be called from outside it, because the two sides don't share process scope, so the set of usage scenarios is fixed by contract.
  • Asynchronous/non-blocking updates become simpler.

The drawbacks are, obviously, more maintenance than is commonly necessary, plus communication overhead that affects performance.
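As promised above, a minimal sketch of such a translation service using the JDK's built-in HTTP server; the endpoint, port and wire format are all placeholder choices:

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class TranslationService {

        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

            server.createContext("/calls", exchange -> {
                // Only this process ever touches the raw persistence model; here you would
                // call the repository/translator and serialize the clean objects instead.
                byte[] body = "[]".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });

            server.start();
            System.out.println("Translation service listening on :8080");
        }
    }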


There is a great variety of ways to encapsulate complexity, allowing it to be placed in ever stranger and more curious places in your system. Using higher-order functions (often faked with the strategy pattern or various other object patterns), you can do some very interesting things.

That's right, let's talk about a monad. You could build this translation layer as a set of small, specific functions that each do one necessary translation, but keep all of those functions hidden so they are barely accessible to outside code. This reduces reliance on them, allowing them to change easily without affecting much external code. You then create a class that accepts higher-order functions (anonymous functions, lambdas, strategy objects, however you need to structure them) which operate on the nice OO model objects, and let the underlying code that accepts those functions do the actual execution using the appropriate translation methods.

This creates a boundary where all the translation not only lives on the other side, away from the rest of your code; it is only ever used on that side, so the rest of your code knows nothing about it other than where the entry point to that boundary is.
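A Java sketch of that last idea; this is not a full monad, just the higher-order-function boundary described above, reusing the hypothetical mapper and PhoneCall model from the earlier sketches:

    import java.util.List;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    public class CallStore {

        private final CallReportMapper mapper;   // hypothetical myBatis mapper from above

        public CallStore(CallReportMapper mapper) {
            this.mapper = mapper;
        }

        // The only public entry point: callers pass functions over the clean model
        // and never see the raw rows or the translation.
        public <R> List<R> withCalls(String from, String to, Function<PhoneCall, R> f) {
            return mapper.getAData(from, to).stream()
                    .map(this::translate)   // hidden translation step
                    .map(f)
                    .collect(Collectors.toList());
        }

        // Private: outside code cannot reach or depend on the translation details.
        private PhoneCall translate(TableARow row) {
            return new PhoneCall(row.col1, row.col2, java.time.LocalDateTime.parse(row.col3));
        }
    }

    // Usage: the caller only ever works with PhoneCall.
    // List<String> callers = callStore.withCalls("2024-01-01", "2024-01-31", PhoneCall::caller);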

Ok, yeah that really is talking crazy, but who knows; you might just be that crazy (seriously, do not undertake monads with a craziness rating below 88%, there is real risk of bodily injury).
