What’s the correct approach to DAO layer in presence of ORM framework

daohibernatejpaormrepository

I'm using JPA/Hibernate but probably it doesn't limit question.

Currently I'm writing my data access code in my web controller classes. Thanks to JPA, in most cases this code is very simple, basically simple query with trivial setup.

I tried to move those methods into separate classes to improve code reuse. But I've encountered an important problem. Although there are similar methods, they are not the same. This especially applies to fetching joined classes (tables). In some controllers I don't need fetching those classes. But in some controllers I do need that. So I have the following variants:

  1. Move all methods, even slightly different, as a separate methods into DAO class. This makes DAO complex, their method names will be very long, it would be hard to find correct one.

  2. Forget about fetch joins. Setup application to perform loading when needed (lazy loading). This will reduce number of methods, but database access will be much less effective and transaction must span across entire request, not just controller code.

  3. Put complex logic into method when needed. E.g. supply boolean parameters fetchSomething. And method will tweak query according to those parameters. Method implementation will be very complex as I would have to either build query string from pieces or use very cumbersome Criteria API. Method will be hard to read and understand.

  4. Leave as is. I generally like this approach. It leads to some code duplication, but not very much and queries are declarative so I don't see much problems with that. The main problem is lack of testability. To test controller I would need to prepare a database, because there's no (or I didn't found) complex JPA mock implementation with supplied in-memory data.

I wonder what approach used in complex projects? I saw 4-th approach and I saw once very complex in-house ORM based on Hibernate.

I'm talking now about so-called "repository pattern". I don't really understand what people mean by "service layer" and never saw any need with that, so currently the question is about data repository classes.

Best Answer

As you see, the solution might not be very satisfying. The problem of data access at all is, that we have to consider many different aspects which make code not 100% reusable. It's about transactions and locking, and the way e.g. ORM maps a relational database to objects and vice versa.

Mostly if you try to write architecture components, trying to solve the problem you'll soon notice, that you just move the problem to another place. The reason is simply that a bottom-top approach as your question is, is a bad idea.

What we want is an agile top-down strategy. You will see, that most of the accesses are read-only, which (nearly) means no locking and no transactions. Some writes are field updates which must not be synchronized and a few inserts into different tables might need locking and should be in a transaction.

Now to your variants:

Move all methods, even slightly different, as a separate methods into DAO class. This makes DAO complex, their method names will be very long, it would be hard to find correct one.

violates OCP. there should be one entity/table-definition and querybuilding components around it but only depending on the entity itself. That means no large accessor classes.

Forget about fetch joins. Setup application to perform loading when needed (lazy loading). This will reduce number of methods, but database access will be much less effective and transaction must span across entire request, not just controller code.

lazy loading is cool for code which should be more readable focused than performant, e.g. backends with small lists and a few administrators. Not a good idea for uncached permanently accessed frontend pages.

Put complex logic into method when needed. E.g. supply boolean parameters fetchSomething. And method will tweak query according to those parameters. Method implementation will be very complex as I would have to either build query string from pieces or use very cumbersome Criteria API. Method will be hard to read and understand.

parameters like offset and limit are okay, but boolean parameters whether to load sparse or full is too much overhead in code, which might leads to magic behavior.

Leave as is. I generally like this approach. It leads to some code duplication, but not very much and queries are declarative so I don't see much problems with that. The main problem is lack of testability. To test controller I would need to prepare a database, because there's no (or I didn't found) complex JPA mock implementation with supplied in-memory data.

For integration tests yes, you need to save some data to your database (you can use e.g. sqlite). For unit tests no, you mock the service layer according to your controller input and test how controllers behave.

I wonder what approach used in complex projects? I saw 4-th approach and I saw once very complex in-house ORM based on Hibernate.

Big projects actually use service orientated setups on the highest level and use ORM or plain queries according to their needs. There is no one and only solution.

I'm talking now about so-called "repository pattern". I don't really understand what people mean by "service layer" and never saw any need with that, so currently the question is about data repository classes.

IO-Applications are nearly equal. There is a request and a response. The controller is responsible for collecting data from the request to call a service which produces data which will be collected and returned. E.g. business logic and all the use-cases of the software are 100% part of the service layer. Unit-testing the controller by mocking the services is much easier.

"Repository pattern":

I've spent a lot of time on Doctrine and PHP and over the last years I dropped the repositories, because 1% are simply repository operations like a list of countries ordered by name, 80% are more complex DQL or querybuilder queries, also with flat results and the rest are SQL queries I cannot write with DQL.

Result of most of the queries are table with columns from different joined tables. ORM would not makes sense at all, thus I don't use repositories anymore, but a clean service layer and a couple of data access services.

Related Topic