I'll try to tackle part of your question: why do I need a DAO layer if I never intend to swap databases?
As another respondent mentioned part of your goal when writing any class is the Single Responsibility Principle: a class should have only one reason to change. Simply put, you don't want to modify code without necessity, because doing so introduces risk. To minimize that risk, we try and avoid touching the implementation of a class for too many reasons. If I had to alter my class because the business rules changed, or because I decided to change how I mapped it into persistent storage, I would have two reasons to change my code. By removing one of those responsibilities, database mapping, I can avoid the risk of making an mistake that impacts the other, and the testing burden of checking both. I don't want to check my Db mappings, just because I change a business rule. You may also hear people talking about 'separation of concerns', that's another way of expressing this idea. A class that handles my logic for arranging shipping containers should not care about persistence.
It also comes into play when we think about unit testing. When I unit test my shipping container arrangement class, I don't want those tests to be coupled to other concerns, such as persistence. I want to be isolated from them. So I need to be easily able to test in isolation from them. If my class does not contain persistence logic, then I can easily test it in memory, without worrying about how it is persisted.
This driver produces an additional benefit: reasoning about our code. If we don't perform data-access at random points in our code, but through accessing to the DAO through an application service layer and then access to objects returned from that, I will not suddenly have unexpected I/O code running at some point during execution of business logic. It is also much easier to reason about our persistence code if it is not interspersed with business logic.
This makes it much easier to think about our code. We can follow domain logic without having our understanding polluted by access to persistent storage. This ability to reason about code easily is a key to productivity
The ultimate expression of many of these layering ideas is an hexagonal architecture.
You should be aware though that a domain model is not the ideal for all scenarios; particularly where you have no business logic. In that case a transaction script (one thing after another) might be the right solution. Some models, such as CQRS exist to try and gain the benefits of a domain model, without the costs for simple read access, for example for displaying a web page.
Sometime ago, i asked on SO a question that include if i have to implements my service as they're using JPA managed entities ? Here are the two points of my own answer that should interest you :
I don't have to abstract the fact that i use managed entities, this will lead to complex and unefficient code
I choosed JPA, i won't switch for it which is true unless rewriting the full model to stand for something based on non relationnal database.
If you try to abstract the fact that you're using lazy-loading/managed entities, you will just have troubles. For instance, i saw a project where there was DAO Pojo and business POJO, they were always converting the DAO to business POJO using adapter, and thus always triggering loading lazy list in the N+1 select way.
So if I adapt what i was told in that question, at the very least, your should have 2 differents business implementations, one relying on the strength of JPA, and the other, either one that don't care about JPA/RDF, either one that will get the best of RDF DAO layer.
I know that doesn't seems to follow the rule of POO about totally encapsulating behaviour in each layer, but really, try to abstract complex JPA model without triggering cascading N+1 select on every relations or manually fetching each relations when you shouldn't need to with JPA. It just can't work. JPA is really invasive, but really powerful too, as you have very few code to write (and maintain!) to use it.
Of course if you're just doing something really simple having quite few data, a demo, ... you can stick to one business layer that is not using JPA at his best.
But if you need some basic performance, like loading all your database line-by-line in the ORM is not an option, you will probably end up stuck with one implementations that don't use JPA right.
Best Answer
As you see, the solution might not be very satisfying. The problem of data access at all is, that we have to consider many different aspects which make code not 100% reusable. It's about transactions and locking, and the way e.g. ORM maps a relational database to objects and vice versa.
Mostly if you try to write architecture components, trying to solve the problem you'll soon notice, that you just move the problem to another place. The reason is simply that a bottom-top approach as your question is, is a bad idea.
What we want is an agile top-down strategy. You will see, that most of the accesses are read-only, which (nearly) means no locking and no transactions. Some writes are field updates which must not be synchronized and a few inserts into different tables might need locking and should be in a transaction.
Now to your variants:
violates OCP. there should be one entity/table-definition and querybuilding components around it but only depending on the entity itself. That means no large accessor classes.
lazy loading is cool for code which should be more readable focused than performant, e.g. backends with small lists and a few administrators. Not a good idea for uncached permanently accessed frontend pages.
parameters like offset and limit are okay, but boolean parameters whether to load sparse or full is too much overhead in code, which might leads to magic behavior.
For integration tests yes, you need to save some data to your database (you can use e.g. sqlite). For unit tests no, you mock the service layer according to your controller input and test how controllers behave.
Big projects actually use service orientated setups on the highest level and use ORM or plain queries according to their needs. There is no one and only solution.
IO-Applications are nearly equal. There is a request and a response. The controller is responsible for collecting data from the request to call a service which produces data which will be collected and returned. E.g. business logic and all the use-cases of the software are 100% part of the service layer. Unit-testing the controller by mocking the services is much easier.
"Repository pattern":
I've spent a lot of time on Doctrine and PHP and over the last years I dropped the repositories, because 1% are simply repository operations like a list of countries ordered by name, 80% are more complex DQL or querybuilder queries, also with flat results and the rest are SQL queries I cannot write with DQL.
Result of most of the queries are table with columns from different joined tables. ORM would not makes sense at all, thus I don't use repositories anymore, but a clean service layer and a couple of data access services.