C# – Repository Pattern and database queries

cdatabasedesign-patternspatterns-and-practicesrepository

Having read others posts, it already seems to me that the concept "repository" and "database" do not go well hand in hand, as they are meant to be completely separate concepts…. but I'll ask the question anyway.

I currently have to import different types of data (one type of data may consist of several thousand records) from several databases that happen to be all different (oracle, sybase, sql server), process that data (depending on what kind of data set is) and then write that processed data into a different database. The language I am using now is C#.

I have been told that using the repository pattern in my situation might come in handy, but I am unsure how to engineer it, and more importantly, where to place all the different parametrized SQL queries in this context. Having so many different products and different database sources only contributes to increase my confusion.

For the reason mentioned in my first paragraph, I have the feeling my SQL queries should be part of my data access layer, while my repositories actually live in layers above. Am I getting all this wrong? Is the repository pattern actually a terrible way of solving my problem?

Best Answer

My 2 cents: A Database is a database. A repository is something that will abstract the concept of the database for you.

When we say database, it comes directly in mind some keywords like MS SQL Server, Oracle, DB2, MySQL, PostgreSQL.

But if you expand a little bit the concept, a database is a place where you can put records, usually in a structured way, and you run queries against these records.

In this sense, a database may be a directory in your file system, with a bunch of text files spread apart, each text file with several lines, each line representing a single record, for example.

That being said, what is a repository then ? A repository is a higher level of abstraction, above the database. It's where you store, retrieve and query objects, from some magical place, that your application may not be fully aware of.

Concrete example: Let's say you have records of 3 different types: Car, TrafficTicket and Person. They relate to each other in the following schema:

  • A person has zero or many cars;
  • A car have zero or many traffic tickets;

Also, they have these attributes:

  • Person { Name, SSN } = SSN is a unique identifier.
  • Car { OwnerSSN, LicensePlate } = License Plate is a unique identifier.
  • TrafficTicket { Id, LicensePlate, Severity, Points, Value } = Id is a unique identifier.

So far so good, right ?

Now, imagine that you have a huge database of Traffic Tickets, provided as text file you can download from somewhere. Nothing illegal, obviously.

Also, you have in-house an Oracle Database where you have a list of Persons (each one being an employee of your company).

Also, you have a list of Cars that you can query from a partner that gave you direct access to his MySQL database.

Your manager, a nice guy - with some shady ethical background - asks you to make him a report about how many of his employess have a bad driving record. He wants to use that to leverage parking spaces in the company's parking lot, and he threatens you to provide him this report OR ELSE...

The plot thickens. You have little time to do your report. The bills in your house are getting due, and you have a family to raise (what will little Ben do without his morning cereal ?). So you decide to cope with the shady manager, and report him to FBI later. After all, it's nice to have some technical challenge here and there.

So you decide to create a C# project, because you're smart, cool, and you know LINQ-fu. And you structure your project as following:

Namespaces:

ShadyReport.TextFiles.DAL
    TrafficTicketReader.cs
        public List<TrafficTicket> GetAllTrafficTicketsFromTextFile(string pathOfCompletelyLegalTextFile)

ShadyReport.Oracle.DAL
    EF6OracleContext.cs    *(your Entity Framework Oracle DB Context)*

ShadyReport.MySQL.DAL
    EF6MySQLContext.cs     *(your Entity Framework MySQL DB Context)*

ShadyReport.Repository
    PersonRepository.cs
        public List<Person> GetAllPersons()

    CarRepository.cs
        public List<Car> GetCarsOfPersons(List<Person> persons)

    TrafficTicketRepository.cs
        public List<TrafficTicket> GetTrafficTicketsOfCars(List<Car> cars)

ShadyReport.Entity
    TrafficTicket.cs *(simple POCO object)*
    Person.cs        *(simple POCO object)*
    Car.cs           *(simple POCO object)*

ShadyReport.Business
    ShadyReportGenerator.cs
        public void GenerateShadyReport()

I'm assuming you know how to work with Entity Framework here, and that you understand that different DBContexts may have different ConnectionStrings, each pointing to a different database.

That being said, this is how your GenerateShadyReport() method would look like:

public void GenerateShadyReport()
{
    var personRepository = new ShadyReport.Repository.PersonRepository();
    var carRepository = new ShadyReport.Repository.CarRepository();
    var ticketRepository = new ShadyReport.Repository.TrafficTicketRepository();

    var persons = personRepository.GetAllPersons();
    var cars = carRepository.GetCarsOfPersons(persons);
    var tickets = ticketRepository.GetTrafficTicketsOfCars(cars);

    var shadyReport = <LINQ magic stuff to tie the objects above together>
}

As you can see, your report generator knows nothing of where the data comes from. From the report generator point of view, the database is a non-existent concept, as it fetches the data from the repositories.

That being said, your specific parameterized queries should go in your specific repositories. A query that filters cars by persons must be in the Cars repository, while a query that ties things from different sources together is done in the Business layer, above the repositories. This is how they look like:

+-----------------+---+
| UI              | E |
+-----------------+ n |
| Business        | t |
+-----------------+ i |
| Repository      | t |
+-----------------+ y |
| DAL             |   |
+-----------------+---+

And the call graph is always one way: DOWN. UI calls Business, which calls Repositories, which calls DALs. And they all "speak" entities.

I hope this helps. It's NOT THE DEFINITIVE ANSWER about repositories, but these kind of answers usually help me understand the concepts.