NoSQL – Better Ways to Update Records in Bulk

netnosqlravendbsql server

When we started our application we had the choice of going with the traditional MS-SQL normalized database or with a NoSQL database (RavenDB is what we tried out). Here is a simplified version of our schema:

<Order OrderNumber="1000" OrderVersion="1" ...>
  <Products>
    <Product Code="X58" Price="$5.50" Quantity="3" ... />
    <Product Code="X70" Price="$9.99" Quantity="1" ... />
  </Products>
  <Contacts>
    <Contact ... />
    <Contact ... />
  </Contacts>
</Order>

While we could have went with a MS-SQL database, we didn't because we wanted all of our data for an order to be kept historically; we did not want to FK the contact even if they had the same name and address because while it might change, the order should have the contact info at the time of purchase. Also, yes, we could have FK our Products instead of using the Code to reference the product, but Products never get deleted, and again… if they are updated we want to capture their information at the time of purchase. Basically, every Order seems to be a self-contained document that should never rely on FKs.

OrderVersion is just to keep track of our process version when the order was created, we could easily just have used OrderDate to track versioning.


Anyways, so our decision was to go with RavenDB. We built out our application and website, and internally we are testing it out. The one scenario we are running into problems is: "How do you do a bulk update on records as good as SQL does it?"

For example, if we had just pushed our code to production during offline maintenance and somehow all records from MA with ZipCodes that have a leading zero (0) had the zero trimmed off (i.e. the zip code now read 1234 instead of 01234)?

Normally, we would patch things immediately on one fail-over server at a time, and updating records would be easy:

UPDATE c
SET c.ZipCode = ...
FROM Contact c
    INNER JOIN Order o ON c.OrderId = o.OrderId
WHERE o.Version = 1
  AND ...

But, it seems that the best solution we could come up with if we use RavenDB is to get all documents, then programmatically update the records through a custom C# console app. It seems that TSQL is much easier/faster. I'm not sure if there is a better way that we are unaware of, or is this a common shortcoming of using RavenDB (or any NoSQL document store)?

My biggest worry is, while our console app seemed to run just fine enough with a sample of 100K orders, what happens if we get to the point of 50M+ orders!? Or, is that also an issue with T-SQL that we would eventually run into with the example I gave?

Best Answer

Wow, there are sure a lot of extraneous statements in your question. I think I can boil it down to:

How do I update documents in RavenDB?

That question already has been answered on StackOverflow here and here, which will point you at RavenDB's Patching API. There's also extensive documentation available here, and you can even kick off a bulk patch operation from the RavenDB Studio, as described here. (No custom app required.)

Also, I wouldn't fall into the trap of assuming that the SQL Server way is necessarily better. It's just different. Have you ever tried updating 50M+ records with a single update statement in SQL Server? I think you'll find that the locking model is quite prohibitive. It's likely you would get timeouts or deadlocks. You'd probably have to implement some batching and retry operations. With RavenDB, while it may take a while, you're at least ensured that the operation will complete eventually, and your database will stay online during the process.

Related Topic