Reading CSV files located on a Linux server and updating the tables in a SQL Server database

Tags: linux, sql-server

I was wondering how we could ingest CSV files located on a Red Hat Linux server into SQL Server database tables.

I know we can write a stored procedure/bulk insert to read the files that are located on the same Windows server as SQL Server and update the database, but not sure how to do it when the files are present on a Linux server.

Any help would be greatly appreciated.

Best Answer

The key to connecting to MS SQL Server from arbitrary places is ODBC.

You are looking for a language that can read a CSV easily and insert the data back across ODBC.
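As a rough illustration (in Python rather than Perl, with the third-party pyodbc module standing in for the ODBC layer; the table name, connection string, and the assumption that the CSV's first line is a header are all placeholders for your own setup):

```python
import csv

def insert_sql(table, columns):
    """Build a parameterized INSERT statement for the given table and columns."""
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

def load_csv(path, table, conn_str):
    """Read a header-first CSV and push every row across an ODBC connection."""
    import pyodbc  # third-party; pip install pyodbc, plus the MS ODBC driver
    conn = pyodbc.connect(conn_str)
    cur = conn.cursor()
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        columns = next(reader)          # header row names the target columns
        cur.executemany(insert_sql(table, columns), list(reader))
    conn.commit()
    conn.close()
```

The parameterized statement keeps the data out of the SQL string, so odd characters in the CSV don't break the insert.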

There are several choices for ODBC. If you are a Java shop, you can get the JDBC driver for SQL Server and connect to the database that way: put the .jar on your classpath and connect away. This is probably the option to pick if you have people on high dictating technologies (Java is a fairly safe choice, and you can always find or hire a Java programmer).

Me? I'm a Perl guy at heart. Crunching one data file and exporting it into another is a classic Perl problem domain.

With Perl, one would use DBI (docs) as the interface to the database layer, then DBD::CSV (tutorial) to read the data from the CSV file as if it were a database itself, and DBD::ODBC to write out over an ODBC connection. Installing Perl modules through CPAN is fairly easy (though I admit I say that after having done it many, many times).

This could be done with Text::CSV instead. However, there is a certain elegance in the DBD approach of writing a program that reads from one arbitrary database (doing any appropriate transforms on the data) and writes to another arbitrary database. If done right, and you later find yourself wishing to copy data from some other database, it becomes a fairly minor change to the code (install the appropriate DBD driver for that database and read the data).
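The same source-agnostic pattern exists in other database layers too. A minimal Python sketch of the idea, using two sqlite3 in-memory databases as stand-ins for the source and destination (swapping either connection for, say, a pyodbc connection is the only change needed to point at a different database):

```python
import sqlite3

def copy_table(src_conn, dst_conn, table):
    """Read every row from `table` on src and insert into `table` on dst.

    Both arguments are DB-API connections, so the function neither knows
    nor cares what kind of database sits behind each one.
    """
    cur = src_conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]     # column names from the cursor
    placeholders = ", ".join("?" for _ in columns)
    dst_conn.executemany(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        cur.fetchall(),
    )
    dst_conn.commit()
```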


Taking a completely different approach, one that bends more toward the sysadmin world than the programmer's, you could move the data from the Linux machine to the Windows machine on a regular basis.

  • You could put the files on a Samba share and mount it from Windows.
  • Similarly, you could make the file available on a web server and then fetch and process it on Windows.
  • With the appropriate packages, one could mount a Windows share on Linux (something along the lines of mount -t cifs -o guest,uid=client_user,gid=users //192.168.1.100/share /path_to/mount, or //192.168.44.100/share /path_to/mount cifs guest,_netdev 0 0 as an /etc/fstab entry).
  • Instead of sharing Linux to Windows, or Windows to Linux, NAS (network attached storage) is an easy way to have a single server (neither Linux nor Windows) that serves files to each.
  • Set up an FTP server on Linux and have a scheduled job on Windows pull the file.
  • Have an automated job on Linux (cron) FTP the file to Windows (this assumes you've set up an FTP server on some Windows machine).
  • Create a web app on Windows that can insert a CSV file into a database, and call this web app via curl from Linux.
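Whichever transport you pick, the consuming side usually boils down to "copy any new files from the shared path into a local staging directory and hand them to the loader". A rough Python sketch of that step (the share and staging paths, and the idea of feeding the result to BULK INSERT, are placeholders for your own job):

```python
import shutil
from pathlib import Path

def sync_new_csvs(share_dir, staging_dir):
    """Copy CSV files from the mounted share that staging doesn't have yet.

    Returns the list of newly copied files so the caller can feed them to
    BULK INSERT (or whatever loader is in use) exactly once each.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(share_dir).glob("*.csv")):
        dst = staging / src.name
        if not dst.exists():
            shutil.copy2(src, dst)   # copy2 preserves the file's timestamps
            copied.append(dst)
    return copied
```

Run from the Windows scheduler (or cron, if the copy happens on the Linux side), this makes the job idempotent: files already staged are skipped on the next run.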

These options require a greater degree of synchronization between the machines. Making the file available requires that both machines can reach each other, which may be difficult if one is, say, in the DMZ and the other is not. Having two sets of programs doing the work on different machines (copy file, process file) requires that the clocks be reasonably synced and that multiple sets of scheduled processes can run.