I have a large SQL file (20 GB) that I would like to convert into CSV. I plan to load the file into Stata for analysis. I thought I had enough RAM to load the entire file (my computer has 32 GB of RAM).
The problem is that the solutions I have found online so far (using Python's sqlite3 module) seem to require more RAM than my system has in order to:
- read the SQL
- write the CSV
Here is the code:
import sqlite3
import pandas as pd

# Open the SQLite database and pull the entire table into a single
# DataFrame -- this is the step that exhausts memory on a 20 GB file.
con = sqlite3.connect('mydata.sql')
query = 'SELECT * FROM mydata'
data = pd.read_sql(query, con)
data.to_csv('export.csv', index=False)
con.close()
The SQL file contains about 15 variables that can be timestamps, strings, or numerical values. Nothing really fancy.
I think one possible solution could be to read the SQL and write the CSV file one line at a time. However, I have no idea how to do that (in either R or Python).
Any help really appreciated!
Best Answer
You can read the SQL database in batches and write each batch out to the CSV file, instead of reading the whole database into memory at once. Credit to How to add pandas data to an existing csv file? for how to append to an existing CSV file.
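A minimal sketch of that batching approach, using `pd.read_sql` with its `chunksize` parameter so only one chunk of rows is in RAM at a time. The database path `'mydata.sql'` and table name `mydata` are taken from your question; for illustration only, the sketch first creates a tiny sample table, which you would of course skip with your real file:

```python
import sqlite3
import pandas as pd

con = sqlite3.connect('mydata.sql')

# Illustration only: build a small sample table so the example runs
# standalone. With your real 20 GB file, delete these four lines.
con.execute('DROP TABLE IF EXISTS mydata')
con.execute('CREATE TABLE mydata (id INTEGER, name TEXT, ts TEXT)')
con.execute("INSERT INTO mydata VALUES (1, 'a', '2020-01-01'), (2, 'b', '2020-01-02')")
con.commit()

# With chunksize set, read_sql returns an iterator of DataFrames,
# each holding at most `chunksize` rows, instead of one huge DataFrame.
chunksize = 100_000  # tune this down if memory is still tight

for i, chunk in enumerate(pd.read_sql('SELECT * FROM mydata', con, chunksize=chunksize)):
    # Write the header with the first chunk only; append the rest.
    chunk.to_csv('export.csv',
                 mode='w' if i == 0 else 'a',
                 header=(i == 0),
                 index=False)

con.close()
```

The key points are `mode='a'` so later chunks append rather than overwrite, and `header=(i == 0)` so the column names are written exactly once. Peak memory is then roughly one chunk's worth of rows, not the whole 20 GB table.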