I am trying to import and sort data from a large excel report into a new file using Excel 2007 VBA. I have come up with two methods so far for doing this:
-
Have Excel actually open the file (code below), gather all data into arrays and output the arrays onto new sheets in the same file and save/close it.
Public Sub GetData() Dim FilePath As String FilePath = "D:\File_Test.xlsx" Workbooks.OpenText Filename:=FilePath, FieldInfo:=Array(Array(2, 2)) ActiveWorkbook.Sheets(1).Select End Sub
-
Use ADO to get all data out of the closed workbook, import the whole datasheet into an array (code below) and sort data from there and then output data into a new workbook and save/close that.
Private Sub PopArray() 'Uses ADO to populate an array that will be used to sort data Dim dbConnection As ADODB.Connection, rs As ADODB.Recordset Dim Getvalue, SourceRange, SourceFile, dbConnectionString As String SourceFile = "D:\File_Test.xlsx" SourceRange = "B1:Z180000" dbConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;" & _ "Data Source=" & SourceFile & ";" & _ "Extended Properties=""Excel 12.0 Xml;HDR=No"";" Set dbConnection = New ADODB.Connection dbConnection.Open dbConnectionString 'open the database connection Set rs = dbConnection.Execute("SELECT * FROM [" & SourceRange & "]") Arr = rs.GetRows UpBound = UBound(Arr, 2) rs.Close End Sub
The test file used has about 65000 records to sort through (about a third of what I will end up using it for). I was kind of disappointed when the ADO version only performed marginally better than the open worksheet (~44 seconds vs ~40 seconds run time). I was wondering if there is some way to improve the ADO import method (or a completely different method – ExecuteExcel4Macro maybe? – if there is one) that would boost my speed. The only thing I could think of was that I am using "B1:Z180000"
as my SourceRange
as a maximum range that is then truncated by setting Arr = rs.GetRows
to accurately reflect the total number of records. If that is what is causing the slow down, I'm not sure how I would go about finding how many rows are in the sheet.
Edit – I am using Range("A1:A" & i) = (Array) to insert data into the new worksheet.
Best Answer
This answer might not be what you are looking for but I still felt compelled to post it based on your side note [...] or a completely different method ]...].
Here, I am working with files of 200MB (and more) each which are merely text files including delimiters. I do not load them into Excel anymore. I also had the problem that Excel was too slow and needs to load the entire file. Yet, Excel is very fast at opening these files using the
Open
method:In this case Excel is not loading the entire file but merely reading it line by line. So, Excel can already process the data (forward it) and then grab the next line of data. Like this Excel does not neet the memory to load 200MB.
With this method I am then loading the data in a locally installed SQL which transfers the data directly to our DWH (also SQL). To speed up the transfer using the above mething and getting the data fast into the SQL server I am transferring the data in chunks of 1000 rows each. The string variable in Excel can hold up to 2 billion characters. So, there is not problem there.
One might wonder why I am not simply using SSIS if I am already using a local installation of SQL. Yet, the problem is that I am not the one loading all these files anymore. Using Excel to generate this "import tool" allowed me to forward these tools to others, who are now uploading all these files for me. Giving all of them access to SSIS was not an option nor the possibility of using a destined network drive where one could place these files and SSIS would automatically load them (ever 10+ minutes or so).
In the end my code looks something like this.