I have a lot of excel files that contains data and it contains empty rows and empty columns.
like shown bellow
I am trying to remove Empty rows and columns from excel using interop.
I create a simple winform application and used the following code and it works fine.
Dim lstFiles As New List(Of String)
lstFiles.AddRange(IO.Directory.GetFiles(m_strFolderPath, "*.xls", IO.SearchOption.AllDirectories))
Dim m_XlApp = New Excel.Application
Dim m_xlWrkbs As Excel.Workbooks = m_XlApp.Workbooks
Dim m_xlWrkb As Excel.Workbook
For Each strFile As String In lstFiles
m_xlWrkb = m_xlWrkbs.Open(strFile)
Dim m_XlWrkSheet As Excel.Worksheet = m_xlWrkb.Worksheets(1)
Dim intRow As Integer = 1
While intRow <= m_XlWrkSheet.UsedRange.Rows.Count
If m_XlApp.WorksheetFunction.CountA(m_XlWrkSheet.Cells(intRow, 1).EntireRow) = 0 Then
m_XlWrkSheet.Cells(intRow, 1).EntireRow.Delete(Excel.XlDeleteShiftDirection.xlShiftUp)
Else
intRow += 1
End If
End While
Dim intCol As Integer = 1
While intCol <= m_XlWrkSheet.UsedRange.Columns.Count
If m_XlApp.WorksheetFunction.CountA(m_XlWrkSheet.Cells(1, intCol).EntireColumn) = 0 Then
m_XlWrkSheet.Cells(1, intCol).EntireColumn.Delete(Excel.XlDeleteShiftDirection.xlShiftToLeft)
Else
intCol += 1
End If
End While
Next
m_xlWrkb.Save()
m_xlWrkb.Close(SaveChanges:=True)
Marshal.ReleaseComObject(m_xlWrkb)
Marshal.ReleaseComObject(m_xlWrkbs)
m_XlApp.Quit()
Marshal.ReleaseComObject(m_XlApp)
But when cleaning big excel files it takes a lot of time.
Any suggestions for optimizing this code? or another way to clean this excel files faster? Is there a function that can delete empty rows in one click?
I don't have problem if answers are using C#
EDIT:
I uploaded a sample file Sample File. But not all files have same structure.
Best Answer
I found that looping through the excel worksheet can take some time if the worksheet is large. So my solution tried to avoid any looping in the worksheet. To avoid looping through the worksheet, I made a 2 dimensional object array from the cells returned from
usedRange
with:This is the array I loop through to get the indexes of the empty rows and columns. I make 2 int lists, one keeps the row indexes to delete the other keeps the column indexes to delete.
These lists will be sorted from high to low to simplify deleting rows from the bottom up and deleting columns from right to left. Then simply loop through each list and delete the appropriate row/col.
Finally after all the empty rows and columns have been deleted, I SaveAs the file to a new file name.
Hope this helps.
EDIT:
Addressed the UsedRange issue such that if there are empty rows at the top of the worksheet, those rows will now be removed. Also this will remove any empty columns to the left of the starting data. This allows for the indexing to work properly even if there are empty rows or columns before the data starts. This was accomplished by taking the address of the first cell in UsedRange this will be an address of the form “$A$1:$D$4”. This will allow the use of an offset if the empty rows at the top and empty columns to the left are to remain and not be deleted. In this case I am simply deleting them. To get the number of rows to delete from the top can be calculated by the first “$A$4” address where the “4” is the row that the first data appears. So we need to delete the top 3 rows. The Column address is of the form “A”, “AB” or even “AAD” this required some translation and thanks to How to convert a column number (eg. 127) into an excel column (eg. AA) I was able to determine how many columns on the left need to be deleted.
EDIT 2: For testing I made a method that loops thru the the worksheet and compared it to my code that loops thru an object array. It shows a significant difference.
Method to Loop thru the worksheet and delete empty rows and columns.
Then a Main for testing/timing the two methods.
EDIT 3 As per OP request... I updated and changed the code to match the OP code. With this I found some interesting results. See below.
I changed the code to match the functions you are using ie… EntireRow and CountA. The code below I found that it preforms terribly. Running some tests I found the code below was in the 800+ milliseconds execution time. However one subtle change made a huge difference.
On the line:
This is slowing things down a lot. If you create a range variable for UsedRang and not keep regrabbibg it with each iteration of the while loop will make a huge difference. So… when I change the while loop to…
This performed very close to my object array solution. I did not post the results, as you can use the code below and change the while loop to grab the UsedRange with each iteration or use the variable usedRange to test this.
UPDATE by @Hadi
You can alter
DeleteCols
andDeleteRows
function to get better performance if excel contains extra blank rows and columns after the last used ones:check my answer at Get Last non empty column and row index from excel using Interop