Excel file growing huge (>150 MB)

microsoft excelmicrosoft-office

There is one particular Excel file that is used by a number of employees at my company. It is edited from both Excel 2003 and 2007, with the "Sharing" feature turned on to allow multiple writers at once.

The file has a decent amount of data on several sheets with some basic formatting, and used to be about 6MB, which seems reasonable for its content. But after a few weeks of editing, the file grew to 10, then 20 MB, and eventually skyrocketed to more than 150 MB, even though it still has about the same amount of data as before. It now takes 5-10 minutes to open it, and that much time again to save it.

The first time this happened, I copied the content of each sheet into a new, blank workbook, and saved the new workbook; this brought it back down to about 6MB. Now, it has blown up again.

The workbook uses the "Data Validation" feature to limit the values in certain columns to the contents of a few named ranges. Copying all the data into a new workbook means re-setting up all the data validation, which is a pain and not something that we want to do every month.

As a troubleshooting step, I tried saving the file in "XML Spreadsheet 2003" format, hoping to get some insight into what was being stored. Sure enough, the file was almost a gig, and almost all of the 10 million lines look like this:

<NamedCell ss:Name="Z_21D5114F_E50C_46AC_AA4F_C3FF540C717F_.wvu.FilterData"/>
<NamedCell ss:Name="Z_1EE2BA5E_3011_4F9A_8ACD_E58835250FC4_.wvu.FilterData"/>
<NamedCell ss:Name="Z_1E3BDCEA_6A72_4ECC_BF4F_7B03CC66181E_.wvu.FilterData"/>

I've seen a few VBScripts online to manage and enumerate named cells that are hidden in Excel's built-in interface, though I wonder how they'd handle my 10 million named cells. What I really need, though, is an understanding of why this keeps happening. What actions in excel could be causing this?

UPDATE:

Here's an experiment I tried that provides some more detail:

  • I turned off sharing; the file remained huge.
  • I saved the file as an .xlsx file, and it shrank to 5MB.
  • Then I closed that file, opened it back up, and saved it as an .xls file, with sharing still turned off; it got huge again!
  • When an '03 user tries to open that nice, compact .xlsx file, it takes several minutes to open it, even though '07 opens it fine.

So, this seems to be an '03 specific issue, and saving the file in '03 format immediately recreated a bunch of junk that clearly had not been in the '07 file at all.

Best Answer

Say it with me: Excel is not a database.

You're running up against the design limitations of the software: it's only got so much ability to store transactional data, so when multiple people are writing to it, it has to store a frickton of information in order to reconcile. You have so much data in it, that the transactional copies are HUGE.

Microsoft assumes (correctly) that if you have that much data, it's stored in a database, and you're just using Excel as a front end.

If you're going to work like that, you should at least knock together a little Access database. It will save you a world of headache, because it's meant to WORK like that and Excel just isn't.

@Josh: Yes, it absolutely IS leaking. When you share a document, it has to keep track of the modifications done by every user...I'll call this "transactional data" but you can just think of it as history. Since there is never an "official" version, it keeps keeping track of changes, and the document bloats up faster than Kirstie Alley in a doughnut shop.

It's by design. Someone who is an excel guru may be able to tell you how to make it stop, but the best solution is just not to use excel for data that's being constantly maintained. It's not really what it's designed for.

I am sympathetic to your problem, but it's a better solution to explain the problem to the higher-ups and work out a new procedure, than it is to try and prolong an unfortunate hack.

@Josh: If you just want to shrink it once, temporarily, copy all the data, and paste it into a new spreadsheet. THAT will kill all the metadata, guaranteed (make sure you don't select the whole sheet, but only the part with data in it) But this is a temporary solution at best.