Holy cow. I can reproduce. Tried the single-append variation too; it's even speedier. Looks like "-append" just magically makes HDF5-based save() 30x faster. I don't have an explanation but I wanted to share what I found.
I wrapped up your test code in a function, refactoring it to make the save logic agnostic about the test data structure so you can run it on other data sets, and added some more diagnostic output.
Don't see the big speedup everywhere. It's huge on my 64-bit XP box and a 32-bit Server 2003 box, big on my 64-bit Windows 7 box, nonexistent on a 32-bit XP box. (Though multiple appends are a huge loss on Server 2003.) R2010b is slower in many cases. Maybe HDF5 appends or save's use of it just rock on newer Windows builds. (XP x64 is actually the Server 2003 kernel.) Or maybe it's just a machine config difference. There's a fast RAID on the XP x64 machine, and the 32-bit XP has less RAM than the rest. What OS and architecture are you running? Can you try this repro too?
19:36:40.289: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft(R) Windows(R) XP Professional x64 Edition 5.2.3790 Service Pack 2 Build 3790
19:36:55.930: Save the simple way: 11.493 sec
19:37:07.415: Save using multiple append: 1.594 sec
19:37:09.009: Save using one big append: 0.424 sec
19:39:21.681: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows XP Professional 5.1.2600 Service Pack 3 Build 2600
19:39:37.493: Save the simple way: 10.881 sec
19:39:48.368: Save using multiple append: 10.187 sec
19:39:58.556: Save using one big append: 11.956 sec
19:44:33.410: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft Windows 7 Professional 6.1.7600 N/A Build 7600
19:44:50.789: Save the simple way: 14.354 sec
19:45:05.156: Save using multiple append: 6.321 sec
19:45:11.474: Save using one big append: 2.143 sec
20:03:37.907: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft(R) Windows(R) Server 2003, Enterprise Edition 5.2.3790 Service Pack 2 Build 3790
20:03:58.532: Save the simple way: 19.730 sec
20:04:18.252: Save using multiple append: 77.897 sec
20:05:36.160: Save using one big append: 0.630 sec
This looks huge. If it holds up on other data sets, I might use this trick in a lot of places myself. It may be something to bring up with MathWorks, too. Could they use the fast append technique in normal saves or other OS versions, too?
Here's the self-contained repro function.
function reproMatfileAppendSpeedup(nPasses, tests, imax, formats)
%REPROMATFILEAPPENDSPEEDUP Show how -append makes v7.3 saves much faster
%
% Examples:
% reproMatfileAppendSpeedup()
% reproMatfileAppendSpeedup(2, [], 0, {'7.3','7','6'}); % low-entropy test
if nargin < 1 || isempty(nPasses); nPasses = 1; end
if nargin < 2 || isempty(tests); tests = {'basic','multiappend','bigappend'}; end
if nargin < 3 || isempty(imax); imax = 255; end
if nargin < 4 || isempty(formats); formats = '7.3'; end % -v7 and -v6 do not show the speedup
tests = cellstr(tests);
formats = cellstr(formats);
fprintf('%s: Testing speed, imax=%d, R%s on %s\n',...
    timestamp, imax, version('-release'), systemDescription());
tempDir = setupTempDir();
testData = generateTestData(imax);
testMap = struct('basic','saveSimple', 'multiappend','saveMultiAppend', 'bigappend','saveBigAppend');
for iFormat = 1:numel(formats)
    format = formats{iFormat};
    formatFlag = ['-v' format];
    %fprintf('%s: Format %s\n', timestamp, formatFlag);
    for iTest = 1:numel(tests)
        testName = tests{iTest};
        saveFcn = testMap.(testName);
        te = NaN(1, nPasses);
        for iPass = 1:nPasses
            fprintf('%s: %-30s', timestamp, [testName ' ' formatFlag ':']);
            t0 = tic;
            % Each pass writes its own file, numbered by pass index
            matFile = fullfile(tempDir, sprintf('converted-%s-%s-%d.mat', testName, format, iPass));
            feval(saveFcn, matFile, testData, formatFlag);
            te(iPass) = toc(t0);
            if iPass == nPasses
                fprintf('%7.3f sec %5.3f GB used %5.0f MB file %5.3f sec mean\n',...
                    te(iPass), physicalMemoryUsed/(2^30), getfield(dir(matFile),'bytes')/(2^20), mean(te));
            else
                fprintf('%7.3f sec %5.3f GB used\n', te(iPass), physicalMemoryUsed/(2^30));
            end
        end
        % Verify data to make sure we are sane
        gotBack = load(matFile);
        gotBack = rmfield(gotBack, intersect({'dummy'}, fieldnames(gotBack)));
        if ~isequal(gotBack, testData)
            fprintf('ERROR: Loaded data differs from original for %s %s\n', formatFlag, testName);
        end
    end
end
% Clean up
rmdir(tempDir, 's');
%%
function saveSimple(file, data, formatFlag)
save(file, '-struct', 'data', formatFlag);
%%
function out = physicalMemoryUsed()
if ~ispc
    out = NaN;
    return; % memory() only works on Windows
end
[u,s] = memory();
out = s.PhysicalMemory.Total - s.PhysicalMemory.Available;
%%
function saveBigAppend(file, data, formatFlag)
dummy = 0;
save(file, 'dummy', formatFlag);
fieldNames = fieldnames(data);
save(file, '-struct', 'data', fieldNames{:}, '-append', formatFlag);
%%
function saveMultiAppend(file, data, formatFlag)
fieldNames = fieldnames(data);
for i = 1:numel(fieldNames)
    if (i > 1); appendFlag = '-append'; else; appendFlag = ''; end
    save(file, '-struct', 'data', fieldNames{i}, appendFlag, formatFlag);
end
%%
function testData = generateTestData(imax)
nBlocks = 40;
blockSize = [65 480 240];
for i = 1:nBlocks
    testData.(sprintf('block_%03u', i)) = struct('blockNo',i,...
        'frames', randi([0 imax], blockSize, 'uint8'));
end
%%
function out = timestamp()
%TIMESTAMP Showing timestamps to make sure it is not a tic/toc problem
out = datestr(now, 'HH:MM:SS.FFF');
%%
function out = systemDescription()
if ispc
    platform = [system_dependent('getos'),' ',system_dependent('getwinsys')];
elseif ismac
    [fail, input] = unix('sw_vers');
    if ~fail
        platform = strrep(input, 'ProductName:', '');
        platform = strrep(platform, sprintf('\t'), '');
        platform = strrep(platform, sprintf('\n'), ' ');
        platform = strrep(platform, 'ProductVersion:', ' Version: ');
        platform = strrep(platform, 'BuildVersion:', 'Build: ');
    else
        platform = system_dependent('getos');
    end
else
    platform = system_dependent('getos');
end
arch = getenv('PROCESSOR_ARCHITEW6432');
if isempty(arch)
    arch = getenv('PROCESSOR_ARCHITECTURE');
end
try
    [~,sysMem] = memory();
catch
    sysMem.PhysicalMemory.Total = NaN;
end
out = sprintf('%s, arch=%s, %.0f GB, os=%s',...
    computer, arch, sysMem.PhysicalMemory.Total/(2^30), platform);
%%
function out = setupTempDir()
out = fullfile(tempdir, sprintf('%s - %s', mfilename, datestr(now, 'yyyymmdd-HHMMSS-FFF')));
mkdir(out);
EDIT: I modified the repro function, adding multiple iterations and parameterizing it for save styles, file formats, and imax for the randi generator.
I think filesystem caching is a big factor in the fast -append behavior. When I do a bunch of runs in a row with reproMatfileAppendSpeedup(20) and watch System Information in Process Explorer, most of them take under a second, and physical memory usage quickly ramps up by a couple of GB. Then, about every dozen passes, the write stalls and takes 20 or 30 seconds, and physical RAM usage slowly ramps down to about where it started. I think this means that Windows is caching a lot of writes in RAM, and something about -append makes it more willing to do so. But the amortized time, including those stalls, is still a lot faster than the basic save, for me.
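To put a rough number on "amortized", here is a back-of-the-envelope sketch. The per-pass values are illustrative placeholders picked from the ranges described above (under a second for a fast pass, 20 to 30 seconds for a stall, one stall per dozen passes); they are not measurements. The basic-save figure is the XP x64 timing from the log earlier in this post.

```matlab
% Illustrative numbers only, chosen from the ranges described above.
fastPass  = 0.5;     % sec, assumed typical cached -append pass
stallPass = 25;      % sec, assumed stall while the cache flushes
nPerCycle = 12;      % assumed: one stall in every dozen passes

% Mean time per pass over one full cycle (11 fast passes + 1 stall)
amortized = ((nPerCycle - 1) * fastPass + stallPass) / nPerCycle;

basicSave = 11.493;  % sec, "Save the simple way" on the XP x64 box
fprintf('amortized append: %.2f sec/pass, basic save: %.2f sec\n', ...
    amortized, basicSave);
```

With different stall lengths or stall frequencies the gap shrinks or grows, so plug in your own per-pass timings from the repro output before drawing conclusions.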
By the way, after doing multiple passes for a couple hours, I'm having a hard time reproducing the original timings.
Best Answer
Version 7.3 MAT-files use the HDF5 format, which has a significant storage overhead to describe the contents of the file, especially for complex nested cell arrays and structures. Its main advantage over previous versions of MAT-files is that it allows storing data larger than 2 GB on 64-bit systems.
Note that both v7 and v7.3 are compressed and use Unicode encoding (unlike v6), yet they are two completely different formats...
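If you want to see the difference for yourself, here is a minimal sketch that saves the same small nested structure in both formats, then prints the start of each file's text header and the file size. The variable and file names are made up for illustration. As far as I know, v7 files begin with "MATLAB 5.0 MAT-file" and v7.3 files with "MATLAB 7.3 MAT-file", but treat that as an assumption to check on your own release; the sizes will vary with version and data.

```matlab
% Hypothetical test variable: a small nested cell/struct
s = struct('a', {{1, 'two', struct('b', magic(4))}});

files   = {[tempname '.mat'], [tempname '.mat']};
flags   = {'-v7', '-v7.3'};
headers = cell(size(files));

for k = 1:numel(files)
    save(files{k}, 's', flags{k});

    % Read the first 20 bytes of the descriptive text header
    fid = fopen(files{k}, 'r');
    headers{k} = fread(fid, [1 20], 'uint8=>char');
    fclose(fid);

    info = dir(files{k});
    fprintf('%-6s %-22s %8d bytes\n', flags{k}, headers{k}, info.bytes);
    delete(files{k});   % clean up the temporary file
end
```

For a tiny variable like this, I would expect the v7.3 file to be noticeably larger because of the HDF5 metadata, which is the overhead described above.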