Electronic – Data compression for large amount of binary data

algorithmccommunicationdata storage

In an embedded application of mine, the device generates a large amount of instrumentation data that is stored every few minutes in an NVM, such as a set of EEPROMs or a Data Flash. This data is downloaded from the device by field personnel using a slow serial(RS-232) connection at a baud rate of 9600bps. Just to give you an idea, the data for one instance is 64 bytes, stored every 15 minutes, for 180 days. With protocol overheads, this takes well over 25-30 minutes to download, which is a hassle. The data is then analysed at a central station on PC grade systems. Also the system is extremely cost sensitive, and any savings in NVM costs are welcome.

I was looking for some reasonable compression schemes that I can use for compression the data during storage. The scheme that I was thinking off was to buffer a certain number of instances in a smaller NVM device and compress them at one go after the buffer is full. I tried using an implementation of the LZW algorithm, but could not go under 75% compression rate. A sample of the data is as reproduced below. These are hexadecimal values.(Each instance is only 32 bytes in this sample.)

00 12 23 09 12 5A 69 5B 4E 5B 96 00 3B 00 21 F7
00 44 03 2F 03 B6 01 D6 00 00 5A 4B 58 00 00 FD
15 12 23 09 12 5A 0E 5A F2 5B 19 00 3B 00 24 14
00 50 03 66 03 F7 02 08 00 00 5A 4D 51 00 00 4B
30 12 23 09 12 59 DF 5B 71 5A AA 00 3C 00 27 15
00 67 03 D9 04 9C 02 7B 00 00 59 4D 53 00 00 A7
45 12 23 09 12 59 AF 5B 61 5A 97 00 3D 00 27 52
00 54 03 93 04 38 02 3A 00 00 59 4E 52 00 00 A5
00 13 23 09 12 5A 08 5B A8 5A DF 00 3B 00 27 AF
00 50 03 61 03 F2 02 03 00 00 58 4E 56 00 00 56
15 13 23 09 12 5A 7E 5C 67 5B 95 00 3B 00 28 AC
00 5B 03 A2 04 56 02 53 00 00 57 4D 50 00 00 5D
30 13 23 09 12 5B 72 5C CE 5C 35 00 3B 00 28 94
00 63 03 CA 04 A1 02 A3 00 00 57 4B 4F 00 00 95
45 13 23 09 12 5B 41 5C 3E 5C 45 00 3B 00 28 30
00 51 03 84 04 2E 02 3F 00 00 58 4C 50 00 00 C1
00 14 23 09 12 5A E8 5B ED 5B ED 00 3C 00 28 78
00 4D 03 6B 04 15 02 35 00 00 56 4D 48 00 00 0A
15 14 23 09 12 5A FA 5B EA 5B 95 00 3D 00 27 AC
00 67 03 DE 04 B5 02 A3 00 00 58 4E 49 00 00 6B
30 14 23 09 12 5A 2B 5A FD 5A F2 00 40 00 27 EF
00 5B 03 B6 04 6A 02 5D 00 00 5A 4E 50 00 00 27
45 14 23 09 12 5A 9D 5B 69 5B 43 00 23 00 16 D7
00 43 02 8F 02 FD 01 7C 00 00 62 59 56 00 00 9F
00 15 23 09 12 5B 6F 5C 2F 5B DC 00 0B 00 06 10
00 4A 01 C7 02 1C 01 18 00 00 5C 50 4F 00 00 BC
15 15 23 09 12 5B E0 5C E9 5C 37 00 0D 00 05 73
00 32 01 6D 01 A4 00 C8 00 00 64 44 3D 00 00 0E
30 15 23 09 12 5B F1 5C C1 5C 36 00 0D 00 04 71
00 30 01 59 01 8B 00 AF 00 00 57 4D 55 00 00 42
45 15 23 09 12 5B E7 5D 04 5C 3C 00 09 00 01 23
00 30 01 36 01 68 00 A5 00 00 5B 4F 56 00 00 8B
00 16 23 09 12 5C 5E 5D 57 5C A8 00 0B 00 01 2E
00 27 01 0E 01 3B 00 91 00 00 62 4F 56 00 00 F6
15 16 23 09 12 5C C1 5D AF 5C D8 00 0D 00 01 2C
00 31 01 4F 01 81 00 B4 00 00 62 48 55 00 00 4A
30 16 23 09 12 5C B2 5D 7D 5C F6 00 0C 00 02 34
00 27 01 1D 01 4A 00 9B 00 00 64 4A 3B 00 00 EC
45 16 23 09 12 5D 2D 5D 68 5D 23 00 08 00 02 8E
00 2E 01 22 01 54 00 A0 00 00 62 49 56 00 00 B9
00 17 23 09 12 5C C4 5D 76 5C C3 00 08 00 02 8F
00 22 00 F0 01 13 00 7D 00 00 59 49 57 00 00 64
15 17 23 09 12 5C E5 5D C8 5D 1B 00 08 00 02 AE
00 27 01 09 01 2C 00 87 00 00 64 4A 40 00 00 2D
30 17 23 09 12 5D 1A 5D F2 5D 0E 00 09 00 02 3F
00 28 01 0E 01 31 00 87 00 00 64 4A 57 00 00 0B
45 17 23 09 12 5D 2C 5E 09 5D 65 00 09 00 02 A9
00 25 01 09 01 2C 00 82 00 00 5C 49 40 00 00 3D
00 18 23 09 12 5D 1B 5D 4B 5D 4A 00 09 00 02 D8
00 28 01 13 01 3B 00 8C 00 00 64 4A 58 00 00 F6
15 18 23 09 12 5D 2B 5D 96 5D 97 00 09 00 02 1B
00 22 00 F0 01 13 00 78 00 00 62 4A 57 00 00 5F
30 18 23 09 12 5B C6 5B F5 5B EE 00 09 00 00 B7
00 2B 01 18 01 3B 00 8C 00 00 61 00 58 00 00 3B
45 18 23 09 12 5A E1 5B 10 5A EF 00 09 00 00 6D
00 33 01 3B 01 5E 00 96 00 00 61 00 58 00 00 E3
00 19 23 09 12 5A F3 5B 1E 5B 23 00 09 00 00 5C
00 29 01 0E 01 2C 00 7D 00 00 62 00 58 00 00 64
15 19 23 09 12 5A AC 5A DF 5A E3 00 0A 00 00 0E
00 28 01 13 01 2C 00 78 00 00 61 00 3C 00 00 82
30 19 23 09 12 5A 9C 5A B2 5A CC 00 0A 00 00 47
00 2A 01 09 01 1D 00 69 00 00 62 00 5B 00 00 88
45 19 23 09 12 5A BA 5A DB 5A EA 00 0A 00 00 CD
00 27 01 09 01 22 00 69 00 00 62 00 5A 00 00 87
00 20 23 09 12 5A EA 5B 0E 5B 41 00 0A 00 00 4F
00 1D 00 DC 00 EB 00 55 00 00 62 00 5B 00 00 0A
15 20 23 09 12 5B 5D 5B C0 5B B8 00 0A 00 00 9D
00 27 01 09 01 22 00 6E 00 00 61 00 5B 00 00 82
30 20 23 09 12 5B C0 5B AB 5B EF 00 0A 00 00 FD
00 26 01 0E 01 22 00 73 00 00 62 00 5A 00 00 79
45 20 23 09 12 5B E0 5B FC 5B F8 00 0A 00 00 6E
00 27 01 09 01 22 00 73 00 00 61 00 59 00 00 7F
00 21 23 09 12 5C 30 5C 65 5C 6E 00 0A 00 00 80
00 1D 00 D7 00 F0 00 5F 00 00 61 00 5A 00 00 02
15 21 23 09 12 5C 55 5C 93 5C 9B 00 0A 00 00 EB
00 1D 00 E1 00 F0 00 55 00 00 61 00 5A 00 00 02
30 21 23 09 12 5C EE 5D 4A 5C FE 00 0A 00 00 1C
00 23 01 04 01 1D 00 69 00 00 61 00 9D 00 00 53

Can someone point me to a suitable algorithm to compress the above data? The algorithm should have a small memory(RAM) footprint (<5kBytes or so).

The implementation that I tried is from here.

EDIT 1:
I Just had an epiphany, where I lined up each 32/64 byte record one under the other and traversed it vertically. This way, that data turned out to a lot more repetitive and I am now getting much better compression rates.

I am still open to other approaches, however.

Best Answer

You have about 1 Mbyte of data to store. EEPROM is cheap in the scheme of things. The expensive part is waiting 20 minutes to capture the data. The obvious answer there is to increase the baud rate.

At 115.2 kBaud the transfer would take 12 times less, or about 2 minutes after accounting for some protocol overhead. So go fix the firmware to use a reasonable baud rate.