Networking – How to Reconstruct a File from a TCP Stream

c networking

I have a client, a server, and a third box which sees all packets from the server to the client (but not the other way around). When the client requests a file from the server over HTTP, the third box sees the response, and I am trying to reconstruct the file there. I am using libpcap to capture the TCP segments. Here is what I did:

  1. Listen for packets on an interface
  2. Group all packets which have the same ACK number
  3. Sort the group based on SEQ number
  4. Extract data from each packet and combine them and write to the disk
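
For comparison, steps 2 to 4 can be sketched as follows. This is only an illustration under stated assumptions, not the code from the question: the `Segment` struct, the `reassemble` function and the `max_bytes` limit are hypothetical names, the segments are assumed to belong to a single connection already, and `first_seq` is assumed to be the sequence number of the first data byte. Instead of sorting, the sketch writes each payload at its offset from `first_seq`, so out-of-order and retransmitted segments end up in the right place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one captured segment; names are illustrative.
struct Segment {
    std::uint32_t seq;          // sequence number in host byte order
    std::vector<char> payload;  // TCP payload only, headers already removed
};

// Copies each payload to its offset from first_seq (the sequence number of
// the first data byte). Unsigned subtraction copes with sequence wraparound.
// Segments that would end beyond max_bytes are skipped, which also drops
// segments whose sequence number lies before first_seq.
std::vector<char> reassemble(std::uint32_t first_seq,
                             const std::vector<Segment>& segments,
                             std::size_t max_bytes) {
    std::vector<char> stream;
    for (const Segment& s : segments) {
        if (s.payload.empty()) continue;  // e.g. a bare ACK
        const std::uint64_t offset = static_cast<std::uint32_t>(s.seq - first_seq);
        const std::uint64_t end = offset + s.payload.size();
        if (end > max_bytes) continue;
        if (stream.size() < end) stream.resize(static_cast<std::size_t>(end));
        assert(stream.size() >= end);  // room for this payload is guaranteed
        // A retransmitted segment overwrites the same range with the same bytes.
        std::copy(s.payload.begin(), s.payload.end(),
                  stream.begin() + static_cast<std::ptrdiff_t>(offset));
    }
    return stream;
}
```

Any range that was never captured stays zero-filled, so a missed packet keeps the file size plausible while still corrupting the content.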

The problem is that the file generated this way is not identical to the original. The two files are the same size, but when I open the images in an image viewer, they look different. Does everything sound correct here?

EDIT
Hexdump of first 512 bytes of the recovered file

0000000 009a 0095 0090 008b 0086 0081 007c 0077
0000010 0072 006d 0068 0063 005e 0059 0054 004f
0000020 004a 0045 0040 003b 0037 0032 002d 0028
0000030 0023 001e 0019 0014 000f 000a 0005 0000
0000040 0400 0000 0000 0000 7276 6375 5420 4352
0000050 0000 0000 6720 7369 0000 0000 0000 0000
0000060 0000 0000 0002 0000 028f 0000 0000 0000
0000070 0001 0000 0000 0000 6173 6d65 1fe7 0057
0000080 0000 0050 0956 004c 0000 0000 5a20 5859
0000090 0001 0000 5c9e 0003 130b 0004 edcc 0003
00000a0 cf14 0010 5f2e 0014 a4fe 0013 0000 0000
00000b0 6577 7669 0000 0000 0000 0000 0000 0000
00000c0 0000 0000 0000 0000 0000 0000 2e31 2d32
00000d0 3636 3139 4336 4945 6e20 2069 6f6e 7469
00000e0 6469 6f6e 2043 6e67 7769 6965 2056 6365
00000f0 656e 6572 6566 2c52 0000 0000 0000 0000
0000100 0000 3100 322e 362d 3936 3631 4543 2049
0000110 696e 6e20 696f 6974 6e64 436f 6720 696e
0000120 6577 5669 6520 6e63 7265 6665 5265 002c
0000130 0000 0000 0000 7363 6465 0000 0000 0000
0000140 0000 0000 0000 0000 0000 0000 0000 0000
0000150 4742 7352 2d20 6520 6163 7370 7220 6f75
0000160 6f6c 2063 4742 2052 6c74 6175 6566 2044
0000170 2e31 2d32 3636 3139 2036 4543 2e49 0000
0000180 0000 0000 0000 0000 4200 5247 2073 202d
0000190 6365 7061 2073 7572 6c6f 636f 4220 5247
00001a0 7420 756c 6661 4465 3120 322e 362d 3936
00001b0 3631 4320 4945 002e 0000 0000 0000 7363
00001c0 6465 0000 0000 0000 0000 0000 0000 0000
00001d0 0000 0000 0000 0000 0000 0000 0000 0000
*
00001f0 6368 632e 6965 772e 7777 2f2f 703a 7474
0000200

Hexdump of first 512 bytes of the original file:

0000000 d8ff e0ff 1000 464a 4649 0100 0101 2c01
0000010 2c01 0000 e2ff 6d1c 4349 5f43 5250 464f
0000020 4c49 0045 0101 0000 5d1c 694c 6f6e 1002
0000030 0000 6e6d 7274 4752 2042 5958 205a ce07
0000040 0200 0900 0600 3100 0000 6361 7073 534d
0000050 5446 0000 0000 4549 2043 5273 4247 0000
0000060 0000 0000 0000 0000 0000 0000 d6f6 0100
0000070 0000 0000 2dd3 5048 2020 0000 0000 0000
0000080 0000 0000 0000 0000 0000 0000 0000 0000
*
00000a0 0000 0000 0000 0000 1100 7063 7472 0000
00000b0 5001 0000 3300 6564 6373 0000 8301 0000
00000c0 6c00 7477 7470 0000 ef01 0000 1400 6b62
00000d0 7470 0000 0302 0000 1400 5872 5a59 0000
00000e0 1702 0000 1400 5867 5a59 0000 2b02 0000
00000f0 1400 5862 5a59 0000 3f02 0000 1400 6d64
0000100 646e 0000 5302 0000 7000 6d64 6464 0000
0000110 c302 0000 8800 7576 6465 0000 4b03 0000
0000120 8600 6976 7765 0000 d103 0000 2400 756c
0000130 696d 0000 f503 0000 1400 656d 7361 0000
0000140 0904 0000 2400 6574 6863 0000 2d04 0000
0000150 0c00 5472 4352 0000 3904 0000 0c08 5467
0000160 4352 0000 450c 0000 0c08 5462 4352 0000
0000170 5114 0000 0c08 6574 7478 0000 0000 6f43
0000180 7970 6972 6867 2074 6328 2029 3931 3839
0000190 4820 7765 656c 7474 502d 6361 616b 6472
00001a0 4320 6d6f 6170 796e 6400 7365 0063 0000
00001b0 0000 0000 7312 4752 2042 4549 3643 3931
00001c0 3636 322d 312e 0000 0000 0000 0000 0000
00001d0 1200 5273 4247 4920 4345 3136 3639 2d36
00001e0 2e32 0031 0000 0000 0000 0000 0000 0000
00001f0 0000 0000 0000 0000 0000 0000 0000 0000
0000200

Some more details:

  1. I am using C++
  2. The packet data is being stored as std::vector<char>
  3. I did change the byte order while reading the ack number and seq number from the packet using ntohl
  4. I am not sure if I need to change the byte order for the data as well. I tried reversing the data from each packet before combining them, but that did not work either.
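
On the byte-order point, a minimal sketch of the usual split: multi-byte header fields are converted with `ntohs`/`ntohl`, while the payload is a plain byte sequence and is copied unchanged. The sketch assumes IPv4, assumes the pointer already points at the IP header (the link-layer header has been skipped), and uses hypothetical names (`ParsedSegment`, `parse_ipv4_tcp`). It also takes the payload length from the IP total-length field rather than from the captured length, since a captured frame may carry trailing padding.

```cpp
#include <arpa/inet.h>  // ntohs, ntohl (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical result type; names are illustrative.
struct ParsedSegment {
    std::uint32_t seq = 0;      // sequence number in host byte order
    std::vector<char> payload;  // payload bytes, copied as they arrived
    bool ok = false;            // false if the packet could not be parsed
};

// ip points at the start of an IPv4 header; len is the number of captured
// bytes from that point on.
ParsedSegment parse_ipv4_tcp(const unsigned char* ip, std::size_t len) {
    assert(ip != nullptr);
    ParsedSegment out;
    if (len < 20) return out;
    if ((ip[0] >> 4) != 4 || ip[9] != 6) return out;  // not IPv4 or not TCP

    const std::size_t ip_hl = (ip[0] & 0x0F) * 4u;    // IHL in 32-bit words
    std::uint16_t total_be;
    std::memcpy(&total_be, ip + 2, sizeof total_be);
    const std::size_t total = ntohs(total_be);        // header field: convert
    if (ip_hl < 20 || total < ip_hl + 20 || total > len) return out;

    const unsigned char* tcp = ip + ip_hl;
    const std::size_t tcp_hl = (tcp[12] >> 4) * 4u;   // data offset in words
    if (tcp_hl < 20 || ip_hl + tcp_hl > total) return out;

    std::uint32_t seq_be;
    std::memcpy(&seq_be, tcp + 4, sizeof seq_be);
    out.seq = ntohl(seq_be);                          // header field: convert

    // Payload: no byte swapping and no reversal, just a copy.
    const char* begin = reinterpret_cast<const char*>(tcp + tcp_hl);
    const char* end = reinterpret_cast<const char*>(ip + total);
    out.payload.assign(begin, end);
    out.ok = true;
    return out;
}
```

With an Ethernet capture and no VLAN tag, the IP header would typically start 14 bytes into the buffer that libpcap hands to the callback.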

Is there something I am missing?

EDIT
Original file: [image not preserved]

Recovered file: [image not preserved]

Best Answer

TCP is a stream protocol, usually on top of IP (IPv4 or IPv6). It doesn't cover file transmission. Instead, this is typically delegated to other protocols such as FTP, HTTP, HTTPS, NFS, CIFS or rsync.

You are apparently removing the IP and TCP headers from the data, but not the metadata added by the other protocol(s) you're using. Since you haven't told us what those are, we can't specifically address them, but in general you'll have to implement a protocol handler for that.
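
As a rough illustration for the HTTP case mentioned in the question, the simplest piece of such a handler is separating the response headers from the body at the first blank line (CRLF CRLF). This is a sketch under assumptions: `http_body` is a hypothetical name, the input is assumed to be the fully reassembled server-to-client byte stream of one response, and the body is assumed to be sent as-is. A response using chunked transfer encoding or content compression would need further decoding.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Returns the bytes that follow the first CRLF CRLF in an HTTP response,
// or an empty vector if the end of the headers was not found.
std::vector<char> http_body(const std::vector<char>& stream) {
    static const char kHeaderEnd[] = {'\r', '\n', '\r', '\n'};
    auto it = std::search(stream.begin(), stream.end(),
                          std::begin(kHeaderEnd), std::end(kHeaderEnd));
    if (it == stream.end()) return {};
    // Skip the four terminator bytes; everything after them is the body.
    return std::vector<char>(it + 4, stream.end());
}
```

Writing the returned bytes to disk, rather than the whole stream, keeps the status line and header fields out of the file.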

Note that with encrypted protocols such as HTTPS, you cannot, by design, recover a file from only the server-side packets.
