Linux – Split large log file WITHOUT keeping the original (splitting in-place)

Tags: awk, linux, sed

I have a 226GB log file, and I want to split it up into chunks so it's easier to xz. The problem is I only have 177GB of free space left to work with.

Is there a way to split a file in half, or into N chunks, without keeping an additional copy of the original?

    $ split myFile.txt
    $ ls -halF
    -rw-r--r--   1 user group 35 Dec 29 13:17 myFile.txt
    -rw-r--r--   1 user group 8 Dec 29 13:18 xaa
    -rw-r--r--   1 user group 3 Dec 29 13:18 xab
    -rw-r--r--   1 user group 5 Dec 29 13:18 xac
    -rw-r--r--   1 user group 10 Dec 29 13:18 xad
    -rw-r--r--   1 user group 8 Dec 29 13:18 xae
    -rw-r--r--   1 user group 1 Dec 29 13:18 xaf

I would rather have no myFile.txt left over at all, just the split files. I would gladly stick with the default behavior and delete the original afterwards, but I just don't have enough free space to do that.

I'm not an expert with sed or awk, but I thought one of them might offer a way to get a "move lines into another file" kind of behavior?

Best Answer

What might work is to stream parts of the file directly into xz. A log file should compress well enough that both the original and the compressed parts fit in the space you have left.

  1. Get the number of lines:

    wc -l myFile.txt
    
  2. Divide this into as many parts as you like, e.g. 10k lines per part.
  3. Use sed to pipe the part you want into xz:

    sed -n '1,10000p' myFile.txt | xz > outfile01.xz 
    sed -n '10001,20000p' myFile.txt | xz > outfile02.xz
    

etc. This could of course be done by a script.

But honestly, do as EEAA said...
