How to Zip Large Files Through SSH Commands


I have a folder Backissues/ in the root directory. The folder is very large, about 132 GB. If I run the command zip -r backissue.zip Backissues/, it takes a very long time and the connection drops after a few hours. You can assume my knowledge of SSH commands is limited; I am just following a few tutorials to get the task done. Alternatively, would it be possible to split the folder into several smaller files? I tried zipping a folder that does not contain many files (around 250 MB) and that works perfectly: I am able to download the files and reuse them somewhere else. I am on shared hosting.

The folder is organized by year: 1991, 1992, 1993, and so on.

Best Answer

This is not really an ssh question; ssh only comes into it because the operation takes so long that your connection times out.

Your main problem, the timeouts, is fixed by running the command in the background with nohup. Instead of executing zip -r backissue.zip Backissues/, you execute

nohup zip -r backissue.zip Backissues/ &

This does two things: the trailing & makes the zip process run in the background, and nohup detaches it from your session, so a dropped ssh connection will not interrupt it. As long as your connection stays open, you can use the jobs bash builtin to manage the background zip command. Once you log off, or your ssh connection breaks and you have to reconnect, you will have to check on the process with ps or top, and look at the output in the file nohup.out that nohup creates. Do check, and remember that the job may still be running in the background: if you start a second zip while the first is still executing, you may run into needless problems.
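For example, after reconnecting you could check on the job like this (the grep pattern is just illustrative; adjust it to the exact command line you used):

ps x | grep '[z]ip -r backissue.zip'
tail -f nohup.out

The first command lists your processes and shows whether the zip is still running; the second follows zip's progress messages as they are appended to nohup.out.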

I have two improvements on the above.

Using this command instead:

nohup bash -c "zip -r backissue.zip Backissues/ && touch backissue.finished" &

will leave you a tell-tale file, backissue.finished, that is created only when the long-running zip operation has finished correctly. Otherwise you have to rely on an error showing up in nohup.out, or on the fact that zip only renames the output file to its intended name when the process completes successfully.
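A quick check after reconnecting might then look like this (a minimal sketch; the file names follow the command above):

if [ -f backissue.finished ] ; then
    echo "zip finished successfully"
else
    echo "zip is still running or it failed; check nohup.out"
fi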

You could in any case use the zip command's ability to split its output into pieces (see the example below), but I suppose you have many back issues. A better improvement is to make one zip file per back issue instead of one zip file of everything, assuming that your goal is to reduce space rather than to end up with a single file. This breaks the work into distinct parts that can be executed, and can fail, separately. It is also better practice: when a new back issue arrives you will not need to redo the whole archive, and when you need access to an old back issue, you only have to unzip the one you need. To do this, of course, you need to know how the Backissues folder is structured.
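If you do decide you want a single archive split into fixed-size pieces, Info-ZIP's zip has a split option. A sketch, assuming 2 GB pieces suit your download limits and that your host's zip is a recent Info-ZIP build that supports -s:

nohup zip -r -s 2g backissue.zip Backissues/ &

This produces pieces named backissue.z01, backissue.z02, ... plus a final backissue.zip. To unzip it later you would normally first recombine the pieces with zip -s 0 backissue.zip --out backissue-single.zip, so splitting helps with the download but adds a step afterwards; one archive per year, as described next, avoids that.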

You add that (if I understand correctly) the Backissues folder is organized like this:

Backissues/1991
Backissues/1992
Backissues/1993
...
Backissues/2019

That means that if you instead run

year=1991 ; nohup zip -r Backissues.$year.zip Backissues/$year/ &

and repeat for all the years, waiting for the previous one to finish so you don't overload your server, you will get one zip file per year, which should be manageable without splitting the files further.

It can get more complicated, of course. As an example you can make use of a script like this:

#!/bin/bash

# Run this from the folder that contains Backissues/.
if [ ! -d Backissues ] ; then
    echo "Move to the folder that contains Backissues/ first" >&2
    exit 1
fi

# The finished archives go next to the original folder.
mkdir -p Backissues.compressed

cd Backissues || exit 1

# One archive per four-digit year folder; skip years already archived.
for year in [0-9][0-9][0-9][0-9] ; do
    zipfile="../Backissues.compressed/$year.zip"
    if [ ! -f "$zipfile" ] ; then
        zip -r "$zipfile" "$year" && echo "Compressed $year successfully"
    fi
done

This script can be re-run at any time; note, however, that once a yearly archive has been made, it will not be recreated automatically.
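A usage example, assuming you saved the script as zip-backissues.sh in your home directory (the name is illustrative), combining it with the nohup technique from above:

nohup bash zip-backissues.sh > zip-backissues.log 2>&1 &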

Someone competent in shell script programming could customize this to your operational needs: archiving only one year at a time; not archiving the current year until its last issue is published, or alternatively updating the relevant archive(s) with the latest issue(s); deleting the originals to save space once you are completely certain the archive is correct; copying backups to some safe storage (on S3 Glacier Deep Archive your 132 GB would cost less than USD 2.00 per year); and doing all of this automatically.
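As one hedged sketch of such a customization, here is how you might verify a year's archive before deleting the originals; the script name, the paths, and the decision to delete at all are assumptions to adapt:

#!/bin/bash
# Verify one year's archive, then (and only then) remove the originals.
# Usage: ./verify-and-clean.sh 1991   (illustrative; run from the folder above Backissues/)
year="$1"
if [ -z "$year" ] ; then
    echo "Usage: $0 <year>" >&2
    exit 1
fi
zipfile="Backissues.compressed/$year.zip"

# unzip -t reads every entry in the archive and checks its CRC without extracting anything
if unzip -tq "$zipfile" ; then
    echo "Archive for $year verified; removing originals"
    rm -r "Backissues/$year"
else
    echo "Archive for $year failed verification; keeping originals" >&2
    exit 1
fi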