Juniper Filesystem Full – How to Resolve


Our Splunk server recently reported an error on one of our Juniper EX 4200s. [1]

Aug  4 11:45:16  25SRV01 /kernel: pid 7661 (dd), uid 0 inumber 217 on /var: filesystem full

It appears our /var filesystem is full and is no longer accepting log messages. This is also causing premature rotation of some of our files.

rj@25SRV01# run show log interactive-commands.0.gz | last 1
Aug  4 11:40:16 25SRV01 newsyslog[7609]: logfile turned over due to -F request
rj@25SRV01# run show log firewall.0.gz | last 1
Aug  4 11:40:16 25SRV01 newsyslog[7609]: logfile turned over due to -F request

There doesn’t seem to be anything out of the ordinary with this system versus the rest of our devices. Below is our configuration. [2]

rj@25SRV01# show system syslog
user * {
    any emergency;
}
file syslog {
    any any;
}
file firewall {
    firewall any;
}
file messages {
    any notice;
    authorization info;
}
file interactive-commands {
    interactive-commands any;
}
{master:0}[edit] 

The strange thing is that there isn’t that much actual data in our log files.

root@25SRV01:RE:0% du -h /var/log/.
2.0K    /var/log/./flowc/failed
4.0K    /var/log/./flowc
2.0K    /var/log/./ext
2.0K    /var/log/./ggsn/gtppcdr
4.0K    /var/log/./ggsn
2.8M    /var/log/.      <--- very reasonable log size

I've looked at the Juniper Knowledge Article "How to resolve the '/var: filesystem full' issue which occurs as result of the WTMP file not being archived", but my WTMP file is reasonably sized.

root@25SRV01:RE:0% ls -lsah wtmp*
3040 -rw-rw-r--  1 root  wheel   1.5M Aug  4 13:48 wtmp   <----- Small enough
   4 -rw-rw-r--  1 root  wheel    91B Nov 19  2013 wtmp.0.gz
   4 -rw-rw-r--  1 root  wheel    57B Jun 14  2013 wtmp.1.gz
   4 -rw-rw-r--  1 root  wheel    82B Nov 19  2013 wtmp.2.gz
root@25SRV01:RE:0%

How do I figure out what's taking up the space and fix it?


1. I am aware that this message was triggered by dd. That was intentional, to replicate an issue we have had in the past. I noticed a device getting near the limit and want to share a common issue we have. Command used: dd if=/dev/random of=/var/overrun.pkg bs=1M count=20.
2. Some log configuration has been removed because of organizational restrictions, such as syslog host and syslog source-address.

Best Answer

There are three ways to fix this, all of which are fairly simple.

Automated Storage Cleanup

Juniper has a system cleanup tool that handles this automatically. It operates almost exclusively under the /var directory structure, so it is fairly safe to run, unless you care about your log files (which you should!), since rotated logs are among the files it deletes.

Below is a system storage cleanup on multiple FPCs.

rj@25SRV01# run request system storage cleanup
Please check the list of files to be deleted using the dry-run option. i.e.
request system storage cleanup dry-run
Do you want to proceed ? [yes,no] (no) yes

fpc0:
--------------------------------------------------------------------------

List of files to delete:

         Size Date         Name
    11B Nov 19  2013 /var/jail/tmp/alarmd.ts
  71.8K Jun 28 20:54 /var/log/chassisd.0.gz
   147B Aug  5 07:52 /var/log/default-log-messages.0.gz
   142B Aug  4 13:16 /var/log/default-log-messages.1.gz
   125B Aug  4 11:40 /var/log/default-log-messages.2.gz
   135B Aug  5 07:52 /var/log/firewall.0.gz
   130B Aug  4 13:16 /var/log/firewall.1.gz
  3045B Aug  4 11:40 /var/log/firewall.2.gz
  8265B Nov 19  2013 /var/log/firewall.3.gz
   298B Nov 19  2013 /var/log/install.0.gz
  1708B Aug  5 07:52 /var/log/interactive-commands.0.gz
  1275B Aug  4 13:16 /var/log/interactive-commands.1.gz
  8465B Aug  4 11:40 /var/log/interactive-commands.2.gz
 <snip>
 124.0K Jun 14  2013 /var/tmp/gres-tp/env.dat
     0B Jun 14  2013 /var/tmp/gres-tp/lock
 106.3M Jul 31 08:16 /var/tmp/mchassis-install.tgz
     0B Jun 14  2013 /var/tmp/rtsdb/if-rtsdb

fpc1:
--------------------------------------------------------------------------

List of files to delete:

         Size Date         Name
    11B Jun 14  2013 /var/jail/tmp/alarmd.ts
   147B Aug  5 07:52 /var/log/default-log-messages.0.gz
   144B Aug  4 13:16 /var/log/default-log-messages.1.gz
   126B Aug  4 11:40 /var/log/default-log-messages.2.gz
   135B Aug  5 07:52 /var/log/firewall.0.gz
   <snip>
    27B Oct 20  2013 /var/log/wtmp.2.gz
    27B Sep 20  2013 /var/log/wtmp.3.gz
    27B Sep 20  2013 /var/log/wtmp.4.gz
     5B Jun 14  2013 /var/lost+found/#04112
     5B Jun 14  2013 /var/lost+found/#04163
 124.0K Jul 30 16:43 /var/tmp/gres-tp/env.dat
     0B Jun 14  2013 /var/tmp/gres-tp/lock
     0B Jun 14  2013 /var/tmp/rtsdb/if-rtsdb

{master:0}[edit]
rj@25SRV01# 

This is Juniper's preferred way of handling this. Run it with the dry-run option first, which shows you what will be deleted ahead of time.


Locate the Offending File

You should be sending all of your logs to a syslog collector, which makes the automated cleanup above the best solution. If, however, you aren't and would rather not delete all of your log files, you may fare better finding the offending file yourself. You will need a bit more familiarity with Unix systems, but if you know your way around the CLI, you should be alright.

First, you’ll want to see how far over the limit that specific volume is.

rj@25SRV01# run start shell user root
Password:
root@25SRV01:RE:0% df -h
Filesystem       Size    Used   Avail Capacity  Mounted on
/dev/da0s2a      183M    129M     39M    77%    /
devfs            1.0K    1.0K      0B   100%    /dev
/dev/md0          68M     68M      0B   100%    /packages/mnt/jbase
/dev/md1         5.8M    1.1M    4.2M    21%    /packages/mfs-fips-mode-powerpc
/dev/md2         2.9M    2.9M      0B   100%    /packages/mnt/fips-mode-powerpc-12.3R3.4
/dev/md3         9.0M    4.4M    3.9M    53%    /packages/mfs-jcrypto-ex
/dev/md4          12M     12M      0B   100%    /packages/mnt/jcrypto-ex-12.3R3.4
/dev/md5         8.1M    3.5M    4.0M    47%    /packages/mfs-jdocs-ex
/dev/md6         6.2M    6.2M      0B   100%    /packages/mnt/jdocs-ex-12.3R3.4
/dev/md7          43M     39M    718K    98%    /packages/mfs-jkernel-ex
/dev/md8         107M    107M      0B   100%    /packages/mnt/jkernel-ex-12.3R3.4
/dev/md9          12M    7.5M    3.6M    68%    /packages/mfs-jpfe-ex42x
/dev/md10         21M     21M      0B   100%    /packages/mnt/jpfe-ex42x-12.3R3.4
/dev/md11         17M     12M    3.2M    79%    /packages/mfs-jroute-ex
/dev/md12         38M     38M      0B   100%    /packages/mnt/jroute-ex-12.3R3.4
/dev/md13         12M    7.2M    3.6M    66%    /packages/mfs-jswitch-ex
/dev/md14         21M     21M      0B   100%    /packages/mnt/jswitch-ex-12.3R3.4
/dev/md15         14M    9.5M    3.4M    73%    /packages/mfs-jweb-ex
/dev/md16         25M     25M      0B   100%    /packages/mnt/jweb-ex-12.3R3.4
/dev/da0s3e      123M    122M   -8.6M   108%    /var  <---- # This doesn't look right
/dev/md17        126M     12K    116M     0%    /tmp
/dev/da0s3d      369M    106M    233M    31%    /var/tmp
/dev/da0s4d       62M    368K     57M     1%    /config
/dev/md18        118M     22M     87M    20%    /var/rundb
procfs           4.0K    4.0K      0B   100%    /proc
/var/jail/etc    123M    122M   -8.6M   108%    /packages/mnt/jweb-ex-12.3R3.4/jail/var/etc
/var/jail/run    123M    122M   -8.6M   108%    /packages/mnt/jweb-ex-12.3R3.4/jail/var/run
/var/jail/tmp    123M    122M   -8.6M   108%    /packages/mnt/jweb-ex-12.3R3.4/jail/var/tmp
/var/tmp         369M    106M    233M    31%    /packages/mnt/jweb-ex-12.3R3.4/jail/var/tmp/uploads
devfs            1.0K    1.0K      0B   100%    /packages/mnt/jweb-ex-12.3R3.4/jail/dev
root@25SRV01:RE:0%
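With this many mounts, the one over-full volume is easy to miss among the read-only package mounts that always show 100%. The sketch below filters df output down to mount points under a given prefix that are at or above a usage threshold. The function name full_filesystems and its defaults are illustrative, not part of Junos. It is written for sh and assumes awk is available; the root prompt shown above is csh-style, so you may need to start sh first or run it against captured output on another host.

```shell
# Print mount point and capacity for filesystems at or above a usage threshold.
# Reads `df`-style output on stdin: column 5 is Capacity, column 6 is the mount point.
# Arguments: $1 = threshold percent (default 90), $2 = mount-point prefix (default /var).
# NOTE: full_filesystems is a hypothetical helper name, not a Junos command.
full_filesystems() {
    awk -v limit="${1:-90}" -v prefix="${2:-/var}" '
        NR > 1 && index($6, prefix) == 1 {
            pct = $5
            sub(/%/, "", pct)                 # "108%" -> "108"
            if (pct + 0 >= limit + 0) print $6, $5
        }'
}

# Usage (assumes an sh-compatible shell):
#   df -k | full_filesystems 90 /var
```

Matching on the mount-point column rather than the device column keeps the jail mounts under /packages out of the result, since they only mirror the numbers for /var.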

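Next, you’ll need to figure out where your large directories are. Note that sort -r orders the lines lexically rather than by size, so scan the whole list; sort -rn would put the largest directories first.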

root@25SRV01:RE:0% du /var/ | sort -r
8       /var/transfer
24      /var/lost+found/#08193/certs/common
24      /var/etc/filters
24      /var/db/certs/common
232     /var/jail
223156  /var/lost+found
217992  /var/tmp
217756  /var/lost+found/#04099
217736  /var/lost+found/#04099/remote   <----- # Possible Issue
208     /var/jail/etc
root@25SRV01:RE:0% du -h /var/lost+found/#04099/remote
2.0K    /var/lost+found/#04099/remote/.ssh
106M    /var/lost+found/#04099/remote  <------ # Culprit
root@25SRV01:RE:0%

In this instance, we can see that something inside /var/lost+found/#04099/remote is using 106M of a 123M volume.
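The du listing above also counts /var/tmp, which the df output shows is a separate filesystem and so does not contribute to /var being full. A minimal sketch that keeps du on one filesystem and sorts by size follows. The name largest_dirs is illustrative, and the function assumes an sh-compatible shell rather than the csh-style root prompt shown above.

```shell
# List the largest directories under a path, biggest first, in kilobytes.
#   -k  report sizes in 1K units so the numbers are comparable
#   -x  stay on one filesystem (a separately mounted /var/tmp is skipped)
# sort -rn sorts numerically, largest first.
# Arguments: $1 = directory (default /var), $2 = lines to show (default 10).
# NOTE: largest_dirs is a hypothetical helper name.
largest_dirs() {
    du -kx "${1:-/var}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage:
#   largest_dirs /var 10
```

The first line is always the starting directory itself, so the interesting entries begin on the second line.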

Go there, find the file and remove it.

root@25SRV01:RE:0% cd /var/lost+found/#04099/remote
root@25SRV01:RE:0% ls -lsah
total 217740
     4 drwxr-xr-x  3 remote  20      512B Nov  7  2013 .
     4 drwxr-xr-x  5 root    wheel   512B Nov 26  2012 ..
     4 drwxr-xr-x  2 remote  20      512B Nov 26  2012 .ssh
217728 -rw-r--r--  1 remote  20      106M Nov  7  2013 jinstall-ex-4200-12.3R3.4-domestic-signed.tgz
root@25SRV01:RE:0% rm jinstall-ex-4200-12.3R3.4-domestic-signed.tgz
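When du points at a directory with many entries, listing it by hand gets tedious. As a sketch, find can report large files directly. The name large_files and the default threshold are assumptions for illustration; the function only prints paths and deletes nothing, so you still review each file before removing it. It assumes find is present and an sh-compatible shell.

```shell
# Print regular files larger than a threshold, without crossing filesystems.
#   -xdev     do not descend into other mounted filesystems
#   -size +N  N is counted in 512-byte blocks, so KiB * 2 gives the block count
# Arguments: $1 = directory (default /var), $2 = threshold in KiB (default 10240, i.e. 10M).
# NOTE: large_files is a hypothetical helper name.
large_files() {
    find "${1:-/var}" -xdev -type f -size +"$(( ${2:-10240} * 2 ))" -print 2>/dev/null
}

# Usage:
#   large_files /var 10240
```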

Now our filesystem is well under the limit.

root@25SRV01:RE:0% df -h
Filesystem       Size    Used   Avail Capacity  Mounted on
/dev/da0s2a      183M    129M     39M    77%    /
devfs            1.0K    1.0K      0B   100%    /dev
/dev/md0          68M     68M      0B   100%    /packages/mnt/jbase
/dev/md1         5.8M    1.1M    4.2M    21%    /packages/mfs-fips-mode-powerpc
/dev/md2         2.9M    2.9M      0B   100%    /packages/mnt/fips-mode-powerpc-12.3R3.4
/dev/md3         9.0M    4.4M    3.9M    53%    /packages/mfs-jcrypto-ex
/dev/md4          12M     12M      0B   100%    /packages/mnt/jcrypto-ex-12.3R3.4
/dev/md5         8.1M    3.5M    4.0M    47%    /packages/mfs-jdocs-ex
/dev/md6         6.2M    6.2M      0B   100%    /packages/mnt/jdocs-ex-12.3R3.4
/dev/md7          43M     39M    718K    98%    /packages/mfs-jkernel-ex
/dev/md8         107M    107M      0B   100%    /packages/mnt/jkernel-ex-12.3R3.4
/dev/md9          12M    7.5M    3.6M    68%    /packages/mfs-jpfe-ex42x
/dev/md10         21M     21M      0B   100%    /packages/mnt/jpfe-ex42x-12.3R3.4
/dev/md11         17M     12M    3.2M    79%    /packages/mfs-jroute-ex
/dev/md12         38M     38M      0B   100%    /packages/mnt/jroute-ex-12.3R3.4
/dev/md13         12M    7.2M    3.6M    66%    /packages/mfs-jswitch-ex
/dev/md14         21M     21M      0B   100%    /packages/mnt/jswitch-ex-12.3R3.4
/dev/md15         14M    9.5M    3.4M    73%    /packages/mfs-jweb-ex
/dev/md16         25M     25M      0B   100%    /packages/mnt/jweb-ex-12.3R3.4
/dev/da0s3e      123M     15M     98M    14%    /var   <----- # Much better
/dev/md17        126M     12K    116M     0%    /tmp
/dev/da0s3d      369M    106M    233M    31%    /var/tmp
/dev/da0s4d       62M    368K     57M     1%    /config
/dev/md18        118M     22M     87M    20%    /var/rundb
procfs           4.0K    4.0K      0B   100%    /proc
/var/jail/etc    123M     15M     98M    14%    /packages/mnt/jweb-ex-12.3R3.4/jail/var/etc
/var/jail/run    123M     15M     98M    14%    /packages/mnt/jweb-ex-12.3R3.4/jail/var/run
/var/jail/tmp    123M     15M     98M    14%    /packages/mnt/jweb-ex-12.3R3.4/jail/var/tmp
/var/tmp         369M    106M    233M    31%    /packages/mnt/jweb-ex-12.3R3.4/jail/var/tmp/uploads
devfs            1.0K    1.0K      0B   100%    /packages/mnt/jweb-ex-12.3R3.4/jail/dev

Dry-Run and Locate File

This way is a little more arduous, since you have to look through everything and make sure not to mistake any Ms for Ks. Run request system storage cleanup with the dry-run option and look through the output.

rj@25SRV01# run request system storage cleanup dry-run
fpc0:
--------------------------------------------------------------------------

List of files to delete:

         Size Date         Name
    11B Nov 19  2013 /var/jail/tmp/alarmd.ts
  71.8K Jun 28 20:54 /var/log/chassisd.0.gz
   142B Aug  4 13:16 /var/log/default-log-messages.0.gz
   125B Aug  4 11:40 /var/log/default-log-messages.1.gz
   130B Aug  4 13:16 /var/log/firewall.0.gz
   <snip>
     0B Nov 19  2013 /var/lost+found/#00124
   231B Nov 19  2013 /var/lost+found/#00125
   606B Nov 19  2013 /var/lost+found/#00126
  40.0K Nov 19  2013 /var/lost+found/#00139
  40.0K Nov 19  2013 /var/lost+found/#00142
106.3M Nov  7  2013 /var/lost+found/#04099/remote/
         jinstall-ex-4200-12.3R3.4-domestic-signed.tgz <---- # Here it is
124.0K Jun 14  2013 /var/tmp/gres-tp/env.dat
     0B Jun 14  2013 /var/tmp/gres-tp/lock
 106.3M Jul 31 08:16 /var/tmp/mchassis-install.tgz
     0B Jun 14  2013 /var/tmp/rtsdb/if-rtsdb

fpc1:
--------------------------------------------------------------------------

List of files to delete:

         Size Date         Name
    11B Jun 14  2013 /var/jail/tmp/alarmd.ts
   144B Aug  4 13:16 /var/log/default-log-messages.0.gz
   126B Aug  4 11:40 /var/log/default-log-messages.1.gz
   <snip>
     0B Jun 14  2013 /var/tmp/rtsdb/if-rtsdb

{master:0}[edit]
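Reading the Size column by eye is where Ms get mistaken for Ks. One way around that, sketched below, is to convert the sizes to bytes and sort them. It assumes the listing has been captured to a file on a host with sh and awk (the filename cleanup-dry-run.txt is hypothetical), that each entry sits on one line as "size date name", and that the suffixes are binary multiples. The G case is a guess, since no such entry appears in the output above.

```shell
# Convert the Size column of a dry-run listing to bytes and show the largest entries.
# Reads lines such as "106.3M Jul 31 08:16 /var/tmp/mchassis-install.tgz" on stdin.
# Lines whose first field is not a number followed by B, K, M or G are skipped,
# which drops the fpc headers, separators and <snip> markers.
# Argument: $1 = number of entries to show (default 5).
# NOTE: largest_cleanup_candidates is a hypothetical helper name.
largest_cleanup_candidates() {
    awk '
        $1 ~ /^[0-9.]+[BKMG]$/ {
            unit = substr($1, length($1))
            num  = substr($1, 1, length($1) - 1) + 0
            mult = 1
            if (unit == "K") mult = 1024
            if (unit == "M") mult = 1024 * 1024
            if (unit == "G") mult = 1024 * 1024 * 1024   # assumed suffix
            printf "%d %s %s\n", num * mult, $1, $NF     # bytes, original size, path
        }' | sort -rn | head -n "${1:-5}"
}

# Usage:
#   largest_cleanup_candidates 5 < cleanup-dry-run.txt
```

Each output line carries the byte count, the original size string and the path, so the original figure stays visible next to the converted one.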

Then go and delete it.

rj@25SRV01# run start shell user root
Password:
root@25SRV01:RE:0% cd /var/lost+found/#04099/remote
root@25SRV01:RE:0% rm jinstall-ex-4200-12.3R3.4-domestic-signed.tgz

From here, you should have a healthy, usable filesystem that can resume logging.
