Our Splunk server recently reported an error on one of our Juniper EX 4200s.[1]
Aug 4 11:45:16 25SRV01 /kernel: pid 7661 (dd), uid 0 inumber 217 on /var: filesystem full
It appears our /var filesystem is full and is no longer accepting log messages. This is also causing premature rotation of some of our log files.
rj@25SRV01# run show log interactive-commands.0.gz | last 1
Aug 4 11:40:16 25SRV01 newsyslog[7609]: logfile turned over due to -F request
rj@25SRV01# run show log firewall.0.gz | last 1
Aug 4 11:40:16 25SRV01 newsyslog[7609]: logfile turned over due to -F request
There doesn’t seem to be anything out of the ordinary about this system compared with the rest of our devices. Below is our configuration.[2]
rj@25SRV01# show system syslog
user * {
    any emergency;
}
file syslog {
    any any;
}
file firewall {
    firewall any;
}
file messages {
    any notice;
    authorization info;
}
file interactive-commands {
    interactive-commands any;
}
{master:0}[edit]
The strange thing is that there isn’t that much actual data in our log files.
root@25SRV01:RE:0% du -h /var/log/.
2.0K /var/log/./flowc/failed
4.0K /var/log/./flowc
2.0K /var/log/./ext
2.0K /var/log/./ggsn/gtppcdr
4.0K /var/log/./ggsn
2.8M /var/log/. <--- very reasonable log size
I've looked at the Juniper knowledge article "How to resolve the '/var: filesystem full' issue which occurs as a result of the WTMP file not being archived", but my WTMP file is reasonably sized.
root@25SRV01:RE:0% ls -lsah wtmp*
3040 -rw-rw-r-- 1 root wheel 1.5M Aug 4 13:48 wtmp <----- Small enough
4 -rw-rw-r-- 1 root wheel 91B Nov 19 2013 wtmp.0.gz
4 -rw-rw-r-- 1 root wheel 57B Jun 14 2013 wtmp.1.gz
4 -rw-rw-r-- 1 root wheel 82B Nov 19 2013 wtmp.2.gz
root@25SRV01:RE:0%
How do I figure out what's taking up the space and fix it?
[1] I am aware that this pid was induced with dd; that was intentional, to replicate an issue we have had in the past. I noticed a device getting near the limit and want to share a common issue we have. Command used: dd if=/dev/random of=/var/overrun.pkg bs=1M count=20
[2] Some log configuration, such as syslog host and syslog source-address, has been removed due to organizational restrictions.
Best Answer
There are three ways to fix this, all of which are fairly simple.
Automated Storage Cleanup
Juniper has a system cleanup tool for handling this automatically. It operates almost exclusively under the /var/* directory structure, meaning it isn't all that risky unless you care about your log files (which you should!). A system storage cleanup can be run across multiple FPCs, and this is the preferred Juniper way of handling the problem. Running it with the dry-run option will show you what would be deleted ahead of time; you should do that first.
Locate the Offending File
You should be sending all of your logs to a syslog collector, making option 1 the best solution. If, however, you aren't and would rather not delete all of your log files, you may fare better at finding the offending file yourself. You will need a tad bit more familiarity with Unix systems, but if you know your way around the CLI, you should be alright.
First you’ll want to see how much over you are on that specific volume.
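From the routing engine's shell (the same % prompt shown above), checking the volume is plain BSD df; for example:

```shell
# Show how full the /var volume is; -h prints human-readable sizes.
# Junos routing engines run a BSD userland, so df behaves as on any Unix.
df -h /var
```

From the CLI, show system storage reports the same per-volume numbers.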
Next, you’ll need to figure out where your large directories are.
In this instance, we can see that something is happening inside /var/lost+found/#04099/remote; it’s using 106M of a 123M volume. Go there, find the file, and remove it.
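As a sketch (the largest-file lookup here is illustrative, and the filename from the original run isn't preserved, so check what you're deleting before you delete it):

```shell
# Find the single largest file in the suspect directory and remove it.
# ls -S (largest first) exists on both BSD and GNU ls; rm -i prompts first.
DIR=/var/lost+found/#04099/remote
biggest=$(ls -S "$DIR" | head -1)
rm -i "$DIR/$biggest"
df -h /var    # confirm the space came back
```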
Now our filesystem usage is well under the limit we need it to be at.
Dry-Run and Locate File
This way is a little more arduous, since you have to look through everything and make sure not to mistake any Ms for Ks. Run request system storage cleanup with the dry-run option, look through the output, then go and delete the offending file.
From here, you should have a healthy, usable filesystem that can resume logging.