I may not know where I set down my keys or my latest copy of Linux Magazine, but I do know where my programs are installed. If I cannot think of where one of my files is, I can always use the find command.
The find utility can be used in a lot of interesting ways. I have briefly covered the find command for locating individual files, but it can also locate directories, old files, big files or even something specified by inode.
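As a rough sketch of those other uses, the commands below try each kind of search on a scratch directory. The file names (archive, new.log, old.log, big.dat) are hypothetical and exist only for illustration. The sketch assumes mktemp -d and find's -inum test are available, which is true of GNU and BSD tools but not guaranteed by POSIX.

```shell
#!/bin/sh
# Build a throwaway tree so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir archive
echo "fresh" > new.log
echo "stale" > old.log
touch -t 200001010000 old.log                           # backdate to 1 Jan 2000
dd if=/dev/zero of=big.dat bs=1024 count=4 2>/dev/null  # 4096-byte file

# Directories only, excluding the starting point itself.
dirs=$(find . -type d ! -name .)

# Files last modified more than 30 days ago.
old=$(find . -type f -mtime +30)

# Files larger than 1000 bytes (the c suffix means bytes).
big=$(find . -type f -size +1000c)

# A file looked up by its inode number, taken from ls -i.
inode=$(ls -i new.log | awk '{print $1}')
by_inode=$(find . -inum "$inode")

printf '%s\n' "$dirs" "$old" "$big" "$by_inode"
```

Each search prints one path here because the scratch tree holds exactly one match per test. On a real system the size and age thresholds would need tuning.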
I also do a pretty good job of ensuring that anything I install on production cleans up after itself as far as disk space is concerned; old data and log files don’t normally hold a lot of added value.
Yet some of the guys I work with apparently feel that the disks are large enough that they never need to clean up after themselves. I know that because I start to receive emails from IT saying that the system is filling up, leaving us to find out which directories are the culprits.
Disk Usage (du)
The disk usage tool shows how much space is being consumed by individual directories. As long as those directories are no deeper than one level, you don’t need more than this.
du -h .
This shows the disk space used in human-readable units (i.e. -h), so you can see how many kilobytes, megabytes or gigabytes each directory uses. By default, du reports on the path given and also prints a separate line for every directory beneath it, at every depth.
This is both annoying and less than transparent if someone has created a new sub-directory for each day’s or month’s processed data files.
3.1M ./purchaseord/processed/201505
302K ./purchaseord/processed/201506
563K ./purchaseord/processed/201507
1.7M ./purchaseord/processed/201505
271K ./purchaseord/processed/201508
255K ./purchaseord/processed/201509
96K ./purchaseord/processed/201510
65K ./purchaseord/processed/201511
175K ./purchaseord/processed/201512
1.5M ./purchaseord/processed/201503
21K ./purchaseord/processed/201601
8.9M ./purchaseord/processed
409K ./purchaseord/manual-processed
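To make the line-per-directory behaviour concrete, here is a small sketch on a scratch tree. The directory names are hypothetical, and mktemp -d is assumed to be available. It counts the lines printed by plain du and compares that with du -s, which collapses everything under its argument into a single total.

```shell
#!/bin/sh
# Scratch tree mimicking one sub-directory per month of processed files.
scratch=$(mktemp -d)
mkdir -p "$scratch/processed/201505" "$scratch/processed/201506"
cd "$scratch" || exit 1

# Plain du prints a line for every directory it walks:
# ., ./processed, ./processed/201505 and ./processed/201506.
all_lines=$(du -k . | wc -l | tr -d ' ')

# With -s, du prints one summary line per argument.
summary_lines=$(du -sk . | wc -l | tr -d ' ')

echo "du lines: $all_lines, du -s lines: $summary_lines"
```

With a year of monthly sub-directories under several interfaces, the plain form quickly runs to hundreds of lines, which is why a summary per directory is easier to read.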
Finding only the top directories is harder as the number of directories, sub-directories and sub-sub directories increases in inconsistent ways.
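The good news is that the find command can be used not only to find directories (i.e. -type d) but also to limit the results to those at the top level (! -name . -prune). This idiom relies on find starting from the current directory (.): the starting point itself is named ".", so it is not pruned and find descends into it, while every directory directly beneath it is printed and then pruned. This is not all that different from when I was trying to create a list of files or directories as input to another program.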
DATADIRECTORY=/vend/samba/configsys/interfaces/
# The ! -name . -prune idiom needs "." as the starting point, so change directory first
cd "$DATADIRECTORY" && find . -type d ! -name . -prune | xargs du -hs
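The find command just generates a list of the top-level directories. Piping that list into xargs turns it into arguments for the disk usage command, and the -s option makes du print one summary total per directory instead of listing everything underneath. Note that xargs splits its input on whitespace, so this assumes the directory names contain no spaces.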
1.0G ./purchaseord
883K ./calendars
10K ./winhistory
63M ./repository
2K ./acctrpt
96M ./regulatory
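The same idea can be tried safely on a scratch tree before pointing it at a production path. The sketch below is one possible variation rather than the original command: it swaps xargs for find's own -exec ... {} +, which copes with spaces in directory names, and uses du -sk with sort -rn so that the largest directories come first. The directory and file names are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Scratch tree: three top-level directories, some nesting, and one plain file.
scratch=$(mktemp -d)
mkdir -p "$scratch/purchaseord/processed/201505" \
         "$scratch/calendars" \
         "$scratch/repository/old"
echo "sample" > "$scratch/purchaseord/processed/201505/po.dat"
echo "not a directory" > "$scratch/notes.txt"

cd "$scratch" || exit 1

# Only the top-level directories: "." is not pruned, so find descends into
# it; each directory directly below is printed and then pruned.
top_dirs=$(find . -type d ! -name . -prune | sort)
echo "$top_dirs"

# One summary line per top-level directory, in kilobytes, largest first.
find . -type d ! -name . -prune -exec du -sk {} + | sort -rn

# Remove the scratch tree again.
cd / && rm -rf "$scratch"
```

Sorting on kilobytes rather than the -h output avoids comparing strings such as 883K and 1.0G, which a plain numeric sort would order incorrectly.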