command line fun – finding a list of files

On the Unix command line there is almost always more than one way to do something. I am usually reminded of this when I need to process a few files in the current working directory and don’t have the most concise command at hand.

Recently, I only wanted older files and none of the contents from the subdirectories.  Of course you can simply gather up the file names in a small temporary file and then process them from there.  This also gives you the chance to edit the list by hand.
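A minimal sketch of that temporary-file approach might look like the following.  The sandbox directory and the file names are made up for the demo, and it assumes mktemp is available on your system.

```shell
# Sketch only: everything happens in a throwaway directory.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir manual-processed
touch 20151029_a.xml 20151030_a.xml 20151031_a.xml   # made-up sample files

# Step 1: collect the candidate file names in a small list file.
list=$(mktemp)
ls *.xml > "$list"

# Step 2 (optional): edit "$list" by hand to drop anything that should stay.

# Step 3: move every file named in the list.
while IFS= read -r f; do
    mv "$f" manual-processed/
done < "$list"

moved=$(ls manual-processed | wc -l | tr -d ' ')
echo "moved $moved files"
rm -f "$list"
```

The hand-editing in step 2 is exactly the kind of manual step that makes this awkward to hand off to somebody else.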

This doesn’t work well when you have to ask IT to perform the task for you.  They don’t like dealing with manual steps, nor do they like dealing with more than a few files in the directory for a single request.

With this in mind, I needed to come up with a command that they could execute that would gather up all of the files without me having to create a hard-coded script with hundreds of move commands.

e.g.

mv  20151029_0605200032_66677701232222299.xml  manual-processed
mv  20151030_0605200032_66677701232222299.xml  manual-processed
mv  20151031_0605200032_66677701232222299.xml  manual-processed

I wouldn’t have to deal with this at all if some of my colleagues’ programs cleaned up after themselves once they had processed their data, but that is another story.  So this time I did some research on the internet and found an elegant solution for my problem.

The task
Move any xml data files older than three days to the manual-processed subdirectory, which exists in the current directory, while ignoring all other subdirectories and their contents.

The solution
Use the find command to create a list of these old files.

I like the find command because it is possible to process a directory with a virtually unlimited number of files without any “line length too long” errors.  The find command can also find files based on the last time a file has been modified.  This is the same as finding files based on a creation date if the file hasn’t been altered since creation.  Note that -mtime counts whole 24-hour periods and discards the remainder, so -mtime +3 only matches files that were last modified at least four full days ago.
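To see the modification-time test in isolation, here is a small sketch.  The file names are invented, and it assumes mktemp and touch -t are available to build a sandbox with one deliberately old file.

```shell
# Sketch only: a throwaway directory with one old and one new file.
work=$(mktemp -d)
cd "$work" || exit 1
touch -t 200001010000 old.xml   # modification time set to 1 Jan 2000
touch new.xml                   # modification time is right now

# -mtime +3 keeps only files modified more than three whole days ago.
old_files=$(find . -name "*.xml" -mtime +3)
echo "$old_files"               # prints ./old.xml
```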

First attempt

cd <data directory>
find . ! -name . -prune -name "*xml" -mtime +3 -exec mv {} manual-processed ";"

The find command on some operating systems provides -maxdepth and -mindepth; however, as neither is available on the Solaris I am using, I have to use the -prune command line option instead.  -prune stops find from descending into whatever matched the test in front of it.  Here that test is ! -name ., so everything except the current working directory itself is pruned and only the entries directly inside it are examined.
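The effect of the prune is easiest to see side by side.  This is just a sketch in a sandbox (made-up file names, mktemp and touch -t assumed available) with one old file at the top level and one inside a subdirectory.

```shell
# Sketch only: one old xml file at the top, one nested in a subdirectory.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir processed
touch -t 200001010000 top.xml processed/nested.xml

# Without -prune, find descends into processed/ and reports both files.
all=$(find . -name "*xml" -mtime +3 | wc -l | tr -d ' ')

# With the prune idiom, only entries directly inside "." are considered.
top_only=$(find . ! -name . -prune -name "*xml" -mtime +3)

echo "without prune: $all matches"   # prints: without prune: 2 matches
echo "with prune: $top_only"         # prints: with prune: ./top.xml
```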

The command works just fine.  Be aware, though, that the prune only stops find from descending: the manual-processed and processed subdirectories themselves are still candidates, and without the -name "*xml" test they would show up in the list of files.

It is possible to exclude things using the exclamation symbol.  Thus it is possible to add patterns or choices that should be excluded.

Second attempt

cd <data directory>
find . ! -name . -prune -name "*xml" ! -name processed ! -name manual-processed -mtime +3 -exec mv {} manual-processed ";"

This actually works just fine.  It filters out both the processed and manual-processed directories.  This can easily be used to solve my problem.

Note: Because I was using -name "*xml" I don’t need to worry about any subdirectories whose names do not end in xml.  If you want to look at all files but exclude these directories, then you would leave off the name test (i.e. -name "*xml") but keep the directory exclusions (! -name processed ! -name manual-processed).
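The variant described in the note, without the name test but with the directory exclusions, could be tried out like this.  It is only a sketch: the sandbox and file names are made up, and the directories are given an old modification time on purpose so that they would otherwise match.

```shell
# Sketch only: old files plus two old-looking directories.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir processed manual-processed
touch -t 200001010000 old.xml old.log processed manual-processed

# No exclusions: the two directories are listed along with the files.
unfiltered=$(find . ! -name . -prune -mtime +3 | wc -l | tr -d ' ')

# With exclusions: only the two files remain.
found=$(find . ! -name . -prune ! -name processed ! -name manual-processed -mtime +3 | sort)

echo "unfiltered: $unfiltered entries"   # prints: unfiltered: 4 entries
echo "$found"                            # prints ./old.log then ./old.xml
```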

Extra Credit edition

In my case, I only have two subdirectories that I need to filter out so the previous solution is just fine.  Yet, it doesn’t scale well if some unexpected directories also exist and also need to be excluded.

The key is the exclamation symbol, which negates the test that immediately follows it, thus excluding whatever that test matches.  We can use the -type argument of the find command to limit what we are looking for to a certain type and then negate it.  In this case it is directories.

cd <data directory>
find . ! -name . -prune ! -type d  -mtime +3 -exec mv {} manual-processed ";"

This final command limits the find to everything in the current working directory while excluding any directories.  This is further limited to files that are older than 3 days.  I wasn’t really only interested in XML files; I was interested in everything.
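Before handing a command like this to somebody else, it may be worth trying it in a sandbox first.  The sketch below uses made-up file names and assumes mktemp and touch -t are available.  It does a dry run by swapping -exec for -print and then runs the real move.

```shell
# Sketch only: old and new files, plus a subdirectory that must be left alone.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir manual-processed archive
touch -t 200001010000 old.xml old.txt archive/deep.txt
touch new.txt

# Dry run: -print instead of -exec shows what would be moved.
find . ! -name . -prune ! -type d -mtime +3 -print

# The real thing: move the old top-level files.
find . ! -name . -prune ! -type d -mtime +3 -exec mv {} manual-processed ";"

moved=$(ls manual-processed | sort | tr '\n' ' ')
echo "moved: $moved"   # prints: moved: old.txt old.xml
```

The new file and everything under archive stay where they were.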

Note: The prune command line option is pretty finicky.  In order to use it to ignore the subdirectories you pretty much need to start out your find like this.

find . ! -name . -prune (more options here)
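If you happen to be on a system whose find does provide -mindepth and -maxdepth (GNU and BSD find do), the two forms can be compared directly.  This is a sketch with made-up file names, and it will not run on a find that lacks those two options.

```shell
# Sketch only: compare the prune idiom with -mindepth/-maxdepth.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir sub
touch -t 200001010000 a.xml b.txt sub/c.xml sub

# Prune idiom, as used throughout this post.
portable=$(find . ! -name . -prune -mtime +3 | sort)

# Same selection using the depth options (not available on every find).
extension=$(find . -mindepth 1 -maxdepth 1 -mtime +3 | sort)

[ "$portable" = "$extension" ] && echo "both forms list the same entries"
```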

 
