While looking into another problem, a plethora of shells, I was initially afraid that some of my fooling around was causing odd side effects. While investigating, I ended up with a different solution that was actually overkill, but it helped eliminate other options as the source of the problem.
My solution was to create a file containing the names of all of the files I needed to process and use that as my input. This allowed me to look at as many files as I wanted without ever seeing anything like this:
-bash: /usr/bin/ls: Arg list too long
This isn’t the only solution, nor even the most efficient, but it sure is a fun one that uses both head and tail on the same line. A more efficient solution would probably be to use xargs, find, or perhaps a more efficient shell script.
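For the curious, the efficient version might look something like this; a rough sketch using find and xargs (the -print0 and -0 pair keeps filenames with spaces intact, and -maxdepth is a GNU find option):

```
find /home/bob/input -maxdepth 1 -type f -print0 | xargs -0 -n 1 echo processing
```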
Rather than do the efficient thing, I am going to show off the fun solution. This is something you could use even in a production environment, as long as the number of files is reasonably small.
I decided to use both head and tail to pick out lines from a file for processing. Simply save a list of all files to a temporary file and then process each file one at a time.
The fun example
```
INPUT=/home/bob/input
TMPFILE=`mktemp /tmp/XXXXXX`
pushd `pwd`
cd $INPUT
ls -1 > $TMPFILE
CNT=`cat $TMPFILE | wc -l`
IDX=1
while [ $IDX -le $CNT ]
do
    NAME=`head -n ${IDX} $TMPFILE | tail -1`
    echo processing $NAME
    # do some processing here
    IDX=$(($IDX + 1))
done
popd
rm $TMPFILE
```
The actually interesting part is where the variable NAME is assigned, but before getting ahead of myself, here is a blow-by-blow explanation of the script.
| # | Line | Description |
|---|---|---|
| 1 | `INPUT=/home/bob/input` | Set up our input location. |
| 2 | ``TMPFILE=`mktemp /tmp/XXXXXX` `` | Create a unique filename in the /tmp directory. |
| 3 | ``pushd `pwd` `` | Save our current directory. |
| 4 | `cd $INPUT` | Change to the input directory. |
| 5 | `ls -1 > $TMPFILE` | Save a list of file names (no directory) into our temporary file. |
| 6 | ``CNT=`cat $TMPFILE \| wc -l` `` | Get a count of the number of lines in our file. |
| 7 | `IDX=1` | Which line of the file we are on at the moment. |
| 8 | `while [ $IDX -le $CNT ]` | While there are lines left to process. |
| 9 | `do` | |
| 10 | ``NAME=`head -n ${IDX} $TMPFILE \| tail -1` `` | Get the entry from the file. |
| 11 | `echo processing $NAME` | In this case, echo the filename to the screen. |
| 12 | `# do some processing here` | Nothing to see here, move along. |
| 13 | `IDX=$(($IDX + 1))` | Increase the index to the next line. |
| 14 | `done` | |
| 15 | `popd` | Change back to the original directory. |
| 16 | `rm $TMPFILE` | Remove the temporary file. |
What the code is doing is pretty obvious for most of the script. The interesting lines are 2 and 10.
The mktemp command creates a file with a guaranteed unique name; the six X characters in the template are replaced with random characters. This is really useful, as the operating system does the hard work for you. It is even possible to create a unique subdirectory instead, or a file inside one.
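For example, here is the template from the script along with its directory-creating cousin (mktemp -d is a standard option):

```
TMPFILE=`mktemp /tmp/XXXXXX`    # creates something like /tmp/Ab3dZ9
TMPDIR=`mktemp -d /tmp/XXXXXX`  # -d creates a unique directory instead
```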
Line number ten is actually pretty simple when you think about it. The head command takes that many lines from the top of our temporary file; the first time through, this is just the first line. The tail command then takes the last line from that batch. When the index is 1, both head and tail deliver a single line.
When the index is larger (e.g. 5), more lines are delivered. The head command delivers all of the previously processed lines along with one new one, and the tail command pulls off that one new line so it can be processed.
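To see the combination in isolation, here is a quick demonstration against a throwaway five-line file (/tmp/demo is just a made-up name for this example):

```
printf 'one\ntwo\nthree\nfour\nfive\n' > /tmp/demo
head -n 3 /tmp/demo            # prints: one, two, three
head -n 3 /tmp/demo | tail -1  # prints just: three
```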
This method of processing data is also interesting if there are spaces in the filenames or paths. Because each line of the temporary file holds exactly one filename, you know with certainty that each line represents a single parameter or value, as long as you remember to quote the variable when you use it.
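As a quick illustration, assume the third line of the temporary file holds a name with spaces in it, say my data file.txt (a made-up name for this example):

```
NAME=`head -n 3 $TMPFILE | tail -1`
echo processing "$NAME"   # the quotes keep "my data file.txt" as one value
```

Without the quotes, the shell would split that name into three separate words before handing it to whatever command does the processing.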
Again, this is a terribly inefficient way of processing through a million lines, especially if those lines are each hundreds of characters long, since head has to re-read the file from the beginning on every pass through the loop.
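If the list ever does grow that large, a single pass with read avoids re-reading the file on every iteration. Here is a minimal sketch of the same loop rewritten that way:

```
while IFS= read -r NAME
do
    echo processing "$NAME"
    # do some processing here
done < "$TMPFILE"
```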