Eight Ways to Reverse a File
Ed Schaefer and John Spurgeon
In this article, we present eight ways to reverse a file. Suppose
the data file contains the following:
1. There
2. is
3. more
4. than
5. one
6. way
7. to
8. skin
9. a
10. cat
We just want to reverse it to:
10. cat
9. a
8. skin
7. to
6. way
5. one
4. than
3. more
2. is
1. There
This is a simple problem, so why do we present eight methods for
solving it? Since this is essentially a file-access problem, we're
hoping to make you think about which tool to use the next time you
process a large file. We're also testing the methods to determine
which are the most efficient. See the sidebar for a comparison.
Here are the methods we tested:
- Using vi
- Using an array (using both awk and the shell)
- Using Perl's reverse function
- Using the tail command's reverse (-r) option
- Using GNU's tac
- Using sed
- Using a dynamic numeric field
- Using shell variables
Using vi
One obscure method uses vi's move command. From vi's command
mode, execute:
:g/^/move0
The global command (g) selects every line, since /^/ matches the
start of any line, and moves each one in turn below line 0. Because
line 0 does not exist, each line lands at the top of the file,
pushing the previously moved lines down.
The vi editor accepts the one-letter abbreviation of the move
command, so the command can be shortened to:
:g/^/m0
Consider another example, where the first two lines stay in place
and only the rest of the file is reversed:
1. There
2. is
10. cat
9. a
8. skin
7. to
6. way
5. one
4. than
3. more
The vi command to do this is:
:g/^/m2
You may think that using vi within a shell script is out, but you
can call vi using a Unix here document:
vi datafile > /dev/null << EJ
:g/^/m0
:x
EJ
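Because the here document uses only vi's line-oriented commands, the
same edit can also be run through ex, the line-editor mode of vi,
which avoids starting a full-screen session. The following is a
minimal sketch, assuming an ex that accepts the -s (silent) option;
reverse_in_place is a hypothetical helper name of our own:

```shell
#!/bin/sh
# reverse_in_place: hypothetical helper that reverses a file in place with ex.
# Assumes an ex that supports -s (silent/batch mode).
reverse_in_place() {
    # g/^/m0 moves every line to the top in turn; wq writes the file and quits
    ex -s "$1" <<'EOF'
g/^/m0
wq
EOF
}
```

Calling reverse_in_place datafile rewrites datafile itself, so work on
a copy if the original order still matters.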
Using an Array
We can use a utility that supports arrays, including the shell.
Read the file into an array and index it in reverse order. This
is an awk solution:
awk '{ a[NR]=$0 } END { for(i=NR; i; --i) print a[i] } ' datafile
awk's built-in record counter (NR) increments as each line of the
data file is read. Store each line in an array indexed by NR, and
at the end, loop through the array in reverse order.
We can also let the shell do it using an array:
#!/bin/ksh
# reverse a file
cnt=0
while IFS= read -r line    # keep leading blanks and backslashes intact
do
   ((cnt=cnt+1))
   rr[cnt]=$line
done < datafile
while (( cnt > 0 ))
do
   print -r -- "${rr[cnt]}"
   ((cnt=cnt-1))
done
Let's consider other external Unix commands.
Using Perl
Perl's reverse function performs the operation in one line. In
list context, <> returns every line of the file, and reverse
flips that list before print displays it:
perl -e 'print reverse <>' datafile
Using tail
Let's say no Perl interpreter is available. The tail
command has a reverse option:
tail -r datafile
Using tac
Unfortunately, GNU's tail command doesn't support the reverse
option. However, GNU provides the tac (concatenate and print
files in reverse) utility:
tac datafile
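Since tail -r and tac are each missing on some systems, a script that
must run in both places can probe for them first. The following is a
minimal sketch rather than a tested portability layer; the function
name reverse_file is our own invention, and the last branch falls back
on the awk array method shown earlier:

```shell
#!/bin/sh
# reverse_file: hypothetical helper that uses whichever reversal tool exists.
reverse_file() {
    if command -v tac >/dev/null 2>&1; then
        tac "$1"                          # GNU coreutils
    elif tail -r /dev/null >/dev/null 2>&1; then
        tail -r "$1"                      # a tail that supports -r
    else
        # Fallback: store the lines in an awk array, print them last to first
        awk '{ a[NR] = $0 } END { for (i = NR; i > 0; --i) print a[i] }' "$1"
    fi
}
```

With this function defined, reverse_file datafile prints the reversed
file on standard output no matter which branch is taken.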
Using sed
From his sed one-liners Web page, Eric Pement provides two methods
using sed:
sed -n '1!G;h;$p' datafile
sed '1!G;h;$!d' datafile
We'll describe the first solution. As with the shell, multiple sed
commands can exist on the same line if they're separated by a semicolon.
The command can be rewritten as:
sed -n '1!G
h
$p
' datafile
As sed processes a file, lines are read into a work space commonly
known as the pattern space. The sed command also owns a temporary
buffer known as the hold space. Sed's G command appends a newline
and the contents of the hold space to the pattern space while the
h command copies the pattern space to the hold space, overwriting
the previous contents.
The address 1! restricts G to every line except the first. On
line 1 the hold space is still empty, yet G would append a newline
to the pattern space anyway. Without this address, we would be left
with a blank line at the bottom once the file is reversed.
The h command copies the pattern space to the hold space. For
each succeeding line, G appends a newline and the hold space, which
contains the earlier lines in reverse order, to the current line;
h then saves the growing result back to the hold space.
By default, sed prints what is in the pattern space as each line
is processed. The -n option overrides the default, and the
$p command prints the pattern space when sed is done processing
the file.
The only difference between the sed one-liners is how printing is
suppressed. The second sed one-liner, sed '1!G;h;$!d', drops the
-n option and instead deletes the pattern space on every line except
the last ($!d), so only the final, fully reversed pattern space
is printed.
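To watch the two buffers at work, one option is to replace $p with
sed's l command, which lists the pattern space unambiguously on every
cycle (an embedded newline appears as \n and the end of the pattern
space as $). This is only an illustrative sketch on a three-line
input; trace_reverse is a hypothetical name:

```shell
#!/bin/sh
# trace_reverse: hypothetical helper. Same script as the first one-liner,
# but "l" replaces "$p" so the pattern space is listed after every line.
trace_reverse() {
    sed -n '1!G;h;l'
}

printf 'a\nb\nc\n' | trace_reverse
# a$
# b\na$
# c\nb\na$
```

Each listed line is the pattern space after G has appended the hold
space; the last one is what $p would print as the reversed file.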
Using a Dynamic Numeric Field
Another solution is from Heiner Steven's Shelldorado Web site:
nl -ba datafile | sort -nr | cut -f2-
This one-liner adds a dynamically generated numeric field to each
line in the data file using the nl (line number filter) utility
and then pipes it to sort where the added column is sorted
in numeric, reverse order. Finally, the output is piped to cut
where the added column is removed, which delivers the file in reverse
order.
Heiner warns us of a drawback to using nl "since it interprets
certain strings in the text as being page headers, page footers,
etc., e.g.":
\:\:\:
Any Unix command that creates line numbers can replace the nl
command. This example uses the cat command:
cat -n datafile | sort -nr | cut -f2-
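awk can generate the line numbers as well, and it treats a line such
as \:\:\: like any other text. The following is a minimal sketch of
that variant; the function name reverse_numbered is hypothetical, and
we assume sort and cut use their default tab and blank field handling:

```shell
#!/bin/sh
# reverse_numbered: hypothetical helper using awk to add the numeric field.
reverse_numbered() {
    awk '{ print NR "\t" $0 }' "$1" |   # prefix each line with its number and a tab
        sort -k1,1nr |                  # sort on that number, highest first
        cut -f2-                        # strip the number, keep the original line
}
```

Run as reverse_numbered datafile, it should produce the same output as
the cat -n pipeline.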
Using Shell Variables
Absolutely the coolest way to solve this problem is to let the
shell do it by dynamically creating and assigning each line to an
individual shell variable and then displaying the variables in reverse
order:
#!/bin/ksh
i=0
while IFS= read -r line    # keep leading blanks and backslashes intact
do   # create a variable for each line
   eval a$i=\$line
   ((i=i+1))
done < datafile
while ((i > 0))
do   # display each variable in reverse order
   ((i=i-1))
   eval print -r -- \"\$a$i\"
done
Build a variable name by appending an integer counter to a letter
(a0, a1, and so on), and use eval to assign the current line to the
variable of that name. Repeat this process for each line in the file
as the counter increments.
To reverse the file, rebuild the variable names as the counter
decrements; eval'ing a command that prints each variable displays
the lines in reverse order.
This process is analogous to manipulating arrays, but without
the limitations of arrays. In theory, with a 32-bit integer, a file
2,147,483,647-lines long can be processed before integer overflow.
Of course, this really depends on available memory.
References
Dougherty, Dale and Arnold Robbins. 1997. sed & awk.
Sebastopol, CA: O'Reilly & Associates.
Resources
Eric Pement's sed one-liners: http://www.student.northpark.edu/pemente/sed/sed1line52.txt
Steven, Heiner. Shelldorado -- Unix shell scripting resource:
http://www.shelldorado.com
Steven, Heiner. "Re: reverse a file", Email to Ed Schaefer. November
22, 2004.
John Spurgeon is a software developer and systems administrator
for Intel's Factory Integrated Information Systems, FIIS, in Aloha,
Oregon. He is currently training for the Furnace Creek 508 and still
enjoys turfgrass management, triathlons, and spending time with
his family.
Ed Schaefer is a frequent contributor to Sys Admin.
He is a software developer and DBA for Intel's Factory Integrated
Information Systems, FIIS, in Aloha, Oregon. Ed also hosts the monthly
Shell Corner column on UnixReview.com. He can be reached
at: shellcorner@comcast.net.