Cover V14, i06

Eight Ways to Reverse a File

Ed Schaefer and John Spurgeon

In this article, we present eight ways to reverse a file. Suppose the data file contains the following:

1. There
2. is
3. more
4. than
5. one
6. way
7. to
8. skin
9. a
10. cat

We just want to reverse it to:

10. cat
9. a
8. skin
7. to
6. way
5. one
4. than
3. more
2. is
1. There

This is a simple problem, so why do we present eight methods for solving it? Since this is essentially a file-access problem, we're hoping to make you think about which tool to use the next time you process a large file. We're also testing the methods to determine which are the most efficient. See the sidebar for a comparison.

Here are the methods we tested:

  • Using vi
  • Using an array (using both awk and the shell)
  • Using Perl's print reverse command
  • Using the tail command's reverse (-r) option
  • Using GNU's tac
  • Using sed
  • Using a dynamic numeric field
  • Using shell variables

Using vi

One obscure method uses vi's move command. From vi's command mode, execute:

:g/^/move0

The global command, :g, selects every line that matches the pattern /^/ (the beginning of a line, so every line matches) and moves each one, in turn, to just below line 0. Since line 0 does not exist, each line is placed at the top of the file, pushing the previously moved lines down.

The vi editor accepts m as an abbreviation for move, so the command can be shortened to:

:g/^/m0

Consider another example, where the first two lines stay in place and only the lines after line 2 are reversed:

1. There
2. is
10. cat
9. a
8. skin
7. to
6. way
5. one
4. than
3. more

The vi command to do this is:

:g/^/m2
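
For scripts that should not depend on an editor, the same partial reversal can be sketched with awk. This is an illustration rather than part of the original method: the function name reverse_after and its keep parameter are made up for this example, and it assumes the whole input fits in memory.

```shell
#!/bin/sh
# reverse_after N: print the first N lines unchanged, then the rest in
# reverse order (hypothetical helper; reads standard input).
reverse_after() {
    awk -v keep="$1" '
        NR <= keep { print; next }      # pass the leading lines through
        { rest[NR] = $0 }               # remember everything after them
        END { for (i = NR; i > keep; --i) print rest[i] }
    '
}

# Intended to match the effect of :g/^/m2 on a ten-line sample
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | reverse_after 2
```

Calling reverse_after 0 reverses the whole input, which corresponds to :g/^/m0.
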

You may think that using vi within a shell script is out, but you can call vi using a Unix here document:

vi datafile > /dev/null << EJ
:g/^/m0
:x
EJ

Using an Array

We can use a utility that supports arrays, including the shell. Read the file into an array and index it in reverse order. This is an awk solution:

awk '{ a[NR]=$0 } END { for(i=NR; i; --i) print a[i] } ' datafile

awk's built-in NR (number of records) variable increments as each line of the data file is read. The script stores each line in an array indexed by NR and, once the input is exhausted, loops through the array in reverse order.
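
One quick way to gain confidence in this or any of the other methods is to reverse a file twice and confirm that the original comes back. The sketch below wraps the awk one-liner in a function (reverse_awk is a name chosen here for illustration) and checks the round trip on a three-line sample.

```shell
#!/bin/sh
# Wrap the awk one-liner so it can be reused in a pipeline.
reverse_awk() {
    awk '{ a[NR] = $0 } END { for (i = NR; i; --i) print a[i] }' "$@"
}

sample=$(printf '%s\n' '1. There' '2. is' '3. more')

once=$(printf '%s\n' "$sample" | reverse_awk)    # reversed
twice=$(printf '%s\n' "$once" | reverse_awk)     # reversed again

if [ "$twice" = "$sample" ]; then
    echo "round trip OK"
else
    echo "round trip FAILED" >&2
fi
```
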

We can also let the shell do it using an array:

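#!/bin/ksh
# reverse a file

   cnt=0
   # IFS= and -r keep leading whitespace and backslashes intact
   while IFS= read -r line
   do
      ((cnt=cnt+1))
      rr[cnt]=$line
   done < datafile

   while (( cnt > 0 ))
   do
      # quote the expansion so the line is printed exactly as stored
      print -r -- "${rr[cnt]}"
      ((cnt=cnt-1))
   done

Let's consider other external Unix commands.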

Using Perl

Perl's reverse function, combined with print, performs the operation in one line:

perl -e 'print reverse <>' datafile

Using tail

Let's say no Perl interpreter is available. The tail command has a reverse option:

tail -r datafile

Using tac

Unfortunately, GNU's tail command doesn't support the reverse option. However, GNU provides the tac (concatenate and print files in reverse) utility:

tac datafile
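
Because tail -r and tac are each missing on some systems, a script that must run in both environments can probe for them at run time. The following is a minimal sketch, not a tested portability layer: reverse_file is a hypothetical name, the probe assumes that a tail without -r exits with an error, and the awk fallback is the array method shown earlier. It is meant for a single file or standard input.

```shell
#!/bin/sh
# reverse_file [file]: reverse a single file (or standard input) with
# whichever tool is installed. Hypothetical helper for illustration.
reverse_file() {
    if command -v tac >/dev/null 2>&1; then
        tac "$@"
    elif tail -r /dev/null >/dev/null 2>&1; then
        tail -r "$@"
    else
        # Fallback: hold the lines in an awk array and print them backwards
        awk '{ a[NR] = $0 } END { for (i = NR; i; --i) print a[i] }' "$@"
    fi
}

printf '%s\n' first second third | reverse_file
```
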

Using sed

From his sed one-liners Web page, Eric Pement provides two methods using sed:

sed -n '1!G;h;$p' datafile
sed '1!G;h;$!d' datafile

We'll describe the first solution. As with the shell, multiple sed commands can exist on the same line if they're separated by a semicolon. The command can be rewritten as:

sed -n '1!G
h
$p
' datafile

As sed processes a file, it reads each line into a work area commonly known as the pattern space. sed also maintains a second buffer known as the hold space. The G command appends a newline followed by the contents of the hold space to the pattern space, while the h command copies the pattern space to the hold space, overwriting the previous contents.

The address in 1!G tells sed to run G on every line except the first. On the first line the hold space is still empty, but G would append a newline to the pattern space anyway. Without this address, we would be left with a blank line at the bottom once the file is reversed.

The h command then copies the pattern space to the hold space. For each succeeding line, G appends the hold space, which contains all of the earlier lines in reverse order, to the current line, and h saves the growing result back to the hold space.

By default, sed prints the pattern space after each line is processed. The -n option suppresses that default, and the $p command prints the pattern space once, after the last line has been processed.

The only difference between the two sed one-liners is how they control printing. The second one-liner, sed '1!G;h;$!d', drops the -n option and instead deletes the pattern space on every line except the last ($!d), so only the final, fully reversed pattern space is printed.
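
To make the buffer handling concrete, the sketch below imitates sed -n '1!G;h;$p' with two shell variables standing in for the pattern space and the hold space. It is only a model of the behavior described above (sed itself is not implemented this way), and the function and variable names are invented for the illustration.

```shell
#!/bin/sh
# Model of: sed -n '1!G;h;$p'
sed_reverse_model() {
    hold=''        # stands in for the hold space
    n=0            # line counter, playing the role of the sed line address
    while IFS= read -r pattern; do      # read a line into the "pattern space"
        n=$((n + 1))
        if [ "$n" -ne 1 ]; then
            # 1!G : append a newline and the hold space to the pattern space
            pattern="$pattern
$hold"
        fi
        # h : copy the pattern space to the hold space
        hold=$pattern
    done
    # $p : after the last line, print the pattern space once
    if [ "$n" -gt 0 ]; then
        printf '%s\n' "$hold"
    fi
}

printf '%s\n' one two three | sed_reverse_model
```

After each pass through the loop, hold contains the lines read so far, newest first, which is why the final print shows the whole input in reverse.
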

Using a Dynamic Numeric Field

Another solution is from Heiner Steven's Shelldorado Web site:

nl -ba datafile | sort -nr | cut -f2-

This one-liner uses the nl (line number filter) utility to prefix each line in the data file with a line number (-ba numbers every line, including blank ones) and pipes the result to sort, which orders the lines by that number in reverse numeric order. Finally, cut removes the added column, which delivers the file in reverse order.

Heiner warns us of a drawback to using nl "since it interprets certain strings in the text as being page headers, page footers, etc., e.g.":

\:\:\:

Any Unix command that creates line numbers can replace the nl command. This example uses the cat command:

cat -n datafile | sort -nr | cut -f2-
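
As one more variation on the same idea, awk can supply the line numbers. Unlike nl, awk attaches no special meaning to lines such as \:\:\:, so they pass through as ordinary text. The function name below is made up for this sketch, and the pipeline assumes a sort that supports -n and -r.

```shell
#!/bin/sh
# Number each line with awk (number, tab, text), sort the numbers in
# descending order, then cut the number column away again.
reverse_numbered() {
    awk '{ printf "%d\t%s\n", NR, $0 }' "$@" | sort -nr | cut -f2-
}

# The second line is one of the strings that nl treats as a page delimiter
printf '%s\n' 'top' '\:\:\:' 'bottom' | reverse_numbered
```
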

Using Shell Variables

Absolutely the coolest way to solve this problem is to let the shell do it by dynamically creating and assigning each line to an individual shell variable and then displaying the variables in reverse order:

#!/bin/ksh

i=0
while IFS= read -r line
do # create a variable for each line
   eval a$i=\$line
   ((i=i+1))
done < datafile

while ((i > 0))
do # display each variable in reverse order
   ((i=i-1))
   eval print -r -- \"\$a$i\"
done

The first loop builds a variable name by appending the integer counter to the letter a (a0, a1, a2, and so on) and uses eval to assign the current line to that variable. This repeats for each line in the file as the counter increments.

To reverse the file, the second loop rebuilds each variable name as the counter decrements; eval'ing a print of that variable displays the contents of each line, last line first.
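
The backslashes in the eval lines matter: they postpone expansion of $line and of the generated variable until eval runs, so the text of a line is treated as data and is never executed. The sketch below restates the technique as a POSIX sh function (reverse_with_vars is a hypothetical name, and printf is used so that backslashes in the data are printed literally) and feeds it a line that would cause trouble if it were evaluated as code.

```shell
#!/bin/sh
reverse_with_vars() {
    i=0
    while IFS= read -r line; do
        # For i=0, eval sees: a0=$line  (the line's text is never parsed as code)
        eval "a$i=\$line"
        i=$((i + 1))
    done
    while [ "$i" -gt 0 ]; do
        i=$((i - 1))
        # For i=0, eval sees: printf '%s\n' "$a0"
        eval "printf '%s\n' \"\$a$i\""
    done
}

printf '%s\n' 'plain text' '$(echo oops); echo "not run"' | reverse_with_vars
```

The second input line comes back unchanged, as the first line of output, because only the variable name is assembled by eval; the stored text is expanded inside double quotes.
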

This process is analogous to manipulating arrays, but without the limitations of arrays. In theory, with a 32-bit integer counter, a file 2,147,483,647 lines long can be processed before the counter overflows. In practice, the limit is available memory.

References

Dougherty, Dale and Arnold Robbins. 1997. sed & awk. Sebastopol, CA: O'Reilly & Associates.

Resources

Eric Pement's sed one-liners: http://www.student.northpark.edu/pemente/sed/sed1line52.txt

Steven, Heiner. Shelldorado -- Unix shell scripting resource: http://www.shelldorado.com

Steven, Heiner. "Re: reverse a file", Email to Ed Schaefer. November 22, 2004.

John Spurgeon is a software developer and systems administrator for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. He is currently training for the Furnace Creek 508 and still enjoys turfgrass management, triathlons, and spending time with his family.

Ed Schaefer is a frequent contributor to Sys Admin. He is a software developer and DBA for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. Ed also hosts the monthly Shell Corner column on UnixReview.com. He can be reached at: shellcorner@comcast.net.