Common
Network Protocols
Randal L. Schwartz
When Larry Wall added the socket function to Perl version 3 more
than a dozen years ago, we Perl programmers rejoiced, because we
no longer needed to write separate C programs to connect over the
(then tiny) network. We could write programs that could send mail
or fetch news articles! Little did anyone foresee the importance
that TCP/IP would play in the lives of everyone within a few years.
Let's take a look at what Perl has learned to do in the past decade.
For example, in my column "The Simplicity of Sockets" in Sys
Admin, July 2003 (http://www.samag.com/documents/s=8284/sam0307g/0307g),
I illustrated an easy way to connect to a socket (in this case,
the daytime service at the NIST) and dump out the entire
response:
use IO::Socket::INET;
my $client = IO::Socket::INET->new('time.nist.gov:13')
or die "new: $@";
print for <$client>;
In this case, there's no interactive protocol. We connect, take everything
handed to us, and dump it out. But nearly always, the net protocols
require an interaction, such as providing authentication credentials,
selecting the various tasks, and providing needed data.
Higher Level Protocols for Clients
Luckily, Perl has been the "duct tape of the Internet" for long
enough that nearly every core net protocol already has a module
in the CPAN providing some level of support. In nearly all cases,
these modules are named with a Net:: prefix, followed immediately
by the traditional name for the protocol (like SMTP or DNS). This
means that we don't have to spend time "reinventing the wheel" to
talk the protocol. We can concentrate on our application instead.
The protocols used on the Internet are worked out with careful
design and discussion. Too often, I've seen network programming
newbies rush in and try to invent their own protocols. Apparently,
they don't realize the pitfalls. A protocol must clearly specify
who is talking, and when, defining all possible state transitions.
A protocol also must include specifications for data, such as how
to escape that odd data that might be confused for delimiters or
special operations. A protocol that uses UDP also has to understand
how to deal with MTUs (the maximum size for a UDP packet on a given
link, which varies with the network topology) and network delays,
drops, and re-sequencing.
Luckily, the Net:: modules hide all of this from us. We
simply use the provided API, and relax, knowing that the module
author has followed the specification accurately. Hopefully.
For example, to send mail to a server, we can use something like:
use Net::SMTP;
my $smtp = Net::SMTP->new('localhost');
$smtp->mail('merlyn@example.com');
$smtp->to('postmaster@example.com');
$smtp->data([split /\n/, <<'END_OF_MESSAGE']);
To: postmaster@example.com
This message was sent using Net::SMTP!
END_OF_MESSAGE
$smtp->quit;
Here, we've connected to the local SMTP server on our box and established
a sender and a recipient. This is followed by the data payload of
the message (from a here document wrapped into an arrayref)
and finally a disconnect.
Although the exact text to be sent to the SMTP server is not much
more difficult than the program as you see it here, the method calls
are automatically taking care of things like timeouts and escaping.
For example, a mail-message line consisting of a single dot must
be escaped as a doubled-up dot before sending, because a single
dot indicates the end of the message. The data method handles
that for us automatically, so it just does the right thing.
Because SMTP is such a common protocol, many layers and extensions
have been added, such as MIME mail for attachments and alternate
representations (plain text vs. rich text). The MIME::Lite
module can create such messages, but then uses Net::SMTP
(or the sendmail command) to push the message on its way.
Again, don't invent this stuff yourself -- modules are already in
place to do the work.
For fetching messages from a server, the Net::IMAP and
Net::POP3 modules provide the low-level functions needed
to fetch (and in some cases, send) email via the corresponding mail
repository protocol.
Another common net protocol is "the web", usually in the form
of HTTP or HTTPS requests. The LWP module suite (found in
the CPAN) provides a very versatile "virtual browser" (using LWP::UserAgent),
including support for cookies and HTML form parsing. Web pages can
be accessed as easily as:
use LWP::Simple qw(get);
my $content = get "http://www.stonehenge.com/perltraining/";
print $content if defined $content;
Although Usenet isn't what it once was, the NNTP protocol provides
a reliable means to share and transfer messages efficiently. The Net::NNTP
module (and the independently implemented News::NNTPClient
module) provide client-side access to an NNTP server. For example,
to get all of the rec.humor.funny postings, it's as simple
as:
use Net::NNTP;
my $nntp = Net::NNTP->new("nntp.example.com");
my ($num, $low, $high) = $nntp->group("rec.humor.funny");
for my $n ($low..$high) {
my $art = $nntp->article($n) or next;
print for @$art;
}
Another common protocol that's been around for a generation of Internet
users is the File Transfer Protocol (FTP), which can be accessed with
the Net::FTP module. Entire hierarchies can be transferred
with Net::FTP::Recursive.
Although Net::Telnet can be used to connect to the Telnet
servers, most experts agree that SSH is the right thing to use these
days, supported by Net::SSH (for connections) and Net::SCP
and Net::SFTP (for file transfer).
For DNS queries of all types (and dynamic DNS updates), Net::DNS
provides a nice pure-Perl interface. For example, to get the MX
records for stonehenge.com, we simply execute:
use Net::DNS;
for my $mx (mx("stonehenge.com")) {
printf "%5d %s\n", $mx->preference, $mx->exchange;
}
which prints (at the moment):
5 blue.stonehenge.com
666 spamtrap.stonehenge.com
Ahh, yes, my wonderful high-MX spamtrap. But that's for another article.
There are also clients for SOAP (SOAP::Lite) and Jabber,
Whois, Ping, LDAP, BEEP, CUPS, Ident, Gopher, NTP, and SNMP (named
with the Net:: prefix) in the CPAN. In general, if you're
looking things up in an RFC, you're probably wasting time unless
you've already verified that the CPAN doesn't have it yet!
Beyond the core RFC'ed Internet Protocols, CPAN modules also provide
connections to X11 servers (for graphical interfaces), AOL Instant
Messenger, Yahoo!Messenger, IRC, Rendevous (mDNS or Bonjour), and
Traceroute. There are also interfaces to the Google and Amazon APIs
for complex queries.
Servers
Most of the CPAN packages permit a Perl application to be a client
to connect to an available server (which typically isn't
written in Perl). However, the CPAN also includes server packages
for such things as SMTP (Mail), FTP, TFTP, DNS, HTTP, and IDENT.
With these packages, we can construct arbitrary servers that can
be used with traditional (or Perl-based) clients.
These servers can perform traditional functions, perhaps with
unusual characteristics. They might be also be filters or proxies,
connecting in turn using the same protocol to the next layer of
servers. Or, they might even be gateways between protocols, turning
a mail message into a news posting, or a DNS lookup into a Web request.
Complex Servers
When a server is dealing with multiple clients, or is acting as
both a client and a server, our program can run into situations
where it must do more than one thing at a time. For example, the
program might need to wait for input to arrive on any of four input
handles and compute some calculation in the meanwhile.
Traditionally, this is handled by having the process fork.
However, in many cases, this only reduces the problem somewhat,
as ultimately some part of the code must face two or more possible
events simultaneously, because the various pieces that are forked
must communicate.
On most modern operating systems, a process can request to be
notified if any one of a set of handles becomes ready, possibly
with a timeout to limit the waiting time. The Perl select
function can handle this at a low level, but the IO::Select
module (included with the Perl distribution) gives us a higher easier-to-use
view.
Once we bring signals into the mix, an additional level of difficulty
can result (because of Perl's lack of safe signals), best handled
by the Event module (in the CPAN), which can safely capture
and deliver signal events at reasonable interruption points in the
code.
An additional level of complexity is introduced when we add client-related
operations. For example, if we had an IRC bot that also performed
Web operations, there might be times when we're in the middle of
retrieving a Web page (which takes time) and be called upon to service
a timely IRC message or ping request. And if we also add a Curses
screen-based interface, we might need to grab a line of text from
the screen while doing everything else.
The best of breed for these kinds of tasks is the POE framework,
which consists of the POE kernel plus a lot of plug-ins for various
protocols. Using POE requires understanding event-driven programming
("when this happens, do that") as well as possibly breaking up our
computationally complex tasks into a series of events to ensure
apparent simultaneity.
Lower Level Access
Modern TCP/IP stacks provide both traditional socket interfaces
(used by the modules already discussed), as well as a raw packet
interface. In the raw packet interface, we can construct an arbitrary
packet to be placed "on the wire", or pick an arbitrary packet "off
the wire".
The most common use of the raw packet interface is to capture
the entire packet for diagnostic purposes. For example, tools like
tcpdump and ethereal grab each packet in its raw form
to analyze the sender, receiver, and payload in much more detail
than is normally provided.
The most flexible package for packet inspection is Net::Pcap,
which works with the freely available libpcap to set up inspection
access and filters, and pull apart the resulting information. Net::Packet
can be used to understand and create raw packets as well.
Well, I hope I've illustrated that Perl is still the reigning
champion "duct tape of the Internet". Until next time, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration, security,
technical writing, and training. He has coauthored the "must-have"
standards: Programming Perl, Learning Perl, Learning
Perl for Win32 Systems, and Effective Perl Programming.
He's also a frequent contributor to the Perl newsgroups, and has
moderated comp.lang.perl.announce since its inception. Since 1985,
Randal has owned and operated Stonehenge Consulting Services, Inc. |