Perl/Tk and the Web

In this chapter, we'll examine how Tk can access the wealth of information available on the World Wide Web. Using the Library for WWW Access in Perl (LWP), we'll develop tkcomics, a web client to fetch images of our favorite comics and display them in a Perl/Tk window.

One of the Web's most popular Clients is Netscape. It's an extensible application that allows developers to write plug-ins—loadable chunks of code often written in C—which add functionality to the basic browser. We'll learn how to execute client-side Perl/Tk programs via the PerlPlus plug-in.

22.1. Library for WWW Access in Perl

LWP is a Perl library providing access to the Web, used primarily for writing client applications. It "speaks" Hypertext Transfer Protocol (HTTP),[61] and one of its most basic duties is fetching the content of URLs. The beauty of LWP is that it hides all the operational details while allowing us to do our work with simple objects and methods.

[61] Using HTTP as the underlying transport medium, LWP also supports gopher, FTP, news, file, mailto, and secure HTTP services.

In essence, a user agent object working on our behalf takes a request object and does whatever low-level communication and error handling is required to send a request for web content on its way. The user agent then waits for the reply (or error) and hands it back to our program in the form of a response object. Our job is to create the user agent, then use various LWP methods to manipulate requests and responses. But if all we want to do is simply fetch web content, we can use LWP in "simple" mode and just say "fetch this URL," and LWP creates the user agent and request/response objects internally, eventually returning content data to our program.

At first glance, the simple LWP mechanism would appear to suffice for a client such as tkcomics, but it has some drawbacks. As you might suspect, they have to do with blocking, because there might be a considerable time delay between a request and when the network finally delivers the response. Even handling LWP's user agent manually can't surmount all of these problems. Thus, eventually, we'll have to resort to a multiprocess scheme. So let's begin and see how tkcomics might have evolved.

An excellent place for private viewing of your favorite comics is http://www.comics.com, although all the glamour, glitz, and advertisements may be too much for the stomach. But there's nothing to stop us from fetching just the comics and using Tk to display them.[62] We'll start by learning how to fetch and interpret the http://www.comics.com home page, and then build our Tk client around that framework.

[62] We can only use this material for private consumption. Any other use is illegal. Please read the letter of the law at http://www.comics.com/info/site/copyright.html.

22.1.1. LWP::Simple, the Easiest Way to the Web

The LWP::Simple module is so simple, it's not even object oriented; rather, it exports a handful of subroutines for fetching and mirroring web content. All we're interested in is fetching web content, accomplished with the get subroutine:

use LWP::Simple qw/get/;

To retrieve a web page, we call get with the desired URL:

my $comics_home = 'http://www.comics.com';
my $comics = get $comics_home or die "Can't get $comics_home.";

So now $comics contains a ton of raw Hypertext Markup Language (HTML), stuff we normally let our browser interpret and display for us. If we actually did browse that URL, one of the things we'd see is the actual comic selector widget, which appears to be an Optionmenu (see Figure 22-1).

Figure 22-1. Comic selector widget

It's not a Tk Optionmenu, of course, but it's what the browser renders when it encounters a <SELECT> tag. Looking at the HTML, either by printing $comics or viewing the page source in the browser, we see this:

<SELECT NAME= ... >
    <OPTION>Click to select a comic
    <OPTION VALUE="/universal/adam/ab.html">Adam
    <OPTION VALUE="/comics/alleyoop/ab.html">Alley Oop
    <OPTION VALUE="/comics/andycapp/ab.html">Andy Capp
    <OPTION VALUE="/comics/arlonjanis/ab.html">Arlo & Janis
    <OPTION VALUE="/comics/askshagg/ab.html">Ask Shagg
    <OPTION VALUE="/comics/bc/ab.html">B.C.

...

    <OPTION VALUE="/comics/wizardofid/ab.html">Wizard of Id
    <OPTION VALUE="/universal/ziggy/ab.html">Ziggy
</SELECT>

That looks promising. We seem to have a list of URLs, one per comic. If we're lucky, we should be able to key on the string "OPTION VALUE", then use the following code to pick out the comic names and their associated URLs:

my $n = 0;
foreach (split /\n/, $comics) {
    next unless /OPTION\s+VALUE/i;
    if (my($comic_url, $comic) = m\"([^"]+)">(.*)\) {
        $comic =~ s/\t//g;
        $comic =~ s/\r//g;
        printf "%-30s : %s\n", $comic, $comic_url;
        $n++;
    }
}
print "\nfound $n comics\n";

Indeed, luck is with us, for if we run the code, we see this output:

Adam                           : /universal/adam/ab.html
Alley Oop                      : /comics/alleyoop/ab.html
Andy Capp                      : /comics/andycapp/ab.html
Arlo & Janis                   : /comics/arlonjanis/ab.html
Ask Shagg                      : /comics/askshagg/ab.html
B.C.                           : /comics/bc/ab.html

...

Wizard of Id                   : /comics/wizardofid/ab.html
Ziggy                          : /universal/ziggy/ab.html

found 91 comics

As it happens, these URLs are not the comic images, but another page, relative to the site's home address, within which the actual URL of the comic is embedded. For instance, if we concatenate Ask Shagg's relative URL with $comics_home and view the HTML at http://www.comics.com/comics/askshagg/ab.html, we see an <IMG> tag with the relative URL of the actual GIF image:

<IMG
SRC="/comics/askshagg/archive/images/askshagg21461240000515.gif" ALT="today's 
comics strip" ALIGN=TOP BORDER="0">

tkcomics can easily extract this URL, fetch the image file, convert it to a Photo, and display it. So, we have proof-of-concept code. On the upside, it's extremely simple, but on the downside, it's blocking, and there's no timeout mechanism. Let's try to address these concerns.

22.1.2. Fetching Web Content with LWP::UserAgent

Although a tad more complicated than LWP::Simple, most people use LWP::UserAgent for their web clients, because it gives them complete control over their applications. Unlike LWP::Simple, we create the user agent manually and exert control by sending messages to it. All this means is that we create a user agent object and invoke methods on it. For instance, this is how we set a timeout so network requests won't take longer than 20 seconds:

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(20);

Nevertheless, at this (the Perl) level, a timeout still doesn't buy us anything, because no matter how short the timeout is, Tk is blocked until the LWP request is complete.

Even this trick of configuring a callback to read small chunks of network data is unlikely to be sufficient; blocking for 1 byte of data still prevents Tk from processing events. The following code defines a callback that appends data to a string. As network data arrives, the user agent invokes the callback on every byte. When the request completes (or times out), we use the response object to check the status of the request and die if there's an error. In addition to not solving our problem, processing data a byte at a time is incredibly inefficient.

my $comics_home = 'http://www.comics.com';
my $comics;

sub req_callback {
    my($content, $response, $protocol) = @_;
    $comics .= $content;
}

my $req_object = HTTP::Request->new('GET' => $comics_home);
my $res_object = $ua->request($req_object, \&req_callback, 1);
die "request failed" if $res_object->is_error;

Assuming the get request succeeds, $comics contains the same HTML page content as the LWP::Simple example, which can be processed as before.

While the LWP::UserAgent solution is slightly cooler than the LWP::Simple one, it can still block and ruin our comic-viewing experience. What we need is a nonblocking LWP request, and one sure-fire way to implement that is to use the fork/pipe/exec idiom we developed in Chapter 19, "Interprocess Communication with Pipes and Sockets". In that chapter, we saw that piping information between two asynchronous processes—one of which was a Tk program—and using fileevent prevented Tk from blocking. We can use the same technique here, and let the child fetch web content and pipe it back to tkcomics whenever the data becomes available.

22.1.3. lwp-request and fileevent Rule

To keep Tk events flowing, we need to use a separate process (or thread) and ensure that the two processes can talk in a nonblocking way. The first process, the Tk parent, handles the GUI and event processing, while the child fetches comic images and forwards them to the parent.

The IPC solution developed in Chapter 19, "Interprocess Communication with Pipes and Sockets" was unusually specialized, because it used bidirectional pipes for a two-way conversation between the Tk parent and the child. In contrast, if the tkcomics child can get its URL from the command line and send comics to standard output, we can use the pipe form of the open command and let Perl do the hard work.

Included in the LWP package is lwp-request, a standalone program perfectly matched for our needs. It accepts a URL on the command line and prints the resulting web content on its STDOUT. So that's our plan, and with Perl 5.6.0, it might work on Win32 too.

We start with needed declarations. All the comic images are binary data, but to use them in Tk we must first Base64 encode them; that's why we need MIME::Base64. Additionally, although most of the comics are GIFs, a few are in JPEG or PNG format. Note that Tk::JPEG and Tk::PNG are not bundled with Perl/Tk, so you'll have to install them yourself:

#!/usr/local/bin/perl -w
#
# tkcomics - display comics courtesy of http://www.comics.com
#
# Because LWP::Simple and LWP::UserAgent can block, do the
# network I/O in another thread, er, child, with fileevent( ).
# Add a Stop/Cancel button that kills the pipe.

use MIME::Base64;
use Tk;
use Tk::JPEG;
use Tk::PNG;
use subs qw/get_url show_comic status stop_get/;
use strict;

Here we open the MainWindow in the standard fashion, then initialize a few global variables. $photo holds the comic image object reference. $status displays periodic status messages that keep the user informed of the state of data transfers. $eof is set either when the lwp-request child completes, signaling the web fetch is complete, or by a user button click that prematurely interrupts a transfer. $pid is the process ID of the child, used to kill it if we tire of waiting for the network. The %ext hash maps a comic filename extension to a Tk Photo image processor.

my $mw = MainWindow->new;
my $photo = '';
my $status = '';
my($eof, $pid);
my %ext = qw/
    gif gif
    jpg jpeg
    png png
    tif tiff
    xbm xbm
    ppm ppm
/;                             # file extension => Photo format map
my $help = '<Button-1> fetch comic, <Button-2> interrupt transfer';

As Figure 22-2 shows, the entire application consists of just three Tk widgets: a Listbox that holds the names of the comics, one Label that displays the comic images (except initially, when it displays the string "tkcomics" in large italic letters), and a second Label that displays relevant status information:

my $s = $mw->Label(-textvariable => \$status, -width => 100);
my $lb = $mw->Scrolled(qw/Listbox -selectforeground blue/);
my $l = $mw->Label(
    -text       => 'tkcomics',
    -foreground => 'blue',
    -font       => [qw/-size 100 -slant italic/],
);
$s->pack(qw/-side bottom -fill x -expand 1/);
$lb->pack(qw/side left -fill y -expand 1 -anchor w/);
$l->pack(-side => 'right');

The following code is essentially our old proof-of-concept example, except instead of printing comic names and URLs, we build a hash of comic URLs indexed by comic name. It's important to note that we've eliminated LWP::Simple, using our own get_url subroutine in place of LWP::Simple::get.

my $comics_home = 'http://www.comics.com';
my $comics = get_url $comics_home or die "Can't get $comics_home.";

my(%comics, $comic_url, $comic);
foreach (split /\n/, $comics) {
    next unless /OPTION\s+VALUE/i;
    if (($comic_url, $comic) = m\"([^"]+)">(.*)\) {
        $comic =~ s/\t//g;
        $comic =~ s/\r//g;
        $comics{$comic} = $comic_url;
    }
}

At this point, we display our first status message, telling us how many comics were found and how to use tkcomics. We'll stuff the names of all available comics in a List widget and use a <Button-1> event to select one for viewing. <Button-2>, obviously, is used to interrupt a long (or hung) transfer. Subroutine status simply sets the status Label's -textvariable, then invokes idletasks to flush Tk's low-priority idle events queue. This updates the entire Tk display so the status message appears immediately.

my $help = scalar(keys %comics) .
    ' comics, <Button-1> fetch comic, <Button-2> interrupt transfer';
status $help;

sub status {
    $status = $_[0];
    $mw->idletasks;
}

Finally, populate the Listbox with the comic names, sorted alphabetically; establish two button bindings that give life to tkcomics; and enter the main Tk event loop:

foreach (sort keys %comics) {
    $lb->insert('end', $_);
}

$lb->bind('<ButtonRelease-1>' => \&show_comic);
$mw->bind('<ButtonRelease-2>' => \&stop_get);

MainLoop;

Figure 22-2. Initial tkcomics display

The heart of tkcomics is the subroutine get_url, shown in the following code. Look it over before we explain it. Note that there's an implicit tradeoff between efficiency and simplicity. For an "important" program, it would be better to fork a persistent child once and establish a dialog between it and the Tk parent. In this case, however, it's considerably easier just to use a pipe open and run lwp-request for every URL:

sub get_url {

    my($url) = @_;
    
    status "Fetching $url";
    $pid = open PIPE, "lwp-request -m GET -t 20 -e $url 2>&1 |" or
        die "Open error: $!";
    binmode PIPE if $^O eq 'MSWin32';

    my $content;
    $mw->fileevent(\*PIPE, 'readable' => sub {
        my($stat, $data);
        while ($stat = sysread PIPE, $data, 4096) {
            $content .= $data;
        }
        die "sysread error:  $!" unless defined $stat;
        $eof = 1 if $stat == 0;
    });
    $mw->waitVariable(\$eof);
    $mw->fileevent(\*PIPE, 'readable' => '');
    close PIPE;
    
    $pid = undef;
    
    (my $response, $content) = $content =~ /(.*?)\n\n(.*)/is if $content;

    return wantarray ? ($response, $content) : $content;
                   
} # end get_url

Subroutine get_url is passed a single argument, the URL to fetch, which is immediately posted in the status Label. The open statement does the requisite pipe, fork, and exec-ing of lwp-request for us, so all we (the TK parent) need do is establish a fileevent handler to read web content. The lwp-request option -t effects a 20-second timeout, and the -e option says to return the response headers along with the web content. The response headers are returned first, separated from the content by an empty line.

If you're running a Win32 operating system, the binmode statement is very important, because the comic images are binary data. On Unix, binmode isn't required, but it does no harm, either.

Now we set up the fileevent readable callback on the lwp-request output pipe. The callback simply appends up to 4K of web data to the $content variable and nominally ends at end-of-file. Meanwhile, Tk event processing continues because the Tk parent is spinning on the waitVariable statement, waiting for the $eof variable to change. $eof changes in one of two ways, either when the fileevent callback detects end-of-file, or, as we shall see, when the user clicks <Button-2>.

Once the waitVariable is satisfied, we cancel the fileevent readable callback, close the pipe handle, and undef $pid. Notice that get_url uses wantarray to determine if it was called in scalar or list context. In list context, we assume the caller wants two strings, the response headers and the actual URL content, or else just the content. We'll see how the response headers are used shortly.

To stop a web GET, we click <Button-2>, which invokes the stop_get subroutine. We then set $eo f so the fileevent readable callback terminates, and, if $pid is defined (i.e., the lwp-request child is still running) we kill it.

sub stop_get {
    status "Stopping transfer ...";
    $mw->after(5000, sub {status $help});
    $eof = -1;
    kill 'TERM', $pid if defined $pid;
}

22.1.4. The Recipe for Displaying Web Images

Let's take a peek at what the tkcomics application looks like rendering a comic. Figure 22-3 depicts the program displaying a GIF file. To see how to render it, read on. Basically, using the active Listbox element, find the comic URL and fetch its contents. Within the page is an <IMG> tag with another URL pointing to the actual image, which we then fetch, convert to a Photo, and display. Periodically, we examine $eof to see if any transfer was aborted.

Figure 22-3. Ask Shagg Camel Facts

As with any binding callback of the form:

$lb->bind('<ButtonRelease-1>' => \&show_comic);

Perl/Tk implicitly passes the bound object reference (the Listbox) as the callback's first argument, which we assign to the variable $lb.

sub show_comic {

    my($lb) = @_;

Since we got here on a <ButtonRelease-1> event, we're guaranteed that the active Listbox entry is our target comic. We get it and index into the %comics hash to get the URL of the page containing the comic image URL. We return immediately if the transfer was interrupted.

    my $comic = $lb->get('active');
    my $comic_url = $comics{$comic};
    my $comic_html = get_url 
        $comic_url =~ /^http:/i ? $comic_url : "$comics_home$comic_url";
    return if $eof == -1;

Now extract the image URL from the mass of HTML sitting in $comic_html. Unfortunately, web content changes over time, which is why we use multiple pattern matches. Once we actually find a URL, call get_url in list context to get both the response header and the binary comic image. If the transfer wasn't interrupted, we can assume all the returned data is correct. And given a valid data transfer, we can extract the image size (in bytes) from the response header's Content-Length attribute. As an added touch, we'll adorn the comic's Listbox entry with this information, just to show we've already viewed the comic (see Figure 22-3).

    my $image_url;

    if ( ($image_url) = $comic_html =~ m\.*<IMG SRC="([^"]+)".*? ALT="(today|daily)\is ) {
        print "1: $image_url!\n";
    } elsif ( ($image_url) = $comic_html =~ m\.*bgcolor="#FFFFCC" ALIGN="MIDDLE" COLSPAN="3"><BR><IMG SRC="([^"]+)"\is ) {
        print "2: $image_url!\n";
    } else  {
        status "Cannot find a comic image in $comic_url.";
        print STDERR $comic_html;
        return;
    }

    my ($response, $image) = get_url "$comics_home$image_url";
    return if $eof == -1;
    my($bytes) = $response =~ /Content-Length: (\d+)/is;

Perl/Tk images are generated from either an external file or embedded data. You may recall from Chapter 17, "Images and Animations" that because in Tcl "everything is a string,"[63] embedded image data must be in printable characters, which is why we first Base64 encode the image data. Now we do a little bookkeeping on the variable $photo; the second and subsequent invocations of this callback delete any previous image to stem a possible memory leak. Then we create an image of the appropriate format and configure the image Label to display it. Finally, we append the comic's size, in bytes, to its Listbox entry and update the status help message. The update method is carefully placed so Tk adjusts the application's geometry based on the new image's dimensions, ensuring that see positions the Listbox properly. selectionSet rehighlights the current comic name.

[63] Well, more accurately, everything was a string; recent Tks can handle binary objects.

    my $photo_data = encode_base64($image);
    $photo->delete if UNIVERSAL::isa($photo => 'Tk::Photo');
    my($ext) = $image_url =~ /\.(.*)?/;
    $ext ||= 'gif';
    status "Creating $bytes byte $ext Photo";
    $photo = $mw->Photo(-data => $photo_data, -format => $ext{lc $ext});
    $l->configure(-image => $photo);

    my $index = $lb->index('active');
    $comic .= " ($bytes)";
    $lb->delete($index);
    $lb->insert($index, $comic);
    $lb->update;
    $lb->see($index);
    $lb->selectionSet($index);

    status $help;

} # end show_comic

22.1.5. Win32 Considerations

Under Win32, the code is sound and should work with Perl 5.6.0 and a recent Tk, such as Tk 800.022. But, alas, it doesn't. It hangs on the fileevent, which is never triggered. So, what can we do?

Threads are out, as Tk is not thread-safe. TCP/IP comes to mind and, since sockets are available on both Unix and Win32, this solution retains compatibility, if not simplicity. So let's rewrite get_url to fork a child that uses local Win32 sockets.

22.1.5.1. fork and local Win32 sockets

The parent begins by opening a listen socket on a well-known (but arbitrary) port on the localhost;[64] it then forks a second process to run lwp-request:

[64] Similar to the ipadm daemon described in Chapter 19, "Interprocess Communication with Pipes and Sockets".

sub get_url {

    use IO::Socket;
    use POSIX;

    my $url = shift;
    my $port = 9917;
    my($pid, $handle);

    my $server = IO::Socket::INET->new(
        LocalHost => 'localhost',
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => SOMAXCONN,
        Reuse => 1,
    );
    die "Parent socket open failure: $!" unless defined $server;

    die "fork failure: $!" unless defined($pid = fork);

The Tk parent then waits on the accept call until it receives a connect, puts the socket in binary mode, unbuffers it, and then uses fileevent to read all the incoming HTML, just like in the pipe-open version of get_url.

    if ($pid) {         # parent
        $handle = $server->accept;
        binmode $handle;
        $handle->autoflush(1);

        $eof = 0;
        my $content;
        $mw->fileevent($handle, 'readable' => sub {
            my($stat, $data);
            while ($stat = sysread $handle, $data, 4096) {
                $content .= $data;
            }
            die "sysread error:  $!" unless defined $stat;
            $eof = 1 if $stat == 0;
        });
        $mw->waitVariable(\$eof);
        $mw->fileevent($handle, 'readable' => '');
        close $handle;
        $pid = undef;
        return $content;

Meanwhile, the child creates its unbuffered binary socket and outputs the HTML generated by a backticked lwp-request:

    } else {            # child
        $handle = IO::Socket::INET->new(
            PeerAddr => 'localhost',
            PeerPort => $port,
            Proto    => 'tcp',
        );
        die "Child socket open failure: $!" unless defined $handle;
        binmode $handle;
        $handle->autoflush(1);
        print $handle `lwp-request -m GET -t 20s -e $url`;
        close $handle;
        POSIX::_exit(0);

    } # ifend fork

} # end get_url

This code works perfectly well on Unix but, once again, hangs on the fileevent under Win32. Perhaps we're pushing Win32's new fork capability too far by using backticks within a forked process. Well, we can test that theory by replacing:

print $handle `lwp-request -m GET -t 20s -e $url`;

with equivalent LWP code. The new code needs a 20-second timeout and must return the response headers in addition to the web content. Here's how we do it.

22.1.5.2. fork and LWP::UserAgent

We begin as before, by creating a new user agent and sending it a timeout message, building a request object, passing it to the user agent, and checking for errors. Now we return the response headers in the same format as lwp-request's -e option, an empty line, and the web content:

use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(20);
my $req_object = HTTP::Request->new('GET' => $url);
my $res_object = $ua->request($req_object);
die "request failed" if $res_object->is_error;
foreach my $response (keys %{$res_object->headers}) {
    print $handle "$response: ", $res_object->headers->{$response}, "\n";
}
print $handle "\n";
print $handle $res_object->content;

Once again, this code runs perfectly on Unix but fails on Win32, so we need to take a completely different tack. I suppose we could arrange for the child to write a file and signal the Tk parent by some means, perhaps a semaphore, but that lacks style. We can't use shared memory, since the Unix shmctl/shmget/shmread/shmwrite shared memory functions aren't available in Win32, right? True, but Win32 has its own shared memory capabilities, so let's investigate further.

22.1.5.3. Win32::Process and shared memory

A chance email with Grant Hopwell clued me in to his Tie::Win32MemMap module. With it, Grant would spin off Win32 children using Win32::Process::Create, and the two processes would talk using a shared Perl hash! This seemed an interesting possibility: create a shared hash with two keys, CONTENT and COMPLETE, and have the child run lwp-request and store web content in $shared_hash{CONTENT}, while the Perl/Tk parent watches (using waitVariable) for a change in $shared_hash{COMPLETE}, set by the child when ready.

Tie::Win32MemMap requires Storable, available from CPAN, and Win32::MemMap, written by Amine Moulay Ramdane, available at http://www.generation.net/~aminer/Perl.

As it happens, we can indeed write a drop-in replacement for the subroutine get_url, specifically for Win32, and keep the rest of tkcomics intact. Let's look at get_url for Win32 now.

sub get_url {

    my($url) = @_;
    
    status "Fetching $url";

Here we create and initialize a new chunk of shared memory and tie it to the hash %content. The shared memory is tagged with the name 'tkcomics', which any Win32 process can access if it uses the proper MapName.

    use Win32::Process;
    use Tie::Win32MemMap;

    my %content;
    tie %content, 'Tie::Win32MemMap', {
        Create  => MEM_NEW_SHARE,
        MapName => 'tkcomics',
    };
    $content{'COMPLETE'} = 0;
    $content{'CONTENT'}  = '';

Now fire up the child process, another Perl program stored in the file tkcwin32.kid, whose job is to connect to the shared memory 'tkcomics', fill the CONTENT key with web data from the URL passed on the command line, and set the COMPLETE key when it's finished.

    Win32::Process::Create(
        my $child,
        'c:\\perl\\bin\\perl.exe',
        "perl tkcwin32.kid $url",
        0,
        NORMAL_PRIORITY_CLASS,
       '.',
    ) or die Win32::FormatMessage(Win32::GetLastError);

    $eof = 0;
    $mw->update;

Here we wait for the signal from the child that it has completed. Normally we would use a waitVariable(\$content{'COMPLETE'}) statement, but there is competing magic between the Tie module and Tk, so we have to synthesize our own using this loop:

    while ( $content{'COMPLETE'} != 1 ) {
        last if $eof == -1;
        $mw->after(10);
        $mw->update;
    }

Once the child completes, we separate the response headers from the actual content and return the particular data required by our caller, just like the Unix version.

    my $content = $content{'CONTENT'};
    (my $response, $content) = $content =~ /(.*?)\n\n(.*)/is if $content;
    return wantarray ? ($response, $content) : $content;
                   
} # end get_url

For our purposes, the child, tkcwin32.kid, must reside in the current working directory, because we haven't qualified the pathname in the Win32::Process::Create call. It's certainly trivial to embed the child in tkcomics proper and create it during initialization; we'll do just that in short order. Until then, this is the Win32 child program.

Because we're not in the context of a subroutine, the naked shift statement uses as its argument @ARGV (the command line) rather that @_, thus providing the child the URL specified by the parent:

#!/usr/local/bin/perl -w
#
# Win32 tkcomics helper program that shovels web content to the Tk parent.

use Tie::Win32MemMap;

my $url = shift;

By this point in real time, the Tk parent has already created and tied the shared memory to its hash, so all the child need do is tie to the same MapName in "share" mode.

my %content;
tie %content, 'Tie::Win32MemMap', {
    Create  => MEM_VIEW_SHARE,
    MapName => 'tkcomics',
};

Once again, with Perl 5.6.0 and higher, the child is free to do a pipe open and run lwp-request in the same manner as the Unix code. Do not forget the binmode statement!

open(PIPE, "lwp-request -m GET -t 20s -e $url|") or die "open failure: $!";
binmode PIPE;

Once again, read 4K chunks of web content and build up the scalar $content{'CONTENT'}. When end-of-file is reached, close the pipe and set the complete marker, signaling the Tk parent to proceed.

my($stat, $data);
while ($stat = sysread PIPE, $data, 4096) {
    $content{'CONTENT'} .= $data;
}
die "sysread error:  $!" unless defined $stat;
close PIPE;

$content{'COMPLETE'} = 1;
exit(0);

If you don't believe all this actually works, gaze upon Figure 22-4 and witness tkcomics for Win32!

Figure 22-4. tkcomics works in Win32 too

22.1.6. Tidying an Ugly Mess

So now we have at least three distinct ways of fetching web content without blocking. If this has to be the state of the world, then so be it, but we can at least encapsulate the pipe-open, TCP/IP socket, and the Win32 memmap code directly into tkcomics and conditionally use the variant appropriate for the operating system at hand. We'll do this by testing $^O and eval-ing the correct code.

First, let's change get_url, effectively factoring out the essence of the subroutine, leaving _get_url (defined at compile time) to do the operating system-specific work.

sub get_url {

    my($url) = @_;
    
    status "Fetching $url";
    my $content = &_get_url($url);
    (my $response, $content) = $content =~ /(.*?)\n\n(.*)/is if $content;
    return wantarray ? ($response, $content) : $content;
                   
}

Now, in a BEGIN block, store our three definitions for _get_url in three separate variables, then eval just the one desired for this invocation of tkcomics. Note that we can override the default selection from the command line and that we have to relax our coding strictness in order to eval the symbolic reference.

BEGIN {

    # Different mechanisms to get web content without blocking.

    use vars qw/
        $pipe_open_fileevent
        $tcp_socket_fileevent
        $win32_memmap_waitvariable
    /;

    $pipe_open_fileevent = <<'END';
    # Pipe/open/fileevent version of _get_url( ) here ...
END

    $tcp_socket_fileevent = <<'END';
    # TCP/IP socket/fileevent version of _get_url( ) here ...
END

    $win32_memmap_waitvariable = <<'END';
    # Win32 memmap/waitVariable version of _get_url( ) here ...
END

    my $get_url;
    if (defined $ARGV[0]) {
        $get_url = $ARGV[0];
    } else {
        if ($^O eq 'MSWin32') {
            $get_url = 'win32_memmap_waitvariable';
        } else {
            $get_url = 'pipe_open_fileevent';
        }
    }

    {
        no strict 'refs';
        print "Using $get_url method ...\n";
        eval $$get_url;
        die "_get_url eval error: $@" if $@;
    }

} # end BEGIN

Finally, here's tkcomics in its entirety. It fetches web content in a nonblocking mode so that Tk events flow and configures itself according to the operating system on which it's running. Enjoy.

#!/usr/local/bin/perl -w
#
# tkcomics - display comics courtesy of http://www.comics.com
#
# Because LWP::Simple and LWP::UserAgent can block, do the network
# I/O in another thread, or, child, and using fileevent( ) or
# waitVariable( ) to keep events flowing.
#
# Add a Stop/Cancel button that kills the pipe.
#
# Command line options:
#
#   pipe_open_fileevent
#   tcp_socket_fileevent
#   win32_memmap_waitvariable


use MIME::Base64;
use Tk;
use Tk::JPEG;
use Tk::PNG;
use subs qw/get_url show_comic status stop_get unix_pipe win32_memmap/;
use strict;

my $mw = MainWindow->new;
my $photo = '';
my $status = '';
my($eof, $pid);
my %ext = qw/
    gif gif
    jpg jpeg
    png png
    tif tiff
    xbm xbm
    ppm ppm
/;                # file extension => Photo format map

my $s = $mw->Label(-textvariable => \$status, -width => 100);
my $lb = $mw->Scrolled(qw/Listbox -selectforeground blue/);
my $l = $mw->Label(
    -text       => 'tkcomics',
    -foreground => 'blue',
    -font       => [qw/-size 100 -slant italic/],
);
$s->pack(qw/-side bottom -fill x -expand 1/);
$lb->pack(qw/side left -fill y -expand 1 -anchor w/);
$l->pack(-side => 'right');

# Fetch the main comics page, build a hash of comic URLs
# indexed by comic name,  note the total comic count, and
# populate a Listbox.  Listbox B1 fetches and displays a
# comic. B2 anywhere cancels a transfer.

my $comics_home = 'http://www.comics.com';
my $comics = get_url $comics_home or die "Can't get $comics_home.";

my(%comics, $comic_url, $comic);
foreach (split /\n/, $comics) {
    next unless /OPTION\s+VALUE/i;
    if (($comic_url, $comic) = m\"([^"]+)">(.*)\) {
        $comic =~ s/\t//g;
        $comic =~ s/\r//g;
        $comics{$comic} = $comic_url;
    }
}
my $help = scalar(keys %comics) .
    ' comics, <Button-1> fetch comic, <Button-2> interrupt transfer';
status $help;

foreach (sort keys %comics) {
    $lb->insert('end', $_);
}

$lb->bind('<ButtonRelease-1>' => \&show_comic);
$mw->bind('<ButtonRelease-2>' => \&stop_get);

MainLoop;

sub get_url {

    # Given a URL, return its contents. The exact nonblocking
    # mechanism is adjustable, and either defaults to the best
    # method for the operating system at hand, or is specified
    # on the command line.

    my($url) = @_;
    
    status "Fetching $url";
    my $content = &_get_url($url);
    (my $response, $content) = $content =~ /(.*?)\n\n(.*)/is if $content;
    return wantarray ? ($response, $content) : $content;
                   
} # end get_url

sub show_comic {

    # Using the active listbox element, find the comic URL and fetch
    # its contents.  Within the content is another URL pointing to
    # the actual image, which we then fetch, convert to a Photo and
    # then display.  $eof is -1 if any transfer was aborted.

    my($lb) = @_;

    my $comic = $lb->get('active');
    $comic =~ s/\s\(\d+\)//;
    my $comic_url = $comics{$comic};
    my $comic_html = get_url 
        $comic_url =~ /^http:/i ? $comic_url : "$comics_home$comic_url";
    return if $eof == -1;

    my($image_url);

    if ( ($image_url) = $comic_html =~ m\.*<IMG SRC="([^"]+)".*? ALT="(today|daily)\is ) {
        print "1: $image_url!\n";
    } elsif ( ($image_url) = $comic_html =~ m\.*bgcolor="#FFFFCC" ALIGN="MIDDLE" COLSPAN="3"><BR><IMG SRC="([^"]+)"\is ) {
        print "2: $image_url!\n";
    } else  {
        status "Cannot find a comic image in $comic_url.";
        print STDERR $comic_html;
        return;
    }

    my ($response, $image) = get_url "$comics_home$image_url";
    return if $eof == -1;
    my($bytes) = $response =~ /Content-Length: (\d+)/is;

    my $photo_data = encode_base64($image);
    $photo->delete if UNIVERSAL::isa($photo => 'Tk::Photo');
    my($ext) = $image_url =~ /\.(.*)?/;
    $ext ||= 'gif';
    status "Creating $bytes byte $ext Photo";
    $photo = $mw->Photo(-data => $photo_data, -format => $ext{lc $ext});
    $l->configure(-image => $photo);

    my $index = $lb->index('active');
    $comic .= " ($bytes)";
    $lb->delete($index);
    $lb->insert($index, $comic);
    $lb->update;
    $lb->see($index);
    $lb->selectionSet($index);

    status $help;

} # end show_comic

sub status {
    $status = $_[0];
    $mw->idletasks;
}

sub stop_get {
    status "Stopping transfer ...";
    $mw->after(5000, sub {status $help});
    $eof = -1;
    kill 'TERM', $pid if defined $pid;;
}

BEGIN {

    # Different mechanisms to get web content without blocking.

    use vars qw/
        $pipe_open_fileevent
        $tcp_socket_fileevent
        $win32_memmap_waitvariable
    /;

    $pipe_open_fileevent = <<'END';

sub _get_url {

    my $url = shift;
    my $pid = open PIPE, "lwp-request -m GET -t 20s -e $url 2>&1 |" or
        die "Open error: $!";
    $eof = 0;

    my $content;
    $mw->fileevent(\*PIPE, 'readable' => sub {
        my($stat, $data);
        while ($stat = sysread PIPE, $data, 4096) {
            $content .= $data;
        }
        die "sysread error:  $!" unless defined $stat;
        $eof = 1 if $stat == 0;
    });
    $mw->waitVariable(\$eof);
    $mw->fileevent(\*PIPE, 'readable' => '');
    close PIPE;
    $pid = undef;
    return $content;

} # end pipe_open_fileevent

END

    $tcp_socket_fileevent = <<'END';

sub _get_url {

    # The parent opens a listen socket on a well known port on the
    # localhost, and then starts a second process to run lwp-request.
    # When the parent receives a connect it reads all the HTML sent
    # by the child.

    use IO::Socket;
    use POSIX;

    my $url = shift;
    my $port = 9917;
    my($pid, $handle);

    my $server = IO::Socket::INET->new(
        LocalHost => 'localhost',
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => SOMAXCONN,
        Reuse => 1,
    );
    die "Parent socket open failure: $!" unless defined $server;

    die "fork failure: $!" unless defined($pid = fork);
    if ($pid) {        # parent
        $handle = $server->accept;
        binmode $handle;
        $handle->autoflush(1);

        $eof = 0;
        my $content;
        $mw->fileevent($handle, 'readable' => sub {
            my($stat, $data);
            while ($stat = sysread $handle, $data, 4096) {
                $content .= $data;
            }
            die "sysread error:  $!" unless defined $stat;
            $eof = 1 if $stat == 0;
        });
        $mw->waitVariable(\$eof);
        $mw->fileevent($handle, 'readable' => '');
        close $handle;
        $pid = undef;
        return $content;

    } else {        # child
        $handle = IO::Socket::INET->new(
            PeerAddr => 'localhost',
            PeerPort => $port,
            Proto    => 'tcp',
        );
        die "Child socket open failure: $!" unless defined $handle;
        binmode $handle;
        $handle->autoflush(1);

        use LWP::UserAgent;
        my $ua = LWP::UserAgent->new;
        $ua->timeout(20);
        my $req_object = HTTP::Request->new('GET' => $url);
        my $res_object = $ua->request($req_object);
        die "request failed" if $res_object->is_error;
        foreach my $response (keys %{$res_object->headers}) {
            print $handle "$response: ", $res_object->headers->{$response}, "\n";
        }
        print $handle "\n";
        print $handle $res_object->content;

        close $handle;
        POSIX::_exit(0);

    } # ifend fork

} # end tcp_socket_fileevent

END

    $win32_memmap_waitvariable = <<'END';

use vars qw/$kidfile/;    
$kidfile = 'tkcwin32.kid';

sub _get_url {

    # The parent creates and initializes a new chunk of shared
    # memory, then starts a child process that shares the same
    # memory.  The parent waits for the child to run lwp-request
    # and save the web content by (essentially) doing a
    # waitvariable( ) on one particular hash element.

    use Win32::Process;
    use Tie::Win32MemMap;

    my $url = shift;

    my %content;
    tie %content, 'Tie::Win32MemMap', {
        Create  => MEM_NEW_SHARE,
        MapName => 'tkcomics',
    };
    $content{'COMPLETE'} = 0;
    $content{'CONTENT'}  = '';

    Win32::Process::Create(
        my $child,
        'c:\\perl\\bin\\perl.exe',
        "perl $kidfile $url",
        0,
        NORMAL_PRIORITY_CLASS,
       '.',
    ) or die Win32::FormatMessage(Win32::GetLastError);

    $eof = 0;
    $mw->update;

    while ( $content{'COMPLETE'} != 1 ) {
        last if $eof == -1;
        $mw->update;
    }
    return $content{'CONTENT'};

} # end win32_memmap

open(KID, ">$kidfile") or die "cannot open file $kidfile: $!";
print KID <<'END-OF-KID';
#!/usr/local/bin/perl -w
#
# Win32 tkcomics helper program that shovels web content to the Tk parent.

use Tie::Win32MemMap;

my $url = shift;

my %content;
tie %content, 'Tie::Win32MemMap', {
    Create  => MEM_VIEW_SHARE,
    MapName => 'tkcomics',
};

open(PIPE, "lwp-request -t 20s -e $url|") or die "open failure: $!";
binmode PIPE;

my($stat, $data);
while ($stat = sysread PIPE, $data, 4096) {
    $content{'CONTENT'} .= $data;
}
die "sysread error:  $!" unless defined $stat;
close PIPE;

$content{'COMPLETE'} = 1;
exit(0);
END-OF-KID
close KID;

END


    my $get_url;
    if (defined $ARGV[0]) {
        $get_url = $ARGV[0];
    } else {
        if ($^O eq 'MSWin32') {
            $get_url = 'win32_memmap_waitvariable';
        } else {
            $get_url = 'pipe_open_fileevent';
        }
    }

    {
        no strict 'refs';
        print "Using $get_url method ...\n";
        eval $$get_url;
        die "_get_url eval error: $@" if $@;
    }

} # end BEGIN

Chapter 22. Perl/Tk and the Web

Contents: