Perl Tutorial


Introduction

To fully understand directories, you need to be acquainted with the underlying mechanics. The following explanation is slanted toward the Unix filesystem, for whose syscalls and behavior Perl's directory access routines were designed, but it is applicable to some degree to most other platforms.

A filesystem consists of two parts: a set of data blocks where the contents of files and directories are kept, and an index to those blocks. Each entity in the filesystem has an entry in the index, be it a plain file, a directory, a link, or a special file like those in /dev. Each entry in the index is called an inode (short for index node). Since the index is a flat index, inodes are addressed by number.

A directory is a specially formatted file, whose inode entry marks it as a directory. A directory's data blocks contain a set of pairs. Each pair consists of the name of something in that directory and the inode number of that thing. The data blocks for /usr/bin might contain:

Name    Inode
bc      17
du      29
nvi     8
pine    55
vi      8

Every directory is like this, even the root directory (/). To read the file /usr/bin/vi, the operating system reads the inode for /, reads its data blocks to find the entry for /usr, reads /usr's inode, reads its data block to find /usr/bin, reads /usr/bin's inode, reads its data block to find /usr/bin/vi, reads /usr/bin/vi's inode, and then reads the data from its data block.

The name in a directory entry isn't fully qualified. The file /usr/bin/vi has an entry with the name vi in the /usr/bin directory. If you open the directory /usr/bin and read entries one by one, you get filenames like patch, rlogin, and vi instead of fully qualified names like /usr/bin/patch, /usr/bin/rlogin, and /usr/bin/vi.

The inode has more than a pointer to the data blocks. Each inode also contains the type of thing it represents (directory, plain file, etc.), the size of the thing, a set of permissions bits, owner and group information, the time the thing was last modified, the number of directory entries that point to this inode, and so on.

Some operations on files change the contents of the file's data blocks; others change just the inode. For instance, appending to or truncating a file updates its inode by changing the size field. Other operations change the directory entry that points to the file's inode. Changing a file's name changes only the directory entry; it updates neither the file's data nor its inode.

Three fields in the inode structure contain the last access, change, and modification times: atime, ctime, and mtime. The atime field is updated each time the pointer to the file's data blocks is followed and the file's data is read. The mtime field is updated each time the file's data changes. The ctime field is updated each time the file's inode changes. The ctime is not creation time; there is no way under standard Unix to find a file's creation time.

Reading a file changes its atime only. On a traditional Unix filesystem, changing a file's name doesn't change its atime, ctime, or mtime, because only the directory entry changed (it does change the mtime and ctime of the directory the file is in, though). Truncating a file doesn't change its atime (because we haven't read anything; we've just changed the size field in its inode), but it does change its ctime because we changed its size field and its mtime because we changed its contents (even though we didn't follow the pointer to do so).

We can access the inode of a file or directory by calling the built-in function stat on its name. For instance, to get the inode for /usr/bin/vi, say:

@entry = stat("/usr/bin/vi") or die "Couldn't stat /usr/bin/vi : $!";

To get the inode for the directory /usr/bin, say:

@entry = stat("/usr/bin")    or die "Couldn't stat /usr/bin : $!";

You can stat filehandles, too:

@entry = stat(INFILE)        or die "Couldn't stat INFILE : $!";

The stat function returns a list of the values of the fields in the inode. If it couldn't get this information (for instance, if the file doesn't exist), it returns an empty list. It's this empty list we test for using the or die construct. Be careful about using || die instead: || binds more tightly than or and puts stat into scalar context, in which case stat reports only whether it succeeded and doesn't return the list of values. The underscore ( _ ) cache described later is still updated, though.
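The difference between or die and || die is easy to see by counting what each form stores. This is a minimal sketch; it stats the current directory (".") only because that path is certain to exist, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $path = ".";    # any path that exists will do

# "or" binds loosely: stat runs in list context and returns every field.
my @from_or = stat($path) or die "Couldn't stat $path: $!";

# "||" binds tightly: stat runs in scalar context and returns only a
# true/false value, which is all that ends up in the array.
my @from_oror = stat($path) || die "Couldn't stat $path: $!";

printf "with or: %d elements; with ||: %d element\n",
    scalar(@from_or), scalar(@from_oror);
```

The second array holds a single true value rather than the inode fields, which is rarely what the caller wanted.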

The values returned by stat are listed in Table 9-1.

Table 9-1. Stat return values

Element  Abbreviation  Description
0        dev           Device number of filesystem
1        ino           Inode number (the "pointer" field)
2        mode          File mode (type and permissions)
3        nlink         Number of (hard) links to the file
4        uid           Numeric user ID of file's owner
5        gid           Numeric group ID of file's owner
6        rdev          The device identifier (special files only)
7        size          Total size of file, in bytes
8        atime         Last access time, in seconds, since the Epoch
9        mtime         Last modify time, in seconds, since the Epoch
10       ctime         Inode change time, in seconds, since the Epoch
11       blksize       Preferred block size for filesystem I/O
12       blocks        Actual number of blocks allocated
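One common way to use the list is to assign it to named variables in the order of Table 9-1. The sketch below assumes a Unix-like system and works on a scratch file made with File::Temp; the file contents and the 0640 permissions are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file with known size and permissions.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close($fh) or die "Couldn't close $file: $!";
chmod(0640, $file) or die "Couldn't chmod $file: $!";

# The variables follow the element order of the stat return list.
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
    $atime, $mtime, $ctime, $blksize, $blocks)
    = stat($file) or die "Couldn't stat $file: $!";

# mode packs the file type and the permission bits into one number;
# masking with 07777 keeps only the permission bits.
my $perms = $mode & 07777;

printf "%s: %d bytes, %d link(s), permissions %04o, modified %s\n",
    $file, $size, $nlink, $perms, scalar localtime($mtime);
```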

The standard File::stat module provides a named interface to these values. It overrides the stat function, so instead of returning the preceding array, it returns an object with a method for each attribute:

use File::stat;

$inode = stat("/usr/bin/vi");
$ctime = $inode->ctime;
$size  = $inode->size;
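The named interface also makes it convenient to watch the timestamp rules described earlier. This sketch, assuming a Unix-like system, first pushes a scratch file's atime and mtime into the past with utime (the value 1_000_000_000 is arbitrary) and then appends to the file.

```perl
use strict;
use warnings;
use File::stat;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "first line\n";
close($fh) or die "Couldn't close $file: $!";

# Set atime and mtime to an old value so later changes are easy to see.
my $old = 1_000_000_000;
utime($old, $old, $file) or die "Couldn't set times on $file: $!";
my $before = stat($file) or die "Couldn't stat $file: $!";

# Appending changes the data blocks and the size field in the inode,
# so mtime moves forward and ctime is refreshed.
open(my $out, ">>", $file) or die "Couldn't append to $file: $!";
print {$out} "second line\n";
close($out) or die "Couldn't close $file: $!";
my $after = stat($file) or die "Couldn't stat $file: $!";

printf "size  %d -> %d bytes\n", $before->size,  $after->size;
printf "mtime %d -> %d\n",       $before->mtime, $after->mtime;
printf "ctime %d -> %d\n",       $before->ctime, $after->ctime;
```

Note that utime can set atime and mtime but not ctime, which the system maintains itself.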

In addition, Perl provides operators that call stat and return one value only (see Table 9-2). These are collectively referred to as the -X operators because they all take the form of a dash followed by a single character. They're modeled on the shell's test operators.

Table 9-2. File test operators

Operator  Stat field  Meaning
-r        mode        File is readable by effective UID/GID
-w        mode        File is writable by effective UID/GID
-x        mode        File is executable by effective UID/GID
-o        mode        File is owned by effective UID
-R        mode        File is readable by real UID/GID
-W        mode        File is writable by real UID/GID
-X        mode        File is executable by real UID/GID
-O        mode        File is owned by real UID
-e                    File exists
-z        size        File has zero size
-s        size        File has nonzero size (returns size)
-f        mode,rdev   File is a plain file
-d        mode,rdev   File is a directory
-l        mode        File is a symbolic link
-p        mode        File is a named pipe (FIFO)
-S        mode        File is a socket
-b        rdev        File is a block special file
-c        rdev        File is a character special file
-t        rdev        Filehandle is opened to a tty
-u        mode        File has setuid bit set
-g        mode        File has setgid bit set
-k        mode        File has sticky bit set
-T        N/A         File is a text file
-B        N/A         File is a binary file (opposite of -T)
-M        mtime       Age of file in days when script started
-A        atime       Same for access time
-C        ctime       Same for inode change time (not creation)
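A few of these operators applied to scratch files show the kinds of values they return. This is a sketch under the assumption of a Unix-like system; the file names empty, full, and missing are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $empty = "$dir/empty";
my $full  = "$dir/full";

open(my $fh, ">", $empty) or die "Couldn't create $empty: $!";
close($fh) or die "Couldn't close $empty: $!";

open($fh, ">", $full) or die "Couldn't create $full: $!";
print {$fh} "some text\n";
close($fh) or die "Couldn't close $full: $!";

my $is_dir         = (-d $dir)           ? 1 : 0;
my $is_plain       = (-f $full)          ? 1 : 0;
my $empty_is_zero  = (-z $empty)         ? 1 : 0;
my $missing_exists = (-e "$dir/missing") ? 1 : 0;
my $full_size      = -s $full;    # -s returns the size when it is nonzero
my $age_days       = -M $full;    # measured from script start; may be slightly negative

print "directory: $is_dir, plain file: $is_plain, empty: $empty_is_zero\n";
print "missing exists: $missing_exists, size of full: $full_size bytes\n";
printf "age of full: %.6f days\n", $age_days;
```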

The stat function and the -X operators cache the values that the stat(2) syscall returned. If you then call stat or a -X operator with the special filehandle _ (a single underscore), Perl won't call stat(2) again but will instead return information from its cache. This lets you test many properties of a single file without calling stat(2) many times or introducing a race condition:

open(F, "<", $filename )
    or die "Opening $filename: $!\n";
unless (-s F && -T _) {
    die "$filename doesn't have text in it.\n";
}
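The same cache works after an explicit stat or lstat. In the following sketch, a hypothetical helper named describe calls lstat once and then answers every question from the _ cache; lstat is used rather than stat so that the -l test is meaningful.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Classify a path with a single system call (describe is an
# illustrative name, not a built-in).
sub describe {
    my ($path) = @_;
    lstat($path) or return "missing";    # fills the _ cache
    return "symlink"    if -l _;
    return "directory"  if -d _;
    return "empty file" if -f _ && -z _;
    return "file"       if -f _;
    return "other";
}

my ($fh, $file) = tempfile(UNLINK => 1);
close($fh) or die "Couldn't close $file: $!";

print ". is a ",             describe("."),             "\n";
print "$file is a ",         describe($file),           "\n";
print "$file.missing is ",   describe("$file.missing"), "\n";
```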

The stat call just returns the information in one inode, though. How do we list the directory contents? For that, Perl provides opendir, readdir, and closedir:

opendir(DIRHANDLE, "/usr/bin") or die "couldn't open /usr/bin : $!";
while ( defined ($filename = readdir(DIRHANDLE)) ) {
    print "Inside /usr/bin is something called $filename\n";
}
closedir(DIRHANDLE);

These directory-reading functions are designed to look like the file open and close functions. Where open takes a filehandle, though, opendir takes a directory handle. They may look the same to you (the same bare word), but they occupy different namespaces. Therefore, you could open(BIN, "/a/file") and opendir(BIN, "/a/dir"), and Perl won't get confused. You might, but Perl won't. Because filehandles and directory handles are different, you can't use the <> operator to read from a directory handle (<> calls readline on the filehandle).

Similar to what happens with open and the other functions that initialize filehandles, you can supply opendir an undefined scalar variable where the directory handle is expected. If the function succeeds, Perl initializes that variable with a reference to a new, anonymous directory handle.

opendir(my $dh, "/usr/bin") or die "couldn't open /usr/bin : $!";
while (defined (my $filename = readdir($dh))) {
    # ...
}
closedir($dh);

Just like any other autovivified reference, when this one is no longer used (for example, when it goes out of scope and no other references to it are held), Perl automatically deallocates it. And just as close is implicitly called on filehandles autovivified through open at that point, directory handles autovivified through opendir have closedir called on them, too.

Filenames in a directory aren't necessarily stored alphabetically. For an alphabetical list of files, read the entries and sort them yourself.
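A minimal sketch of that approach follows, assuming a Unix-like system. It fills a scratch directory with three made-up file names, reads the entries in one call, drops the . and .. entries, sorts the rest, and then qualifies each name with the directory, since readdir returns bare names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(pear apple fig)) {
    open(my $fh, ">", "$dir/$name") or die "Couldn't create $dir/$name: $!";
    close($fh) or die "Couldn't close $dir/$name: $!";
}

opendir(my $dh, $dir) or die "Couldn't open $dir: $!";
# In list context readdir returns all remaining entries at once.
my @names = sort grep { $_ ne "." && $_ ne ".." } readdir($dh);
closedir($dh);

# Entries are relative to $dir; qualify them before calling stat or open.
my @paths = map { "$dir/$_" } @names;
print "$_\n" for @paths;
```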

The separation of directory information from inode information can create some odd situations. Operations that update the directory—such as linking, unlinking, or renaming a file—all require write permission only on the directory, not on the file. This is because the name of a file is actually something the directory calls that file, not a property inherent to the file itself. Only directories hold names of files; files are ignorant of their own names. Only operations that change information in the file data itself demand write permission on the file. Lastly, operations that alter the file's permissions or other metadata are restricted to the file's owner or the superuser. This can lead to the interesting situation of being able to delete (i.e., unlink from its directory) a file you can't read, or write to a file you can't delete.
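The first of those situations can be reproduced in a few lines. This sketch assumes a Unix-like system and a scratch directory that the current user owns; the file name secret is arbitrary. The file is stripped of all permissions, yet unlink succeeds because only the directory is being written.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # we own it and can write to it
my $file = "$dir/secret";

open(my $fh, ">", $file) or die "Couldn't create $file: $!";
print {$fh} "unreadable contents\n";
close($fh) or die "Couldn't close $file: $!";

# Remove every permission bit from the file itself.
chmod(0000, $file) or die "Couldn't chmod $file: $!";

# Removing the name changes only the directory, so the file's own
# permissions are not consulted.
my $removed = unlink($file);
print $removed ? "unlinked $file\n" : "unlink failed: $!\n";
```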

Although these situations may make the filesystem structure seem odd at first, they're actually the source of much of Unix's power. Links, two filenames that refer to the same file, are now extremely simple. The two directory entries just list the same inode number. The inode structure includes a count of the number of directory entries referring to the file (nlink in the values returned by stat). This lets the operating system store and maintain only one copy of the modification times, size, and other file attributes. When one directory entry is unlinked, data blocks are deleted only if the directory entry was the last one that referred to the file's inode—and no processes still have the file open. You can unlink an open file, but its disk space won't be released until the last close.
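The link count can be watched directly with the built-in link and unlink functions. This is a sketch that assumes a Unix-like system and a scratch filesystem supporting hard links; the names vi and nvi simply echo the earlier table.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($first, $second) = ("$dir/vi", "$dir/nvi");

open(my $fh, ">", $first) or die "Couldn't create $first: $!";
print {$fh} "editor\n";
close($fh) or die "Couldn't close $first: $!";

# Add a second directory entry that lists the same inode.
link($first, $second) or die "Couldn't link $second to $first: $!";

my @a = stat($first)  or die "Couldn't stat $first: $!";
my @b = stat($second) or die "Couldn't stat $second: $!";
print "inodes: $a[1] and $b[1]; link count: $a[3]\n";

# Removing one name leaves the data reachable through the other.
unlink($first) or die "Couldn't unlink $first: $!";
my $nlink_after = (stat($second))[3];

open(my $in, "<", $second) or die "Couldn't open $second: $!";
my $content = do { local $/; <$in> };
close($in);
print "after unlink: link count $nlink_after, content: $content";
```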

Links come in two forms. The kind described previously, where two directory entries list the same inode number (like vi and nvi in the earlier table), is called a hard link. The operating system cannot tell the first directory entry of a file (the one created when the file was created) from any subsequent hard links to it. The other kind, the soft or symbolic link, is very different. A soft link is a special type of file whose data block stores the filename the link points to. Soft links have a different mode value, indicating they're not regular files. The operating system, when asked to open a soft link, instead opens the filename contained in the data block.
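The difference shows up when a soft link is examined with stat, which follows the link, and with lstat, which reports on the link itself. This sketch assumes a Unix-like system where symlink is available; the names target and shortcut are illustrative, and the S_ISLNK and S_ISREG helpers come from the standard Fcntl module.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $target   = "$dir/target";
my $linkname = "$dir/shortcut";

open(my $fh, ">", $target) or die "Couldn't create $target: $!";
print {$fh} "data\n";
close($fh) or die "Couldn't close $target: $!";

symlink($target, $linkname) or die "Couldn't symlink $linkname: $!";

# readlink returns the filename stored in the link's data block.
my $stored = readlink($linkname);
defined($stored) or die "Couldn't readlink $linkname: $!";

my $link_mode   = (lstat($linkname))[2];    # the link's own inode
my $target_mode = (stat($linkname))[2];     # the inode the link leads to
my $target_size = (stat($linkname))[7];

print "link stores: $stored\n";
print "lstat sees a symlink\n"    if S_ISLNK($link_mode);
print "stat sees a regular file of $target_size bytes\n"
    if S_ISREG($target_mode);
```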

Executive Summary

Filenames are kept in a directory, separate from the size, protections, and other metadata kept in an inode.

The stat function returns the inode information (metadata).

opendir, readdir, and friends provide access to filenames in a directory through a directory handle.

Directory handles look like filehandles, but they are not the same. In particular, you can't use <> on directory handles.

Permissions on a directory determine whether you can read and write the list of filenames. Permissions on a file determine whether you can change the file's metadata or contents.

Three different times are stored in an inode. None of them is the file's creation time.
