Java Tutorial

File I/O Basics

If you are new to programming file operations, a couple of points may not be apparent and can be a source of confusion, so let's clarify them before we go any further.

Firstly, let's consider the nature of a file. Once you have written data to a file, you can regard it as just a linear sequence of bytes. The bytes in a file are referenced by their offset from the beginning, so the first byte is byte 0, the next byte is byte 1, the third byte is byte 2, and so on through to the end of the file. If there are n bytes in a file the last byte will be at offset n-1. There is no specific information in the file about how the data originated or what it represents unless you explicitly put it there. Even if there is, you need to know that it's there and read and interpret the data accordingly.

For instance, if you write a series of 25 binary values of type int to a file, it will contain 100 bytes. There will be nothing in the file to indicate that the data consists of four-byte integers, so there is nothing to prevent you from reading the data back as 50 two-byte Unicode characters, or 10 long values followed by a string, or any other arbitrary collection of data items that corresponds to 100 bytes. Of course, the result is unlikely to be very meaningful unless you interpret the data in the form in which it was written. This implies that to read data from a file correctly, you need to have prior knowledge of the structure and format of the data.
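To make this concrete, here is a minimal sketch of the 25-int case. It assumes the standard DataOutputStream and DataInputStream classes and uses an in-memory byte array as a stand-in for a file, so nothing on disk is touched. The class and method names (IntBytesDemo, writeInts, readFirstLong) are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class IntBytesDemo {
    // Writes the values 1 .. count as four-byte binary ints and returns the raw bytes.
    // (The byte array stands in for the contents of a file.)
    static byte[] writeInts(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 1; i <= count; i++) {
                out.writeInt(i);
            }
        }
        return bytes.toByteArray();
    }

    // Interprets the first eight bytes of the data as a single long value.
    // Nothing in the data prevents this, even though it was written as ints.
    static long readFirstLong(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeInts(25);
        System.out.println("Bytes written: " + data.length);
        System.out.println("First 8 bytes read as a long: " + readFirstLong(data));
    }
}
```

Running this should report 100 bytes. The first eight bytes hold the ints 1 and 2, but read back as a long they produce 4294967298, a legal value that has nothing to do with what was written.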

There are many ways in which the form of the data in a file may be recorded or implied. For instance, one way to communicate the format of the data is to use an agreed file name extension for data of a particular kind, such as .java or .gif or .wav. Each type of file has a predefined structure, so from the file extension you know how to interpret the data in the file. Another way is to use a generalized mechanism for communicating data and its structure, such as XML. We will be looking into how we can work with XML in Java in Chapters 21 and 22.

You can access an existing file to read from it or write to it in two different ways, described as sequential access and random access. The latter is sometimes referred to as direct access. Sequential access to a file is quite straightforward and works pretty much as you would expect. Sequential read access involves reading bytes from the file starting from the beginning with byte 0. Of course, if you are only interested in the file contents starting at byte 100, you can just read and ignore the first 100 bytes. Sequential write access involves writing bytes to the file, either starting at the beginning if you are replacing the existing data, or at the end if you are appending new data to the file.
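As a rough sketch of sequential access, the following code reads and discards bytes from the start of a file until it reaches the offset of interest, and appends new bytes at the end. It assumes the java.nio.file.Files utility methods and works on a temporary file that it creates itself. The names SequentialAccessDemo, createSampleFile, readByteAt, and append are invented for this example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SequentialAccessDemo {
    // Creates a temporary file of the given size in which the byte at offset i has the value i % 256.
    static Path createSampleFile(int size) throws IOException {
        Path file = Files.createTempFile("sequential-demo", ".bin");
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) i;
        }
        Files.write(file, data);
        return file;
    }

    // Reads sequentially from byte 0, ignoring everything before the given offset,
    // and returns the byte at that offset as a value from 0 to 255 (or -1 at end of file).
    static int readByteAt(Path file, long offset) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            for (long i = 0; i < offset; i++) {
                if (in.read() == -1) {
                    return -1; // the file is shorter than the requested offset
                }
            }
            return in.read();
        }
    }

    // Sequential write access in append mode: the new bytes go after the existing data.
    static void append(Path file, byte[] extra) throws IOException {
        try (OutputStream out = Files.newOutputStream(file, StandardOpenOption.APPEND)) {
            out.write(extra);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = createSampleFile(200);
        System.out.println("Byte at offset 100: " + readByteAt(file, 100));
        append(file, new byte[] {1, 2, 3});
        System.out.println("Size after appending: " + Files.size(file));
        Files.delete(file); // clean up the temporary file
    }
}
```

Note that reaching offset 100 this way costs 100 reads that are thrown away, which is the price of strictly sequential access.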

The term random access is often misunderstood initially. Just like sequential access, random access is just a way of accessing data in a file and has nothing to do with how the data in the file is structured or how the physical file was originally written. You can access any file randomly for reading and/or writing. When you access a file randomly, you can read one or more bytes from the file starting at any point. For instance, you could read 20 bytes starting at the thirteenth byte in the file (which is the byte at offset 12, of course), then read 50 bytes starting at the 101st byte, or at any other point that you choose. Similarly, you can update an existing file in random access mode by writing data starting at any point in the file. In random access mode, the choice of where to start reading or writing and how many bytes you read or write is entirely up to you. You just need to know the offset for the byte where a read or write operation should start. Of course, for these to be sensible and successful operations, you have to have a clear idea of how the data in the file is structured.
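The reads described above might look something like the following sketch. It assumes the java.io.RandomAccessFile class, which is one of several ways to do this in Java, and it works on a temporary file that it creates itself. The names RandomAccessDemo, createSampleFile, readAt, and writeAt are hypothetical helpers for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class RandomAccessDemo {
    // Creates a temporary file of the given size in which the byte at offset i has the value i % 256.
    static File createSampleFile(int size) throws IOException {
        File file = File.createTempFile("random-demo", ".bin");
        file.deleteOnExit();
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) i;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.write(data);
        }
        return file;
    }

    // Reads 'length' bytes starting at 'offset' without touching the bytes before it.
    static byte[] readAt(File file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] buffer = new byte[length];
            raf.seek(offset);      // move the file position directly to the offset
            raf.readFully(buffer); // throws EOFException if fewer than 'length' bytes remain
            return buffer;
        }
    }

    // Overwrites existing bytes starting at 'offset'; the rest of the file is left alone.
    static void writeAt(File file, long offset, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(offset);
            raf.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = createSampleFile(200);
        byte[] first = readAt(file, 12, 20);   // 20 bytes from the thirteenth byte
        byte[] second = readAt(file, 100, 50); // 50 bytes from the 101st byte
        System.out.println("First byte of each read: " + first[0] + ", " + second[0]);
        writeAt(file, 100, new byte[] {42});   // update one byte in place
        System.out.println("Byte at offset 100 is now: " + readAt(file, 100, 1)[0]);
    }
}
```

The code only knows where to seek because it also knows how the sample file was laid out, which is the point made at the end of the paragraph above.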

Important

First, a note of caution: before running any of the examples in this chapter, be sure to set up a separate directory for storing the files that you use when you are testing programs. It's also not a bad idea to back up any files and directories on your system that you don't want to risk losing. But of course, you do back up your files regularly anyway, right?

The old adage, 'If anything can go wrong, it will,' applies particularly in this context, as does the complementary principle, 'If anything can't go wrong, it will.' Remember also that the probability of something going wrong increases in proportion to the inconvenience it is likely to cause.