Java Tutorial

Understanding Streams

A stream is an abstract representation of an input or output device that is a source of, or destination for, data. You can write data to a stream and read data from a stream. You can visualize a stream as a sequence of bytes that flows into or out of your program.

When you write data to a stream, the stream is called an output stream. The output stream can go to any device to which a sequence of bytes can be transferred, such as a file on a hard disk, or a phone line connecting your system to a remote system. An output stream can also go to your display screen, but only at the expense of limiting it to a fraction of its true capability. This is output to the command line. When you write to your display screen using a stream, it can only display characters, not graphical output. Graphical output requires more specialized support that we will discuss from Chapter 15 onwards. Note that while a printer can be considered notionally as a stream, printing in Java does not work this way. A printer in Java is treated as a graphical device, so sending output to the printer is very similar to displaying graphical output on your display screen. You will learn how printing works in Java in Chapter 20.

You read data from an input stream. In principle, this can be any source of serial data, but is typically a disk file, the keyboard, or a remote computer.

Under normal circumstances, file input and output for the machine on which your program is executing is only available to Java applications. It is not available to Java applets except to a strictly limited extent. If this were not so, a malicious Java applet embedded in a web page could trash your hard disk. A SecurityException will normally be thrown by any attempted operation on disk files on the local machine in a Java applet. The exception is the directory containing the .class file for the applet, which, along with its subdirectories, is freely accessible to the applet. Also, the security features in Java can be used to control what an applet (and an application running under a Security Manager) can access so that an applet can only access files or other resources for which it has explicit permission.

The main reason for using a stream as the basis for input and output operations is to make your program code for these operations independent of the device involved. This has two advantages. First, you don't have to worry about the detailed mechanics of each device, which are taken care of behind the scenes. Second, your program will work for a variety of input/output devices without any changes to the code.
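To make the idea of device independence concrete, here is a minimal sketch. The class and method names (DeviceIndependentWrite, writeGreeting, captureInMemory) are invented for illustration, and a ByteArrayOutputStream stands in for a file or network connection. The point is that writeGreeting() is written once against OutputStream and works unchanged for both destinations.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class DeviceIndependentWrite {
    // Writes a greeting to whatever destination the stream represents.
    // The method neither knows nor cares which device is behind the stream.
    static void writeGreeting(OutputStream out) throws IOException {
        out.write("Hello, stream\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    // Sends the same greeting to an in-memory byte array (a stand-in for a
    // file or socket) and returns the bytes that arrived there.
    static byte[] captureInMemory() {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        try {
            writeGreeting(memory);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return memory.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Destination 1: the command line.
        writeGreeting(System.out);

        // Destination 2: memory, using exactly the same method.
        System.out.println(captureInMemory().length + " bytes captured in memory");
    }
}
```

Swapping in a file or network stream would require no change to writeGreeting() itself, only to the code that creates the stream.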

Stream input and output methods generally permit very small amounts of data, such as a single character or byte, to be written or read in a single operation. Transferring data to or from a stream like this may be extremely inefficient, so a stream is often equipped with a buffer in memory, in which case it is called a buffered stream. A buffer is simply a block of memory that is used to batch up the data that is transferred to or from an external device. Reading or writing a stream in reasonably large chunks will reduce the number of input/output operations necessary, and thus make the process more efficient.

When you write to a buffered output stream, the data is sent to the buffer, and not to the external device. The amount of data in the buffer is tracked automatically, and the data is usually sent to the device when the buffer is full. However, you will sometimes want the data in the buffer to be sent to the device before the buffer is full, and there are methods provided to do this. This operation is usually termed flushing the buffer.
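The effect of flushing can be observed with a small experiment. This is only a sketch: the class name FlushDemo and the 64-byte buffer size are arbitrary choices, and a ByteArrayOutputStream plays the part of the external device so that we can count the bytes that have actually reached it.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FlushDemo {
    // Returns { bytes at the device before flush(), bytes at the device after flush() }.
    static int[] bytesBeforeAndAfterFlush() {
        // Stands in for the external device.
        ByteArrayOutputStream device = new ByteArrayOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(device, 64)) {
            out.write(new byte[] {1, 2, 3}); // Held in the buffer, which is far from full
            int before = device.size();
            out.flush();                     // Force the buffered bytes out to the device
            int after = device.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = bytesBeforeAndAfterFlush();
        System.out.println("Before flush: " + counts[0] + " bytes, after flush: " + counts[1] + " bytes");
    }
}
```

Closing a buffered output stream also flushes it, which is why short programs that close their streams properly often work without an explicit call to flush().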

Buffered input streams work in a similar way. Any read operation on a buffered input stream will read data from the buffer. A read operation on the device that is the source of data for the stream will only be executed when the buffer is empty and the program has requested data. When this occurs, a complete buffer-full of data will be read automatically from the device, if sufficient data is available.
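You can see how few requests actually reach the source by counting them. In the sketch below, CountingInputStream is a hypothetical helper written for this illustration (it is not part of java.io), and a ByteArrayInputStream stands in for the device. The program reads one byte at a time, yet the source is only consulted when the buffer runs dry.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedReadDemo {
    // Hypothetical helper: counts the read requests that reach the underlying source.
    static class CountingInputStream extends FilterInputStream {
        int requests = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            requests++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requests++;
            return super.read(b, off, len);
        }
    }

    // Reads bytesToRead bytes one at a time through a buffer of bufferSize bytes
    // and returns how many read requests were made on the source.
    static int sourceRequestsFor(int totalBytes, int bufferSize, int bytesToRead) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (BufferedInputStream in = new BufferedInputStream(source, bufferSize)) {
            for (int i = 0; i < bytesToRead; i++) {
                in.read(); // Served from the buffer whenever it holds data
            }
            return source.requests;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 20 single-byte reads through a 16-byte buffer over a 100-byte source.
        System.out.println("Requests to the source: " + sourceRequestsFor(100, 16, 20));
    }
}
```

With a 16-byte buffer, the first read fills the buffer and the seventeenth read refills it, so twenty single-byte reads cost only two requests to the source.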

Binary and Character Streams

The java.io package supports two types of streams: binary streams, which contain binary data, and character streams, which contain character data. Binary streams are sometimes referred to as byte streams. These two kinds of streams behave in different ways when you read and write data.

When you write data to a binary stream, the data is written to the stream as a series of bytes, exactly as it appears in memory. No transformation of the data takes place. Binary numerical values are just written as a series of bytes, four bytes for each value of type int, eight bytes for each value of type long, eight bytes for each value of type double, and so on. As we saw in Chapter 2, Java stores its characters internally as Unicode characters, which are 16-bit characters, so each Unicode character is written to a binary stream as two bytes, the high byte being written first.
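The byte counts above can be checked with a short sketch. It assumes DataOutputStream as the binary stream class (the text does not name one at this point), and the class name BinaryWriteDemo is invented for illustration. DataOutputStream writes every multi-byte value with its high byte first.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BinaryWriteDemo {
    // Writes one value of each type to a binary stream and returns the raw bytes.
    static byte[] encode() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(1);       // 4 bytes
            out.writeLong(2L);     // 8 bytes
            out.writeDouble(3.0);  // 8 bytes
            out.writeChar('A');    // 2 bytes, high byte first: 0x00 then 0x41
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = encode();
        System.out.println("Total bytes written: " + data.length);
        System.out.println("Bytes for 'A': " + data[20] + " " + data[21]);
    }
}
```

The total is 4 + 8 + 8 + 2 = 22 bytes, and no separators or type information are stored, so a program reading the data back must know the order and types of the values that were written.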

Character streams are used for storing and retrieving text. You may also use character streams to read text files not written by a Java program. All binary numeric data has to be converted to a textual representation before being written to a character stream. This involves generating a character representation of the original binary data value. Reading numeric data from a stream that contains text involves much more work than reading binary data. When you read a value of type int from a binary stream, you know that it consists of four bytes. When you read an integer from a character stream, you have to determine how many characters make up the value. For each numerical value you read from a character stream, you have to be able to recognize where the value begins and ends, and then convert the token (the sequence of characters that represents the value) to its binary form. This is illustrated below:
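As a rough sketch of the extra work involved, the following code extracts integers from a character stream. It uses the StreamTokenizer class from java.io to find where each token begins and ends; the class name TextNumberDemo and the method readInts are invented for illustration, and a StringReader stands in for a text file. StreamTokenizer delivers numbers as values of type double, so the sketch casts each one to int.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TextNumberDemo {
    // Reads every numeric token from the text and converts it to a binary int.
    static List<Integer> readInts(String text) {
        List<Integer> values = new ArrayList<>();
        StreamTokenizer tokens = new StreamTokenizer(new StringReader(text));
        try {
            // The tokenizer decides where each token starts and stops.
            while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokens.ttype == StreamTokenizer.TT_NUMBER) {
                    values.add((int) tokens.nval); // nval holds the token's value as a double
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // The three values occupy two, three, and two characters respectively.
        System.out.println(readInts("12 345 -6"));
    }
}
```

Each value here occupies a different number of characters, which is exactly why the reader has to scan for token boundaries rather than simply taking four bytes at a time.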

When you write strings to a stream as character data, by default, the Unicode characters are automatically converted to the local representation of the characters in the host machine, and these are then written to the stream. When you read a string, the default mechanism is to convert the data from the stream back to Unicode characters from the local machine representation. With character streams, your program reads and writes Unicode characters, but the stream will contain characters in the equivalent character encoding used by the local computer.

You don't have to accept the default conversion process for character streams. Java lets you define named mappings, called charsets, between Unicode characters and sequences of bytes, and you can select an available charset that should apply when data is transferred to, or from, a particular character stream. We won't be going into this in detail, but you can find more information on defining and using charsets in the SDK documentation for the Charset class.
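To give a feel for what selecting a charset means in practice, here is a minimal sketch that writes the same one-character string through a character stream using three different charsets. The class name CharsetDemo and the method encode are invented for illustration; OutputStreamWriter and the StandardCharsets constants are standard library classes. The character is written as the escape \u00e9 (an e with an acute accent) so that the example does not depend on the encoding of the source file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Writes the text to a character stream that uses the given charset
    // and returns the bytes that end up in the underlying stream.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e9";
        System.out.println("UTF-8:      " + encode(text, StandardCharsets.UTF_8).length + " bytes");
        System.out.println("ISO-8859-1: " + encode(text, StandardCharsets.ISO_8859_1).length + " bytes");
        System.out.println("UTF-16BE:   " + encode(text, StandardCharsets.UTF_16BE).length + " bytes");
    }
}
```

The program always hands the stream the same Unicode character, but the bytes stored differ: two bytes in UTF-8, one in ISO-8859-1, and two in UTF-16BE. A reader must apply the same charset to recover the original character.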