Java Tutorial

Reading Mixed Data

The primes.txt file that we created in the previous chapter contains data of three different types. Each record consists of the string length as a binary value of type double (of all things), followed by the string itself describing the prime value, followed by the binary prime value as type long. Reading this file is a little trickier than it looks at first sight.
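
To make the record layout concrete, here is a minimal sketch that builds one record in memory in the order described above and then decodes it again. The class RecordLayoutDemo and its methods are hypothetical names used only for illustration, and the sketch assumes the string contains only ASCII characters, written one byte per character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class RecordLayoutDemo {

  // Encodes one record: length as double, string bytes, prime as long.
  static ByteBuffer encode(String text, long prime) {
    byte[] chars = text.getBytes(StandardCharsets.US_ASCII); // Assumes ASCII text
    ByteBuffer buf = ByteBuffer.allocate(8 + chars.length + 8);
    buf.putDouble((double) chars.length);  // 8 bytes: string length
    buf.put(chars);                        // n bytes: the string
    buf.putLong(prime);                    // 8 bytes: binary prime value
    buf.flip();                            // Make the buffer ready for reading
    return buf;
  }

  // Decodes one record starting at the buffer's current position.
  static String decode(ByteBuffer buf) {
    int strLength = (int) buf.getDouble();
    byte[] chars = new byte[strLength];
    buf.get(chars);
    long prime = buf.getLong();
    return strLength + "|" + new String(chars, StandardCharsets.US_ASCII)
                     + "|" + prime;
  }

  public static void main(String[] args) {
    ByteBuffer record = encode("prime = 2", 2L);
    System.out.println("Record size in bytes: " + record.remaining());
    System.out.println(decode(record));
  }
}
```

The record size varies with the string length, which is why the reader cannot know in advance how many bytes to request for each record.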

To start with we will set up the file input stream and obtain the channel for the file. Since, apart from the name of the file, this is exactly as in the previous example we won't repeat it here. Of course, the big problem is that we don't know ahead of time exactly how long the strings are. We have two strategies to deal with this:

We can read the string length in the first read operation, then read the string and the binary prime value in the next. The only downside to this approach is that it's not a particularly efficient way to read the file as we will have read operations that each read a very small amount of data.
We can set up a sizable byte buffer of an arbitrary capacity and just fill it with bytes from the file. We can then sort out what we have in the buffer. The problem with this approach is that the buffer's contents may well end part way through one of the data items from the file. We will have to do some work to detect this and figure out what to do next but this will be much more efficient than the first approach since we will vastly reduce the number of read operations that are necessary to read the entire file.

Let's try the first approach first as it's easier.

To read the string length we need a byte buffer with a capacity to hold a single value of type double:

ByteBuffer lengthBuf = ByteBuffer.allocate(8);

We can create a byte buffer to hold both the string and the binary prime value, but only after we know the length of the string. We will also need an array of type byte[] to hold the string characters – remember, we wrote the string as bytes, not Unicode characters. Some variables will come in handy:

int strLength = 0;      // Stores the string length 
ByteBuffer buf = null;  // Stores a reference to the second byte buffer 
byte[] strChars = null; // Stores a reference to an array to hold the string

Since we need two read operations to get at all the data for a single prime, we will adopt a different strategy for reading the entire file. We will put both read operations in an indefinite loop and use a break statement to exit the loop when we hit the end-of-file (EOF). Here's how we can read the file:

while(true) {
  if(inChannel.read(lengthBuf) == -1)  // Read the string length; if it's EOF
    break;                             // exit the loop
  
  lengthBuf.flip();
  strLength = (int)lengthBuf.getDouble(); // Extract length & convert to int
  buf = ByteBuffer.allocate(strLength+8); // Buffer for string & prime
  
  if(inChannel.read(buf) == -1) {         // Read string & binary prime value
    assert false;                            // Should not get here!
    break;                                   // Exit loop on EOF
  }
  
  buf.flip();
  strChars = new byte[strLength];     // Create the array for the string
  buf.get(strChars);                  // Extract the string
    
  System.out.println("String length: " + strChars.length+ "  String: " +
                     new String(strChars) + "  Binary value: " + buf.getLong());
   
  lengthBuf.clear();                  // Clear the buffer for the next read
}

After reading the string length into lengthBuf we can create the second buffer and allocate the array to store the string characters. We don't need any view buffers at all to get at the data from the file. The getDouble() method for lengthBuf provides us with the length of the string, and we get the string and the binary prime value using the get() and getLong() methods for buf. Of course, if we find a string length value, there ought to be a string and a binary prime, so we have an assertion to signal that something has gone wrong if this turns out not to be the case.

Let's see how it works out in practice.

Try It Out – Reading Mixed Data from a File

Here's the complete program code:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.File;
import java.io.FileNotFoundException;

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadPrimesMixedData {
  public static void main(String[] args) {
    File aFile = new File("C:/Beg Java Stuff/primes.txt");
    FileInputStream inFile = null;
   
    try {
      inFile = new FileInputStream(aFile); 

    } catch(FileNotFoundException e) {
      e.printStackTrace(System.err);
      System.exit(1);
    }
  
    FileChannel inChannel = inFile.getChannel();    
    try {
      ByteBuffer lengthBuf = ByteBuffer.allocate(8);
      int strLength = 0;          // Stores the string length 
      ByteBuffer buf = null;      // Stores a reference to the second byte buffer 
      byte[] strChars = null;     // A reference to an array to hold the string
 
      while(true) {
        if(inChannel.read(lengthBuf) == -1)       // Read the string length,
          break;                                  // if it's EOF exit the loop

        lengthBuf.flip();

        // Extract the length and convert to int
        strLength = (int)lengthBuf.getDouble();

        // Buffer for the string & the prime
        buf = ByteBuffer.allocate(strLength+8);

        if(inChannel.read(buf) == -1) {   // Read the string & binary prime value
          assert false;                   // Should not get here!
          break;                          // Exit loop on EOF
        }
        buf.flip();
        strChars = new byte[strLength];   // Create the array for the string
        buf.get(strChars);                // Extract the string
          
        System.out.println("String length: " + strChars.length+ "  String: " +
                           new String(strChars) + "  Binary value: " +
                           buf.getLong());
        
        lengthBuf.clear();              // Clear the buffer for the next read
      }

      System.out.println("\nEOF reached.");
      inFile.close();                   // Close the file and the channel

    } catch(IOException e) {
      e.printStackTrace(System.err);
      System.exit(1);
    }
    System.exit(0);
  }
}

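Don't forget that with JDK 1.4 you need to specify the -source 1.4 option when you compile code that includes assertions; later compilers accept assert statements without it. In every case assertions are disabled at run time unless you specify the -enableassertions option (or its short form, -ea) when you execute the program. You should get the output: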

String length: 9  String: prime = 2  Binary value: 2
String length: 9  String: prime = 3  Binary value: 3
String length: 9  String: prime = 5  Binary value: 5

and so on down to the end:

String length: 11  String: prime = 523  Binary value: 523
String length: 11  String: prime = 541  Binary value: 541

EOF reached.

How It Works

We read the file with a relatively straightforward process. On each iteration of the loop that reads the file, we first read 8 bytes into lengthBuf since this will be the length of the following string as type double. Knowing the length of the string, we are able to create a second buffer, buf, to accommodate this plus the 8-byte long value that is the prime in binary. The loop continues until the read operation using lengthBuf reaches the end-of-file. If we reach EOF while reading data into buf, the assertion fails, provided assertions are enabled; otherwise the loop simply exits.
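
The program relies on each read() call filling its buffer. The general contract for channel reads allows a call to return before the buffer is full, so a more defensive version would loop until the buffer has no space remaining. The following is a sketch of that idea rather than part of the program above; readFully() and ReadFullyDemo are hypothetical names, and an in-memory byte array stands in for the file so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper class, for illustration only.
class ReadFullyDemo {

  // Keeps reading until buf is full. Returns false if EOF arrives first.
  static boolean readFully(ReadableByteChannel channel, ByteBuffer buf)
      throws IOException {
    while (buf.hasRemaining()) {
      if (channel.read(buf) == -1) {
        return false;
      }
    }
    return true;
  }

  // Exercises readFully() on a 12-byte in-memory source.
  static String run() {
    try (ReadableByteChannel channel =
             Channels.newChannel(new ByteArrayInputStream(new byte[12]))) {
      ByteBuffer buf = ByteBuffer.allocate(8);
      boolean first = readFully(channel, buf);   // 8 of the 12 bytes are read
      buf.clear();
      boolean second = readFully(channel, buf);  // Only 4 bytes are left
      return first + "," + second + "," + buf.position();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

In the program above, a helper like this could replace the direct calls to inChannel.read(), with a false return value treated in the same way as -1.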

Compacting a Buffer

The alternative approach to reading the file that we identified was to read bytes from the file into a large buffer for efficiency and then figure out what is in it. Processing the data will need to take account of the possibility that the last data item in the buffer may be incomplete – part of a double or long value or part of a string. The essence of this approach will therefore be as follows:

1. Read from the file into the buffer.
2. Extract the string length, the string, and the binary prime value from the buffer repeatedly until no more complete values are available.
3. Shift any bytes that are left over in the buffer back to the beginning of the buffer. These will be some part of a complete set of the string length, the string, and the binary prime value. Go back to point 1 to read more from the file.

The buffer classes provide a method, compact(), for performing the operation we need in point 3 here to shift bytes that are left over back to the beginning. An illustration of the action of the compact() method on a buffer is shown below.

As you can see, everything remaining in the buffer, which will be elements from the buffer's position up to but not including the buffer's limit, is copied to the beginning of the buffer. The position is then set to the element following the last element copied and the limit is set to the capacity. This is precisely what you want when you have worked part way through the data in an input buffer and you want to add some more data from the file. Compacting the buffer sets the position and limit such that the buffer is ready to receive more data. The next read operation using the buffer will add data at the end of what was left in the buffer.
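
The effect of compact() on the position and limit can be checked with a small buffer. This is a minimal sketch; the class CompactDemo and the sizes used are arbitrary choices for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, for illustration only.
class CompactDemo {

  static String state(ByteBuffer buf) {
    return "position=" + buf.position() + " limit=" + buf.limit();
  }

  static String run() {
    ByteBuffer buf = ByteBuffer.allocate(10);
    for (byte b = 1; b <= 8; b++) {
      buf.put(b);                     // Write the values 1 to 8
    }
    buf.flip();                       // position=0, limit=8

    for (int i = 0; i < 5; i++) {
      buf.get();                      // Consume the first five bytes
    }
    String before = state(buf);       // Three bytes (6, 7, 8) are left over

    buf.compact();                    // Copy the leftovers to the beginning
    String after = state(buf);

    // Absolute get: the first leftover byte is now at index 0
    return before + " -> " + after + " first=" + buf.get(0);
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

After compact() the position is 3, just past the three leftover bytes, and the limit equals the capacity of 10, so the next read into the buffer appends after the leftover data.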

Any time we are processing an element from the buffer, accessing the string length, retrieving the string, or getting the binary value for a prime, we will need to check that there are at least the required number of bytes in the buffer. If there aren't, the buffer will need to be compacted, to shift what's left back to the start, and then replenished. Since we want to do this at three different points in the code, a method for this operation will come in handy:

private static int replenish(FileChannel channel, ByteBuffer buf) 
   throws IOException {

  // Number of bytes left in file
  long bytesLeft = channel.size() - channel.position();

  if(bytesLeft == 0L)                                // If there are none
    return -1;                                       // we have reached the end
  
  buf.compact().limit(buf.position()
              + (bytesLeft<buf.remaining() ? (int)bytesLeft : buf.remaining()));
  return channel.read(buf); 
}

This method first checks that there really are some bytes left in the file with which to replenish the buffer. It then compacts the buffer and sets the limit for the buffer. The compact() method automatically sets the limit to the capacity, but it is possible that the number of bytes left in the file is insufficient to fill the rest of the buffer. In this case the limit is set to accommodate just the number of bytes available from the file. Note the throws clause. This indicates that the method can throw exceptions of type IOException. Exceptions of this type are not handled by the code in the body of the method but are passed on to the calling method, so we will need to put calls to this method in a try block. We can put the whole program together now.
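
The limit calculation in replenish() can be looked at in isolation. The sketch below restates the arithmetic as a pure function so that it can be tried with a few values; newLimit() and ReplenishLimitDemo are hypothetical names and are not used by the program that follows.

```java
// Hypothetical demo class, for illustration only.
class ReplenishLimitDemo {

  // positionAfterCompact is the number of leftover bytes kept by compact().
  // The new limit leaves room for either the rest of the file or the free
  // space in the buffer, whichever is smaller.
  static int newLimit(int positionAfterCompact, int capacity, long bytesLeft) {
    int free = capacity - positionAfterCompact;
    return positionAfterCompact + (bytesLeft < free ? (int) bytesLeft : free);
  }

  public static void main(String[] args) {
    // Plenty of data left in the file: the limit stays at the capacity
    System.out.println(newLimit(7, 32, 70L));
    // Only 20 bytes left in the file: the limit stops short of the capacity
    System.out.println(newLimit(7, 32, 20L));
  }
}
```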

Try It Out – Reading into a Large Buffer

Here are the changes to the original program code to read data into a large buffer:

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadPrimesMixedData {
  public static void main(String[] args) {

    // Create the file input stream and get the file channel as before... 

    try {
      ByteBuffer buf = ByteBuffer.allocateDirect(1024);
      buf.position(buf.limit());       // Set the position for the loop operation
      int strLength = 0;               // Stores the string length
      byte[] strChars = null;          // Array for string 

      while(true) {
        if(buf.remaining() < 8) {       // Verify enough bytes for string length
          if(replenish(inChannel, buf) == -1)  // If not, replenish the buffer
            break;                             // but exit loop on EOF
          else
            buf.flip();
        }
        strLength = (int)buf.getDouble();

        // Verify enough bytes for complete string
        if(buf.remaining()<strLength) {
          if(replenish(inChannel, buf) == -1) // If not, replenish the buffer
            assert false;                     // and we should never arrive here
          else
            buf.flip();
        }
        strChars = new byte[strLength];
        buf.get(strChars);
        
        if(buf.remaining()<8) {          // Verify enough bytes for prime value
          if(replenish(inChannel, buf) == -1)   // If not, replenish the buffer
            assert false;                       // and we should never arrive here
          else
            buf.flip();
        }

        System.out.println("String length: " + strChars.length + "  String: " +
                           new String(strChars) + "  Binary value: " + 
                           buf.getLong());
            
      }
      System.out.println("\nEOF reached.");
      inFile.close();                   // Close the file and the channel
 
    } catch(IOException e) {
      e.printStackTrace(System.err);
      System.exit(1);
    }
    System.exit(0);
  }
  
  private static int replenish(FileChannel channel, ByteBuffer buf) 
     throws IOException {

    // Number of bytes left in file
    long bytesLeft = channel.size() - channel.position();
    if(bytesLeft == 0L)                               // If there are none
      return -1;                                      // we have reached the end
    
    buf.compact().limit(buf.position() +
                (bytesLeft<buf.remaining() ? (int)bytesLeft : buf.remaining()));
    return channel.read(buf); 
  }
}

This should result in the same output as the previous example.

How It Works

All the work is done in the indefinite while loop. Before the loop executes we create a direct buffer with a capacity of 1024 bytes by calling the allocateDirect() method. A direct buffer can be faster when we are reading a lot of data from a file, because the Java virtual machine will try to perform native I/O operations directly on it rather than copying the data through an intermediate buffer. The code within the loop determines whether there are data in the buffer by calling the remaining() method for the buffer object. The default settings for a new buffer, with the position at zero and the limit at the capacity, would falsely suggest that there are data in the buffer, so we set the position to the limit initially so that the remaining() method will return zero.
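
The effect of setting the position to the limit is easy to confirm. This is a minimal sketch with a hypothetical class name; it uses a non-direct buffer, but the position and limit behave in the same way for a direct one.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, for illustration only.
class EmptyBufferDemo {

  // Returns remaining() for a new buffer before and after the adjustment.
  static int[] remainingBeforeAndAfter(int capacity) {
    ByteBuffer buf = ByteBuffer.allocate(capacity);
    int before = buf.remaining();    // Equals the capacity for a new buffer
    buf.position(buf.limit());       // Mark the buffer as holding no data
    int after = buf.remaining();     // Now zero
    return new int[] {before, after};
  }

  public static void main(String[] args) {
    int[] result = remainingBeforeAndAfter(1024);
    System.out.println("Before: " + result[0] + "  After: " + result[1]);
  }
}
```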

Within the loop we first check whether there are sufficient bytes for the double value specifying the string length. On the first iteration, this will definitely not be the case so the replenish() method will be called to compact the buffer and read data from the file. We then flip the buffer and get the length of the string. Of course, data in the file should be in groups of three items – string length, string, and binary prime value – so the end-of-file will be detected when trying to obtain the first of these. In this case we exit the loop by executing a break statement.

Next we get the string itself, after checking that there are sufficient bytes left in the buffer. We should never find EOF so we put an assertion rather than a break if EOF is detected. Finally we obtain the binary prime value in a similar way and output the group of three data items. The loop continues until all data have been read and processed and EOF is recognized when we are looking for a string length value.
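
To see records straddling the end of the buffer, we can try the same technique with a deliberately small buffer. The following is a self-contained sketch rather than the program above: it writes a few records in the primes.txt layout to a temporary file and reads them back. The class LargeBufferDemo, the ensure() helper that wraps the three "check, replenish, flip" steps, and the use of the java.nio.file classes are all choices made for this illustration. It assumes ASCII strings and a buffer capacity at least as large as the longest string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names here are not part of the chapter's code.
class LargeBufferDemo {

  // Writes records in the primes.txt layout to a temporary file.
  static Path writeSample(long[] primes) throws IOException {
    Path file = Files.createTempFile("primes", ".bin");
    try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
      for (long prime : primes) {
        byte[] chars = ("prime = " + prime).getBytes(StandardCharsets.US_ASCII);
        ByteBuffer rec = ByteBuffer.allocate(8 + chars.length + 8);
        rec.putDouble(chars.length).put(chars).putLong(prime);
        rec.flip();
        while (rec.hasRemaining()) {
          out.write(rec);
        }
      }
    }
    return file;
  }

  // Same logic as replenish() in the text.
  static int replenish(FileChannel channel, ByteBuffer buf) throws IOException {
    long bytesLeft = channel.size() - channel.position();
    if (bytesLeft == 0L) {
      return -1;
    }
    buf.compact();
    buf.limit(buf.position() + (int) Math.min(bytesLeft, buf.remaining()));
    return channel.read(buf);
  }

  // Returns true once at least 'needed' bytes can be read from buf.
  // Returns false at EOF, or if the replenished buffer is still too short.
  static boolean ensure(FileChannel channel, ByteBuffer buf, int needed)
      throws IOException {
    if (buf.remaining() >= needed) {
      return true;
    }
    if (replenish(channel, buf) == -1) {
      return false;
    }
    buf.flip();
    return buf.remaining() >= needed;
  }

  static List<String> readAll(Path file, int capacity) throws IOException {
    List<String> records = new ArrayList<>();
    try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(capacity);
      buf.position(buf.limit());             // Start with an "empty" buffer
      while (ensure(in, buf, 8)) {           // EOF here ends the loop
        int strLength = (int) buf.getDouble();
        if (!ensure(in, buf, strLength)) {
          throw new IOException("File ends part way through a string");
        }
        byte[] chars = new byte[strLength];
        buf.get(chars);
        if (!ensure(in, buf, 8)) {
          throw new IOException("File ends part way through a prime value");
        }
        records.add(new String(chars, StandardCharsets.US_ASCII)
                    + " -> " + buf.getLong());
      }
    }
    return records;
  }

  // Writes four sample records, reads them back, and deletes the file.
  static List<String> run(int capacity) {
    try {
      Path file = writeSample(new long[] {2L, 3L, 5L, 541L});
      try {
        return readAll(file, capacity);
      } finally {
        Files.delete(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    for (String record : run(40)) {
      System.out.println(record);
    }
  }
}
```

With a capacity of 40 bytes and records of 25 or 27 bytes, the end of the buffer falls inside a string on one pass and inside a prime value on another, so each of the replenish points is exercised. Unlike the program in the text, this sketch throws an IOException instead of asserting when the file ends in the middle of a record.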