
With the basic built-in Java data types we have seen in the previous chapters, each identifier corresponds to a single variable. But when you want to handle sets of values of the same type – the first 1000 primes for example – you really don't want to have to name them individually. What you need is an array.
An array is a named set of variables of the same type. Each variable in the array is called an array element. To reference a particular element in an array you use the array name combined with an integer value of type int, called an index. The index for an array element is the offset of that particular element from the beginning of the array. The first element will have an index of 0, the second will have an index of 1, the third an index of 2, and so on. The index value does not need to be an integer literal. It can be any expression that results in a value of type int equal to or greater than zero. Obviously a for loop control variable is going to be very useful for processing array elements – which is why you had to wait until now to hear about arrays.
You are not obliged to create the array itself when you declare the array variable. The array variable is distinct from the array itself. You could declare the integer array variable primes with the statement:
int[] primes; // Declare an integer array variable
The variable primes is now a placeholder for an integer array that you have yet to define. No memory is allocated to hold the array itself at this point. We will see in a moment that to create the array itself we must specify its type and how many elements it is to contain. The square brackets following the type in the previous statement indicate that the variable is for referencing an array of int values, and not for storing a single value of type int.
You may come across an alternative notation for declaring an array variable:
int primes[]; // Declare an integer array variable
Here the square brackets appear after the variable name, rather than after the type name. This is exactly equivalent to the previous statement so you can use either notation. Many programmers prefer the original notation, as int[] tends to indicate more clearly that the type is an int array.
Once you have declared an array variable, you can define an array that it will reference:
primes = new int[10]; // Define an array of 10 integers
This statement creates an array that will store 10 values of type int, and records a reference to the array in the variable primes. The reference is simply where the array is in memory. You could also declare the array variable and define the array of type int to hold 10 prime numbers with a single statement, int[] primes = new int[10], as shown in the following illustration:
The first part of the definition specifies the type of the array. The type name, int in this case, is followed by an empty pair of square brackets to indicate you are declaring an array rather than a single variable of type int. The part following the equal sign defines the array. The keyword new indicates that you are allocating new memory for the array, and int[10] specifies you want capacity for 10 variables of type int in the array. Since each element in the primes array is an int variable requiring 4 bytes, the whole array will occupy 40 bytes, plus 4 bytes to store the reference to the array. When an array is created like this, all the array elements are initialized to a default value automatically. The initial value is zero in the case of an array of numerical values, false for boolean arrays, '\u0000' for arrays storing type char, and null for an array of a class type.
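If you want to see these default values for yourself, a small sketch along the following lines should do it. The class name ArrayDefaults and the method describe() are illustrative names chosen here, not part of the chapter's example programs.

```java
// A minimal sketch: report the default value of the first element
// in newly created arrays of several types.
class ArrayDefaults {
    static String describe() {
        int[] counts = new int[3];         // numeric elements start at zero
        double[] ratios = new double[3];   // floating point elements start at 0.0
        boolean[] flags = new boolean[3];  // boolean elements start at false
        char[] letters = new char[3];      // char elements start at character code 0
        String[] names = new String[3];    // class type elements start at null

        // The char is cast to int so that its code is shown rather than the character
        return counts[0] + " " + ratios[0] + " " + flags[0] + " "
               + (int) letters[0] + " " + names[0];
    }

    public static void main(String[] args) {
        System.out.println(describe());    // prints: 0 0.0 false 0 null
    }
}
```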
Before we go any further, let's clarify a bit of terminology we have been using in this discussion. A declaration for an array just defines the variable name. So the statement:
double[] myArray;
is a declaration for the array name, myArray. No memory has been allocated to store the array itself and the number of elements has not been defined.
The statement:
double[] myArray = new double[100];
is a declaration of the array variable myArray and a definition of the array, since the array size is specified. The variable myArray will refer to an array of 100 values of type double and each element will have the value 0.0 assigned by default.
You refer to an element of an array by using the array name followed by the element's index value enclosed between square brackets. You can specify an index value by any expression that produces zero or a positive result of type int. If you use a value of type long as an index, you will get an error message from the compiler; if your calculation of an index uses long variables you will need to cast it to type int. You will no doubt recall from Chapter 2 that expressions involving values of type short and type byte produce a result of type int, so you can use those in an index expression.
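As a rough illustration of the rule about index types, the sketch below uses a long and a byte to select an element. The class IndexTypes and the method elementAt() are hypothetical names introduced just for this example.

```java
// A minimal sketch of which index types need a cast.
class IndexTypes {
    // Writing data[position] with a long would not compile, so we cast to int
    static int elementAt(int[] data, long position) {
        return data[(int) position];
    }

    // A byte index is promoted to int automatically, so no cast is needed
    static int elementAt(int[] data, byte position) {
        return data[position];
    }

    public static void main(String[] args) {
        int[] primes = {2, 3, 5, 7, 11};
        long offset = 3L;
        byte small = 1;
        System.out.println(elementAt(primes, offset));  // prints 7
        System.out.println(elementAt(primes, small));   // prints 3
    }
}
```

Bear in mind that the cast simply discards the high-order bits of a long value that is too big for type int, so it is only safe when you know the value is in range.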
The first element of the primes array that we declared previously is referred to as primes[0], and you reference the fifth element in the array as primes[4]. The maximum index value for an array is one less than the number of elements in the array. Java checks that the index values you use are valid. If you use an index value that is less than 0, or greater than the index value for the last element in the array, an exception will be thrown – throwing an exception is just the way errors at execution time are signaled and there are different types of exceptions for signaling various kinds of errors. The exception in this case is called an ArrayIndexOutOfBoundsException, which is a specific kind of IndexOutOfBoundsException. When such an exception is thrown, your program will normally be terminated. We will be looking in detail at exceptions in Chapter 7, including how you can deal with exceptions and prevent termination of your program.
The array, primes, is what is sometimes referred to as a one-dimensional array, since each of its elements is referenced using one index – running from 0 to 9 in this case. We will see later that arrays can have two or more dimensions, the number of dimensions being the same as the number of indexes required to access an element of the array.
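To get a feel for the index checking, you could experiment with something like the following sketch. It uses a try block to catch the exception, a technique that is not explained until Chapter 7, so treat it as a preview. The exception caught is ArrayIndexOutOfBoundsException, the specific kind of IndexOutOfBoundsException that array accesses throw. The class BoundsCheck and the method read() are made-up names for this illustration.

```java
// A minimal sketch: read an element, reporting an invalid index
// instead of letting the program terminate.
class BoundsCheck {
    static String read(int[] data, int index) {
        try {
            return String.valueOf(data[index]);
        } catch (ArrayIndexOutOfBoundsException e) {
            return "invalid index " + index;
        }
    }

    public static void main(String[] args) {
        int[] primes = new int[10];             // Valid index values are 0 to 9
        System.out.println(read(primes, 9));    // prints 0, the default value
        System.out.println(read(primes, 10));   // prints: invalid index 10
        System.out.println(read(primes, -1));   // prints: invalid index -1
    }
}
```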
The array variable is separate from the array itself. Rather like the way an ordinary variable can refer to different values at different times, you can use an array variable to reference different arrays at different points in your program. Suppose you have declared and defined the variable primes as before:
int[] primes = new int[10]; // Allocate an array of 10 integer elements
This produces an array of 10 elements of type int. Perhaps a bit later in your program you want the array variable primes to refer to a larger array, with 50 elements say. You would simply write:
primes = new int[50]; // Allocate an array of 50 integer elements
Now the variable primes refers to a new array of values of type int that is entirely separate from the original. When this statement is executed, the previous array of 10 elements is discarded, along with all the data values you may have stored in it. The variable primes can now only be used to reference elements of the new array. This is illustrated in the next diagram.
After executing the statement shown in the diagram, the array variable primes now points to a new integer array of 50 elements, with index values running from 0 to 49. Although you can change the array that an array variable references, you can't alter the type of value that an element stores. All the arrays referenced by a given variable must correspond to the original type specified when the array variable was declared. The variable primes, for example, can only reference arrays of type int. We have used an int array in the illustration, but everything applies equally well to long or double or to any of the basic types. More than that, you can create arrays of any other type of object, including the classes that you will be defining yourself in Chapter 5.
You can initialize an array with your own values when you declare it, and at the same time determine how many elements it will have. Following the declaration of the array variable, simply add an equal sign followed by the list of element values enclosed between braces. For example, if you write:
int[] primes = {2, 3, 5, 7, 11, 13, 17}; // An array of 7 elements
the array is created with sufficient elements to store all of the initializing values that appear between the braces, seven in this case. The array size is determined by the number of initial values so no other information is necessary to define the array. If you specify initializing values for an array, you must include values for all the elements. If you only want to set some of the array elements to values explicitly, you should use an assignment statement for each element. For example:
int[] primes = new int[100];
primes[0] = 2;
primes[1] = 3;
The first statement declares and defines an integer array of 100 elements, all of which will be initialized to zero. The two assignment statements then set values for the first two array elements.
You can also initialize an array with an existing array. For example, you could declare the following array variables:
long[] even = {2L, 4L, 6L, 8L, 10L};
long[] value = even;
where the array even is used to initialize the array value in its declaration. This has the effect shown below.
You have created two array variables, but you only have one array. Both variables refer to the same set of elements and you can access the elements of the array through either variable name – for example, even[2] refers to the same variable as value[2]. One use for this is when you want to switch the arrays referenced by two variables. If you were sorting an array by repeatedly transferring elements from one array to another, by flipping the array you were copying from with the array you were copying to, you could use the same code. For example, if we declared array variables as:
double[] inputArray = new double[100];    // Array to be sorted
double[] outputArray = new double[100];   // Reordered array
double[] temp;                            // Temporary array reference
when we want to switch the array referenced by outputArray to be the new input array, we could write:
temp = inputArray;          // Save reference to inputArray in temp
inputArray = outputArray;   // Set inputArray to refer to outputArray
outputArray = temp;         // Set outputArray to refer to what was inputArray
None of the array elements are moved here. Just the addresses of where the arrays are located in memory are swapped, so this is a very fast process. Of course, if you want to replicate an array, you have to define a new array of the same size and type, and then copy each element of the array individually to your new array.
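The difference between sharing an array and replicating it can be sketched as follows. The class AliasVersusCopy and the method copyOf() are names invented for this illustration, and the loop relies on the length member of an array, which is described a little later in this chapter.

```java
// A minimal sketch: a second variable referencing the same array
// versus a separate element-by-element copy.
class AliasVersusCopy {
    // Creates a new array of the same size and type and copies each element
    static long[] copyOf(long[] source) {
        long[] copy = new long[source.length];
        for (int i = 0; i < source.length; i++) {
            copy[i] = source[i];
        }
        return copy;
    }

    public static void main(String[] args) {
        long[] even = {2L, 4L, 6L, 8L, 10L};
        long[] value = even;           // Second reference to the same array
        long[] copy = copyOf(even);    // An entirely separate array

        value[2] = 60L;                // Change an element through value

        System.out.println(even[2]);   // prints 60, the same element as value[2]
        System.out.println(copy[2]);   // prints 6, the copy is unaffected
    }
}
```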
You can use array elements in expressions in exactly the same way as you might use a single variable of the same data type. For example, if you declare an array samples, you can fill it with random values between 0.0 and 100.0 with the following code:
double[] samples = new double[50];   // An array of 50 double values
for(int i = 0; i < 50; i++)
  samples[i] = 100.0*Math.random();  // Generate random values
To show that array elements can be used in exactly the same way as ordinary variables, you could write:
double result = (samples[10]*samples[0] - Math.sqrt(samples[49]))/samples[29];
This is a totally arbitrary calculation of course. More sensibly, to compute the average of the values stored in the samples array, you could write:
double average = 0.0;        // Variable to hold the average
for(int i = 0; i < 50; i++)
  average += samples[i];     // Sum all the elements
average /= 50.0;             // Divide by the total number of elements
Within the loop we accumulate the sum of all the elements of the array samples in the variable average. We then divide this sum by the number of elements.
Notice how we use the length of the array, 50, all over the place. It appears in the for loop, and in floating point form as a divisor to calculate the average. When you use arrays you will often find that references to the length of the array are strewn all through your code. And if you later want to change the program, to handle 100 elements for instance, you need to be able to decide whether any particular value of 50 in the code is actually the number of elements, and therefore should be changed to 100, or if it is a value that just happens to be the same and should be left alone. Java helps you avoid this problem, as we will now see.
You can refer to the length of the array using length, a data member of the array object. For our array samples, we can refer to its length as samples.length. We could use this to write the calculation of the average as:
double average = 0.0;                    // Variable to hold the average
for(int i = 0; i < samples.length; i++)
  average += samples[i];                 // Sum all the elements
average /= samples.length;               // Divide by the total number of elements
Now the code is independent of the number of array elements. If you change the number of elements in the array, the code will automatically deal with that. You will also see in Chapter 6 that being able to obtain the length of an array in this way is very convenient in the context of coding your own class methods that process arrays. You should always use this approach when you need to refer to the length of an array – never use explicit values.
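As a taste of why this matters for methods that process arrays, here is a sketch of the averaging code wrapped up in a method that works for an array of any size. The class AverageDemo and the method average() are hypothetical names, and defining your own methods is not covered until later in the book.

```java
// A minimal sketch: an averaging method that relies only on length.
class AverageDemo {
    // Note: for an array with no elements this returns NaN (0.0 divided by 0)
    static double average(double[] values) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];              // Sum all the elements
        }
        return sum / values.length;        // Divide by the number of elements
    }

    public static void main(String[] args) {
        double[] three = {1.0, 2.0, 3.0};
        double[] five = {10.0, 20.0, 30.0, 40.0, 50.0};
        System.out.println(average(three));  // prints 2.0
        System.out.println(average(five));   // prints 30.0
    }
}
```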
Let's try out an array in an improved program to calculate prime numbers:
Try out the following code derived, in part, from the code we used in Chapter 2.
public class MorePrimes {
  public static void main(String[] args) {
    long[] primes = new long[20];    // Array to store primes
    primes[0] = 2;                   // Seed the first prime
    primes[1] = 3;                   // and the second
    int count = 2;                   // Count of primes found - up to now,
                                     // which is also the array index
    long number = 5;                 // Next integer to be tested

    outer:
    for( ; count < primes.length; number += 2) {
      // The maximum divisor we need to try is square root of number
      long limit = (long)Math.ceil(Math.sqrt((double)number));

      // Divide by all the primes we have up to limit
      for(int i = 1; i < count && primes[i] <= limit; i++) {
        if(number%primes[i] == 0) {  // Is it an exact divisor?
          continue outer;            // Yes, try the next number
        }
      }
      primes[count++] = number;      // We got one!
    }

    for(int i=0; i < primes.length; i++)
      System.out.println(primes[i]); // Output all the primes
  }
}
This program computes as many prime numbers as the capacity of the array primes will allow.
How It Works
Any number that is not a prime must be a product of prime factors, so we only need to divide a prime number candidate by prime numbers that are less than or equal to the square root of the candidate to test for whether it is prime. This is fairly obvious if you think about it. For every factor a number has that is greater than the square root of the number, the result of division by this factor is another factor that is less than the square root. You perhaps can see this more easily with a specific example. The number 24 has a square root that is a bit less than 5. You can factorize it as 2x12, 3x8, 4x6, then we come to cases where the first factor is greater than the square root so the second is less, 6x4, 8x3 etc., and so we are repeating the pairs of factors we already have.
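If you would like to check the factor pair argument for other numbers, a sketch like the one below lists each pair whose first factor does not exceed the square root. The class FactorPairs and the method pairs() are illustrative names only.

```java
// A minimal sketch: list the factor pairs of n whose first factor
// is no greater than the square root of n.
class FactorPairs {
    static String pairs(int n) {
        StringBuilder result = new StringBuilder();
        // d*d <= n is the same test as d <= square root of n
        for (int d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                if (result.length() > 0) {
                    result.append(", ");
                }
                result.append(d).append("x").append(n / d);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(pairs(24));   // prints: 2x12, 3x8, 4x6
        System.out.println(pairs(7));    // prints an empty line, 7 has no such factors
    }
}
```

Every pair that the loop does not reach, such as 6x4 or 8x3 for 24, is just one of the listed pairs with its factors reversed.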
We first declare the array primes to be of type long, and define it as having 20 elements. We set the first two elements of the primes array to 2 and 3 respectively to start the process off, as we will use the primes we have in the array as divisors when testing a new candidate. The variable, count, is the total number of primes we have found, so this starts out as 2. Note that we use count as the for loop counter, so we omit the first expression between parentheses in the loop statement as count has already been set.
The candidate to be tested is stored in number, with the first value set as 5. The for loop statement labeled outer is slightly unusual. First of all, the variable count that determines when the loop ends is not incremented in the for loop statement, but in the body of the loop. We use the third expression between the for loop parentheses to increment number in steps of two, since we don't want to check even numbers. The for loop ends when count is equal to the length of the array. We test the value in number in the inner for loop by dividing number by all of the prime numbers we have in the primes array that are less than, or equal to, the square root of the candidate. If we get an exact division the value in number is not prime, so we go immediately to the next iteration of the outer loop via the continue statement.
We calculate the limit for divisors we need to try with the statement:
long limit = (long)Math.ceil(Math.sqrt((double)number));
The Math.sqrt() method produces the square root of number as a double value, so if number has the value 7, for instance, a value of about 2.64575 will be returned. This is passed to the ceil() method that is also a member of the Math class. The ceil() method returns a value of type double that is the minimum whole number that is not less than the value passed to it. With number as 7, this will return 3.0, the smallest integral value not less than the square root of 7. We want to use this number as the limit for our integer divisors, so we cast it to type long and store the result in limit.
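You can see what limit this expression produces for a few candidate values with a sketch like this. The class DivisorLimit and the method limitFor() are names made up for the example.

```java
// A minimal sketch: the divisor limit calculation for several values.
class DivisorLimit {
    static long limitFor(long number) {
        return (long) Math.ceil(Math.sqrt((double) number));
    }

    public static void main(String[] args) {
        System.out.println(limitFor(7));    // prints 3, since the root is about 2.64575
        System.out.println(limitFor(9));    // prints 3, since the root is exactly 3.0
        System.out.println(limitFor(24));   // prints 5, since the root is a bit less than 5
        System.out.println(limitFor(25));   // prints 5, since the root is exactly 5.0
    }
}
```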
If we get no exact division, we exit normally from the inner loop and execute the statement:
primes[count++] = number; // We got one!
Because count is the number of values we have stored, it also corresponds to the index for the next free element in the primes array. Thus we use count as the index to the array element in which we want to store the value of number, and then increment count.
When we have filled the primes array, the outer loop will end and we will output all the values in the array. Note that, because we have used the length member of the primes object whenever we need the number of elements in the array, changing the number of elements in the definition of the array to generate a larger or smaller number of primes is simple.
We can express the logical process of the program with an algorithm as follows:
Take the number in question and determine its square root.
Set the limit for divisors to be the smallest integer that is not less than this square root value.
Test to see if the number can be divided exactly (without remainder) by any of the primes already in the primes array that are less than or equal to the limit for divisors.
If it can, discard the existing number and start a new iteration of the loop with the next candidate number. If it can't, it is a prime, so enter the existing number in the first available empty slot in the array and then move to the next iteration for a new candidate number.
If the array of primes is full, do no more iterations, and print out all the prime number values in the array.
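The steps above can also be sketched with the test for a single candidate separated out into its own method. This is a rearrangement for illustration rather than the MorePrimes program itself: the class PrimeSteps and the methods isPrimeCandidate() and firstPrimes() are hypothetical names, and this version starts from an empty array and tests every integer from 2 upwards instead of seeding the array and skipping even numbers.

```java
// A minimal sketch of the algorithm with the divisor test in its own method.
class PrimeSteps {
    // Assumes number is greater than every prime stored so far.
    // Returns false if any stored prime up to the limit divides number exactly.
    static boolean isPrimeCandidate(long number, long[] primes, int count) {
        long limit = (long) Math.ceil(Math.sqrt((double) number));
        for (int i = 0; i < count && primes[i] <= limit; i++) {
            if (number % primes[i] == 0) {
                return false;              // Exact divisor found, not a prime
            }
        }
        return true;
    }

    // Fills an array with the first howMany primes
    static long[] firstPrimes(int howMany) {
        long[] primes = new long[howMany];
        int count = 0;                     // Primes found, also the next free index
        for (long number = 2; count < primes.length; number++) {
            if (isPrimeCandidate(number, primes, count)) {
                primes[count++] = number;  // Store it in the first empty slot
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        long[] primes = firstPrimes(10);
        for (int i = 0; i < primes.length; i++) {
            System.out.println(primes[i]);
        }
    }
}
```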
We have only worked with one-dimensional arrays up to now, that is, arrays that use a single index. Why would you ever need the complications of using more indexes to access the elements of an array?
Suppose that you have a fanatical interest in the weather, and you are intent on recording the temperature each day at 10 separate geographical locations throughout the year 2002. Once you have sorted out the logistics of actually collecting this information, you can use an array of 10 elements corresponding to the number of locations, where each of these elements is an array of 365 elements to store the temperature values. You would declare this array with the statement:
float[][] temperature = new float[10][365];
This is called a two-dimensional array, since it has two dimensions – one with index values running from 0 to 9, and the other with index values from 0 to 364. The first index will relate to a geographical location, and the second index corresponds to the day of the year. That's much handier than a one-dimensional array with 3650 elements, isn't it?
The organization of the two-dimensional array is shown in the following diagram.
There are 10 arrays, each having 365 elements. In referring to an element, the first square brackets enclose the index for a particular array, and the second pair of square brackets enclose the index value for an element within that array. So to refer to the temperature for day 100 for the sixth location, you would use temperature[5][99]. Since each float variable occupies 4 bytes, the total space required to store the elements in this two-dimensional array is 10x365x4 bytes, which is a total of 14,600 bytes.
For a fixed second index value in a two-dimensional array, varying the first index is often referred to as accessing a column of the array. Similarly, fixing the first index value and varying the second, you access a row of the array. The reason for this terminology is apparent from the last diagram.
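A short sketch may make the row and column terminology more concrete. The class RowsAndColumns and the methods rowSum() and columnSum() are names invented for this example, which uses a small array created from nested lists of initial values so that the results are easy to check by hand.

```java
// A minimal sketch: summing along a row and along a column.
class RowsAndColumns {
    // Fix the first index and vary the second to move along a row
    static float rowSum(float[][] grid, int row) {
        float sum = 0.0f;
        for (int j = 0; j < grid[row].length; j++) {
            sum += grid[row][j];
        }
        return sum;
    }

    // Fix the second index and vary the first to move down a column
    static float columnSum(float[][] grid, int column) {
        float sum = 0.0f;
        for (int i = 0; i < grid.length; i++) {
            sum += grid[i][column];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[][] grid = { {1.0f, 2.0f, 3.0f},
                           {4.0f, 5.0f, 6.0f} };
        System.out.println(rowSum(grid, 1));      // prints 15.0 (4 + 5 + 6)
        System.out.println(columnSum(grid, 2));   // prints 9.0 (3 + 6)
    }
}
```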
You could just as well have used two statements to create the last array, one to declare the array variable, and the other to define the array:
float[][] temperature;                  // Declare the array variable
temperature = new float[10][365];       // Create the array
The first statement declares the array variable temperature for two-dimensional arrays of type float. The second statement creates the array with ten elements, each of which is an array of 365 elements.
Let's exercise this two-dimensional array in a program to calculate the average annual temperature for each location.
In the absence of real samples, we will generate the temperatures as random values between -10° and 35°. This assumes we are recording temperatures in degrees Celsius. If you prefer Fahrenheit you could use 14° to 95° to cover the same range.
public class WeatherFan {
  public static void main(String[] args) {
    float[][] temperature = new float[10][365];   // Temperature array

    // Generate random temperatures
    for(int i = 0; i < temperature.length; i++) {
      for(int j = 0; j < temperature[i].length; j++) {
        temperature[i][j] = (float)(45.0*Math.random() - 10.0);
      }
    }

    // Calculate the average per location
    for(int i = 0; i < temperature.length; i++) {
      float average = 0.0f;     // Place to store the average

      for(int j = 0; j < temperature[i].length; j++)
        average += temperature[i][j];

      // Output the average temperature for the current location
      System.out.println("Average temperature at location "
                    + (i+1) + " = " + average/(float)temperature[i].length);
    }
  }
}
How It Works
After declaring the array temperature we fill it with random values using nested for loops. Note how temperature.length used in the outer loop refers to the length of the first dimension, 10 in this case. In the inner loop we use temperature[i].length to refer to the length of the second dimension, 365. We could use any index value here; temperature[0].length would have been just as good for all the elements, since the lengths of the rows of the array are all the same in this case.
The Math.random() method generates a value of type double from 0.0 up to, but excluding, 1.0. This value is multiplied by 45.0 in the expression for the temperature, which results in values between 0.0 and 45.0. Subtracting 10.0 from this value gives us the range we require, -10.0 to 35.0.
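The same scaling idea works for any range: multiply by the width of the range and then add the lower bound. The sketch below separates the arithmetic from the random number so that you can check it with known inputs. The class TemperatureRange and the method scale() are hypothetical names used only here.

```java
// A minimal sketch: map a fraction between 0.0 and 1.0 onto a range.
class TemperatureRange {
    static double scale(double fraction, double low, double high) {
        return (high - low) * fraction + low;
    }

    public static void main(String[] args) {
        System.out.println(scale(0.0, -10.0, 35.0));   // prints -10.0, the lower bound
        System.out.println(scale(0.5, -10.0, 35.0));   // prints 12.5, the midpoint

        // With Math.random() supplying the fraction we get a random temperature
        float sample = (float) scale(Math.random(), -10.0, 35.0);
        System.out.println(sample);
    }
}
```

With low as -10.0 and high as 35.0 the width is 45.0, so the calculation matches the temperature expression used in the WeatherFan program.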
We then use another pair of nested for loops, controlled in the same way as the first, to calculate the averages of the stored temperatures. The outer loop iterates over the locations and the inner loop sums all the temperature values for a given location. Before the execution of the inner loop, the variable average is declared and initialized, and this is used to accumulate the sum of the temperatures for a location in the inner loop. After the inner loop has been executed, we output the average temperature for each location, identifying the locations by numbers 1 to 10, one more than the index value for each location. Note that the parentheses around (i+1) here are essential. To get the average we divide the variable average by the number of samples, which is temperature[i].length, the length of the array holding temperatures for the current location. Again, we could use any index value here since, as we have seen, they all return the same value, 365.
When you create an array of arrays, the arrays in the array do not need to be all the same length. You could declare an array variable samples with the statement:
float[][] samples; // Declare an array of arrays
This declares the array variable samples to be of type float[][]. You can then define the number of elements in the first dimension with the statement:
samples = new float[6][]; // Define 6 elements, each is an array
The variable samples now references an array with six elements, each of which can hold a reference to a one-dimensional array. You can define these arrays individually if you want:
samples[2] = new float[6];     // The 3rd array has 6 elements
samples[5] = new float[101];   // The 6th array has 101 elements
This defines two of the arrays. Obviously you cannot use an array until it has been defined, but you could conceivably use these two and define the others later – not a likely approach though!
If you wanted the array samples to have a triangular shape, with one element in the first row, two elements in the second row, three in the third row, and so on, you could define the arrays in a loop:
for(int i = 0; i < samples.length; i++)
  samples[i] = new float[i+1];   // Allocate each array
The effect of this is to produce an array layout that is shown in the diagram below.
The 21 elements in the array will occupy 84 bytes. When you need a two-dimensional array with rows of varying length, allocating them to fit the requirement can save a considerable amount of memory compared to just using rectangular arrays where the row lengths are all the same.
To check out that the array is as shown, you could implement this in a program, and display the length member for each of these arrays.
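One way you might write such a program is sketched below. The class TriangularArray and the methods triangle() and totalElements() are names chosen for this illustration.

```java
// A minimal sketch: build a triangular array and display the row lengths.
class TriangularArray {
    // Row i gets i+1 elements, so the rows have lengths 1, 2, 3, and so on
    static float[][] triangle(int rows) {
        float[][] samples = new float[rows][];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = new float[i + 1];   // Allocate each array
        }
        return samples;
    }

    // Adds up the lengths of all the rows
    static int totalElements(float[][] array) {
        int total = 0;
        for (int i = 0; i < array.length; i++) {
            total += array[i].length;
        }
        return total;
    }

    public static void main(String[] args) {
        float[][] samples = triangle(6);
        for (int i = 0; i < samples.length; i++) {
            System.out.println("Row " + i + " has length " + samples[i].length);
        }
        System.out.println("Total elements: " + totalElements(samples));  // 21
    }
}
```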
You are not limited to two-dimensional arrays either. If you are an international Java Bean grower with multiple farms across several countries, you could arrange to store the results of your bean counting in the array declared and defined in the statement:
long[][][] beans = new long[5][10][30];
The array, beans, has three dimensions. It provides for holding bean counts for each of up to 30 fields per farm, with 10 farms per country in each of 5 countries.
You can envisage this as just a three-dimensional array, but remember that beans is an array of five elements, each of which holds a two-dimensional array, and each of these two-dimensional arrays can be different. For example, if you really want to go to town, you can declare the array beans with the statement:
long[][][] beans = new long[3][][]; // Three two-dimensional arrays
Each of the three elements in the first dimension of beans can hold a different twodimensional array, so you could specify the first dimension of each explicitly with the statements:
beans[0] = new long[4][];
beans[1] = new long[2][];
beans[2] = new long[5][];
These three arrays have elements that each hold a one-dimensional array, and you can also specify the sizes of these independently. Note how the empty square brackets indicate there is still a dimension undefined. You could give the arrays in each of these elements random dimensions between 1 and 6 with the following code:
for(int i = 0; i < beans.length; i++)             // Vary over 1st dimension
  for(int j = 0; j < beans[i].length; j++)        // Vary over 2nd dimension
    beans[i][j] = new long[(int)(1.0 + 6.0*Math.random())];
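To see what you end up with, you could put these statements together and count the elements, as in the following sketch. The class BeanCounter and the method countElements() are made-up names. Because the innermost lengths are random, the total will vary from run to run.

```java
// A minimal sketch: build the irregular beans array and count its elements.
class BeanCounter {
    // Adds up the lengths of all the innermost one-dimensional arrays
    static int countElements(long[][][] beans) {
        int total = 0;
        for (int i = 0; i < beans.length; i++) {
            for (int j = 0; j < beans[i].length; j++) {
                total += beans[i][j].length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[][][] beans = new long[3][][];
        beans[0] = new long[4][];
        beans[1] = new long[2][];
        beans[2] = new long[5][];

        for (int i = 0; i < beans.length; i++) {
            for (int j = 0; j < beans[i].length; j++) {
                // (int)(1.0 + 6.0*Math.random()) is a whole number from 1 to 6
                beans[i][j] = new long[(int)(1.0 + 6.0*Math.random())];
            }
        }

        // There are 4 + 2 + 5 = 11 inner arrays, so the total is from 11 to 66
        System.out.println(countElements(beans));
    }
}
```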
If you can find a sensible reason for doing so, or if you are just a glutton for punishment, you can extend this to four, or more, dimensions.