Java Tutorial

Operations on Strings

There are many kinds of operations that can be performed on strings, but we can start with one you have used already, joining strings together, often called string concatenation.

Joining Strings

To join two String objects to form a single string you use the + operator, just as you have been doing with the argument to the println() method in the program examples thus far. The simplest use of this is to join two strings together:

myString = "The quick brown fox" + " jumps over the lazy dog";

This will join the two strings on the right of the assignment, and store the result in the String variable myString. The + operation generates a completely new String object that is separate from the original String objects that are the operands, and this new object is stored in myString.

Note that you can also use the += operator to concatenate strings. For example:

String phrase = "Too many";
phrase += " cooks spoil the broth";

After executing these statements the variable phrase will refer to the string "Too many cooks spoil the broth". Note that this does not modify the string "Too many". The string that is referenced by phrase after this statement has been executed is a completely new String object. This is illustrated on the following page.
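As a minimal sketch of this behavior, the following keeps a second reference to the original string before applying +=. The class name ConcatAssignDemo, the helper append(), and the variable original are our own illustrative names, not part of the examples in this chapter.

```java
class ConcatAssignDemo {
  // Hypothetical helper: returns the result of applying += to phrase
  static String append(String phrase, String suffix) {
    phrase += suffix;          // The local variable now refers to a new String object
    return phrase;
  }

  public static void main(String[] args) {
    String phrase = "Too many";
    String original = phrase;  // Second reference to the same String object

    phrase = append(phrase, " cooks spoil the broth");

    System.out.println(original);            // Too many
    System.out.println(phrase);              // Too many cooks spoil the broth
    System.out.println(original == phrase);  // false - two separate objects
  }
}
```

The string "Too many" is still intact and still referenced by original. Only the variable phrase has changed, and it now refers to a different object.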

Let's see how some variations on the use of the + operator with String objects work in an example.

Try It Out - String Concatenation

Enter the following code for the class JoinStrings:

public class JoinStrings {
   public static void main(String[] args) {

      String firstString = "Many ";
      String secondString = "hands ";
      String thirdString = "make light work";

      String myString;          // Variable to store results

      // Join three strings and store the result
      myString = firstString + secondString + thirdString;
      System.out.println(myString);

      // Convert an integer to String and join with two other strings
      int numHands = 99;
      myString = numHands + " " + secondString + thirdString;
      System.out.println(myString);

      // Combining a string and integers
      myString = "fifty five is " + 5 + 5;
      System.out.println(myString);

      // Combining integers and a string
      myString = 5 + 5 + " is ten";
      System.out.println(myString);
   }
}

If you run this example, it will produce some interesting results:

Many hands make light work
99 hands make light work
fifty five is 55
10 is ten

How It Works

The first line of output is quite straightforward. It simply joins the three string values stored in the String variables firstString, secondString, and thirdString into a single string, and stores this in the variable myString.

The second line of output is a use of the + operator we have used regularly with the println() method, but clearly something a little more complicated is happening here. This is illustrated below:

Behind the scenes, the value of the variable numHands is being converted to a string that represents this value as a decimal number. This is prompted by the fact that it is combined with the string literal, " ". Dissimilar types in a binary operation cannot be operated on, so one operand must be converted to the type of the other if the operation is to be possible. Here the compiler arranges that the numerical value stored in numHands is converted to type String to match the type of the right operand of the + operator. If you look back at the table of operator precedences, you will see that the associativity of the operator + is from left to right, so the strings are combined in pairs starting from the left, as shown in the diagram.

The left-to-right associativity of the + operator is important in understanding the next two lines of output. The two statements involved in creating these strings look very similar. Why does "5 + 5" result in 55 in one statement, and 10 in the other? The reason is illustrated below.

The essential difference between the two is that the first statement always has at least one operand of type String, so the operation is one of string concatenation, whereas in the second statement the first operation is an arithmetic add, as both operands are integers. In the first statement each of the integers is converted to type String individually. In the second, the numerical values are added, and the result, 10, is converted to a string representation to allow the literal " is ten" to be concatenated.
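If you want the arithmetic to be done first even when a string appears on the left, you can use parentheses to override the left-to-right grouping. The sketch below contrasts the two forms; the class name ConcatOrderDemo and its two helper methods are assumptions made purely for illustration.

```java
class ConcatOrderDemo {
  // Evaluated left to right: ("fifty five is " + 5) + 5
  static String withoutParentheses() {
    return "fifty five is " + 5 + 5;
  }

  // The parenthesized integer addition is carried out first
  static String withParentheses() {
    return "ten is " + (5 + 5);
  }

  public static void main(String[] args) {
    System.out.println(withoutParentheses());   // fifty five is 55
    System.out.println(withParentheses());      // ten is 10
  }
}
```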

You don't need to know about this at this point, but in case you were wondering, the conversion of values of the basic types to type String is actually accomplished by using a static method, toString(), of a standard class that corresponds to the basic type. Each of the basic types has an equivalent class defined, so for the types we have discussed earlier there are the following classes:

Basic Type	Wrapper Class
byte	Byte
short	Short
int	Integer
long	Long
float	Float
double	Double
boolean	Boolean
char	Character

A value of one of the basic types is passed to the toString() method of the corresponding class as an argument, and that returns the String equivalent. All of this happens automatically when you are concatenating strings using the + operator. As we shall see, not only these classes have a toString() method - all classes do. We won't go into the further significance of these classes now, as we'll be covering these in more detail in Chapter 5.

The String class also defines a method, valueOf(), that will create a String object from a value of any of the basic types. You just pass the value you want converted to a string as the argument to the method, for instance:

String doubleString = String.valueOf(3.14159);

You call the valueOf() method using the name of the class, String, as shown above. This is because the method is a static member of the String class. You will learn what this means in Chapter 5. A literal or variable of any of the basic types can be passed to the valueOf() method, and it will return a String representation of the value.
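To make the relationship between these conversions concrete, here is a small sketch that converts the same int value in three ways and checks that the results agree. The class name ToStringDemo and the helper allAgree() are hypothetical names introduced for this illustration only.

```java
class ToStringDemo {
  // Hypothetical helper: true if all three conversions produce equal strings
  static boolean allAgree(int value) {
    String viaWrapper = Integer.toString(value);  // Static method of the wrapper class
    String viaValueOf = String.valueOf(value);    // Static method of the String class
    String viaConcat  = "" + value;               // Implicit conversion by the + operator
    return viaWrapper.equals(viaValueOf) && viaValueOf.equals(viaConcat);
  }

  public static void main(String[] args) {
    int numHands = 99;
    System.out.println(Integer.toString(numHands));  // 99
    System.out.println(String.valueOf(3.14159));     // 3.14159
    System.out.println(String.valueOf(true));        // true
    System.out.println(allAgree(numHands));          // true
  }
}
```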

Comparing Strings

Here is where the difference between the String variable and the string it references will become apparent. To compare variables of the basic types for equality you use the == operator. This does not apply to String objects (or any other objects). The expression:

string1 == string2

will check whether the two String variables refer to the same string. If they reference separate strings, this expression will have the value false, regardless of whether or not the strings happen to be identical. In other words, the expression above does not compare the strings themselves, it compares the references to the strings, so the result will be true only if string1 and string2 both refer to one and the same string. We can demonstrate this with a little example.

Try It Out - Two Strings, Identical but not the Same

In the following code, we test to see whether string1 and string3 refer to the same string.

public class MatchStrings {
  public static void main(String[] args) {

    String string1 = "Too many ";
    String string2 = "cooks";
    String string3 = "Too many cooks";

    // Make string1 and string3 refer to separate strings that are identical
    string1 += string2;

    // Display the contents of the strings
    System.out.println("Test 1");
    System.out.println("string3 is now: " + string3);
    System.out.println("string1 is now: " + string1);

    if(string1 == string3)                        // Now test for identity
      System.out.println("string1 == string3 is true." +
                         " string1 and string3 point to the same string");
    else
      System.out.println("string1 == string3 is false." +
                  " string1 and string3 do not point to the same string");

    // Now make string1 and string3 refer to the same string
    string3 = string1;
    // Display the contents of the strings
    System.out.println("\n\nTest 2");
    System.out.println("string3 is now: " + string3);
    System.out.println("string1 is now: " + string1);

    if(string1 == string3)     // Now test for identity
      System.out.println("string1 == string3 is true." +
                         " string1 and string3 point to the same string");
    else
      System.out.println("string1 == string3 is false." +
                  " string1 and string3 do not point to the same string");
  }
}

We have created two scenarios. In the first, the variables string1 and string3 refer to separate strings that happen to be identical. In the second, they both reference the same string. This will produce the output:

Test 1
string3 is now: Too many cooks
string1 is now: Too many cooks
string1 == string3 is false. string1 and string3 do not point to the same string


Test 2
string3 is now: Too many cooks
string1 is now: Too many cooks
string1 == string3 is true. string1 and string3 point to the same string

How It Works

The three variables string1, string2, and string3 are initialized with the string literals you see. After executing the assignment statement, the string referenced by string1 will be identical to that referenced by string3, but as you see from the output, the comparison for equality in the if statement returns false because the variables refer to two separate strings.

Next we change the value of string3 so that it refers to the same string as string1. The output demonstrates that the if expression has the value true, and that the variables string1 and string3 do indeed refer to the same string. This clearly shows that the comparison is not between the strings themselves, but between the references to the strings. So how do we compare the strings?

Comparing Strings for Equality

To compare two String variables, that is, to decide whether the strings they reference are equal or not, you must use the method equals(), which is defined in the String class. This method does a case sensitive comparison. Two strings are equal if they are the same length, that is, have the same number of characters, and each character in one string is identical to the corresponding character in the other.

To check for equality between two strings ignoring the case of the string characters, you use the method equalsIgnoreCase(). Let's put these in the context of an example to see how they work.

Try It Out - String Identity

Make the following changes to the MatchStrings.java file of the previous example:

public class MatchStrings {
  public static void main(String[] args) {

    String string1 = "Too many ";
    String string2 = "cooks";
    String string3 = "Too many cooks";

    // Make string1 and string3 refer to separate strings that are identical
    string1 += string2;

    // Display the contents of the strings
    System.out.println("Test 1");
    System.out.println("string3 is now: " + string3);
    System.out.println("string1 is now: " + string1);

    if(string1.equals(string3))                  // Now test for equality
      System.out.println("string1.equals(string3) is true." +
                                 " so strings are equal.");
    else
      System.out.println("string1.equals(string3) is false." +
                          " so strings are not equal.");

    // Now make string1 and string3 refer to strings differing in case
    string3 = "TOO many cooks";
    // Display the contents of the strings
    System.out.println("\n\nTest 2");
    System.out.println("string3 is now: " + string3);
    System.out.println("string1 is now: " + string1);

    if(string1.equals(string3))                   // Compare for equality
      System.out.println("string1.equals(string3) is true " +
                                 " so strings are equal.");
    else
      System.out.println("string1.equals(string3) is false" +
                                 " so strings are not equal.");

    if(string1.equalsIgnoreCase(string3))        // Compare, ignoring case
      System.out.println("string1.equalsIgnoreCase(string3) is true" +
                                 " so strings are equal ignoring case.");
    else
      System.out.println("string1.equalsIgnoreCase(string3) is false" +
                                 " so strings are different.");
  }
}

If you run this example, you should get the output:

Test 1
string3 is now: Too many cooks
string1 is now: Too many cooks
string1.equals(string3) is true. so strings are equal.


Test 2
string3 is now: TOO many cooks
string1 is now: Too many cooks
string1.equals(string3) is false so strings are not equal.
string1.equalsIgnoreCase(string3) is true so strings are equal ignoring case.

How It Works

Before we look in detail at how the program works, let's first take some time to look at how the method calls that pepper the code are put together.

In the if expression, we've called the method equals() of the object string1 to test for equality with string3. This is the syntax we have been using to call the method println() in the object out. In general, to call a method belonging to an object you write the object name, then a period, then the name of the method. The parentheses following the method name enclose the information to be passed to the method - string3 in this case. The general form for calling a method for an object is shown below.

Important

We will learn more about this in Chapter 5, when we look at how to define our own classes. For the moment, just note that you don't necessarily need to pass any arguments to a method. On the other hand there can be several. It all depends on how the method was defined in the class.

The equals() method requires one argument that you put between the parentheses. This must be the String object that is to be compared with the original object. The method returns true if the value passed to it (string3 in our example) is identical to the string pointed to by the String object that owns the method, in this case string1. As you may have already guessed, we could just as well call the equals() method for the object string3, and pass string1 as the argument to compare the two strings. In this case, the expression to call the method would be:

string3.equals(string1)

and we would get exactly the same result.

Looking at the program code, after outputting the values of string3 and string1, the next line shows that calling the equals() method for string1 with string3 as the argument returns true. After the if, we make string3 reference a new string. We then compare the values of string1 and string3 once more, and, of course, the result of the comparison is now false.

Finally we compare string1 with string3 using the equalsIgnoreCase() method. Here the result is true since the strings differ only in the case of the second and third characters.

String Interning

Having convinced you of the necessity for using the equals() method for comparing strings, we can now reveal that there is a way to make comparing strings with the == operator effective. The mechanism to make this possible is called string interning. String interning ensures that no two String objects encapsulate the same string, so all String objects encapsulate unique strings. This means that if two String variables reference strings that are identical, the references must be identical too. To put it another way, if two String variables contain references that are not equal, they must refer to strings that are not equal. So how do we arrange that all String objects encapsulate unique strings? You just call the intern() method for every new String object that you create. For instance, let's amend a bit of an earlier example:

    String string1 = "Too many ";
    String string2 = "cooks";
    String string3 = "Too many cooks";

    // Make string1 and string3 refer to separate strings that are identical
    string1 += string2;
    string1 = string1.intern();          // Intern string1

The intern() method will check the string referenced by string1 against the pool of strings that have already been interned. If an equal string is already in the pool, intern() returns a reference to that existing object, so string1 will no longer refer to the object created by the concatenation. Otherwise, the string is added to the pool and a reference to it is returned. As a result, the expression string1 == string3 will evaluate to true, whereas without the call to intern() it evaluated to false.

All string constants and constant String expressions are automatically interned. Thus if you add another variable to the code fragment above:

String string4 = "Too " +"many ";

the reference stored in string4 will automatically be the same as the reference to the literal "Too many " that was originally stored in string1 (string1 itself now refers to the combined string). Only String expressions involving variables need to be interned. We could have written the statement that created the combined string to be stored in string1 with the statement:

string1 = (string1 + string2).intern();

This now interns the result of the expression (string1 + string2), ensuring that the reference stored in string1 will be unique.

String interning has two benefits. First, it reduces the amount of memory required for storing String objects in your program. If your program generates a lot of duplicate strings then this will be significant. Second, it allows the use of == instead of the equals() method when you want to compare strings for equality. Since the == operator just compares two references, it will be much faster than the equals() method, which may involve a sequence of character-by-character comparisons. This implies that you may make your program run much faster, but only in certain cases. Keep in mind that the intern() method itself has to use the equals() method to determine whether an equal string has already been interned, and it may have to compare the current string against several existing strings to do so. Realistically you should stick to using the equals() method in the majority of situations and only use interning when you are sure that the benefits outweigh the cost.
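The effect of interning can be seen in a short, self-contained sketch. The class name InternDemo and the helper join() are hypothetical; join() is there only to force the concatenation to happen at run time, so that the result is a new object rather than a compile-time constant.

```java
class InternDemo {
  // Hypothetical helper: joining two variables happens at run time,
  // so the result is a newly created String object
  static String join(String a, String b) {
    return a + b;
  }

  public static void main(String[] args) {
    String string3 = "Too many cooks";            // Literal - interned automatically
    String string1 = join("Too many ", "cooks");  // Separate object, same characters

    System.out.println(string1 == string3);       // false
    System.out.println(string1.equals(string3));  // true

    string1 = string1.intern();                   // Now refers to the pooled string
    System.out.println(string1 == string3);       // true

    String string4 = "Too " + "many ";            // Constant expression - interned
    System.out.println(string4 == "Too many ");   // true
  }
}
```

Note that equals() gives the right answer in every case here, which is why it remains the safer default.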

Checking the Start and End of a String

It can be useful to be able to check just part of a string. You can test whether a string starts with a particular character sequence by using the method startsWith(). If string1 has been defined as "Too many cooks", the expression string1.startsWith("Too") will have the value true. So would the expression string1.startsWith("Too man"). The comparison is case sensitive so the expression string1.startsWith("tOO") will be false.

A complementary method, endsWith(), checks for what appears at the end of a string, so the expression string1.endsWith("cooks") will have the value true. The test is case sensitive here, too.
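The expressions above can be tried out with a sketch along the following lines. The class name StartEndDemo and the helper wraps(), which combines both tests, are our own illustrative additions.

```java
class StartEndDemo {
  // Hypothetical helper: true if s begins with prefix and ends with suffix
  static boolean wraps(String s, String prefix, String suffix) {
    return s.startsWith(prefix) && s.endsWith(suffix);
  }

  public static void main(String[] args) {
    String string1 = "Too many cooks";

    System.out.println(string1.startsWith("Too"));      // true
    System.out.println(string1.startsWith("Too man"));  // true
    System.out.println(string1.startsWith("tOO"));      // false - case sensitive
    System.out.println(string1.endsWith("cooks"));      // true
    System.out.println(string1.endsWith("Cooks"));      // false - case sensitive
    System.out.println(wraps(string1, "Too", "cooks")); // true
  }
}
```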

Sequencing Strings

You will often need to place strings in order, for example, when you have a collection of names. Testing for equality doesn't help - what you need is the method compareTo() in the class String. This method compares the String object from which it is called with the argument passed to it, and returns an integer which is negative if the String object is less than the argument passed, zero if the String object is equal to the argument, and positive if the String object is greater than the argument. It is not that obvious what the terms 'less than', 'equal to', and 'greater than' mean when applied to strings, so let's define that a bit more precisely.

Two strings are compared in the compareTo() method by comparing successive corresponding characters, starting with the first character in each string. The process continues until a pair of corresponding characters are found to be different, or the last character in the shortest string is reached. Individual characters are compared by comparing their Unicode representations - so two characters are equal if the numeric values of their Unicode representations are equal. One character is greater than another if the numerical value of its Unicode representation is greater than that of the other.

One string is greater than another if it has a character greater than the corresponding character in the other string, and all the previous characters were equal. So if string1 has the value "mad dog", and string2 has the value "mad cat", then the expression:

string1.compareTo(string2)

will return a positive value as a result of comparing the fifth characters in the strings: the 'd' in string1 with the 'c' in string2.

What if the corresponding characters in both strings are equal up to the end of the shorter string, but the other string has more characters? In this case the longer string is greater than the shorter string, so "catamaran" is greater than "cat".

One string is less than another string if it has a character less than the corresponding character in the other string, and all the preceding characters are equal. Thus the following expression will return a negative value:

string2.compareTo(string1)

Two strings are equal if they contain the same number of characters and corresponding characters are identical. In this case the compareTo() method returns 0.

We can exercise the compareTo() method in a simple example.

Try It Out - Ordering Strings

We will just create three strings that we can compare using the compareTo() method. Enter the following code:

public class SequenceStrings {
  public static void main(String[] args) {

    // Strings to be compared
    String string1 = "A";
    String string2 = "To";
    String string3 = "Z";

    // Strings for use in output
    String string1Out = "\"" + string1 + "\"";     // string1 with quotes 
    String string2Out = "\"" + string2 + "\"";     // string2 with quotes 
    String string3Out = "\"" + string3 + "\"";     // string3 with quotes 

    // Compare string1 with string3
    if(string1.compareTo(string3) < 0) {
      System.out.println(string1Out + " is less than " + string3Out);

    } else {
      if(string1.compareTo(string3) > 0)
        System.out.println(string1Out + " is greater than " + string3Out);
      else
        System.out.println(string1Out + " is equal to " + string3Out);
    }

    // Compare string2 with string1
    if(string2.compareTo(string1) < 0) {
      System.out.println(string2Out + " is less than " + string1Out);

    } else {
      if(string2.compareTo(string1) > 0)
        System.out.println(string2Out + " is greater than " + string1Out);
      else
        System.out.println(string2Out + " is equal to " + string1Out);
    }
  }
}

The example will produce the output:

"A" is less than "Z"
"To" is greater than "A"

How It Works

You should have no trouble with this example. It declares and initializes three String variables, string1, string2, and string3. We then create three further String variables that correspond to the first three strings with double quote characters at the beginning and the end. This is just to simplify the output statements. We then have an if with a nested if to compare string1 with string3. We compare string2 with string1 in the same way.

As with the equals() method, the argument to the method compareTo() can be any expression that results in a String object.
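Since the motivation for compareTo() was placing strings in order, here is a minimal sketch that sorts a small array of strings using nothing but compareTo(). The class name OrderNames and the method sort() are assumptions for illustration, and the simple selection sort is just one possible way to do it, chosen because it is short.

```java
class OrderNames {
  // Hypothetical helper: sorts the array in place into ascending order
  static void sort(String[] names) {
    for (int i = 0; i < names.length - 1; i++) {
      int min = i;                                  // Index of smallest string so far
      for (int j = i + 1; j < names.length; j++) {
        if (names[j].compareTo(names[min]) < 0)     // names[j] is less than names[min]
          min = j;
      }
      String temp = names[i];                       // Swap smallest into position i
      names[i] = names[min];
      names[min] = temp;
    }
  }

  public static void main(String[] args) {
    String[] names = {"mad dog", "cat", "mad cat", "catamaran"};
    sort(names);
    for (int i = 0; i < names.length; i++)
      System.out.println(names[i]);   // cat, catamaran, mad cat, mad dog
  }
}
```

The output order reflects the rules described earlier: "cat" precedes "catamaran" because it is the shorter string with all its characters matching, and "mad cat" precedes "mad dog" because 'c' is less than 'd'.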

Accessing String Characters

When you are processing strings, sooner or later you will need to access individual characters in a String object. To refer to a character at a particular position in a string you use an index of type int that is the offset of the character position from the beginning of the string. This is exactly the same principle as we used for referencing an array element. The first character in a string is at position 0, the second is at position 1, the third is at position 2, and so on. However, although the principle is the same, the practice is not. You can't use square brackets to access characters in a string - you must use a method.

Extracting String Characters

You can extract a character from a String object by using the method charAt(). This accepts an argument that is the offset of the character position from the beginning of the string - in other words, an index. If you attempt to use an index that is less than 0 or greater than the index for the last position in the string, you will cause an exception to be thrown, which will cause your program to be terminated. We will discuss exactly what exceptions are, and how you should deal with them, in Chapter 7. For the moment, just note that the specific type of exception thrown in this case is called StringIndexOutOfBoundsException. It's rather a mouthful, but quite explanatory.

To avoid unnecessary errors of this kind, you obviously need to be able to determine the length of a String object. To obtain the length of a string, you just need to call its length() method. Note that this is different from the way you got the length of an array. Here you are calling a method, length(), in the class String, whereas with an array you were accessing a data member, length. We can explore the use of the charAt() and length() methods in the String class with another example.

Try It Out - Getting at Characters in a String

In the following code the soliloquy is analyzed character-by-character to determine the vowels, spaces, and letters used.

public class StringCharacters {
  public static void main(String[] args) {
    // Text string to be analyzed
    String text = "To be or not to be, that is the question;"
                 +"Whether 'tis nobler in the mind to suffer"
                 +" the slings and arrows of outrageous fortune,"
                 +" or to take arms against a sea of troubles,"
                 +" and by opposing end them?";
    int spaces  = 0,                                 // Count of spaces
        vowels  = 0,                                 // Count of vowels
        letters = 0;                                 // Count of letters

    // Analyze all the characters in the string
    int textLength = text.length();                 // Get string length

    for(int i = 0; i < textLength; i++) {
      // Check for vowels
      char ch = Character.toLowerCase(text.charAt(i));
      if(ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u')
        vowels++;

      //Check for letters
      if(Character.isLetter(ch))
        letters++;

      // Check for spaces
      if(Character.isWhitespace(ch))
        spaces++;
    }

    System.out.println("The text contained vowels:     " + vowels + "\n" +
                       "                   consonants: " + (letters-vowels) + "\n" +
                       "                   spaces:     " + spaces);
  }
}

Running the example, you'll see:

The text contained vowels:     60
                   consonants: 93
                   spaces:     37

How It Works

The String variable text is initialized with the quotation you see. All the counting of letter characters is done in the for loop, which is controlled by the index i. The loop continues as long as i is less than the length of the string, which is returned by the method text.length() and which we saved in the variable textLength.

Starting with the first character, which has the index value 0, each character is retrieved from the string by calling its charAt() method. The loop index i is used as the index to the character position in the string. The method returns the character at index position i as a value of type char, and we convert this to lower case, where necessary, by calling the static method toLowerCase() in the class Character. The character to be converted is passed as an argument and the method returns either the original character or, if it is upper case, the lower case equivalent. This enables us to deal with the string in just one case.

There is an alternative to using the toLowerCase() method in the Character class. The String class also contains a method toLowerCase() that will convert a whole string and return the converted string. You could convert the string text to lower case with the statement:

text = text.toLowerCase();    // Convert string to lower case

This statement replaces the original string with the lower case equivalent. If you wanted to retain the original, you could store the lower case string in another variable of type String. For converting strings to upper case, the class String also has a method toUpperCase() which is used in the same way.
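The following sketch shows the whole-string conversion in use, and that the string you start with is left untouched. The class name CaseDemo and the helper countVowels() are hypothetical names introduced here; countVowels() simply applies the vowel test from the example above to a string that has been converted once, up front.

```java
class CaseDemo {
  // Hypothetical helper: counts vowels after converting the whole string once
  static int countVowels(String text) {
    String lower = text.toLowerCase();   // New string - text itself is not modified
    int vowels = 0;
    for (int i = 0; i < lower.length(); i++) {
      char ch = lower.charAt(i);
      if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u')
        vowels++;
    }
    return vowels;
  }

  public static void main(String[] args) {
    String text = "To be or not to be";
    String upper = text.toUpperCase();       // Keep the original in text

    System.out.println(text);                // To be or not to be
    System.out.println(upper);               // TO BE OR NOT TO BE
    System.out.println(countVowels(upper));  // 6
  }
}
```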

The if expression checks for any of the vowels by ORing the comparisons for the five vowels together. If the expression is true we increment the vowels count. To check for a letter of any kind we use the isLetter() method in the class Character, and accumulate the total letter count in the variable letters. This will enable us to calculate the number of consonants by subtracting the number of vowels from the total number of letters. Finally, the loop code checks for a space by using the isWhitespace() method in the class Character. This method returns true if the character passed as an argument is a Unicode whitespace character. As well as spaces, whitespace in Unicode also includes horizontal and vertical tab, newline, carriage return, and form-feed characters. If you just wanted to count the blanks in the text, you could compare for a blank character. After the for loop ends, we just output the results.

Searching Strings for Characters

The class String provides two methods for searching a string: indexOf() and lastIndexOf(). Both of these come in four different flavors to provide a range of search possibilities. The basic choice is whether you want to search for a single character, or for a substring; so let's look first at the options for searching a string for a given character.

To search a string text for a single character, 'a' for example, you could write:

int index = 0;             // Position of character in the string
index = text.indexOf('a'); // Find first index position containing 'a'

The method indexOf() will search the contents of the string text forwards from the beginning, and return the index position of the first occurrence of 'a'. If 'a' is not found, the method will return the value -1.

Important

This is characteristic of both the search methods in the class String. They always return either the index position of what is sought or -1 if the search objective is not found. It is important that you check the index value returned for -1 before you use it to index a string, otherwise you will get an error when you don't find what you are looking for.

If you wanted to find the last occurrence of 'a' in the String variable text, you just use the method lastIndexOf():

index = text.lastIndexOf('a');  // Find last index position containing 'a'

The method searches the string backwards, starting with the last character in the string. The variable index will therefore contain the index position of the last occurrence of 'a', or -1 if it is not found.
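A short sketch of both searches, including the check for -1, might look like this. The class name FindCharDemo and the helper describe() are assumptions made for the purpose of illustration.

```java
class FindCharDemo {
  // Hypothetical helper: reports where ch first and last occurs in text
  static String describe(String text, char ch) {
    int first = text.indexOf(ch);          // Search forwards from the beginning
    if (first == -1)                       // Always check before using the index
      return "'" + ch + "' not found";
    int last = text.lastIndexOf(ch);       // Search backwards from the end
    return "'" + ch + "' first at " + first + ", last at " + last;
  }

  public static void main(String[] args) {
    String text = "To be or not to be";
    System.out.println(describe(text, 'b'));   // 'b' first at 3, last at 16
    System.out.println(describe(text, 'z'));   // 'z' not found
  }
}
```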

We can find the first and last occurrences of a character, but what about the ones in the middle? Well, there's a variation of each of the above methods that has a second argument to specify a 'from position', from which to start the search. To search forwards from a given position, startIndex, you would write:

index = text.indexOf('a', startIndex);

This version of the method indexOf() searches the string for the character specified by the first argument starting with the position specified by the second argument. You could use this to find the first 'b' that comes after the first 'a' in a string with the statements:

int aIndex = -1;                        // Position of 1st 'a'
int bIndex = -1;                        // Position of 1st 'b' after 'a'
aIndex = text.indexOf('a');             // Find first 'a'
if(aIndex >= 0)
   bIndex = text.indexOf('b', ++aIndex); // Find 1st 'b' after 1st 'a'

Once we have the index value from the initial search for 'a', we need to check that 'a' was really found by verifying that aIndex is not negative. We can then search for 'b' from the position following 'a'. As you can see, the second argument of this version of the method indexOf() is separated from the first argument by a comma. Since the second argument is the index position from which the search is to start, and aIndex is the position at which 'a' was found, we should increment aIndex to the position following 'a' before using it in the search for 'b' to avoid checking for 'b' in the position we already know contains 'a'.

If 'a' happened to be the last character in the string, it wouldn't matter, since the indexOf() method just returns -1 if the index value is beyond the last character in the string. If you somehow supplied a negative index value to the method, it would simply search the whole string from the beginning.
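The 'from position' form of indexOf() is all you need to visit every occurrence of a character, not just the first one. The following is a minimal sketch of the idea. The class name CountCharacter and the helper method countOf() are our own inventions for illustration and are not part of the String class:

```java
class CountCharacter {
  // Hypothetical helper: counts how many times ch occurs in text
  static int countOf(String text, char ch) {
    int count = 0;
    int index = text.indexOf(ch);           // Find the first occurrence
    while(index >= 0) {                     // -1 means there are no more
      ++count;
      index = text.indexOf(ch, index + 1);  // Search from the next position
    }
    return count;
  }

  public static void main(String[] args) {
    String text = "To be or not to be";
    System.out.println(countOf(text, 'o'));     // 4
    System.out.println(countOf(text, 'e'));     // 2, the last 'e' ends the string
    System.out.println(countOf(text, 'z'));     // 0
    System.out.println(text.indexOf('T', -5));  // 0, a negative start searches everything
  }
}
```

Note that no special case is needed when the character found is the last one in the string: index + 1 is then equal to the length of the string, and indexOf() simply returns -1.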

Searching for Substrings

The methods indexOf() and lastIndexOf() also come in versions that accept a string as the first argument, which will search for this string rather than a single character. In all other respects they work in the same way as the character searching methods we have just seen. The complete set of indexOf() methods is:

Method	Description
indexOf(int ch)	Returns the index position of the first occurrence of the character ch in the String for which the method is called. If the character ch does not occur, -1 is returned.
indexOf(int ch, int index)	Same as the method above, but with the search starting at position index. If index is negative, the whole string is searched; if index is greater than or equal to the length of the String object, -1 is returned.
indexOf(String str)	Returns the index position of the first occurrence of the substring str in the String object for which the method is called. If the substring str does not occur, -1 is returned.
indexOf(String str, int index)	Same as the method above, but with the search starting at position index. If index is negative, the whole string is searched; if index is greater than or equal to the length of the String object, -1 is returned for any non-empty str.

The four flavors of the lastIndexOf() method have the same parameters as the four versions of the indexOf() method. The difference is that the last occurrence of the character or substring that is sought is returned by the lastIndexOf() method.
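For lastIndexOf(), the index argument is the position at which the backwards search starts, so only occurrences that begin at or before that position can be found. Here is a short sketch of this behavior; the class name SearchBackwards is just our choice for the illustration:

```java
class SearchBackwards {
  public static void main(String[] args) {
    String text = "To be or not to be";

    System.out.println(text.lastIndexOf("be"));      // 16, the last occurrence
    System.out.println(text.lastIndexOf("be", 15));  // 3, must begin at or before 15
    System.out.println(text.lastIndexOf('o', 9));    // 6
    System.out.println(text.lastIndexOf("be", 2));   // -1, nothing that early
  }
}
```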

The method startsWith() that we mentioned earlier also comes in a version that accepts an additional argument that is an offset from the beginning of the string being checked. The check for the matching character sequence then begins at that offset position. If you have defined a string as:

String string1 = "The Ides of March";

then the expression string1.startsWith("Ides", 4) will have the value true.
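If you want to convince yourself of how the offset works, a few lines are enough. This sketch uses a class name of our own, StartsWithOffset, and the same string as above:

```java
class StartsWithOffset {
  public static void main(String[] args) {
    String string1 = "The Ides of March";

    System.out.println(string1.startsWith("Ides"));       // false, the string starts with "The"
    System.out.println(string1.startsWith("Ides", 4));    // true
    System.out.println(string1.startsWith("March", 12));  // true
    System.out.println(string1.startsWith("Ides", 40));   // false, offset is beyond the end
  }
}
```

An offset that is negative or beyond the end of the string does not cause an error; the method just returns false.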

We can show the indexOf() and lastIndexOf() methods at work with substrings in an example.

Try It Out - Exciting Concordance Entries

We'll use the indexOf() method to search the quotation we used in the last example for "and" and the lastIndexOf() method to search for "the".

public class FindCharacters {
  public static void main(String[] args) {
    // Text string to be analyzed
    String text = "To be or not to be, that is the question;"
                + " Whether 'tis nobler in the mind to suffer"
                + " the slings and arrows of outrageous fortune,"
                + " or to take arms against a sea of troubles,"
                + " and by opposing end them?";

    int andCount = 0;               // Number of ands
    int theCount = 0;               // Number of thes

    int index = -1;                 // Current index position

    String andStr = "and";          // Search substring
    String theStr = "the";          // Search substring

    // Search forwards for "and"
    index = text.indexOf(andStr);   // Find first 'and'
    while(index >= 0) {
      ++andCount;
      index += andStr.length();    // Step to position after last 'and'
      index = text.indexOf(andStr, index);
    }

    // Search backwards for "the"
    index = text.lastIndexOf(theStr);   // Find last 'the'
    while(index >= 0) {
      ++theCount;
      index -= theStr.length();      // Step to position before last 'the'
      index = text.lastIndexOf(theStr, index);
    }
    System.out.println("The text contains " + andCount + " ands\n"
                     + "The text contains " + theCount + " thes");
  }
}

The program will produce the output:

The text contains 2 ands
The text contains 5 thes

Important

If you were expecting the "the" count to be 3, note that there is one instance in "Whether" and another in "them". If you want to find three, you need to refine your program to eliminate such pseudo-occurrences by checking the characters on either side of the "the" substring.
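One possible refinement is sketched here. It rejects any match that has a letter immediately before or after it. The class name CountWholeWords and the helper countWord() are hypothetical names of our own, and the test used (Character.isLetter()) is just one reasonable definition of a word boundary:

```java
class CountWholeWords {
  static final String TEXT = "To be or not to be, that is the question;"
                           + " Whether 'tis nobler in the mind to suffer"
                           + " the slings and arrows of outrageous fortune,"
                           + " or to take arms against a sea of troubles,"
                           + " and by opposing end them?";

  // Hypothetical helper: counts the occurrences of word in text that have
  // no letter immediately before or after them. word must not be empty.
  static int countWord(String text, String word) {
    int count = 0;
    int index = text.indexOf(word);
    while(index >= 0) {
      int end = index + word.length();       // Position following the match
      boolean letterBefore = index > 0
                          && Character.isLetter(text.charAt(index - 1));
      boolean letterAfter = end < text.length()
                         && Character.isLetter(text.charAt(end));
      if(!letterBefore && !letterAfter)
        ++count;                             // A word in its own right
      index = text.indexOf(word, end);       // Look for the next candidate
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println("The text contains " + countWord(TEXT, "and") + " ands");
    System.out.println("The text contains " + countWord(TEXT, "the") + " thes");
  }
}
```

With this version the "the" count is 3. The checks on index and end come before the calls to charAt() so that we never ask for a character outside the string when a match sits at the very beginning or end. The search is still case sensitive, so "The" would not be counted.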

How It Works

We define the String variable, text, as before, and set up two counters, andCount and theCount, for the two words. The variable index will keep track of the current position in the string. We then have String variables andStr and theStr holding the substrings we will be searching for.

To find the instances of "and", we first find the index position of the first occurrence of "and" in the string text. If this index is negative, text does not contain "and", and the while loop will not execute because its condition is false the first time it is tested. Assuming there is at least one "and", the while loop block is executed and andCount is incremented for the instance of "and" we have just found. The method indexOf() returns the index position of the first character of the substring, so we have to move the index forward to the character following the last character of the substring we have just found. This is done by adding the length of the substring, as shown in the following diagram:

We can then search for the next occurrence of the substring by passing the new value of index to the method indexOf(). The loop continues as long as the index value returned is not -1.

To count the occurrences of the substring "the" the program searches the string text backwards, by using the method lastIndexOf() instead of indexOf(). This works in much the same way, the only significant difference being that we decrement the value of index by the length of the substring, instead of incrementing it. This is because the next occurrence of the substring, searching backwards, has to start at least that many characters before the first character of the substring we have just found. If the string "the" happened to occur at the beginning of the string we are searching, the lastIndexOf() method would be called with a negative value for index. This would not cause any problem; it would just result in -1 being returned.

Extracting Substrings

The String class includes a method, substring(), that will extract a substring from a string. There are two versions of this method. The first version will extract a substring consisting of all the characters from a given index position to the end of the string. This works as illustrated in the following code fragment:

String place = "Palm Springs";
String lastWord = place.substring(5);

After executing these statements, lastWord will contain the string "Springs". The substring is copied from the original to form a new string. This is useful when a string has basically two constituent substrings, but a more common requirement is to extract several substrings from a string where each substring is separated from the next by a special character such as a comma, a slash, or even just a space. The second version of substring() will help with this.

You can extract a substring from a string by specifying the index positions of the first character in the substring and one beyond the last character of the substring as arguments to the method substring(). With the variable place being defined as before, the following statement will result in the variable segment being set to the string "ring":

String segment = place.substring(7, 11);

Important

The substring() method is not like the indexOf() method when it comes to illegal index values. With either version of the method substring(), if you specify an index that is negative or greater than the length of the string, or a start index that is greater than the end index, you will get an error. As with the charAt() method, substring() will throw a StringIndexOutOfBoundsException.
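Since indexOf() reports failure by returning -1, and substring() will not tolerate -1, it pays to check one before calling the other. Here is a small sketch of that pattern. The class SubstringBounds and its helper before() are hypothetical names introduced only for this illustration:

```java
class SubstringBounds {
  // Hypothetical helper: returns the part of text that precedes the first
  // occurrence of separator, or the whole of text if separator is absent
  static String before(String text, char separator) {
    int index = text.indexOf(separator);
    if(index < 0)
      return text;                       // substring(0, -1) would throw
    return text.substring(0, index);
  }

  public static void main(String[] args) {
    String place = "Palm Springs";
    System.out.println(before(place, ' '));            // Palm
    System.out.println(before(place, '-'));            // Palm Springs
    System.out.println(place.substring(12).length());  // 0, index equal to the length is legal

    try {
      place.substring(13);                             // One beyond the legal limit
    } catch(StringIndexOutOfBoundsException e) {
      System.out.println("Index out of bounds");
    }
  }
}
```

The last line of output is "Index out of bounds", because 13 is greater than the length of the string, which is 12.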

We can see how substring() works with a more substantial example.

Try It Out - Word for Word

We can use the indexOf() method in combination with the substring() method to extract a sequence of substrings that are separated by spaces from a single string:

public class ExtractSubstring {
  public static void main(String[] args) {
    String text = "To be or not to be";        // String to be segmented
    int count = 0;                             // Number of substrings
    char separator = ' ';                      // Substring separator

    // Determine the number of substrings
    int index = 0;
    do {
      ++count;                                 // Increment count of substrings
      ++index;                                 // Move past last position
      index = text.indexOf(separator, index);
    } while (index != -1);

    // Extract the substring into an array
    String[] subStr = new String[count];       // Allocate for substrings
    index = 0;                                 // Substring start index
    int endIndex = 0;                          // Substring end index
    for(int i = 0; i < count; i++) {
      endIndex = text.indexOf(separator,index);  // Find next separator

      if(endIndex == -1)                       // If it is not found
        subStr[i] = text.substring(index);     // extract to the end
      else                                             // otherwise
        subStr[i] = text.substring(index, endIndex);   // to end index

      index = endIndex + 1;                    // Set start for next cycle
    }

    // Display the substrings
    for(int i = 0; i < subStr.length; i++)
      System.out.println(subStr[i]);
  }
}

When you run this example, you should get the output:

To
be
or
not
to
be

How It Works

After setting up the string text to be segmented into substrings, a count variable to hold the number of substrings, and the separator character, separator, the program has three distinct phases.

The first phase counts the number of substrings by using the indexOf() method to find separators. The number of separators is always one less than the number of substrings. By using the do-while loop, we ensure that the value of count will be one more than the number of separators.

The second phase extracts the substrings in sequence from the beginning of the string, and stores them in an array of String variables that has count elements. Following each substring from the first to the penultimate is a separator, so we use the version of the substring() method that accepts two index arguments for these. The last substring is signaled by a failure to find the separator character, in which case endIndex will be -1. In this case we use the substring() method with a single argument to extract the substring through to the end of the string text.

The third phase simply outputs the contents of the array by displaying each element in turn, using a for loop.

What we have been doing here is breaking a string up into tokens - substrings in other words - that are separated by delimiters - characters that separate one token from the next. This is a sufficiently frequent requirement that Java provides you with an easier way to do this - using the StringTokenizer class.

Using a String Tokenizer

We can use an object of the StringTokenizer class to do what we did in the previous example. You can construct a StringTokenizer that can process a given string like this:

String text = "To be or not to be";                 // String to be segmented
StringTokenizer st = new StringTokenizer(text);     // Create a tokenizer for it

The tokenizer object st that we have created here will assume that a delimiter can be a space, a tab, a newline character, a carriage return, or a form-feed character. It is also possible to specify your own set of delimiters when you create the tokenizer object. For example, if we only wanted a comma or a space to be considered as a delimiter we could create the tokenizer with the statement:

StringTokenizer st = new StringTokenizer(text, " ,"); // Tokenize using , or space

The second argument is a string containing all the characters that are to be considered as delimiters in the string text.

First of all, you can call the countTokens() method for the StringTokenizer object to determine how many tokens the string contains. This is handy when you want to store the tokens away in an array as it gives you the means to create the array ahead of time, like this:

String[] subStr = new String[st.countTokens()];

The countTokens() method returns an int value that is the number of tokens in the string - assuming you haven't extracted any in the way we will see next. If you have extracted tokens, the value returned will be the number remaining in the string. Now we have an array that is just large enough to accommodate all the tokens in the string text. All we have to do is extract them.

You can use the StringTokenizer object to pass once through the string to extract each of the tokens in turn. Calling the nextToken() method for the StringTokenizer object will return a reference to a String object that is the next token in the string being processed. We could therefore extract all the tokens like this:

for(int i = 0; i < subStr.length; i++)
  subStr[i] = st.nextToken();

The StringTokenizer object also has a method hasMoreTokens() that returns true if the string contains more tokens and false when there are none left. We could therefore also extract all the tokens from our string like this:

int i = 0;
while(st.hasMoreTokens() && i < subStr.length)
  subStr[i++] = st.nextToken();

The loop will continue to extract tokens from the string as long as there are still tokens left, and as long as we have not filled the array, subStr. Of course, we should never run off the end of the array, since we created it to accommodate exactly the number of tokens in the string, but it does no harm here to verify that we don't. It is also a reminder of how you can use the && operator.
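To see how your own delimiters and the countTokens() method behave together, you could try something like the following sketch. The class name CountRemainingTokens and the sample text are made up for the illustration:

```java
import java.util.StringTokenizer;

class CountRemainingTokens {
  public static void main(String[] args) {
    String text = "apples, pears,plums  cherries";
    StringTokenizer st = new StringTokenizer(text, " ,");  // Comma or space

    System.out.println(st.countTokens());    // 4
    System.out.println(st.nextToken());      // apples
    System.out.println(st.countTokens());    // 3, one token has been extracted

    while(st.hasMoreTokens())
      System.out.println(st.nextToken());    // pears, plums, cherries in turn
  }
}
```

Notice that a run of adjacent delimiters, such as the comma followed by a space after "apples", counts as a single separator. The tokenizer does not produce empty tokens.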

Try It Out - Using a Tokenizer

Based on what we have just discussed, the whole program to do what the previous example did is as follows:

import java.util.StringTokenizer;                  // Import the tokenizer class

public class TokenizeAString {
  public static void main(String[] args) {
    String text = "To be or not to be";             // String to be segmented
    StringTokenizer st = new StringTokenizer(text); // Create a tokenizer for it
    String[] subStr = new String[st.countTokens()]; // Array to hold the tokens

    // Extract the tokens
    for(int i = 0; i < subStr.length; i++) {
      subStr[i] = st.nextToken();
    }

    // Display the substrings
    for(int i = 0; i < subStr.length; i++) {
      System.out.println(subStr[i]);
    }
  }
}

The import statement is necessary because the StringTokenizer class is not in the java.lang package, whose classes are imported by default, but in the java.util package. The program should produce output that is identical to that of the previous example. It's a lot simpler though, isn't it?
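If you are using Java 1.4 or later, be aware that the documentation for StringTokenizer describes it as a legacy class and suggests the split() method of the String class instead. As a rough sketch, and only for comparison, the same program might look like this with split(); the class name SplitAString is our own:

```java
class SplitAString {
  public static void main(String[] args) {
    String text = "To be or not to be";        // String to be segmented
    String[] subStr = text.split(" ");         // The argument is a regular expression

    // Display the substrings
    for(int i = 0; i < subStr.length; i++) {
      System.out.println(subStr[i]);
    }
  }
}
```

The two approaches are not identical in every case. With a single space as the pattern, two adjacent spaces in the text produce an empty string in the array, whereas the tokenizer would treat them as one separator.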

Modified Versions of String Objects

There are a couple of methods that you can use to create a new String object that is a modified version of an existing String object. They don't change the original string, of course - as we said, String objects are immutable. To replace one specific character with another throughout a string, you can use the replace() method. For example, to replace each space in our string text with a slash, you could write:

String newText = text.replace(' ', '/');     // Modify the string text

The first argument of the replace() method specifies the character to be replaced, and the second argument specifies the character that is to be substituted in its place. We have stored the result in a new variable newText here, but you could save it back in the original String variable, text, if you wanted.

To remove whitespace from the beginning and end of a string (but not the interior) you can use the trim() method. You could apply this to a string as follows:

String sample = "   This is a string   ";
String result = sample.trim();

after which the String variable result will contain the string "This is a string". This can be useful when you are segmenting a string into substrings and the substrings may contain leading or trailing blanks. For example, this might arise if you were analyzing an input string that contained values separated by one or more spaces.
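As an illustration of that situation, the following sketch separates a string at commas with a StringTokenizer and trims each piece as it is extracted. The class name TrimFields, the helper fields(), and the sample input are all assumptions made for the example:

```java
import java.util.StringTokenizer;

class TrimFields {
  // Hypothetical helper: splits text at commas and trims each piece
  static String[] fields(String text) {
    StringTokenizer st = new StringTokenizer(text, ",");
    String[] result = new String[st.countTokens()];
    for(int i = 0; i < result.length; i++)
      result[i] = st.nextToken().trim();     // Drop blanks around each value
    return result;
  }

  public static void main(String[] args) {
    String input = "  apples ,  pears,plums   ";
    String[] values = fields(input);
    for(int i = 0; i < values.length; i++)
      System.out.println("[" + values[i] + "]");
  }
}
```

The brackets in the output make it easy to see that the blanks have gone: the program displays [apples], [pears], and [plums] on successive lines.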

Creating Character Arrays from String Objects

You can create an array of variables of type char from a String variable by using the toCharArray() method in the class String. Because this method returns an array of type char, you only need to declare the array variable of type char[] - you don't need to allocate the array. For example:

String text = "To be or not to be";
char[] textArray = text.toCharArray();    // Create the array from the string

The toCharArray() method will return an array containing the characters of the String variable text, one per element, so textArray[0] will contain 'T', textArray[1] will contain 'o', textArray[2] will contain ' ', and so on.

You can also extract a substring as an array of characters using the method getChars(), but in this case you do need to create an array that is large enough to hold the characters. This enables you to reuse a single array to store characters when you want to extract a succession of substrings, and thus saves the need to repeatedly create new arrays. Of course, the array must be large enough to accommodate the longest substring. The method getChars() has four parameters. In sequence, these are:

Index position of the first character to be extracted (type int)
Index position following the last character to be extracted (type int)
The name of the array to hold the characters extracted (type char[])
The index of the array element to hold the first character (type int)

You could copy a substring from text into an array with the statements:

String text = "To be or not to be";
char[] textArray = new char[3];
text.getChars(9, 12, textArray, 0);

This will copy characters from text at index positions 9 to 11 inclusive, so textArray[0] will be 'n', textArray[1] will be 'o', and textArray[2] will be 't'.
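To show what reusing a single array might look like in practice, here is a sketch that copies each word of the string into the same array in turn. The class name ReuseCharArray and the helper copyWord() are hypothetical, and the code assumes that words are separated by single spaces:

```java
class ReuseCharArray {
  // Hypothetical helper: copies the word of text that starts at index start
  // into buffer, beginning at buffer[0], and returns the length of the word
  static int copyWord(String text, int start, char[] buffer) {
    int end = text.indexOf(' ', start);      // The word ends at the next space
    if(end == -1)
      end = text.length();                   // or at the end of the string
    text.getChars(start, end, buffer, 0);
    return end - start;
  }

  public static void main(String[] args) {
    String text = "To be or not to be";
    char[] buffer = new char[text.length()]; // Large enough for any word in text

    int start = 0;
    while(start < text.length()) {
      int length = copyWord(text, start, buffer);
      for(int i = 0; i < length; i++)        // Only the first length elements are current
        System.out.print(buffer[i]);
      System.out.println();
      start += length + 1;                   // Skip the word and the space after it
    }
  }
}
```

Because the array is reused, elements beyond the length returned may still hold characters left over from an earlier, longer word, which is why the program keeps track of the length rather than relying on the size of the array.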

You can also extract characters into a byte array using the getBytes() method in the class String. This converts the original string characters into bytes using the default character encoding of your Java platform. Historically this was taken from the underlying operating system; from Java 18 onwards it is UTF-8 unless you configure it otherwise. For example:

String text = "To be or not to be";         // Define a string
byte[] textArray = text.getBytes();         // Get equivalent byte array

The byte array textArray will contain the characters of the String object encoded as bytes in accordance with the default encoding. For a string like this one, which contains only ASCII characters, common encodings such as UTF-8, ISO-8859-1, and Windows-1252 produce one byte per character holding its ASCII code. Characters outside the ASCII range are not simply truncated: depending on the encoding, they may occupy more than one byte or, if the encoding cannot represent them, typically be replaced by a substitute such as '?'.
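If you do not want the result to depend on the default encoding, there is a version of getBytes() that accepts the encoding as an argument. The sketch below assumes Java 7 or later, where the StandardCharsets class provides constants for the common encodings; the class name EncodeBytes is our own:

```java
import java.nio.charset.StandardCharsets;

class EncodeBytes {
  public static void main(String[] args) {
    String text = "To be";
    byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
    for(int i = 0; i < ascii.length; i++)
      System.out.print(ascii[i] + " ");      // 84 111 32 98 101
    System.out.println();

    String cafe = "caf\u00e9";               // Ends with an accented e, which is not in ASCII
    System.out.println(cafe.getBytes(StandardCharsets.ISO_8859_1).length); // 4
    System.out.println(cafe.getBytes(StandardCharsets.UTF_8).length);      // 5
    System.out.println(cafe.getBytes(StandardCharsets.US_ASCII)[3]);       // 63, the code for '?'
  }
}
```

The same four characters occupy four bytes in one encoding and five in another, and the character that ASCII cannot represent comes out as a question mark.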

Creating String Objects from Character Arrays

The String class also has a static method, copyValueOf(), to create a String object from an array of type char[]. You will recall that a static method of a class can be used even if no objects of the class exist.

Suppose you have an array defined as:

char[] textArray = {'T', 'o', ' ', 'b', 'e', ' ', 'o', 'r', ' ',
                    'n', 'o', 't', ' ', 't', 'o', ' ', 'b', 'e' };

You can then create a String object with the statement:

String text = String.copyValueOf(textArray);

This will result in the object text referencing the string To be or not to be.

Another version of the copyValueOf() method can create a string from a subset of the array elements. It requires two additional arguments to specify the index of the first character in the array to be extracted and the count of the number of characters to be extracted. With the array defined as previously, the statement:

String text = String.copyValueOf(textArray, 9, 3);

extracts three characters starting with textArray[9], so text will contain the string not after this operation.
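Putting toCharArray() and copyValueOf() together gives you a way to produce a modified version of a string by working on its characters in an array. As a sketch of the idea, the following reverses a string; the class name ReverseString and the helper reverse() are hypothetical names used only here:

```java
class ReverseString {
  // Hypothetical helper: returns a new string with the characters of text reversed
  static String reverse(String text) {
    char[] chars = text.toCharArray();       // Our own copy of the characters
    for(int i = 0, j = chars.length - 1; i < j; i++, j--) {
      char temp = chars[i];                  // Swap the characters at i and j
      chars[i] = chars[j];
      chars[j] = temp;
    }
    return String.copyValueOf(chars);        // Build a new String from the array
  }

  public static void main(String[] args) {
    String text = "To be or not to be";
    System.out.println(reverse(text));       // eb ot ton ro eb oT
    System.out.println(text);                // To be or not to be
  }
}
```

The second line of output confirms that the original string is unchanged. The array returned by toCharArray() is a copy, so swapping its elements has no effect on the String object it came from.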