Floating Point Numbers

We often have to deal with very large numbers: the number of protons in the universe, for example, which needs around 79 decimal digits. Clearly there are lots of situations where we need more than the 10 decimal digits we get from a 4-byte binary number. Equally, there are lots of very small numbers: for example, the amount of time in minutes it takes the typical car salesman to accept your offer on his 1982 Ford LTD (which has only covered 380,000 miles...). A mechanism for handling both these kinds of numbers is, as you will have guessed from the title of this section, floating-point numbers.

A floating-point representation of a number is a decimal point followed by a fixed number of digits, multiplied by a power of 10 to get the number you want. It's easier to demonstrate than explain, so let's take some examples. The number 365 in normal decimal notation would be written in floating point form as:

0.365E03

where the E stands for "exponent" and is the power of ten that the 0.365 (the mantissa) is multiplied by, to get the required value. That is:

    0.365 x 10 x 10 x 10

which is clearly 365.

Now let's look at a smallish number:

.365E-04

This is evaluated as .365 x 10^-4, which is .0000365 - exactly the time in minutes required by the car salesman to accept your cash.
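The same E notation can be written directly as a numeric literal in Java source code. The following is a minimal sketch of the two examples above; the class and method names are invented for illustration.

```java
class ExponentNotation {
    // 0.365E03 means 0.365 x 10 x 10 x 10.
    static double large() {
        return 0.365E03;
    }

    // .365E-04 means 0.365 divided by 10 four times.
    static double small() {
        return .365E-04;
    }

    public static void main(String[] args) {
        System.out.println(large());                 // prints 365.0
        System.out.println(small() == 0.0000365);    // prints true

        // The same notation is accepted when parsing text.
        System.out.println(Double.parseDouble("0.365E03") == 365.0);  // prints true
    }
}
```

A literal without a suffix is of type double; appending f, as in 0.365E03f, makes it a float.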

The number of digits in the mantissa of a floating-point number depends on the type of the floating-point number that you are using. The Java type float provides the equivalent of approximately 7 decimal digits, and the type double provides between 15 and 17 decimal digits. The number of digits is approximate because the mantissa is binary, not decimal, and there's not an exact mapping between binary and decimal digits.
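To see where the approximate digit counts come from, it helps to know that a float carries 24 binary digits of mantissa and a double carries 53 (these bit counts come from the IEEE 754 format that Java uses, not from the text above). The sketch below converts those bit counts to decimal digits and shows the float limit in action; the class and method names are invented for illustration.

```java
class MantissaDigits {
    // Number of decimal digits equivalent to the given number of binary digits.
    static double decimalDigits(int bits) {
        return bits * Math.log10(2.0);
    }

    // Stores an int in a float and reports what the float actually holds.
    static float storedAsFloat(int value) {
        return value;   // implicit int-to-float conversion may round
    }

    public static void main(String[] args) {
        System.out.println(decimalDigits(24));   // float: a little over 7
        System.out.println(decimalDigits(53));   // double: just under 16

        // 16,777,217 needs 25 binary digits, one more than a float can hold,
        // so it is rounded to 16,777,216.
        System.out.println(storedAsFloat(16_777_217) == 16_777_216f);   // prints true

        // A double has digits to spare for the same value.
        System.out.println((double) 16_777_217 == 16_777_216.0);        // prints false
    }
}
```

The 53 bits of a double work out to just under 16 decimal digits, although 17 digits are sometimes quoted because that many are enough to write any double out as text and read it back unchanged.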

Suppose we have a large number such as 2,134,311,179. How does this look as a floating-point number? Well, as type float it looks like:

0.2134311E10

It's not quite the same. We have lost three low-order digits, so we have approximated our original value as 2,134,311,000. This is a small price to pay for being able to handle such a vast range of numbers, typically from 10^-38 to 10^+38, either positive or negative, as well as having an extended representation that goes from a minute 10^-308 to a mighty 10^+308. As you can see, they are called floating-point numbers for the fairly obvious reason that the decimal point "floats" depending on the exponent value.
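The loss of the low-order digits can be observed in Java by storing the value in a float and converting it back. Because the mantissa is really binary, the value Java keeps is 2,134,311,168 rather than the 2,134,311,000 of the decimal illustration, but the first seven digits agree. This is a minimal sketch; the class and method names are invented for illustration.

```java
class FloatRange {
    // Converts a long to float and back, exposing any lost low-order digits.
    static long roundTrip(long value) {
        float f = value;    // implicit long-to-float conversion may round
        return (long) f;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(2_134_311_179L));   // prints 2134311168

        // Largest finite values of each type.
        System.out.println(Float.MAX_VALUE);    // about 3.4E38
        System.out.println(Double.MAX_VALUE);   // about 1.8E308

        // Smallest positive values that keep full precision.
        System.out.println(Float.MIN_NORMAL);   // about 1.2E-38
        System.out.println(Double.MIN_NORMAL);  // about 2.2E-308
    }
}
```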

Aside from the fixed precision limitation in terms of accuracy, there is another aspect you may need to be conscious of. You need to take great care when adding or subtracting numbers of significantly different magnitudes. A simple example will demonstrate the kind of problem that can arise. We can first consider adding .365E-3 to .365E+7. We can write this as a decimal sum:

.000365 + 3,650,000

This produces the result:

3,650,000.000365

which, when converted back to floating point, becomes:

.3650000E+7

So we might as well not have bothered. The problem lies directly with the fact that we only carry 7 digits of precision. The 7 digits of the larger number are not affected by any of the digits of the smaller number because they are all further to the left. Funnily enough, you must also take care when the numbers are very nearly equal. If you compute the difference between such numbers you may end up with a result that only has one or two digits of precision. It is quite easy in such circumstances to end up computing with numbers that are total garbage.
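Both pitfalls are easy to reproduce with type float. The sketch below repeats the addition from the text and then subtracts two nearly equal values; the class and method names, and the particular values used for the subtraction, are chosen only for illustration.

```java
class PrecisionPitfalls {
    // Adds a tiny value to a large one; the tiny value is lost entirely.
    static float addSmallToLarge() {
        float large = .365E+7f;
        float small = .365E-3f;
        return large + small;
    }

    // Subtracts two nearly equal values; few correct digits survive.
    static float differenceOfNearlyEqual() {
        float a = 1.0000001f;   // stored as the nearest float, slightly above the value written
        float b = 1.0000000f;
        return a - b;
    }

    public static void main(String[] args) {
        // The sum is indistinguishable from the larger operand.
        System.out.println(addSmallToLarge() == .365E+7f);   // prints true

        // The exact decimal answer would be 1.0E-7, but the result is
        // roughly 1.19E-7, so only the first digit can be trusted.
        System.out.println(differenceOfNearlyEqual());
    }
}
```

Using double instead of float pushes both problems further out, but it does not remove them: the same effects appear once the operands differ by more than the digits a double can carry.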
