Floating point data type
The floating point data type allows you to store numbers in floating point form.
This means that the numbers behave like numbers written in scientific notation -- each value consists of a set of significant digits together with an exponent that scales those digits by a power of a fixed base.
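The scientific-notation analogy can be made concrete with a short sketch. It is written in Python rather than OmniMark, on the assumption that OmniMark floats, like Python's, are binary floating point values; the function name `describe` is hypothetical and exists only for this illustration.

```python
import math

def describe(value):
    """Return (scientific-notation string, binary mantissa, binary exponent)."""
    # Decimal view: significant digits times a power of 10.
    sci = f"{value:e}"
    # Binary view: value == mantissa * 2 ** exponent, with 0.5 <= |mantissa| < 1.
    mantissa, exponent = math.frexp(value)
    return sci, mantissa, exponent

sci, mantissa, exponent = describe(6370.0)
print(sci)                        # 6.370000e+03
print(mantissa * 2 ** exponent)   # 6370.0
```

The same value is described twice: once as digits with a power of 10, and once as the mantissa and power of 2 that a binary float actually stores.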
Floating point numbers are particularly appropriate for physics and astronomical calculations -- calculations where the result is either very small or very large.
Floating point data type formatting lets you control the number of digits in the output and pad the output with spaces on the right or on the left.
BCD numbers are generally superior to floating point numbers for most applications. There are three principal differences between these two types:
Floating point numbers are represented internally as binary (base 2) numbers. They provide precise representation of fractional numbers that are powers of 2 (1/2, 1/4, 1/8, 1/16, and so forth), but they do not provide precise representation of fractions that are powers of 10 (1/10, 1/100, 1/1000). Any fraction that can be precisely represented in base 2 can be precisely represented in base 10, but not vice versa. (There are, of course, many fractions that cannot be precisely represented in either base 2 or base 10 -- 1/3 for example.)
Floating point numbers are of a limited size and are represented by a fixed number of bytes of memory. BCD numbers, as implemented by the OmniMark BCD library, are of unlimited size.
Floating point numbers are limited in precision. As their name implies, they have a floating decimal point: a fixed number of significant bits is distributed between the whole number portion and the fractional portion of the number. The larger the whole number portion, the fewer bits are available for the fractional part.
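The representation and precision limits described above can be observed directly. The following is a minimal sketch in Python, not OmniMark; it assumes 64-bit IEEE 754 doubles (53 significant bits), which may not match the OmniMark float type exactly. The helper name `is_exact` is hypothetical.

```python
from fractions import Fraction

def is_exact(numerator, denominator):
    """True if numerator/denominator is stored exactly as a binary float."""
    # Fraction(float) recovers the exact value the float actually holds.
    return Fraction(numerator / denominator) == Fraction(numerator, denominator)

print(is_exact(1, 2), is_exact(1, 8))    # True True   (powers of 2)
print(is_exact(1, 10), is_exact(1, 3))   # False False (1/10 and 1/3)
print(0.1 + 0.2 == 0.3)                  # False: the stored values are approximations

# Fixed number of significant bits: next to a large whole-number part,
# a small addend falls below the available precision and is lost.
big = float(2 ** 53)
print(big + 1.0 == big)                  # True: adding 1.0 changes nothing
print(0.5 + 1.0 == 0.5)                  # False: small values still resolve it
```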
You can mix integer variables and floating point variables in mathematical expressions. Thus, you can write:
include "omfloat.xin"
process
   local float price initial {6.37 * float 10 ** 3}
   local float total
   local integer quantity initial {3}
   ; an integer and a float are mixed in one expression
   set total to quantity * price
   output "Total = " || "d" % total || "%n"
   ; Output: "Total = 19110"
Note that if you perform an operation on two integers and assign the result to a floating point number, the operation will be done as an integer operation and only its result will be coerced to a float. Thus the following code produces an incorrect result, even though a float can hold the result of 1000000 * 2000000:
include "omfloat.xin"
process
   local integer large initial {1000000}
   local integer larger initial {2000000}
   local float largest
   ; the multiplication is done on integers before the conversion to float
   set largest to float(large * larger)
   output "Largest = " || "d" % largest || "%n"
   ; Output: "Largest = -1454759936" (This is incorrect.)
In this case, the result of the integer operation large * larger will overflow before the coercion to a floating point number. The correct way to code this operation is to force one of the operands to float before the operation is performed. This causes the operation to be performed as a floating point operation, returning a floating point value:
include "omfloat.xin"
process
   local integer large initial {1000000}
   local integer larger initial {2000000}
   local float largest
   ; large is converted to float first, so the multiplication is done in floating point
   set largest to float large * larger
   output "Largest = " || "d" % largest || "%n"
   ; Output: "Largest = 2000000000000" (This is correct.)
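The incorrect value in the first version can be reproduced by hand. This Python sketch assumes that the integer type is a 32-bit signed two's-complement value, which is consistent with the output shown above but is not stated in the text; `wrap_int32` is a hypothetical helper that simulates the wraparound, since Python integers do not overflow.

```python
def wrap_int32(value):
    """Reduce an exact integer to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF                      # keep only the low 32 bits
    return value - 0x100000000 if value >= 0x80000000 else value

large, larger = 1000000, 2000000

# Integer multiply first, then convert: the overflow has already happened.
print(float(wrap_int32(large * larger)))    # -1454759936.0
# Convert one operand first: the multiplication is done in floating point.
print(float(large) * larger)                # 2000000000000.0
```

Converting after the multiplication cannot recover the lost high bits, which is why the conversion has to be applied to an operand rather than to the result.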
You can use the following operators with floating point numbers:
In the event of an error in a calculation, the Floating Point library will return NaN. NaN means "Not a Number".
include "omfloat.xin"
process
   local float total initial {2.2}
   local stream foo initial {"foo"}
   ; the stream "foo" does not hold a valid number
   set total to total + foo
   output "Total = " || "d" % total || "%n"
   ; Output: "Total = NaN"
   ; Note: "NaN" means "Not a Number"