Floating point data type - OmniMark Concept

Floating point data type

The floating point data type allows you to store numbers in floating point form.

This means that the numbers behave like numbers written in scientific notation: each value has not only its digits (the significand), but also a base and an exponent.

Floating point numbers are particularly appropriate for physics and astronomical calculations -- calculations where the result is either very small or very large.

Floating point data type formatting lets you control the number of digits in the output and pad the output with spaces on the right or on the left.

For most applications, BCD numbers are a better choice than floating point numbers. There are three principal differences between the two types:

Base 2 versus base 10

Floating point numbers are represented internally as binary (base 2) numbers. They provide precise representation of fractional numbers that are powers of 2 (1/2, 1/4, 1/8, 1/16, and so forth), but they do not provide precise representation of fractions that are powers of 10 (1/10, 1/100, 1/1000). Any fraction that can be precisely represented in base 2 can be precisely represented in base 10, but not vice versa. (There are, of course, many fractions that cannot be precisely represented in either base 2 or base 10 -- 1/3 for example.)
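
The effect of binary representation can be observed in any language that uses binary floating point. The following sketch is Python, not OmniMark. It assumes that OmniMark floats behave like ordinary binary doubles (this page does not name the format), and it uses Python's decimal module as a stand-in for a decimal (BCD-style) type:

```python
from decimal import Decimal
from fractions import Fraction

# Fraction(x) recovers the exact value stored in the binary float x.

# 1/8 is a power of 2, so the binary float stores it exactly.
eighth_is_exact = Fraction(0.125) == Fraction(1, 8)

# 1/10 is not a power of 2, so the binary float stores only a nearby value.
tenth_is_exact = Fraction(0.1) == Fraction(1, 10)

# Summing ten binary tenths therefore does not give exactly 1.
binary_sum = sum([0.1] * 10)

# A decimal type (assumed stand-in for BCD) represents 1/10 exactly.
decimal_sum = sum([Decimal("0.1")] * 10)

print(eighth_is_exact)     # True
print(tenth_is_exact)      # False
print(binary_sum == 1.0)   # False
print(decimal_sum == 1)    # True
```

The small error in each binary tenth is what makes decimal types the safer choice for values such as currency amounts.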

Limited size versus unlimited size

Floating point numbers are of a limited size and are represented by a fixed number of bytes of memory. BCD numbers, as implemented by the OmniMark BCD library, are of unlimited size.

Floating point versus fixed point

Floating point numbers are limited in precision.

Floating point numbers, as their name implies, have a floating decimal point. That is, floating point numbers have a fixed number of significant bits which are distributed between the whole number portion and the fractional portion of the number. The larger the whole number portion of the number, the fewer bits are available for the fractional part.
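
The trade-off between magnitude and fractional precision can be measured directly. The following sketch is Python, not OmniMark, and assumes a 64-bit binary double (an assumption about the underlying format). It prints the gap between each value and the next representable float above it, using math.ulp (available in Python 3.9 and later):

```python
import math

values = (1.0, 1_000.0, 1_000_000.0, 1e15)

# math.ulp(x) is the distance from x to the next representable float above it.
gaps = [math.ulp(v) for v in values]

for value, gap in zip(values, gaps):
    print(value, gap)

# At 2**53 the gap between adjacent doubles is 2, so adding 1 is lost.
big = float(2 ** 53)
lost_increment = (big + 1 == big)
print(lost_increment)  # True
```

The gap grows as the whole number portion grows, which is the behavior described above: fewer bits remain for the fractional part.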

Mixing floating point and integer values

You can mix integer variables and floating point variables in mathematical expressions. Thus, you can write:

  include "omfloat.xin"
  process
     local float price initial {6.37 * float 10 ** 3}
     local float total
     local integer quantity initial {3}
     set total to quantity * price
     output "Total = " || "d" % total || "%n"
  ;Output: "Total = 19110"

Note that if you perform an operation on two integers and assign the result to a floating point number, the operation will be done as an integer operation and the result will be coerced to a float. Thus the following code produces an incorrect result, even though a float can hold the result of 1000000 * 2000000:

  include "omfloat.xin"
  process
     local integer large initial {1000000}
     local integer larger initial {2000000}
     local float largest

     set largest to float(large * larger)
     output "Largest = " || "d" % largest || "%n"
  ;Output: "Largest = -1454759936" (This is incorrect.)

In this case, the result of the integer operation large * larger will overflow before the coercion to a floating point number. The correct way to code this operation is to force one of the operands to float before the operation is performed. This causes the operation to be performed as a floating point operation, returning a floating point value:

  include "omfloat.xin"
  process
     local integer large initial {1000000}
     local integer larger initial {2000000}
     local float largest

     set largest to float large * larger
     output "Largest = " || "d" % largest || "%n"
  ;Output: "Largest = 2000000000000" (This is correct.)
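
The wrapped value in the failing example can be reproduced by hand. The following sketch is Python, not OmniMark. It assumes that OmniMark integers are 32-bit signed values (consistent with the incorrect output shown earlier, but still an assumption) and emulates the overflow with a hypothetical helper, wrap_int32:

```python
def wrap_int32(value):
    """Reduce an exact integer to 32-bit two's complement (hypothetical helper)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

large = 1_000_000
larger = 2_000_000

# Integer multiply first, then convert: the product wraps before conversion.
wrapped_then_float = float(wrap_int32(large * larger))

# Convert one operand first: the multiplication is done in floating point.
float_product = float(large) * larger

print(wrapped_then_float)  # -1454759936.0
print(float_product)       # 2000000000000.0
```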

Supported operators

You can use the following operators with floating point numbers:

+
-
*
/
modulo
abs
ceiling
floor
round
truncate
<
>
<=
>=
=
!=
%

Handling floating point errors

In the event of an error in a calculation, the Floating Point library will return NaN. NaN means "Not a Number".

  include "omfloat.xin"
  process
     local float total initial {2.2}
     local stream foo initial {"foo"}
     set total to total + foo
     output "Total = " || "d" % total || "%n"
  ; Output: "Total = NaN"
  ;    Note: "NaN" means "Not a Number"
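
Because NaN is returned as a value rather than stopping the program, it is worth checking results before using them. The following sketch is Python, not OmniMark, and shows how NaN behaves under IEEE 754 rules. Whether the OmniMark Floating Point library follows these rules in every detail is an assumption:

```python
import math

nan = float("nan")

# Any arithmetic involving NaN yields NaN, so one bad value spreads.
total = 2.2 + nan
total_is_nan = math.isnan(total)

# NaN compares unequal to everything, including itself,
# so test for it explicitly rather than with an equality check.
nan_equals_itself = (nan == nan)

print(total_is_nan)       # True
print(nan_equals_itself)  # False
```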

----

OmniMark 6.5 Documentation Generated: December 23, 2002 at 6:24:53 pm

Copyright © OmniMark Technologies Corporation, 1988-2002.