## Floating Point Accuracy

### History

Floating point was introduced early in the history of computer design to cope with limited computer memory. Different applications need to store numbers of widely varying magnitudes to different levels of accuracy. A fixed number of integer and fractional digits would restrict what applications can represent; floating-point numbers overcome this problem.

Floating point represents a number as two sequences of bits: a significand, which holds the digits of the number, and an exponent, which determines the position of the radix point (the decimal point in base 10). Negative significands represent negative numbers; negative exponents represent numbers whose magnitude is smaller than one.
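As a rough illustration of the significand/exponent split, the following Python sketch uses the standard library function `math.frexp`. It assumes base 2, and `frexp` normalizes the significand to a magnitude in the range [0.5, 1), which is only one possible convention. The helper name `split_float` is hypothetical and introduced here for illustration.

```python
import math


def split_float(x: float) -> tuple[float, int]:
    """Return (significand, exponent) such that x == significand * 2**exponent."""
    return math.frexp(x)


for value in (6.0, -0.15625):
    significand, exponent = split_float(value)
    # math.ldexp reverses the split: significand * 2**exponent
    assert math.ldexp(significand, exponent) == value
    print(value, "=", significand, "* 2 **", exponent)
```

The negative input yields a negative significand, and the value smaller than one yields a negative exponent.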

Computer hardware typically uses binary floating point as defined by the IEEE 754 standard. The usual formats are 32 or 64 bits in total length:

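| Format | Total bits | Significand bits | Exponent bits | Smallest positive normal number | Smallest positive subnormal number | Largest number |
|---|---|---|---|---|---|---|
| Single precision | 32 | 23 + 1 sign | 8 | ca. 1.2 ⋅ 10⁻³⁸ | ca. 1.4 ⋅ 10⁻⁴⁵ | ca. 3.4 ⋅ 10³⁸ |
| Double precision | 64 | 52 + 1 sign | 11 | ca. 2.2 ⋅ 10⁻³⁰⁸ | ca. 5.0 ⋅ 10⁻³²⁴ | ca. 1.8 ⋅ 10³⁰⁸ |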
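The double precision layout can be inspected with a short Python sketch. It assumes that Python's `float` is an IEEE 754 double, which holds on common platforms but is not guaranteed by the language. The helper name `double_fields` is hypothetical.

```python
import struct
import sys


def double_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased_exponent, fraction) of the 64-bit encoding of x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                        # 1 sign bit
    biased_exponent = (bits >> 52) & 0x7FF   # 11 exponent bits
    fraction = bits & ((1 << 52) - 1)        # 52 stored significand bits
    return sign, biased_exponent, fraction


print(double_fields(1.0))    # (0, 1023, 0)
print(double_fields(-2.0))   # (1, 1024, 0)

# Limits reported by the interpreter (assuming IEEE 754 doubles)
print(sys.float_info.max)    # largest finite value, ca. 1.8e308
print(sys.float_info.min)    # smallest positive normal value, ca. 2.2e-308

# The bit pattern 1 encodes the smallest positive subnormal value, ca. 5.0e-324
(smallest_subnormal,) = struct.unpack(">d", struct.pack(">Q", 1))
print(smallest_subnormal)
```

The exponent is stored with a bias of 1023, which is why `1.0` (that is, 1 ⋅ 2⁰) shows an exponent field of 1023.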

### Rounding errors

Floating-point numbers cannot represent all real numbers exactly because the number of digits is limited: when a value needs more digits than the format allows, it is rounded to a nearby representable value and the extra digits are lost. Rounding is unavoidable for three reasons:

• Large Denominators - In any base, the larger the denominator of an (irreducible) fraction, the more digits it needs in positional notation. A sufficiently large denominator will require rounding, no matter what the base or number of available digits is.
• Periodic digits - Any (irreducible) fraction where the denominator has a prime factor that does not occur in the base requires an infinite number of digits that repeat periodically after a certain point.
• Irrational numbers - Irrational numbers cannot be represented as a fraction of two integers at all, and in positional notation they require an infinite number of non-recurring digits.
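The three cases above can be observed in a few lines of Python. This is a minimal sketch that assumes Python's `float` is an IEEE 754 double; the variable names are chosen for illustration only.

```python
from fractions import Fraction
import math

# Periodic digits: 1/10 has the prime factor 5 in its denominator, which
# does not occur in base 2, so 0.1 is stored as a nearby binary fraction.
stored_tenth = Fraction(0.1)  # the exact value actually stored
tenth_is_exact = stored_tenth == Fraction(1, 10)
sum_is_exact = 0.1 + 0.2 == 0.3
print(tenth_is_exact, sum_is_exact)  # False False

# Fractions whose denominator is a small power of two are stored exactly.
halves_are_exact = 0.5 + 0.25 == 0.75
print(halves_are_exact)  # True

# Large denominators: 1 + 1/2**60 needs more significand bits than a
# double provides, so the sum is rounded back to 1.0.
small_term_lost = 1.0 + 2.0**-60 == 1.0
print(small_term_lost)  # True

# Irrational numbers: the square root of 2 is rounded, so squaring it
# does not give exactly 2.
sqrt_roundtrip_exact = math.sqrt(2) ** 2 == 2.0
print(sqrt_roundtrip_exact)  # False

# Comparing with a tolerance is a common way to cope with rounding.
close_enough = math.isclose(0.1 + 0.2, 0.3)
print(close_enough)  # True
```

Comparing with `math.isclose` instead of `==` is one common way to make such checks robust against rounding; the appropriate tolerance depends on the application.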