Single-precision floating-point numbers are usually called 'float' or 'real'. They are 4 bytes long, and are packed the following way, from left to right:
Sign | Exponent | Mantissa |
---|---|---|
1 bit | 8 bits | 23 bits |
X | XXXX XXXX | XXX XXXX XXXX XXXX XXXX XXXX |
The sign indicates if the number is positive or negative (zero for positive, one for negative).
The real exponent is computed by subtracting 127 from the value of the exponent field. It's the exponent of the number as it is expressed in scientific notation.
The full mantissa, which is also sometimes called the significand, should be considered as a 24-bit value. As we are using scientific notation, there is an implicit leading bit (sometimes called the hidden bit), always set to 1, as there is never a leading 0 in scientific notation.
For instance, you won't write $0.123 \cdot 10^5$ but $1.23 \cdot 10^4$.
The conversion is performed the following way:
$(-1)^S \cdot 1.M \cdot 2^{E - 127}$
Where S is the sign, M the mantissa, and E the exponent.
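Before applying the formula, the three fields have to be pulled out of the 32-bit value. The sketch below shows one possible way to do it with shifts and masks. The names `float_fields`, `split_fields` and `join_fields` are made up for this illustration and do not come from any library.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of a single-precision number, as raw integers.
 * (Hypothetical helper type, introduced only for this example.) */
typedef struct {
    uint32_t sign;     /* 1 bit:   0 = positive, 1 = negative        */
    uint32_t exponent; /* 8 bits:  exponent field E, still biased    */
    uint32_t mantissa; /* 23 bits: field M, without the hidden bit   */
} float_fields;

/* Split a 32-bit pattern into its sign, exponent and mantissa fields. */
float_fields split_fields(uint32_t bits)
{
    float_fields f;
    f.sign     = bits >> 31;           /* leftmost bit    */
    f.exponent = (bits >> 23) & 0xFFu; /* next 8 bits     */
    f.mantissa = bits & 0x7FFFFFu;     /* lowest 23 bits  */
    return f;
}

/* Inverse operation: pack the three fields back into 32 bits. */
uint32_t join_fields(float_fields f)
{
    /* Each field must fit in its own width. */
    assert(f.sign <= 1u && f.exponent <= 0xFFu && f.mantissa <= 0x7FFFFFu);
    return (f.sign << 31) | (f.exponent << 23) | f.mantissa;
}
```

Note that `exponent` holds the raw field value, so 127 still has to be subtracted from it to get the real exponent.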
For instance, take `0100 0000 1011 1000 0000 0000 0000 0000`, which is `0x40B80000` in hexadecimal.
Hex | 4 | 0 | B | 8 | 0 | 0 | 0 | 0 |
---|---|---|---|---|---|---|---|---|
Bin | 0100 | 0000 | 1011 | 1000 | 0000 | 0000 | 0000 | 0000 |
Sign | Exponent | Mantissa |
---|---|---|
0 | 1000 0001 | (1) 011 1000 0000 0000 0000 0000 |
The sign bit is 0, so the number is positive.
The exponent field is 1000 0001, which is 129 in decimal. The real exponent value is then 129 - 127, which is 2.
The mantissa, with the implicit leading bit, is 1011 1000 0000 0000 0000 0000.
The final representation of the number in binary scientific notation is:
$(-1)^0 \cdot 1.0111 \cdot 2^2$
Mathematically, this means:
$1 \cdot ( 1 \cdot 2^0 + 0 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3} + 1 \cdot 2^{-4} ) \cdot 2^2$
$= ( 2^0 + 2^{-2} + 2^{-3} + 2^{-4} ) \cdot 2^2$
$= 2^2 + 2^0 + 2^{-1} + 2^{-2}$
$= 4 + 1 + 0.5 + 0.25$
The floating point value is then 5.75.
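The worked example can be checked with a few lines of C. This is only a sketch: `decode_normalized` and `bits_to_float` are hypothetical names chosen for this illustration. The first function applies the formula by hand, and the second copies the same 32 bits into a `float`, assuming that `float` is a 32-bit IEEE 754 type on the target platform.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Evaluate (-1)^S * 1.M * 2^(E - 127) for a normalized bit pattern. */
double decode_normalized(uint32_t bits)
{
    uint32_t s = bits >> 31;
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t m = bits & 0x7FFFFFu;

    /* Exponent fields 0 and 255 are reserved for the special values. */
    assert(e != 0u && e != 255u);

    /* Put the hidden bit back in front of M, then scale the 24-bit
     * integer by 2^-23 to obtain the value 1.M. */
    double significand = (double)(m | 0x800000u) / 8388608.0; /* 2^23 */

    /* ldexp(x, n) computes x * 2^n. */
    double value = ldexp(significand, (int)e - 127);
    return s ? -value : value;
}

/* Reinterpret the same 32 bits as a float.
 * Assumption: float is a 32-bit IEEE 754 single-precision type. */
float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Both `decode_normalized(0x40B80000u)` and `bits_to_float(0x40B80000u)` should give 5.75, the value computed by hand above.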
Depending on the value of the exponent field, some numbers have a special meaning. They can be denormalized numbers, zero, infinity, or not a number (NaN):
If the value of the exponent field is 0 and the value of the mantissa field is greater than 0, then the number has to be treated as a denormalized number.
In such a case, the exponent is not -127, but -126, and the implicit leading bit is not 1 but 0.
That allows smaller numbers to be represented.
The scientific notation for a denormalized number is:
$(-1)^S \cdot 0.M \cdot 2^{-126}$
If the exponent and the mantissa fields are both 0, then the final number is zero. The sign bit is still permitted, even if it does not make much sense mathematically, allowing a positive or a negative zero.
Note that zero can be considered as a denormalized number. In that case, it would be $0 \cdot 2^{-126}$, which is zero.
If the value of the exponent field is 255 (all 8 bits are set) and if the value of the mantissa field is 0, the number is an infinity, either positive or negative, depending on the sign bit.
If the value of the exponent field is 255 (all 8 bits are set) and if the value of the mantissa field is not 0, then the value is not a number (NaN). The sign bit has no meaning in such a case.
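The rules above boil down to two tests on the exponent and mantissa fields. Here is a minimal sketch of them, together with the denormalized formula. The `float_kind` enumeration and the function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* The possible kinds of single-precision values.
 * (Hypothetical names, introduced only for this example.) */
typedef enum {
    KIND_ZERO,
    KIND_DENORMALIZED,
    KIND_NORMALIZED,
    KIND_INFINITY,
    KIND_NAN
} float_kind;

/* Classify a 32-bit pattern from its exponent and mantissa fields.
 * The sign bit plays no role in the classification. */
float_kind classify_bits(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t m = bits & 0x7FFFFFu;

    if (e == 0u)
        return m == 0u ? KIND_ZERO : KIND_DENORMALIZED;
    if (e == 255u)
        return m == 0u ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMALIZED;
}

/* Evaluate (-1)^S * 0.M * 2^-126 for a pattern whose exponent field is 0.
 * With M = 0 this gives a positive or negative zero. */
double decode_denormalized(uint32_t bits)
{
    uint32_t s = bits >> 31;
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t m = bits & 0x7FFFFFu;

    assert(e == 0u);

    /* No hidden bit here: 0.M is M scaled by 2^-23. */
    double value = ldexp((double)m / 8388608.0, -126);
    return s ? -value : value;
}
```

For example, `classify_bits(0x7F800000u)` returns `KIND_INFINITY`, while `classify_bits(0x7FC00000u)` returns `KIND_NAN`, because its mantissa field is not 0.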
The range depends on whether the number is normalized or not. Below are the ranges for those two cases, given in decimal and in binary scientific notation:
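Kind | Minimum | Maximum |
---|---|---|
Normalized | ±1.1754943508223E-38 / $\pm 1.00000000000000000000000_2 \cdot 2^{-126}$ | ±3.4028234663853E+38 / $\pm 1.11111111111111111111111_2 \cdot 2^{127}$ |
Denormalized | ±1.4012984643248E-45 / $\pm 0.00000000000000000000001_2 \cdot 2^{-126}$ | ±1.1754942106924E-38 / $\pm 0.11111111111111111111111_2 \cdot 2^{-126}$ |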
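The limits of each range correspond to simple bit patterns. The smallest normalized value has an exponent field of 1 and a mantissa field of 0, and the largest one has an exponent field of 254 and all mantissa bits set. As a sketch, they can be compared with the `FLT_MIN` and `FLT_MAX` constants from the standard `<float.h>` header. The macro and function names below are made up for this example, and the code assumes that `float` is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Bit patterns at both ends of each range, with the sign bit cleared.
 * (Hypothetical macro names, introduced only for this example.) */
#define MIN_NORMALIZED_BITS   0x00800000u /* E = 1,   M = 0        */
#define MAX_NORMALIZED_BITS   0x7F7FFFFFu /* E = 254, M = all ones */
#define MIN_DENORMALIZED_BITS 0x00000001u /* E = 0,   M = 1        */
#define MAX_DENORMALIZED_BITS 0x007FFFFFu /* E = 0,   M = all ones */

/* Reinterpret 32 bits as a float.
 * Assumption: float is a 32-bit IEEE 754 single-precision type. */
float float_from_bits(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Return 1 if the normalized limits agree with <float.h>, 0 otherwise. */
int limits_match_float_h(void)
{
    return float_from_bits(MIN_NORMALIZED_BITS) == FLT_MIN
        && float_from_bits(MAX_NORMALIZED_BITS) == FLT_MAX;
}
```

Printing `float_from_bits(MIN_DENORMALIZED_BITS)` with a format such as `%.13E` is a quick way to compare the decimal values with the ones listed above.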
Below is an example of a C program that converts a binary number to its float representation: