Floating-Point Basics
Introduction to Floating-Point Representation
In computer programming, floating-point numbers represent real numbers that may have a fractional part. Unlike integers, they cover a very wide range of magnitudes, at the cost of limited precision. The C standard does not mandate a particular format, but most implementations follow IEEE 754.
Binary Representation
Floating-point numbers are stored in binary format using three key components. The bit widths below are those of a 32-bit single-precision float:
| Component | Description | Bits |
| --- | --- | --- |
| Sign | Indicates positive or negative | 1 bit |
| Exponent | Represents the power of 2 | 8 bits |
| Mantissa | Stores the significant digits | 23 bits |
Diagram: a floating-point number is composed of a sign bit, an exponent, and a mantissa (fraction).
Basic Data Types
C provides several floating-point types:
float // Single precision (32 bits)
double // Double precision (64 bits)
long double // Extended precision (width is platform-dependent)
Simple Example Demonstration
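#include <stdio.h>

int main(void) {
    float a = 0.1f;   /* the f suffix makes this a single-precision literal */
    double b = 0.1;   /* unsuffixed literals are double */

    /* Print extra digits so the rounding of 0.1 becomes visible */
    printf("Float value:  %.20f\n", a);
    printf("Double value: %.20f\n", b);
    return 0;
}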
Key Characteristics
- Floating-point numbers have limited precision
- Not all decimal numbers can be exactly represented in binary
- Arithmetic operations can introduce small errors
Memory Allocation
Typical sizes on 64-bit Linux systems, such as LabEx development environments (the exact sizes are implementation-defined):
- float: 4 bytes
- double: 8 bytes
- long double: 16 bytes
Precision Limitations
Floating-point representation cannot exactly represent all real numbers due to finite binary storage. This leads to potential precision issues that developers must understand and manage carefully.