The na_values parameter of pandas.read_csv() specifies additional strings to treat as NaN (Not a Number) when reading a CSV file, on top of the markers pandas already recognizes by default (such as an empty field, NA, or NULL). This is useful when a dataset marks missing or invalid data with its own placeholder strings.
Syntax:
df = pd.read_csv('data.csv', na_values=['string1', 'string2', ...])
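Besides a list, na_values also accepts a dict that maps column names to the markers that apply only to that column. The sketch below is a minimal illustration, assuming a small in-memory CSV in place of a real file; the column names (id, score, note) and the "unknown" marker are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data: "unknown" means "missing" in score,
# but is a legitimate text value in note.
csv_text = "id,score,note\n1,unknown,ok\n2,85,unknown\n"

# Treat "unknown" as NaN in the score column only
df = pd.read_csv(io.StringIO(csv_text), na_values={"score": ["unknown"]})

print(df["score"].isna().tolist())  # which score entries were read as NaN
print(df["note"].tolist())          # note keeps its original strings
```

A per-column dict is worth considering when the same string is a placeholder in one column and real data in another.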
Example:
Suppose you have a CSV file named data.csv that looks like this:
column1,column2,column3
1,2,3
4,NA,6
7,8,?
In this example, NA and ? represent missing values. pandas already treats NA as NaN by default, so only ? strictly needs to be listed, but naming both makes the intent explicit. Pass them through the na_values parameter when loading the data:
import pandas as pd
# Read the CSV file and specify additional NA values
df = pd.read_csv('data.csv', na_values=['NA', '?'])
print(df)
Output:
   column1  column2  column3
0        1      2.0      3.0
1        4      NaN      6.0
2        7      8.0      NaN
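In this output, the NA and ? entries have been read as NaN, which makes missing data easier to detect and handle in your analysis. Because NaN is a floating-point value, column2 and column3 are loaded as float64, while column1, which has no missing entries, remains an integer column.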
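To check the effect of na_values without creating a file, the same data can be read from a string. This is a minimal sketch, assuming io.StringIO as a stand-in for data.csv; the variable names default_df and custom_df are illustrative only. It compares a plain read with one that lists ? as a missing-value marker.

```python
import io

import pandas as pd

# Same content as data.csv, held in memory
csv_text = "column1,column2,column3\n1,2,3\n4,NA,6\n7,8,?\n"

# Plain read: NA is caught by the pandas defaults, ? is kept as text
default_df = pd.read_csv(io.StringIO(csv_text))

# Read with ? declared as an additional missing-value marker
custom_df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

print(default_df.isna().sum())  # NaN count per column without na_values
print(custom_df.isna().sum())   # NaN count per column with na_values
print(custom_df.dtypes)         # column types after the custom read
```

Counting NaN values per column with isna().sum() is a quick way to confirm that every placeholder in a dataset was actually picked up.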
