How to use `na_values`?

The na_values parameter of pandas.read_csv() specifies additional strings that should be treated as NaN (Not a Number) when reading a CSV file. These strings are added to pandas' built-in list of missing-value markers, which already includes the empty string, NA, N/A, NaN, NULL and similar. The parameter is therefore most useful when a dataset marks missing or invalid entries with its own placeholder, such as ? or -999.

Syntax:

df = pd.read_csv('data.csv', na_values=['string1', 'string2', ...])
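Besides a list, na_values also accepts a dict that maps column names to the markers that apply only in that column. The following is a minimal sketch of that form; the in-memory CSV, its column names (id, score, grade) and the markers -999 and missing are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical data: -999 marks a missing score, "missing" marks a missing grade.
csv_text = "id,score,grade\n1,-999,A\n2,85,missing\n3,90,B\n"

# Dict form: each key is a column name, each value lists the markers
# that should be read as NaN in that column only.
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"score": ["-999"], "grade": ["missing"]},
)

# Count the missing cells per column.
print(df.isna().sum())
```

The dict form is worth considering when a marker such as -999 is a placeholder in one column but could be a legitimate value in another.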

Example:

Suppose you have a CSV file named data.csv that looks like this:

column1,column2,column3
1,2,3
4,NA,6
7,8,?

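In this example, NA and ? are used to represent missing values. pandas already recognizes NA by default, so listing it is optional, but ? is not a default marker and would otherwise be read as ordinary text. You can use the na_values parameter to treat both strings as NaN when loading the data: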

import pandas as pd

# Read the CSV file and specify additional NA values
df = pd.read_csv('data.csv', na_values=['NA', '?'])

print(df)

Output:

   column1  column2  column3
0        1      2.0      3.0
1        4      NaN      6.0
2        7      8.0      NaN

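In this output, the NA and ? entries from the file appear as NaN, making it easier to handle missing data in your analysis. Because NaN is a floating-point value, column2 and column3 are read as floats (2.0, 3.0), while column1, which has no missing entries, stays an integer column.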
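To see what na_values adds on top of the defaults, the sketch below reads the same data three ways. It uses io.StringIO in place of data.csv so that it runs on its own; the variable names are illustrative. It also uses keep_default_na, a separate read_csv parameter that switches off the built-in marker list:

```python
import io

import pandas as pd

# Same contents as data.csv above, held in memory so the snippet is self-contained.
csv_text = "column1,column2,column3\n1,2,3\n4,NA,6\n7,8,?\n"

# No na_values: "NA" is on pandas' built-in list, "?" is not,
# so column3 is read as text rather than numbers.
default_df = pd.read_csv(io.StringIO(csv_text))

# Adding "?" alone is enough for both cells to become NaN.
custom_df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

# keep_default_na=False: only the listed strings count as missing,
# so "NA" is kept as the literal text "NA".
strict_df = pd.read_csv(
    io.StringIO(csv_text), na_values=["?"], keep_default_na=False
)

print(default_df["column3"].tolist())
print(custom_df.isna().sum())
print(strict_df["column2"].tolist())
```

Setting keep_default_na=False can help when a string like NA is real data in your file (for example, an abbreviation) and should not be converted.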