To treat several different strings as missing values when reading a CSV file with pandas.read_csv(), pass a list of strings to the na_values parameter. Every field that matches one of these strings is read as a missing value (NaN) in the resulting DataFrame.
Example:
Suppose you have a CSV file named data.csv that contains various representations of missing values:
column1,column2,column3
1,2,3
4,NA,6
7,8,?
9,,10
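In this example, the missing values are represented by NA, ?, and empty fields. pandas already recognizes NA and empty fields as missing by default, so only ? strictly needs to be listed, but naming all three is harmless and makes the intent explicit: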
import pandas as pd
# Read the CSV file and specify multiple NA values
df = pd.read_csv('data.csv', na_values=['NA', '?', ''])
print(df)
Output:
   column1  column2  column3
0        1      2.0      3.0
1        4      NaN      6.0
2        7      8.0      NaN
3        9      NaN     10.0
In this output, the NA, ?, and empty fields from the file have been read as NaN. Because column2 and column3 now contain NaN, pandas stores them as floats, while column1 has no missing values and stays an integer column.
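Two variations may be useful, sketched below under a couple of assumptions: the CSV text is inlined with io.StringIO so the snippet runs without a data.csv on disk, and the variable names are only illustrative. The first call passes a dict to na_values so that ? counts as missing only in column3. The second sets keep_default_na=False, which switches off the built-in markers so that only the strings you list are treated as missing.

```python
import io

import pandas as pd

# Same content as data.csv above, inlined so the example is self-contained.
CSV_TEXT = """column1,column2,column3
1,2,3
4,NA,6
7,8,?
9,,10
"""

# Per-column markers: '?' is treated as missing only in column3.
# The default markers (such as 'NA' and empty fields) still apply.
per_column = pd.read_csv(io.StringIO(CSV_TEXT), na_values={"column3": ["?"]})

# Disable the default markers: only '?' is treated as missing,
# so 'NA' and the empty field in column2 are kept as plain strings.
only_question_mark = pd.read_csv(
    io.StringIO(CSV_TEXT), na_values=["?"], keep_default_na=False
)

# Count the missing values per column in each result.
print(per_column.isna().sum())
print(only_question_mark.isna().sum())
```

With keep_default_na=False, column2 is no longer numeric, because the literal strings NA and the empty field are kept as text. That is usually only what you want when NA is a real value in your data (for example, a country or region code).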
