Prepare the Data
In this step, you will learn how to read the training and testing data from CSV files, and perform label encoding on the non-numeric features.
- Open the predict.py file in your code editor.
- In the getData() function, complete the following tasks:
  - Read the training data from the credit_risk_train.csv file using pd.read_csv().
  - Read the testing data from the credit_risk_test.csv file using pd.read_csv().
  - Call the label() function to perform label encoding on the non-numeric features in both the training and testing data.
  - Take the last column of the training data as y_train and the remaining columns as x_train. Use every column of the testing data as x_test, and set y_test to None.
def getData():
    """
    Read the data from the CSV files, then separate the training features from the labels.
    """
    ## step1. read data from csv files
    data = pd.read_csv(trainfile)
    test = pd.read_csv(testfile)
    ## step2. label encoding
    data = label(data)
    test = label(test)
    ## step3. split the train data into features (all but the last column) and labels (last column)
    x_train, y_train = data.iloc[:, :-1].to_numpy(), data.iloc[:, -1].to_numpy()
    ## every column of the test data is used as a feature, so there are no test labels
    x_test = test.iloc[:, :].to_numpy()
    y_test = None
    return x_train, y_train, x_test, y_test
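To see what the iloc slicing in step 3 does, here is a minimal sketch on a toy DataFrame. The column names AGE and INCOME and all values are made up for illustration; the only assumption carried over from getData() is that the label sits in the last column.

```python
import pandas as pd

# Hypothetical toy data; the real columns of credit_risk_train.csv may differ.
toy = pd.DataFrame({
    "AGE": [25, 40, 33],
    "INCOME": [30000, 52000, 41000],
    "RISK": [1, 0, 1],  # assumed to be the last column, already label-encoded
})

x = toy.iloc[:, :-1].to_numpy()  # all rows, every column except the last
y = toy.iloc[:, -1].to_numpy()   # all rows, only the last column

print(x.shape)  # (3, 2)
print(y.shape)  # (3,)
```

Because the slice is positional, this pattern only works if the label really is the last column of the file.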
- In the label() function, complete the implementation of the label encoding process:
  - Iterate through each column in the data.
  - If the column data type is object, create a LabelEncoder instance and fit it to the column data.
  - If the column name is "RISK", store the LabelEncoder instance in the convertor variable.
  - Transform the column data using the LabelEncoder instance and update the column in the data.
  - Return the updated data.
def label(data):
    """
    Use label encoding to process the non-numeric features.
    """
    global convertor
    for col in data.columns:
        if data[col].dtype == "object":
            le = LE()
            if col == "RISK":
                convertor = le
            le.fit(data[col])
            data[col] = le.transform(data[col])
    return data
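The following sketch shows what a single LabelEncoder does to one column, and one likely reason for keeping the "RISK" encoder in convertor: a fitted encoder can map integer codes back to the original strings. The values "High" and "Low" are hypothetical, and the sketch assumes LE in predict.py refers to scikit-learn's LabelEncoder.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label values; the real RISK categories may differ.
risk = ["High", "Low", "High", "Low"]

le = LabelEncoder()
le.fit(risk)                       # learns the sorted set of distinct values
encoded = le.transform(risk)       # replaces each string with its integer code

# A stored encoder can turn integer predictions back into readable labels.
decoded = le.inverse_transform([1, 0])

print(list(le.classes_))  # ['High', 'Low']
print(list(encoded))      # [0, 1, 0, 1]
print(list(decoded))      # ['Low', 'High']
```

Note that label() fits a fresh encoder on every call, so a column is only encoded the same way in the training and testing data when both contain the same set of distinct values.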
After completing this step, you will have the training and testing data ready for the next step.