Prepare the Data
In this step, you will learn how to read the training and testing data from CSV files, and perform label encoding on the non-numeric features.
- Open the predict.py file in your code editor.
- In the getData() function, complete the following tasks:
  - Read the training data from the credit_risk_train.csv file using pd.read_csv().
  - Read the testing data from the credit_risk_test.csv file using pd.read_csv().
  - Call the label() function to perform label encoding on the non-numeric features in both the training and testing data.
  - Build x_train and y_train from the training data (every column except the last is a feature; the last column is the label), and build x_test from all columns of the testing data. The testing data has no label column, so y_test is set to None.
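def getData():
    """
    Read the training and testing data from CSV files, label-encode the
    non-numeric columns, and separate the features from the labels.
    """
    ## step1. read data from csv files
    ## (trainfile and testfile are expected to be defined elsewhere in predict.py)
    data = pd.read_csv(trainfile)
    test = pd.read_csv(testfile)
    ## step2. label encoding
    data = label(data)
    test = label(test)
    ## step3. separate features and labels
    ## training data: last column is the label, all other columns are features
    x_train, y_train = data.iloc[:, :-1].to_numpy(), data.iloc[:, -1].to_numpy()
    ## testing data: every column is a feature, so there are no labels
    x_test = test.iloc[:, :].to_numpy()
    y_test = None
    return x_train, y_train, x_test, y_test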
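To see what the iloc slicing in step 3 produces, here is a minimal sketch that runs the same split on two small in-memory CSVs. The column names AGE and INCOME and all of the values are made up for illustration; only the RISK column name comes from this lab, and the real files may look different.

```python
import io

import pandas as pd

# Hypothetical stand-in for credit_risk_train.csv:
# two feature columns plus the label in the last column.
train_csv = io.StringIO(
    "AGE,INCOME,RISK\n"
    "25,3000,1\n"
    "40,5200,0\n"
    "31,4100,1\n"
)
# Hypothetical stand-in for credit_risk_test.csv:
# the same feature columns, but no label column.
test_csv = io.StringIO(
    "AGE,INCOME\n"
    "52,6100\n"
    "23,2800\n"
)

data = pd.read_csv(train_csv)
test = pd.read_csv(test_csv)

# Every column except the last is a feature; the last column is the label.
x_train = data.iloc[:, :-1].to_numpy()
y_train = data.iloc[:, -1].to_numpy()
# The test data has no label column, so all of its columns are features.
x_test = test.iloc[:, :].to_numpy()

print(x_train.shape, y_train.shape, x_test.shape)  # (3, 2) (3,) (2, 2)
```

Because the split is positional, it relies on the label being the last column of the training file.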
- In the label() function, complete the implementation of the label encoding process:
  - Iterate through each column in the data.
  - If the column data type is object, create a LabelEncoder instance and fit it to the column data.
  - If the column name is "RISK", store the LabelEncoder instance in the convertor variable.
  - Transform the column data using the LabelEncoder instance and update the column in the data.
  - Return the updated data.
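def label(data):
    """
    Use label encoding to process the non-numeric features.
    """
    # convertor keeps the encoder fitted on the "RISK" column for later use
    global convertor
    for col in data.columns:
        # only non-numeric (object) columns need encoding
        if data[col].dtype == "object":
            # LE is the name this file uses for the LabelEncoder class
            le = LE()
            if col == "RISK":
                convertor = le
            le.fit(data[col])
            data[col] = le.transform(data[col])
    return data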
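A likely reason for storing the "RISK" encoder in convertor is that a fitted LabelEncoder can map integer codes back to the original category names. The sketch below shows that round trip with scikit-learn's LabelEncoder. The category names High, Low, and Medium are hypothetical, and the real RISK column may use different values.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical RISK values, for illustration only.
risk = ["High", "Low", "Low", "High", "Medium"]

le = LabelEncoder()
le.fit(risk)                  # learns the sorted set of distinct categories
encoded = le.transform(risk)  # replaces each category with its integer index

print(le.classes_.tolist())   # ['High', 'Low', 'Medium']
print(encoded.tolist())       # [0, 1, 1, 0, 2]

# A fitted encoder can turn integer codes back into category names.
decoded = le.inverse_transform([1, 0])
print(decoded.tolist())       # ['Low', 'High']
```

LabelEncoder assigns codes in the sorted order of the categories it was fitted on, so the codes depend on the data passed to fit().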
After completing this step, you will have the training and testing data ready for the next step.