Introduction
In this project, you will learn how to build a machine learning classification model to predict the risk status of credit card holders. The project involves preprocessing the data, training a support vector machine (SVM) model, and saving the prediction results to a CSV file.
🎯 Tasks
In this project, you will learn:
- How to prepare the data by performing label encoding on non-numeric features
- How to train a machine learning classification model using the training data
- How to save the prediction results to a CSV file
🏆 Achievements
After completing this project, you will be able to:
- Preprocess and prepare data for machine learning tasks
- Train a support vector machine (SVM) model for classification
- Save the prediction results to a CSV file
Prepare the Data
In this step, you will learn how to read the training and testing data from CSV files, and perform label encoding on the non-numeric features.
- Open the predict.py file in your code editor.
- In the getData() function, complete the following tasks:
  - Read the training data from the credit_risk_train.csv file using pd.read_csv().
  - Read the testing data from the credit_risk_test.csv file using pd.read_csv().
  - Call the label() function to perform label encoding on the non-numeric features in both the training and testing data.
  - Split the data into x_train, y_train, x_test, and y_test: the features and labels come from the training data, x_test comes from the testing data, and y_test is None because the testing data has no labels.
def getData():
    """
    Read data from csv files. And split the train data into train and test for validation.
    """
    ## step1. read data from csv files
    data = pd.read_csv(trainfile)
    test = pd.read_csv(testfile)
    ## step2. label encoding
    data = label(data)
    test = label(test)
    ## step3. split train data into train and test
    x_train, y_train = data.iloc[:, :-1].to_numpy(), data.iloc[:, -1].to_numpy()
    x_test = test.iloc[:, :].to_numpy()
    y_test = None
    return x_train, y_train, x_test, y_test
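The feature and label split in step3 relies on the label being the last column of the training file. The following is a minimal sketch of that slicing on a tiny in-memory CSV. The column names AGE and INCOME and all the values are made up for illustration and are not taken from credit_risk_train.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for credit_risk_train.csv; the real columns may differ.
train_csv = io.StringIO(
    "AGE,INCOME,RISK\n"
    "25,3000,0\n"
    "40,5200,1\n"
    "31,4100,0\n"
)
data = pd.read_csv(train_csv)

# Every column except the last is a feature; the last column is the label.
x_train = data.iloc[:, :-1].to_numpy()
y_train = data.iloc[:, -1].to_numpy()

print(x_train.shape)  # (3, 2)
print(y_train.shape)  # (3,)
```

If the label were not the last column, selecting it by name (for example with data["RISK"]) would be a safer choice than positional slicing.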
- In the label() function, complete the implementation of the label encoding process:
  - Iterate through each column in the data.
  - If the column data type is object, create a LabelEncoder instance and fit it to the column data.
  - If the column name is "RISK", store the LabelEncoder instance in the convertor variable.
  - Transform the column data using the LabelEncoder instance and update the column in the data.
  - Return the updated data.
def label(data):
    """
    Use label encoding to process the non-numeric features.
    """
    global convertor
    for col in data.columns:
        if data[col].dtype == "object":
            le = LE()
            if col == "RISK":
                convertor = le
            le.fit(data[col])
            data[col] = le.transform(data[col])
    return data
After completing this step, you will have the training and testing data ready for the next step.
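To make the effect of label encoding concrete, here is a small sketch on a toy DataFrame. The HOUSING and AGE columns, the category values, and the encoders dictionary are assumptions made for illustration. The sketch also uses pd.api.types.is_numeric_dtype instead of the dtype == "object" check, so that it still works on pandas versions that store text in a dedicated string dtype.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame; the real data set has different columns.
frame = pd.DataFrame(
    {
        "HOUSING": ["own", "rent", "own", "free"],
        "AGE": [25, 40, 31, 58],
        "RISK": ["good", "bad", "good", "good"],
    }
)

encoders = {}
for col in frame.columns:
    # Encode only the non-numeric columns; numeric ones are left untouched.
    if not pd.api.types.is_numeric_dtype(frame[col]):
        le = LabelEncoder()
        frame[col] = le.fit_transform(frame[col].to_numpy())
        encoders[col] = le  # keep the encoder so codes can be mapped back later

# LabelEncoder assigns integer codes in sorted order of the class names.
print(frame["HOUSING"].tolist())  # [1, 2, 1, 0]
print(encoders["RISK"].inverse_transform([0, 1]).tolist())  # ['bad', 'good']
```

Because each encoder assigns codes from the categories it sees, fitting separate encoders on the training and testing files can give the same category different codes when the two files do not contain the same set of values. Keeping the fitted encoders, as the dictionary here does, is one way to reuse them.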
Train the Model
In this step, you will learn how to train a machine learning classification model using the training data.
- In the predict() function, complete the following tasks:
  - Create an instance of the SVC model from the sklearn.svm module.
  - Fit the model to the x_train and y_train data using the fit() method.
def predict(model=MODEL):
    """
    Use the model to predict the result.
    """
    ## step1. get the model
    predictor = model()
    ## step2. get the data
    x_train, y_train, x_test, _ = getData()
    ## step3. train the model
    predictor.fit(x_train, y_train)
    ## step4. predict and save
    res = predictor.predict(x_test)
    save(res)
After completing this step, the model will be trained and ready for making predictions on the testing data.
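Since getData() returns y_test as None, the pipeline above does not measure how well the model performs. One possible way to get a rough estimate is to hold out part of the labelled training data, as sketched below. The synthetic data from make_classification, the 25% hold-out size, and the variable names are assumptions made for illustration and are not part of predict.py.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the encoded credit data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hold out 25% of the labelled rows to estimate accuracy.
x_fit, x_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = SVC()  # default settings, which use an RBF kernel
model.fit(x_fit, y_fit)

# score() returns the mean accuracy on the held-out rows.
val_accuracy = model.score(x_val, y_val)
print(f"validation accuracy: {val_accuracy:.2f}")
```

SVMs are sensitive to feature scale, so on real data it may be worth trying a scaler such as StandardScaler before the SVC and comparing the held-out accuracy.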
Save the Predictions
In this step, you will learn how to save the prediction results to the credit_risk_pred.csv file.
- In the save() function, complete the following tasks:
  - Use the convertor variable to inverse transform the prediction results back to the original labels.
  - Create a Pandas DataFrame with the prediction results and save it to the credit_risk_pred.csv file using pd.DataFrame().to_csv().
def save(result):
    """
    Save the result to csv file.
    """
    result = convertor.inverse_transform(result)
    dataframe = pd.DataFrame({"RISK": result})
    dataframe.to_csv("credit_risk_pred.csv", index=False, sep=",")
After completing this step, the prediction results will be saved to the credit_risk_pred.csv file.
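The round trip from integer predictions back to labels and into a CSV file can be checked in isolation. The sketch below assumes two hypothetical RISK categories, "bad" and "good", and writes to a temporary directory rather than the project folder. The real categories in the data set may differ.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical label values; the real RISK categories may differ.
convertor = LabelEncoder().fit(["bad", "good"])
predicted_codes = np.array([1, 0, 1])

# Map the integer codes back to the original string labels.
labels = convertor.inverse_transform(predicted_codes)

# Write to a temporary directory so the sketch does not touch the project files.
out_path = os.path.join(tempfile.mkdtemp(), "credit_risk_pred.csv")
pd.DataFrame({"RISK": labels}).to_csv(out_path, index=False)

# Read the file back to confirm its contents.
written = pd.read_csv(out_path)
print(written["RISK"].tolist())  # ['good', 'bad', 'good']
```

Passing index=False keeps the row index out of the file, so the CSV contains only the RISK column.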
Run the Prediction
In this final step, you will run the prediction process and check the output.
- In the if __name__ == "__main__": block, call the predict() function to run the prediction process.
- In your terminal, run the predict.py file with the following command:
python3 predict.py
- After running the predict.py file, you should see the following output:
Predict done!
- Check the credit_risk_pred.csv file in the project directory. It should contain the prediction results for the testing data.
Congratulations! You have successfully completed the credit card holder risk prediction project. You have learned how to:
- Prepare the data by performing label encoding on non-numeric features
- Train a machine learning classification model using the training data
- Save the prediction results to a CSV file



