Credit Card Holder Risk Prediction


Introduction

In this project, you will learn how to build a machine learning classification model to predict the risk status of credit card holders. The project involves preprocessing the data, training a support vector machine (SVM) model, and saving the prediction results to a CSV file.

🎯 Tasks

In this project, you will learn:

  • How to prepare the data by performing label encoding on non-numeric features
  • How to train a machine learning classification model using the training data
  • How to save the prediction results to a CSV file

🏆 Achievements

After completing this project, you will be able to:

  • Preprocess and prepare data for machine learning tasks
  • Train a support vector machine (SVM) model for classification
  • Save the prediction results to a CSV file

Skills Graph

This lab covers the following skills:

  • Pandas: Read CSV, Write to CSV
  • Sklearn: Datasets, Support Vector Machines

Prepare the Data

In this step, you will learn how to read the training and testing data from CSV files, and perform label encoding on the non-numeric features.

  1. Open the predict.py file in your code editor.

  2. In the getData() function, complete the following tasks:

    • Read the training data from the credit_risk_train.csv file using pd.read_csv().
    • Read the testing data from the credit_risk_test.csv file using pd.read_csv().
    • Call the label() function to perform label encoding on the non-numeric features in both the training and testing data.
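    • Split the training data into features (x_train) and labels (y_train), use every column of the testing data as x_test, and set y_test to None because the testing data carries no labels.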
def getData():
    """
    Read the training and testing data from CSV files, label-encode them,
    and split the training data into features and labels.
    """
    ## step1. read data from csv files
    data = pd.read_csv(trainfile)
    test = pd.read_csv(testfile)

    ## step2. label encoding
    data = label(data)
    test = label(test)

    ## step3. split the training data into features and labels
    x_train, y_train = data.iloc[:, :-1].to_numpy(), data.iloc[:, -1].to_numpy()
    ## the testing data has no label column, so every column is a feature
    x_test = test.to_numpy()
    y_test = None
    return x_train, y_train, x_test, y_test
  3. In the label() function, complete the implementation of the label encoding process:
    • Iterate through each column in the data.
    • If the column data type is object, create a LabelEncoder instance and fit it to the column data.
    • If the column name is "RISK", store the LabelEncoder instance in the convertor variable.
    • Transform the column data using the LabelEncoder instance and update the column in the data.
    • Return the updated data.
def label(data):
    """
    Use label encoding to process the non-numeric features.
    """
    global convertor
    for col in data.columns:
        if data[col].dtype == "object":
            le = LE()
            if col == "RISK":
                convertor = le
            le.fit(data[col])
            data[col] = le.transform(data[col])
    return data
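
To see what label encoding does to a column, here is a minimal sketch on a toy DataFrame. The column names other than RISK, the values, and the encoders dictionary are made up for illustration and are not part of predict.py. The sketch checks for non-numeric columns with pd.api.types.is_numeric_dtype instead of comparing the dtype with "object", so that it also covers pandas' dedicated string dtype.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: HOUSING and AGE are hypothetical columns, not taken from the lab files.
toy = pd.DataFrame({
    "HOUSING": ["own", "rent", "own", "free"],
    "AGE": [35, 22, 48, 30],
    "RISK": ["good", "bad", "good", "good"],
})

encoders = {}  # hypothetical helper: keeps one fitted encoder per encoded column
for col in toy.columns:
    if not pd.api.types.is_numeric_dtype(toy[col]):
        le = LabelEncoder()
        # fit_transform learns the sorted class list and maps each value to its index
        toy[col] = le.fit_transform(toy[col])
        encoders[col] = le

housing_codes = toy["HOUSING"].tolist()
risk_codes = toy["RISK"].tolist()
# inverse_transform maps the integer codes back to the original strings
risk_labels = encoders["RISK"].inverse_transform(toy["RISK"]).tolist()
print(housing_codes, risk_codes, risk_labels)
```

LabelEncoder sorts the distinct values before numbering them, so "bad" becomes 0 and "good" becomes 1 here. Because label() fits a fresh encoder on every call, a category that appears in only one of the two files would be numbered differently in each; the sketch does not address that case.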

After completing this step, you will have the training and testing data ready for the next step.
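
The functions in this lab rely on several module-level names (pd, LE, MODEL, trainfile, testfile, and convertor) whose definitions are not shown in the steps. The following is a sketch of what the top of predict.py might look like. The file names come from the step descriptions; the LE alias and the choice of SVC as the default MODEL are assumptions based on how the names are used.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder as LE  # assumed alias used by label()
from sklearn.svm import SVC

# File names as given in the step descriptions.
trainfile = "credit_risk_train.csv"
testfile = "credit_risk_test.csv"

# Default model class for predict(); assumed to be SVC, since the lab trains an SVM.
MODEL = SVC

# Set by label() to the encoder fitted on the "RISK" column,
# then used by save() to turn predictions back into labels.
convertor = None
```

Note that MODEL holds the class itself, not an instance; predict() creates the instance by calling model().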

Train the Model

In this step, you will learn how to train a machine learning classification model using the training data.

  1. In the predict() function, complete the following tasks:
    • Create an instance of the SVC model from the sklearn.svm module.
    • Fit the model to the x_train and y_train data using the fit() method.
def predict(model=MODEL):
    """
    Use the model to predict the result.
    """
    ## step1. get the model
    predictor = model()
    ## step2. get the data
    x_train, y_train, x_test, _ = getData()
    ## step3. train the model
    predictor.fit(x_train, y_train)
    ## step4. predict and save
    res = predictor.predict(x_test)
    save(res)

After completing this step, the model will be trained and ready for making predictions on the testing data.
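
Since getData() returns y_test as None, predict() cannot measure how well the model performs. If you want a rough estimate, one option is to hold out part of the labeled data for validation. The sketch below does this on synthetic data from make_classification rather than the credit risk files; the variable names and the 25% split size are illustrative choices, not part of the lab.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the encoded training data (200 rows, 8 features).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hold out 25% of the labeled rows; stratify keeps the class ratio similar in both parts.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = SVC()            # default settings, as in predictor = model()
clf.fit(X_fit, y_fit)  # train on the larger part only

# score() returns the mean accuracy on the held-out rows.
val_accuracy = clf.score(X_val, y_val)
print(f"validation accuracy: {val_accuracy:.3f}")
```

In the lab itself, the same idea would apply to x_train and y_train before fitting the final model on all of the training data.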

Save the Predictions

In this step, you will learn how to save the prediction results to the credit_risk_pred.csv file.

  1. In the save() function, complete the following tasks:
    • Use the convertor variable to inverse transform the prediction results back to the original labels.
    • Create a Pandas DataFrame with the prediction results and save it to the credit_risk_pred.csv file using pd.DataFrame().to_csv().
def save(result):
    """
    Save the result to csv file.
    """
    result = convertor.inverse_transform(result)
    dataframe = pd.DataFrame({"RISK": result})
    dataframe.to_csv("credit_risk_pred.csv", index=False, sep=",")

After completing this step, the prediction results will be saved to the credit_risk_pred.csv file.
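
The round trip from integer predictions back to labels and into a CSV file can be sketched on toy values as follows. The encoder, the predicted codes, and the temporary output path are made up for illustration; the lab's save() writes to credit_risk_pred.csv instead.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for convertor: an encoder fitted on the label column.
encoder = LabelEncoder()
encoder.fit(["good", "bad", "good"])  # classes are sorted, so bad -> 0, good -> 1

# Stand-in for the model output: integer class codes.
predicted_codes = np.array([1, 0, 0, 1])
labels = encoder.inverse_transform(predicted_codes)

# Write to a temporary location so the sketch does not overwrite real results.
out_path = os.path.join(tempfile.mkdtemp(), "toy_pred.csv")
pd.DataFrame({"RISK": labels}).to_csv(out_path, index=False)

# Read the file back to confirm what was written.
written = pd.read_csv(out_path)
print(written)
```

Passing index=False keeps the row index out of the file, so the CSV contains only the RISK column.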

Run the Prediction

In this final step, you will run the prediction process and check the output.

  1. In the if __name__ == "__main__": block, call the predict() function to run the prediction process.
  2. In your terminal, run the predict.py file with the following command:
python3 predict.py
  3. After running the predict.py file, you should see the following output:
Predict done!
  4. Check the credit_risk_pred.csv file in the project directory. It should contain the prediction results for the testing data.
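
One way to check the output file is to load it and count the predicted labels. The helper below, summarize_predictions, is a hypothetical function that is not part of predict.py. It assumes the file has a single RISK column, as written by save(), and is demonstrated here on a throwaway file.

```python
import os
import tempfile

import pandas as pd


def summarize_predictions(path):
    """Return the row count and per-label counts of a prediction CSV."""
    pred = pd.read_csv(path)
    if list(pred.columns) != ["RISK"]:
        raise ValueError(f"expected a single RISK column, got {list(pred.columns)}")
    return len(pred), pred["RISK"].value_counts().to_dict()


# Demonstration on a temporary file with made-up labels.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_pred.csv")
pd.DataFrame({"RISK": ["good", "bad", "good"]}).to_csv(demo_path, index=False)

n_rows, counts = summarize_predictions(demo_path)
print(n_rows, counts)
```

After running predict.py, you could call summarize_predictions("credit_risk_pred.csv") and compare the row count with the number of rows in the testing data.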

Congratulations! You have successfully completed the credit card holder risk prediction project. You have learned how to:

  • Prepare the data by performing label encoding on non-numeric features
  • Train a machine learning classification model using the training data
  • Save the prediction results to a CSV file

Summary

You have completed this project. You can practice more labs in LabEx to improve your skills.
