Model Training and Evaluation
In this step, you will use Python's scikit-learn library to build a machine learning model that predicts the locations of potential underwater treasures from the preprocessed data. The script in this step trains and evaluates a random forest regressor; other algorithms, such as decision trees and support vector machines, can be trained and evaluated in the same way.
In ~/project/model_training.py:
## model_training.py
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
## Read the data from the "underwater_data.csv" file
data = pd.read_csv("/home/labex/project/underwater_data.csv")
## Convert data to a NumPy array
data = np.array(data)
## Extract feature matrix X and target variable y
X = data[:, :-1]  ## All rows, every column except the last: the feature matrix X
y = data[:, -1]  ## All rows, last column only: the target variable y
## Split the preprocessed data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
## Initialize and train a random forest regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
## Evaluate the model's performance
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")
Run the script:
python model_training.py
Your terminal should display output similar to the following (the exact value depends on the contents of underwater_data.csv):
Mean Squared Error: 1.8009639999999907
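The same train-and-evaluate pattern can be applied to the other model families mentioned at the start of this step. The sketch below is one possible way to compare a decision tree, a random forest, and a support vector regressor by mean squared error. It is an illustration rather than part of the lab: the data comes from scikit-learn's make_regression as a synthetic stand-in for underwater_data.csv, and the names models and results are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in data, not the lab's underwater_data.csv
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=42)

# Same 80/20 split and seed as in model_training.py
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate regressors, all with default settings apart from the seeds
models = {
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "svr": SVR(),
}

# Fit each model on the training set and score it on the held-out test set
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = mean_squared_error(y_test, predictions)

# Print the models from lowest to highest test error
for name, mse in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: MSE = {mse:.3f}")
```

A lower mean squared error indicates predictions closer to the true values on the test set. The ranking seen on synthetic data says nothing about which model would suit the real dataset, and SVR in particular is sensitive to feature scaling, so its default configuration may compare poorly without a scaling step.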