Deploying a Simple TensorFlow Model

Introduction

This project is designed to guide you through the process of creating a simple TensorFlow model, exporting it, and then serving it using Docker and TensorFlow Serving. TensorFlow is an open-source machine learning framework, and TensorFlow Serving is a flexible, high-performance serving system for machine learning models. Docker containers make it easy to package and deploy these models consistently. By the end of this project, you'll understand how to set up a basic machine learning model in TensorFlow, export it for serving, and deploy it using TensorFlow Serving inside a Docker container.

👀 Preview

## Send a prediction request to the TensorFlow Serving container
curl -X POST \
  http://localhost:9501/v1/models/half_plus_two:predict \
  -d '{"signature_name":"serving_default","instances":[[1.0], [2.0], [5.0]]}'

Output:

{
  "predictions": [[2.5], [3.0], [4.5]]
}

🎯 Tasks

In this project, you will learn:

  • How to install TensorFlow and TensorFlow Serving dependencies
  • How to create a simple TensorFlow model for basic arithmetic operations
  • How to export the model in a format suitable for serving with TensorFlow Serving
  • How to serve the model using Docker and TensorFlow Serving
  • How to send prediction requests to the deployed model and receive predictions

šŸ† Achievements

After completing this project, you will be able to:

  • Set up a basic machine learning model in TensorFlow
  • Export a TensorFlow model for serving
  • Deploy a TensorFlow model using Docker and TensorFlow Serving
  • Send prediction requests to the deployed model and observe the results

Install Dependencies

Before you start, you need to install TensorFlow in your environment. Additionally, you'll pull the TensorFlow Serving image from Docker Hub to prepare for serving your model in a containerized environment. Execute the following commands in your terminal.

Install TensorFlow:

## Install TensorFlow
pip install tensorflow==2.14.0

Pull TensorFlow Serving Docker image:

## Pull TensorFlow Serving image from Docker Hub
docker pull tensorflow/serving

In this step, you installed TensorFlow, a powerful library for numerical computation and machine learning, and then pulled the TensorFlow Serving Docker image.

TensorFlow Serving is specifically designed for serving machine learning models in production environments. Using Docker ensures that TensorFlow Serving runs in an isolated environment with all its dependencies met, thus avoiding conflicts with other software on your machine.
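
To confirm the installation before moving on, you can run a quick optional check from Python. The snippet below is a minimal sketch (the file name check_tf.py is only a suggestion) that prints the installed version and runs a tiny computation:

## Optional check: save as check_tf.py and run with `python check_tf.py`
import tensorflow as tf

## Should print 2.14.0, matching the version installed above
print("TensorFlow version:", tf.__version__)

## Run a tiny eager computation to confirm the runtime works
print("1.0 * 0.5 + 2.0 =", float(tf.constant(1.0) * 0.5 + 2.0))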

Create and Export Your Model

In this step, you'll define a simple TensorFlow model that performs a basic arithmetic operation: multiplying its input by 0.5 and then adding 2. After defining the model, you'll export it to a format that TensorFlow Serving can use.

Create and export the model in ~/project/half_plus_two.py:

## Import TensorFlow
import tensorflow as tf

## Define a simple Sequential model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1], use_bias=True)
])

## Set the weights to achieve the "multiply by 0.5 and add 2" functionality
weights = [tf.constant([[0.5]]), tf.constant([2.0])]
model.set_weights(weights)

## Compile the model (finalizes its configuration even though it won't be trained)
model.compile(optimizer='sgd', loss='mean_squared_error')

## Export the model to a SavedModel
export_path = './saved_model_half_plus_two/1'
tf.saved_model.save(model, export_path)

This step involves defining a TensorFlow model that performs a simple operation on its input: multiplying by 0.5 and adding 2. The model is then exported to a format suitable for serving.

  • The model is defined using TensorFlow's Keras API, which is a high-level API for building and training deep learning models. The model consists of a single dense layer, which is a fully connected neural network layer.
  • The weights of the model are manually set to achieve the desired operation (multiply by 0.5 and add 2); a short sanity check after this list shows one way to confirm this.
  • Even though this model won't be trained further, it's compiled to finalize its configuration before export.
  • Finally, the model is saved in the TensorFlow SavedModel format: a directory containing a protobuf file that describes the computation graph and a checkpoint with the model weights. This format is what TensorFlow Serving expects for model deployment.
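
Before exporting, you can sanity-check that the manually set weights really implement y = 0.5x + 2. The standalone snippet below is a minimal sketch (the file name check_model.py is hypothetical) that rebuilds the same layer and prints its predictions:

## Hypothetical check_model.py: rebuild the layer and confirm y = 0.5 * x + 2
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1], use_bias=True)
])
model.set_weights([np.array([[0.5]]), np.array([2.0])])

## Expect values close to [[2.5], [3.0], [4.5]]
print(model.predict(np.array([[1.0], [2.0], [5.0]])))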

To export the model, run the script in the terminal from the ~/project directory:

python half_plus_two.py

The model is saved in the ~/project/saved_model_half_plus_two directory, with the following structure:

.
└── saved_model_half_plus_two
    └── 1
        ├── assets
        ├── fingerprint.pb
        ├── saved_model.pb
        └── variables
            ├── variables.data-00000-of-00001
            └── variables.index
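
Before serving, you can optionally verify the export by loading the SavedModel back and calling its serving signature. The sketch below (hypothetical verify_export.py, run from the ~/project directory) looks up the signature's input tensor name dynamically, since that name depends on the layer name Keras generated:

## Hypothetical verify_export.py: load the SavedModel and call its serving signature
import tensorflow as tf

loaded = tf.saved_model.load("./saved_model_half_plus_two/1")
infer = loaded.signatures["serving_default"]

## Print the expected input specification (the key is the layer's input name)
print(infer.structured_input_signature)

## Build the keyword argument from the signature and run a test batch
input_name = list(infer.structured_input_signature[1].keys())[0]
result = infer(**{input_name: tf.constant([[1.0], [2.0], [5.0]])})
print(result)  ## values should be close to [[2.5], [3.0], [4.5]]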

Serve the Model Using Docker and TensorFlow Serving

After exporting the model, the next step is to serve it using TensorFlow Serving inside a Docker container. This makes your model accessible over the network so it can respond to prediction requests.

Serve the model with Docker in terminal:

## Serve the model using TensorFlow Serving in a Docker container
docker run -t --rm -p 9500:8500 -p 9501:8501 \
  -v "/home/labex/project/saved_model_half_plus_two:/models/half_plus_two" \
  -e MODEL_NAME=half_plus_two \
  tensorflow/serving

In this step, the exported model is served using TensorFlow Serving inside a Docker container. The Docker run command starts a TensorFlow Serving instance and makes the model available for inference requests.

  • The -p flags map the container's ports to your host machine (gRPC port 8500 to 9500 and REST port 8501 to 9501), allowing you to send requests to the TensorFlow Serving model server from your local machine.
  • The -v flag mounts the exported model directory from your host into the container at /models/half_plus_two, the default base path where TensorFlow Serving looks for a model named half_plus_two.
  • The -e MODEL_NAME environment variable tells TensorFlow Serving the name of the model to serve.

This setup encapsulates the model serving environment, ensuring that it runs consistently regardless of where it's deployed.
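
Before sending a prediction, you can confirm from the host that the model has finished loading. TensorFlow Serving's REST API exposes a model status endpoint; the sketch below (hypothetical check_status.py, standard library only) queries it through the mapped port 9501:

## Hypothetical check_status.py: query the TensorFlow Serving model status endpoint
import json
import urllib.request

## Port 9501 on the host is mapped to the container's REST port 8501
with urllib.request.urlopen("http://localhost:9501/v1/models/half_plus_two") as resp:
    status = json.load(resp)

## A healthy server reports a model version whose state is "AVAILABLE"
print(json.dumps(status, indent=2))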

Send a Prediction Request to Your Model

Finally, you'll test the deployed model by sending a prediction request. This request will ask the model to apply its logic (multiply by 0.5 and add 2) to a set of input values.

Open another terminal and send a prediction request:

## Send a prediction request to the TensorFlow Serving container
curl -X POST \
  http://localhost:9501/v1/models/half_plus_two:predict \
  -d '{"signature_name":"serving_default","instances":[[1.0], [2.0], [5.0]]}'

Output:

{
  "predictions": [[2.5], [3.0], [4.5]]
}

This final step involves testing the deployed model by sending an HTTP POST request. This request includes a JSON payload with instances for which predictions are needed.

  • The curl command is used to send a POST request to the TensorFlow Serving server. The URL specifies the model and the predict API endpoint.
  • The -d flag provides the data for the prediction request in JSON format. The signature_name key specifies the serving signature to use, which is a way to tell TensorFlow Serving which computational graph to execute. The instances key contains the input data for the prediction.

The response from the server includes the predictions made by the model on the provided input instances, demonstrating that the model is successfully deployed and serving predictions.
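
If you prefer to call the endpoint from Python instead of curl, the sketch below (hypothetical predict_client.py, standard library only, assuming the serving container from the previous step is still running) sends the same request and prints the parsed response:

## Hypothetical predict_client.py: send the same prediction request from Python
import json
import urllib.request

url = "http://localhost:9501/v1/models/half_plus_two:predict"
payload = {
    "signature_name": "serving_default",
    "instances": [[1.0], [2.0], [5.0]],
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

## Expect {"predictions": [[2.5], [3.0], [4.5]]}
with urllib.request.urlopen(request) as resp:
    print(json.load(resp))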

Summary

In this project, you've learned how to create a simple TensorFlow model, export it for serving, and deploy it using TensorFlow Serving and Docker. You started by installing the necessary dependencies, then defined and exported a basic model. You then served the model using TensorFlow Serving inside a Docker container and tested it by sending a prediction request. This workflow is a fundamental skill for deploying machine learning models in a scalable and reproducible way.
