MLOps - II

 

6. Model Packaging & Reproducibility


Here’s a detailed explanation of how to define environments using requirements.txt and Conda environment files (environment.yml), both of which are crucial in MLOps for ensuring reproducibility, consistency, and portability of ML pipelines.


✅ 1. requirements.txt (For pip)

πŸ“Œ Purpose:

Used to list Python packages and their versions to be installed via pip.

πŸ“„ Example: requirements.txt

numpy==1.24.3
pandas>=1.5.0,<2.0
scikit-learn
mlflow
dvc
tensorflow==2.13.0
matplotlib

πŸ’‘ Usage:

# Create virtual environment
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Freeze to regenerate file
pip freeze > requirements.txt

✅ When to Use:

  • You use pip and venv

  • You want simplicity and lightweight environments

  • CI/CD pipelines and Docker integrations


✅ 2. environment.yml (For conda)

πŸ“Œ Purpose:

Defines a Conda environment, including:

  • Python version

  • Pip packages

  • Conda-specific packages

  • Channels

πŸ“„ Example: environment.yml

name: mlops-env
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.10
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - pip
  - pip:
      - mlflow
      - dvc
      - wandb

πŸ’‘ Usage:

# Create environment
conda env create -f environment.yml

# Activate environment
conda activate mlops-env

# Export current env
conda env export > environment.yml

✅ When to Use:

  • You're using Anaconda/Miniconda

  • Need non-Python dependencies (e.g., libgomp, libx11)

  • Complex data science stacks (GPU support, etc.)


πŸ” Side-by-Side Comparison

Feature | requirements.txt | environment.yml
Package Manager | pip | conda (+ pip)
Language | Python-only | Supports system libs too
Format | Plaintext | YAML
Virtual Env Tool | venv, virtualenv | conda
Portability | Very portable | More robust for data science

πŸ’‘ Best Practices

✅ Lock dependencies to specific versions
✅ Always version control these files (commit to Git)
✅ Use requirements-dev.txt for testing tools like pytest, flake8, etc.
✅ Regenerate files after updates using pip freeze or conda env export
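For example, a separate dev file can pull in the runtime dependencies and add testing tools on top (package pins here are illustrative):

```text
# requirements-dev.txt — development/testing tools only
-r requirements.txt   # include the runtime dependencies
pytest>=7.0
flake8
black
```

Install with pip install -r requirements-dev.txt so production images stay free of test tooling.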


πŸ§ͺ Bonus: Hybrid conda + pip in ML Projects

Many ML tools (like MLflow, DVC, wandb) are only on PyPI, so you use both:

dependencies:
  - numpy            # conda packages
  - pip
  - pip:
      - mlflow       # PyPI-only packages
      - dvc
      - wandb



🐳 What is Docker?

Docker is a platform that allows you to package your application and its dependencies into containers. This helps:

  • Avoid "it works on my machine" problems

  • Simplify deployment

  • Standardize environments


✅ Why Use Docker for ML?

Benefit | Description
Environment Reproducibility | Same code = same result everywhere
Easy Deployment | Deploy models as APIs or batch jobs in any cloud/server
Dependency Isolation | Avoid conflicts between different projects
Portability | Run containers anywhere: local, cloud, CI/CD
Scalability | Combine with Kubernetes or ECS for horizontal scaling

πŸ› ️ Key Docker Concepts

Concept | Description
Dockerfile | Script to build a Docker image
Image | Snapshot of environment and code
Container | Running instance of an image
Volume | Persistent data storage
Ports | Used to expose services (e.g., an API)

πŸ“„ Sample Dockerfile for ML

# 1. Base Image
FROM python:3.10-slim

# 2. Set working directory
WORKDIR /app

# 3. Copy code and requirements
COPY . .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

# 4. Default command
CMD ["python", "train.py"]

πŸ§ͺ Example: ML Project Structure

ml-project/
├── Dockerfile
├── requirements.txt
├── train.py
├── model.pkl
└── utils.py

πŸš€ Build and Run Docker

πŸ”§ Build the Image

docker build -t ml-trainer .

▶️ Run the Container

docker run --rm ml-trainer

🌐 Run with Port Binding (for APIs)

docker run -p 5000:5000 ml-api

πŸ“¦ Use Case Examples

Use Case | Description
Model Training | train.py inside Docker — GPU support if needed
Model Serving | Flask / FastAPI container exposing a REST API
Data Pipelines | Combine Docker + Airflow for batch jobs
Jupyter Notebook | Run notebooks inside a container with EXPOSE 8888
CI/CD Integration | Run tests and training in pipelines

🧠 GPU Support (Optional)

For TensorFlow or PyTorch GPU:

🐍 Dockerfile (PyTorch + CUDA)

FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime

WORKDIR /app
COPY . .
RUN pip install -r requirements.txt

CMD ["python", "train.py"]

Then use:

docker run --gpus all ml-gpu-trainer

πŸ” Best Practices

  • Use .dockerignore to skip unnecessary files (like .git, __pycache__)

  • Use ENTRYPOINT if you want CLI-style container apps

  • Keep image size small with slim or alpine base images

  • Avoid running containers as root (USER app)
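The last tip can be sketched as follows (the user name app is illustrative):

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Create an unprivileged user and switch to it before running anything
RUN useradd --create-home app
USER app
CMD ["python", "train.py"]
```

Installing dependencies happens before the USER switch, since pip may need root to write system paths.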


πŸ“ .dockerignore Example

__pycache__/
*.pyc
*.pkl
*.csv
.env
.git



🐳 What is a Dockerfile?

A Dockerfile is a script with step-by-step instructions to build a Docker image (your app + environment).


✅ Sample Dockerfile for a ML Project (Training + Inference)

# Use an official Python base image
FROM python:3.10-slim

# Set the working directory inside the container
WORKDIR /app

# Copy local code to the container
COPY . .

# Install dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

# Run the training or inference script
CMD ["python", "train.py"]

πŸ“¦ Typical File Structure

ml-project/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── train.py
├── serve.py (Flask/FastAPI)
└── data/

πŸ§ͺ Example: requirements.txt

pandas
scikit-learn
flask
joblib

πŸš€ Docker Commands

Command | What it does
docker build -t ml-app . | Builds an image named ml-app
docker run ml-app | Runs a container from ml-app
docker run -v $(pwd)/data:/app/data ml-app | Mounts local data/ into the container
docker run -p 5000:5000 ml-app | Exposes the Flask/FastAPI server

🧩 What is Docker Compose?

Docker Compose is a tool to run multi-container apps (like an ML API + a DB + message queue) using a single .yml file.


✅ Sample docker-compose.yml for ML App + Postgres + Jupyter

version: "3.8"
services:
  ml-api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ml-api
    ports:
      - "5000:5000"
    volumes:
      - .:/app
    command: python serve.py

  jupyter:
    image: jupyter/scipy-notebook
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/home/jovyan/work

  db:
    image: postgres:13
    environment:
      POSTGRES_USER: mluser
      POSTGRES_PASSWORD: mlpass
      POSTGRES_DB: mldb
    ports:
      - "5432:5432"

πŸ’‘ Run Compose

docker-compose up --build

This starts:

  • ml-api service (Flask/FastAPI)

  • jupyter service

  • db (PostgreSQL)


⚙️ Advanced Tips

Tip | Description
.env file | Store secrets and config
depends_on: | Specify container startup order
restart: always | Auto-restart crashed containers
networks: | Custom network for services to talk to each other
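The tips above could be combined in the ml-api service like this (a sketch, not a full file):

```yaml
services:
  ml-api:
    build: .
    env_file:
      - .env          # secrets and config kept out of the compose file
    depends_on:
      - db            # start the database container first
    restart: always   # auto-restart on crash
  db:
    image: postgres:13
```

Note that depends_on only controls start order, not readiness — the app should still retry its DB connection.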

πŸ“ Optional .env File

POSTGRES_USER=mluser
POSTGRES_PASSWORD=mlpass
POSTGRES_DB=mldb

And in docker-compose.yml:

env_file:
  - .env

πŸ”„ Common Use Cases

Use Case | Compose Needed?
Train a model once | ❌ Only Dockerfile
Serve API + DB | ✅ Yes
Jupyter + Training API | ✅ Yes
ML Pipeline in CI/CD | ✅ Yes


Here’s a step-by-step guide to building and running ML containers using Docker. This will help you containerize your ML projects — for both training and inference.


✅ 1. Prepare Your ML Project

Assume your project has this structure:

ml-project/
├── Dockerfile
├── requirements.txt
├── train.py
├── serve.py        # (Flask or FastAPI)
├── model.pkl
└── data/

🐳 2. Create Dockerfile

# Use an official Python image
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy files
COPY . .

# Install dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

# Default command
CMD ["python", "train.py"]

πŸ“¦ 3. Create requirements.txt

numpy
pandas
scikit-learn
flask
joblib

πŸ› ️ 4. Build the Docker Image

Open terminal inside the ml-project/ folder and run:

docker build -t ml-container .

  • -t ml-container → tags the image with the name ml-container

  • . → builds from the current directory


▶️ 5. Run the Container

A. Train model:

docker run --name train-ml ml-container

This runs train.py inside a container.

B. Serve model as API:

Modify your Dockerfile to:

CMD ["python", "serve.py"]

Then rebuild and run:

docker build -t ml-api .
docker run -p 5000:5000 ml-api

πŸ” Alternatively: Override CMD during runtime

docker run -p 5000:5000 ml-container python serve.py

πŸ“‚ 6. Mount Local Volume (For Data or Models)

docker run -v $(pwd)/data:/app/data ml-container

  • Mounts your local data/ directory into the container


πŸ” 7. Access Container Logs

docker logs train-ml

🧽 8. Clean Up

docker ps -a               # List all containers
docker rm train-ml         # Remove container
docker rmi ml-container    # Remove image

🌐 9. Testing API Inside Container (Optional)

If serve.py runs a Flask app on port 5000:

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [1, 2, 3]}'

πŸ§ͺ Example: Simple serve.py for FastAPI

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

model = joblib.load("model.pkl")
app = FastAPI()

# Request body schema — a bare `list` parameter would not be parsed as the body
class Features(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(data: Features):
    # Wrap the single sample in a list: predict expects a 2-D input
    prediction = model.predict([data.features])
    return {"prediction": prediction.tolist()}

πŸ’‘ BONUS: Run in Detached Mode

docker run -d -p 5000:5000 ml-api

  • -d detaches the container so it runs in the background

  • Use docker logs <container_id> to monitor output


7. Model Deployment



πŸš€ What Are Deployment Strategies?

Deployment strategies define how you release a new model (or model version) into production without disrupting users, compromising performance, or introducing bugs.

These strategies often mirror DevOps deployment approaches but are adapted for ML-specific considerations like drift, accuracy, retraining, and latency.


✅ Common Deployment Strategies in MLOps

1. Recreate / Replace

  • πŸ” Takes down the old model, then deploys the new one.

  • ⏳ Downtime expected.

Use When: Non-critical models or low traffic.

kubectl delete deployment old-model
kubectl apply -f new-model.yaml

2. Blue-Green Deployment

  • 🟦 Blue = current production model

  • 🟩 Green = new model deployed alongside

  • πŸ’‘ Switch traffic to green after verification.

Pros:

  • Instant rollback

  • Zero downtime

Cons:

  • Requires double resources

Use When: High uptime is critical.
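In Kubernetes, blue-green is often implemented by pointing a Service at one of two deployments via a label selector (names below are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: model-svc
spec:
  selector:
    app: model
    color: blue     # flip to "green" to switch all traffic at once
  ports:
    - port: 80
      targetPort: 5000
```

Rollback is a one-line change back to color: blue, which is what makes the strategy attractive despite the doubled resources.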


3. Canary Deployment

  • 🐦 Roll out to a small subset of users (e.g., 5%)

  • Monitor performance (latency, accuracy, user feedback)

  • Gradually increase traffic

Pros:

  • Safer than full rollout

  • Real-time feedback

Use When: Testing model quality, performance, or monitoring risk of concept/data drift.
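The traffic split at the heart of a canary rollout can be sketched as a weighted random router (purely illustrative — real systems do this at the load balancer or service mesh):

```python
import random

def route(canary_fraction: float = 0.05) -> str:
    """Return which model version should serve this request."""
    return "canary" if random.random() < canary_fraction else "stable"

# Send 10,000 simulated requests; roughly 5% should hit the canary
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[route(0.05)] += 1
```

Gradually increasing canary_fraction (5% → 25% → 100%) while watching latency and accuracy metrics completes the rollout.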


4. A/B Testing (Shadow or Split Testing)

  • πŸ…°️ Model A (existing)

  • πŸ…±️ Model B (new)

  • Route users randomly (e.g., 50/50) and compare outputs or outcomes

Pros:

  • Compare performance metrics (conversion rate, accuracy, etc.)

  • User-driven validation

Use When: Comparing two models for business impact.
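A common implementation detail is deterministic bucketing by user ID, so each user consistently sees the same model across sessions (a sketch; the hash scheme is illustrative):

```python
import hashlib

def assign_variant(user_id: str, split: float = 0.5) -> str:
    """Deterministically assign a user to model A or B."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return "model_b" if bucket < split * 100 else "model_a"
```

Because the assignment is a pure function of the user ID, no session state is needed and results stay comparable over the whole test window.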


5. Shadow Deployment

  • πŸ•΅️ Serve predictions silently in the background

  • New model runs on real-time data but does not affect users

  • Log and compare predictions vs live model

Pros:

  • No risk to user

  • Evaluate on real-world data

Cons:

  • Doubles compute load

Use When: Auditing, regulatory review, or performance benchmarking.
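The shadow pattern can be sketched in a few lines — the user always gets the live prediction, while the new model's output is only logged (illustrative; in production the shadow call would usually run asynchronously):

```python
def serve_with_shadow(request, live_model, shadow_model, log):
    """Return the live prediction; run the shadow model silently for comparison."""
    live_pred = live_model(request)
    try:
        shadow_pred = shadow_model(request)  # never affects the user
        log.append({"request": request, "live": live_pred, "shadow": shadow_pred})
    except Exception:
        pass  # a shadow failure must not break the live path
    return live_pred

log = []
result = serve_with_shadow(3, lambda x: x * 2, lambda x: x * 2 + 1, log)
```

The accumulated log is exactly the dataset needed to compare the two models on real traffic before promotion.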


6. Multi-Armed Bandit

  • Like A/B testing, but uses adaptive traffic allocation

  • Routes more traffic to the better-performing model dynamically

Use When: Want to maximize reward during testing phase (e.g., maximize clicks or accuracy).
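A minimal sketch of adaptive allocation is epsilon-greedy: with probability epsilon explore a random model, otherwise exploit the one with the best observed reward rate (illustrative only — production bandits often use Thompson sampling or UCB):

```python
import random

def choose_model(stats: dict, epsilon: float = 0.1) -> str:
    """Epsilon-greedy: mostly exploit the best model, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(list(stats))
    # Exploit: pick the model with the highest observed reward rate
    return max(stats, key=lambda m: stats[m]["reward"] / max(stats[m]["n"], 1))

def update(stats: dict, model: str, reward: float) -> None:
    """Record the outcome (e.g., click = 1.0) for the model that served."""
    stats[model]["n"] += 1
    stats[model]["reward"] += reward

stats = {"model_a": {"n": 1, "reward": 1.0}, "model_b": {"n": 1, "reward": 0.0}}
```

Over time the router naturally shifts traffic toward whichever model is winning, which is the "maximize reward during testing" property described above.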


7. Rolling Update

  • Gradually update pods or services

  • Often used with Kubernetes (Deployment strategy)

Use When: Containerized models using tools like Kubernetes or Docker Swarm


πŸ“¦ Tools to Support Model Deployment

Tool | Use Case
FastAPI / Flask | Serving ML models as REST APIs
MLflow / TorchServe | Model packaging and deployment
Docker + Kubernetes | Scalable containerized deployment
Seldon Core / KFServing | K8s-native ML model deployment
Triton Inference Server | Optimized inference for deep learning
TensorFlow Serving | Serving TensorFlow models in production

πŸ’‘ Best Practices

  • ✅ Always monitor model performance post-deployment (latency, accuracy, drift)

  • ✅ Use feature stores to ensure consistency in training vs inference

  • ✅ Automate deployment through CI/CD (GitHub Actions, Jenkins, etc.)

  • ✅ Use rollback mechanisms (Blue-Green, Canary) in case of failure



✅  Batch Inference vs Real-Time Inference

Feature | Batch Inference | Real-Time Inference
Definition | Predictions are made on bulk data at once | Predictions are made instantly per request
Latency | High (minutes to hours) | Low (milliseconds to seconds)
Trigger | Scheduled (e.g., daily, hourly) | On-demand (API call, event, UI trigger)
Use Cases | Monthly credit risk scoring; email spam tagging; customer churn scoring | Fraud detection; chatbots; product recommendations
Deployment Mode | Often offline / serverless batch jobs | Usually via REST API or streaming systems
Cost Efficiency | More efficient at scale | Expensive if traffic is high
Examples | Run via Airflow, Spark, DVC pipelines | Run via FastAPI, Flask, TensorFlow Serving

πŸ“¦ Example:

  • Batch Inference: Every night at 2 AM, predict churn for 10 million customers and store results in a database.

  • Real-Time Inference: When a user logs in, instantly recommend 5 products based on their activity.


✅  Online Inference vs Offline Inference

These are broader categories related to when and how predictions are generated and delivered.

Feature | Offline Inference | Online Inference
Definition | Predictions are precomputed & stored | Predictions are computed on-the-fly
When Used | Before the user needs it | When the user triggers it
Model Execution | Happens ahead of time | Happens per request
Data Source | Static snapshot | Real-time features (API, current session)
Storage | Predictions are stored in a DB, files, etc. | Predictions returned directly to the UI/API
Use Cases | Risk scoring; lead prioritization; email categorization | Self-driving car inputs; voice assistants; stock prediction dashboards

πŸ” Relationship with Batch/Real-Time:

 | Offline | Online
Batch | ✅ Yes (classic use) | ❌ No
Real-Time | ❌ No | ✅ Yes (classic use)
  • Offline + Batch = Nightly scoring for marketing

  • Online + Real-Time = Instant fraud detection on transactions


🧠 Summary Diagram

            +-----------------------+-----------------------+
            |        Batch          |       Real-Time       |
+-----------+-----------------------+-----------------------+
| Offline   | ✔ Scheduled Scoring   | ✘ Not Common          |
|           | ✔ Stored Predictions  |                       |
+-----------+-----------------------+-----------------------+
| Online    | ✘ Not Practical       | ✔ Instant Prediction  |
|           |                       | ✔ API-based           |
+-----------+-----------------------+-----------------------+

πŸ“Œ How to Choose?

Criteria | Prefer Batch/Offline | Prefer Real-Time/Online
Latency-critical? | ❌ No | ✅ Yes
Prediction volume | ✅ High (millions at once) | ❌ One at a time
Data freshness | ❌ Static features | ✅ Real-time data needed
Infrastructure | ✅ Cheaper and easier | ❌ Needs always-on API + low latency
Examples | Marketing, churn scoring | Chatbot, recommender, fraud alerts



πŸš€ What is Flask?

Flask is a lightweight, easy-to-use Python web framework for building APIs and web applications. In MLOps, it is widely used to serve ML models as REST APIs so they can be accessed in real-time by applications or other services.


✅ Key Features of Flask

  • Simple and minimalistic

  • Great for small to medium ML projects

  • Easily integrates with machine learning libraries (scikit-learn, TensorFlow, PyTorch)

  • Supports REST API routes

  • Can be containerized (Docker) and deployed to cloud


πŸ” Flask Workflow for ML Model Deployment

  1. Train your ML model
    Save it using pickle, joblib, or a framework's native format (like model.h5 for Keras)

  2. Create a Flask API app
    Load the model and expose it via an endpoint (e.g., /predict)

  3. Test the API using tools like Postman or Curl

  4. Deploy it using Docker, AWS EC2, or other platforms


πŸ“¦ Sample Flask App for Model Deployment

model.pkl — Trained ML model file (e.g., a scikit-learn model)

app.py

from flask import Flask, request, jsonify
import pickle
import numpy as np

# Load the trained model
model = pickle.load(open("model.pkl", "rb"))

# Initialize Flask app
app = Flask(__name__)

# Define home route
@app.route('/')
def home():
    return "Welcome to the ML Model API!"

# Define predict route
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)  # Get JSON payload
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    return jsonify({'prediction': prediction.tolist()})

# Run the Flask app
if __name__ == '__main__':
    app.run(debug=True)

πŸ§ͺ Test Request Example

curl -X POST http://127.0.0.1:5000/predict \
-H "Content-Type: application/json" \
-d '{"features": [6.2, 3.4, 5.4, 2.3]}'

πŸ“ Typical Flask Project Structure

ml-flask-app/
│
├── model.pkl
├── app.py
├── requirements.txt
└── Dockerfile  (optional for containerization)

🧰 requirements.txt

Flask==2.3.2
numpy
scikit-learn

🐳 Optional: Dockerfile to Containerize

FROM python:3.9

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["python", "app.py"]



πŸš€ What is FastAPI?

FastAPI is a high-performance web framework for building APIs with Python 3.7+ based on ASGI (Asynchronous Server Gateway Interface). It's designed for speed and includes automatic data validation, type hints, and auto-generated Swagger documentation.


✅ Why Use FastAPI for ML?

Feature | FastAPI Advantage
⚡ Speed | Faster than Flask
πŸ” Data validation | Automatic (via Pydantic)
πŸ§ͺ Interactive Docs | Built-in Swagger & ReDoc
πŸ€– Async support | Native support for async I/O
πŸ“¦ Dependency Injection | Built-in and clean

🧠 ML Model Deployment with FastAPI – End-to-End Example

1. model.pkl: Trained ML model (e.g., scikit-learn)

Save your model using:

import pickle
pickle.dump(model, open("model.pkl", "wb"))

2. main.py: FastAPI App Code

from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np
import pickle

# Load model
model = pickle.load(open("model.pkl", "rb"))

# Initialize app
app = FastAPI(title="ML Model Inference API")

# Define input schema
class Features(BaseModel):
    features: list[float]

# Root endpoint
@app.get("/")
def read_root():
    return {"message": "Welcome to the FastAPI ML model server!"}

# Predict endpoint
@app.post("/predict")
def predict(data: Features):
    features = np.array(data.features).reshape(1, -1)
    prediction = model.predict(features)
    return {"prediction": prediction.tolist()}

3. requirements.txt

fastapi
uvicorn
numpy
scikit-learn
pydantic

4. πŸ§ͺ Run Locally

Install dependencies:

pip install -r requirements.txt

Run the API:

uvicorn main:app --reload

You can now visit http://127.0.0.1:8000/docs for the interactive Swagger UI.


5. πŸ§ͺ Sample Test (via curl or Postman)

curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d "{\"features\": [5.1, 3.5, 1.4, 0.2]}"

🐳 Optional: Dockerfile to Containerize FastAPI App

FROM python:3.9

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

πŸ”₯ Bonus: Swagger UI Automatically Available

Thanks to FastAPI, you get Swagger UI without extra effort — excellent for testing, collaboration, or exposing to product teams.


✅ Flask vs FastAPI Quick Recap

Feature | Flask | FastAPI
Speed | Slower | ⚡ Faster
Type Validation | Manual | ✅ Automatic
Async Support | ❌ No | ✅ Yes
Docs | ❌ Add-ons | ✅ Built-in
Production Ready | ✅ Yes | ✅ Yes



1. What is Model Serving?

Model serving is the process of exposing a trained ML model via an API so other applications (like a web or mobile app) can send input data and receive predictions in real-time.


2. Flask vs FastAPI: Quick Comparison

Feature | Flask | FastAPI
Performance | Slower (sync) | Faster (async, Starlette + Pydantic)
Type Hint Support | Optional | Fully supports and requires it
Validation | Manual or with Flask-RESTful | Built-in via Pydantic
Best For | Simpler, legacy projects | High-performance APIs, production ML APIs
Learning Curve | Easier for beginners | Slightly steeper but modern

3. Serving ML Model Using Flask (Example)

πŸ§ͺ File: model.pkl

Assume you’ve trained and saved a model using joblib or pickle.

πŸš€ Flask API Code:

from flask import Flask, request, jsonify
import joblib

# Load the model
model = joblib.load("model.pkl")

# Create the app
app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = [data['feature1'], data['feature2']]
    prediction = model.predict([features])
    return jsonify({'prediction': prediction.tolist()[0]})

if __name__ == '__main__':
    app.run(debug=True)

➕ cURL Test:

curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" \
     -d '{"feature1": 1.5, "feature2": 3.2}'

4. Serving ML Model Using FastAPI (Example)

⚙️ Install FastAPI + Uvicorn:

pip install fastapi uvicorn joblib

πŸš€ FastAPI Code:

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

# Load the model
model = joblib.load("model.pkl")

# Define request schema
class InputData(BaseModel):
    feature1: float
    feature2: float

# Initialize app
app = FastAPI()

@app.post("/predict")
def predict(data: InputData):
    features = [[data.feature1, data.feature2]]
    prediction = model.predict(features)
    return {"prediction": prediction.tolist()[0]}

πŸ”₯ Run with Uvicorn:

uvicorn your_script_name:app --reload

πŸ§ͺ Open Swagger UI:

Visit: http://localhost:8000/docs


5. Production Enhancements

Area | Suggestions
Security | OAuth2 / JWT, rate limiting, API keys
Monitoring | Prometheus + Grafana, request logging
Error Handling | Graceful 4xx/5xx responses
Scaling | Use Docker, gunicorn (Flask), or uvicorn workers
CI/CD | Use GitHub Actions, Jenkins, or CodePipeline

✅ Summary

Category | Flask | FastAPI
Simplicity | Easier for starters | Modern, fast, robust
Speed | Slower (WSGI) | Faster (ASGI + async I/O)
Validation | Manual | Built-in with Pydantic
Use Cases | POCs, simple APIs | High-performance ML APIs, production use



πŸ“Œ What is TensorFlow Serving?

TensorFlow Serving is a flexible, high-performance model server for deploying machine learning models built with TensorFlow.

✅ It allows you to:

  • Serve models over a gRPC or RESTful API

  • Dynamically load new models (versioning support)

  • Scale easily in production


🧠 Why TensorFlow Serving?

Feature | Benefit
πŸš€ High performance | Optimized for inference speed
πŸ“¦ Model versioning | Serve multiple versions
πŸ”„ Hot swapping | No-downtime model updates
πŸ’» Protocols | Supports REST & gRPC
⚙️ TensorFlow native | Designed specifically for TF models

πŸ› ️ Core Concepts

1. Model Format

You need to export your TensorFlow model using the SavedModel format:

model.save("my_model/1")

Directory structure should look like:

my_model/
└── 1/
    ├── saved_model.pb
    └── variables/

2. Model Versioning

  • Each version is a numbered folder (1/, 2/, etc.)

  • You can host multiple versions, and TensorFlow Serving will use the latest by default.
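To pin or serve specific versions instead of only the latest, TensorFlow Serving accepts a model config file (values below are illustrative), passed with the --model_config_file flag:

```text
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific { versions: 1 versions: 2 }
    }
  }
}
```

With this policy both version folders 1/ and 2/ are served concurrently, which enables A/B-style comparisons on the same server.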


πŸš€ Serving a Model using Docker

Step 1: Export model in SavedModel format

import tensorflow as tf

model = tf.keras.models.load_model("my_model.h5")
tf.saved_model.save(model, "exported_model/1")

Step 2: Pull TensorFlow Serving Docker image

docker pull tensorflow/serving

Step 3: Run the container

docker run -p 8501:8501 \
  --mount type=bind,source=$(pwd)/exported_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving

πŸ“Œ This serves your model at: http://localhost:8501/v1/models/my_model:predict


πŸ”— REST API Example

Send a request using curl:

curl -X POST http://localhost:8501/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'

Response:

{
  "predictions": [[0.1, 0.9]]
}

🧰 Optional: Use gRPC (Advanced)

You can also use gRPC instead of REST for lower latency. This requires protobuf stubs and a gRPC client.


πŸ§ͺ Testing

Test endpoint:

curl http://localhost:8501/v1/models/my_model

Response:

{
  "model_version_status": [...],
  "model_name": "my_model"
}

πŸ“ Docker Folder Structure Summary

project/
├── exported_model/
│   └── 1/
│       ├── saved_model.pb
│       └── variables/
└── Dockerfile (optional if customizing serving)

πŸ—️ Production Integration Ideas

  • Use NGINX to reverse proxy for secure API gateway

  • Deploy on Kubernetes (e.g., with TFServing Helm chart)

  • Integrate with Prometheus/Grafana for monitoring

  • Use Triton for multi-framework support (TF + PyTorch)



πŸ”₯ 1. TorchServe

🧾 What is it?

TorchServe is the official model serving framework for PyTorch developed by AWS and Facebook.

πŸš€ Key Features

Feature | Description
✅ Native PyTorch support | Built for PyTorch models specifically
πŸ“¦ Model archiver (.mar) | Package model + code + config in a .mar file
🌐 REST & gRPC APIs | Serve models using standard APIs
πŸ”„ Model versioning | Load/unload multiple versions
πŸ“Š Metrics & logs | Prometheus integration, model logs
πŸ”§ Custom handlers | Customize preprocessing/postprocessing logic

⚙️ How it works

  1. Archive your model using torch-model-archiver:

torch-model-archiver --model-name resnet18 \
 --version 1.0 \
 --model-file model.py \
 --serialized-file resnet18.pt \
 --handler image_classifier

  2. Serve the model:

torchserve --start --model-store model_store --models resnet=resnet18.mar

  3. Inference via REST:

curl http://127.0.0.1:8080/predictions/resnet -T image.jpg

πŸ“‚ Directory structure

project/
├── model_store/
│   └── resnet18.mar
├── model.py
└── config.properties

πŸ” Model Versioning

You can serve multiple .mar files with different versions and configure them in config.properties.
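A minimal config.properties might look like this (values are illustrative, not a complete reference):

```text
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
model_store=model_store
load_models=resnet18.mar
```

The management API on port 8081 can then register, scale, or unload model versions at runtime without restarting the server.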


πŸ₯‘ 2. BentoML

🧾 What is it?

BentoML is a framework-agnostic platform to build, package, and deploy machine learning models as microservices.

⚡ Best for when you want flexibility + clean API with customization + auto Docker build + YAML config.


πŸš€ Key Features

Feature | Description
πŸ” Framework-agnostic | Supports PyTorch, TensorFlow, XGBoost, etc.
🧰 CLI & Python SDK | Easy to package and serve
🐳 Docker auto-pack | Generates a container with one command
πŸͺ„ BentoML "Services" | Write custom API logic using Python
πŸ§ͺ Local dev server | bentoml serve with hot reload
πŸ”„ Model registry | Built-in local model management

πŸ“¦ Sample Service (for PyTorch)

# service.py — BentoML 1.2+ service style
import bentoml
import numpy as np
import torch

@bentoml.service
class ResNetService:
    def __init__(self) -> None:
        # Model previously saved with bentoml.pytorch.save_model("resnet50", model)
        self.model = bentoml.pytorch.load_model("resnet50:latest")

    @bentoml.api
    def predict(self, arr: np.ndarray) -> np.ndarray:
        tensor = torch.tensor(arr, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            return self.model(tensor).numpy()

πŸ”§ CLI Commands

bentoml serve service:ResNetService
bentoml build
bentoml containerize ResNetService:latest
bentoml deploy  # for cloud platforms like AWS Lambda, K8s

πŸ₯Š TorchServe vs BentoML

Feature | TorchServe | BentoML
🧠 Framework Support | Only PyTorch | Framework-agnostic
πŸ› ️ Custom API logic | Handlers | Native Python + FastAPI-style syntax
🐳 Dockerization | Manual (or use a custom Dockerfile) | Auto Dockerfile generation
🌐 REST/gRPC Support | REST/gRPC | REST (gRPC planned)
πŸ§ͺ Local Dev Experience | Moderate | Excellent (hot reloads, CLI tools)
🧱 Model Registry | .mar archives | Local Bento Store + cloud registry
🧰 Monitoring & Metrics | Prometheus support | Prometheus + optional integrations
☁️ Cloud deployment | Manual | AWS Lambda, K8s, SageMaker ready

✅ When to Use What?

Use Case | Tool
You need official, production-grade PyTorch model serving | TorchServe
You want to serve any ML model with full control & flexibility | BentoML
You want an integrated Docker build and REST API in Python | BentoML
You prefer configuration over code with strict control | TorchServe



πŸ” What is AWS Lambda?

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. You just upload your code, and Lambda takes care of everything required to run and scale it.

πŸ’‘ Great for lightweight ML inference, automation, data preprocessing, and event-driven tasks.


⚙️ Core Concepts

Concept | Description
Function | The unit of deployment in Lambda (your code + configuration)
Event | Triggers that invoke Lambda (e.g., API Gateway, S3, SNS)
Handler | Entry point of the Lambda function
Runtime | Language-specific execution environment (Python, Node.js, etc.)
Timeout | Max execution time (default 3 s, max 15 min)
Memory | 128 MB to 10 GB; affects CPU power & pricing

πŸ”¬ Use Cases in MLOps

Use Case | Example
πŸ” Model Inference (small models) | Serve XGBoost/LightGBM/TinyML
🧹 Data Cleaning Pipelines | Preprocess uploaded data from S3
πŸ”” Event-Driven Triggers | Trigger retraining when new data is added
πŸ“₯ Batch Prediction | Predict on small batches via API
πŸ“¬ Slack/Email Alerts | Auto-alert on pipeline failures
πŸ“‚ Glue/Athena Orchestration | Trigger downstream processes

πŸ§ͺ Sample Python Lambda Function (ML Inference)

# lambda_function.py
import json
import joblib
import numpy as np

# Load your model (ensure it's small or load from S3)
model = joblib.load("/opt/model.pkl")

def lambda_handler(event, context):
    data = json.loads(event['body'])
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    
    return {
        'statusCode': 200,
        'body': json.dumps({'prediction': prediction.tolist()})
    }

πŸ“¦ Packaging ML Models for Lambda

  1. Package code + dependencies + model (must be < 250 MB unzipped):

project/
├── lambda_function.py
├── model.pkl
├── requirements.txt

  2. Zip and deploy:

pip install -r requirements.txt -t ./package
cp lambda_function.py model.pkl ./package
cd package && zip -r ../lambda.zip .

  3. Upload lambda.zip to the AWS Lambda console or via the AWS CLI.


πŸš€ Triggering Lambda

Trigger Type | Use
πŸ”— API Gateway | Create a REST endpoint for inference
πŸ“ S3 Upload | Trigger when a new file is added
πŸ”„ CloudWatch | Run on a schedule (like a cron job)
πŸ’¬ SNS Topic | Alert on specific events
πŸ”€ Step Functions | As part of an ML pipeline

🧊 AWS Lambda Limitations

Limitation | Value
Max runtime | 15 min
Max memory | 10 GB
Max package size | 250 MB (unzipped)
GPU support | ❌ None

Use AWS SageMaker or ECS for large models/GPU inference.


πŸ” Permissions (IAM Role)

Make sure the Lambda has a proper execution role with:

  • S3 read/write (if using S3)

  • CloudWatch logs

  • (Optional) Secrets Manager or Parameter Store access


🧠 Best Practices

Tip | Description
✅ Use /opt layer for models | Separate large model files into Lambda Layers
🐳 Local test with Docker | Use the AWS SAM CLI
🧩 Combine with API Gateway | For a RESTful inference API
πŸ“ Optimize size | Use scikit-learn, joblib, lightgbm, etc. with care
πŸ” Use async if needed | For faster parallel executions

πŸ”„ Alternatives for Larger Models

Tool | Use When
AWS SageMaker Endpoint | Large model, GPU needed
ECS with Docker | Custom ML containers
EKS (Kubernetes) | Complex ML infra with scaling
BentoML + Lambda | BentoML can export Lambda functions directly



πŸš€ What is Google Cloud Functions?

Google Cloud Functions is a serverless execution environment on Google Cloud Platform (GCP). You deploy code snippets that automatically run in response to events like HTTP requests, Pub/Sub messages, or Cloud Storage triggers.

⚡ It’s similar to AWS Lambda — perfect for lightweight ML inference, real-time data handling, or orchestrating workflows.


🧠 When to Use GCF in MLOps?

Use Case | Example
✅ Lightweight model inference | Deploy scikit-learn or XGBoost models
✅ Event-driven automation | Retrain when new data arrives in GCS
✅ Preprocessing pipelines | Clean/validate data on upload
✅ Alerts & notifications | Trigger Slack/email alerts on failure
✅ API interface for ML models | REST endpoint to serve predictions

πŸ“¦ Supported Runtimes

| Language | Status |
|---|---|
| Python | ✅ (3.7–3.11) |
| Node.js | ✅ |
| Go, Java | ✅ |
| .NET, Ruby | ✅ |

πŸ› ️ Components of a Cloud Function

| Component | Description |
|---|---|
| Trigger | What invokes the function (HTTP, Cloud Pub/Sub, GCS, etc.) |
| Entry Point | Your main function/method |
| Dependencies | Listed in requirements.txt |
| Memory & Timeout | Configurable up to 16 GB & 60 min |

πŸ” MLOps Use Case Example — Inference API with Scikit-learn

main.py

import functions_framework
import joblib
import numpy as np
from flask import jsonify

# Loaded once per instance; reused across warm invocations
model = joblib.load("model.pkl")

@functions_framework.http
def predict(request):
    try:
        data = request.get_json()
        features = np.array(data['features']).reshape(1, -1)
        prediction = model.predict(features)
        return jsonify({'prediction': prediction.tolist()})
    except Exception as e:
        return jsonify({'error': str(e)}), 500

requirements.txt

flask
joblib
numpy
scikit-learn

πŸ› ️ Deploying the Function

  1. Authenticate and set project:

gcloud auth login
gcloud config set project <your-gcp-project-id>
  2. Deploy:

gcloud functions deploy predict \
  --runtime python311 \
  --trigger-http \
  --allow-unauthenticated \
  --memory=1024MB \
  --entry-point=predict
  3. Test via curl or Postman:

curl -X POST <your-function-url> \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'

πŸ“ Directory Structure

project/
├── main.py
├── requirements.txt
├── model.pkl

πŸ” IAM & Permissions

  • For public HTTP functions, use --allow-unauthenticated

  • For private triggers, manage IAM with roles like:

    • roles/cloudfunctions.invoker

    • roles/pubsub.subscriber

    • roles/storage.objectViewer


⚙️ Triggers Supported

| Trigger Type | Use Case |
|---|---|
| HTTP | ML inference APIs |
| Cloud Storage | Run when new data is uploaded |
| Pub/Sub | Trigger model retraining |
| Firestore/BigQuery | Event-based ETL |
| Cloud Scheduler | Scheduled jobs (e.g., batch prediction) |

⚡ Limitations

| Attribute | Limit |
|---|---|
| Max timeout | 60 min |
| Max memory | 16 GB |
| Max deployment package | 500 MB (uncompressed) |
| GPU support | ❌ Not available |
| Cold start | ⏱️ ~1 s typical |

For heavy ML workloads, consider Cloud Run or Vertex AI.


✅ Best Practices

| Practice | Tip |
|---|---|
| Use lightweight models | scikit-learn, XGBoost (small) |
| Separate logic from I/O | Helps with testing & scaling |
| Logging | Use print() or logging for Cloud Logging |
| Test locally | With the Functions Framework |
| Model versioning | Use GCS to store & load models dynamically |

πŸ” Cloud Functions vs Cloud Run vs Vertex AI

| Feature | Cloud Functions | Cloud Run | Vertex AI |
|---|---|---|---|
| Serverless | ✅ | ✅ | ✅ (managed) |
| ML Inference | ✅ (small) | ✅ (larger) | ✅ |
| GPU Support | ❌ | ❌ | ✅ |
| Cold Starts | Yes | Less frequent | Depends |
| Custom Docker | ❌ | ✅ | ✅ |

🧠 Tips for MLOps Workflows

  • Store models in Google Cloud Storage

  • Use Pub/Sub to automate retraining on new data

  • Integrate with Vertex AI Pipelines for orchestration

  • Use Secrets Manager for secure API keys or credentials

  • Monitor with Cloud Logging & Cloud Monitoring



πŸš€ What is Kubernetes?

Kubernetes (K8s) is an open-source platform for automating deployment, scaling, and management of containerized applications.

In MLOps, it's widely used for model serving, training pipelines, autoscaling workloads, and managing distributed ML systems.


🎯 Why Use Kubernetes in MLOps?

| Feature | Benefit |
|---|---|
| ✅ Scalability | Scale model inference under load |
| ✅ Portability | Works across cloud/on-prem |
| ✅ Isolation | Manage separate environments (prod/dev/staging) |
| ✅ Reproducibility | Define infra as code (YAML) |
| ✅ Rollback | Revert broken versions easily |
| ✅ Scheduling | Schedule batch jobs, training runs |
| ✅ GPU Support | Yes, with GPU node pools |

🎩 What is Helm?

Helm is a package manager for Kubernetes — like pip for Python or apt for Ubuntu.

It lets you:

  • Define Kubernetes manifests as templates

  • Version and package deployments as charts

  • Manage configuration with values.yaml


⚙️ Kubernetes + Helm Workflow for MLOps

πŸ‘‡ Step-by-Step Breakdown:

1. 🐳 Containerize Your Model

Create a Dockerfile for your model/app:

FROM python:3.11-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "serve.py"]

2. ☸️ Create Kubernetes Manifests

Basic structure:

  • deployment.yaml – app definition

  • service.yaml – expose internally or via LoadBalancer

  • ingress.yaml – optional (for HTTP routing)

Example: deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-container
          image: your-dockerhub/ml-model:latest
          ports:
            - containerPort: 5000
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"

3. πŸ“¦ Helm Chart Structure

helm create ml-model

Generated structure:

ml-model/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ingress.yaml

You can now template your YAML using values.yaml:

templates/deployment.yaml (Helm-style):

spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

values.yaml

replicaCount: 2
image:
  repository: your-dockerhub/ml-model
  tag: latest

4. πŸš€ Deploy using Helm

# Install Helm Chart
helm install ml-model ./ml-model

# Upgrade version
helm upgrade ml-model ./ml-model

# Uninstall
helm uninstall ml-model

πŸ”§ Expose the App

  • Use ClusterIP for internal services (e.g., other ML microservices)

  • Use LoadBalancer or Ingress for external REST APIs

service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 5000
  selector:
    app: ml-model
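If you route external traffic through an Ingress instead of a LoadBalancer, a minimal ingress.yaml might look like this (the host name and the NGINX ingress controller are assumptions):

```yaml
# ingress.yaml — routes HTTP traffic for ml.example.com to the model service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ml-model-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: ml.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ml-model-service
                port:
                  number: 80
```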

πŸ“Š MLOps-Specific Add-ons

| Tool | Use |
|---|---|
| Kubeflow | End-to-end ML pipelines |
| Prometheus + Grafana | Metrics, monitoring |
| ELK Stack | Logs |
| KEDA | Event-based autoscaling (Pub/Sub, Kafka) |
| Istio | Secure traffic control between ML services |

🧠 Example: Real Use Case

“Serve a FastAPI model using Kubernetes & Helm”

  1. Create FastAPI prediction app

  2. Build image, push to Docker Hub/GCR

  3. Write Helm chart

  4. Deploy on GKE or Minikube:

helm install iris-predictor ./iris-chart

✅ Best Practices

| Practice | Reason |
|---|---|
| Use Helm for reusable templates | Easy rollout of multiple models |
| Use ConfigMaps/Secrets for credentials | Avoid hardcoding |
| Set resource requests/limits | Prevent cluster overload |
| Use liveness/readiness probes | Auto-heal unhealthy containers |
| Version charts and roll back | Safe deployments |
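The probe and resource-limit practices are plain additions to the container spec in deployment.yaml. A sketch, assuming your serving app exposes a /health endpoint on port 5000:

```yaml
# fragment of deployment.yaml: limits and probes for ml-container
containers:
  - name: ml-container
    image: your-dockerhub/ml-model:latest
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    livenessProbe:            # restart the container if it stops responding
      httpGet:
        path: /health
        port: 5000
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:           # keep it out of Service endpoints until ready
      httpGet:
        path: /health
        port: 5000
      initialDelaySeconds: 5
      periodSeconds: 10
```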



🧾 What is YAML?

YAML stands for "YAML Ain’t Markup Language" (recursive acronym).

It’s a human-readable data serialization format, often used for configuration files.


πŸ”Ή Where YAML is used?

| Tool / Framework | YAML Use |
|---|---|
| Kubernetes | Deployment & Service specs (.yaml) |
| Docker Compose | docker-compose.yaml to define multi-container apps |
| GitHub Actions | .github/workflows/*.yml CI/CD pipelines |
| MLflow, Airflow | Task configs, pipelines |
| Ansible, Helm | Infrastructure as Code |
| Kubeflow | ML pipeline definitions |
| PyTorch Lightning | Training configs |
| Streamlit / Gradio | App settings |

πŸ”Ή Example: Kubernetes Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-app
  template:
    metadata:
      labels:
        app: ml-app
    spec:
      containers:
        - name: ml-container
          image: yourrepo/ml-model:latest
          ports:
            - containerPort: 5000

✅ YAML Features

  • Indentation defines structure (use spaces, not tabs)

  • Supports:

    • Lists (- item)

    • Key-value pairs (key: value)

    • Nested objects



πŸ” Key Features:

  • Simple syntax (indentation-based, like Python)

  • Supports scalars, lists, dictionaries

  • Comments with #

  • Easily converted to/from JSON
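The JSON round-trip is easy to see in code. A minimal sketch, assuming PyYAML is installed (pip install pyyaml):

```python
import json
import yaml  # PyYAML (assumed installed)

doc = """
database:
  host: localhost
  port: 5432
skills:
  - Python
  - Docker
"""

data = yaml.safe_load(doc)       # YAML -> Python dict
as_json = json.dumps(data)       # dict -> JSON string
print(as_json)

# YAML 1.2 is (essentially) a superset of JSON,
# so the JSON string parses straight back
assert yaml.safe_load(as_json) == data
```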


πŸ“˜ YAML Syntax Basics:

1. Key-Value Pairs:

name: Sanjay
age: 25

2. Lists:

skills:
  - Python
  - Docker
  - Kubernetes

3. Nested Objects:

database:
  host: localhost
  port: 5432

4. List of Objects:

users:
  - name: Alice
    role: admin
  - name: Bob
    role: user

5. Anchors & Aliases (reuse blocks):

defaults: &default_settings
  retries: 3
  timeout: 5

api_config:
  <<: *default_settings
  base_url: http://api.example.com
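On load, the parser expands the alias and merge key, so api_config inherits the defaults. A quick check with PyYAML (an assumption; install with pip install pyyaml):

```python
import yaml  # PyYAML (assumed installed)

doc = """
defaults: &default_settings
  retries: 3
  timeout: 5

api_config:
  <<: *default_settings
  base_url: http://api.example.com
"""

cfg = yaml.safe_load(doc)
# The merge key pulls retries/timeout into api_config
print(cfg["api_config"])
```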

πŸ€– YARN: Yet Another Resource Negotiator

Not related to YAML, though the names sound similar.

✅ What is YARN?

YARN (Yet Another Resource Negotiator) is a core component of Apache Hadoop for cluster resource management.


πŸ› ️ Purpose:

YARN acts as the Resource Manager in Hadoop’s ecosystem. It:

  • Allocates CPU, memory to jobs

  • Manages job scheduling & monitoring

  • Enables running multiple distributed applications (like Spark, MapReduce, Hive) in a single Hadoop cluster


πŸ“¦ Components of YARN:

| Component | Description |
|---|---|
| ResourceManager (RM) | Central master that allocates resources |
| NodeManager (NM) | Agent on each worker node that monitors resources |
| ApplicationMaster (AM) | Job-specific manager that requests resources from the RM |
| Container | Actual compute unit (CPU, memory) running the job |

🧬 YARN vs Kubernetes

| Feature | YARN | Kubernetes |
|---|---|---|
| Origin | Big Data (Hadoop ecosystem) | Cloud-native (containers) |
| Workload Type | Batch jobs (MapReduce, Spark) | Microservices, ML, API services |
| Resource Type | CPU & memory (fixed JVMs) | Pod-based (containers) |
| Scaling | Manual or with autoscalers | Native autoscaling |

πŸ“š When to Learn YARN in MLOps?

  • Only if you're working with Hadoop or Spark clusters

  • Most modern MLOps pipelines use Kubernetes or cloud-native platforms instead


✅ Summary:

| Term | Description |
|---|---|
| YAML | Human-friendly config language (used in K8s, CI/CD, Docker Compose, etc.) |
| YARN | Resource manager in the Hadoop ecosystem (used for Spark, MapReduce jobs) |


