MLOps - II
6. Model Packaging & Reproducibility
Here’s a detailed explanation of how to define environments using requirements.txt and Conda environment files (environment.yml), both of which are crucial in MLOps for ensuring reproducibility, consistency, and portability of ML pipelines.
✅ 1. requirements.txt (For pip)
📌 Purpose:
Used to list Python packages and their versions to be installed via pip.
📌 Example: requirements.txt
numpy==1.24.3
pandas>=1.5.0,<2.0
scikit-learn
mlflow
dvc
tensorflow==2.13.0
matplotlib
💡 Usage:
# Create virtual environment
python -m venv venv
source venv/bin/activate # on Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Freeze to regenerate file
pip freeze > requirements.txt
✅ When to Use:
- You use pip and venv
- You want simplicity and lightweight environments
- CI/CD pipelines and Docker integrations
✅ 2. environment.yml (For conda)
📌 Purpose:
Defines a Conda environment, including:
- Python version
- Pip packages
- Conda-specific packages
- Channels
📌 Example: environment.yml
name: mlops-env
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.10
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - pip
  - pip:
      - mlflow
      - dvc
      - wandb
💡 Usage:
# Create environment
conda env create -f environment.yml
# Activate environment
conda activate mlops-env
# Export current env
conda env export > environment.yml
✅ When to Use:
- You're using Anaconda/Miniconda
- You need non-Python dependencies (e.g., libgomp, libx11)
- Complex data science stacks (GPU support, etc.)
📌 Side-by-Side Comparison
| Feature | requirements.txt | environment.yml |
|---|---|---|
| Package Manager | pip | conda (+ pip) |
| Language | Python-only | Supports system libs too |
| Format | Plaintext | YAML |
| Virtual Env Tool | venv, virtualenv | conda |
| Portability | Very portable | More robust for data science |
💡 Best Practices
✅ Lock dependencies with versions
✅ Always version control these files (commit to Git)
✅ Use requirements-dev.txt for testing tools like pytest, flake8, etc.
✅ Regenerate files after updates using pip freeze or conda env export
🧪 Bonus: Hybrid conda + pip in ML Projects
Many ML tools (like MLflow, DVC, wandb) are only on PyPI, so you use both:
dependencies:
  - <conda packages>
  - pip:
      - <PyPI packages, e.g., wandb, mlflow>
🐳 What is Docker?
Docker is a platform that allows you to package your application and its dependencies into containers. This helps:
- Avoid "it works on my machine" problems
- Simplify deployment
- Standardize environments
✅ Why Use Docker for ML?
| Benefit | Description |
|---|---|
| Environment Reproducibility | Same code = same result everywhere |
| Easy Deployment | Deploy models as APIs or batch jobs in any cloud/server |
| Dependency Isolation | Avoid conflicts between different projects |
| Portability | Run containers anywhere: local, cloud, CI/CD |
| Scalability | Combine with Kubernetes or ECS for horizontal scaling |
🛠️ Key Docker Concepts
| Concept | Description |
|---|---|
| Dockerfile | Script to build a Docker image |
| Image | Snapshot of environment and code |
| Container | Running instance of an image |
| Volume | Persistent data storage |
| Ports | Used to expose services (e.g., API) |
📌 Sample Dockerfile for ML
# 1. Base Image
FROM python:3.10-slim
# 2. Set working directory
WORKDIR /app
# 3. Copy code and requirements
COPY . .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
# 4. Default command
CMD ["python", "train.py"]
🧪 Example: ML Project Structure
ml-project/
├── Dockerfile
├── requirements.txt
├── train.py
├── model.pkl
└── utils.py
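For completeness, a minimal train.py that fits this layout might look like the sketch below. The iris dataset and logistic regression are illustrative placeholders for your real training code:

```python
# train.py -- minimal sketch; dataset and model choice are illustrative
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load a small built-in dataset and fit a simple classifier
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Persist the trained model next to the code, as model.pkl
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

print("model saved")
```

When this script is the Dockerfile's CMD, building and running the image produces model.pkl inside the container; mount a volume if you want to keep it.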
📌 Build and Run Docker
🔧 Build the Image
docker build -t ml-trainer .
▶️ Run the Container
docker run --rm ml-trainer
📌 Run with Port Binding (for APIs)
docker run -p 5000:5000 ml-api
📦 Use Case Examples
| Use Case | Description |
|---|---|
| Model Training | train.py inside Docker — GPU support if needed |
| Model Serving | Flask / FastAPI container exposing REST API |
| Data Pipelines | Combine Docker + Airflow for batch jobs |
| Jupyter Notebook | Run notebooks inside container with EXPOSE 8888 |
| CI/CD Integration | Run tests and training in pipelines |
🔧 GPU Support (Optional)
For TensorFlow or PyTorch GPU:
📌 Dockerfile (PyTorch + CUDA)
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "train.py"]
Then use:
docker run --gpus all ml-gpu-trainer
📌 Best Practices
- Use .dockerignore to skip unnecessary files (like .git, __pycache__)
- Use ENTRYPOINT if you want CLI-style container apps
- Keep image size small with slim or alpine base images
- Avoid running containers as root (USER app)
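The practices above can be combined into a single Dockerfile. The sketch below is illustrative: the `app` user name and the requirements-first layering are choices, not requirements.

```dockerfile
# Illustrative Dockerfile applying the practices above
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the code (respects .dockerignore)
COPY . .

# Run as a non-root user
RUN useradd --create-home app
USER app

CMD ["python", "train.py"]
```

Copying `requirements.txt` before the rest of the code means dependency installation is only re-run when the requirements change, which keeps rebuilds fast.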
📌 .dockerignore Example
__pycache__/
*.pyc
*.pkl
*.csv
.env
.git
🐳 What is a Dockerfile?
A Dockerfile is a script with step-by-step instructions to build a Docker image (your app + environment).
✅ Sample Dockerfile for a ML Project (Training + Inference)
# Use an official Python base image
FROM python:3.10-slim
# Set the working directory inside the container
WORKDIR /app
# Copy local code to the container
COPY . .
# Install dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
# Run the training or inference script
CMD ["python", "train.py"]
📦 Typical File Structure
ml-project/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── train.py
├── serve.py (Flask/FastAPI)
└── data/
🧪 Example: requirements.txt
pandas
scikit-learn
flask
joblib
📌 Docker Commands
| Command | What it does |
|---|---|
| docker build -t ml-app . | Builds an image named ml-app |
| docker run ml-app | Runs a container from ml-app |
| docker run -v $(pwd)/data:/app/data ml-app | Mounts local data/ into the container |
| docker run -p 5000:5000 ml-app | Exposes the Flask/FastAPI server |
🧩 What is Docker Compose?
Docker Compose is a tool to run multi-container apps (like an ML API + a DB + message queue) using a single .yml file.
✅ Sample docker-compose.yml for ML App + Postgres + Jupyter
version: "3.8"
services:
  ml-api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ml-api
    ports:
      - "5000:5000"
    volumes:
      - .:/app
    command: python serve.py

  jupyter:
    image: jupyter/scipy-notebook
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/home/jovyan/work

  db:
    image: postgres:13
    environment:
      POSTGRES_USER: mluser
      POSTGRES_PASSWORD: mlpass
      POSTGRES_DB: mldb
    ports:
      - "5432:5432"
💡 Run Compose
docker-compose up --build
This starts:
- the ml-api service (Flask/FastAPI)
- the jupyter service
- the db service (PostgreSQL)
⚙️ Advanced Tips
| Tip | Description |
|---|---|
| .env file | Store secrets and config |
| depends_on: | Specify container startup order |
| restart: always | Auto-restart crashed containers |
| networks: | Custom network for services to talk to each other |
📌 Optional .env File
POSTGRES_USER=mluser
POSTGRES_PASSWORD=mlpass
POSTGRES_DB=mldb
And in docker-compose.yml:
env_file:
- .env
📌 Common Use Cases
| Use Case | Compose Needed? |
|---|---|
| Train a model once | ❌ Only Dockerfile |
| Serve API + DB | ✅ Yes |
| Jupyter + Training API | ✅ Yes |
| ML Pipeline in CI/CD | ✅ Yes |
Here’s a step-by-step guide to building and running ML containers using Docker. This will help you containerize your ML projects — for both training and inference.
✅ 1. Prepare Your ML Project
Assume your project has this structure:
ml-project/
├── Dockerfile
├── requirements.txt
├── train.py
├── serve.py # (Flask or FastAPI)
├── model.pkl
└── data/
🐳 2. Create Dockerfile
# Use an official Python image
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Copy files
COPY . .
# Install dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
# Default command
CMD ["python", "train.py"]
📦 3. Create requirements.txt
numpy
pandas
scikit-learn
flask
joblib
🛠️ 4. Build the Docker Image
Open terminal inside the ml-project/ folder and run:
docker build -t ml-container .
- -t ml-container → tags the image with the name ml-container
- . → builds from the current directory
▶️ 5. Run the Container
A. Train model:
docker run --name train-ml ml-container
This runs train.py inside a container.
B. Serve model as API:
Modify your Dockerfile to:
CMD ["python", "serve.py"]
Then rebuild and run:
docker build -t ml-api .
docker run -p 5000:5000 ml-api
📌 Alternatively: Override CMD at runtime
docker run -p 5000:5000 ml-container python serve.py
📌 6. Mount a Local Volume (For Data or Models)
docker run -v $(pwd)/data:/app/data ml-container
- Mounts your data/ directory into the container
📌 7. Access Container Logs
docker logs train-ml
🧽 8. Clean Up
docker ps -a # List all containers
docker rm train-ml # Remove container
docker rmi ml-container # Remove image
📌 9. Testing the API Inside the Container (Optional)
If serve.py runs a Flask app on port 5000:
curl http://localhost:5000/predict -X POST -H "Content-Type: application/json" -d '{"features": [1, 2, 3]}'
🧪 Example: Simple serve.py for FastAPI
from fastapi import FastAPI
import joblib

# Load the trained model once at startup
model = joblib.load("model.pkl")
app = FastAPI()

@app.post("/predict")
def predict(features: list[float]):
    prediction = model.predict([features])
    return {"prediction": prediction.tolist()}
💡 BONUS: Run in Detached Mode
docker run -d -p 5000:5000 ml-api
- -d detaches the container so it runs in the background
- Use docker logs <container_id> to monitor output
7. Model Deployment
📌 What Are Deployment Strategies?
Deployment strategies define how you release a new model (or model version) into production without disrupting users, compromising performance, or introducing bugs.
These strategies often mirror DevOps deployment approaches but are adapted for ML-specific considerations like drift, accuracy, retraining, and latency.
✅ Common Deployment Strategies in MLOps
1. Recreate / Replace
- Takes down the old model, then deploys the new one.
- ⏳ Downtime expected.
Use When: Non-critical models or low traffic.
kubectl delete deployment old-model
kubectl apply -f new-model.yaml
2. Blue-Green Deployment
- 🟦 Blue = current production model
- 🟩 Green = new model deployed alongside
- 💡 Switch traffic to green after verification.
Pros:
- Instant rollback
- Zero downtime
Cons:
- Requires double resources
Use When: High uptime is critical.
3. Canary Deployment
- 🐦 Roll out to a small subset of users (e.g., 5%)
- Monitor performance (latency, accuracy, user feedback)
- Gradually increase traffic
Pros:
- Safer than a full rollout
- Real-time feedback
Use When: Testing model quality, performance, or monitoring risk of concept/data drift.
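The gradual rollout is usually driven by deterministic bucketing, so a given user always hits the same model across requests. A minimal sketch, where the SHA-256 bucketing and the 5% default are illustrative choices:

```python
import hashlib

def routes_to_canary(user_id: str, canary_percent: int = 5) -> bool:
    """Deterministically map a user into a bucket in [0, 100) and send
    the lowest `canary_percent` buckets to the new (canary) model."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < canary_percent

# The same user always gets the same model, so sessions stay consistent
share = sum(routes_to_canary(f"user-{i}") for i in range(10_000)) / 10_000
print(f"canary share: {share:.1%}")  # close to the configured 5%
```

Increasing the rollout is then just raising `canary_percent`; no user who already saw the canary is moved back.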
4. A/B Testing (Shadow or Split Testing)
- 🅰️ Model A (existing)
- 🅱️ Model B (new)
- Route users randomly (e.g., 50/50) and compare outputs or outcomes
Pros:
- Compare performance metrics (conversion rate, accuracy, etc.)
- User-driven validation
Use When: Comparing two models for business impact.
5. Shadow Deployment
- 🕵️ Serve predictions silently in the background
- The new model runs on real-time data but does not affect users
- Log and compare its predictions vs the live model
Pros:
- No risk to users
- Evaluate on real-world data
Cons:
- Doubles the compute load
Use When: Auditing, regulatory review, or performance benchmarking.
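A shadow setup can be as simple as a wrapper that answers from the live model and only logs the challenger's output. A sketch with stub models standing in for real estimators:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def predict_with_shadow(features, live_model, shadow_model):
    """Return the live model's prediction to the caller, while logging the
    shadow model's prediction for later offline comparison."""
    live_pred = live_model(features)
    try:
        shadow_pred = shadow_model(features)  # must never affect the user
        log.info("live=%s shadow=%s agree=%s",
                 live_pred, shadow_pred, live_pred == shadow_pred)
    except Exception:
        log.exception("shadow model failed")  # swallow shadow errors
    return live_pred

# Stub models standing in for real estimators
result = predict_with_shadow([1.0, 2.0],
                             live_model=lambda x: "approve",
                             shadow_model=lambda x: "deny")
print(result)  # the user only ever sees the live prediction
```

Note the try/except around the shadow call: a crashing challenger must never take down the live path.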
6. Multi-Armed Bandit
- Like A/B testing, but uses adaptive traffic allocation
- Routes more traffic to the better-performing model dynamically
Use When: Want to maximize reward during testing phase (e.g., maximize clicks or accuracy).
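An epsilon-greedy router is the simplest bandit. In this sketch the success/trial counts and the 10% exploration rate are illustrative:

```python
import random

def choose_model(stats, epsilon=0.1, rng=random):
    """Epsilon-greedy routing: mostly exploit the best-performing model,
    occasionally explore the others. `stats` maps model name ->
    (successes, trials)."""
    if rng.random() < epsilon:
        return rng.choice(list(stats))  # explore
    # exploit: pick the model with the best observed success rate
    return max(stats, key=lambda m: stats[m][0] / max(stats[m][1], 1))

stats = {"model_a": (90, 100), "model_b": (80, 100)}
rng = random.Random(42)  # seeded for reproducibility
picks = [choose_model(stats, rng=rng) for _ in range(1000)]
print(picks.count("model_a"))  # the better model receives the bulk of traffic
```

In a real system the (successes, trials) counts would be updated online from observed rewards, which is what makes the allocation adaptive.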
7. Rolling Update
- Gradually update pods or services
- Often used with Kubernetes (Deployment strategy)
Use When: Containerized models using tools like Kubernetes or Docker Swarm
📦 Tools to Support Model Deployment
| Tool | Use Case |
|---|---|
| FastAPI / Flask | Serving ML models as REST APIs |
| MLflow / TorchServe | Model packaging and deployment |
| Docker + Kubernetes | Scalable containerized deployment |
| Seldon Core / KFServing | K8s-native ML model deployment |
| Triton Inference Server | Optimized inference for deep learning |
| TensorFlow Serving | Serving TensorFlow models in production |
💡 Best Practices
- ✅ Always monitor model performance post-deployment (latency, accuracy, drift)
- ✅ Use feature stores to ensure consistency between training and inference
- ✅ Automate deployment through CI/CD (GitHub Actions, Jenkins, etc.)
- ✅ Use rollback mechanisms (Blue-Green, Canary) in case of failure
✅ Batch Inference vs Real-Time Inference
| Feature | Batch Inference | Real-Time Inference |
|---|---|---|
| Definition | Predictions are made on bulk data at once | Predictions are made instantly per request |
| Latency | High (minutes to hours) | Low (milliseconds to seconds) |
| Trigger | Scheduled (e.g., daily, hourly) | On-demand (API call, event, UI trigger) |
| Use Cases | Monthly credit risk scoring; email spam tagging; customer churn scoring | Fraud detection; chatbots; product recommendations |
| Deployment Mode | Often offline / serverless batch jobs | Usually via REST API or streaming systems |
| Cost Efficiency | More efficient at scale | Expensive if traffic is high |
| Examples | Run via Airflow, Spark, DVC pipelines | Run via FastAPI, Flask, TensorFlow Serving |
📦 Example:
- Batch Inference: Every night at 2 AM, predict churn for 10 million customers and store the results in a database.
- Real-Time Inference: When a user logs in, instantly recommend 5 products based on their activity.
✅ Online Inference vs Offline Inference
These are broader categories related to when and how predictions are generated and delivered.
| Feature | Offline Inference | Online Inference |
|---|---|---|
| Definition | Predictions are precomputed & stored | Predictions are computed on-the-fly |
| When Used | Before the user needs it | When the user triggers it |
| Model Execution | Happens ahead of time | Happens per request |
| Data Source | Static snapshot | Real-time features (API, current session) |
| Storage | Predictions are stored in DB, files, etc. | Predictions returned directly to UI/API |
| Use Cases | Risk scoring; lead prioritization; email categorization | Self-driving car inputs; voice assistants; stock prediction dashboards |
📌 Relationship with Batch/Real-Time:
| | Offline | Online |
|---|---|---|
| Batch | ✅ Yes (classic use) | ❌ No |
| Real-Time | ❌ No | ✅ Yes (classic use) |
- Offline + Batch = Nightly scoring for marketing
- Online + Real-Time = Instant fraud detection on transactions
🧠 Summary Diagram
+-----------------------+-----------------------+
| Batch | Real-Time |
+-----------+-----------------------+-----------------------+
| Offline | ✔ Scheduled Scoring | ✘ Not Common |
| | ✔ Stored Predictions | |
+-----------+-----------------------+-----------------------+
| Online | ✘ Not Practical | ✔ Instant Prediction |
| | | ✔ API-based |
+-----------+-----------------------+-----------------------+
📌 How to Choose?
| Criteria | Prefer Batch/Offline | Prefer Real-Time/Online |
|---|---|---|
| Latency-critical? | ❌ No | ✅ Yes |
| Prediction volume | ✅ High (millions at once) | ❌ One at a time |
| Data freshness | ❌ Static features | ✅ Real-time data needed |
| Infrastructure | ✅ Cheaper and easier | ❌ Needs always-on API + low latency |
| Examples | Marketing, churn scoring | Chatbot, recommender, fraud alerts |
📌 What is Flask?
Flask is a lightweight, easy-to-use Python web framework for building APIs and web applications. In MLOps, it is widely used to serve ML models as REST APIs so they can be accessed in real-time by applications or other services.
✅ Key Features of Flask
- Simple and minimalistic
- Great for small to medium ML projects
- Easily integrates with machine learning libraries (scikit-learn, TensorFlow, PyTorch)
- Supports REST API routes
- Can be containerized (Docker) and deployed to the cloud
📌 Flask Workflow for ML Model Deployment
1. Train your ML model and save it using pickle, joblib, or a framework's native format (like model.h5 for Keras)
2. Create a Flask API app that loads the model and exposes it via an endpoint (e.g., /predict)
3. Test the API using tools like Postman or curl
4. Deploy it using Docker, AWS EC2, or other platforms
📦 Sample Flask App for Model Deployment
model.pkl — Trained ML model file (e.g., a scikit-learn model)
app.py
from flask import Flask, request, jsonify
import pickle
import numpy as np

# Load the trained model
model = pickle.load(open("model.pkl", "rb"))

# Initialize Flask app
app = Flask(__name__)

# Define home route
@app.route('/')
def home():
    return "Welcome to the ML Model API!"

# Define predict route
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)  # Get JSON payload
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    return jsonify({'prediction': prediction.tolist()})

# Run the Flask app
if __name__ == '__main__':
    app.run(debug=True)
🧪 Test Request Example
curl -X POST http://127.0.0.1:5000/predict \
-H "Content-Type: application/json" \
-d '{"features": [6.2, 3.4, 5.4, 2.3]}'
📌 Typical Flask Project Structure
ml-flask-app/
│
├── model.pkl
├── app.py
├── requirements.txt
└── Dockerfile (optional for containerization)
🧰 requirements.txt
Flask==2.3.2
numpy
scikit-learn
🐳 Optional: Dockerfile to Containerize
FROM python:3.9
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
📌 What is FastAPI?
FastAPI is a high-performance web framework for building APIs with Python 3.7+ based on ASGI (Asynchronous Server Gateway Interface). It's designed for speed and includes automatic data validation, type hints, and auto-generated Swagger documentation.
✅ Why Use FastAPI for ML?
| Feature | FastAPI Advantage |
|---|---|
| ⚡ Speed | Faster than Flask |
| Data validation | Automatic (via Pydantic) |
| 🧪 Interactive Docs | Built-in Swagger & ReDoc |
| Async support | Native support for async I/O |
| 📦 Dependency Injection | Built-in and clean |
🧠 ML Model Deployment with FastAPI – End-to-End Example
1. model.pkl: Trained ML model (e.g., scikit-learn)
Save your model using:
import pickle
pickle.dump(model, open("model.pkl", "wb"))
2. main.py: FastAPI App Code
from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np
import pickle

# Load model
model = pickle.load(open("model.pkl", "rb"))

# Initialize app
app = FastAPI(title="ML Model Inference API")

# Define input schema
class Features(BaseModel):
    features: list[float]

# Root endpoint
@app.get("/")
def read_root():
    return {"message": "Welcome to the FastAPI ML model server!"}

# Predict endpoint
@app.post("/predict")
def predict(data: Features):
    features = np.array(data.features).reshape(1, -1)
    prediction = model.predict(features)
    return {"prediction": prediction.tolist()}
3. requirements.txt
fastapi
uvicorn
numpy
scikit-learn
pydantic
4. 🧪 Run Locally
Install dependencies:
pip install -r requirements.txt
Run the API:
uvicorn main:app --reload
You can now visit:
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc
5. 🧪 Sample Test (via curl or Postman)
curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d "{\"features\": [5.1, 3.5, 1.4, 0.2]}"
🐳 Optional: Dockerfile to Containerize FastAPI App
FROM python:3.9
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
🔥 Bonus: Swagger UI Automatically Available
Thanks to FastAPI, you get Swagger UI without extra effort — excellent for testing, collaboration, or exposing to product teams.
✅ Flask vs FastAPI Quick Recap
| Feature | Flask | FastAPI |
|---|---|---|
| Speed | Slower | ⚡ Faster |
| Type Validation | Manual | ✅ Automatic |
| Async Support | ❌ No | ✅ Yes |
| Docs | ❌ Add-ons | ✅ Built-in |
| Production Ready | ✅ Yes | ✅ Yes |
✅ 1. What is Model Serving?
Model serving is the process of exposing a trained ML model via an API so other applications (like a web or mobile app) can send input data and receive predictions in real-time.
✅ 2. Flask vs FastAPI: Quick Comparison
| Feature | Flask | FastAPI |
|---|---|---|
| Performance | Slower (sync) | Faster (async, Starlette + Pydantic) |
| Type Hint Support | Optional | Fully supports and requires it |
| Validation | Manual or with Flask-RESTful | Built-in via Pydantic |
| Best For | Simpler, legacy projects | High-performance APIs, production ML APIs |
| Learning Curve | Easier for beginners | Slightly steeper but modern |
✅ 3. Serving ML Model Using Flask (Example)
🧪 File: model.pkl
Assume you’ve trained and saved a model using joblib or pickle.
📌 Flask API Code:
from flask import Flask, request, jsonify
import joblib

# Load the model
model = joblib.load("model.pkl")

# Create the app
app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = [data['feature1'], data['feature2']]
    prediction = model.predict([features])
    # .tolist() converts the NumPy result into a JSON-serializable value
    return jsonify({'prediction': prediction.tolist()[0]})

if __name__ == '__main__':
    app.run(debug=True)
➕ cURL Test:
curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" \
-d '{"feature1": 1.5, "feature2": 3.2}'
✅ 4. Serving ML Model Using FastAPI (Example)
⚙️ Install FastAPI + Uvicorn:
pip install fastapi uvicorn joblib
📌 FastAPI Code:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

# Load the model
model = joblib.load("model.pkl")

# Define request schema
class InputData(BaseModel):
    feature1: float
    feature2: float

# Initialize app
app = FastAPI()

@app.post("/predict")
def predict(data: InputData):
    features = [[data.feature1, data.feature2]]
    prediction = model.predict(features)
    # .tolist() converts the NumPy result into a JSON-serializable value
    return {"prediction": prediction.tolist()[0]}
🔥 Run with Uvicorn:
uvicorn your_script_name:app --reload
🧪 Open Swagger UI:
Visit: http://localhost:8000/docs
✅ 5. Production Enhancements
| Area | Suggestions |
|---|---|
| Security | OAuth2 / JWT, rate limiting, API keys |
| Monitoring | Prometheus + Grafana, request logging |
| Error Handling | Graceful 4xx/5xx responses |
| Scaling | Use Docker, gunicorn (Flask), or uvicorn workers |
| CI/CD | Use GitHub Actions, Jenkins, or CodePipeline |
✅ Summary
| Category | Flask | FastAPI |
|---|---|---|
| Simplicity | Easier for starters | Modern, fast, robust |
| Speed | Slower (WSGI) | Faster (ASGI + async I/O) |
| Validation | Manual | Built-in with Pydantic |
| Use Cases | POCs, simple APIs | High-performance ML API, production use |
📌 What is TensorFlow Serving?
TensorFlow Serving is a flexible, high-performance model server for deploying machine learning models built with TensorFlow.
✅ It allows you to:
- Serve models over a gRPC or RESTful API
- Dynamically load new models (versioning support)
- Scale easily in production
🧠 Why TensorFlow Serving?
| Feature | Benefit |
|---|---|
| High performance | Optimized for inference speed |
| 📦 Model versioning | Serve multiple versions |
| Hot swapping | No-downtime model updates |
| 💻 Protocols | Supports REST & gRPC |
| ⚙️ TensorFlow native | Designed specifically for TF models |
🛠️ Core Concepts
1. Model Format
You need to export your TensorFlow model using the SavedModel format:
model.save("my_model/1")
Directory structure should look like:
my_model/
└── 1/
├── saved_model.pb
└── variables/
2. Model Versioning
- Each version is a numbered folder (1/, 2/, etc.)
- You can host multiple versions, and TensorFlow Serving will serve the latest by default.
📌 Serving a Model using Docker
Step 1: Export model in SavedModel format
import tensorflow as tf
model = tf.keras.models.load_model("my_model.h5")
tf.saved_model.save(model, "exported_model/1")
Step 2: Pull TensorFlow Serving Docker image
docker pull tensorflow/serving
Step 3: Run the container
docker run -p 8501:8501 \
--mount type=bind,source=$(pwd)/exported_model,target=/models/my_model \
-e MODEL_NAME=my_model \
-t tensorflow/serving
📌 This serves your model at:
http://localhost:8501/v1/models/my_model:predict
📌 REST API Example
Send a request using curl:
curl -X POST http://localhost:8501/v1/models/my_model:predict \
-H "Content-Type: application/json" \
-d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
Response:
{
"predictions": [[0.1, 0.9]]
}
🧰 Optional: Use gRPC (Advanced)
You can also use gRPC instead of REST for lower latency. This requires protobuf stubs and a gRPC client.
🧪 Testing
Test endpoint:
curl http://localhost:8501/v1/models/my_model
Response:
{
"model_version_status": [...],
"model_name": "my_model"
}
📌 Docker Folder Structure Summary
project/
├── exported_model/
│ └── 1/
│ ├── saved_model.pb
│ └── variables/
└── Dockerfile (optional if customizing serving)
🏗️ Production Integration Ideas
- Use NGINX as a reverse proxy for a secure API gateway
- Deploy on Kubernetes (e.g., with the TF Serving Helm chart)
- Integrate with Prometheus/Grafana for monitoring
- Use Triton for multi-framework support (TF + PyTorch)
🔥 1. TorchServe
🧾 What is it?
TorchServe is the official model serving framework for PyTorch developed by AWS and Facebook.
📌 Key Features
| Feature | Description |
|---|---|
| ✅ Native PyTorch support | Built for PyTorch models specifically |
| 📦 Model archiver (.mar) | Packages model + code + config in a .mar file |
| REST & gRPC APIs | Serve models using standard APIs |
| Model versioning | Load/unload multiple versions |
| Metrics & logs | Prometheus integration, model logs |
| Custom handlers | Customize preprocessing/postprocessing logic |
⚙️ How it works
1. Archive your model using torch-model-archiver:
torch-model-archiver --model-name resnet18 \
  --version 1.0 \
  --model-file model.py \
  --serialized-file resnet18.pt \
  --handler image_classifier
2. Serve the model:
torchserve --start --model-store model_store --models resnet=resnet18.mar
3. Run inference via REST:
curl http://127.0.0.1:8080/predictions/resnet -T image.jpg
📌 Directory structure
project/
├── model_store/
│ └── resnet18.mar
├── model.py
└── config.properties
📌 Model Versioning
You can serve multiple .mar files with different versions and configure them in config.properties.
🥡 2. BentoML
🧾 What is it?
BentoML is a framework-agnostic platform to build, package, and deploy machine learning models as microservices.
⚡ Best when you want flexibility, a clean customizable API, automatic Docker builds, and YAML-based config.
📌 Key Features
| Feature | Description |
|---|---|
| Framework-agnostic | Supports PyTorch, TensorFlow, XGBoost, etc. |
| 🧰 CLI & Python SDK | Easy to package and serve |
| 🐳 Docker auto-pack | Generates a container with one command |
| 💪 BentoML "Services" | Write custom API logic using Python |
| 🧪 Local dev server | bentoml serve with hot reload |
| Model registry | Built-in local model management |
📦 Sample Service (for PyTorch)
# service.py (decorator-style BentoML service; exact API varies by BentoML version)
import torch
import bentoml
from bentoml.io import NumpyNdarray

model_ref = bentoml.pytorch.load_model("resnet50:latest")

@bentoml.service()
class ResNetService:
    @bentoml.api(input=NumpyNdarray(), output=NumpyNdarray())
    def predict(self, arr):
        tensor = torch.tensor(arr).unsqueeze(0)
        return model_ref(tensor).detach().numpy()
🔧 CLI Commands
bentoml serve service:ResNetService
bentoml build
bentoml containerize ResNetService:latest
bentoml deploy # for cloud platforms like AWS Lambda, K8s
🔥 TorchServe vs BentoML
| Feature | TorchServe | BentoML |
|---|---|---|
| Framework Support | Only PyTorch | Framework-agnostic |
| 🛠️ Custom API logic | Handlers | Native Python + FastAPI-style syntax |
| 🐳 Dockerization | Manual (or use a custom Dockerfile) | Auto Dockerfile generation |
| REST/gRPC Support | REST/gRPC | REST (gRPC planned) |
| 🧪 Local Dev Experience | Moderate | Excellent (hot reloads, CLI tools) |
| 🧱 Model Registry | .mar archives | Local Bento Store + Cloud Registry |
| 🧰 Monitoring & Metrics | Prometheus support | Prometheus + optional integrations |
| ☁️ Cloud deployment | Manual | AWS Lambda, K8s, SageMaker ready |
✅ When to Use What?
| Use Case | Tool |
|---|---|
| You need official, production-grade PyTorch model serving | TorchServe |
| You want to serve any ML model with full control & flexibility | BentoML |
| You want integrated Docker build and REST API in Python | BentoML |
| You prefer configuration over code with strict control | TorchServe |
📌 What is AWS Lambda?
AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. You just upload your code, and Lambda takes care of everything required to run and scale it.
💡 Great for lightweight ML inference, automation, data preprocessing, and event-driven tasks.
⚙️ Core Concepts
| Concept | Description |
|---|---|
| Function | The unit of deployment in Lambda (your code + configuration) |
| Event | Triggers that invoke Lambda (e.g., API Gateway, S3, SNS) |
| Handler | Entry point of the Lambda function |
| Runtime | Language-specific execution environment (Python, Node.js, etc.) |
| Timeout | Max execution time (default 3 sec, max 15 mins) |
| Memory | 128 MB to 10 GB, affects CPU power & pricing |
🔬 Use Cases in MLOps
| Use Case | Example |
|---|---|
| Model Inference (small models) | Serve XGBoost/LightGBM/TinyML |
| 🧹 Data Cleaning Pipelines | Preprocess uploaded data from S3 |
| Event-Driven Triggers | Trigger retraining when new data is added |
| Batch Prediction | Predict on small batches via API |
| 📬 Slack/Email Alerts | Auto-alert on pipeline failures |
| Glue/Athena Orchestration | Trigger downstream processes |
🧪 Sample Python Lambda Function (ML Inference)
# lambda_function.py
import json
import joblib
import numpy as np

# Load your model once per container (ensure it's small, or load from S3)
model = joblib.load("/opt/model.pkl")

def lambda_handler(event, context):
    data = json.loads(event['body'])
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    return {
        'statusCode': 200,
        'body': json.dumps({'prediction': prediction.tolist()})
    }
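Before zipping anything, the handler can be exercised locally by faking the API Gateway event it will receive. A self-contained sketch, with a stub model standing in for the real model.pkl:

```python
import json

# Stub standing in for the joblib-loaded model in lambda_function.py
class StubModel:
    def predict(self, features):
        return [sum(features[0])]

model = StubModel()

def lambda_handler(event, context):
    data = json.loads(event["body"])
    features = [data["features"]]
    prediction = model.predict(features)
    return {"statusCode": 200,
            "body": json.dumps({"prediction": list(prediction)})}

# Simulate an API Gateway proxy event locally
event = {"body": json.dumps({"features": [1.0, 2.0, 3.0]})}
response = lambda_handler(event, context=None)
print(response["statusCode"], response["body"])
```

Testing the handler this way catches JSON-shape mistakes before you pay for a deploy cycle.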
📦 Packaging ML Models for Lambda
1. Package code + dependencies + model (must be < 250 MB unzipped):
project/
├── lambda_function.py
├── model.pkl
├── requirements.txt
2. Zip everything:
pip install -r requirements.txt -t ./package
cp lambda_function.py model.pkl ./package
cd package && zip -r ../lambda.zip .
3. Upload lambda.zip via the AWS Lambda console or the AWS CLI.
📌 Triggering Lambda
| Trigger Type | Use |
|---|---|
| API Gateway | Create a REST endpoint for inference |
| S3 Upload | Trigger when a new file is added |
| CloudWatch | Run on a schedule (like a cron job) |
| SNS Topic | Alert on specific events |
| Step Functions | As part of an ML pipeline |
🧠 AWS Lambda Limitations
| Limitation | Value |
|---|---|
| Max runtime | 15 mins |
| Max memory | 10 GB |
| Max package size | 250 MB (unzipped) |
| No GPU support | ❌ |
Use AWS SageMaker or ECS for large models/GPU inference.
📌 Permissions (IAM Role)
Make sure the Lambda has a proper execution role with:
- S3 read/write (if using S3)
- CloudWatch Logs
- (Optional) Secrets Manager or Parameter Store access
🧠 Best Practices
| Tip | Description |
|---|---|
| ✅ Use the /opt layer for models | Separate large model files into Lambda Layers |
| 🐳 Local test with Docker | Use the AWS SAM CLI |
| 🧩 Combine with API Gateway | For a RESTful inference API |
| Optimize package size | Use scikit-learn, joblib, lightgbm, etc. with care |
| Use async invocation if needed | For faster parallel executions |
📌 Alternatives for Larger Models
| Tool | Use When |
|---|---|
| AWS SageMaker Endpoint | Large model, GPU needed |
| ECS with Docker | Custom ML containers |
| EKS (Kubernetes) | Complex ML infra with scaling |
| BentoML + Lambda | BentoML can export Lambdas directly |
📌 What is Google Cloud Functions?
Google Cloud Functions is a serverless execution environment on Google Cloud Platform (GCP). You deploy code snippets that automatically run in response to events like HTTP requests, Pub/Sub messages, or Cloud Storage triggers.
⚡ It’s similar to AWS Lambda — perfect for lightweight ML inference, real-time data handling, or orchestrating workflows.
🧠 When to Use GCF in MLOps?
| Use Case | Example |
|---|---|
| ✅ Lightweight model inference | Deploy Scikit-learn or XGBoost models |
| ✅ Event-driven automation | Retrain when new data arrives in GCS |
| ✅ Preprocessing pipelines | Clean/validate data on upload |
| ✅ Alerts & notifications | Trigger Slack/email alerts on failure |
| ✅ API interface for ML models | REST endpoint to serve predictions |
Supported Runtimes
| Language | Status |
|---|---|
| Python | ✅ (3.7–3.11) |
| Node.js | ✅ |
| Go, Java | ✅ |
| .NET, Ruby | ✅ |
Components of a Cloud Function
| Component | Description |
|---|---|
| Trigger | What invokes the function (HTTP, Cloud Pub/Sub, GCS, etc.) |
| Entry Point | Your main function/method |
| Dependencies | Listed in requirements.txt |
| Memory & Timeout | Configurable up to 16 GB & 60 min |
MLOps Use Case Example: Inference API with Scikit-learn
main.py
import functions_framework
import joblib
import numpy as np
from flask import jsonify

# Load the model once at cold start, not on every request
model = joblib.load("model.pkl")

@functions_framework.http
def predict(request):
    try:
        data = request.get_json()
        features = np.array(data['features']).reshape(1, -1)
        prediction = model.predict(features)
        return jsonify({'prediction': prediction.tolist()})
    except Exception as e:
        return jsonify({'error': str(e)}), 500
requirements.txt
flask
joblib
numpy
scikit-learn
Deploying the Function
- Authenticate and set the project:
gcloud auth login
gcloud config set project <your-gcp-project-id>
- Deploy:
gcloud functions deploy predict \
  --runtime python311 \
  --trigger-http \
  --allow-unauthenticated \
  --memory=1024MB \
  --entry-point=predict
- Test via curl or Postman:
curl -X POST <your-function-url> \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
Directory Structure
project/
├── main.py
├── requirements.txt
└── model.pkl
IAM & Permissions
- For public HTTP functions, use --allow-unauthenticated
- For private triggers, manage IAM with roles like:
  - roles/cloudfunctions.invoker
  - roles/pubsub.subscriber
  - roles/storage.objectViewer
⚙️ Triggers Supported
| Trigger Type | Use Case |
|---|---|
| HTTP | ML inference APIs |
| Cloud Storage | Run when new data is uploaded |
| Pub/Sub | Trigger model retraining |
| Firestore/BigQuery | Event-based ETL |
| Cloud Scheduler | Scheduled jobs (e.g., batch prediction) |
⚡ Limitations
| Attribute | Limit |
|---|---|
| Max timeout | 60 mins |
| Max memory | 16 GB |
| Max deployment package | 500 MB (uncompressed) |
| GPU support | ❌ Not available |
| Cold start | ⏱️ ~1s typical |
For heavy ML workloads, consider Cloud Run or Vertex AI.
✅ Best Practices
| Practice | Tip |
|---|---|
| Use lightweight models | scikit-learn, XGBoost (small) |
| Separate logic from I/O | Helps in testing & scaling |
| Logging | Use print() or logging for Cloud Logging |
| Test locally | With Functions Framework |
| Model versioning | Use GCS to store & load models dynamically |
Cloud Functions vs Cloud Run vs Vertex AI
| Feature | Cloud Functions | Cloud Run | Vertex AI |
|---|---|---|---|
| Serverless | ✅ | ✅ | ✅ |
| ML Inference | ✅ (small) | ✅ (larger) | ✅ |
| GPU Support | ❌ | ❌ | ✅ |
| Cold Starts | Yes | Less frequent | Depends |
| Custom Docker | ❌ | ✅ | ✅ |
Tips for MLOps Workflows
- Store models in Google Cloud Storage
- Use Pub/Sub to automate retraining on new data
- Integrate with Vertex AI Pipelines for orchestration
- Use Secret Manager for secure API keys or credentials
- Monitor with Cloud Logging & Cloud Monitoring
What is Kubernetes?
Kubernetes (K8s) is an open-source platform for automating deployment, scaling, and management of containerized applications.
In MLOps, it's widely used for model serving, training pipelines, autoscaling workloads, and managing distributed ML systems.
Why Use Kubernetes in MLOps?
| Feature | Benefit |
|---|---|
| ✅ Scalability | Scale model inference under load |
| ✅ Portability | Works across cloud/on-prem |
| ✅ Isolation | Manage separate environments (prod/dev/staging) |
| ✅ Reproducibility | Define infra as code (YAML) |
| ✅ Rollback | Revert broken versions easily |
| ✅ Scheduling | Schedule batch jobs, training runs |
| ✅ GPU Support | Yes, with node pools |
What is Helm?
Helm is a package manager for Kubernetes, like pip for Python or apt for Ubuntu.
It lets you:
- Define Kubernetes manifests as templates
- Version and package deployments as charts
- Manage configuration with values.yaml
⚙️ Kubernetes + Helm Workflow for MLOps
Step-by-Step Breakdown:
1. Containerize Your Model
Create a Dockerfile for your model/app:
FROM python:3.11-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "serve.py"]
2. ☸️ Create Kubernetes Manifests
Basic structure:
- deployment.yaml – app definition
- service.yaml – expose internally or via LoadBalancer
- ingress.yaml – optional (for HTTP routing)
Example: deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-container
          image: your-dockerhub/ml-model:latest
          ports:
            - containerPort: 5000
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
3. Helm Chart Structure
helm create ml-model
Generated structure:
ml-model/
├── Chart.yaml
├── values.yaml
├── templates/
│ ├── deployment.yaml
│ ├── service.yaml
│ └── ingress.yaml
You can now template your YAML using values.yaml:
templates/deployment.yaml (Helm-style):
spec:
  replicas: {{ .Values.replicaCount }}
  containers:
    - image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
values.yaml
replicaCount: 2
image:
  repository: your-dockerhub/ml-model
  tag: latest
4. Deploy Using Helm
# Install Helm Chart
helm install ml-model ./ml-model
# Upgrade version
helm upgrade ml-model ./ml-model
# Uninstall
helm uninstall ml-model
Expose the App
- Use ClusterIP for internal services (e.g., other ML microservices)
- Use LoadBalancer or Ingress for external REST APIs
service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 5000
  selector:
    app: ml-model
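The optional ingress.yaml from the manifest list is not shown; a minimal sketch for HTTP routing could look like this (the host and ingress class are assumptions, and an ingress controller such as NGINX must be installed in the cluster):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ml-model-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: ml.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ml-model-service
                port:
                  number: 80
```

With an Ingress in front, the Service can stay type ClusterIP instead of LoadBalancer, since external traffic enters through the ingress controller.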
MLOps-Specific Add-ons
| Tool | Use |
|---|---|
| Kubeflow | End-to-end ML pipelines |
| Prometheus + Grafana | Metrics, monitoring |
| ELK Stack | Logs |
| KEDA | Event-based autoscaling (Pub/Sub, Kafka) |
| Istio | Secure traffic control between ML services |
Example: Real Use Case
“Serve a FastAPI model using Kubernetes & Helm”
- Create a FastAPI prediction app
- Build the image and push it to Docker Hub/GCR
- Write a Helm chart
- Deploy on GKE or Minikube:
helm install iris-predictor ./iris-chart
✅ Best Practices
| Practice | Reason |
|---|---|
| Use Helm for reusable templates | Easy rollout of multiple models |
| Use ConfigMaps/Secrets for credentials | Avoid hardcoding |
| Use resource limits | Prevent cluster overload |
| Use liveness/readiness probes | Auto-heal unhealthy containers |
| Version charts and rollback | Safe deployments |
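For the liveness/readiness-probe practice, a container spec fragment might look like this (the /health path assumes your serving app exposes such an endpoint):

```yaml
containers:
  - name: ml-container
    image: your-dockerhub/ml-model:latest
    livenessProbe:          # restart the container if this starts failing
      httpGet:
        path: /health
        port: 5000
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:         # stop routing traffic until this succeeds
      httpGet:
        path: /health
        port: 5000
      initialDelaySeconds: 5
      periodSeconds: 10
```

A generous initialDelaySeconds matters for ML containers, since loading a large model at startup can otherwise cause the probe to kill the pod before it is ready.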
What is YAML?
YAML stands for "YAML Ain’t Markup Language" (recursive acronym).
It’s a human-readable data serialization format, often used for configuration files.
Where is YAML used?
| Tool / Framework | YAML Use |
|---|---|
| Kubernetes | Deployment & Service specs (.yaml) |
| Docker Compose | docker-compose.yaml to define multi-container apps |
| GitHub Actions | .github/workflows/*.yml CI/CD pipelines |
| MLflow, Airflow | Task configs, pipelines |
| Ansible, Helm | Infrastructure as Code |
| Kubeflow | ML pipeline definitions |
| PyTorch Lightning | Training configs |
| Streamlit / Gradio | App settings |
Example: Kubernetes Deployment YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-app
  template:
    metadata:
      labels:
        app: ml-app
    spec:
      containers:
        - name: ml-container
          image: yourrepo/ml-model:latest
          ports:
            - containerPort: 5000
✅ YAML Features
- Indentation defines structure (use spaces, not tabs)
- Supports scalars, lists (- item), key-value pairs (key: value), and nested objects
- Simple syntax (indentation-based, like Python)
- Comments with #
- Easily converted to/from JSON
- Commonly used for configuration files, Kubernetes manifests, and CI/CD pipelines (e.g., GitHub Actions, GitLab CI)
YAML Syntax Basics:
1. Key-Value Pairs:
name: Sanjay
age: 25
2. Lists:
skills:
  - Python
  - Docker
  - Kubernetes
3. Nested Objects:
database:
  host: localhost
  port: 5432
4. List of Objects:
users:
  - name: Alice
    role: admin
  - name: Bob
    role: user
5. Anchors & Aliases (reuse blocks):
defaults: &default_settings
  retries: 3
  timeout: 5

api_config:
  <<: *default_settings
  base_url: http://api.example.com
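The anchor/alias example above can be checked programmatically. A short sketch using PyYAML (assumed installed; it is not part of the standard library, and its safe_load supports the << merge key) also demonstrates the easy conversion to JSON:

```python
import json
import yaml  # PyYAML -- an assumed dependency, install with `pip install pyyaml`

doc = """
defaults: &default_settings
  retries: 3
  timeout: 5
api_config:
  <<: *default_settings
  base_url: http://api.example.com
"""

data = yaml.safe_load(doc)  # YAML -> plain Python dicts and lists
print(data["api_config"]["retries"])     # merged in from the anchor: 3
print(json.dumps(data, sort_keys=True))  # YAML maps cleanly onto JSON
```

Note that the merge happens at load time: the parsed api_config contains retries and timeout alongside its own base_url key.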
YARN: Yet Another Resource Negotiator
Not related to YAML, though the names sound similar.
✅ What is YARN?
YARN (Yet Another Resource Negotiator) is a core component of Apache Hadoop for cluster resource management.
Purpose:
YARN acts as the Resource Manager in Hadoop’s ecosystem. It:
- Allocates CPU and memory to jobs
- Manages job scheduling & monitoring
- Enables running multiple distributed applications (like Spark, MapReduce, Hive) in a single Hadoop cluster
Components of YARN:
| Component | Description |
|---|---|
| ResourceManager (RM) | Central master that allocates resources |
| NodeManager (NM) | Agent on each worker node to monitor resources |
| ApplicationMaster (AM) | Job-specific manager to request resources from RM |
| Container | Actual compute unit (CPU, memory) running the job |
YARN vs Kubernetes
| Feature | YARN | Kubernetes |
|---|---|---|
| Origin | Big Data (Hadoop Ecosystem) | Cloud-native (Containers) |
| Workload Type | Batch jobs (MapReduce, Spark) | Microservices, ML, API services |
| Resource Type | CPU & memory (fixed JVMs) | Pod-based (containers) |
| Scaling | Manual or with autoscalers | Native autoscaling |
When to Learn YARN in MLOps?
- Only if you're working with Hadoop or Spark clusters
- Most modern MLOps pipelines use Kubernetes or cloud-native platforms instead
✅ Summary:
| Term | Description |
|---|---|
| YAML | Human-friendly config language (used in K8s, CI/CD, Docker Compose, etc.) |
| YARN | Resource manager in Hadoop ecosystem (used in Spark, MapReduce jobs) |