TensorFlow, PyTorch, Flask, FastAPI, MongoDB, Agentic AI, Recommendation engines, Kubernetes, Spark, Grafana

 

TensorFlow – Beginner-Friendly Complete Notes


1. Introduction to TensorFlow

  • What is TensorFlow?

    • Open-source machine learning framework by Google.

    • Used for deep learning, ML, and numerical computation.

    • Works on CPU, GPU, TPU.

  • Key Features:

    • Easy model building (Keras high-level API).

    • Runs on multiple devices.

    • Large ecosystem (TensorBoard, TFLite, TF Serving).


2. Installation

pip install tensorflow

Check version:

import tensorflow as tf
print(tf.__version__)

3. Basic Building Blocks

a) Tensors

  • Tensors = multi-dimensional arrays (like NumPy but GPU-friendly).

x = tf.constant([[1,2],[3,4]])
print(x)  # 2D tensor
  • Tensor Ranks:

    • Scalar (0D), Vector (1D), Matrix (2D), Higher dimensions.
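The rank idea can be seen without TensorFlow at all: rank is just the nesting depth needed to reach a single number. A minimal pure-Python sketch (the `rank` helper is hypothetical, not a TF API):

```python
# Rank = depth of nesting before you hit a scalar:
# 0D scalar, 1D vector, 2D matrix, and so on.
def rank(value):
    """Count nesting depth of a uniformly nested list structure."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth

print(rank(3.0))               # scalar -> 0
print(rank([1, 2, 3]))         # vector -> 1
print(rank([[1, 2], [3, 4]]))  # matrix -> 2
```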

b) Variables

  • Trainable tensors, used to store weights.

w = tf.Variable([0.5, 1.0])

c) Operations

  • Math ops on tensors.

a = tf.constant([1,2,3])
b = tf.constant([4,5,6])
print(tf.add(a,b))   # [5 7 9]

4. TensorFlow vs NumPy

  • NumPy: CPU only, no automatic differentiation.

  • TensorFlow: Works on GPU, supports automatic differentiation.

import numpy as np
np_arr = np.array([1,2,3])
tf_tensor = tf.convert_to_tensor(np_arr)
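What "automatic differentiation" buys you can be illustrated numerically without TensorFlow: a finite-difference approximation recovers the same derivative that `tf.GradientTape` would compute analytically. The `numeric_grad` helper below is a hypothetical stand-in, not a TF API:

```python
# Central-difference approximation of df/dx -- a stand-in for what
# automatic differentiation computes exactly.
def numeric_grad(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2
print(numeric_grad(f, 3.0))  # close to 6.0, the exact derivative 2x at x = 3
```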

5. TensorFlow Workflow (Step by Step)

Step 1: Import Data

  • Built-in datasets in tf.keras.datasets.

from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Step 2: Preprocess Data

  • Normalize and reshape.

x_train = x_train / 255.0
x_test = x_test / 255.0
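Dividing by 255 simply maps 8-bit pixel intensities into the [0, 1] range. The same idea on a plain Python list (a minimal sketch; the `normalize` helper is hypothetical):

```python
# Scale raw pixel values (0-255) into [0, 1], as x_train / 255.0 does above.
def normalize(pixels, max_value=255.0):
    return [p / max_value for p in pixels]

row = [0, 128, 255]
print(normalize(row))  # [0.0, 0.501..., 1.0]
```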

Step 3: Build Model

  • Use Sequential API (simple stack of layers).

from tensorflow.keras import models, layers

model = models.Sequential([
    layers.Flatten(input_shape=(28,28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

Step 4: Compile Model

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Step 5: Train Model

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

Step 6: Evaluate

model.evaluate(x_test, y_test)

6. TensorFlow Model APIs

a) Sequential API

  • Linear stack of layers.

  • Best for simple models.

b) Functional API

  • More flexible (multi-input/output, non-linear graphs).

inputs = layers.Input(shape=(28,28))
x = layers.Flatten()(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs, outputs)

c) Subclassing API

  • Full control using Python classes.


7. Common Layers

  • Dense – Fully connected.

  • Conv2D, MaxPooling2D – For images.

  • LSTM, GRU – For sequences/text.

  • Dropout – Prevent overfitting.

  • BatchNormalization – Normalize activations.


8. Training Essentials

Optimizers

  • SGD – Simple gradient descent.

  • Adam – Most common, adaptive learning rate.

Loss Functions

  • Regression → mse

  • Classification → binary_crossentropy, categorical_crossentropy.
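For intuition, sparse categorical cross-entropy (used in the compile step above) is just the negative log of the probability the model assigns to the true class. A one-example sketch in pure Python (not the Keras implementation, which also handles batching and numerical stability):

```python
import math

# Sparse categorical cross-entropy for one example:
# -log(probability assigned to the true class).
def sparse_categorical_crossentropy(probs, true_class):
    return -math.log(probs[true_class])

probs = [0.1, 0.7, 0.2]  # model output after softmax
print(sparse_categorical_crossentropy(probs, true_class=1))  # -log(0.7) ~ 0.357
```

Note how a confident correct prediction (probability near 1) gives a loss near 0, while a wrong confident prediction blows the loss up.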

Metrics

  • Accuracy, Precision, Recall, F1.


9. Callbacks

  • Add functionality during training.

from tensorflow.keras.callbacks import EarlyStopping
cb = EarlyStopping(patience=3, restore_best_weights=True)
model.fit(x_train, y_train, epochs=20, callbacks=[cb])
  • Common Callbacks:

    • EarlyStopping

    • ModelCheckpoint

    • TensorBoard


10. Saving and Loading Models

# Save (HDF5 legacy format; newer Keras versions also support a native .keras format)
model.save("my_model.h5")

# Load
from tensorflow.keras.models import load_model
model = load_model("my_model.h5")

11. TensorBoard (Visualization)

  • Tool to visualize training (loss, accuracy, graphs).

tensorboard --logdir=logs/

12. TensorFlow Ecosystem

  • TensorFlow Lite (TFLite) → For mobile/IoT.

  • TensorFlow.js → For running ML in browser.

  • TF Serving → For deployment.

  • TF Hub → Pre-trained models.


13. Example: End-to-End Classification

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import fashion_mnist

# Load data
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0

# Build model
model = models.Sequential([
    layers.Flatten(input_shape=(28,28)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax')
])

# Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# Evaluate
print(model.evaluate(x_test, y_test))

14. Tips for Beginners

  • Start with Sequential API before Functional.

  • Use callbacks to avoid overfitting.

  • Always normalize input data.

  • Experiment with different optimizers and learning rates.

  • Visualize results with TensorBoard.


15. Interview Quick Recap

  • TensorFlow = ML framework by Google.

  • Tensors = multidimensional arrays.

  • APIs: Sequential, Functional, Subclassing.

  • Common optimizers: SGD, Adam.

  • Loss: MSE (regression), CrossEntropy (classification).

  • Ecosystem: TF Lite, TF.js, TF Hub, TensorBoard.




PyTorch – Beginner-Friendly Complete Notes


1. Introduction to PyTorch

  • What is PyTorch?

    • Open-source deep learning framework by Facebook (Meta).

    • Flexible, pythonic, widely used in research.

    • Supports CPU & GPU.

  • Key Features:

    • Dynamic computation graph (eager execution).

    • Strong community for research + production.

    • Integration with NumPy and Python libraries.


2. Installation

pip install torch torchvision torchaudio

Check version:

import torch
print(torch.__version__)

3. Core Building Blocks

a) Tensors

  • Like NumPy arrays, but with GPU acceleration.

import torch
x = torch.tensor([[1, 2], [3, 4]])
print(x)
  • Check device:

print(x.device)   # cpu by default
  • Move to GPU (if available):

if torch.cuda.is_available():
    x = x.to("cuda")

b) Operations

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(a + b)  # tensor([5, 7, 9])

c) Autograd (Automatic Differentiation)

  • PyTorch tracks gradients for optimization.

w = torch.tensor(2.0, requires_grad=True)
y = w**2
y.backward()
print(w.grad)  # dy/dw = 4

4. PyTorch vs TensorFlow

  • PyTorch: Dynamic graph (easy debugging, flexible).

  • TensorFlow: Static + Eager (optimized for deployment).

  • PyTorch is favored for research, TensorFlow more for production.


5. Workflow in PyTorch

Step 1: Dataset

  • Torch has dataset utilities in torchvision.datasets.

from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_data = datasets.MNIST(root="data", train=True, transform=transform, download=True)

Step 2: DataLoader

  • Helps in batching & shuffling data.

from torch.utils.data import DataLoader

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

Step 3: Define Model

  • Using nn.Module (base class for all models).

import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)      # flatten
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Step 4: Define Loss and Optimizer

model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Step 5: Training Loop

for epoch in range(5):
    for images, labels in train_loader:
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

Step 6: Evaluation

correct, total = 0, 0
with torch.no_grad():  # no gradient calculation
    for images, labels in train_loader:
        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

print("Accuracy:", correct / total)
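The evaluation loop above counts argmax matches; the same accuracy computation in pure Python makes the logic explicit (the `accuracy` helper is illustrative, not a PyTorch API):

```python
# Accuracy as computed in the loop above: argmax over class scores,
# compare with labels, divide matches by total.
def accuracy(outputs, labels):
    correct = 0
    for scores, label in zip(outputs, labels):
        pred = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        correct += int(pred == label)
    return correct / len(labels)

outputs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]
print(accuracy(outputs, labels))  # 2 of 3 correct -> 0.666...
```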

6. PyTorch Components

a) Tensors

  • torch.tensor(), torch.zeros(), torch.ones(), torch.rand().

b) Autograd

  • requires_grad=True tracks gradients.

  • tensor.backward() computes gradients.

c) Optimizers

  • torch.optim.SGD, torch.optim.Adam, torch.optim.RMSprop.

d) Loss Functions

  • nn.MSELoss() → Regression.

  • nn.CrossEntropyLoss() → Classification.

e) Modules & Layers

  • nn.Linear → Fully connected.

  • nn.Conv2d, nn.MaxPool2d → CNN layers.

  • nn.LSTM, nn.GRU → RNN layers.

  • nn.Dropout, nn.BatchNorm2d.


7. GPU/Device Management

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

Move data:

images, labels = images.to(device), labels.to(device)

8. Saving and Loading Models

# Save
torch.save(model.state_dict(), "model.pth")

# Load
model = SimpleNN()
model.load_state_dict(torch.load("model.pth"))
model.eval()

9. PyTorch Ecosystem

  • torchvision → Computer vision datasets & models.

  • torchaudio → Audio ML.

  • torchtext → NLP datasets & models.

  • PyTorch Lightning → High-level training framework.

  • ONNX → Export models for deployment.


10. Example: End-to-End Classification

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data
transform = transforms.ToTensor()
train_data = datasets.FashionMNIST(root="data", train=True, transform=transform, download=True)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

# Model
class FashionNN(nn.Module):
    def __init__(self):
        super(FashionNN, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = FashionNN()

# Loss + Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training
for epoch in range(3):
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}, Loss={loss.item():.4f}")

11. Tips for Beginners

  • Always check .shape of tensors.

  • Remember to flatten images before feeding into fully connected layers.

  • Use with torch.no_grad() during evaluation.

  • Use .to(device) to move model/data to GPU.

  • Start with simple models, then try CNNs/RNNs.


12. Interview Quick Recap

  • PyTorch = deep learning framework by Meta.

  • Tensors = multi-dimensional arrays with GPU support.

  • Autograd = automatic differentiation.

  • Models → subclass nn.Module.

  • Optimizers → SGD, Adam.

  • Loss → MSE (regression), CrossEntropy (classification).

  • Ecosystem → torchvision, torchaudio, torchtext, PyTorch Lightning.




MongoDB Complete Notes (Beginner Friendly)


1. What is MongoDB?

  • MongoDB is an open-source NoSQL database that stores data in JSON-like documents (called BSON = Binary JSON).

  • Unlike relational databases (SQL), MongoDB does not require predefined schemas.

  • It's highly scalable, flexible, and works perfectly for modern data apps, APIs, and ML pipelines.


2. Key Features

NoSQL – document-oriented
Schema-less – flexible data model
High performance – fast read/write
Scalable – supports sharding & replication
JSON-style documents
Powerful query language
Integration with Python, Flask, Django, etc.


3. Basic Concepts

Term         Description                           SQL Equivalent
Database     Group of collections                  Database
Collection   Group of documents                    Table
Document     JSON-like data record                 Row
Field        Key-value pair in a document          Column
_id          Unique identifier for each document   Primary key

4. MongoDB Architecture Overview

+-------------------------------------------------+
|                     MongoDB                     |
|-------------------------------------------------|
| Database → Collection → Document (JSON format)  |
| Example:                                        |
| db.users.insertOne({name: "Sanjay", age: 28})  |
+-------------------------------------------------+

5. Installation

Option 1: Local Setup

Option 2: MongoDB Atlas (Cloud)

  • Go to https://cloud.mongodb.com

  • Create cluster → Connect → Copy connection string (e.g.)

    mongodb+srv://username:password@cluster.mongodb.net/test
    

6. MongoDB Data Format Example

{
  "_id": 1,
  "name": "John",
  "age": 30,
  "skills": ["Python", "ML", "Flask"],
  "address": { "city": "Bangalore", "pincode": 560001 }
}

✅ Nested JSON
✅ Arrays supported
✅ Flexible structure


7. Basic MongoDB Commands

Command                        Description
show dbs                       List all databases
use mydb                       Switch/create database
show collections               List collections
db.createCollection("users")   Create a collection
db.users.insertOne({...})      Insert a single document
db.users.insertMany([...])     Insert multiple documents
db.users.find()                View all documents
db.users.findOne()             View the first document
db.users.updateOne()           Update a single record
db.users.deleteOne()           Delete a single record
db.dropDatabase()              Delete the database

8. CRUD Operations

➤ Create

db.students.insertOne({
  name: "Amit",
  age: 22,
  course: "Data Science"
})

➤ Read

db.students.find()
db.students.find({age: {$gt: 20}})
db.students.find({course: "Data Science"}, {name: 1, _id: 0})

➤ Update

db.students.updateOne(
  { name: "Amit" },
  { $set: { age: 23 } }
)

➤ Delete

db.students.deleteOne({ name: "Amit" })

9. Query Operators

Operator   Meaning        Example
$gt        Greater than   {age: {$gt: 25}}
$lt        Less than      {age: {$lt: 25}}
$eq        Equal          {age: {$eq: 30}}
$ne        Not equal      {age: {$ne: 25}}
$in        In list        {city: {$in: ["Delhi", "Mumbai"]}}
$and       Logical AND    {$and: [{age: {$gt: 20}}, {city: "Pune"}]}
$or        Logical OR     {$or: [{age: {$lt: 20}}, {city: "Delhi"}]}
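The comparison operators can be mimicked in a few lines of Python, which helps internalize what a query like {age: {$gt: 25}} matches. This is a toy matcher for illustration only, not part of pymongo:

```python
# Toy evaluator for a few MongoDB comparison operators.
OPS = {
    "$gt": lambda v, arg: v > arg,
    "$lt": lambda v, arg: v < arg,
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$in": lambda v, arg: v in arg,
}

def matches(doc, query):
    for field, cond in query.items():
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 25}
            if not all(OPS[op](doc.get(field), arg) for op, arg in cond.items()):
                return False
        elif doc.get(field) != cond:  # plain equality
            return False
    return True

people = [{"name": "Amit", "age": 22}, {"name": "Riya", "age": 28}]
print([p["name"] for p in people if matches(p, {"age": {"$gt": 25}})])  # ['Riya']
```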

10. Indexing

Used to speed up queries.

db.users.createIndex({name: 1})
db.users.getIndexes()

11. Aggregation Framework

Aggregation = data processing pipelines (like SQL GROUP BY).

Example:

db.sales.aggregate([
  { $match: { region: "Asia" } },
  { $group: { _id: "$country", totalSales: { $sum: "$amount" } } },
  { $sort: { totalSales: -1 } }
])
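The $match → $group → $sort stages map directly onto filter, group-by-sum, and sort in plain Python. A sketch of the same pipeline (illustrative only; real aggregation runs server-side in MongoDB, and the sample data here is made up):

```python
# Pure-Python equivalent of the $match / $group / $sort pipeline above.
sales = [
    {"region": "Asia", "country": "India",  "amount": 100},
    {"region": "Asia", "country": "Japan",  "amount": 300},
    {"region": "Asia", "country": "India",  "amount": 250},
    {"region": "EU",   "country": "France", "amount": 500},
]

matched = [d for d in sales if d["region"] == "Asia"]          # $match

totals = {}                                                    # $group + $sum
for d in matched:
    totals[d["country"]] = totals.get(d["country"], 0) + d["amount"]

result = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # $sort: -1
print(result)  # [('India', 350), ('Japan', 300)]
```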

12. Connection with Python (pymongo)

Install:

pip install pymongo

Connect:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
collection = db["students"]

Insert:

collection.insert_one({"name": "Riya", "age": 21})

Fetch:

for s in collection.find():
    print(s)

Query:

result = collection.find({"age": {"$gt": 20}})
for r in result:
    print(r)

Update:

collection.update_one({"name": "Riya"}, {"$set": {"age": 22}})

Delete:

collection.delete_one({"name": "Riya"})

13. MongoDB with Flask (Example)

from flask import Flask, request, jsonify
from pymongo import MongoClient

app = Flask(__name__)
client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
collection = db["users"]

@app.route("/add", methods=["POST"])
def add_user():
    data = request.json
    collection.insert_one(data)
    return jsonify({"message": "User added successfully"})

@app.route("/users", methods=["GET"])
def get_users():
    users = list(collection.find({}, {"_id": 0}))
    return jsonify(users)

if __name__ == "__main__":
    app.run(debug=True)

Access:

  • POST /add with JSON body

  • GET /users to fetch users


14. Data Modeling Best Practices

✅ Use embedded documents for one-to-few relationships
✅ Use references for one-to-many relationships
✅ Keep document size < 16MB
✅ Use indexes on frequently queried fields
✅ Avoid unnecessary nesting


15. Replication & Sharding (Advanced Concepts)

Concept          Description
Replication      Copies data across multiple servers for high availability
Primary Node     Receives all writes
Secondary Node   Copies data from the primary
Sharding         Splits large data into horizontal partitions (scaling out)

16. MongoDB vs SQL Summary

SQL                     MongoDB
Structured schema       Schema-less
Tables, rows            Collections, documents
Joins                   Embedded/nested documents
SQL queries             BSON + MongoDB Query Language
Vertical scaling        Horizontal scaling
Slower for large data   Faster for unstructured data

17. MongoDB Atlas Example (Cloud)

from pymongo import MongoClient

client = MongoClient("mongodb+srv://<username>:<password>@cluster0.mongodb.net/")
db = client["sales_db"]
sales = db["transactions"]

sales.insert_one({"region": "Asia", "amount": 2000})
for doc in sales.find():
    print(doc)

18. Common Commands Recap

Operation           Command
Create DB           use mydb
Create Collection   db.createCollection("users")
Insert              db.users.insertOne({name: "Amit"})
Read                db.users.find()
Update              db.users.updateOne()
Delete              db.users.deleteOne()
Drop Collection     db.users.drop()

19. Integrations

MongoDB integrates with:

  • Flask, Django, FastAPI

  • Pandas (via pymongo or mongoengine)

  • ML pipelines (store model metadata)

  • Airflow, Streamlit, etc.


20. Quick Summary

Topic            Key Point
Type             NoSQL (document-based)
Format           JSON-like BSON
Query Language   MongoDB Query Language (MQL)
Key Libraries    pymongo, mongoengine
Best Use Case    Dynamic data, APIs, logs, ML storage
Cloud Option     MongoDB Atlas



Model Context Protocol (MCP) – In Machine Learning

A Model Context Protocol refers to the way information, metadata, or inputs are structured and passed to a machine learning model so that:

  1. The model understands the input properly.

  2. The output can be interpreted or used consistently.

Think of it as the rules of communication between your model and its environment (data pipeline, serving system, or API).


1. Why Do We Need a Model Context Protocol?

  • Models don’t work in isolation; they need:

    • Input format (what data, how structured).

    • Context (user info, history, environment).

    • Output format (what the model returns, how it’s consumed).

  • MCP ensures standardization → makes model reusable, debuggable, and deployable.


2. What Does It Include?

A typical model context protocol includes:

  1. Input Schema

    • Feature names, types, dimensions.

    • Example: {"user_id": int, "age": float, "clicked_items": list}

  2. Context

    • Additional info that influences predictions.

    • Example: time of day, device type, location.

  3. Model Metadata

    • Model version, training data info, assumptions.

    • Example: "version": "1.2.3", "trained_on": "MovieLens 1M"

  4. Output Schema

    • Structure of prediction.

    • Example: {"recommended_item": str, "confidence": float}


3. Example (Recommendation System MCP)

Input Context Protocol:

{
  "user_id": 123,
  "session_features": {
    "time_of_day": "evening",
    "device": "mobile"
  },
  "interaction_history": [45, 67, 89]  // item IDs
}

Model Output:

{
  "recommended_items": [101, 202, 303],
  "confidence_scores": [0.92, 0.85, 0.80],
  "model_version": "v1.0.5"
}

This ensures every service consuming the model knows exactly what to send and expect back.
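A context protocol like this is enforceable in code. A minimal dict-based validator for the input contract shown above (the field names come from the example; the `validate` helper itself is hypothetical, not part of any serving framework):

```python
# Minimal schema check for the input contract shown above.
INPUT_SCHEMA = {"user_id": int, "session_features": dict, "interaction_history": list}

def validate(payload, schema):
    """Return the list of fields that are missing or have the wrong type."""
    return [f for f, t in schema.items()
            if f not in payload or not isinstance(payload[f], t)]

good = {"user_id": 123,
        "session_features": {"time_of_day": "evening", "device": "mobile"},
        "interaction_history": [45, 67, 89]}
bad = {"user_id": "123"}  # wrong type, and two fields missing

print(validate(good, INPUT_SCHEMA))  # [] -> payload honours the contract
print(validate(bad, INPUT_SCHEMA))
```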


4. Protocols in Real World

  • TensorFlow Serving → Uses gRPC/REST with JSON or Protobuf schemas.

  • TorchServe → Defines handler classes for input-output schemas.

  • ONNX Runtime → Standardized model format across frameworks.

  • MLOps Systems (Kubeflow, MLflow, Seldon) → Rely heavily on context protocols for reproducibility.


5. Interview Quick Recap

  • MCP = contract between model and environment.

  • Defines input schema, context info, output schema.

  • Needed for scaling, deploying, and debugging ML models.

  • Real-world implementations → TensorFlow Serving, TorchServe, ONNX, MLflow.




Agentic AI – Beginner-Friendly Notes


1. What is Agentic AI?

  • Agentic AI = AI systems that can act as “agents.”

  • Unlike traditional models (which just take input → give output), agentic AI:

    1. Perceives the environment (via data, sensors, APIs).

    2. Plans actions (chooses strategy or sequence of steps).

    3. Acts on the environment (via tools, APIs, physical systems).

    4. Learns & adapts based on feedback.

In short: Agentic AI doesn’t just answer, it does things autonomously.


2. Why is it Important?

  • Moves AI from passive assistants → active problem-solvers.

  • Can execute multi-step workflows, not just answer single queries.

  • Key for autonomous research, robotics, personalized assistants, and business automation.


3. Core Components of Agentic AI

  1. Perception

    • Collects information (from text, images, sensors, APIs).

  2. Memory

    • Short-term memory (conversation context).

    • Long-term memory (stored knowledge, databases).

  3. Planning & Reasoning

    • Breaks complex goals into smaller steps.

    • Uses chain-of-thought or planning algorithms.

  4. Tools & Actions

    • Can call APIs, run code, browse web, query databases.

  5. Feedback & Learning

    • Evaluates actions, updates strategy.


4. Techniques Behind Agentic AI

  • LLM + Tools (Tool-Use)

    • LLM calls external tools (calculator, search engine, database).

  • Reasoning + Planning

    • Approaches like Tree of Thoughts, ReAct (Reason + Act).

  • Multi-Agent Systems

    • Several AI agents collaborate (research agent, writing agent, coding agent).

  • Reinforcement Learning (RL)

    • Agents learn optimal actions via trial & error.

  • Memory Augmentation

    • Vector databases (Pinecone, FAISS) to recall past interactions.
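Tool-use can be sketched as a loop in which a policy decides either to call a tool or to return a final answer. A toy example with a calculator tool — everything here is illustrative (the `policy` function stands in for an LLM's decision; no model is involved):

```python
# Toy tool-use loop: the "policy" either calls the calculator or finishes.
def calculator(expression):
    # Tool: evaluates simple arithmetic on trusted input only.
    return eval(expression, {"__builtins__": {}})

def policy(goal, observation):
    # Stand-in for an LLM's decision: call the tool once, then answer.
    if observation is None:
        return ("call_tool", goal)
    return ("finish", f"The answer is {observation}")

def run_agent(goal):
    observation = None
    for _ in range(5):                    # guardrail: bounded number of steps
        action, arg = policy(goal, observation)
        if action == "call_tool":
            observation = calculator(arg)  # act, then perceive the result
        else:
            return arg
    return "gave up"

print(run_agent("6 * 7"))  # The answer is 42
```

Real frameworks (LangChain agents, ReAct-style prompting) follow the same perceive → decide → act loop, with an LLM in place of `policy`.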


5. Examples of Agentic AI

  • AutoGPT: LLM agent that autonomously executes tasks.

  • LangChain Agents: Orchestrate LLMs + tools.

  • ChatGPT with browsing/code interpreter: Uses external tools.

  • Robotic agents: AI agents that can move robots (self-driving cars, drones).

  • Enterprise AI Agents: Automate workflows (customer service, report generation).


6. Comparison

Aspect         Traditional AI      Agentic AI
Input/Output   Fixed Q → A         Dynamic, context-driven
Autonomy       No                  Yes
Tool Usage     Limited             Uses APIs, tools
Planning       None                Multi-step reasoning
Adaptability   Low                 High

7. Challenges in Agentic AI

  • Hallucination risk → Wrong actions.

  • Safety & alignment → Ensure AI follows human values.

  • Reliability → Needs guardrails to avoid harmful actions.

  • Scalability → Costly if not optimized.

  • Evaluation → Harder to test compared to static models.


8. Applications

  • Personal assistants (schedule meetings, send emails).

  • Business automation (generate reports, analyze markets).

  • Research (autonomous discovery, literature review).

  • Healthcare (monitor patients, suggest treatments).

  • Robotics (self-driving cars, drones, warehouse robots).


9. Future of Agentic AI

  • More collaborative AI ecosystems (multi-agent teams).

  • Safe & explainable reasoning mechanisms.

  • Integration with IoT & robotics → fully autonomous systems.

  • Potential to become co-workers, not just tools.


10. Interview Quick Recap

  • Agentic AI = AI systems that perceive, plan, act, and learn.

  • Core: Perception, Memory, Planning, Tools, Feedback.

  • Techniques: ReAct, Tree of Thoughts, RL, Multi-agent.

  • Examples: AutoGPT, LangChain, ChatGPT (with tools).

  • Challenges: Hallucinations, safety, evaluation.

  • Applications: Assistants, automation, robotics, research.




Streamlit Complete Notes (Beginner-Friendly)

1. What is Streamlit?

  • Streamlit is an open-source Python framework for building interactive web apps for data science, ML, and visualization.

  • No need to know frontend (HTML/CSS/JS).

  • Just write Python and deploy as a web app.


2. Installation

pip install streamlit

Check version:

streamlit --version

Run an app:

streamlit run app.py

3. Basic App

app.py

import streamlit as st

st.title("Hello Streamlit")
st.write("This is my first Streamlit app")

Run:

streamlit run app.py

Opens in browser at http://localhost:8501.


4. Streamlit Basics

Text and Titles

st.title("Title")
st.header("Header")
st.subheader("Subheader")
st.text("Simple text")
st.markdown("**Bold with Markdown**")

Data Display

import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Age": [23, 34]})
st.dataframe(df)   # Interactive table
st.table(df)       # Static table

Charts

import matplotlib.pyplot as plt
import numpy as np

st.line_chart(df)  # Simple chart
st.bar_chart(df["Age"])

5. User Input Widgets

name = st.text_input("Enter your name:")
age = st.number_input("Enter age", min_value=0, max_value=100)
gender = st.radio("Select gender", ["Male", "Female"])
hobby = st.multiselect("Choose hobbies", ["Reading", "Gaming", "Sports"])
submit = st.button("Submit")

if submit:
    st.write(f"Hello {name}, Age: {age}, Gender: {gender}, Hobby: {hobby}")

6. File Upload

uploaded_file = st.file_uploader("Upload CSV", type="csv")
if uploaded_file:
    df = pd.read_csv(uploaded_file)
    st.dataframe(df.head())

7. Layouts

  • Sidebar for navigation:

st.sidebar.title("Options")
choice = st.sidebar.radio("Menu", ["Home", "About"])
if choice == "Home":
    st.write("Welcome to Home")
else:
    st.write("About Page")
  • Columns:

col1, col2 = st.columns(2)
col1.write("Left Side")
col2.write("Right Side")
  • Tabs:

tab1, tab2 = st.tabs(["Data", "Charts"])
with tab1:
    st.write("Data Section")
with tab2:
    st.write("Charts Section")

8. Caching (for performance)

@st.cache_data
def load_data():
    df = pd.read_csv("big_data.csv")
    return df
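The core idea behind @st.cache_data is memoization: remember results per argument so an expensive function body runs only once. A simplified pure-Python sketch (hashable positional arguments only; Streamlit's real implementation also hashes data and invalidates on code changes):

```python
# Simplified sketch of the caching idea behind @st.cache_data.
calls = {"count": 0}

def cache_data(func):
    store = {}
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)  # compute once per argument tuple
        return store[args]
    return wrapper

@cache_data
def slow_square(x):
    calls["count"] += 1  # track how many times the body actually runs
    return x * x

print(slow_square(4), slow_square(4))  # 16 16
print(calls["count"])                  # 1 -- second call served from cache
```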

9. Machine Learning Demo

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier()
model.fit(X, y)

st.title("Iris Flower Prediction")

sepal_length = st.slider("Sepal Length", 4.0, 8.0, 5.0)
sepal_width = st.slider("Sepal Width", 2.0, 4.5, 3.0)
petal_length = st.slider("Petal Length", 1.0, 7.0, 4.0)
petal_width = st.slider("Petal Width", 0.1, 2.5, 1.0)

prediction = model.predict([[sepal_length, sepal_width, petal_length, petal_width]])
st.write("Predicted Class:", iris.target_names[prediction][0])

10. Deploy Streamlit App

  • Use Streamlit Community Cloud (free).
    Steps:

  1. Push app code to GitHub.

  2. Go to https://streamlit.io/cloud.

  3. Connect GitHub repo.

  4. Deploy.

Alternative: Deploy on Heroku, Render, AWS, or GCP.


11. Example Mini Project: Sales Dashboard

import pandas as pd
import streamlit as st
import plotly.express as px

st.title("Sales Dashboard")

uploaded_file = st.file_uploader("Upload Sales CSV", type="csv")
if uploaded_file:
    df = pd.read_csv(uploaded_file)
    st.write("Data Preview:", df.head())

    fig = px.line(df, x="Date", y="Sales", title="Sales Over Time")
    st.plotly_chart(fig)

    st.metric("Total Sales", df["Sales"].sum())

12. Key Takeaways

  • Streamlit = Python → Web App (no frontend needed).

  • Supports charts, ML models, dashboards, file uploads.

  • Very beginner-friendly and fast to prototype.

  • Free hosting on Streamlit Cloud.







Flask Complete Notes (Beginner-Friendly)

1. What is Flask?

  • Flask is a lightweight Python web framework used to build web applications and APIs.

  • Known as a “micro-framework” because it doesn’t come with built-in tools like database ORM or authentication — you add only what you need.

  • Great for beginners, prototyping, and even production apps.


2. Installing Flask

pip install flask

Check installation:

import flask
print(flask.__version__)

3. Basic Flask App

from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello, Flask!"

if __name__ == "__main__":
    app.run(debug=True)
  • Flask(__name__) → creates an app object.

  • @app.route("/") → defines the URL route.

  • app.run(debug=True) → starts server with auto-reload and error tracking.

Run app:

python app.py

Visit: http://127.0.0.1:5000


4. Routing

  • Add multiple pages by defining routes:

@app.route("/about")
def about():
    return "This is the About Page"
  • Dynamic routes:

@app.route("/user/<name>")
def user(name):
    return f"Hello, {name}!"
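Under the hood, a dynamic rule like "/user/<name>" is compiled into a pattern and the captured segment is passed to the view function as an argument. A tiny regex sketch of the same idea (not Flask's actual router, which uses Werkzeug and supports typed converters):

```python
import re

# Convert a "/user/<name>" style rule into a regex with a named group.
def compile_route(rule):
    pattern = re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", rule)
    return re.compile(f"^{pattern}$")

route = compile_route("/user/<name>")
match = route.match("/user/Sanjay")
print(match.group("name"))  # Sanjay
```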

5. Templates (HTML Integration)

Flask uses Jinja2 templates to render HTML.

Folder structure:

project/
  app.py
  templates/
    index.html

app.py:

from flask import render_template

@app.route("/")
def home():
    return render_template("index.html", name="Sanjay")

templates/index.html:

<!DOCTYPE html>
<html>
<head><title>Flask App</title></head>
<body>
  <h1>Hello, {{ name }}</h1>
</body>
</html>

6. Static Files (CSS, JS, Images)

Folder structure:

project/
  static/
    style.css

HTML usage:

<link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">

7. Forms and User Input

from flask import request

@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        username = request.form["username"]
        return f"Welcome, {username}"
    return '''
        <form method="post">
            <input name="username">
            <input type="submit">
        </form>
    '''

8. REST API with Flask

from flask import jsonify

@app.route("/api/data")
def get_data():
    return jsonify({"name": "Sanjay", "role": "Data Scientist"})

9. Flask with Database (SQLite Example)

import sqlite3
from flask import g

DATABASE = "test.db"

def get_db():
    db = getattr(g, "_database", None)
    if db is None:
        db = g._database = sqlite3.connect(DATABASE)
    return db

@app.route("/add")
def add_data():
    db = get_db()
    db.execute("INSERT INTO users (name) VALUES (?)", ("Sanjay",))
    db.commit()
    return "User added!"

10. Flask Extensions (Popular)

  • Flask-SQLAlchemy → Database ORM

  • Flask-Login → Authentication

  • Flask-RESTful → Build APIs easily

  • Flask-WTF → Form handling

  • Flask-Mail → Send emails


11. Deploying Flask

  • Local run: python app.py

  • Production (Gunicorn):

pip install gunicorn
gunicorn -w 4 app:app
  • Can deploy on Heroku, Render, AWS, GCP, or Railway.


12. Mini Project Example (Hello API + Webpage)

app.py:

from flask import Flask, render_template, jsonify

app = Flask(__name__)

@app.route("/")
def home():
    return render_template("index.html", name="Flask Learner")

@app.route("/api")
def api():
    return jsonify({"message": "This is Flask API!"})

if __name__ == "__main__":
    app.run(debug=True)

index.html:

<h1>Welcome {{ name }}</h1>
<p>Check API at <a href="/api">/api</a></p>

13. Key Points Summary

  • Flask = Lightweight, flexible Python web framework.

  • Uses routes for pages and APIs.

  • Templates + static files = frontend support.

  • Extensions add extra power (DB, login, forms).

  • Easy to deploy anywhere.





FastAPI Complete Notes (Beginner Friendly)


1. What is FastAPI?

FastAPI is a modern, fast (high-performance) Python framework used for building APIs and backend services.

✅ Built on Starlette (for web) and Pydantic (for data validation)
✅ Designed for speed and type safety
✅ Ideal for Machine Learning APIs, microservices, and real-time data systems


2. Why Use FastAPI?

Feature            Description
Fast               Built on ASGI → handles requests asynchronously
Data validation    Uses Pydantic models for strict schema validation
Automatic docs     Swagger UI and ReDoc auto-generated
Easy integration   Works well with SQL, NoSQL, ML, and OAuth2
Modern syntax      Type hints and async/await supported
Great for ML       Perfect for deploying ML models as REST APIs

3. Installation

pip install fastapi uvicorn

fastapi → API framework
uvicorn → ASGI server to run your app


4. Create Your First FastAPI App

main.py

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def home():
    return {"message": "Hello, FastAPI!"}

Run the app:

uvicorn main:app --reload

Now visit:
  • http://127.0.0.1:8000 → API output
  • http://127.0.0.1:8000/docs → Swagger UI
  • http://127.0.0.1:8000/redoc → ReDoc UI


5. HTTP Methods

Method – Usage – Example
GET – read data – /users
POST – create new data – /users
PUT – replace an entire record – /users/{id}
PATCH – update part of a record – /users/{id}
DELETE – delete a record – /users/{id}

Example:

@app.get("/items/{item_id}")
def get_item(item_id: int):
    return {"item_id": item_id}

6. Query Parameters

@app.get("/search/")
def search_items(q: str, limit: int = 5):
    return {"query": q, "limit": limit}

➡️ Access like: http://127.0.0.1:8000/search/?q=apple&limit=10
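FastAPI parses these values from the URL's query string. For intuition, the same parsing can be done with Python's standard library (a sketch of the mechanics, not FastAPI's internals):

```python
from urllib.parse import urlsplit, parse_qs

url = "http://127.0.0.1:8000/search/?q=apple&limit=10"
params = parse_qs(urlsplit(url).query)

# parse_qs returns lists, since a key may repeat in a query string
q = params["q"][0]
limit = int(params["limit"][0])  # FastAPI does this conversion via the `int` type hint
print(q, limit)  # apple 10
```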


7. Request Body with Pydantic

Used for validating and structuring input JSON.

from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float
    in_stock: bool = True

@app.post("/items/")
def create_item(item: Item):
    return {"item_name": item.name, "price": item.price}

Input JSON example:

{
  "name": "Laptop",
  "price": 80000
}
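Conceptually, the Item model type-checks each field and fills in defaults. A rough pure-Python sketch of that behaviour (Pydantic's real implementation is far more complete):

```python
def validate_item(payload: dict) -> dict:
    # Hypothetical stand-in for the Item model: required name/price, default in_stock
    schema = {"name": str, "price": float, "in_stock": bool}
    defaults = {"in_stock": True}
    item = {}
    for field, ftype in schema.items():
        if field in payload:
            item[field] = ftype(payload[field])  # coerce simple types, as Pydantic does
        elif field in defaults:
            item[field] = defaults[field]
        else:
            raise ValueError(f"field required: {field}")
    return item

print(validate_item({"name": "Laptop", "price": 80000}))
# in_stock falls back to its default True, price is coerced to float
```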

8. Path Parameters + Validation

from fastapi import Path

@app.get("/users/{user_id}")
def read_user(user_id: int = Path(..., gt=0, description="User ID must be > 0")):
    return {"user_id": user_id}

9. Handling Query + Path Together

@app.get("/products/{product_id}")
def read_product(product_id: int, q: str | None = None):
    if q:
        return {"product_id": product_id, "query": q}
    return {"product_id": product_id}

10. Response Models (Structured Output)

class User(BaseModel):
    id: int
    name: str
    email: str

@app.get("/user/{id}", response_model=User)
def get_user(id: int):
    return {"id": id, "name": "Sanjay", "email": "sanjay@example.com"}
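Besides documenting the shape, response_model also filters the returned data down to the declared fields. Roughly (a hypothetical helper, not FastAPI's internals):

```python
def filter_to_model(data: dict, fields: set) -> dict:
    # Keep only the keys declared on the response model
    return {k: v for k, v in data.items() if k in fields}

user_fields = {"id", "name", "email"}  # the fields of the User model above
raw = {"id": 1, "name": "Sanjay", "email": "sanjay@example.com", "password_hash": "x"}

print(filter_to_model(raw, user_fields))  # password_hash is dropped from the response
```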

11. Handling Errors

from fastapi import HTTPException

@app.get("/divide")
def divide(a: float, b: float):
    if b == 0:
        raise HTTPException(status_code=400, detail="Division by zero not allowed")
    return {"result": a / b}

12. Dependency Injection

Used to manage reusable logic like authentication, DB connections, etc.

from fastapi import Depends

def get_token_header(token: str):
    if token != "abc123":
        raise HTTPException(status_code=403, detail="Invalid token")
    return token

@app.get("/secure-data/")
def read_secure_data(token: str = Depends(get_token_header)):
    return {"data": "Secure content!"}
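Depends simply calls the dependency first and feeds its return value into the endpoint. A minimal plain-Python sketch of that flow (hypothetical stand-ins, not FastAPI's resolver):

```python
def get_token_header(token: str) -> str:
    # Same rule as the FastAPI version above
    if token != "abc123":
        raise PermissionError("Invalid token")
    return token

def read_secure_data(token: str) -> dict:
    return {"data": "Secure content!"}

def handle_request(token: str) -> dict:
    # What Depends does, roughly: resolve the dependency,
    # then pass its result into the endpoint
    resolved = get_token_header(token)
    return read_secure_data(token=resolved)

print(handle_request("abc123"))  # {'data': 'Secure content!'}
```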

13. Middleware

Middleware intercepts requests/responses.

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

14. Connect FastAPI with MongoDB

from fastapi import FastAPI
from pymongo import MongoClient
from pydantic import BaseModel

app = FastAPI()
client = MongoClient("mongodb://localhost:27017/")
db = client["fastapi_db"]
collection = db["users"]

class User(BaseModel):
    name: str
    age: int

@app.post("/add_user")
def add_user(user: User):
    collection.insert_one(user.dict())  # Pydantic v2: use user.model_dump()
    return {"message": "User added"}

@app.get("/users")
def get_users():
    users = list(collection.find({}, {"_id": 0}))
    return {"users": users}

15. FastAPI + Machine Learning Example

from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import numpy as np

app = FastAPI()

# Load model
model = pickle.load(open("model.pkl", "rb"))

class InputData(BaseModel):
    feature1: float
    feature2: float
    feature3: float

@app.post("/predict")
def predict(data: InputData):
    features = np.array([[data.feature1, data.feature2, data.feature3]])
    prediction = model.predict(features)
    return {"prediction": float(prediction[0])}

Run with:

uvicorn main:app --reload

➡️ Try it at http://127.0.0.1:8000/docs
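The model.pkl file is assumed to already exist from a training script. The pickle round-trip it relies on looks like this, using a stand-in dict instead of a real trained model:

```python
import pickle

# Stand-in for a trained model: any picklable object round-trips the same way
stand_in = {"coef": [0.5, 1.2, -0.3], "intercept": 2.0}

with open("model.pkl", "wb") as f:
    pickle.dump(stand_in, f)   # what the training script does

with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)    # what the API does at startup

print(loaded == stand_in)  # True
```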


16. Include Routers (for large projects)

from fastapi import APIRouter

router = APIRouter()

@router.get("/info")
def info():
    return {"info": "Sub-router working!"}

app.include_router(router, prefix="/api")

17. Authentication (Basic Example)

from fastapi import Depends
from fastapi.security import OAuth2PasswordBearer

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

@app.get("/secure/")
def secure_data(token: str = Depends(oauth2_scheme)):
    return {"token": token}

18. Static Files & Templates

from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
from fastapi import Request

app.mount("/static", StaticFiles(directory="static"), name="static")
templates = Jinja2Templates(directory="templates")

@app.get("/home")
def home(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})

19. Common FastAPI Commands

Command – Description
uvicorn main:app --reload – run the development server
/docs – Swagger documentation
/redoc – alternative documentation
Ctrl+C – stop the server
pip install "uvicorn[standard]" – install full server dependencies

20. Best Practices

✅ Use Pydantic models for input/output
✅ Keep routers in separate files for modular code
✅ Add CORS middleware for frontend integration
✅ Implement logging & error handling
✅ Use async functions for I/O-heavy operations
✅ Deploy using Docker / Gunicorn / Uvicorn


21. Deployment (Production)

Option 1: Using Uvicorn + Gunicorn

pip install "uvicorn[standard]" gunicorn
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker

Option 2: Docker

FROM python:3.10
WORKDIR /app
COPY . .
RUN pip install fastapi uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

22. Integration Possibilities

Tool – Integration Use
MongoDB / SQLAlchemy – database
Pandas / NumPy – data analysis
Scikit-learn / XGBoost – ML model prediction APIs
Streamlit / React – frontend UI
Docker / K8s – deployment
Prometheus – API performance monitoring




PySpark Complete Notes (Beginner Friendly)


1. What is PySpark?

PySpark is the Python API for Apache Spark, a powerful open-source framework used for big data processing, analysis, and machine learning across distributed clusters.

✅ Built on Apache Spark
✅ Handles large-scale data (GBs → TBs) efficiently
✅ Works on clusters (parallel computation)
✅ Supports DataFrames, SQL, MLlib, Streaming


2. Why Use PySpark?

Feature – Description
Speed – in-memory processing, up to 100x faster than disk-based MapReduce
Scalable – handles terabytes/petabytes of data
Easy API – Pythonic DataFrame operations
Multiple data sources – CSV, JSON, Parquet, HDFS, S3
Machine learning – Spark MLlib
Cloud integration – AWS EMR, Databricks, Hadoop, Google Dataproc

3. PySpark Architecture

+----------------------------------------------------------+
|                        PySpark                           |
|----------------------------------------------------------|
| Driver Program (main code)                               |
|   ↓                                                      |
| SparkContext → Cluster Manager → Executors (workers)     |
| Each executor runs tasks on partitions of data            |
+----------------------------------------------------------+
  • Driver: Your Python script that controls the job.

  • Executor: Worker nodes that perform computations.

  • Cluster Manager: Allocates resources (e.g., YARN, Mesos, Kubernetes).


4. Installation

Local Installation:

pip install pyspark

Check version:

import pyspark
print(pyspark.__version__)

5. Starting PySpark Session

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyFirstSparkApp") \
    .getOrCreate()

print(spark)

To stop session:

spark.stop()

6. Create a DataFrame

From Python data:

data = [("Alice", 25), ("Bob", 30), ("Cathy", 27)]
columns = ["Name", "Age"]

df = spark.createDataFrame(data, columns)
df.show()

Output:

+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
|  Bob| 30|
|Cathy| 27|
+-----+---+

7. Read / Write Data

Operation – Example
Read CSV – df = spark.read.csv("data.csv", header=True, inferSchema=True)
Read JSON – df = spark.read.json("data.json")
Read Parquet – df = spark.read.parquet("data.parquet")
Write CSV – df.write.csv("output/", header=True)

8. Basic DataFrame Operations

df.printSchema()   # View schema
df.columns         # Get column names
df.describe().show()  # Summary stats
df.select("Name").show()
df.filter(df.Age > 25).show()
df.groupBy("Age").count().show()
df.orderBy("Age", ascending=False).show()

9. Add / Rename / Drop Columns

from pyspark.sql.functions import col, lit

df = df.withColumn("Country", lit("India"))         # Add column
df = df.withColumnRenamed("Age", "Years")           # Rename column
df = df.drop("Country")                             # Drop column

10. Handling Missing Data

df.na.drop().show()                    # Drop null rows
df.na.fill({"Age": 0}).show()          # Fill nulls
df.na.replace("Unknown", "N/A").show() # Replace values

11. SQL with PySpark

Register DataFrame as temporary SQL table:

df.createOrReplaceTempView("people")

result = spark.sql("SELECT Name, Age FROM people WHERE Age > 25")
result.show()

12. PySpark Functions

Import frequently used functions:

from pyspark.sql.functions import *  # convenient, but shadows built-ins like sum/min/max

df.select(upper(col("Name")), col("Age") + 5).show()
df.withColumn("AgeGroup", when(col("Age") > 25, "Adult").otherwise("Young")).show()

13. Joins in PySpark

df1.join(df2, on="id", how="inner")
df1.join(df2, on="id", how="left")
df1.join(df2, on="id", how="right")
df1.join(df2, on="id", how="outer")
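For intuition, the join types behave like this on two small Python dicts keyed by id (a toy sketch, not Spark's distributed implementation):

```python
left = {1: "Alice", 2: "Bob"}   # id -> name
right = {2: 30, 3: 27}          # id -> age

# inner: ids present on both sides
inner = {k: (left[k], right[k]) for k in left.keys() & right.keys()}
# left: all left ids; unmatched right side becomes None (null in Spark)
left_join = {k: (left[k], right.get(k)) for k in left}
# outer: all ids from either side
outer = {k: (left.get(k), right.get(k)) for k in left.keys() | right.keys()}

print(inner)      # {2: ('Bob', 30)}
print(left_join)  # {1: ('Alice', None), 2: ('Bob', 30)}
```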

14. Aggregations

df.groupBy("Country").agg(
    count("*").alias("Count"),
    avg("Age").alias("AvgAge")
).show()

15. Working with Dates

from pyspark.sql.functions import current_date, year, month, dayofmonth

df = df.withColumn("today", current_date())
df = df.withColumn("year", year(col("today")))
df.show()
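These date functions mirror Python's standard datetime pieces, e.g.:

```python
from datetime import date

d = date(2024, 3, 15)
print(date.today())            # like current_date()
print(d.year, d.month, d.day)  # like year(), month(), dayofmonth()
```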

16. User Defined Functions (UDFs)

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def greeting(name):
    return "Hello " + name

greet_udf = udf(greeting, StringType())
df = df.withColumn("Greet", greet_udf(col("Name")))
df.show()

17. Machine Learning with PySpark (MLlib)

Example: Linear Regression

from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler

data = [(1, 2.0, 3.0), (2, 3.0, 5.0), (3, 4.0, 7.0)]
columns = ["id", "feature", "label"]
df = spark.createDataFrame(data, columns)

assembler = VectorAssembler(inputCols=["feature"], outputCol="features")
train_data = assembler.transform(df)

lr = LinearRegression(featuresCol="features", labelCol="label")
model = lr.fit(train_data)

print(model.coefficients, model.intercept)

18. PySpark with Pandas

Convert Spark DataFrame to Pandas:

pandas_df = df.toPandas()

Convert Pandas DataFrame to Spark:

spark_df = spark.createDataFrame(pandas_df)

19. Partitioning & Parallelism

  • Spark divides data into partitions to process in parallel.

  • Check partitions:

    df.rdd.getNumPartitions()
    
  • Repartition:

    df = df.repartition(4)
    
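Repartitioning redistributes rows across N chunks that workers can process in parallel. A toy sketch of the idea (Spark actually uses hash or range partitioners, not round-robin):

```python
def repartition(rows, n):
    # Deal rows round-robin into n partitions
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

parts = repartition(list(range(10)), 4)
print([len(p) for p in parts])  # [3, 3, 2, 2] — roughly even partition sizes
```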

20. Saving & Loading Models

model.save("lr_model")

# Fitted models are loaded back via the corresponding *Model* class
from pyspark.ml.regression import LinearRegressionModel
loaded_model = LinearRegressionModel.load("lr_model")

21. Integration with AWS / GCP

Platform – Method
AWS S3 – spark.read.csv("s3a://bucket/file.csv")
Google Cloud Storage – spark.read.csv("gs://bucket/file.csv")
Hadoop HDFS – spark.read.csv("hdfs://path/file.csv")

22. Performance Tips

✅ Use Parquet instead of CSV (columnar & compressed)
✅ Use filter() early (predicate pushdown)
✅ Cache DataFrames with .cache() for reuse
✅ Avoid too many small files
✅ Use broadcast joins for small lookup tables


23. PySpark Data Types

PySpark Type – Equivalent Python Type
StringType() – str
IntegerType() – int
DoubleType() – float
BooleanType() – bool
TimestampType() – datetime

Example:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("Name", StringType(), True),
    StructField("Age", IntegerType(), True)
])
df = spark.createDataFrame(data, schema)

24. Common PySpark Functions

Function – Purpose
col() – access a column
lit() – add a constant value
when() – conditional column
count(), sum(), avg() – aggregations
regexp_extract() – regex matching
concat_ws() – string concatenation
explode() – flatten an array column

25. Example: End-to-End ETL Pipeline

from pyspark.sql import SparkSession
from pyspark.sql.functions import *

spark = SparkSession.builder.appName("ETL Example").getOrCreate()

# Read
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Transform
df_clean = df.filter(col("Amount").isNotNull())
df_final = df_clean.groupBy("Region").agg(sum("Amount").alias("TotalSales"))

# Load
df_final.write.csv("output/sales_summary", header=True)
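The groupBy/agg step above is an ordinary grouped sum. For intuition, the same logic on a small in-memory dataset in plain Python:

```python
from collections import defaultdict

sales = [("North", 100.0), ("South", 250.0), ("North", 50.0), ("South", None)]

totals = defaultdict(float)
for region, amount in sales:
    if amount is not None:   # mirrors the isNotNull() filter
        totals[region] += amount

print(dict(totals))  # {'North': 150.0, 'South': 250.0}
```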

26. Spark MLlib Use Cases

  • Regression: Linear, Logistic

  • Classification: Decision Trees, Random Forests

  • Clustering: K-Means

  • Feature Engineering: VectorAssembler, StandardScaler

  • Pipelines: Combine multiple transformations


27. PySpark vs Pandas

Feature – Pandas – PySpark
Scale – small data (in-memory) – big data (distributed)
Speed – single machine – multi-node cluster
API – easy & rich – similar syntax
Use case – EDA – ETL, ML on big data

28. Common Use Cases

✅ ETL on large datasets
✅ Feature engineering for ML
✅ Log analysis
✅ Data cleaning at scale
✅ Joining datasets across clusters




Kubernetes Complete Notes (Beginner-Friendly)


1. What is Kubernetes?

  • Kubernetes (K8s) is an open-source platform for automating deployment, scaling, and management of containerized applications (like Docker containers).

  • It helps you run applications reliably across clusters of machines (physical or virtual).

  • Originally developed by Google, now maintained by the Cloud Native Computing Foundation (CNCF).


2. Why Use Kubernetes?

Automatic scaling of apps
Self-healing — restarts crashed containers
Load balancing between containers
Rolling updates for zero downtime
Portability — works on any cloud or on-prem


3. Basic Terminology

Concept – Description
Cluster – set of nodes (machines) managed by Kubernetes
Node – a worker machine (physical or VM) that runs pods
Pod – the smallest deployable unit; one or more containers
Container – application running inside Docker (or a similar runtime)
Service – exposes pods to the network (for communication)
Deployment – manages replicas of pods and ensures the desired state
Namespace – logical grouping of resources (like folders)
Ingress – manages external access (HTTP/HTTPS) to services
ConfigMap / Secret – store configuration or sensitive data separately

4. Architecture Overview

+-----------------------------------------------------------+
|                     Kubernetes Cluster                    |
|-----------------------------------------------------------|
|  Control Plane (Master Node)                              |
|    • kube-apiserver  → Handles all requests (API)         |
|    • etcd            → Key-value store for cluster data   |
|    • scheduler       → Assigns pods to worker nodes       |
|    • controller-mgr  → Monitors cluster state             |
|-----------------------------------------------------------|
|  Worker Nodes                                             |
|    • kubelet         → Communicates with control plane    |
|    • kube-proxy      → Networking for pods                |
|    • container runtime (Docker/Containerd)                |
+-----------------------------------------------------------+

5. Installation (Local Setup)

Option 1: Minikube (for local testing)

# Install Minikube (on Ubuntu/macOS)
brew install minikube     # macOS
choco install minikube    # Windows

# Start cluster
minikube start

# Verify setup
kubectl get nodes

Option 2: Cloud Providers

  • Google Kubernetes Engine (GKE)

  • AWS Elastic Kubernetes Service (EKS)

  • Azure Kubernetes Service (AKS)


6. Core Kubernetes Components

Pods

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: myapp
      image: nginx
      ports:
        - containerPort: 80

Deploy:

kubectl apply -f pod.yaml
kubectl get pods
kubectl describe pod myapp-pod

Deployment

Used to manage and scale pods automatically.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: nginx
          ports:
            - containerPort: 80

Deploy and check:

kubectl apply -f deployment.yaml
kubectl get deployments
kubectl get pods

๐ŸŒ Service

Exposes deployment to network (internal or external).

apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  type: NodePort
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30001

Check service:

kubectl get svc
minikube service myapp-service

7. Scaling

kubectl scale deployment myapp-deployment --replicas=5
kubectl get pods

8. Rolling Updates

kubectl set image deployment/myapp-deployment myapp=nginx:latest
kubectl rollout status deployment/myapp-deployment

Rollback:

kubectl rollout undo deployment/myapp-deployment

9. ConfigMaps & Secrets

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_MODE: "production"

Secret:

apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  DB_PASSWORD: cGFzc3dvcmQ=   # base64 encoded

Mount in pod:

envFrom:
  - configMapRef:
      name: app-config
  - secretRef:
      name: db-secret
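Secret values must be base64-encoded by hand; the cGFzc3dvcmQ= string above is just "password" encoded. Python's standard library can produce and check such values:

```python
import base64

encoded = base64.b64encode(b"password").decode()
print(encoded)                             # cGFzc3dvcmQ=
print(base64.b64decode(encoded).decode())  # password — note: encoding is not encryption
```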

10. Namespaces

kubectl create namespace dev
kubectl get namespaces
kubectl apply -f app.yaml -n dev

11. Logs & Monitoring

kubectl logs pod_name
kubectl describe pod pod_name
kubectl top pods      # if metrics-server installed

Popular tools:

  • Prometheus + Grafana (metrics)

  • ELK Stack (logs)

  • Lens (GUI dashboard)


12. Real-World Example Flow

  1. Dockerize your ML/Web app → Dockerfile

  2. Push image to Docker Hub or private registry

  3. Create deployment.yaml and service.yaml

  4. Apply configs:

    kubectl apply -f deployment.yaml
    kubectl apply -f service.yaml
    
  5. Scale if needed:

    kubectl scale deployment myapp --replicas=4
    
  6. Access app:

    minikube service myapp-service
    

13. Useful Commands

Command – Description
kubectl get pods – list all pods
kubectl get svc – list all services
kubectl describe pod <name> – pod details
kubectl delete pod <name> – delete a pod
kubectl logs <pod> – show logs
kubectl apply -f file.yaml – apply configuration
kubectl exec -it <pod> -- bash – open a shell in the container

14. Kubernetes vs Docker

Feature – Docker – Kubernetes
Scope – containerization – orchestration
Scale – single host – multi-host clusters
Self-healing – no – yes
Load balancing – manual – automatic
Configuration – Docker CLI – YAML manifests

15. Key Takeaways

  • Kubernetes = container orchestrator for scaling & managing apps.

  • Works hand-in-hand with Docker.

  • Core concepts: Pod, Deployment, Service, ConfigMap, Namespace.

  • Ideal for ML model serving, microservices, and production apps.

  • Learn kubectl commands + YAML basics to get started fast.



Prometheus Complete Notes (Beginner-Friendly)


1. What is Prometheus?

  • Prometheus is an open-source monitoring and alerting toolkit designed for time-series data (metrics that change over time).

  • Commonly used for:

    • Monitoring applications, infrastructure, and Kubernetes clusters.

    • Setting alerts when performance issues or failures occur.

    • Visualizing metrics in Grafana dashboards.


2. Key Features

✅ Time-series database
✅ Pull-based metrics collection (scrapes from targets)
✅ Multi-dimensional data model
✅ Powerful query language — PromQL
✅ Integrates easily with Grafana
✅ Lightweight and easy to set up


3. How Prometheus Works (Architecture)

+------------------------------------------------------+
|                     Prometheus                       |
|------------------------------------------------------|
|  1. Targets (exporters, apps, K8s nodes)             |
|  2. Scrapes metrics via HTTP endpoints (/metrics)    |
|  3. Stores data as time-series in local DB           |
|  4. Query metrics via PromQL                         |
|  5. Triggers alerts (Alertmanager)                   |
|  6. Visualize with Grafana                           |
+------------------------------------------------------+

4. Key Components

Component – Description
Prometheus Server – collects and stores metrics data
Exporters – expose metrics in Prometheus format
Alertmanager – sends notifications (Email, Slack, etc.)
PromQL – query language for analyzing metrics
Pushgateway – for short-lived jobs that push metrics
Grafana – for dashboard visualization

5. Installation (Local Setup)

Step 1: Download Prometheus

Download and extract the latest release from https://prometheus.io/download/, then start it:

./prometheus --config.file=prometheus.yml

Visit: http://localhost:9090


6. Configuration File (prometheus.yml)

This file defines what to monitor (targets) and how often to scrape.

Example:

global:
  scrape_interval: 15s   # How often to collect metrics

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "myapp"
    static_configs:
      - targets: ["localhost:8000"]

Here:

  • Prometheus scrapes metrics from itself (9090) and your app (8000).


7. Exporters (for Different Systems)

Exporters are lightweight programs that expose metrics.

Exporter – Purpose
Node Exporter – OS-level metrics (CPU, RAM, disk)
cAdvisor – container metrics
kube-state-metrics – Kubernetes cluster info
Blackbox Exporter – endpoint uptime checks
MySQL/Postgres Exporter – database metrics
JMX Exporter – Java apps (JVM metrics)

Run Node Exporter:

./node_exporter

Access metrics: http://localhost:9100/metrics


8. Integrating Prometheus with Python / Flask App

Step 1: Install client library

pip install prometheus-client

Step 2: Add metrics endpoint in your app

from flask import Flask, Response
from prometheus_client import Counter, generate_latest

app = Flask(__name__)

REQUEST_COUNT = Counter('request_count', 'Total web requests')

@app.route("/")
def home():
    REQUEST_COUNT.inc()
    return "Hello, Prometheus!"

@app.route("/metrics")
def metrics():
    return Response(generate_latest(), mimetype="text/plain")

if __name__ == "__main__":
    app.run(port=8000)

Now Prometheus can scrape metrics from /metrics endpoint every few seconds.


9. Querying Metrics with PromQL

PromQL = Prometheus Query Language

Common examples:

Query – Meaning
up – status (1 = up, 0 = down) of all targets
node_cpu_seconds_total – total CPU time
rate(http_requests_total[5m]) – requests per second over the last 5 minutes
sum(rate(http_requests_total[1m])) by (instance) – requests per instance
avg_over_time(cpu_usage[10m]) – average over the last 10 minutes

You can run these queries at http://localhost:9090/graph.
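rate() estimates the per-second increase of a counter over the window. Simplified (real rate() also handles counter resets and extrapolates to the window boundaries):

```python
# Two samples of a monotonically increasing counter, 300 s (5 min) apart
t0, v0 = 0, 1200.0     # http_requests_total at the window start
t1, v1 = 300, 1800.0   # value 5 minutes later

rate = (v1 - v0) / (t1 - t0)
print(rate)  # 2.0 requests per second
```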


10. Setting Alerts

Add alerting rules:

groups:
- name: alert.rules
  rules:
  - alert: HighCPUUsage
    expr: rate(node_cpu_seconds_total[1m]) > 0.8
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "High CPU usage detected"

Run Alertmanager alongside Prometheus (Prometheus finds it via the alerting section of prometheus.yml):

./alertmanager --config.file=alertmanager.yml
./prometheus --config.file=prometheus.yml

Alertmanager can notify:

  • Email

  • Slack

  • PagerDuty

  • Telegram


11. Visualization with Grafana

  1. Install Grafana → https://grafana.com/grafana/download

  2. Open Grafana at http://localhost:3000

  3. Add Prometheus as a data source:

    • URL: http://localhost:9090

  4. Import dashboards (Node, Kubernetes, App metrics)

Now you can see live charts, e.g. CPU, memory, app response time.


12. Prometheus with Kubernetes

In Kubernetes, Prometheus monitors pods, nodes, and services.

You can deploy it easily using:

kubectl create namespace monitoring
kubectl apply -f https://github.com/prometheus-operator/kube-prometheus/releases/latest/download/manifests/setup

Or use Helm:

helm install prometheus prometheus-community/prometheus

This installs:

  • Prometheus server

  • Alertmanager

  • Node exporter

  • kube-state-metrics

Then access:

kubectl port-forward svc/prometheus-server 9090:80 -n monitoring

13. Common Use Cases

  • Monitor Docker / Kubernetes clusters

  • Track ML model latency & prediction counts

  • Set alerts for high memory usage or downtime

  • Integrate with Grafana dashboards for live monitoring

  • Observe system health trends over time


14. Key Takeaways

Concept – Description
Prometheus – monitoring + alerting system
Metrics endpoint – /metrics exposes time-series data
PromQL – query and analyze data
Exporters – provide metrics for different systems
Alertmanager – triggers alerts on defined conditions
Grafana – visualization tool for Prometheus data

15. Mini Example: Full Flow Recap

  1. Run a Python/Flask app with /metrics endpoint

  2. Install Prometheus and configure:

    - job_name: 'flask'
      static_configs:
        - targets: ['localhost:8000']
    
  3. Start Prometheus:

    ./prometheus --config.file=prometheus.yml
    
  4. Open http://localhost:9090

  5. Query: request_count_total

  6. Add Grafana dashboard → visualize in charts.


16. Tools That Work With Prometheus

  • Grafana → dashboards

  • Alertmanager → notifications

  • Thanos → long-term storage

  • VictoriaMetrics → scalable alternative

  • Prometheus Operator → easy setup in Kubernetes


17. Quick Commands

Command – Description
./prometheus --config.file=prometheus.yml – start Prometheus
kubectl port-forward svc/prometheus-server 9090:80 – access in K8s
curl localhost:9090/metrics – check the metrics endpoint
systemctl status prometheus – check service status (Linux)




Grafana – Short Notes


๐ŸŒ 1. What is Grafana?

Grafana is an open-source visualization and monitoring tool used to analyze metrics, logs, and traces from various data sources.
It helps you create interactive dashboards to monitor system performance, infrastructure, and applications.


⚙️ 2. Key Features

Feature – Description
Dashboards – visualize time-series data in real time
Plugins – extend functionality (data sources, panels, apps)
Alerts – set thresholds and receive alerts via email, Slack, etc.
Data Sources – connect to Prometheus, InfluxDB, Elasticsearch, Loki, MySQL, etc.
User Management – role-based access control
Templating – dynamic, parameterized dashboards

3. Grafana Architecture

+---------------------+
| Web UI (Dashboards) |
+----------+----------+
           |
+----------v----------+
|   Backend Server    |
|   (API + Logic)     |
+----------+----------+
           |
+----------v----------+
|    Data Sources     |
|  (Prometheus, DBs)  |
+---------------------+
  • Frontend (UI): Displays dashboards

  • Backend: Handles authentication, alerts, queries

  • Data Sources: Provide time-series or metric data


4. Common Data Sources

  • Prometheus – metrics monitoring

  • Loki – log aggregation

  • InfluxDB – time-series data

  • Elasticsearch – search + analytics

  • MySQL / PostgreSQL – SQL databases

  • Cloud Sources – AWS CloudWatch, Azure Monitor, GCP Stackdriver


5. Installing Grafana (Quick Setup)

On Ubuntu / Debian:

sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -   # apt-key is deprecated on newer distros; see Grafana docs for the keyring method
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Access at: http://localhost:3000
Default credentials:
user: admin / password: admin


6. Creating Dashboards

  1. Login → Click “+” → Dashboard → Add new panel

  2. Choose a data source (e.g., Prometheus)

  3. Write a query (e.g., up, cpu_usage_total)

  4. Choose visualization type (Graph, Gauge, Table, etc.)

  5. Save dashboard


7. Alerts & Notifications

  • Add alert on a panel → Set condition (e.g., CPU > 80%)

  • Configure Notification Channel (Slack, Email, PagerDuty)

  • Alert Rules can be viewed & managed centrally


8. Panels & Visualizations

Type – Use
Time Series – continuous data (CPU, memory)
Gauge – current metric value
Bar Gauge – compare multiple values
Table – tabular data
Stat – single numeric indicator
Heatmap – distribution visualization

9. Variables (Templating)

  • Create dynamic dashboards with dropdowns.
    Example:

$server → all available server names
$metric → all available metrics

Used in query as:

avg(cpu_usage{instance="$server"})
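Grafana substitutes the $server variable into the query before sending it. Python's string.Template uses the same $name syntax, which makes the substitution easy to see:

```python
from string import Template

# The templated query from above; $server is filled from the dashboard dropdown
query = Template('avg(cpu_usage{instance="$server"})')
print(query.substitute(server="web-01"))  # avg(cpu_usage{instance="web-01"})
```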

10. Grafana + Prometheus Workflow

  1. Prometheus collects metrics from servers/applications

  2. Grafana connects to Prometheus as data source

  3. Dashboards visualize time-series metrics

  4. Alerts notify when thresholds are crossed


11. Authentication & Roles

  • Admin – full control

  • Editor – can modify dashboards

  • Viewer – read-only access

Supports:

  • LDAP, OAuth, Google, Azure AD, GitHub authentication


☁️ 12. Cloud & Enterprise Versions

Type – Description
Grafana OSS – free open-source edition
Grafana Cloud – hosted SaaS version
Grafana Enterprise – adds support, SSO, auditing

13. Integration Examples

  • Prometheus + Grafana → system metrics

  • Loki + Grafana → centralized log dashboard

  • Tempo + Grafana → distributed tracing

  • MySQL + Grafana → business analytics


14. Common Use Cases

✅ Infrastructure & server monitoring
✅ Application performance tracking
✅ Business KPIs visualization
✅ Log + Metric correlation (via Loki)
✅ Cloud resource monitoring


15. Grafana Query Examples

PromQL (Prometheus):

node_cpu_seconds_total{mode="idle"}
avg(rate(http_requests_total[5m]))

InfluxQL (InfluxDB):

SELECT mean("usage") FROM "cpu" WHERE time > now() - 1h GROUP BY time(1m)

16. Short Commands & Ports

Command – Purpose
sudo systemctl start grafana-server – start the service
sudo systemctl stop grafana-server – stop the service
sudo systemctl status grafana-server – check status
Default port: 3000


