TensorFlow, PyTorch, Flask, FastAPI, MongoDB, Agentic AI, Recommendation engines, Kubernetes, Spark, Grafana
TensorFlow – Beginner-Friendly Complete Notes
1. Introduction to TensorFlow
What is TensorFlow?
- Open-source machine learning framework by Google.
- Used for deep learning, ML, and numerical computation.
- Works on CPU, GPU, TPU.
Key Features:
- Easy model building (Keras high-level API).
- Runs on multiple devices.
- Large ecosystem (TensorBoard, TFLite, TF Serving).
2. Installation
pip install tensorflow
Check version:
import tensorflow as tf
print(tf.__version__)
3. Basic Building Blocks
a) Tensors
- Tensors = multi-dimensional arrays (like NumPy but GPU-friendly).
x = tf.constant([[1, 2], [3, 4]])
print(x)  # 2D tensor
- Tensor Ranks: Scalar (0D), Vector (1D), Matrix (2D), higher dimensions.
b) Variables
- Trainable tensors, used to store weights.
w = tf.Variable([0.5, 1.0])
c) Operations
- Math ops on tensors.
a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])
print(tf.add(a, b))  # [5 7 9]
4. TensorFlow vs NumPy
- NumPy: CPU only, no automatic differentiation.
- TensorFlow: works on GPU, supports automatic differentiation.
import numpy as np
np_arr = np.array([1, 2, 3])
tf_tensor = tf.convert_to_tensor(np_arr)
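The automatic-differentiation difference can be seen directly with `tf.GradientTape`, TensorFlow's gradient-recording API (a minimal sketch; the value 3.0 is arbitrary):

```python
import tensorflow as tf

# Record operations on a variable, then differentiate.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2          # y = x^2

grad = tape.gradient(y, x)  # dy/dx = 2x = 6.0
print(grad.numpy())
```

NumPy has no equivalent of this: gradients there must be derived and coded by hand.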
5. TensorFlow Workflow (Step by Step)
Step 1: Import Data
- Built-in datasets in tf.keras.datasets.
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Step 2: Preprocess Data
- Normalize and reshape.
x_train = x_train / 255.0
x_test = x_test / 255.0
Step 3: Build Model
- Use the Sequential API (a simple stack of layers).
from tensorflow.keras import models, layers
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
Step 4: Compile Model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Step 5: Train Model
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
Step 6: Evaluate
model.evaluate(x_test, y_test)
6. TensorFlow Model APIs
a) Sequential API
- Linear stack of layers.
- Best for simple models.
b) Functional API
- More flexible (multi-input/output, non-linear graphs).
inputs = layers.Input(shape=(28, 28))
x = layers.Flatten()(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs, outputs)
c) Subclassing API
- Full control using Python classes.
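A minimal subclassing sketch, assuming the same MNIST-style 28x28 input used elsewhere in these notes (the layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = layers.Flatten()
        self.dense1 = layers.Dense(64, activation='relu')
        self.dense2 = layers.Dense(10, activation='softmax')

    def call(self, x):
        # The forward pass is written explicitly, step by step.
        x = self.flatten(x)
        x = self.dense1(x)
        return self.dense2(x)

model = MyModel()
```

Because `call` is plain Python, you can add branching, loops, or custom logic that Sequential and Functional models cannot easily express.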
7. Common Layers
- Dense – fully connected.
- Conv2D, MaxPooling2D – for images.
- LSTM, GRU – for sequences/text.
- Dropout – prevents overfitting.
- BatchNormalization – normalizes activations.
8. Training Essentials
Optimizers
- SGD – simple gradient descent.
- Adam – most common, adaptive learning rate.
Loss Functions
- Regression → mse
- Classification → binary_crossentropy, categorical_crossentropy
Metrics
- Accuracy, Precision, Recall, F1.
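These choices map directly onto `model.compile`. A sketch using explicit Keras objects instead of string shortcuts, which lets you tune hyperparameters such as the learning rate (the layer sizes and input shape here are arbitrary):

```python
from tensorflow.keras import layers, models, optimizers, losses

# Tiny illustrative model: 4 input features, 3 output classes.
model = models.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(16, activation='relu'),
    layers.Dense(3, activation='softmax'),
])

# Explicit optimizer/loss objects instead of 'adam' / loss-name strings.
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-3),
    loss=losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])
```

The string form ('adam') uses default hyperparameters; the object form is needed as soon as you want to change them.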
9. Callbacks
- Add functionality during training.
from tensorflow.keras.callbacks import EarlyStopping
cb = EarlyStopping(patience=3, restore_best_weights=True)
model.fit(x_train, y_train, epochs=20, callbacks=[cb])
- Common callbacks: EarlyStopping, ModelCheckpoint, TensorBoard.
10. Saving and Loading Models
# Save
model.save("my_model.h5")
# Load
from tensorflow.keras.models import load_model
model = load_model("my_model.h5")
11. TensorBoard (Visualization)
- Tool to visualize training (loss, accuracy, graphs).
tensorboard --logdir=logs/
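To produce the logs that command reads, attach the Keras `TensorBoard` callback during training (a minimal sketch; the `logs/` directory name matches the command above):

```python
from tensorflow.keras.callbacks import TensorBoard

# Writes training metrics under logs/ so `tensorboard --logdir=logs/`
# can visualize them.
tb_cb = TensorBoard(log_dir="logs/")

# Then pass it to fit(), e.g.:
# model.fit(x_train, y_train, epochs=5, callbacks=[tb_cb])
```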
12. TensorFlow Ecosystem
- TensorFlow Lite (TFLite) → for mobile/IoT.
- TensorFlow.js → for running ML in the browser.
- TF Serving → for deployment.
- TF Hub → pre-trained models.
13. Example: End-to-End Classification
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import fashion_mnist
# Load data
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0
# Build model
model = models.Sequential([
    layers.Flatten(input_shape=(28,28)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax')
])
# Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
# Evaluate
print(model.evaluate(x_test, y_test))
14. Tips for Beginners
- Start with the Sequential API before the Functional API.
- Use callbacks to avoid overfitting.
- Always normalize data.
- Experiment with different optimizers and learning rates.
- Visualize results with TensorBoard.
15. Interview Quick Recap
- TensorFlow = ML framework by Google.
- Tensors = multidimensional arrays.
- APIs: Sequential, Functional, Subclassing.
- Common optimizers: SGD, Adam.
- Loss: MSE (regression), CrossEntropy (classification).
- Ecosystem: TF Lite, TF.js, TF Hub, TensorBoard.
PyTorch – Beginner-Friendly Complete Notes
1. Introduction to PyTorch
What is PyTorch?
- Open-source deep learning framework by Facebook (Meta).
- Flexible, Pythonic, widely used in research.
- Supports CPU & GPU.
Key Features:
- Dynamic computation graph (eager execution).
- Strong community for research + production.
- Integration with NumPy and Python libraries.
2. Installation
pip install torch torchvision torchaudio
Check version:
import torch
print(torch.__version__)
3. Core Building Blocks
a) Tensors
- Like NumPy arrays, but with GPU acceleration.
import torch
x = torch.tensor([[1, 2], [3, 4]])
print(x)
- Check device:
print(x.device)  # cpu by default
- Move to GPU (if available):
if torch.cuda.is_available():
    x = x.to("cuda")
b) Operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(a + b)  # tensor([5, 7, 9])
c) Autograd (Automatic Differentiation)
- PyTorch tracks gradients for optimization.
w = torch.tensor(2.0, requires_grad=True)
y = w**2
y.backward()
print(w.grad)  # dy/dw = 2w = 4
4. PyTorch vs TensorFlow
- PyTorch: dynamic graph (easy debugging, flexible).
- TensorFlow: static + eager (optimized for deployment).
- PyTorch is favored for research; TensorFlow more for production.
5. Workflow in PyTorch
Step 1: Dataset
- Torch has dataset utilities in torchvision.datasets.
from torchvision import datasets, transforms
transform = transforms.ToTensor()
train_data = datasets.MNIST(root="data", train=True, transform=transform, download=True)
Step 2: DataLoader
- Helps with batching & shuffling data.
from torch.utils.data import DataLoader
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
Step 3: Define Model
- Use nn.Module (the base class for all models).
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)  # flatten
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
Step 4: Define Loss and Optimizer
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Step 5: Training Loop
for epoch in range(5):
    for images, labels in train_loader:
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
Step 6: Evaluation
correct, total = 0, 0
with torch.no_grad():  # no gradient calculation
    for images, labels in train_loader:
        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print("Accuracy:", correct / total)
6. PyTorch Components
a) Tensors
- torch.tensor(), torch.zeros(), torch.ones(), torch.rand().
b) Autograd
- requires_grad=True tracks gradients.
- tensor.backward() computes gradients.
c) Optimizers
- torch.optim.SGD, torch.optim.Adam, torch.optim.RMSprop.
d) Loss Functions
- nn.MSELoss() → regression.
- nn.CrossEntropyLoss() → classification.
e) Modules & Layers
- nn.Linear → fully connected.
- nn.Conv2d, nn.MaxPool2d → CNN layers.
- nn.LSTM, nn.GRU → RNN layers.
- nn.Dropout, nn.BatchNorm2d.
7. GPU/Device Management
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
Move data:
images, labels = images.to(device), labels.to(device)
8. Saving and Loading Models
# Save
torch.save(model.state_dict(), "model.pth")
# Load
model = SimpleNN()
model.load_state_dict(torch.load("model.pth"))
model.eval()
9. PyTorch Ecosystem
- torchvision → computer vision datasets & models.
- torchaudio → audio ML.
- torchtext → NLP datasets & models.
- PyTorch Lightning → high-level training framework.
- ONNX → export models for deployment.
10. Example: End-to-End Classification
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# Data
transform = transforms.ToTensor()
train_data = datasets.FashionMNIST(root="data", train=True, transform=transform, download=True)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
# Model
class FashionNN(nn.Module):
    def __init__(self):
        super(FashionNN, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = FashionNN()
# Loss + Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Training
for epoch in range(3):
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}, Loss={loss.item():.4f}")
11. Tips for Beginners
- Always check the .shape of tensors.
- Remember to flatten images before feeding them into fully connected layers.
- Use with torch.no_grad() during evaluation.
- Use .to(device) to move model/data to GPU.
- Start with simple models, then try CNNs/RNNs.
12. Interview Quick Recap
- PyTorch = deep learning framework by Meta.
- Tensors = multi-dimensional arrays with GPU support.
- Autograd = automatic differentiation.
- Models → subclass nn.Module.
- Optimizers → SGD, Adam.
- Loss → MSE (regression), CrossEntropy (classification).
- Ecosystem → torchvision, torchaudio, torchtext, PyTorch Lightning.
MongoDB Complete Notes (Beginner Friendly)
1. What is MongoDB?
- MongoDB is an open-source NoSQL database that stores data in JSON-like documents (called BSON = Binary JSON).
- Unlike relational databases (SQL), MongoDB does not require predefined schemas.
- It's highly scalable, flexible, and works well for modern data apps, APIs, and ML pipelines.
2. Key Features
✅ NoSQL – document-oriented
✅ Schema-less – flexible data model
✅ High performance – fast read/write
✅ Scalable – supports sharding & replication
✅ JSON-style documents
✅ Powerful query language
✅ Integration with Python, Flask, Django, etc.
3. Basic Concepts
| Term | Description | SQL Equivalent |
|---|---|---|
| Database | Group of collections | Database |
| Collection | Group of documents | Table |
| Document | JSON-like data record | Row |
| Field | Key-value pair in document | Column |
| _id | Unique identifier for each document | Primary key |
4. MongoDB Architecture Overview
+-------------------------------------------------+
| MongoDB |
|-------------------------------------------------|
| Database → Collection → Document (JSON format) |
| Example: |
| db.users.insertOne({name: "Sanjay", age: 28}) |
+-------------------------------------------------+
5. Installation
Option 1: Local Setup
- Download from https://www.mongodb.com/try/download/community
- Start the MongoDB service:
mongod
- Open the Mongo shell:
mongosh
Option 2: MongoDB Atlas (Cloud)
- Create cluster → Connect → Copy the connection string, e.g.:
mongodb+srv://username:password@cluster.mongodb.net/test
6. MongoDB Data Format Example
{
"_id": 1,
"name": "John",
"age": 30,
"skills": ["Python", "ML", "Flask"],
"address": { "city": "Bangalore", "pincode": 560001 }
}
✅ Nested JSON
✅ Arrays supported
✅ Flexible structure
7. Basic MongoDB Commands
| Command | Description |
|---|---|
| show dbs | List all databases |
| use mydb | Switch/create database |
| show collections | List collections |
| db.createCollection("users") | Create a collection |
| db.users.insertOne({...}) | Insert single document |
| db.users.insertMany([...]) | Insert multiple documents |
| db.users.find() | View all documents |
| db.users.findOne() | View first document |
| db.users.updateOne() | Update single record |
| db.users.deleteOne() | Delete single record |
| db.dropDatabase() | Delete database |
8. CRUD Operations
➤ Create
db.students.insertOne({
  name: "Amit",
  age: 22,
  course: "Data Science"
})
➤ Read
db.students.find()
db.students.find({age: {$gt: 20}})
db.students.find({course: "Data Science"}, {name: 1, _id: 0})
➤ Update
db.students.updateOne(
  { name: "Amit" },
  { $set: { age: 23 } }
)
➤ Delete
db.students.deleteOne({ name: "Amit" })
9. Query Operators
| Operator | Meaning | Example |
|---|---|---|
| $gt | Greater than | {age: {$gt: 25}} |
| $lt | Less than | {age: {$lt: 25}} |
| $eq | Equal | {age: {$eq: 30}} |
| $ne | Not equal | {age: {$ne: 25}} |
| $in | In list | {city: {$in: ["Delhi","Mumbai"]}} |
| $and | Logical AND | {$and:[{age:{$gt:20}},{city:"Pune"}]} |
| $or | Logical OR | {$or:[{age:{$lt:20}},{city:"Delhi"}]} |
10. Indexing
Used to speed up queries.
db.users.createIndex({name: 1})
db.users.getIndexes()
11. Aggregation Framework
Aggregation = data processing pipelines (like SQL GROUP BY).
Example:
db.sales.aggregate([
{ $match: { region: "Asia" } },
{ $group: { _id: "$country", totalSales: { $sum: "$amount" } } },
{ $sort: { totalSales: -1 } }
])
12. Connection with Python (pymongo)
Install:
pip install pymongo
Connect:
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
collection = db["students"]
Insert:
collection.insert_one({"name": "Riya", "age": 21})
Fetch:
for s in collection.find():
    print(s)
Query:
result = collection.find({"age": {"$gt": 20}})
for r in result:
    print(r)
Update:
collection.update_one({"name": "Riya"}, {"$set": {"age": 22}})
Delete:
collection.delete_one({"name": "Riya"})
13. MongoDB with Flask (Example)
from flask import Flask, request, jsonify
from pymongo import MongoClient
app = Flask(__name__)
client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
collection = db["users"]
@app.route("/add", methods=["POST"])
def add_user():
data = request.json
collection.insert_one(data)
return jsonify({"message": "User added successfully"})
@app.route("/users", methods=["GET"])
def get_users():
users = list(collection.find({}, {"_id": 0}))
return jsonify(users)
if __name__ == "__main__":
app.run(debug=True)
Access:
- POST /add with a JSON body
- GET /users to fetch users
14. Data Modeling Best Practices
✅ Use embedded documents for one-to-few relationships
✅ Use references for one-to-many relationships
✅ Keep document size < 16MB
✅ Use indexes on frequently queried fields
✅ Avoid unnecessary nesting
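The embedded-vs-referenced distinction can be illustrated with two hypothetical document shapes (plain sketches, not tied to any collection above):

```python
# One-to-few: embed the addresses directly inside the user document.
user_embedded = {
    "_id": 1,
    "name": "Amit",
    "addresses": [
        {"city": "Pune", "pincode": 411001},
        {"city": "Delhi", "pincode": 110001},
    ],
}

# One-to-many: store orders in their own collection and reference
# the user by _id, like a foreign key in SQL.
user = {"_id": 1, "name": "Amit"}
orders = [
    {"_id": 101, "user_id": 1, "amount": 500},
    {"_id": 102, "user_id": 1, "amount": 900},
]
```

Embedding keeps reads to a single document; references keep the parent document small when the related list can grow without bound.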
15. Replication & Sharding (Advanced Concepts)
| Concept | Description |
|---|---|
| Replication | Copies data across multiple servers for high availability |
| Primary Node | Receives all writes |
| Secondary Node | Copies data from primary |
| Sharding | Splits large data into horizontal partitions (scaling) |
16. MongoDB vs SQL Summary
| SQL | MongoDB |
|---|---|
| Structured schema | Schema-less |
| Tables, Rows | Collections, Documents |
| Joins | Embedded/Nested documents |
| SQL Queries | BSON + MongoDB Query Language |
| Vertical scaling | Horizontal scaling |
| Slower for large data | Faster for unstructured data |
17. MongoDB Atlas Example (Cloud)
from pymongo import MongoClient
client = MongoClient("mongodb+srv://<username>:<password>@cluster0.mongodb.net/")
db = client["sales_db"]
sales = db["transactions"]
sales.insert_one({"region": "Asia", "amount": 2000})
for doc in sales.find():
    print(doc)
18. Common Commands Recap
| Operation | Command |
|---|---|
| Create DB | use mydb |
| Create Collection | db.createCollection("users") |
| Insert | db.users.insertOne({name:"Amit"}) |
| Read | db.users.find() |
| Update | db.users.updateOne() |
| Delete | db.users.deleteOne() |
| Drop Collection | db.users.drop() |
19. Integrations
MongoDB integrates with:
- Flask, Django, FastAPI
- Pandas (via pymongo or mongoengine)
- ML pipelines (store model metadata)
- Airflow, Streamlit, etc.
20. Quick Summary
| Topic | Key Point |
|---|---|
| Type | NoSQL (Document-based) |
| Format | JSON-like BSON |
| Query Language | Mongo Query Language (MQL) |
| Key Libraries | pymongo, mongoengine |
| Best Use Case | Dynamic data, APIs, logs, ML storage |
| Cloud Option | MongoDB Atlas |
Model Context Protocol (MCP) – In Machine Learning
A Model Context Protocol refers to the way information, metadata, or inputs are structured and passed to a machine learning model so that:
- The model understands the input properly.
- The output can be interpreted or used consistently.
Think of it as the rules of communication between your model and its environment (data pipeline, serving system, or API).
1. Why Do We Need a Model Context Protocol?
- Models don't work in isolation; they need:
  - Input format (what data, how structured).
  - Context (user info, history, environment).
  - Output format (what the model returns, how it's consumed).
- MCP ensures standardization → makes the model reusable, debuggable, and deployable.
2. What Does It Include?
A typical model context protocol includes:
- Input Schema
  - Feature names, types, dimensions.
  - Example: {"user_id": int, "age": float, "clicked_items": list}
- Context
  - Additional info that influences predictions.
  - Example: time of day, device type, location.
- Model Metadata
  - Model version, training data info, assumptions.
  - Example: "version": "1.2.3", "trained_on": "MovieLens 1M"
- Output Schema
  - Structure of the prediction.
  - Example: {"recommended_item": str, "confidence": float}
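Such a contract can be enforced in code. A minimal sketch using Pydantic (the same validation library the FastAPI notes use later); the field names are hypothetical, borrowed from the recommendation example:

```python
from pydantic import BaseModel

# Hypothetical input contract: every caller must send exactly
# these fields with these types, or validation fails.
class RecommendationInput(BaseModel):
    user_id: int
    interaction_history: list[int]

# Hypothetical output contract: every consumer knows what comes back.
class RecommendationOutput(BaseModel):
    recommended_items: list[int]
    confidence_scores: list[float]
    model_version: str

inp = RecommendationInput(user_id=123, interaction_history=[45, 67, 89])
print(inp.user_id)
```

Sending a wrong type (say, a non-numeric user_id) raises a validation error instead of silently producing a bad prediction.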
3. Example (Recommendation System MCP)
Input Context Protocol:
{
"user_id": 123,
"session_features": {
"time_of_day": "evening",
"device": "mobile"
},
"interaction_history": [45, 67, 89] // item IDs
}
Model Output:
{
"recommended_items": [101, 202, 303],
"confidence_scores": [0.92, 0.85, 0.80],
"model_version": "v1.0.5"
}
This ensures every service consuming the model knows exactly what to send and expect back.
4. Protocols in the Real World
- TensorFlow Serving → uses gRPC/REST with JSON or Protobuf schemas.
- TorchServe → defines handler classes for input-output schemas.
- ONNX Runtime → standardized model format across frameworks.
- MLOps systems (Kubeflow, MLflow, Seldon) → rely heavily on context protocols for reproducibility.
5. Interview Quick Recap
- MCP = contract between model and environment.
- Defines input schema, context info, output schema.
- Needed for scaling, deploying, and debugging ML models.
- Real-world implementations → TensorFlow Serving, TorchServe, ONNX, MLflow.
Agentic AI – Beginner-Friendly Notes
1. What is Agentic AI?
- Agentic AI = AI systems that can act as "agents."
- Unlike traditional models (which just take input → give output), agentic AI:
  - Perceives the environment (via data, sensors, APIs).
  - Plans actions (chooses a strategy or sequence of steps).
  - Acts on the environment (via tools, APIs, physical systems).
  - Learns & adapts based on feedback.
- In short: Agentic AI doesn't just answer, it does things autonomously.
2. Why is it Important?
- Moves AI from passive assistants → active problem-solvers.
- Can execute multi-step workflows, not just answer single queries.
- Key for autonomous research, robotics, personalized assistants, and business automation.
3. Core Components of Agentic AI
- Perception
  - Collects information (from text, images, sensors, APIs).
- Memory
  - Short-term memory (conversation context).
  - Long-term memory (stored knowledge, databases).
- Planning & Reasoning
  - Breaks complex goals into smaller steps.
  - Uses chain-of-thought or planning algorithms.
- Tools & Actions
  - Can call APIs, run code, browse the web, query databases.
- Feedback & Learning
  - Evaluates actions, updates strategy.
4. Techniques Behind Agentic AI
- LLM + Tools (Tool Use)
  - The LLM calls external tools (calculator, search engine, database).
- Reasoning + Planning
  - Approaches like Tree of Thoughts and ReAct (Reason + Act).
- Multi-Agent Systems
  - Several AI agents collaborate (research agent, writing agent, coding agent).
- Reinforcement Learning (RL)
  - Agents learn optimal actions via trial & error.
- Memory Augmentation
  - Vector databases (Pinecone, FAISS) to recall past interactions.
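The tool-use pattern above can be sketched without any LLM at all: an agent loop that inspects a request, picks a registered tool, and executes it. Everything here (the tool names, the dispatch rule) is hypothetical and only illustrates the plan-act-report shape:

```python
# Minimal agent loop: a registry of tools and a trivial "planner"
# that routes a request to the right tool.

def calculator(expression: str) -> str:
    # Toy arithmetic tool. In a real agent, never eval untrusted input.
    return str(eval(expression, {"__builtins__": {}}, {}))

def lookup(term: str) -> str:
    # Stand-in for a search or database tool.
    kb = {"python": "A programming language."}
    return kb.get(term.lower(), "No entry found.")

TOOLS = {"calculator": calculator, "lookup": lookup}

def agent(request: str) -> str:
    # Toy planning step: pick a tool based on the request's content.
    tool = "calculator" if any(c.isdigit() for c in request) else "lookup"
    result = TOOLS[tool](request)   # act
    return f"[{tool}] {result}"     # report the observation

print(agent("2 + 3"))    # routed to the calculator
print(agent("python"))   # routed to the lookup tool
```

In a real agentic system the planner is an LLM choosing among tool descriptions, and the observation is fed back into the next reasoning step (the ReAct loop).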
5. Examples of Agentic AI
- AutoGPT: an LLM agent that autonomously executes tasks.
- LangChain Agents: orchestrate LLMs + tools.
- ChatGPT with browsing/code interpreter: uses external tools.
- Robotic agents: AI agents that control robots (self-driving cars, drones).
- Enterprise AI agents: automate workflows (customer service, report generation).
6. Comparison
| Type | Traditional AI | Agentic AI |
|---|---|---|
| Input/Output | Fixed Q → A | Dynamic, context-driven |
| Autonomy | No | Yes |
| Tool Usage | Limited | Uses APIs, tools |
| Planning | None | Multi-step reasoning |
| Adaptability | Low | High |
7. Challenges in Agentic AI
- Hallucination risk → wrong actions.
- Safety & alignment → ensure AI follows human values.
- Reliability → needs guardrails to avoid harmful actions.
- Scalability → costly if not optimized.
- Evaluation → harder to test compared to static models.
8. Applications
- Personal assistants (schedule meetings, send emails).
- Business automation (generate reports, analyze markets).
- Research (autonomous discovery, literature review).
- Healthcare (monitor patients, suggest treatments).
- Robotics (self-driving cars, drones, warehouse robots).
9. Future of Agentic AI
- More collaborative AI ecosystems (multi-agent teams).
- Safe & explainable reasoning mechanisms.
- Integration with IoT & robotics → fully autonomous systems.
- Potential to become co-workers, not just tools.
10. Interview Quick Recap
- Agentic AI = AI systems that perceive, plan, act, and learn.
- Core: Perception, Memory, Planning, Tools, Feedback.
- Techniques: ReAct, Tree of Thoughts, RL, multi-agent.
- Examples: AutoGPT, LangChain, ChatGPT (with tools).
- Challenges: hallucinations, safety, evaluation.
- Applications: assistants, automation, robotics, research.
Streamlit Complete Notes (Beginner-Friendly)
1. What is Streamlit?
- Streamlit is an open-source Python framework for building interactive web apps for data science, ML, and visualization.
- No need to know frontend (HTML/CSS/JS).
- Just write Python and deploy as a web app.
2. Installation
pip install streamlit
Check version:
streamlit --version
Run an app:
streamlit run app.py
3. Basic App
app.py
import streamlit as st
st.title("Hello Streamlit")
st.write("This is my first Streamlit app")
Run:
streamlit run app.py
Opens in browser at http://localhost:8501.
4. Streamlit Basics
Text and Titles
st.title("Title")
st.header("Header")
st.subheader("Subheader")
st.text("Simple text")
st.markdown("**Bold with Markdown**")
Data Display
import pandas as pd
df = pd.DataFrame({"Name": ["A", "B"], "Age": [23, 34]})
st.dataframe(df) # Interactive table
st.table(df) # Static table
Charts
st.line_chart(df["Age"])  # Simple line chart (numeric columns only)
st.bar_chart(df["Age"])
5. User Input Widgets
name = st.text_input("Enter your name:")
age = st.number_input("Enter age", min_value=0, max_value=100)
gender = st.radio("Select gender", ["Male", "Female"])
hobby = st.multiselect("Choose hobbies", ["Reading", "Gaming", "Sports"])
submit = st.button("Submit")
if submit:
    st.write(f"Hello {name}, Age: {age}, Gender: {gender}, Hobby: {hobby}")
6. File Upload
uploaded_file = st.file_uploader("Upload CSV", type="csv")
if uploaded_file:
    df = pd.read_csv(uploaded_file)
    st.dataframe(df.head())
7. Layouts
- Sidebar for navigation:
st.sidebar.title("Options")
choice = st.sidebar.radio("Menu", ["Home", "About"])
if choice == "Home":
    st.write("Welcome to Home")
else:
    st.write("About Page")
- Columns:
col1, col2 = st.columns(2)
col1.write("Left Side")
col2.write("Right Side")
- Tabs:
tab1, tab2 = st.tabs(["Data", "Charts"])
with tab1:
    st.write("Data Section")
with tab2:
    st.write("Charts Section")
8. Caching (for performance)
@st.cache_data
def load_data():
    df = pd.read_csv("big_data.csv")
    return df
9. Machine Learning Demo
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
iris = load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier()
model.fit(X, y)
st.title("Iris Flower Prediction")
sepal_length = st.slider("Sepal Length", 4.0, 8.0, 5.0)
sepal_width = st.slider("Sepal Width", 2.0, 4.5, 3.0)
petal_length = st.slider("Petal Length", 1.0, 7.0, 4.0)
petal_width = st.slider("Petal Width", 0.1, 2.5, 1.0)
prediction = model.predict([[sepal_length, sepal_width, petal_length, petal_width]])
st.write("Predicted Class:", iris.target_names[prediction][0])
10. Deploy Streamlit App
- Use Streamlit Community Cloud (free).
Steps:
- Push app code to GitHub.
- Go to https://streamlit.io/cloud.
- Connect the GitHub repo.
- Deploy.
Alternative: Deploy on Heroku, Render, AWS, or GCP.
11. Example Mini Project: Sales Dashboard
import pandas as pd
import streamlit as st
import plotly.express as px
st.title("Sales Dashboard")
uploaded_file = st.file_uploader("Upload Sales CSV", type="csv")
if uploaded_file:
    df = pd.read_csv(uploaded_file)
    st.write("Data Preview:", df.head())
    fig = px.line(df, x="Date", y="Sales", title="Sales Over Time")
    st.plotly_chart(fig)
    st.metric("Total Sales", df["Sales"].sum())
12. Key Takeaways
- Streamlit = Python → web app (no frontend needed).
- Supports charts, ML models, dashboards, file uploads.
- Very beginner-friendly and fast to prototype.
- Free hosting on Streamlit Cloud.
Flask Complete Notes (Beginner-Friendly)
1. What is Flask?
- Flask is a lightweight Python web framework used to build web applications and APIs.
- Known as a "micro-framework" because it doesn't come with built-in tools like a database ORM or authentication — you add only what you need.
- Great for beginners, prototyping, and even production apps.
2. Installing Flask
pip install flask
Check installation:
import flask
print(flask.__version__)
3. Basic Flask App
from flask import Flask
app = Flask(__name__)

@app.route("/")
def home():
    return "Hello, Flask!"

if __name__ == "__main__":
    app.run(debug=True)
- Flask(__name__) → creates an app object.
- @app.route("/") → defines the URL route.
- app.run(debug=True) → starts the server with auto-reload and error tracking.
Run app:
python app.py
Visit: http://127.0.0.1:5000
4. Routing
-
Add multiple pages by defining routes:
@app.route("/about")
def about():
return "This is the About Page"
-
Dynamic routes:
@app.route("/user/<name>")
def user(name):
return f"Hello, {name}!"
5. Templates (HTML Integration)
Flask uses Jinja2 templates to render HTML.
Folder structure:
project/
  app.py
  templates/
    index.html
app.py:
from flask import render_template

@app.route("/")
def home():
    return render_template("index.html", name="Sanjay")
templates/index.html:
<!DOCTYPE html>
<html>
<head><title>Flask App</title></head>
<body>
<h1>Hello, {{ name }}</h1>
</body>
</html>
6. Static Files (CSS, JS, Images)
Folder structure:
project/
  static/
    style.css
HTML usage:
<link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
7. Forms and User Input
from flask import request

@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        username = request.form["username"]
        return f"Welcome, {username}"
    return '''
    <form method="post">
        <input name="username">
        <input type="submit">
    </form>
    '''
8. REST API with Flask
from flask import jsonify

@app.route("/api/data")
def get_data():
    return jsonify({"name": "Sanjay", "role": "Data Scientist"})
9. Flask with Database (SQLite Example)
import sqlite3
from flask import g

DATABASE = "test.db"

def get_db():
    db = getattr(g, "_database", None)
    if db is None:
        db = g._database = sqlite3.connect(DATABASE)
    return db

@app.route("/add")
def add_data():
    db = get_db()
    db.execute("INSERT INTO users (name) VALUES (?)", ("Sanjay",))
    db.commit()
    return "User added!"
10. Flask Extensions (Popular)
-
Flask-SQLAlchemy → Database ORM
-
Flask-Login → Authentication
-
Flask-RESTful → Build APIs easily
-
Flask-WTF → Form handling
-
Flask-Mail → Send emails
11. Deploying Flask
- Local run:
python app.py
- Production (Gunicorn):
pip install gunicorn
gunicorn -w 4 app:app
- Can deploy on Heroku, Render, AWS, GCP, or Railway.
12. Mini Project Example (Hello API + Webpage)
app.py:
from flask import Flask, render_template, jsonify
app = Flask(__name__)

@app.route("/")
def home():
    return render_template("index.html", name="Flask Learner")

@app.route("/api")
def api():
    return jsonify({"message": "This is Flask API!"})

if __name__ == "__main__":
    app.run(debug=True)
index.html:
<h1>Welcome {{ name }}</h1>
<p>Check API at <a href="/api">/api</a></p>
13. Key Points Summary
- Flask = lightweight, flexible Python web framework.
- Uses routes for pages and APIs.
- Templates + static files = frontend support.
- Extensions add extra power (DB, login, forms).
- Easy to deploy anywhere.
FastAPI Complete Notes (Beginner Friendly)
1. What is FastAPI?
FastAPI is a modern, fast (high-performance) Python framework used for building APIs and backend services.
✅ Built on Starlette (for web) and Pydantic (for data validation)
✅ Designed for speed and type safety
✅ Ideal for Machine Learning APIs, microservices, and real-time data systems
2. Why Use FastAPI?
| Feature | Description |
|---|---|
| Fast | Built on ASGI → handles requests asynchronously |
| Data validation | Uses Pydantic models for strict schema validation |
| Automatic docs | Swagger UI and ReDoc auto-generated |
| Easy integration | Works well with SQL, NoSQL, ML, and OAuth2 |
| Modern syntax | Type hints and async/await supported |
| Great for ML | Well suited to deploying ML models as REST APIs |
3. Installation
pip install fastapi uvicorn
✅ fastapi → API framework
✅ uvicorn → ASGI server to run your app
4. Create Your First FastAPI App
main.py
from fastapi import FastAPI
app = FastAPI()
@app.get("/")
def home():
return {"message": "Hello, FastAPI!"}
Run the app:
uvicorn main:app --reload
Now visit:
- http://127.0.0.1:8000 → API output
- http://127.0.0.1:8000/docs → Swagger UI
- http://127.0.0.1:8000/redoc → ReDoc UI
5. HTTP Methods
| Method | Usage | Example |
|---|---|---|
| GET | Read data | /users |
| POST | Create new data | /users |
| PUT | Update entire record | /users/{id} |
| PATCH | Update part of record | /users/{id} |
| DELETE | Delete record | /users/{id} |
Example:
@app.get("/items/{item_id}")
def get_item(item_id: int):
return {"item_id": item_id}
6. Query Parameters
@app.get("/search/")
def search_items(q: str, limit: int = 5):
return {"query": q, "limit": limit}
➡️ Access like: http://127.0.0.1:8000/search?q=apple&limit=10
7. Request Body with Pydantic
Used for validating and structuring input JSON.
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float
    in_stock: bool = True

@app.post("/items/")
def create_item(item: Item):
    return {"item_name": item.name, "price": item.price}
Input JSON example:
{
"name": "Laptop",
"price": 80000
}
8. Path Parameters + Validation
from fastapi import Path

@app.get("/users/{user_id}")
def read_user(user_id: int = Path(..., gt=0, description="User ID must be > 0")):
    return {"user_id": user_id}
9. Handling Query + Path Together
@app.get("/products/{product_id}")
def read_product(product_id: int, q: str | None = None):
if q:
return {"product_id": product_id, "query": q}
return {"product_id": product_id}
10. Response Models (Structured Output)
class User(BaseModel):
id: int
name: str
email: str
@app.get("/user/{id}", response_model=User)
def get_user(id: int):
return {"id": id, "name": "Sanjay", "email": "sanjay@example.com"}
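Besides documenting the output schema, `response_model` filters the response: keys not declared on the model are dropped before the JSON is sent. A plain-Python sketch of that effect (the `filter_response` helper is hypothetical, not FastAPI internals):

```python
def filter_response(data: dict, model_fields=("id", "name", "email")) -> dict:
    # Keep only the fields declared on the response model
    return {k: v for k, v in data.items() if k in model_fields}

raw = {"id": 1, "name": "Sanjay", "email": "sanjay@example.com", "password": "secret"}
print(filter_response(raw))  # 'password' never reaches the client
```

This is why response models are a common way to avoid accidentally leaking internal fields such as password hashes.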
11. Handling Errors
from fastapi import HTTPException
@app.get("/divide")
def divide(a: float, b: float):
if b == 0:
raise HTTPException(status_code=400, detail="Division by zero not allowed")
return {"result": a / b}
12. Dependency Injection
Used to manage reusable logic like authentication, DB connections, etc.
from fastapi import Depends
def get_token_header(token: str):
if token != "abc123":
raise HTTPException(status_code=403, detail="Invalid token")
return token
@app.get("/secure-data/")
def read_secure_data(token: str = Depends(get_token_header)):
return {"data": "Secure content!"}
13. Middleware
Middleware intercepts requests/responses.
from fastapi.middleware.cors import CORSMiddleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_methods=["*"],
allow_headers=["*"],
)
14. Connect FastAPI with MongoDB
from fastapi import FastAPI
from pymongo import MongoClient
from pydantic import BaseModel
app = FastAPI()
client = MongoClient("mongodb://localhost:27017/")
db = client["fastapi_db"]
collection = db["users"]
class User(BaseModel):
name: str
age: int
@app.post("/add_user")
def add_user(user: User):
collection.insert_one(user.model_dump())  # use user.dict() on Pydantic v1
return {"message": "User added"}
@app.get("/users")
def get_users():
users = list(collection.find({}, {"_id": 0}))
return {"users": users}
15. FastAPI + Machine Learning Example
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import numpy as np
app = FastAPI()
# Load model
model = pickle.load(open("model.pkl", "rb"))
class InputData(BaseModel):
feature1: float
feature2: float
feature3: float
@app.post("/predict")
def predict(data: InputData):
features = np.array([[data.feature1, data.feature2, data.feature3]])
prediction = model.predict(features)
return {"prediction": float(prediction[0])}
Run with:
uvicorn main:app --reload
➡️ Try it at http://127.0.0.1:8000/docs
16. Include Routers (for large projects)
from fastapi import APIRouter
router = APIRouter()
@router.get("/info")
def info():
return {"info": "Sub-router working!"}
app.include_router(router, prefix="/api")
17. Authentication (Basic Example)
from fastapi.security import OAuth2PasswordBearer
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
@app.get("/secure/")
def secure_data(token: str = Depends(oauth2_scheme)):
return {"token": token}
18. Static Files & Templates
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
from fastapi import Request
app.mount("/static", StaticFiles(directory="static"), name="static")
templates = Jinja2Templates(directory="templates")
@app.get("/home")
def home(request: Request):
return templates.TemplateResponse("index.html", {"request": request})
19. Common FastAPI Commands
| Command | Description |
|---|---|
| uvicorn main:app --reload | Run development server |
| /docs | Swagger documentation |
| /redoc | Alternative documentation |
| Ctrl+C | Stop server |
| pip install "uvicorn[standard]" | Install complete server dependencies |
20. Best Practices
✅ Use Pydantic models for input/output
✅ Keep routers in separate files for modular code
✅ Add CORS middleware for frontend integration
✅ Implement logging & error handling
✅ Use async functions for I/O-heavy operations
✅ Deploy using Docker / Gunicorn / Uvicorn
21. Deployment (Production)
Option 1: Using Uvicorn + Gunicorn
pip install "uvicorn[standard]" gunicorn
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker
Option 2: Docker
FROM python:3.10
WORKDIR /app
COPY . .
RUN pip install fastapi uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
22. Integration Possibilities
| Tool | Integration Use |
|---|---|
| MongoDB / SQLAlchemy | Database |
| Pandas / Numpy | Data analysis |
| Scikit-learn / XGBoost | ML model prediction APIs |
| Streamlit / React | Frontend UI |
| Docker / K8s | Deployment |
| Prometheus | API performance monitoring |
PySpark Complete Notes (Beginner Friendly)
1. What is PySpark?
PySpark is the Python API for Apache Spark, a powerful open-source framework used for big data processing, analysis, and machine learning across distributed clusters.
✅ Built on Apache Spark
✅ Handles large-scale data (GBs → TBs) efficiently
✅ Works on clusters (parallel computation)
✅ Supports DataFrames, SQL, MLlib, Streaming
2. Why Use PySpark?
| Feature | Description |
|---|---|
| Speed | Up to 100x faster than Hadoop MapReduce for in-memory workloads |
| Scalable | Handles terabytes to petabytes of data |
| Easy API | Pandas-like DataFrame operations |
| Multiple data sources | CSV, JSON, Parquet, HDFS, S3 |
| Machine learning support | Spark MLlib |
| Integrations | AWS EMR, Databricks, Hadoop, Google Dataproc |
3. PySpark Architecture
+----------------------------------------------------------+
| PySpark |
|----------------------------------------------------------|
| Driver Program (main code) |
| ↓ |
| SparkContext → Cluster Manager → Executors (workers) |
| Each executor runs tasks on partitions of data |
+----------------------------------------------------------+
- Driver: Your Python script that controls the job.
- Executor: Worker nodes that perform computations.
- Cluster Manager: Allocates resources (e.g., YARN, Mesos, Kubernetes).
4. Installation
Local Installation:
pip install pyspark
Check version:
import pyspark
print(pyspark.__version__)
5. Starting PySpark Session
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("MyFirstSparkApp") \
.getOrCreate()
print(spark)
To stop session:
spark.stop()
6. Create a DataFrame
From Python data:
data = [("Alice", 25), ("Bob", 30), ("Cathy", 27)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
| Bob| 30|
|Cathy| 27|
+-----+---+
7. Read / Write Data
| Operation | Example |
|---|---|
| Read CSV | df = spark.read.csv("data.csv", header=True, inferSchema=True) |
| Read JSON | df = spark.read.json("data.json") |
| Read Parquet | df = spark.read.parquet("data.parquet") |
| Write CSV | df.write.csv("output/", header=True) |
8. Basic DataFrame Operations
df.printSchema() # View schema
df.columns # Get column names
df.describe().show() # Summary stats
df.select("Name").show()
df.filter(df.Age > 25).show()
df.groupBy("Age").count().show()
df.orderBy("Age", ascending=False).show()
9. Add / Rename / Drop Columns
from pyspark.sql.functions import col, lit
df = df.withColumn("Country", lit("India")) # Add column
df = df.withColumnRenamed("Age", "Years") # Rename column
df = df.drop("Country") # Drop column
10. Handling Missing Data
df.na.drop().show() # Drop null rows
df.na.fill({"Age": 0}).show() # Fill nulls
df.na.replace("Unknown", "N/A").show() # Replace values
11. SQL with PySpark
Register DataFrame as temporary SQL table:
df.createOrReplaceTempView("people")
result = spark.sql("SELECT Name, Age FROM people WHERE Age > 25")
result.show()
12. PySpark Functions
Import frequently used functions:
from pyspark.sql.functions import *
df.select(upper(col("Name")), col("Age") + 5).show()
df.withColumn("AgeGroup", when(col("Age") > 25, "Adult").otherwise("Young")).show()
13. Joins in PySpark
df1.join(df2, on="id", how="inner")
df1.join(df2, on="id", how="left")
df1.join(df2, on="id", how="right")
df1.join(df2, on="id", how="outer")
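The difference between the join types can be illustrated with plain Python dicts (an illustrative sketch only — Spark performs the same matching, but distributed across partitions):

```python
left = {1: "Alice", 2: "Bob"}    # like df1: id -> name
right = {2: "HR", 3: "Finance"}  # like df2: id -> dept

# inner: only ids present on both sides
inner = {k: (left[k], right[k]) for k in left.keys() & right.keys()}

# left: every left id, unmatched right side becomes None (null in Spark)
left_join = {k: (left[k], right.get(k)) for k in left}

# outer: every id from either side, missing sides become None
outer = {k: (left.get(k), right.get(k)) for k in left.keys() | right.keys()}

print(inner)      # {2: ('Bob', 'HR')}
print(left_join)  # {1: ('Alice', None), 2: ('Bob', 'HR')}
```

A right join is simply the mirror image of the left join.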
14. Aggregations
df.groupBy("Country").agg(
count("*").alias("Count"),
avg("Age").alias("AvgAge")
).show()
15. Working with Dates
from pyspark.sql.functions import current_date, year, month, dayofmonth
df = df.withColumn("today", current_date())
df = df.withColumn("year", year(col("today")))
df.show()
16. User Defined Functions (UDFs)
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
def greeting(name):
return "Hello " + name
greet_udf = udf(greeting, StringType())
df = df.withColumn("Greet", greet_udf(col("Name")))
df.show()
17. Machine Learning with PySpark (MLlib)
Example: Linear Regression
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
data = [(1, 2.0, 3.0), (2, 3.0, 5.0), (3, 4.0, 7.0)]
columns = ["id", "feature", "label"]
df = spark.createDataFrame(data, columns)
assembler = VectorAssembler(inputCols=["feature"], outputCol="features")
train_data = assembler.transform(df)
lr = LinearRegression(featuresCol="features", labelCol="label")
model = lr.fit(train_data)
print(model.coefficients, model.intercept)
18. PySpark with Pandas
Convert Spark DataFrame to Pandas:
pandas_df = df.toPandas()
Convert Pandas DataFrame to Spark:
spark_df = spark.createDataFrame(pandas_df)
19. Partitioning & Parallelism
- Spark divides data into partitions to process in parallel.
- Check partitions:
df.rdd.getNumPartitions()
- Repartition:
df = df.repartition(4)
20. Saving & Loading Models
model.save("lr_model")
from pyspark.ml.regression import LinearRegressionModel
loaded_model = LinearRegressionModel.load("lr_model")
21. Integration with AWS / GCP
| Platform | Method |
|---|---|
| AWS S3 | spark.read.csv("s3a://bucket/file.csv") |
| Google Cloud Storage | spark.read.csv("gs://bucket/file.csv") |
| Hadoop HDFS | spark.read.csv("hdfs://path/file.csv") |
22. Performance Tips
✅ Use Parquet instead of CSV (columnar & compressed)
✅ Use filter() early (predicate pushdown)
✅ Cache DataFrames with .cache() for reuse
✅ Avoid too many small files
✅ Use broadcast joins for small lookup tables
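The broadcast-join tip works because a small lookup table can be copied to every executor and joined via a hash lookup, so the large table never has to be shuffled across the network. The idea in plain Python (illustrative sketch, not Spark code):

```python
# Small lookup table, "broadcast" to every worker as an ordinary in-memory dict
region_names = {1: "North", 2: "South"}

# Large table: each row joins via an O(1) hash lookup, no shuffle needed
sales = [(1, 100), (2, 250), (1, 75)]
joined = [(region_names[region_id], amount) for region_id, amount in sales]
print(joined)  # [('North', 100), ('South', 250), ('North', 75)]
```

In PySpark the equivalent hint is `large_df.join(broadcast(small_df), "id")`, with `broadcast` imported from `pyspark.sql.functions`.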
23. PySpark Data Types
| PySpark Type | Equivalent Python Type |
|---|---|
| StringType() | str |
| IntegerType() | int |
| DoubleType() | float |
| BooleanType() | bool |
| TimestampType() | datetime |
Example:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
schema = StructType([
StructField("Name", StringType(), True),
StructField("Age", IntegerType(), True)
])
df = spark.createDataFrame(data, schema)
24. Common PySpark Functions
| Function | Purpose |
|---|---|
| col() | Access a column |
| lit() | Add a constant value |
| when() | Conditional column |
| count(), sum(), avg() | Aggregations |
| regexp_extract() | Regex matching |
| concat_ws() | String concatenation |
| explode() | Flatten an array column |
25. Example: End-to-End ETL Pipeline
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
spark = SparkSession.builder.appName("ETL Example").getOrCreate()
# Read
df = spark.read.csv("sales.csv", header=True, inferSchema=True)
# Transform
df_clean = df.filter(col("Amount").isNotNull())
df_final = df_clean.groupBy("Region").agg(sum("Amount").alias("TotalSales"))
# Load
df_final.write.csv("output/sales_summary", header=True)
26. Spark MLlib Use Cases
- Regression: Linear Regression
- Classification: Logistic Regression, Decision Trees, Random Forests
- Clustering: K-Means
- Feature Engineering: VectorAssembler, StandardScaler
- Pipelines: Combine multiple transformations
27. PySpark vs Pandas
| Feature | Pandas | PySpark |
|---|---|---|
| Scale | Small data (in-memory) | Big data (distributed) |
| Speed | Single machine | Multi-node cluster |
| API | Easy & rich | Similar syntax |
| Use Case | EDA | ETL, ML on big data |
28. Common Use Cases
✅ ETL on large datasets
✅ Feature engineering for ML
✅ Log analysis
✅ Data cleaning at scale
✅ Joining datasets across clusters
Kubernetes Complete Notes (Beginner-Friendly)
1. What is Kubernetes?
- Kubernetes (K8s) is an open-source platform for automating deployment, scaling, and management of containerized applications (like Docker containers).
- It helps you run applications reliably across clusters of machines (physical or virtual).
- Originally developed by Google, now maintained by the Cloud Native Computing Foundation (CNCF).
2. Why Use Kubernetes?
✅ Automatic scaling of apps
✅ Self-healing — restarts crashed containers
✅ Load balancing between containers
✅ Rolling updates for zero downtime
✅ Portability — works on any cloud or on-prem
3. Basic Terminology
| Concept | Description |
|---|---|
| Cluster | Set of nodes (machines) managed by Kubernetes |
| Node | A worker machine (physical or VM) that runs pods |
| Pod | The smallest deployable unit — one or more containers |
| Container | Application running inside Docker (or similar runtime) |
| Service | Exposes pods to the network (for communication) |
| Deployment | Manages replicas of pods and ensures desired state |
| Namespace | Logical grouping of resources (like folders) |
| Ingress | Manages external access (HTTP/HTTPS) to services |
| ConfigMap / Secret | Store configuration or sensitive data separately |
4. Architecture Overview
+-----------------------------------------------------------+
| Kubernetes Cluster |
|-----------------------------------------------------------|
| Control Plane (Master Node) |
| • kube-apiserver → Handles all requests (API) |
| • etcd → Key-value store for cluster data |
| • scheduler → Assigns pods to worker nodes |
| • controller-mgr → Monitors cluster state |
|-----------------------------------------------------------|
| Worker Nodes |
| • kubelet → Communicates with control plane |
| • kube-proxy → Networking for pods |
| • container runtime (Docker/Containerd) |
+-----------------------------------------------------------+
5. Installation (Local Setup)
Option 1: Minikube (for local testing)
# Install Minikube
brew install minikube # macOS
choco install minikube # Windows
# Start cluster
minikube start
# Verify setup
kubectl get nodes
Option 2: Cloud Providers
- Google Kubernetes Engine (GKE)
- AWS Elastic Kubernetes Service (EKS)
- Azure Kubernetes Service (AKS)
6. Core Kubernetes Components
Pods
apiVersion: v1
kind: Pod
metadata:
name: myapp-pod
spec:
containers:
- name: myapp
image: nginx
ports:
- containerPort: 80
Deploy:
kubectl apply -f pod.yaml
kubectl get pods
kubectl describe pod myapp-pod
Deployment
Used to manage and scale pods automatically.
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-deployment
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: nginx
ports:
- containerPort: 80
Deploy and check:
kubectl apply -f deployment.yaml
kubectl get deployments
kubectl get pods
Service
Exposes deployment to network (internal or external).
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
type: NodePort
selector:
app: myapp
ports:
- port: 80
targetPort: 80
nodePort: 30001
Check service:
kubectl get svc
minikube service myapp-service
7. Scaling
kubectl scale deployment myapp-deployment --replicas=5
kubectl get pods
8. Rolling Updates
kubectl set image deployment/myapp-deployment myapp=nginx:latest
kubectl rollout status deployment/myapp-deployment
Rollback:
kubectl rollout undo deployment/myapp-deployment
9. ConfigMaps & Secrets
ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
APP_MODE: "production"
Secret:
apiVersion: v1
kind: Secret
metadata:
name: db-secret
type: Opaque
data:
DB_PASSWORD: cGFzc3dvcmQ= # base64 encoded
Mount in pod:
envFrom:
- configMapRef:
name: app-config
- secretRef:
name: db-secret
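Secret values must be base64-encoded, not encrypted — the `cGFzc3dvcmQ=` in the Secret above is just the string `password` encoded. You can produce and verify such values with Python's standard library:

```python
import base64

# Encode a secret value for use in a Kubernetes Secret manifest
encoded = base64.b64encode(b"password").decode()
print(encoded)                    # cGFzc3dvcmQ=  (matches DB_PASSWORD above)

# Decode to confirm what a Secret actually contains
print(base64.b64decode(encoded))  # b'password'
```

Because base64 is trivially reversible, treat Secrets as access-controlled configuration, not encryption.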
10. Namespaces
kubectl create namespace dev
kubectl get namespaces
kubectl apply -f app.yaml -n dev
11. Logs & Monitoring
kubectl logs pod_name
kubectl describe pod pod_name
kubectl top pods # if metrics-server installed
Popular tools:
- Prometheus + Grafana (metrics)
- ELK Stack (logs)
- Lens (GUI dashboard)
12. Real-World Example Flow
1. Dockerize your ML/web app → write a Dockerfile
2. Push the image to Docker Hub or a private registry
3. Create deployment.yaml and service.yaml
4. Apply the configs:
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
5. Scale if needed:
kubectl scale deployment myapp --replicas=4
6. Access the app:
minikube service myapp-service
13. Useful Commands
| Command | Description |
|---|---|
| kubectl get pods | List all pods |
| kubectl get svc | List all services |
| kubectl describe pod <name> | Pod details |
| kubectl delete pod <name> | Delete pod |
| kubectl logs <pod> | Show logs |
| kubectl apply -f file.yaml | Apply configuration |
| kubectl exec -it <pod> -- bash | Access container shell |
14. Kubernetes vs Docker
| Feature | Docker | Kubernetes |
|---|---|---|
| Scope | Containerization | Orchestration |
| Scale | Single host | Multi-host clusters |
| Self-healing | No | Yes |
| Load Balancing | Manual | Automatic |
| Configuration | Docker CLI | YAML manifests |
15. Key Takeaways
- Kubernetes = container orchestrator for scaling & managing apps.
- Works hand-in-hand with Docker.
- Core concepts: Pod, Deployment, Service, ConfigMap, Namespace.
- Ideal for ML model serving, microservices, and production apps.
- Learn kubectl commands + YAML basics to get started fast.
Prometheus Complete Notes (Beginner-Friendly)
1. What is Prometheus?
- Prometheus is an open-source monitoring and alerting toolkit designed for time-series data (metrics that change over time).
- Commonly used for:
  - Monitoring applications, infrastructure, and Kubernetes clusters.
  - Setting alerts when performance issues or failures occur.
  - Visualizing metrics in Grafana dashboards.
2. Key Features
✅ Time-series database
✅ Pull-based metrics collection (scrapes from targets)
✅ Multi-dimensional data model
✅ Powerful query language — PromQL
✅ Integrates easily with Grafana
✅ Lightweight and easy to set up
3. How Prometheus Works (Architecture)
+------------------------------------------------------+
| Prometheus |
|------------------------------------------------------|
| 1. Targets (exporters, apps, K8s nodes) |
| 2. Scrapes metrics via HTTP endpoints (/metrics) |
| 3. Stores data as time-series in local DB |
| 4. Query metrics via PromQL |
| 5. Triggers alerts (Alertmanager) |
| 6. Visualize with Grafana |
+------------------------------------------------------+
4. Key Components
| Component | Description |
|---|---|
| Prometheus Server | Collects and stores metrics data |
| Exporters | Expose metrics in Prometheus format |
| Alertmanager | Sends notifications (Email, Slack, etc.) |
| PromQL | Query language for analyzing metrics |
| Pushgateway | For short-lived jobs that push metrics |
| Grafana | For dashboard visualization |
5. Installation (Local Setup)
Step 1: Download Prometheus from https://prometheus.io/download/
Step 2: Extract and run:
./prometheus --config.file=prometheus.yml
Visit: http://localhost:9090
6. Configuration File (prometheus.yml)
This file defines what to monitor (targets) and how often to scrape.
Example:
global:
scrape_interval: 15s # How often to collect metrics
scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "myapp"
static_configs:
- targets: ["localhost:8000"]
Here, Prometheus scrapes metrics from itself (port 9090) and from your app (port 8000).
7. Exporters (for Different Systems)
Exporters are lightweight programs that expose metrics.
| Exporter | Purpose |
|---|---|
| Node Exporter | OS-level metrics (CPU, RAM, Disk) |
| cAdvisor | Container metrics |
| Kube State Metrics | Kubernetes cluster info |
| Blackbox Exporter | Endpoint uptime check |
| MySQL/Postgres Exporter | Database metrics |
| JMX Exporter | Java apps (JVM metrics) |
Run Node Exporter:
./node_exporter
Access metrics: http://localhost:9100/metrics
8. Integrating Prometheus with Python / Flask App
Step 1: Install client library
pip install prometheus-client
Step 2: Add metrics endpoint in your app
from flask import Flask, Response
from prometheus_client import Counter, generate_latest
app = Flask(__name__)
REQUEST_COUNT = Counter('request_count', 'Total web requests')
@app.route("/")
def home():
REQUEST_COUNT.inc()
return "Hello, Prometheus!"
@app.route("/metrics")
def metrics():
return Response(generate_latest(), mimetype="text/plain")
if __name__ == "__main__":
app.run(port=8000)
Now Prometheus can scrape metrics from /metrics endpoint every few seconds.
9. Querying Metrics with PromQL
PromQL = Prometheus Query Language
Common examples:
| Query | Meaning |
|---|---|
| up | Target status (1 = up, 0 = down) |
| node_cpu_seconds_total | Total CPU time |
| rate(http_requests_total[5m]) | Requests per second (last 5 minutes) |
| sum(rate(http_requests_total[1m])) by (instance) | Requests per instance |
| avg_over_time(cpu_usage[10m]) | Average over last 10 minutes |
You can run these queries at http://localhost:9090/graph.
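`rate()` computes the per-second increase of a counter over the chosen window. A simplified sketch of the idea (ignoring Prometheus's handling of counter resets and extrapolation at window edges):

```python
# (timestamp_seconds, counter_value) samples across a 5-minute window
samples = [(0, 100), (60, 160), (120, 220), (180, 280), (240, 340), (300, 400)]

def simple_rate(samples):
    """Per-second increase between the first and last sample in the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

print(simple_rate(samples))  # 1.0 (one request per second)
```

This is why `rate()` is applied to ever-increasing counters like `http_requests_total`: the raw counter only goes up, but its rate tells you the current traffic level.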
10. Setting Alerts
Add alerting rules:
groups:
- name: alert.rules
rules:
- alert: HighCPUUsage
expr: rate(node_cpu_seconds_total[1m]) > 0.8
for: 1m
labels:
severity: critical
annotations:
summary: "High CPU usage detected"
Start Prometheus with the rules file referenced in prometheus.yml (Alertmanager runs as a separate process):
./prometheus --config.file=prometheus.yml
Alertmanager can notify:
- Email
- Slack
- PagerDuty
- Telegram
11. Visualization with Grafana
1. Install Grafana → https://grafana.com/grafana/download
2. Open Grafana at http://localhost:3000
3. Add Prometheus as a data source:
  - URL: http://localhost:9090
4. Import dashboards (Node, Kubernetes, App metrics)
Now you can see live charts, e.g. CPU, memory, app response time.
12. Prometheus with Kubernetes
In Kubernetes, Prometheus monitors pods, nodes, and services.
You can deploy it easily using:
kubectl create namespace monitoring
kubectl apply -f https://github.com/prometheus-operator/kube-prometheus/releases/latest/download/manifests/setup
Or use Helm:
helm install prometheus prometheus-community/prometheus
This installs:
- Prometheus server
- Alertmanager
- Node exporter
- kube-state-metrics
Then access:
kubectl port-forward svc/prometheus-server 9090:80 -n monitoring
13. Common Use Cases
- Monitor Docker / Kubernetes clusters
- Track ML model latency & prediction counts
- Set alerts for high memory usage or downtime
- Integrate with Grafana dashboards for live monitoring
- Observe system health trends over time
14. Key Takeaways
| Concept | Description |
|---|---|
| Prometheus | Monitoring + alerting system |
| Metrics endpoint | /metrics — exposes time-series data |
| PromQL | Query and analyze data |
| Exporters | Provide metrics for different systems |
| Alertmanager | Triggers alerts on defined conditions |
| Grafana | Visualization tool for Prometheus data |
15. Mini Example: Full Flow Recap
1. Run a Python/Flask app with a /metrics endpoint
2. Install Prometheus and configure:
- job_name: 'flask'
  static_configs:
  - targets: ['localhost:8000']
3. Start Prometheus:
./prometheus --config.file=prometheus.yml
4. Open http://localhost:9090
5. Query: request_count_total
6. Add a Grafana dashboard → visualize in charts.
16. Tools That Work With Prometheus
- Grafana → dashboards
- Alertmanager → notifications
- Thanos → long-term storage
- VictoriaMetrics → scalable alternative
- Prometheus Operator → easy setup in Kubernetes
17. Quick Commands
| Command | Description |
|---|---|
| ./prometheus --config.file=prometheus.yml | Start Prometheus |
| kubectl port-forward svc/prometheus-server 9090:80 | Access in K8s |
| curl localhost:9090/metrics | Check metrics endpoint |
| systemctl status prometheus | Check service status (Linux) |
Grafana – Short Notes
1. What is Grafana?
Grafana is an open-source visualization and monitoring tool used to analyze metrics, logs, and traces from various data sources.
It helps you create interactive dashboards to monitor system performance, infrastructure, and applications.
2. Key Features
| Feature | Description |
|---|---|
| Dashboards | Visualize time-series data in real time |
| Plugins | Extend functionality (data sources, panels, apps) |
| Alerts | Set thresholds and receive alerts via email, Slack, etc. |
| Data Sources | Connect to Prometheus, InfluxDB, Elasticsearch, Loki, MySQL, etc. |
| User Management | Role-based access control |
| Templating | Dynamic, parameterized dashboards |
3. Grafana Architecture
+-------------------+
| Web UI (Dashboards) |
+----------+--------+
|
+----------v--------+
| Backend Server |
| (API + Logic) |
+----------+--------+
|
+----------v--------+
| Data Sources |
| (Prometheus, DBs) |
+-------------------+
- Frontend (UI): Displays dashboards
- Backend: Handles authentication, alerts, queries
- Data Sources: Provide time-series or metric data
4. Common Data Sources
- Prometheus – metrics monitoring
- Loki – log aggregation
- InfluxDB – time-series data
- Elasticsearch – search + analytics
- MySQL / PostgreSQL – SQL databases
- Cloud sources – AWS CloudWatch, Azure Monitor, GCP Stackdriver
5. Installing Grafana (Quick Setup)
On Ubuntu / Debian:
sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
Access at: http://localhost:3000
Default credentials:
user: admin / password: admin
6. Creating Dashboards
1. Log in → click "+" → Dashboard → Add new panel
2. Choose a data source (e.g., Prometheus)
3. Write a query (e.g., up, cpu_usage_total)
4. Choose a visualization type (Graph, Gauge, Table, etc.)
5. Save the dashboard
7. Alerts & Notifications
- Add an alert on a panel → set a condition (e.g., CPU > 80%)
- Configure a notification channel (Slack, Email, PagerDuty)
- Alert rules can be viewed & managed centrally
8. Panels & Visualizations
| Type | Use |
|---|---|
| Time Series | Continuous data (CPU, memory) |
| Gauge | Current metric value |
| Bar Gauge | Compare multiple values |
| Table | Tabular data |
| Stat | Single numeric indicator |
| Heatmap | Distribution visualization |
9. Variables (Templating)
- Create dynamic dashboards with dropdowns.
Example:
$server → all available server names
$metric → all available metrics
Used in a query as:
avg(cpu_usage{instance="$server"})
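Grafana substitutes the selected dropdown value into the query before sending it to the data source. The substitution itself is plain string templating, as this Python sketch illustrates (the instance name `web-01` is a made-up example):

```python
from string import Template

# A Grafana-style query with a $server template variable
query = Template('avg(cpu_usage{instance="$server"})')

# When the user picks "web-01" in the dropdown, Grafana fills it in
print(query.substitute(server="web-01"))  # avg(cpu_usage{instance="web-01"})
```

One dashboard can thus serve any number of servers: the panel query stays fixed while the variable changes.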
10. Grafana + Prometheus Workflow
1. Prometheus collects metrics from servers/applications
2. Grafana connects to Prometheus as a data source
3. Dashboards visualize the time-series metrics
4. Alerts notify when thresholds are crossed
11. Authentication & Roles
- Admin – full control
- Editor – can modify dashboards
- Viewer – read-only access
Supports LDAP, OAuth, Google, Azure AD, and GitHub authentication.
12. Cloud & Enterprise Versions
| Type | Description |
|---|---|
| Grafana OSS | Free open-source |
| Grafana Cloud | Hosted SaaS version |
| Grafana Enterprise | Adds support, SSO, auditing |
13. Integration Examples
- Prometheus + Grafana → system metrics
- Loki + Grafana → centralized log dashboards
- Tempo + Grafana → distributed tracing
- MySQL + Grafana → business analytics
14. Common Use Cases
✅ Infrastructure & server monitoring
✅ Application performance tracking
✅ Business KPIs visualization
✅ Log + Metric correlation (via Loki)
✅ Cloud resource monitoring
15. Grafana Query Examples
PromQL (Prometheus):
node_cpu_seconds_total{mode="idle"}
avg(rate(http_requests_total[5m]))
InfluxQL (InfluxDB):
SELECT mean("usage") FROM "cpu" WHERE time > now() - 1h GROUP BY time(1m)
16. Short Commands & Ports
| Command | Purpose |
|---|---|
| sudo systemctl start grafana-server | Start service |
| sudo systemctl stop grafana-server | Stop service |
| sudo systemctl status grafana-server | Check status |
| Default port | 3000 |