Data Science Cheat Sheets (Final stage)
Statistics for Data Science – Cheat Sheet
1. Types of Statistics
- Descriptive Statistics → Summarizing data (mean, median, mode, variance, std. dev., etc.)
- Inferential Statistics → Drawing conclusions about a population from sample data (hypothesis testing, confidence intervals).
2. Types of Data
- Qualitative (Categorical)
  - Nominal → Categories without order (e.g., gender, colors).
  - Ordinal → Categories with order (e.g., ratings: Poor, Average, Good).
- Quantitative (Numerical)
  - Discrete → Countable (e.g., number of students).
  - Continuous → Measurable (e.g., height, weight).
3. Measures of Central Tendency
- Mean = Average
- Median = Middle value (robust to outliers)
- Mode = Most frequent value
4. Measures of Dispersion
- Range = Max – Min
- Variance = Average squared deviation from the mean
- Standard Deviation (σ) = √Variance
- IQR (Interquartile Range) = Q3 – Q1
- Coefficient of Variation (CV) = (Std. Dev. / Mean) × 100
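These dispersion measures can all be computed with Python's standard library; the `data` list below is a made-up example:

```python
import statistics

data = [4, 8, 15, 16, 23, 42]           # invented sample

range_ = max(data) - min(data)          # Range = Max - Min
var = statistics.pvariance(data)        # population variance
std = statistics.pstdev(data)           # sigma = sqrt(variance)
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles
iqr = q3 - q1                           # IQR = Q3 - Q1
cv = std / statistics.mean(data) * 100  # Coefficient of Variation (%)

print(range_, round(std, 2), round(iqr, 2), round(cv, 1))
```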
5. Probability Basics
- Probability = Favorable outcomes / Total outcomes
- Addition Rule: P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
- Multiplication Rule: P(A ∩ B) = P(A) × P(B|A)
6. Probability Distributions
- Binomial Distribution → Discrete; fixed number of trials (e.g., heads in coin tosses).
- Poisson Distribution → Discrete; number of events in a time/space interval (e.g., calls per hour).
- Normal Distribution → Continuous, bell-shaped; mean = median = mode.
- Uniform Distribution → Equal probability for all outcomes.
- Exponential Distribution → Continuous; time between events.
7. Sampling Methods
- Random Sampling
- Stratified Sampling (split into groups, sample from each)
- Cluster Sampling (sample random clusters)
- Systematic Sampling (every k-th item)
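A toy sketch of the four schemes in plain Python (the population and strata are invented; real stratified sampling would usually sample proportionally to stratum size):

```python
import random

random.seed(0)                          # reproducible toy example
population = list(range(1, 101))        # invented IDs 1..100

# Random sampling: every unit has an equal chance
simple = random.sample(population, 10)

# Systematic sampling: every k-th item after a random start
k = 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: split into strata, sample from each
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values() for x in random.sample(group, 5)]

# Cluster sampling: pick whole clusters at random
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [x for c in random.sample(clusters, 2) for x in c]

print(len(simple), len(systematic), len(stratified), len(cluster_sample))
```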
8. Hypothesis Testing
- Formulate hypotheses:
  - Null Hypothesis (H₀): No effect/difference.
  - Alternative Hypothesis (H₁): Significant effect/difference.
- Steps:
  - Select significance level (α, usually 0.05).
  - Choose test (t-test, chi-square, ANOVA).
  - Calculate test statistic & p-value.
  - Compare p-value with α.
- Common Tests:
  - Z-test → Large sample, known variance.
  - T-test → Small sample, unknown variance (one-sample, two-sample, paired).
  - Chi-Square Test → Independence of categorical variables.
  - ANOVA → Compare means of 3+ groups.
  - Mann-Whitney U Test → Non-parametric test for medians.
9. Confidence Intervals
- Formula: CI = sample_mean ± Z × (σ / √n)
- Interpretation: "We are 95% confident the population parameter lies within this range."
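A minimal sketch of this formula with the standard library only; the sample values and the "known" σ are invented:

```python
import math
import statistics

sample = [52, 48, 55, 47, 51, 50, 49, 53]   # invented measurements
n = len(sample)
mean = statistics.mean(sample)
sigma = 4.0          # population std. dev., assumed known
z = 1.96             # z-value for 95% confidence

margin = z * sigma / math.sqrt(n)
ci = (mean - margin, mean + margin)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```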
10. Correlation & Covariance
- Covariance: Measures joint variability (can be positive or negative).
- Correlation (r): Standardized measure (–1 ≤ r ≤ 1).
  - +1 = strong positive, –1 = strong negative, 0 = no relation.
11. Regression Basics
- Simple Linear Regression: y = β₀ + β₁x + ε
- Multiple Linear Regression: y = β₀ + β₁x₁ + β₂x₂ + ... + ε
- Key metrics: R², Adjusted R², p-values, F-statistic.
12. Outliers
- Detection methods:
  - |Z-score| > 3
  - IQR method → Outside [Q1 – 1.5×IQR, Q3 + 1.5×IQR]
- Handling: Remove, cap, or transform (e.g., log).
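Both detection methods in a few lines of stdlib Python (invented data). Note that with a sample this small, the z-score rule misses the outlier because the outlier itself inflates the standard deviation, a known caveat of that method:

```python
import statistics

data = [10, 12, 11, 13, 12, 11, 95]          # 95 is the planted outlier

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [x for x in data if x < low or x > high]

# Z-score method: flag |z| > 3
mean, std = statistics.mean(data), statistics.pstdev(data)
z_outliers = [x for x in data if abs((x - mean) / std) > 3]

print(iqr_outliers, z_outliers)
```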
13. Bias & Variance
- Bias: Error from wrong assumptions (underfitting).
- Variance: Error from excessive sensitivity to the training data (overfitting).
- Tradeoff → The goal is balance (low bias, low variance).
14. Important Concepts
- Central Limit Theorem (CLT): Sample means from any distribution tend toward a normal distribution as n → ∞.
- Law of Large Numbers: As sample size increases, the sample mean → population mean.
- p-value: Probability of getting results at least as extreme as observed, assuming H₀ is true.
- Type I Error (α): Rejecting H₀ when it is true (false positive).
- Type II Error (β): Failing to reject H₀ when it is false (false negative).
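The p-value definition can be made concrete with a hand-rolled one-sample z-test (all numbers invented; uses only `math`):

```python
import math

mu0 = 100          # H0: population mean = 100
sigma = 15         # known population std. dev.
n = 36
sample_mean = 106

z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # test statistic
p_two_sided = math.erfc(abs(z) / math.sqrt(2))     # 2 * P(Z > |z|)

reject_h0 = p_two_sided < 0.05                     # compare with alpha
print(round(z, 2), round(p_two_sided, 4), reject_h0)
```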
15. Interview Quick Tips
- Always clarify the type of data before choosing a test.
- Use the median & IQR when data is skewed.
- State assumptions (e.g., normality, equal variance) before applying tests.
- Be ready to interpret results (not just calculate them).
SQL Cheatsheet for Data Science Interviews
1. Basics
- Database: Collection of tables
- Table: Rows (records) + Columns (fields)
- Query: Command to interact with the DB
-- Select columns from table
SELECT column1, column2
FROM table_name
WHERE condition;
2. Selecting Data
SELECT * FROM employees; -- all columns
SELECT DISTINCT department FROM employees; -- unique values
SELECT salary AS monthly_salary FROM employees; -- alias
3. Filtering Rows
-- Comparison operators: =, !=, >, <, >=, <=
-- Logical operators: AND, OR, NOT
SELECT * FROM employees
WHERE age BETWEEN 25 AND 35
AND department IN ('HR', 'Finance')
AND name LIKE 'A%';
4. Sorting & Limiting
SELECT * FROM employees
ORDER BY salary DESC, age ASC
LIMIT 10;
5. Aggregate Functions
SELECT department, COUNT(*) AS num_employees, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
- Common functions: COUNT, SUM, AVG, MIN, MAX
6. Joins
-- Inner Join (matching rows)
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d
ON e.dept_id = d.id;
-- Left Join (all left + matched right)
-- Right Join (all right + matched left)
-- Full Outer Join (all rows)
7. Subqueries
-- In WHERE
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- In FROM (derived table)
SELECT dept_id, avg_salary
FROM (SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id) t;
8. Window Functions
- ROW_NUMBER: unique row index
- RANK / DENSE_RANK: ranking with/without gaps
- NTILE(n): bucket into n groups
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
- Running totals / moving averages:
SELECT name, salary,
SUM(salary) OVER (ORDER BY hire_date) AS running_salary
FROM employees;
9. Case Statements
SELECT name,
CASE
WHEN salary > 80000 THEN 'High'
WHEN salary BETWEEN 50000 AND 80000 THEN 'Medium'
ELSE 'Low'
END AS salary_band
FROM employees;
10. Common Table Expressions (CTE)
WITH dept_avg AS (
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
)
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN dept_avg d
ON e.dept_id = d.dept_id;
11. Set Operations
- UNION → combines distinct results
- UNION ALL → includes duplicates
- INTERSECT → common rows
- EXCEPT / MINUS → rows in the first query but not the second
12. Data Cleaning
-- Remove duplicates
SELECT DISTINCT * FROM employees;
-- Handle NULLs
SELECT COALESCE(salary, 0) AS salary FROM employees;
-- String functions
SELECT TRIM(name), UPPER(name), LOWER(name), LENGTH(name), SUBSTRING(name,1,3)
FROM employees;
-- Date functions
SELECT CURRENT_DATE, EXTRACT(YEAR FROM hire_date)
FROM employees;
13. Keys & Constraints
- Primary Key → unique & not null
- Foreign Key → references another table
- Unique, Check, Not Null
14. Performance Tips
- Use indexes on frequently filtered/joined columns
- Avoid SELECT *; fetch only the columns you need
- Use EXPLAIN to check the query plan
- Use CTEs & window functions instead of deeply nested subqueries
15. Popular Interview Queries
- Second highest salary
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
- Top 3 salaries per department
SELECT * FROM (
SELECT name, dept_id, salary,
RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk <= 3;
- Find duplicates
SELECT name, COUNT(*)
FROM employees
GROUP BY name
HAVING COUNT(*) > 1;
- Percentage contribution of salary by department
SELECT department,
SUM(salary) AS dept_salary,
SUM(salary)*100.0 / SUM(SUM(salary)) OVER() AS pct_contribution
FROM employees
GROUP BY department;
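Queries like these can be checked quickly against SQLite's in-memory database from Python; the employee rows below are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INT, salary INT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", 1, 90000), ("Bob", 1, 70000), ("Cid", 2, 70000),
     ("Dee", 2, 50000), ("Ann", 1, 90000)],   # note the duplicated row
)

# Second highest salary
second_highest = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchone()[0]

# Find duplicates
duplicates = conn.execute(
    "SELECT name, COUNT(*) FROM employees GROUP BY name HAVING COUNT(*) > 1"
).fetchall()

print(second_highest, duplicates)
```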
This covers almost all SQL topics that DS interviews usually test: selection, joins, aggregation, window functions, subqueries, CTEs, string/date handling, and performance optimization.
Python for Data Science Cheatsheet
1. Basics
# Variables & Datatypes
x = 10 # int
y = 3.14 # float
name = "Alice" # str
flag = True # bool
# Type & Casting
type(x) # <class 'int'>
int(y), str(x), float(x)
# String Operations
s = "Hello"
s.upper(), s.lower(), s.isupper(), s.islower()
len(s)
s.replace("H", "J")
s.split(" ")
2. Data Structures
List
lst = [1, 2, 3]
lst.append(4)
lst.extend([5,6])
lst.insert(0,0)
lst.pop()
lst.count(2)
lst.sort(); lst.reverse()  # in-place; both return None
Tuple
t = (1,2,3)
t.count(2)
t.index(3)
# Immutable
Dictionary
d = {'a':1, 'b':2}
d['a']
d.get('c', 0)
d.keys(), d.values(), d.items()
d.pop('b')
d.clear()
Set
s = {1,2,3}
s.add(4)
s.remove(2)
s.union({5,6})
s.intersection({3,4})
s.difference({1,5})
3. Loops & Comprehensions
# Loop
for i in range(5):
    print(i)
# List Comprehension
squared = [x**2 for x in range(10) if x%2==0]
# Dictionary Comprehension
squared_dict = {x: x**2 for x in range(5)}
# Set Comprehension
squared_set = {x**2 for x in range(5)}
4. Functions & Lambda
def add(a, b=0):
    return a + b
# Lambda
f = lambda x,y: x+y
map(lambda x: x*2, [1,2,3])
filter(lambda x: x%2==0, [1,2,3,4])
from functools import reduce
reduce(lambda x,y: x+y, [1,2,3])
5. NumPy Basics
import numpy as np
arr = np.array([1,2,3])
arr.ndim
arr.shape
np.zeros((2,3))
np.arange(1,10,2)
np.random.randint(1,100,size=(3,3))
# Operations
arr + 2
np.add(arr, 2)
np.multiply(arr, 2)
np.matmul(np.array([[1,2],[3,4]]), np.array([[2,0],[1,2]]))
# Indexing & Slicing
arr[0], arr[1:3]
np.sum(arr, axis=0)
np.concatenate([arr1, arr2])
np.reshape(arr, (3,1))
6. Pandas Basics
import pandas as pd
# Series
s = pd.Series([1,2,3], index=['a','b','c'])
s.mean(), s.median(), s.mode()
s.describe()
# DataFrame
df = pd.DataFrame({'A':[1,2], 'B':[3,4]})
df.head(5), df.tail(5)
df.info(), df.shape
df.columns, df.dtypes
df['A'], df[['A','B']]
df.isnull().sum()
df.dropna(), df.fillna(0)
df.duplicated(), df.drop_duplicates(inplace=True)
# Indexing
df.loc[0,'A'] # label-based
df.iloc[0,0] # position-based
# GroupBy
df.groupby('A')['B'].mean()
df.agg({'A':['min','max'], 'B':['mean','sum']})
# Merge/Concat
pd.concat([df1, df2], axis=1)
df1.merge(df2, on='key')
# Pivot
pd.pivot_table(df, index='A', columns='B', values='C')
# Map / Replace
df['A'] = df['A'].map(lambda x:x+2)
df['B'].replace(0, np.nan, inplace=True)
7. Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Matplotlib
plt.plot(df['A'])
plt.hist(df['B'], bins=10)
plt.scatter(df['A'], df['B'])
plt.boxplot(df['A'])
plt.pie([10,20,30], labels=['X','Y','Z'])
# Seaborn
sns.histplot(df['A'], kde=True)  # distplot is deprecated
sns.pairplot(df)
sns.heatmap(df.corr(), annot=True)
sns.countplot(df['A'])
sns.boxplot(x='A', y='B', data=df)
sns.violinplot(x='A', y='B', data=df)
8. Data Preprocessing
# Handling missing values
df['A'].fillna(df['A'].mean(), inplace=True)
# Encoding categorical variables
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
le = LabelEncoder()
df['cat'] = le.fit_transform(df['cat'])
ohe = OneHotEncoder()
encoded = ohe.fit_transform(df[['cat']]).toarray()
# Scaling
from sklearn.preprocessing import StandardScaler, MinMaxScaler
scaler = StandardScaler()
df[['num']] = scaler.fit_transform(df[['num']])
minmax = MinMaxScaler()
df[['num']] = minmax.fit_transform(df[['num']])
9. Train-Test Split
from sklearn.model_selection import train_test_split
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
10. Quick Tips for DS Interviews
- Know Python data structures thoroughly.
- Practice list/dict/set comprehensions.
- Be able to clean, transform, and manipulate data with Pandas.
- Know NumPy for array operations and vectorization.
- Be ready to plot insights using Matplotlib/Seaborn.
- Practice train-test split, encoding, and scaling for ML pipelines.
Machine Learning Cheatsheet (Interview-Focused)
1. Types of ML
- Supervised Learning → Input + output labels (regression, classification).
- Unsupervised Learning → Only input; find patterns (clustering, dimensionality reduction).
- Reinforcement Learning → Agent learns via rewards.
2. Common Algorithms
Regression
- Linear Regression: ( y = \beta_0 + \beta_1x + \epsilon )
- Regularization:
  - Ridge (L2 penalty) → shrinks coefficients.
  - Lasso (L1 penalty) → can set coefficients to zero (feature selection).
Classification
- Logistic Regression: Uses sigmoid → ( P(y=1) = \frac{1}{1+e^{-z}} ).
- Decision Tree: Splits on features to minimize impurity (Gini/entropy).
- Random Forest: Ensemble of decision trees (bagging).
- Gradient Boosting (XGBoost, LightGBM, CatBoost): Trees built sequentially to reduce error.
- SVM: Finds the max-margin hyperplane; kernel trick for non-linear data.
- kNN: Classifies by majority vote of the k nearest neighbors.
Unsupervised
- k-Means: Minimizes within-cluster variance.
- Hierarchical Clustering: Agglomerative/divisive merging of clusters.
- DBSCAN: Density-based clustering.
- PCA: Projects data onto lower dimensions (maximizing variance).
3. Key Concepts
- Bias-Variance Tradeoff:
  - High Bias → Underfitting.
  - High Variance → Overfitting.
- Overfitting Prevention: Cross-validation, regularization, pruning, dropout (NNs).
- Feature Engineering: Encoding (One-Hot, Label), scaling (Standard, MinMax), feature selection.
- Evaluation Metrics:
  - Regression: MSE, RMSE, MAE, ( R^2 ).
  - Classification: Accuracy, Precision, Recall, F1, ROC-AUC, Log Loss.
  - Imbalanced Data → Use Precision/Recall, ROC-AUC, PR-AUC.
- Cross-Validation: k-Fold, Stratified k-Fold (for classification).
4. Probability & Statistics in ML
- Bayes' Theorem:
[
P(A|B) = \frac{P(B|A) P(A)}{P(B)}
]
- Expectation: Mean value of a distribution.
- Variance/Std. Dev.: Spread of the data.
- Normal Distribution: Symmetric; mean = median = mode.
- Central Limit Theorem: The sampling distribution of the mean tends to normal.
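Bayes' theorem is easy to sanity-check numerically; the disease-screening numbers below are the classic invented example (1% prevalence, 99% sensitivity, 5% false-positive rate):

```python
p_disease = 0.01                    # P(A): prevalence
p_pos_given_disease = 0.99          # P(B|A): sensitivity
p_pos_given_healthy = 0.05          # P(B|not A): false-positive rate

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes: P(A|B) = P(B|A)P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # only about 1 in 6 positives are real
```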
5. Neural Networks (Basics)
- Perceptron: ( y = f(\sum w_ix_i + b) ).
- Activation Functions:
  - Sigmoid → (0,1), vanishing gradients.
  - ReLU → efficient, sparse activation.
  - Tanh → (-1,1).
- Backpropagation: Gradient descent on weights using the chain rule.
- Optimizers: SGD, Adam, RMSProp.
6. Ensemble Learning
- Bagging: Parallel training on bootstrapped samples (e.g., Random Forest).
- Boosting: Sequential training to fix previous errors (e.g., AdaBoost, XGBoost).
- Stacking: Combine predictions of multiple models with a meta-learner.
7. Model Selection & Evaluation
- Hyperparameter Tuning: Grid Search, Random Search, Bayesian Optimization.
- Regularization: L1 (Lasso), L2 (Ridge).
- Early Stopping: Stop training when validation loss stops improving.
8. Common Interview Questions (Quick Recall)
- Difference between supervised, unsupervised, and reinforcement learning?
- Why does logistic regression use a sigmoid instead of a linear function?
- What is multicollinearity and how do you detect it?
- Explain the bias-variance tradeoff.
- How do you handle imbalanced datasets?
- Difference between bagging and boosting?
- What is ROC-AUC and when is it better than accuracy?
- Difference between PCA and LDA?
- Explain gradient descent and the effect of the learning rate.
- When to prefer Random Forest vs Gradient Boosting?
Deep Learning Cheatsheet (DS Interview-Focused)
1. What is Deep Learning?
- A subset of machine learning that uses artificial neural networks (ANNs) with multiple hidden layers to learn complex patterns from data.
- Learns hierarchical representations — low-level features in early layers, high-level features in deeper layers.
- Best suited for unstructured data: images, audio, video, text.
2. Core Components of a Neural Network
- Neuron (Perceptron): Basic computational unit
- Weights (w): Learnable parameters
- Bias (b): Shifts the activation
- Activation Function (f): Introduces non-linearity
| Function | Range | Use Case |
|---|---|---|
| Sigmoid ( \frac{1}{1+e^{-x}} ) | (0, 1) | Binary classification output |
| Tanh ( \frac{e^x - e^{-x}}{e^x + e^{-x}} ) | (-1, 1) | Hidden layers |
| ReLU ( \max(0, x) ) | [0, ∞) | Most common for hidden layers |
| Leaky ReLU | (-∞, ∞) | Prevents dying ReLU |
| Softmax | (0, 1) sum=1 | Multi-class output layer |
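The functions in the table can be written out in plain Python (a sketch for intuition, not a library implementation):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def softmax(xs):
    m = max(xs)                              # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0), relu(-2), softmax([1.0, 2.0, 3.0]))
```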
Training Neural Networks
- Forward Propagation → Compute output-layer predictions.
- Loss Functions:
  - Regression → MSE, MAE.
  - Classification → Cross-Entropy, Hinge Loss.
- Backpropagation → Compute gradients using the chain rule.
- Gradient Descent Variants:
  - Batch GD, Mini-batch GD, Stochastic GD.
- Optimizers: SGD, Adam, RMSProp, Adagrad.
Key Concepts
- Epoch: One pass through the entire training data.
- Batch Size: Number of samples per gradient update.
- Learning Rate: Step size in gradient descent. Too high → divergence; too low → slow convergence.
- Overfitting Prevention:
  - Regularization (L1, L2)
  - Dropout
  - Early Stopping
  - Data Augmentation
Neural Network Architecture
- Input Layer – Features
- Hidden Layers – Transformations/feature learning
- Output Layer – Predictions
Forward & Backpropagation
- Forward pass: Compute predictions
- Loss function: Measure error
- Backward pass (Backpropagation): Compute gradients using the chain rule
- Gradient Descent: Update weights to minimize loss
- Learning Rate (η): Step size during optimization
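The update rule and the learning-rate effect can be seen in a minimal 1-D example, minimizing f(w) = (w − 3)² (all values invented):

```python
def descend(lr, steps=50, w=0.0):
    """Gradient descent on f(w) = (w - 3)**2, minimum at w = 3."""
    for _ in range(steps):
        grad = 2 * (w - 3)      # df/dw
        w -= lr * grad          # update rule: w := w - eta * gradient
    return w

good = descend(lr=0.1)       # converges close to 3
slow = descend(lr=0.001)     # too low: barely moves in 50 steps
diverged = descend(lr=1.1)   # too high: overshoots and blows up
print(round(good, 4), round(slow, 4), round(diverged, 1))
```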
| Optimizer | Description |
|---|---|
| SGD | Basic gradient descent |
| Momentum | Adds previous gradients to speed up convergence |
| RMSProp | Adapts learning rate per parameter |
| Adam | Combines Momentum + RMSProp (most used) |
Common Loss Functions
| Task | Loss |
|---|---|
| Regression | MSE (Mean Squared Error) |
| Binary Classification | Binary Cross-Entropy |
| Multi-class Classification | Categorical Cross-Entropy |
| Ranking | Hinge Loss |
Regularization techniques
| Method | Purpose |
|---|---|
| L1/L2 Regularization | Penalize large weights |
| Dropout | Randomly deactivate neurons |
| Batch Normalization | Normalize activations to stabilize training |
| Early Stopping | Stop training before overfitting |
CNN (Convolutional Neural Networks)
- Use Case: Image data, spatial features.
- Layers:
  - Convolution → extracts features using filters/kernels.
  - Pooling → reduces dimensions (Max Pooling, Avg Pooling).
  - Fully Connected → classification head.
- Concepts:
  - Padding (same vs valid).
  - Stride.
  - Transfer Learning with pre-trained models (ResNet, VGG, Inception).
6. RNN (Recurrent Neural Networks)
- Use Case: Sequential data (time series, NLP).
- Problem: Vanishing/exploding gradients.
- Variants:
  - LSTM (Long Short-Term Memory) → handles long dependencies.
  - GRU (Gated Recurrent Unit) → simpler, fewer parameters.
- Attention Mechanism → focuses on relevant parts of the sequence.
7. Transformers (Modern NLP)
- Self-Attention: Captures relationships between tokens.
- Architecture: Encoder–Decoder.
- Popular Models: BERT, GPT, T5.
- Why Transformers > RNNs: Parallelization, better long-range dependency capture.
8. Autoencoders
- Use Case: Dimensionality reduction, anomaly detection.
- Structure: Encoder (compress) + Decoder (reconstruct).
- Variational Autoencoder (VAE) → generates new samples.
9. Generative Models
- GAN (Generative Adversarial Networks):
  - Generator → creates fake data.
  - Discriminator → distinguishes real vs fake.
- Applications: Image synthesis, text-to-image, deepfakes.
10. Regularization & Normalization
- Dropout → randomly deactivates neurons during training.
- Batch Normalization → normalizes layer outputs, stabilizes training.
- Weight Decay (L2 Regularization) → penalizes large weights.
11. Evaluation Metrics (DL-specific)
- Classification → Accuracy, Precision, Recall, F1, AUC.
- Object Detection → IoU (Intersection over Union), mAP (mean Average Precision).
- Segmentation → Dice Coefficient, Jaccard Index.
- Language Models → Perplexity, BLEU score.
12. Key Deep Learning Tricks
- Weight Initialization: Use He or Xavier for faster convergence
- Learning Rate Scheduling: Reduce LR over time
- Data Augmentation: Improves generalization (especially in vision tasks)
- Transfer Learning: Use pre-trained models for small datasets
Deployment Best Practices
- Save model: .h5 (Keras), .pt (PyTorch)
- Export for inference: ONNX, TF Serving
- Monitor drift: Check model performance post-deployment
- Use GPUs/TPUs for large-scale training
13. Common Interview Questions (Quick Review)
- What is the vanishing gradient problem and how do you solve it?
- Difference between CNN and RNN?
- Why use ReLU over sigmoid?
- What is the role of batch normalization?
- Explain LSTM internals (gates and cell state).
- How does attention work in transformers?
- What is transfer learning and why is it useful?
- How do you prevent overfitting in deep networks?
- Explain dropout and how it works.
- What is the difference between batch size and epoch?
Tips for Interviews
- Focus on the intuitions behind architectures, not just formulas.
- Be ready to draw neural network diagrams and explain data flow.
- Know the pros/cons and real-world applications of CNNs, RNNs, and Transformers.
- Be comfortable with libraries like TensorFlow, Keras, and PyTorch.
NLP Cheatsheet for Data Science
1. Basics
- NLP → Processing & analyzing text data using algorithms & models.
- Applications: Sentiment analysis, chatbots, translation, summarization, topic modeling.
Common terms:
- Corpus → Collection of text
- Token → Word or sentence unit
- Vocabulary → Set of unique tokens
- Stopwords → Common words with little semantic value (e.g., "is", "the")
- Stemming → Reduce words to their root (running → run)
- Lemmatization → Convert to dictionary form (usually better than stemming)
2. Text Preprocessing
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
text = "NLTK is a leading platform for building Python programs!"
# Lowercase
text = text.lower()
# Remove punctuation
text = re.sub(r'[^a-zA-Z]', ' ', text)
# Tokenization
from nltk.tokenize import word_tokenize, sent_tokenize
words = word_tokenize(text)
sentences = sent_tokenize(text)
# Remove stopwords
stop_words = set(stopwords.words('english'))
words = [w for w in words if w not in stop_words]
# Stemming
ps = PorterStemmer()
words_stemmed = [ps.stem(w) for w in words]
# Lemmatization
lemmatizer = WordNetLemmatizer()
words_lemma = [lemmatizer.lemmatize(w) for w in words]
3. Text Representation
3.1 Bag-of-Words (BoW)
- Counts the frequency of each word.
- Example using CountVectorizer:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=500)
X = cv.fit_transform(corpus).toarray()
3.2 TF-IDF
- Term Frequency–Inverse Document Frequency → weights words by importance.
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(max_features=500)
X = tfidf.fit_transform(corpus).toarray()
3.3 Word Embeddings
- Capture the semantic meaning of words.
- Word2Vec, GloVe, FastText
- Libraries: gensim, spacy, tensorflow, pytorch
4. Text Similarity & NLP Tasks
- Cosine Similarity → Measures similarity between vectors
from sklearn.metrics.pairwise import cosine_similarity
similarity = cosine_similarity(vec1, vec2)
- Text Classification → spam detection, sentiment analysis
- Named Entity Recognition (NER) → extract names, locations
- POS Tagging → Part-of-speech tagging (noun, verb, etc.)
- Topic Modeling → LDA (Latent Dirichlet Allocation)
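As a stdlib-only alternative to the sklearn call above, cosine similarity can be computed by hand (the bag-of-words vectors below are toy values):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy bag-of-words vectors for two short "documents"
doc1 = [1, 1, 0, 1]
doc2 = [1, 1, 1, 0]
print(round(cosine(doc1, doc2), 3))
```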
5. NLP Libraries
- NLTK → Preprocessing, tokenization, stopwords, stemming/lemmatization
- spaCy → Fast tokenization, NER, POS tagging
- gensim → Topic modeling, Word2Vec
- TextBlob → Sentiment analysis
- scikit-learn → TF-IDF, vectorization, ML models
- transformers (Hugging Face) → BERT, GPT, other transformer models
6. Sequence Models for NLP
| Model | Use Case | Key Points |
|---|---|---|
| RNN | Text, sequential data | Maintains hidden state; suffers vanishing gradient |
| LSTM | Long sequences | Solves RNN vanishing gradient; uses gates |
| GRU | Lightweight LSTM | Fewer parameters, faster |
| Transformer | Modern NLP | Uses attention mechanism; parallelizable; BERT, GPT |
7. Pretrained Models
- Word Embeddings: GloVe, Word2Vec, FastText
- Transformers:
  - BERT → Contextual embeddings, masked language modeling
  - GPT → Text generation, autoregressive
  - RoBERTa, DistilBERT → Optimized BERT variants
- Libraries: transformers (Hugging Face), torch, tensorflow
8. NLP Metrics
| Task | Metric |
|---|---|
| Classification | Accuracy, Precision, Recall, F1-score, ROC-AUC |
| Sequence Generation | BLEU, ROUGE, METEOR |
| Language Modeling | Perplexity |
9. Feature Engineering Tips
- Remove stopwords, punctuation, numbers
- Lowercase & normalize text
- Consider n-grams for context (bigrams, trigrams)
- Use TF-IDF or embeddings instead of raw counts
- Handle imbalanced classes (SMOTE, weighted loss)
10. Quick Interview Qs
- Difference between stemming and lemmatization?
- How is TF-IDF better than Bag-of-Words?
- What is a word embedding? Why is it useful?
- Explain the differences between RNN, LSTM, and GRU.
- How does the attention mechanism work in transformers?
- What are common text preprocessing steps?
- Difference between context-free embeddings (Word2Vec) and contextual embeddings (BERT)?
- How do you handle OOV (out-of-vocabulary) words?
- How do you measure similarity between two sentences?
- Popular pretrained NLP models for sentiment analysis?
Generative AI Cheatsheet (Interview-Focused)
1. What is Generative AI?
- Definition: AI that creates new content similar to its training data.
- Content Types: Text, images, audio, video, code, 3D models.
- Applications:
  - Text: Chatbots, summarization, code generation
  - Images: AI art, deepfakes
  - Audio: Music generation, speech synthesis
  - Video: Animation, video synthesis
2. Key Concepts
- Discriminative vs Generative Models:

| Type | Task | Example |
|---|---|---|
| Discriminative | Predict label from input | Logistic Regression, SVM |
| Generative | Learn data distribution & generate data | GAN, VAE, Diffusion Models |

- Latent Space: Compressed representation capturing data features.
- Sampling: Generating new data points from the learned distribution.
3. Popular Generative Models
3.1 GANs (Generative Adversarial Networks)
- Components:
  - Generator → Creates fake data
  - Discriminator → Distinguishes real vs fake
- Objective: Minimax game
[
\min_G \max_D V(D,G) = E_{x\sim P_{data}}[\log D(x)] + E_{z\sim P_z}[\log(1-D(G(z)))]
]
- Applications: Image synthesis, super-resolution, style transfer
3.2 VAEs (Variational Autoencoders)
- Encoder → Maps input to a latent distribution
- Decoder → Reconstructs data from a latent vector
- Probabilistic latent space → Sample new data
- Applications: Image generation, anomaly detection
3.3 Diffusion Models
- Generate data by gradually denoising from Gaussian noise
- Examples: DALL-E 2, Stable Diffusion
3.4 Transformer-Based Generative Models
- GPT (Generative Pretrained Transformer) → Autoregressive text generation
- BERT → Masked language modeling (not autoregressive)
- T5, LLaMA, Falcon → Large language models for text generation
4. Training Techniques
- Adversarial Training: GANs pit a generator against a discriminator
- Reconstruction Loss: VAEs minimize reconstruction loss + KL divergence
- Pretraining + Fine-tuning: Transformers are pretrained on large corpora, then fine-tuned for tasks
5. Evaluation Metrics
| Task | Metric |
|---|---|
| Images | Inception Score (IS), FID (Fréchet Inception Distance) |
| Text | Perplexity, BLEU, ROUGE, METEOR |
| Audio | Signal-to-Noise Ratio, MOS (Mean Opinion Score) |
| General | Human evaluation for realism & quality |
6. Sampling from a VAE
import torch
z = torch.randn(batch_size, latent_dim)  # sample from the latent prior
generated = decoder(z)                   # assumes a trained decoder module
7. Applications in Industry
- Text: ChatGPT, Jasper AI, code generation (Copilot)
- Images: DALL-E, MidJourney, Stable Diffusion
- Audio: Jukebox (OpenAI), speech synthesis
- Video: RunwayML, deepfake creation
- Healthcare: Drug molecule generation
8. Common Interview Questions
- Difference between GAN, VAE, and Diffusion models?
- Explain the generator and discriminator roles in GANs.
- What is latent space? Why is it important?
- How do diffusion models generate images?
- Difference between autoregressive and masked language models?
- How do you evaluate generative models?
- What are the challenges in training GANs?
- How is Generative AI different from traditional ML models?
- Name real-world applications of generative AI.
- How do you prevent mode collapse in GANs?
⚡ Advanced Generative AI Cheatsheet
1. Core Idea
Generative AI = Learning a data distribution (P_{data}(x)) and generating new samples that look like real data.
- Input: Random noise or seed data
- Output: Synthetic images, text, audio, or 3D data
- Key property: Creativity + realism
2. Generative Model Categories
| Type | Examples | Key Idea |
|---|---|---|
| Explicit Density Models | VAE, PixelRNN, Normalizing Flows | Learn (P(x)) explicitly |
| Implicit Density Models | GANs | Learn via adversarial game, no explicit probability |
| Energy-Based Models (EBMs) | Boltzmann Machines | Model data via energy function |
| Autoregressive Models | GPT, PixelCNN | Generate sequentially (factorized probability) |
| Diffusion Models | Denoising Diffusion Probabilistic Models | Gradual noise removal to generate data |
3. GANs (Generative Adversarial Networks)
- Objective: Generator (G) creates data → Discriminator (D) distinguishes real vs fake
- Loss Function:
[
\min_G \max_D V(D,G) = E_{x \sim P_{data}}[\log D(x)] + E_{z \sim P_z}[\log(1 - D(G(z)))]
]
- Variants:
  - DCGAN → Deep Convolutional GAN (images)
  - WGAN → Wasserstein GAN (stable training)
  - CycleGAN → Image-to-image translation without paired data
  - StyleGAN → High-quality, controllable image synthesis
- Common Problems & Solutions:
  - Mode Collapse → Generator produces limited variety
  - Vanishing Gradients → Use WGAN, label smoothing
  - Training instability → Careful learning-rate tuning, batch normalization
4. Variational Autoencoders (VAEs)
- Probabilistic model: Encode input to a latent distribution (q(z|x)) → Sample → Decode
- Loss Function: Reconstruction + KL divergence
[
L = \text{Reconstruction Loss} + D_{KL}(q(z|x) || p(z))
]
- Applications: Image generation, anomaly detection, data compression
5. Diffusion Models
- Idea: Start with noise → iteratively denoise to generate data
- Steps:
  - Forward process: add Gaussian noise to data
  - Reverse process: learn a denoising function
- Popular models: DALL-E 2, Imagen, Stable Diffusion
- Pros: High-quality images, stable training
- Cons: Slow sampling, compute-intensive
6. Transformer-Based Generative Models
- GPT (Autoregressive): Predict the next token → generate text
- BERT (Masked LM): Predict masked tokens → contextual embeddings
- T5 / BART: Seq2Seq → summarization, translation
- LLMs: ChatGPT, LLaMA, Falcon, GPT-4 → text generation, coding, reasoning
Key Components:
- Multi-head Self-Attention
- Positional Encoding
- Feedforward Layers
- Layer Normalization
7. Evaluation Metrics
- Text: Perplexity, BLEU, ROUGE, METEOR
- Images: FID (Fréchet Inception Distance), IS (Inception Score), human evaluation
- Audio: MOS (Mean Opinion Score), SNR (Signal-to-Noise Ratio)
- General: Diversity, novelty, coherence
8. Feature Techniques in Generative AI
- Latent Space Manipulation: Interpolation, style transfer, attribute editing
- Conditional Generation: Generate based on labels or prompts
  - Example: Conditional GAN (cGAN) → generate images conditioned on class
- Prompt Engineering: Critical for text & multimodal generation
9. Practical Implementation Tips
- Data Augmentation: Increases the diversity of training samples
- Transfer Learning: Fine-tune pre-trained models (GPT, Stable Diffusion)
- Compute Optimization: Use mixed precision and distributed training for large models
- Safety & Bias: Check outputs for toxicity, hallucinations, or bias
10. Popular Generative AI Applications
- Text: Chatbots, code generation, story writing
- Images: AI art, avatars, deepfakes, medical image synthesis
- Audio: Music, voice cloning, speech synthesis
- Video: Animation, deepfake videos, scene generation
- Healthcare: Drug discovery, molecule generation
- Marketing: Personalized content creation, ad generation
11. Interview-Focused Questions
- Difference between GAN, VAE, and Diffusion models?
- How does self-attention work in transformers?
- Explain mode collapse and its solutions in GANs.
- How do you evaluate generative models?
- What is latent space and how is it used?
- Explain conditional vs unconditional generation.
- Challenges in training diffusion models.
- Applications of Generative AI in industry.
- How do you fine-tune a pre-trained generative model?
- Ethical considerations & bias in Generative AI.