SQL KEY POINTS

🧠 SQL Cheat Sheet (For Data Science & Analytics)


🏁 1️⃣ Basic SQL Commands

CommandDescription
SELECTChoose columns to display
FROMChoose table
WHEREFilter rows
GROUP BYAggregate rows
HAVINGFilter after aggregation
ORDER BYSort results
LIMIT / TOPRestrict number of rows

✅ Example:

SELECT name, salary FROM employees WHERE department = 'Sales' ORDER BY salary DESC LIMIT 5;

📋 2️⃣ Data Definition Language (DDL)

CommandPurpose
CREATE TABLECreate new table
DROP TABLEDelete table
ALTER TABLEModify table structure
TRUNCATE TABLEDelete all rows (keep structure)

Example:

CREATE TABLE employees ( emp_id INT PRIMARY KEY, name VARCHAR(50), dept_id INT, salary DECIMAL(10,2) );

🧩 3️⃣ Data Manipulation Language (DML)

CommandPurpose
INSERT INTOAdd new record
UPDATEModify existing record
DELETERemove record

Example:

INSERT INTO employees VALUES (1, 'John', 101, 50000); UPDATE employees SET salary = 55000 WHERE emp_id = 1; DELETE FROM employees WHERE emp_id = 1;

🔍 4️⃣ Filtering Data

OperatorMeaning
=Equal
<> / !=Not equal
> < >= <=Comparisons
BETWEEN a AND bRange
IN (...)Match a list
LIKEPattern matching
IS NULL / IS NOT NULLNull check

Example:

SELECT * FROM employees WHERE name LIKE 'A%' AND salary BETWEEN 40000 AND 60000;

📊 5️⃣ Aggregate Functions

FunctionDescription
COUNT()Number of rows
SUM()Sum of values
AVG()Average
MIN()Minimum
MAX()Maximum

Example:

SELECT dept_id, AVG(salary) AS avg_sal FROM employees GROUP BY dept_id HAVING AVG(salary) > 50000;

🔗 6️⃣ SQL Joins

TypeDescription
INNER JOINMatching rows only
LEFT JOINAll from left + matches
RIGHT JOINAll from right + matches
FULL JOINAll rows from both sides
CROSS JOINCartesian product

Example:

SELECT e.name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.dept_id;

🧮 7️⃣ Set Operations

OperationDescription
UNIONCombine unique rows
UNION ALLCombine all rows (duplicates allowed)
INTERSECTCommon rows
EXCEPT / MINUSRows in first not in second

🧠 8️⃣ Subqueries

Subquery in WHERE:

SELECT name FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);

Subquery in FROM:

SELECT dept_id, AVG(salary) AS avg_sal FROM (SELECT * FROM employees WHERE salary > 40000) AS high_sal GROUP BY dept_id;

🧱 9️⃣ CASE WHEN (Conditional Logic)

SELECT name, CASE WHEN salary > 70000 THEN 'High' WHEN salary BETWEEN 40000 AND 70000 THEN 'Medium' ELSE 'Low' END AS salary_band FROM employees;

📆 🔟 Date & Time Functions

FunctionDescription
NOW() / CURRENT_TIMESTAMPCurrent date & time
DATE()Extract date
YEAR(date)Year
MONTH(date)Month
DATEDIFF(d1, d2)Difference in days
DATE_ADD() / DATE_SUB()Add/Subtract days

Example:

SELECT name, DATEDIFF(NOW(), hire_date) AS days_worked FROM employees;

📈 11️⃣ Window (Analytic) Functions

Powerful for ranking, running totals, and partitioned calculations.

🧩 Syntax:

function(column) OVER (PARTITION BY col ORDER BY col2)

Examples:

a) Row Number:

SELECT name, ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rank FROM employees;

b) Rank & Dense Rank:

RANK() OVER (ORDER BY salary DESC) DENSE_RANK() OVER (ORDER BY salary DESC)

c) Running Total:

SUM(salary) OVER (ORDER BY emp_id) AS running_total

d) Percent Rank:

PERCENT_RANK() OVER (ORDER BY salary)

🧩 12️⃣ Common Table Expressions (CTEs)

WITH high_earners AS ( SELECT name, salary FROM employees WHERE salary > 50000 ) SELECT * FROM high_earners WHERE salary < 100000;

🧮 13️⃣ String Functions

FunctionExampleOutput
CONCAT(a,b)'Data', 'Science'DataScience
UPPER(col)'john'JOHN
LOWER(col)'JOHN'john
SUBSTRING(col, start, len)'Analytics', 1, 4Anal
LENGTH(col)'Data'4
TRIM(col)' Data ''Data'
REPLACE(col, 'a', 'x')'data''dxtx'

🧩 14️⃣ Numeric Functions

FunctionDescription
ROUND(col, n)Round value
CEIL(col) / FLOOR(col)Round up/down
ABS(col)Absolute value
POWER(col, n)Exponent
MOD(a,b)Remainder

🧠 15️⃣ Advanced Analytics SQL

🔹 Correlation:

SELECT CORR(salary, experience) FROM employees;

🔹 Top N per Group:

SELECT * FROM ( SELECT name, dept_id, salary, ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk FROM employees ) sub WHERE rnk <= 3;

🔹 Year-over-Year (YoY) Growth:

SELECT year, sales, LAG(sales) OVER (ORDER BY year) AS prev_year_sales, (sales - LAG(sales) OVER (ORDER BY year)) / LAG(sales) OVER (ORDER BY year) * 100 AS yoy_growth FROM sales_data;

💾 16️⃣ Table Constraints

ConstraintDescription
PRIMARY KEYUnique & not null
FOREIGN KEYReference another table
UNIQUEAll values unique
CHECKMust satisfy condition
DEFAULTAssign default value
NOT NULLCannot be null

📂 17️⃣ Indexes

CREATE INDEX idx_emp_name ON employees(name); DROP INDEX idx_emp_name;

→ Speeds up queries, but slows down inserts/updates.


🔐 18️⃣ Views

CREATE VIEW sales_summary AS SELECT region, SUM(sales) AS total_sales FROM sales GROUP BY region;

Use views to simplify complex queries.


🧩 19️⃣ Data Export / Import

CommandDescription
SELECT INTO new_table FROM ...Copy data
INSERT INTO ... SELECT ...Copy data to existing table
LOAD DATA INFILEImport CSV
COPY FROM (Postgres)Import data file

🧠 20️⃣ Interview Key Points

✅ SQL is declarative — you tell what you want, not how.
✅ Use CTEs + Window Functions for cleaner analytics queries.
✅ For Data Science, focus on:

  • Aggregations

  • Joins

  • Subqueries

  • Window functions

  • CASE logic

  • Date handling

  • Ranking / Percentiles

Comments

Popular posts from this blog

Resume Work and Project Details

Time Series and MMM basics

LINEAR REGRESSION