Error Handling
✅ Module 1: Basics of Error Handling in Python
🔹 1.1 Common Types of Errors in Python
These errors commonly occur (note that `SyntaxError` is raised when the code is parsed; the rest occur at runtime):
| Error Type | Description | Example |
|---|---|---|
| `SyntaxError` | Invalid Python syntax | `if x = 5` (should be `==`) |
| `NameError` | Variable not defined | `print(x)` when `x` is undefined |
| `TypeError` | Operation on incompatible types | `"2" + 5` |
| `ValueError` | Function receives an inappropriate value | `int("hello")` |
| `ZeroDivisionError` | Division by zero | `5 / 0` |
| `IndexError` | Index out of range | `arr[10]` |
🔹 1.2 Basic try-except Block

```python
try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero!")
```

🧠 Explanation: This prevents the program from crashing by catching the `ZeroDivisionError`.
🔹 1.3 Catching Multiple Exceptions

```python
try:
    value = int("abc")
    print(10 / 0)
except ValueError:
    print("Conversion failed!")
except ZeroDivisionError:
    print("Division by zero!")
```

You can also combine them:

```python
try:
    # some risky code
    pass
except (ValueError, ZeroDivisionError) as e:
    print(f"Error occurred: {e}")
```
🔹 1.4 Using else and finally

```python
try:
    result = 10 / 2
except ZeroDivisionError:
    print("Division error!")
else:
    print("No error occurred:", result)
finally:
    print("This will always run.")
```
🔹 1.5 Raising Exceptions

You can manually raise errors:

```python
def divide(a, b):
    if b == 0:
        raise ValueError("b cannot be zero.")
    return a / b

print(divide(10, 2))
```
🔹 1.6 Creating Custom Exceptions

```python
class MyCustomError(Exception):
    pass

def check_value(x):
    if x < 0:
        raise MyCustomError("Negative values not allowed!")

try:
    check_value(-1)
except MyCustomError as e:
    print("Caught custom error:", e)
```
✅ Module 2: Advanced Exception Handling
This module focuses on writing more robust, scalable, and clean error-handling code — useful in larger apps or production environments.
🔹 2.1 Nested Try-Except Blocks

You can have try-except blocks inside other try blocks.

```python
try:
    a = int(input("Enter number: "))
    try:
        result = 10 / a
        print(result)
    except ZeroDivisionError:
        print("Inner: Division by zero")
except ValueError:
    print("Outer: Invalid input")
```

🧠 Why: Helps isolate and handle errors in specific code sections.
🔹 2.2 Catching Multiple Exceptions with as

```python
try:
    # risky code
    pass
except (TypeError, ValueError) as e:
    print(f"An error occurred: {e}")
```

🧠 Tip: `as e` gives access to the original exception object for logging or debugging.
🔹 2.3 Exception Chaining

Useful to raise a new exception while preserving the original one.

```python
try:
    1 / 0
except ZeroDivisionError as e:
    raise ValueError("Invalid math operation") from e
```

🧠 Why: Maintains the traceback of both exceptions — useful for debugging.
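The original exception stays attached to the new one via its `__cause__` attribute, which is how both tracebacks survive. A small sketch:

```python
def risky():
    # Re-raise a ZeroDivisionError as a ValueError, keeping the original
    try:
        1 / 0
    except ZeroDivisionError as e:
        raise ValueError("Invalid math operation") from e

try:
    risky()
except ValueError as e:
    # The chained exception is preserved on __cause__
    print(type(e.__cause__).__name__)  # ZeroDivisionError
```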
🔹 2.4 Logging Exceptions (Instead of Printing)

```python
import logging

logging.basicConfig(level=logging.ERROR)

try:
    1 / 0
except ZeroDivisionError:
    logging.error("Division error occurred", exc_info=True)
```

🧠 Why logging?

- `print()` is fine for small scripts.
- `logging` is ideal for production, debugging, and persistent logs.
🔹 2.5 Best Practices for Exception Handling

✅ Do:

- Handle specific exceptions (`ValueError`, not just `Exception`)
- Keep `try` blocks small
- Use logging instead of `print`
- Document custom exceptions
- Catch exceptions only when you can handle or report them

❌ Avoid:

- Catching broad `Exception` unless at the top level
- Swallowing errors silently
- Overusing nested try blocks
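The practices above can be combined in one small sketch (the function and messages are illustrative): a tiny `try` block, a specific exception, and logging rather than printing:

```python
import logging

logging.basicConfig(level=logging.ERROR)

def parse_quantity(raw):
    """Parse a quantity string, returning None on bad input."""
    try:
        return int(raw)  # keep the try block to the one risky call
    except ValueError:   # specific exception, not bare Exception
        logging.error("Could not parse quantity: %r", raw)
        return None

print(parse_quantity("42"))   # 42
print(parse_quantity("ten"))  # None
```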
✅ Module 3: Error Handling in File I/O and APIs
🔹 3.1 Handling File I/O Errors

When working with files, errors like missing files or permission issues are common.

```python
try:
    with open("data.csv", "r") as file:
        content = file.read()
except FileNotFoundError:
    print("The file does not exist.")
except PermissionError:
    print("Permission denied.")
except Exception as e:
    print(f"Unexpected error: {e}")
```

🧠 Best Practice: Use `with open(...)` to auto-close files.
🔹 3.2 Reading CSV/JSON Safely

```python
import csv

try:
    with open("data.csv", newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)
except Exception as e:
    print("Error reading CSV:", e)
```

```python
import json

try:
    with open("data.json") as f:
        data = json.load(f)
except FileNotFoundError:
    print("data.json not found")
except json.JSONDecodeError:
    print("JSON is malformed")
```
🔹 3.3 Writing Files with Care

```python
try:
    with open("output.txt", "w") as f:
        f.write("Sample output")
except IOError as e:
    print("Write error:", e)
```

🧠 Always handle `IOError` (an alias of `OSError` since Python 3.3) when writing files (e.g., disk full, permission denied).
🔹 3.4 Error Handling with APIs using requests

```python
import requests

try:
    response = requests.get("https://api.example.com/data", timeout=5)
    response.raise_for_status()  # Raises HTTPError for bad responses
    data = response.json()
except requests.exceptions.HTTPError as errh:
    print("HTTP error:", errh)
except requests.exceptions.ConnectionError as errc:
    print("Connection error:", errc)
except requests.exceptions.Timeout as errt:
    print("Timeout error:", errt)
except requests.exceptions.RequestException as err:
    print("Something went wrong:", err)
```

🧠 Use `.raise_for_status()` to trigger exceptions for 4xx/5xx responses.
🔹 3.5 Custom Wrapper Function for Safe API Calls

```python
import requests

def safe_api_call(url):
    try:
        res = requests.get(url, timeout=5)
        res.raise_for_status()
        return res.json()
    except Exception as e:
        print(f"API failed: {e}")
        return None
```

This makes API handling reusable and robust.
✅ Module 4: Data Handling Errors in Pandas and NumPy
🔹 4.1 Common Errors in Pandas

| Error | Cause/Example |
|---|---|
| `KeyError` | Accessing a missing column/index |
| `IndexError` | Accessing an out-of-bound row index |
| `ValueError` | Mismatch during assignment or reshaping |
| `TypeError` | Operations on incompatible datatypes |
🔹 4.2 Handling Missing Values (NaN, None)

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["A", "B", np.nan],
    "score": [90, np.nan, 80]
})

# Detect missing values
print(df.isnull())

# Drop rows with missing values
df_clean = df.dropna()

# Fill missing values with a placeholder
df_filled = df.fillna("Unknown")
```

🧠 Tip: Always check for NaN before doing aggregations.
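For example, most pandas aggregations skip NaN silently, so an average can be computed over fewer rows than you expect. Using the `df` above:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["A", "B", np.nan],
    "score": [90, np.nan, 80]
})

# mean() skips NaN by default: this averages only 2 of the 3 rows
print(df["score"].mean())  # 85.0

# Make the gap explicit before trusting the aggregate
missing = df["score"].isnull().sum()
print(f"{missing} missing score value(s)")
```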
🔹 4.3 Safely Accessing Columns

```python
if "age" in df.columns:
    print(df["age"])
else:
    print("Column 'age' does not exist")
```

Use `.get()` for dictionaries and column-safe logic for DataFrames.
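Both patterns side by side; `DataFrame.get()` mirrors `dict.get()` by returning a default instead of raising `KeyError`:

```python
import pandas as pd

record = {"name": "A"}
df = pd.DataFrame({"name": ["A", "B"]})

# dict.get: returns the default instead of raising KeyError
print(record.get("age", "missing"))  # missing

# DataFrame.get: returns None (or a supplied default) for an absent column
print(df.get("age") is None)  # True
```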
🔹 4.4 NumPy Array Shape and Type Errors

```python
import numpy as np

arr = np.array([1, 2, 3])

try:
    arr.reshape(2, 2)
except ValueError as e:
    print("Reshape error:", e)
```

🧠 Always validate shape compatibility before reshaping or broadcasting.
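One simple validation: a reshape only works when the element count matches the target shape, so compare `arr.size` first (the helper name is illustrative):

```python
import numpy as np

def safe_reshape(arr, rows, cols):
    """Reshape only when the element counts match; otherwise return None."""
    if arr.size != rows * cols:
        print(f"Cannot reshape {arr.size} elements into {rows}x{cols}")
        return None
    return arr.reshape(rows, cols)

arr = np.array([1, 2, 3, 4])
print(safe_reshape(arr, 2, 2))  # 4 elements fit a 2x2 array
print(safe_reshape(arr, 3, 2))  # None: 4 elements cannot fill 3x2
```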
🔹 4.5 Type Conversion and Casting Errors

```python
try:
    df["score"] = df["score"].astype(int)
except ValueError as e:
    print("Type casting failed:", e)
```

Use `pd.to_numeric(df["col"], errors="coerce")` to handle bad conversions gracefully.
🔹 4.6 Try-Except Around Data Transformations

```python
def safe_transform(df):
    try:
        df["score"] = df["score"].fillna(0).astype(int)
        return df
    except Exception as e:
        print("Data transformation error:", e)
        return df
```
🔹 4.7 Chained Indexing Warnings (Best Practice)

```python
# Not recommended — may lead to SettingWithCopyWarning
df[df['name'] == 'A']['score'] = 95

# Recommended
df.loc[df['name'] == 'A', 'score'] = 95
```

🧠 Use `.loc[]` or `.iloc[]` for assignment to avoid ambiguous behavior.
🔹 4.8 Debugging Unexpected Results

Use `print(df.dtypes)` and `df.head()` before and after operations to trace silent failures (like unexpected NaNs).
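A sketch of that workflow, using a hypothetical `score` column: inspecting dtypes before and after a conversion exposes values that were silently coerced to NaN:

```python
import pandas as pd

df = pd.DataFrame({"score": ["90", "85", "n/a"]})
print(df.dtypes)   # score is object (strings), not numeric yet

df["score"] = pd.to_numeric(df["score"], errors="coerce")
print(df.dtypes)   # score is now float64
print(df.head())   # "n/a" silently became NaN, visible here

# Count how many values the coercion dropped
print(df["score"].isnull().sum())  # 1
```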
🧪 Practice Tasks

- Load a CSV with some missing data and apply `.fillna()` safely.
- Try reshaping a NumPy array incorrectly and handle the error.
- Write a function that converts a column to numeric and handles type casting failures using `to_numeric(..., errors="coerce")`.
- Attempt to access a missing DataFrame column and handle it gracefully.
✅ Module 5: Error Handling in Data Cleaning Pipelines
🔹 5.1 Parsing Errors in Data (Dates, Numbers, etc.)

```python
import pandas as pd

data = pd.DataFrame({
    "date": ["2025-01-01", "not_a_date", "2024-12-31"]
})

# Handle with errors='coerce'
data["parsed_date"] = pd.to_datetime(data["date"], errors="coerce")
```

🧠 Tip: Use `errors="coerce"` in `to_datetime` and `to_numeric` to avoid hard crashes; invalid values become `NaT`/`NaN` instead of raising.
🔹 5.2 Safe apply() Functions

Custom transformations inside `apply()` can break if data is dirty, so use try-except inside them.

```python
def safe_parse(x):
    try:
        return int(x)
    except (TypeError, ValueError):
        return None

df["converted"] = df["raw_col"].apply(safe_parse)
```
🔹 5.3 Skipping or Logging Bad Records

While looping over records (e.g., row-wise ops), handle bad rows with logging:

```python
import logging

logging.basicConfig(filename="bad_rows.log", level=logging.ERROR)

def process_row(row):
    try:
        # risky transformation
        return row["a"] / row["b"]
    except Exception as e:
        logging.error(f"Row failed: {row.to_dict()} | Error: {e}")
        return None

df["result"] = df.apply(process_row, axis=1)
```
🔹 5.4 Handling Duplicates with Grace

```python
try:
    df = df.drop_duplicates()
except Exception as e:
    print("Duplicate removal failed:", e)
```
🔹 5.5 Cleaning Pipelines with Function Wrappers

You can wrap multiple steps in a pipeline-like cleaning function with full safety:

```python
def clean_data(df):
    try:
        df["price"] = pd.to_numeric(df["price"], errors="coerce")
        df["date"] = pd.to_datetime(df["date"], errors="coerce")
        df.dropna(subset=["price", "date"], inplace=True)
        return df
    except Exception as e:
        print("Cleaning pipeline failed:", e)
        return df
```
🔹 5.6 Error-Handled ETL Mini-Pipeline

```python
def load_and_clean(path):
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        print("File not found")
        return None
    try:
        df = clean_data(df)
    except Exception as e:
        print("Cleaning failed:", e)
    return df
```

🧠 This is the kind of pattern you’ll use in production ETL jobs and notebooks.
✅ Module 6: Error Handling in Machine Learning Pipelines
🔹 6.1 Handling Train-Test Split Issues

```python
from sklearn.model_selection import train_test_split

try:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
except ValueError as e:
    print("Train-test split failed:", e)
```

🧠 Common issues:

- Mismatch in `X` and `y` lengths
- Using `stratify` when some target classes have too few samples
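Both failure modes can be caught up front with cheap checks before calling `train_test_split`; this standalone helper (name and threshold are illustrative) needs no scikit-learn to run:

```python
from collections import Counter

def check_split_inputs(X, y, min_class_count=2):
    """Return a problem description, or None if the inputs look splittable."""
    if len(X) != len(y):
        return f"Length mismatch: {len(X)} samples vs {len(y)} labels"
    rarest = min(Counter(y).values())
    if rarest < min_class_count:
        return f"Rarest class has only {rarest} sample(s); stratify may fail"
    return None

print(check_split_inputs([[1], [2], [3]], [0, 1]))             # length mismatch
print(check_split_inputs([[1], [2], [3]], [0, 0, 1]))          # rare class
print(check_split_inputs([[1], [2], [3], [4]], [0, 0, 1, 1]))  # None
```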
🔹 6.2 Catching Errors in Model Training

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

try:
    model.fit(X_train, y_train)
except ValueError as e:
    print("Model training error:", e)
```

🧠 Always check:

- Input types (pandas vs NumPy)
- Missing values
- Feature shapes
🔹 6.3 Handling Scikit-learn Pipeline Failures

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier())
])

try:
    pipeline.fit(X_train, y_train)
except Exception as e:
    print("Pipeline training failed:", e)
```

🧠 Watch for:

- Mismatched input types
- Missing values not handled before `StandardScaler`
🔹 6.4 Catching Hyperparameter Search Errors

```python
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [10, 50, 100]}
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)

try:
    grid.fit(X_train, y_train)
except Exception as e:
    print("GridSearchCV failed:", e)
```

🧠 Common mistakes:

- Wrong parameter names
- Empty grids
- Invalid scoring functions
🔹 6.5 Handling Warnings like ConvergenceWarning

```python
import warnings
from sklearn.exceptions import ConvergenceWarning

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    model.fit(X_train, y_train)
```

🧠 You can also choose to log warnings or elevate them to errors in CI systems.
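To elevate, flip the filter action from "ignore" to "error": the warning is then raised as an exception and fails the run loudly. A sketch using a plain `UserWarning` so it runs without scikit-learn:

```python
import warnings

def noisy_step():
    # Stand-in for a model fit that emits a warning
    warnings.warn("model did not converge", UserWarning)

with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)  # escalate to an exception
    try:
        noisy_step()
    except UserWarning as w:
        print(f"CI build would fail here: {w}")
```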
🔹 6.6 Saving and Loading Models with Care

```python
import joblib

# Saving
try:
    joblib.dump(model, "model.pkl")
except Exception as e:
    print("Model save failed:", e)

# Loading
try:
    model = joblib.load("model.pkl")
except FileNotFoundError:
    print("Model file not found")
```
🔹 6.7 Wrap It in a Reusable Train Function

```python
def train_model(X_train, y_train):
    try:
        model = RandomForestClassifier()
        model.fit(X_train, y_train)
        return model
    except Exception as e:
        print("Training failed:", e)
        return None
```
✅ Module 7: Debugging and Logging Techniques
🔹 7.1 Why Logging > Printing

Problems with `print()`:

- Output disappears unless you're watching the console.
- It doesn't work well in production or background jobs.

Logging advantages:

- Persistent (can save to files)
- Different severity levels
- Timestamped messages
- Better traceability
🔹 7.2 Basic Logging Setup

```python
import logging

logging.basicConfig(level=logging.INFO)

logging.info("Pipeline started")
logging.warning("This might be a problem")
logging.error("Something went wrong")
```

You can write logs to a file:

```python
logging.basicConfig(filename='app.log', level=logging.DEBUG,
                    format='%(asctime)s - %(levelname)s - %(message)s')
```

🧠 Levels: DEBUG < INFO < WARNING < ERROR < CRITICAL
🔹 7.3 Logging Exceptions

```python
try:
    1 / 0
except ZeroDivisionError:
    logging.exception("Division failed")
```

🧠 `logging.exception` automatically includes the full traceback.
🔹 7.4 Setting Up Module-Specific Logs

```python
logger = logging.getLogger("DataCleaner")
logger.setLevel(logging.DEBUG)
logger.debug("Starting cleaning process")
```

Use different loggers for different pipeline stages.
🔹 7.5 Debugging with pdb (Python Debugger)

Start an interactive debugging session:

```python
import pdb

def buggy_function():
    x = 10
    y = 0
    pdb.set_trace()
    print(x / y)

buggy_function()
```

🧠 Inside pdb, use commands:

- `n`: next line
- `c`: continue
- `p var`: print variable
- `q`: quit
🔹 7.6 Using traceback for Custom Logs

```python
import traceback

try:
    1 / 0
except Exception:
    print(traceback.format_exc())
```

🧠 Useful when you want custom error formatting instead of a full crash.
🔹 7.7 Good Logging Practices

✅ Do:

- Use logging in place of `print` in production
- Log meaningful messages, not just "Error occurred"
- Keep debug logs in development, warn/error logs in production

❌ Avoid:

- Logging sensitive information (API keys, passwords)
- Logging inside tight loops (it may slow things down)
🔹 7.8 Logging in Jupyter Notebooks

```python
import logging

logging.basicConfig(level=logging.INFO, force=True)
logging.info("Notebook log works!")
```

🧠 Use `force=True` (Python 3.8+) to reset any existing configuration inside notebooks.
✅ Module 8: Testing for Failures in Python & Data Science Pipelines
Testing isn’t just about checking if your code works — it’s about verifying it fails gracefully when something goes wrong.
🔹 8.1 Why Test for Failures?

- Ensures robustness
- Prevents silent bugs in production
- Helps future-proof your code

You’ll use either `unittest` (built-in) or `pytest` (popular in data teams).
🔹 8.2 Basic Test with unittest

```python
import unittest

def divide(a, b):
    return a / b

class TestMathOps(unittest.TestCase):
    def test_divide_success(self):
        self.assertEqual(divide(10, 2), 5)

    def test_divide_by_zero(self):
        with self.assertRaises(ZeroDivisionError):
            divide(10, 0)

if __name__ == '__main__':
    unittest.main()
```
🔹 8.3 Using pytest (Simpler, Recommended)

Install it:

```shell
pip install pytest
```

```python
# test_math.py
import pytest

def divide(a, b):
    return a / b

def test_divide_success():
    assert divide(10, 2) == 5

def test_divide_by_zero():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)
```

Run with:

```shell
pytest test_math.py
```
🔹 8.4 Test Error Handling in Data Functions

```python
def convert_to_int(value):
    try:
        return int(value)
    except ValueError:
        return None

def test_convert_valid():
    assert convert_to_int("123") == 123

def test_convert_invalid():
    assert convert_to_int("abc") is None
```
🔹 8.5 Mocking Error Scenarios with unittest.mock

```python
from unittest.mock import patch
import pytest
import requests

def fetch_data(url):
    return requests.get(url).json()

@patch('requests.get')
def test_fetch_error(mock_get):
    mock_get.side_effect = requests.exceptions.RequestException
    with pytest.raises(requests.exceptions.RequestException):
        fetch_data("https://api.fake.com")
```

🧠 Use mocking to simulate:

- API failures
- File not found
- Model loading errors
🔹 8.6 Testing Cleaning Functions with Edge Cases

```python
def clean_age(x):
    try:
        age = int(x)
        if age < 0 or age > 120:
            return None
        return age
    except (TypeError, ValueError):
        return None

def test_clean_age_valid():
    assert clean_age("25") == 25

def test_clean_age_negative():
    assert clean_age("-5") is None

def test_clean_age_string():
    assert clean_age("abc") is None
```
🔹 8.7 Tips for Failure-Oriented Testing

✅ Do:

- Write tests for expected failure cases
- Use `pytest.raises()` or `unittest`'s `assertRaises`
- Test edge cases and dirty data

❌ Don’t:

- Only test happy paths
- Swallow errors silently