MLOPS 1 - Interview questions

 

MLOps Lifecycle: Core Concepts & Principles (Q1-Q20) ⚙️

  1. Q: What is MLOps?

    • A: MLOps is a set of practices that combines Machine Learning, DevOps, and Data Engineering to reliably and efficiently deploy and maintain ML systems in production.

  2. Q: What is the main goal of MLOps?

    • A: The main goal is to bridge the gap between model development (training) and deployment in a scalable and repeatable way.

  3. Q: Name the key stages of a typical MLOps lifecycle.

    • A: Data Collection & Preparation, Model Development, Experiment Tracking, Model Training, Model Versioning, Model Deployment, Monitoring & Governance.

  4. Q: How does MLOps differ from traditional DevOps?

    • A: MLOps includes additional complexities like data versioning, model retraining, and monitoring for data and concept drift, which are not present in traditional software deployment.

  5. Q: Why is data versioning crucial in MLOps?

    • A: Data versioning ensures that the model can be reproduced using the exact dataset it was trained on, which is essential for debugging and auditing.

  6. Q: What is "data drift"?

    • A: Data drift occurs when the statistical properties of the production input data change over time relative to the training data, which can degrade the model's performance.
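
Drift is typically detected by comparing the distribution of live data against a reference (training) sample. Below is a minimal, standard-library-only sketch of the two-sample Kolmogorov-Smirnov statistic as a drift signal; in practice you would use something like `scipy.stats.ks_2samp` or a dedicated monitoring tool, so treat this as an illustration of the idea, not production code.

```python
# Minimal drift check: the two-sample KS statistic is the maximum
# absolute gap between the empirical CDFs of two samples.

def empirical_cdf(sample, x):
    """Fraction of values in `sample` that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_statistic(reference, production):
    """Max absolute difference between the two empirical CDFs."""
    points = sorted(set(reference) | set(production))
    return max(
        abs(empirical_cdf(reference, x) - empirical_cdf(production, x))
        for x in points
    )

# Identical samples score 0; fully disjoint samples score 1.0.
training_batch = [0.1, 0.2, 0.3, 0.4, 0.5]
live_batch = [0.9, 1.0, 1.1, 1.2, 1.3]
drift_score = ks_statistic(training_batch, live_batch)
print(drift_score)  # 1.0 for fully disjoint samples
```

A monitoring job would compute this score on a schedule and raise an alert when it crosses a chosen threshold.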

  7. Q: What is "concept drift"?

    • A: Concept drift occurs when the relationship between the input features and the target variable changes over time, even if the input distribution itself stays the same.

  8. Q: What is the purpose of a Feature Store?

    • A: A Feature Store is a centralized repository for managing and serving machine learning features. It ensures consistency between training and serving data.

  9. Q: Why is automated retraining a key MLOps practice?

    • A: Automated retraining is used to keep the model's performance from degrading due to data or concept drift without manual intervention.

  10. Q: What is the "reproducibility" problem in ML?

    • A: The reproducibility problem is the difficulty of obtaining the exact same results when re-running the same code with the same data and configuration at a later time.

  11. Q: How does MLOps help with reproducibility?

    • A: By using version control for code, data, and models, and by tracking every experiment's parameters and metrics.

  12. Q: What is the role of a data scientist in an MLOps team?

    • A: The data scientist focuses on model development, algorithm selection, and feature engineering.

  13. Q: What is the role of an ML Engineer in an MLOps team?

    • A: The ML Engineer focuses on building the production-ready ML pipelines, deploying models, and building the necessary infrastructure.

  14. Q: What is the "reproducibility crisis" and how does MLOps address it?

    • A: The reproducibility crisis refers to the difficulty of reproducing scientific results. MLOps addresses it by enforcing strict versioning and tracking of all components.

  15. Q: What is CI/CD/CT in MLOps?

    • A: CI (Continuous Integration) integrates code changes. CD (Continuous Delivery/Deployment) automates model deployment. CT (Continuous Training) automates model retraining.

  16. Q: What are the main challenges in MLOps?

    • A: Challenges include managing diverse dependencies, versioning large datasets, monitoring models in production, and ensuring low latency.

  17. Q: How do you handle model governance in MLOps?

    • A: Model governance involves tracking model lineage, managing approvals, and maintaining an audit trail for regulatory compliance.

  18. Q: Why is containerization (e.g., Docker) important in MLOps?

    • A: Containerization packages the model and its dependencies into a single unit, ensuring it runs consistently across different environments.

  19. Q: What is "Model Monitoring"?

    • A: Model Monitoring is the process of tracking a model's performance in production, including its predictions, latency, and resource usage.

  20. Q: What is an "offline" vs. "online" serving environment for ML models?

    • A: Offline serving is for batch predictions (e.g., daily recommendations). Online serving is for real-time, low-latency predictions (e.g., fraud detection).


Version Control: DVC, MLflow, and Model Registries (Q21-Q40) 🔄

  1. Q: What is the purpose of a version control system in MLOps?

    • A: To track and manage changes to code, data, and models, enabling collaboration and reproducibility.

  2. Q: How does Git fit into an MLOps workflow?

    • A: Git is used to version control the code, scripts, and configuration files of an ML project.

  3. Q: What is the main problem with using Git for data and models?

    • A: Git is designed for small text files and is inefficient at handling large binary files like datasets and trained models.

  4. Q: What is DVC (Data Version Control)?

    • A: DVC is an open-source tool built on top of Git that versions large files like data and models without storing them in the Git repository.

  5. Q: How does DVC work with Git?

    • A: DVC stores small pointer files to the data in Git, while the actual data is stored remotely in cloud storage or on a shared server.

  6. Q: What command would you use to add a dataset to DVC?

    • A: dvc add data/raw_data.csv

  7. Q: What is the purpose of the .dvc file?

    • A: The .dvc file is a small text file that contains metadata (e.g., file hash, size) for the versioned data. This is what gets committed to Git.
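
The core idea behind the pointer file is content addressing: hash the data file and keep only the small hash record under Git. The sketch below illustrates this with the standard library; the field names are simplified and are not DVC's exact schema.

```python
# Illustrative sketch of a .dvc-style pointer record: hash the
# file content, keep the tiny pointer in Git, and let the bytes
# live in remote storage addressed by that hash.

import hashlib

def make_pointer(data: bytes, path: str) -> dict:
    """Return a small pointer record for the given file content."""
    return {
        "path": path,
        "md5": hashlib.md5(data).hexdigest(),
        "size": len(data),
    }

pointer = make_pointer(b"hello", "data/raw_data.csv")
print(pointer["md5"])  # 5d41402abc4b2a76b9719d911017c592
```

Because the pointer contains the hash, checking out an old Git commit tells DVC exactly which data version to fetch with `dvc pull`.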

  8. Q: How do you pull a DVC-versioned dataset from a remote storage?

    • A: dvc pull

  9. Q: What is MLflow?

    • A: MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, project packaging, and model management.

  10. Q: What are the four main components of MLflow?

    • A: MLflow Tracking, MLflow Projects, MLflow Models, and MLflow Model Registry.

  11. Q: What is MLflow Tracking used for?

    • A: MLflow Tracking is used to log parameters, metrics, artifacts (e.g., models), and source code for each experiment run.

  12. Q: What is a "run" in MLflow?

    • A: A run is a single execution of a piece of machine learning code. It is the primary unit of organization in MLflow Tracking.

  13. Q: How do you log a metric (e.g., accuracy) in MLflow?

    • A: mlflow.log_metric("accuracy", 0.95)

  14. Q: What is an MLflow Model Registry?

    • A: An MLflow Model Registry is a centralized repository to collaboratively manage the lifecycle of an MLflow Model, including versioning and stage transitions (e.g., from Staging to Production).

  15. Q: How would you register a model using MLflow?

    • A: mlflow.register_model("runs:/<run_id>/<artifact_path>", "ModelName")

  16. Q: What is the benefit of using a Model Registry?

    • A: It provides a clear, central place to manage all model versions, allowing teams to collaborate and deploy models confidently.

  17. Q: How does DVC differ from MLflow?

    • A: DVC is a data and model versioning tool that works with Git. MLflow is a full-lifecycle management platform, with tracking, model registry, and project packaging capabilities.

  18. Q: Can DVC and MLflow be used together?

    • A: Yes, they are complementary. You can use DVC to version your data and models, and then use MLflow to track the experiments and manage the model's lifecycle in a registry.

  19. Q: What is the main purpose of a Model Registry?

    • A: To serve as a single source of truth for all deployed models, tracking their versions, metadata, and status.

  20. Q: What is "artifact" in the context of MLflow?

    • A: An artifact is any file generated by a run that you want to save, such as the trained model, a plot, or a text file.


Experiment Tracking: Importance & Tools (Q41-Q60) 📊

  1. Q: What is Experiment Tracking?

    • A: Experiment Tracking is the practice of systematically logging and organizing all the details of your ML experiments, including code, data, hyperparameters, and results.

  2. Q: Why is experiment tracking so important?

    • A: It helps you understand which models perform best, ensures reproducibility, and provides a clear audit trail of your development process.

  3. Q: Name three popular experiment tracking tools.

    • A: MLflow, Weights & Biases (W&B), and Comet ML.

  4. Q: What information should you track for each experiment?

    • A: Hyperparameters, evaluation metrics (accuracy, loss), the version of the data, the code version, and any other relevant artifacts.
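
To make the list above concrete, here is a toy, in-memory tracker showing the kind of record tools like MLflow or W&B keep per run. It mimics the shape of their logging calls but is not any real tool's API.

```python
# Toy experiment tracker: one Run object holds everything needed
# to reproduce and compare an experiment.

import uuid

class Run:
    def __init__(self, git_commit: str):
        self.run_id = uuid.uuid4().hex   # unique run identifier
        self.git_commit = git_commit     # code version, for reproducibility
        self.params = {}                 # inputs: learning rate, epochs, ...
        self.metrics = {}                # outputs: accuracy, loss, ...
        self.tags = {}                   # labels used to filter runs

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

    def set_tag(self, key, value):
        self.tags[key] = value

run = Run(git_commit="a1b2c3d")
run.log_param("learning_rate", 0.01)
run.log_metric("accuracy", 0.95)
run.set_tag("stage", "testing")
print(run.metrics["accuracy"])  # 0.95
```

Comparing two such records side by side is exactly what a tracking dashboard automates at scale.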

  5. Q: How does an experiment tracking tool help with collaboration?

    • A: It provides a centralized dashboard where team members can view, compare, and reproduce each other's experiments.

  6. Q: How can you compare different experiments using an experiment tracking tool?

    • A: Most tools provide a dashboard that allows you to plot metrics from different runs on a single chart and sort them by performance.

  7. Q: What is a "run ID" in an experiment tracking system?

    • A: A run ID is a unique identifier for each experiment. It's used to retrieve and inspect the details of a specific run.

  8. Q: What is the benefit of logging code version (e.g., Git commit hash) in an experiment?

    • A: It ensures that you can always go back to the exact code that was used to train a specific model, guaranteeing reproducibility.

  9. Q: What is the difference between experiment tracking and model monitoring?

    • A: Experiment tracking is for the development phase to track model performance. Model monitoring is for the production phase to track model performance and health after deployment.

  10. Q: How does logging an artifact (like a trained model) in MLflow differ from using Git LFS or DVC?

    • A: Logging an artifact in MLflow links the model file to a specific run, while Git LFS/DVC version the file independently of the experiment that produced it.

  11. Q: What is a "dashboard" in the context of experiment tracking?

    • A: A dashboard is a web-based interface that provides visualizations and summaries of all your experiments, making it easy to compare results.

  12. Q: How does experiment tracking help with hyperparameter tuning?

    • A: It allows you to track and compare the metrics of different hyperparameter combinations, helping you find the optimal set of values.

  13. Q: What is a "parameter" in the context of an MLflow run?

    • A: A parameter is a key-value pair used to record an input to your run, such as the learning rate or the number of epochs.

  14. Q: Can I log a custom image or plot to an experiment tracking tool?

    • A: Yes, most tools allow you to log various artifacts, including plots generated during analysis.

  15. Q: How does MLflow's Projects component help with experiment tracking?

    • A: It provides a convention for packaging code in a reusable and reproducible way, ensuring consistent experiment execution across different machines.

  16. Q: What is the main benefit of using a hosted experiment tracking service (like W&B) over a local one?

    • A: Hosted services provide a centralized, shareable platform for teams, with no setup required and a user-friendly UI.

  17. Q: What is the run.log() function in a tracking tool?

    • A: A generic function to log any kind of data to an experiment, often used for logging custom metrics or artifacts.

  18. Q: What is the purpose of logging a "tag" in an experiment?

    • A: Tags are used for labeling and grouping experiments. For example, you can tag runs with "production" or "testing" to easily filter them.

  19. Q: How can experiment tracking help in a debugging process?

    • A: By tracking all the details of each run, you can compare a failed run with a successful one to identify what went wrong (e.g., a change in hyperparameters).

  20. Q: What is the difference between a metric and an artifact?

    • A: A metric is a numerical value (e.g., accuracy, loss). An artifact is a file (e.g., the model file, a plot).


Pipeline Orchestration: Automation & Scheduling (Q61-Q80) 🔄

  1. Q: What is a "pipeline" in MLOps?

    • A: An ML pipeline is a series of automated steps that represent the end-to-end ML workflow, from data ingestion to model deployment.

  2. Q: Why is pipeline orchestration important in MLOps?

    • A: It automates the entire ML workflow, ensuring consistency, efficiency, and scalability, and eliminating manual steps.

  3. Q: Name three popular pipeline orchestration tools.

    • A: Apache Airflow, Kubeflow Pipelines, and Prefect.

  4. Q: What is a "DAG" in pipeline orchestration?

    • A: A DAG (Directed Acyclic Graph) is a visual representation of a pipeline. It defines the sequence of tasks and their dependencies.
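
Given a DAG, an orchestrator derives a valid execution order by topologically sorting the tasks. Python's standard-library `graphlib` can show the idea directly; the task names below are illustrative, not from any particular pipeline.

```python
# Sketch of how an orchestrator turns task dependencies into a
# valid execution order, using the standard-library graphlib.

from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
dag = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # ['ingest', 'preprocess', 'train', 'evaluate', 'deploy']
```

Real orchestrators add scheduling, retries, and parallel execution of independent branches on top of this ordering.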

  5. Q: What is the difference between a "task" and a "pipeline"?

    • A: A task is a single, atomic operation within a pipeline (e.g., "preprocess data"). A pipeline is a collection of interconnected tasks.

  6. Q: How does a pipeline orchestrator handle failures?

    • A: Orchestrators can be configured to automatically retry a failed task or send a notification to a team when a task fails.

  7. Q: What is the purpose of a "scheduler" in a pipeline orchestrator?

    • A: The scheduler is the component that triggers the execution of pipelines based on a predefined schedule (e.g., every day at midnight).

  8. Q: What is the main benefit of using a managed orchestration service (like Google Cloud Composer)?

    • A: It removes the operational overhead of setting up and managing the orchestrator's infrastructure.

  9. Q: How does a pipeline orchestrator help with reproducibility?

    • A: By automating the entire process, it ensures that every time the pipeline runs, the exact same steps are executed, reducing human error.

  10. Q: What is the difference between a "stateless" and a "stateful" pipeline?

    • A: A stateless pipeline processes each run's data independently of previous runs. A stateful pipeline carries information over from previous runs, which can complicate reproducibility.

  11. Q: How can you trigger a pipeline run?

    • A: You can trigger a run manually, on a schedule, or based on an external event (e.g., a new file arriving in a data lake).

  12. Q: What is a "trigger" in the context of an ML pipeline?

    • A: A trigger is an event or a condition that starts a pipeline run.

  13. Q: What is the purpose of "dependencies" in a pipeline?

    • A: Dependencies define the order in which tasks must be executed. A task cannot start until its dependent tasks are complete.

  14. Q: How can you visualize a pipeline's progress?

    • A: Orchestration tools provide a web-based UI that shows the real-time status of each task in a pipeline.

  15. Q: What is the purpose of a "data artifact" in a pipeline?

    • A: A data artifact is the output of one task that serves as the input for a subsequent task. It ensures a clear hand-off between pipeline steps.

  16. Q: How do you handle secrets (e.g., API keys) in a pipeline?

    • A: Secrets should be stored in a secure secret management system and accessed by the pipeline through a secure connection.

  17. Q: What is the role of a "Kubeflow Pipeline" in MLOps?

    • A: Kubeflow Pipelines is a platform for building and deploying portable, scalable ML pipelines on Kubernetes.

  18. Q: How does Apache Airflow handle tasks?

    • A: Airflow uses Python to define DAGs, and each task is an instance of an Operator that represents a specific action.

  19. Q: What is the benefit of using an orchestrator for retraining a model?

    • A: It automates the entire retraining process, from data fetching and preprocessing to model training and deployment, ensuring a continuous loop.

  20. Q: What is the difference between a pipeline orchestrator and a CI/CD tool?

    • A: A CI/CD tool (e.g., Jenkins) focuses on code integration and deployment. A pipeline orchestrator is designed specifically for the complex dependencies and data flow of ML workflows.


General & Advanced MLOps Concepts (Q81-Q100) 💡

  1. Q: What is a "Model Registry"?

    • A: A Model Registry is a centralized hub for managing the lifecycle of ML models, including versioning, metadata, and stage transitions.
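
A minimal in-memory sketch can make the registry concept concrete: named models with append-only version numbers and a mutable stage per version. This is simplified for illustration; real registries (e.g. MLflow's) add metadata, lineage, and access control.

```python
# Toy Model Registry: versions are immutable and append-only,
# while each version's stage (None/Staging/Production/Archived)
# can be transitioned over time.

class ModelRegistry:
    def __init__(self):
        self._models = {}  # name -> list of {"version", "stage", "uri"}

    def register(self, name: str, uri: str) -> int:
        versions = self._models.setdefault(name, [])
        version = len(versions) + 1  # next version number
        versions.append({"version": version, "stage": "None", "uri": uri})
        return version

    def transition(self, name: str, version: int, stage: str):
        self._models[name][version - 1]["stage"] = stage

    def get_stage(self, name: str, version: int) -> str:
        return self._models[name][version - 1]["stage"]

registry = ModelRegistry()
v1 = registry.register("churn-model", "s3://models/churn/1")
registry.transition("churn-model", v1, "Production")
print(registry.get_stage("churn-model", v1))  # Production
```

Serving systems then ask the registry "which version of `churn-model` is in Production?" rather than hard-coding a model file path.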

  2. Q: What is "Model serving"?

    • A: Model serving is the process of deploying a trained model so that it can receive input data and return predictions.

  3. Q: Name two common model serving frameworks.

    • A: TensorFlow Serving and TorchServe.

  4. Q: What is "A/B Testing" in MLOps?

    • A: A/B Testing is a method for comparing two versions of a model by exposing a portion of traffic to each version to determine which performs better.

  5. Q: How does MLOps help with cost optimization?

    • A: By automating pipelines and using scalable infrastructure, MLOps reduces manual effort and optimizes resource usage.

  6. Q: What is a "reproducible environment"?

    • A: A reproducible environment is a consistent and isolated environment (e.g., a Docker container) that contains all the necessary dependencies to run a project.

  7. Q: What is the difference between a CI/CD pipeline and an MLOps pipeline?

    • A: An MLOps pipeline adds steps like data validation, model training, and model serving to the standard CI/CD process.

  8. Q: How would you deal with a "data drift" alert?

    • A: You would investigate the data changes, potentially retrain the model on the new data, and deploy the new version.

  9. Q: What is the role of a "feature store"?

    • A: A feature store acts as a centralized database for both offline training and online serving features, ensuring consistency.

  10. Q: What is the benefit of using a "ML Metadata Store"?

    • A: It stores metadata about every run, from data lineage to experiment results, providing a comprehensive audit trail.

  11. Q: What are the main components of a model monitoring system?

    • A: Components include data drift detection, model performance tracking, and anomaly detection.

  12. Q: What is "CI/CD for ML"?

    • A: A system that automates the testing and deployment of ML code and models, similar to traditional software but with ML-specific considerations.

  13. Q: What is a "canary deployment" in MLOps?

    • A: Canary deployment is a strategy where a new version of the model is deployed to a small subset of users before a full rollout.
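
One common way to implement the traffic split is deterministic hashing of a user ID into buckets, so the same user consistently hits the same model version across requests. The sketch below is purely illustrative and uses only the standard library.

```python
# Deterministic canary routing: hash each user ID into one of 100
# buckets and send a fixed fraction of buckets to the new version.

import hashlib

def route(user_id: str, canary_fraction: float = 0.05) -> str:
    """Return which model version should serve this user."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "canary" if bucket < canary_fraction * 100 else "stable"

# The same user always gets the same version between requests.
print(route("user-1") == route("user-1"))  # True
```

If the canary's monitored metrics look healthy, the fraction is raised gradually until the new version serves all traffic; otherwise traffic is routed back to the stable version.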

  14. Q: How does MLOps handle ethical concerns in ML?

    • A: By providing a framework for tracking model lineage and auditing for bias, MLOps helps ensure models are fair and transparent.

  15. Q: What is a "batch prediction" service?

    • A: A service that processes large volumes of data at once to generate predictions, typically on a scheduled basis.

  16. Q: How does a "Model Registry" help in a team with multiple data scientists?

    • A: It provides a single source of truth for which model versions are approved for production, preventing conflicts.

  17. Q: What is the difference between "model versioning" and "model staging"?

    • A: Versioning is about tracking changes over time (version 1, 2, 3). Staging is about categorizing models based on their readiness (e.g., Staging, Production, Archive).

  18. Q: What is the main challenge of "online inference"?

    • A: Ensuring low latency and high throughput for real-time predictions.

  19. Q: What is the purpose of a "rollback" in MLOps?

    • A: Rollback is the process of reverting to a previous, stable version of a model in case of a failure in production.

  20. Q: What is the relationship between MLOps, DevOps, and Data Engineering?

    • A: MLOps can be seen as the intersection of all three: it uses DevOps principles for automation, Data Engineering practices for data pipelines, and ML principles for model development.
