Git Resources
Version control essentials for MLOps
This page introduces Git and GitHub as essential tools for MLOps workflows, covering the core concepts and commands you'll need to version-control your code, data pipelines, and infrastructure throughout this lab.
Why Git for MLOps?
In machine learning projects, reproducibility is everything. Git ensures that:
- every change to your code, configs, and pipeline definitions is tracked
- teammates can collaborate without overwriting each other's work
- you can roll back to a previous version if a model or pipeline breaks
- experiments are linked to specific code commits for full auditability
MLOps without version control is like science without a lab notebook. Git is the foundation everything else is built on.
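One way to link an experiment to a specific commit is to record the current commit hash alongside each training run. The sketch below is a minimal illustration, not the lab's actual tooling: it works in a throwaway repository, and the file names `train.py` and `run_metadata.txt` and the placeholder identity are assumptions made for the demo.

```shell
set -eu

# Throwaway repository so the sketch does not touch a real project.
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main" on any Git version
git config user.name "Lab Student"           # placeholder identity for the demo
git config user.email "student@example.com"

echo 'print("training")' > train.py          # hypothetical training script
git add train.py
git commit --quiet -m "feat: add training script"

# Record which commit produced this run.
commit_sha="$(git rev-parse HEAD)"

# Empty porcelain output means there are no uncommitted changes.
if [ -z "$(git status --porcelain)" ]; then
  tree_state="clean"
else
  tree_state="dirty"
fi

# Hypothetical metadata file stored next to the run's outputs.
printf 'commit=%s\ntree=%s\n' "$commit_sha" "$tree_state" > run_metadata.txt
cat run_metadata.txt
```

Recording whether the working tree was clean matters because a run started from uncommitted changes cannot be reproduced from the commit hash alone.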
Core Git Concepts
Repository (repo)
A repo is the project folder that Git tracks. In this lab, the entire MLOps project lives in one GitHub repository.
Commit
A commit is a snapshot of your changes. Every time you commit, you record what changed, who changed it, and why.
Branch
A branch is an isolated line of development. You work on a feature or fix in a branch without affecting the main codebase.
Pull Request (PR)
A PR is a proposal to merge your branch into the main branch. It enables code review before changes go to production.
Remote
The remote is the version of the repo hosted on GitHub (or Azure Repos). You push your local commits to the remote to share them.
We use GitHub as our remote repository. All pipeline definitions, model training scripts, and infrastructure code are version-controlled here.
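The commit and branch definitions above can be seen in action in a scratch repository. This is a rough sketch under stated assumptions: `config.yaml`, the branch name `feature/tune-lr`, and the placeholder identity are invented for illustration and are not part of the lab repository.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main" on any Git version
git config user.name "Lab Student"           # placeholder identity for the demo
git config user.email "student@example.com"

echo "learning_rate: 0.01" > config.yaml     # hypothetical config file
git add config.yaml
git commit --quiet -m "feat: add training config"

# A commit records who made the change, why, and what changed.
author="$(git log -1 --format='%an')"
subject="$(git log -1 --format='%s')"
changed="$(git diff-tree --root --no-commit-id --name-only -r HEAD)"
echo "who:  $author"
echo "why:  $subject"
echo "what: $changed"

# A branch is isolated: committing here leaves main untouched.
git checkout --quiet -b feature/tune-lr
echo "learning_rate: 0.001" > config.yaml
git commit --quiet -am "feat: lower learning rate"

main_value="$(git show main:config.yaml)"
echo "main still has: $main_value"
```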
Essential Git Commands
Getting started
# Clone the lab repository
git clone https://github.com/shakshi-gandhi/UW-MLOps-Boeing-x-WIC.git
# Check the status of your working directory
git status
# See the commit history
git log --oneline
Making changes
# Stage all changes for commit
git add .
# Commit with a descriptive message
git commit -m "feat: add data ingestion pipeline script"
# Push your changes to the remote
git push origin main
Branching and merging
# Create a new branch and switch to it
git checkout -b feature/data-preprocessing
# Switch back to main
git checkout main
# Merge a feature branch into main
git merge feature/data-preprocessing
Syncing with the remote
# Pull the latest changes from the remote
git pull origin main
# Fetch updates without merging
git fetch origin
Never commit secrets, API keys, or connection strings to a Git repository. Use .gitignore to exclude sensitive files, and use Azure Key Vault to store secrets securely.
Setting Up .gitignore for ML Projects
A .gitignore file tells Git which files to skip. For ML projects, always ignore:
# Python
__pycache__/
*.pyc
*.pyo
venv/
# Jupyter
.ipynb_checkpoints/
# Data and model files (use Azure Storage instead)
*.csv
*.parquet
*.pkl
*.joblib
data/raw/
data/processed/
# Credentials
.env
*.key
secrets.yaml
# OS files
.DS_Store
Thumbs.db
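Before relying on a .gitignore, it can help to confirm that its patterns match the files you expect. The sketch below uses `git check-ignore`, which exits with status 0 when a path is ignored and 1 when it is not. It runs in a scratch repository with only three of the patterns above, and the sample paths are hypothetical.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# A small subset of the ignore rules, enough for the demo.
printf '%s\n' '.env' '*.csv' 'data/raw/' > .gitignore

# Hypothetical project files.
mkdir -p data/raw src
touch .env data/raw/sample.csv src/train.py

# Report which paths the ignore rules cover.
for path in .env data/raw/sample.csv src/train.py; do
  if git check-ignore -q "$path"; then
    echo "ignored:     $path"
  else
    echo "not ignored: $path"
  fi
done
```

Running `git check-ignore -v <path>` instead of `-q` also prints the .gitignore line that matched, which is useful when a file is ignored unexpectedly.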
Never store large data files in Git. Use Azure Blob Storage for data and register datasets in Azure ML. Git tracks the code that processes the data, not the data itself.
Git Workflow for This Lab
In this lab, we follow a simple feature-branch workflow:
- Pull the latest main branch before starting new work
- Create a branch for your feature or fix
- Commit often with clear messages describing what you changed and why
- Open a pull request when your work is ready for review
- Merge after review and CI checks pass
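The workflow steps above can be rehearsed end to end without touching GitHub. In this sketch a local bare repository stands in for the GitHub remote, so the pull request and CI steps are only marked by a comment and the merge is done locally. The file `preprocess.py` and the placeholder identity are assumptions for the demo.

```shell
set -eu

sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/remote.git"   # local stand-in for the GitHub remote
git init --quiet "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/main           # name the initial branch "main" on any Git version
git config user.name "Lab Student"              # placeholder identity for the demo
git config user.email "student@example.com"
git remote add origin "$sandbox/remote.git"

# Seed the remote with an initial commit on main.
echo "# MLOps lab" > README.md
git add README.md
git commit --quiet -m "chore: initial commit"
git push --quiet -u origin main

# 1. Pull the latest main branch before starting new work.
git checkout --quiet main
git pull --quiet origin main

# 2. Create a branch for the feature.
git checkout --quiet -b feature/data-preprocessing

# 3. Commit with a clear message.
echo 'print("clean data")' > preprocess.py      # hypothetical script
git add preprocess.py
git commit --quiet -m "feat: add preprocessing script"

# 4. Push the branch; the pull request itself is opened on GitHub.
git push --quiet -u origin feature/data-preprocessing

# 5. After review and CI pass, the merge normally happens on GitHub.
#    Here it is simulated locally.
git checkout --quiet main
git merge --quiet --no-ff -m "Merge feature/data-preprocessing" feature/data-preprocessing

git log --oneline --graph
```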
Use descriptive prefixes in commit messages:
- feat: for new features
- fix: for bug fixes
- docs: for documentation updates
- chore: for setup or maintenance tasks
Example: feat: add Azure ML pipeline YAML for model training
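If you want to check the prefix convention automatically, a small shell function is enough. The helper below is hypothetical (the name `is_valid_commit_message` is not part of Git or of this lab): it accepts a message only when it starts with one of the four prefixes followed by a non-empty description.

```shell
set -eu

# Hypothetical helper: returns 0 when the message starts with an accepted
# prefix and has at least one character of description, 1 otherwise.
is_valid_commit_message() {
  case "$1" in
    "feat: "?*|"fix: "?*|"docs: "?*|"chore: "?*) return 0 ;;
    *) return 1 ;;
  esac
}

for msg in "feat: add Azure ML pipeline YAML for model training" "updated stuff"; do
  if is_valid_commit_message "$msg"; then
    echo "ok:       $msg"
  else
    echo "rejected: $msg"
  fi
done
```

A check like this could be called from a `commit-msg` hook, which Git runs with the path of the message file as its first argument. Whether to enforce it that way is a team decision.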
Next Steps
With Git set up, you're ready to connect your repository to Azure and start building the infrastructure for your ML system.
Proceed to the Infra Setup section to learn how to push your code to Azure and provision your ML environment.