Security and best practices for MLOps on Azure

Important: Core Security Principles
  • Least privilege everywhere
  • Prefer identity-based (RBAC) access
  • Separate dev, staging, and prod
  • Make pipelines reproducible and auditable
  • Log enough to detect issues without leaking sensitive data

Identity and access (RBAC, role-based access control)

  • Assign roles at the smallest scope possible (resource group, workspace, storage container)
  • Separate duties
    • Data owners manage data access
    • ML engineers run training jobs
    • Release owners deploy to production
Tip: Best Practice

Use Azure’s built-in roles (e.g., Reader, Contributor, AzureML Data Scientist) where possible. Create custom roles only when necessary.

Secrets management

  • Store secrets in Azure Key Vault
  • Access Key Vault using Managed Identity (avoid connection strings in code)
  • Do not put secrets in
    • Git repos
    • notebooks
    • environment variables committed to YAML files
  • Rotate secrets regularly if any must exist
Warning: Common Mistake

Never hardcode connection strings or API keys in notebooks or code. Always use Key Vault with Managed Identity.
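A minimal sketch of this pattern (the helper name and env-var fallback are illustrative, not an Azure API): in Azure you pass in a Key Vault `SecretClient` built with `DefaultAzureCredential`, so Managed Identity supplies the credential and no key material appears in the repo; locally you fall back to an environment variable for development only.

```python
import os
from typing import Optional


def get_secret(name: str, client: Optional[object] = None) -> str:
    """Resolve a secret without hardcoding it anywhere.

    In Azure, pass a Key Vault client built with Managed Identity:

        from azure.identity import DefaultAzureCredential
        from azure.keyvault.secrets import SecretClient
        client = SecretClient("https://<vault>.vault.azure.net",
                              DefaultAzureCredential())

    Locally (tests, dev boxes) fall back to an environment variable.
    """
    if client is not None:
        # Key Vault path: the credential comes from Managed Identity,
        # so no connection string or API key lives in code.
        return client.get_secret(name).value
    # Dev-only fallback; never commit the variable's value.
    return os.environ[name.upper().replace("-", "_")]
```

Keeping the fallback behind the same function means the Key Vault path is the default in deployed code and the secret name is the only thing the caller knows.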

Data protection

  • Store data in Azure Storage (Blob or ADLS Gen2)
  • Disable public blob access
  • Use RBAC for data access instead of storage account keys when possible
  • Keep clear data zones
    • raw
    • processed
    • features
    • evaluation
  • Track dataset versions (Azure ML Data assets) and record which version trained each model
Note: Data Versioning

Tracking which dataset version trained each model is critical for reproducibility and debugging production issues.
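One way to keep the zones consistent is to derive every ADLS Gen2 path from a single helper; the layout below (one container per zone, version suffix per dataset) and the account placeholder are assumptions for illustration, not a prescribed convention.

```python
def data_path(zone: str, dataset: str, version: int,
              account: str = "<storage-account>") -> str:
    """Build a consistent ADLS Gen2 path for a data zone and dataset version.

    Zones mirror the list above; the account name is a placeholder
    supplied per environment (dev / staging / prod).
    """
    zones = ("raw", "processed", "features", "evaluation")
    if zone not in zones:
        raise ValueError(f"unknown zone: {zone!r}")
    return f"abfss://{zone}@{account}.dfs.core.windows.net/{dataset}/v{version}"
```

Recording the returned path (with its version) alongside each training run ties the model back to the exact data it saw.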

Network protection

  • Prefer private access when data is sensitive
Tip: Production Recommendation

For production workloads with sensitive data, use Azure Private Link to keep traffic within the Azure network.

Reproducible pipelines

  • Run preprocessing and training as Azure ML Jobs (not only notebooks)
  • Pin dependencies (requirements or conda environment)
  • Version everything in git
    • code
    • environment files
    • pipeline definitions
    • deployment configs
  • Capture metadata in every run
    • data version
    • git commit SHA
    • metrics
    • environment version
Important: Reproducibility Checklist

Every training run should be fully reproducible. If you can’t recreate a model from its metadata, your pipeline needs improvement.
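The metadata capture above can be sketched as a small helper that every training entry point calls before logging; the function names are illustrative, and the git lookup degrades to "unknown" outside a repo rather than failing the run.

```python
import json
import subprocess


def current_git_sha() -> str:
    """Return the checked-out commit SHA, or 'unknown' outside a repo."""
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def run_metadata(data_version: str, environment: str, metrics: dict) -> str:
    """Bundle the metadata every training run should record, as JSON."""
    return json.dumps({
        "data_version": data_version,  # e.g. an Azure ML data asset version
        "git_sha": current_git_sha(),  # exact code state
        "environment": environment,    # pinned environment name/version
        "metrics": metrics,            # summary metrics for the run
    }, sort_keys=True)
```

If you can rebuild the environment, check out the SHA, and fetch the data version from this record, the run is reproducible; if any field is "unknown", that is the gap to fix.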

Model registry and promotion

  • Register models in Azure ML Model Registry
  • Require minimum metadata tags
    • data_version
    • git_sha
    • metric_summary
    • owner
  • Use promotion stages
    • candidate
    • validated
    • production
  • Avoid “latest” for prod deployments (deploy specific versions only)
Warning: Deployment Safety

Never deploy “latest” to production. Always deploy specific, validated model versions with known performance characteristics.
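The promotion rules above can be enforced with a simple gate in the deployment pipeline; here the registry record is stood in by a plain dict, so the shape is an assumption rather than the Azure ML API.

```python
REQUIRED_TAGS = {"data_version", "git_sha", "metric_summary", "owner"}


def ready_to_deploy(model: dict) -> bool:
    """Gate production deployment: a pinned explicit version, all
    required metadata tags, and a stage of at least 'validated'."""
    version = model.get("version")
    if version in (None, "", "latest"):
        return False  # deploy explicit versions only, never "latest"
    if not REQUIRED_TAGS <= set(model.get("tags", {})):
        return False  # mandatory metadata is missing
    return model.get("stage") in ("validated", "production")
```

Running this check in CI before the deploy step turns the policy into a hard failure instead of a convention.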

Deployment hardening

  • Enable authentication
  • Validate inputs (types, ranges, schema) before scoring
  • Do not log raw sensitive payloads
    • log request IDs, latency, error codes, and summary stats instead
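A sketch of payload-free telemetry, assuming the fields above (the function name and record shape are illustrative): the scoring handler logs a generated request ID, latency, status, and a summary stat, and never the raw features.

```python
import logging
import uuid

logger = logging.getLogger("scoring")


def request_telemetry(latency_ms: float, status: int, n_features: int) -> dict:
    """Build a log record with no raw payload: request ID, latency,
    status code, and summary stats only."""
    record = {
        "request_id": str(uuid.uuid4()),  # correlates calls without PII
        "latency_ms": round(latency_ms, 1),
        "status": status,
        "n_features": n_features,         # a summary stat, not the features
    }
    logger.info("scored request: %s", record)
    return record
```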
Tip: Input Validation

Always validate input data before scoring. Invalid inputs can cause errors, security issues, or incorrect predictions.
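As a minimal sketch of pre-scoring validation (the schema shape — field name mapped to type and range — is an assumption for illustration): collect every problem rather than failing on the first, so the caller gets a complete error response.

```python
def validate_payload(payload: dict, schema: dict) -> list:
    """Check presence, type, and range for each field before scoring.

    `schema` maps field name -> (type, min, max); returns a list of
    human-readable problems (empty means the payload is acceptable).
    """
    errors = []
    for field, (ftype, lo, hi) in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
            continue
        value = payload[field]
        if not isinstance(value, ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
        elif not (lo <= value <= hi):
            errors.append(f"{field}: {value} outside [{lo}, {hi}]")
    return errors
```

The endpoint rejects the request (with the error list, not the payload) whenever the returned list is non-empty.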

Monitoring and alerts

  • Use Application Insights + Log Analytics for endpoint telemetry
  • Monitor service health
    • latency
    • error rate
    • throughput
  • Monitor model signals
    • feature drift (compare to training stats)
    • prediction drift
  • Set alerts to notify owners when thresholds are exceeded
Note: Drift Detection

Model drift and data drift are inevitable in production. Set up monitoring to detect them early before they impact users.
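A deliberately simple drift check, comparing live feature values to stored training statistics as the list above suggests; production systems often use PSI or KS tests instead, but the alerting shape is the same.

```python
from statistics import mean


def mean_drift(train_mean: float, train_std: float,
               live_values: list, threshold: float = 3.0) -> bool:
    """Flag a feature as drifted when its live mean sits more than
    `threshold` training standard deviations from the training mean."""
    if train_std == 0:
        # Constant feature in training: any change at all is drift.
        return mean(live_values) != train_mean
    z = abs(mean(live_values) - train_mean) / train_std
    return z > threshold
```

Run this per feature on a schedule over recent requests, and route a `True` result to the alert that notifies the owner.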

Code Safety

  • Require PR reviews and CI checks before merge
  • Pin dependencies and avoid installing random packages at runtime
  • Store images in Azure Container Registry (ACR)
Tip: Dependency Management

Pin exact versions in requirements.txt (e.g., pandas==2.0.3 not pandas>=2.0). This prevents unexpected breaking changes.
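Pinning can be enforced as one of the CI checks mentioned above; a sketch (the function name is illustrative) that flags any requirements line lacking an exact `==` pin:

```python
def unpinned(requirements: list) -> list:
    """Return requirement lines that are not pinned to an exact
    version -- suitable as a CI gate before merge."""
    return [
        line.strip() for line in requirements
        if line.strip()                      # skip blank lines
        and not line.strip().startswith("#")  # skip comments
        and "==" not in line                 # flag anything not pinned
    ]
```

Failing the build when this returns a non-empty list keeps `>=` and unversioned packages out of production images.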

Compute hygiene and cost guardrails

  • Prefer autoscaling compute clusters for jobs and scale down to zero
  • Stop/delete unused compute instances
  • Set budgets and alerts per resource group
  • Tag resources with owner and environment
Warning: Cost Management

Unused compute instances can drain your Azure credits quickly. Always stop or delete them when not in use, and set up budget alerts.
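The tagging rule above is easy to audit mechanically; a sketch (resource inventory shape assumed, not an Azure API) that a scheduled job could run to find resources missing the required tags:

```python
REQUIRED_RESOURCE_TAGS = {"owner", "environment"}


def untagged(resources: dict) -> list:
    """Return names of resources missing a required tag, so a
    scheduled job can notify owners (or flag orphans with none)."""
    return sorted(
        name for name, tags in resources.items()
        if not REQUIRED_RESOURCE_TAGS <= set(tags)
    )
```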
