Model Registry
Versioning, tracking, and promoting ML models on Azure
This page walks through how trained models are registered in the Azure ML Model Registry — what metadata gets captured, how versioning works, and how the registry connects training outputs to deployment. All examples come directly from the code in this lab.
Why a Model Registry?
After a training pipeline runs, you end up with model artifact files on disk — model.joblib files in an output directory. Without a registry, these are just files. There is no history, no way to compare two runs, and no safe path back to a previous version if a deployment goes wrong.
The Azure ML Model Registry solves this by giving every model:
- a unique name and auto-incremented version number
- a permanent link to the MLflow run that produced it (including all logged metrics and parameters)
- tags for team ownership, dataset version, task type, and training framework
- a promotion path from training output to candidate to validated to production
The registry is the handoff point between training and deployment. Nothing gets deployed that hasn’t been registered. This makes every deployment traceable back to its training run and the data that produced it.
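The promotion path described above can be pictured as a small state machine. This is a minimal sketch, not the lab's implementation: the stage names follow the list above, but the `STAGES` tuple and the `promote` helper are hypothetical, and how a stage is actually recorded on a model (for example, as a tag) is an assumption.

```python
# Hypothetical sketch: stage names follow the promotion path described above.
STAGES = ("training-output", "candidate", "validated", "production")


def promote(current_stage: str) -> str:
    """Return the next stage, refusing to skip steps or move past production."""
    if current_stage not in STAGES:
        raise ValueError(f"unknown stage: {current_stage!r}")
    index = STAGES.index(current_stage)
    if index == len(STAGES) - 1:
        raise ValueError("model is already in production")
    return STAGES[index + 1]


if __name__ == "__main__":
    # Walk one model through every stage in order.
    stage = STAGES[0]
    while stage != "production":
        stage = promote(stage)
        print(stage)
```

Forcing promotions to move one stage at a time means a model cannot reach production without first passing through the validated stage.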
What Gets Registered in This Lab
The NYC Taxi pipeline registers three models at the end of every training run. Two come from supervised learning on fare prediction, and one from unsupervised learning on trip zone clustering.
| Registered Model Name | Algorithm | Task |
|---|---|---|
| taxi-fare-linear-regression | Linear Regression | Predict fare amount |
| taxi-fare-ridge-regression | Ridge Regression | Predict fare amount |
| taxi-trip-kmeans | KMeans Clustering | Assign trips to zone clusters |
Each model is registered together with its performance metrics and tags, all logged to the same MLflow run as the model itself, so you can compare all three in a single view in Azure ML Studio.
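Outside the Studio UI, the same comparison between the two fare models can be made in code. The sketch below assumes a metrics dictionary keyed by model name with an `rmse` entry per model. The `best_fare_model` helper, the dictionary keys, and the numbers are all hypothetical placeholders, not results from the lab.

```python
def best_fare_model(metrics: dict) -> str:
    """Return the key of the model with the lowest RMSE."""
    if not metrics:
        raise ValueError("no metrics to compare")
    return min(metrics, key=lambda name: metrics[name]["rmse"])


if __name__ == "__main__":
    # Placeholder values for illustration only; not measured results.
    example_metrics = {
        "linear_regression": {"rmse": 4.0, "mae": 2.5, "r2": 0.80},
        "ridge_regression": {"rmse": 3.5, "mae": 2.4, "r2": 0.82},
    }
    print(best_fare_model(example_metrics))
```

The KMeans model is left out of this comparison because it does not report RMSE.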
The register_models.py Script
Registration is handled by ML/register_models.py, which is the final step in the pipeline. It loads each trained model artifact, wraps it in an MLflow run, and calls mlflow.sklearn.log_model() with a registered_model_name to push it into the registry.
Here is the core registration logic for the supervised models:
```python
for artifact_name, display_name, registered_name in model_specs:
    model = load_model(models_dir / artifact_name / "model.joblib")
    model_metrics = metrics[artifact_name]

    with mlflow.start_run(run_name=f"register-{registered_name}"):
        mlflow.set_tags({
            "task_type": "regression",
            "dataset": "nyc-taxi",
            "framework": "scikit-learn",
            "model_type": display_name,
        })
        mlflow.log_metrics({
            f"{artifact_name}_rmse": model_metrics["rmse"],
            f"{artifact_name}_mae": model_metrics["mae"],
            f"{artifact_name}_r2": model_metrics["r2"],
        })
        mlflow.log_dict(model_metrics, f"{artifact_name}_metrics.json")
        mlflow.sklearn.log_model(
            sk_model=model,
            artifact_path=artifact_name,
            registered_model_name=registered_name,
        )
```

And the unsupervised model follows the same pattern:
```python
with mlflow.start_run(run_name=f"register-{model_name}"):
    mlflow.set_tags({
        "task_type": "clustering",
        "dataset": "nyc-taxi",
        "framework": "scikit-learn",
    })
    mlflow.log_params({
        "n_clusters": metrics["n_clusters"],
    })
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="kmeans",
        registered_model_name=model_name,
    )
```

Every registered model carries: RMSE, MAE, R² (supervised) or n_clusters (unsupervised), the features used in training, and tags for task type, dataset, and framework. This metadata is searchable and filterable in Azure ML Studio.
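The supervised loop prefixes each metric name with the artifact name before logging it. A minimal sketch of that naming step, pulled out into a helper, is shown here. `build_metric_payload` and `REQUIRED_KEYS` are hypothetical names, and the up-front check for missing keys is an addition for illustration rather than something the lab script is known to do.

```python
REQUIRED_KEYS = ("rmse", "mae", "r2")


def build_metric_payload(artifact_name: str, model_metrics: dict) -> dict:
    """Build the prefixed metric dict, e.g. '<artifact_name>_rmse'."""
    missing = [key for key in REQUIRED_KEYS if key not in model_metrics]
    if missing:
        # Fail before any MLflow run is opened, so nothing is half-registered.
        raise KeyError(f"{artifact_name}: missing metrics {missing}")
    return {
        f"{artifact_name}_{key}": float(model_metrics[key])
        for key in REQUIRED_KEYS
    }
```

The returned dictionary has the same shape as the one passed to `mlflow.log_metrics()` in the loop, so it could be passed to that call directly.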
Where Registration Fits in the Pipeline
Registration is the last step in ML/pipeline.yaml, running after both supervised and unsupervised training have completed:
```yaml
register_models:
  type: command
  command: >-
    python register_models.py
    --supervised-dir ${{inputs.supervised_dir}}
    --unsupervised-dir ${{inputs.unsupervised_dir}}
  inputs:
    supervised_dir: ${{parent.jobs.supervised_learning.outputs.output_dir}}
    unsupervised_dir: ${{parent.jobs.unsupervised_learning.outputs.output_dir}}
  environment: azureml:taxi-ml-env@latest
  code: ./
```

This means registration only happens if both training steps succeed. If either fails, the pipeline stops early and no partial models get registered.
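The pipeline already enforces ordering, but a registration script can also check its own inputs before touching the registry. The sketch below assumes the `<models_dir>/<artifact_name>/model.joblib` layout used in the registration loop. `find_missing_artifacts` is a hypothetical helper, not part of the lab code.

```python
from pathlib import Path


def find_missing_artifacts(models_dir: Path, artifact_names: list[str]) -> list[Path]:
    """Return the expected model.joblib paths that do not exist on disk."""
    expected = [models_dir / name / "model.joblib" for name in artifact_names]
    return [path for path in expected if not path.is_file()]
```

Calling this before any MLflow run is started, and exiting if the result is non-empty, keeps a partially populated output directory from producing a partial set of registered models.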
Viewing Registered Models in Azure ML Studio
Once the pipeline completes, you can inspect registered models in the Studio UI:
- Navigate to your Azure ML workspace in the Azure portal
- Click Models in the left sidebar
- You will see all three registered models: taxi-fare-linear-regression, taxi-fare-ridge-regression, and taxi-trip-kmeans
- Click any model to see its full version history, metrics, tags, and linked training runs
- Use the Compare button to view RMSE and R² side-by-side across runs
Click into a model version and then navigate to the linked Run to see all logged metrics, parameters, and artifacts from that specific training execution. This is the full lineage trail — dataset version → training code → model artifact.
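The same lineage trail can be followed programmatically. This is a sketch under assumptions: it expects a client object exposing `get_model_version` and `get_run`, as MLflow's `MlflowClient` does, and it assumes the MLflow tracking URI already points at your Azure ML workspace. `lineage_summary` is a hypothetical helper name.

```python
def lineage_summary(client, model_name: str, version: str) -> dict:
    """Follow a registered model version back to the run that produced it."""
    model_version = client.get_model_version(name=model_name, version=version)
    run = client.get_run(model_version.run_id)
    return {
        "model": model_name,
        "version": model_version.version,
        "run_id": model_version.run_id,
        "metrics": dict(run.data.metrics),
        "tags": dict(run.data.tags),
    }


# Possible usage (requires mlflow and a configured tracking URI):
#   from mlflow.tracking import MlflowClient
#   print(lineage_summary(MlflowClient(), "taxi-fare-linear-regression", "1"))
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object when no workspace is available.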
Model Versioning and the @latest Reference
Every time register_models.py runs for the same registered_model_name, Azure ML automatically increments the version number: v1, v2, v3, and so on. Older versions are never overwritten — they stay available for rollback.
In the deployment configuration, models are referenced as:
```yaml
model: azureml:taxi-fare-linear-regression@latest
```

The @latest label always resolves to the highest version number at deploy time.
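To make "highest version number" concrete, here is a small illustration of the comparison involved. It is not how Azure ML resolves the reference internally. `resolve_latest` is a hypothetical helper, and it assumes versions arrive as integer-like strings such as "1", "2", "10".

```python
def resolve_latest(versions) -> int:
    """Return the highest version, comparing numerically rather than as text."""
    numbers = [int(version) for version in versions]
    if not numbers:
        raise ValueError("no registered versions")
    return max(numbers)


if __name__ == "__main__":
    # As strings, "2" sorts after "10"; as integers, 10 is the highest.
    print(resolve_latest(["1", "2", "10"]))  # prints 10
```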
@latest in Production
Using @latest for production deployments is convenient but risky: if a new model version is registered with a performance regression and you redeploy, @latest picks it up automatically. For production deployments, pin a specific version (e.g., azureml:taxi-fare-linear-regression:3) once the model has been validated. Use @latest for staging and testing only.
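One way to enforce this rule is a pre-deployment check that flags any `@latest` reference in a production configuration. The sketch below assumes the deployment config has already been loaded into plain Python dictionaries and lists. `find_latest_references` and the example config are hypothetical.

```python
def find_latest_references(config, path: str = "") -> list[str]:
    """Recursively collect config paths whose string value ends in '@latest'."""
    found = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            found.extend(find_latest_references(value, child))
    elif isinstance(config, list):
        for index, value in enumerate(config):
            found.extend(find_latest_references(value, f"{path}[{index}]"))
    elif isinstance(config, str) and config.endswith("@latest"):
        found.append(path)
    return found


if __name__ == "__main__":
    # Hypothetical deployment config, for illustration only.
    deployment = {
        "name": "blue",
        "model": "azureml:taxi-fare-linear-regression@latest",
    }
    print(find_latest_references(deployment))  # prints ['model']
```

A CI step could fail the production deployment whenever this list is non-empty, while leaving staging configurations unchecked.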
Next Steps
With models registered and versioned in Azure ML, the next step is to create a managed online endpoint that serves real-time predictions from those registered models.
Proceed to Endpoint Creation (Testing & Deployment) to see how the registered models become a live REST API.