Vertex AI & MLOps | LegacyForward.ai

Vertex AI Service Map

Vertex AI is Google Cloud's unified ML platform with over a dozen interconnected services. This map helps you quickly identify which service to use for each stage of the ML lifecycle.

Service	Purpose	When to Use
Vertex AI Training	Custom model training	Need full control over training
AutoML	No-code model training	Tabular, image, text, video classification
Vertex AI Pipelines	Orchestrated ML workflows	Reproducible training pipelines
Feature Store	Centralized feature management	Shared features across models
Model Registry	Model versioning and metadata	Track, compare, and promote models
Endpoints	Model serving (online/batch)	Real-time or batch predictions
Vertex AI Studio	Prompt design and tuning	GenAI experimentation
Model Garden	Pre-trained model catalog	Use or fine-tune foundation models
Vertex AI Agent Builder	Build AI agents and RAG	Grounded generation, agent tools
Model Monitoring	Detect drift and anomalies	Production model health
Experiments	Track training runs	Compare hyperparameters, metrics

AutoML vs Custom Training

AutoML gets you a baseline model in hours with zero code; custom training gives you full control at the cost of engineering effort. Start with AutoML to establish a benchmark, then switch to custom only if you need to beat it.

Aspect	AutoML	Custom Training
Code required	None (UI/API config)	Full training code
Model control	Limited (architecture chosen)	Full (any framework)
Data prep	Minimal (automatic)	Manual feature engineering
Training time	Hours	Flexible
Best for	Prototyping, tabular data	Research, complex architectures
Cost	Higher per training hour	Lower with spot/preemptible
Explainability	Built-in	Manual (SHAP, etc.)
Export	TF SavedModel, container	Any format

AutoML Supported Tasks

Data Type	Tasks
Tabular	Classification, regression, forecasting
Image	Classification, object detection, segmentation
Text	Classification, entity extraction, sentiment
Video	Classification, object tracking, action recognition

Vertex AI Pipelines

Pipelines make your ML workflow reproducible, auditable, and automatable. Without them, training runs are one-off scripts that nobody can reproduce six months later.

Pipeline Structure

┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│  Data Prep  │───>│   Training   │───>│  Evaluation  │
│  Component  │    │  Component   │    │  Component   │
└─────────────┘    └──────────────┘    └──────┬──────┘
                                              │
                                     ┌────────▼────────┐
                                     │ Conditional Gate │
                                     │ (metrics check)  │
                                     └────────┬────────┘
                                              │
                              ┌───────────────▼──────────────┐
                              │ Model Upload + Deploy to      │
                              │ Endpoint (if metrics pass)    │
                              └───────────────────────────────┘

Pipeline Definition (KFP v2)

from kfp import dsl
from google.cloud import aiplatform

@dsl.component(base_image="python:3.11", packages_to_install=["pandas", "scikit-learn"])
def preprocess_data(input_path: str, output_path: dsl.OutputPath("Dataset")):
    import pandas as pd
    df = pd.read_csv(input_path)
    # preprocessing logic
    df.to_csv(output_path, index=False)

@dsl.component(base_image="python:3.11", packages_to_install=["scikit-learn", "joblib"])
def train_model(
    dataset_path: dsl.InputPath("Dataset"),
    model_path: dsl.OutputPath("Model"),
    learning_rate: float = 0.01
):
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    import joblib

    df = pd.read_csv(dataset_path)
    X, y = df.drop("target", axis=1), df["target"]
    model = GradientBoostingClassifier(learning_rate=learning_rate)
    model.fit(X, y)
    joblib.dump(model, model_path)

@dsl.pipeline(name="training-pipeline")
def training_pipeline(input_path: str, lr: float = 0.01):
    preprocess_task = preprocess_data(input_path=input_path)
    train_task = train_model(
        dataset_path=preprocess_task.outputs["output_path"],
        learning_rate=lr
    )

Running a Pipeline

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="training-run-001",
    template_path="pipeline.json",  # compiled pipeline
    parameter_values={"input_path": "gs://bucket/data.csv", "lr": 0.01},
    pipeline_root="gs://bucket/pipeline-runs",
)
job.run(service_account="sa@project.iam.gserviceaccount.com")

Feature Store

Feature stores solve the problem of feature consistency between training and serving -- without one, teams recompute features inconsistently, leading to training-serving skew and duplicated engineering effort.

Key Concepts

Concept	Description
Feature Group	Logical grouping of related features (e.g., "user_features")
Feature	Individual data column with type and metadata
Entity	Key that identifies the row (e.g., user_id)
Online Store	Low-latency serving for real-time predictions
Offline Store	BigQuery-backed for batch training
Point-in-time lookup	Get feature values as of a specific timestamp

Feature Store Operations

from google.cloud import aiplatform

# Create feature group
fg = aiplatform.FeatureGroup.create(
    name="user_features",
    source=aiplatform.FeatureGroup.BigQuerySource(
        uri="bq://project.dataset.user_features_table"
    )
)

# Create feature (column in the source table)
feature = fg.create_feature(name="purchase_count")

# Online serving: create Feature Online Store
online_store = aiplatform.FeatureOnlineStore.create(
    name="production-store",
    bigtable=aiplatform.FeatureOnlineStore.Bigtable(
        auto_scaling=aiplatform.FeatureOnlineStore.Bigtable.AutoScaling(
            min_node_count=1, max_node_count=3
        )
    )
)

# Create feature view for online serving
view = online_store.create_feature_view(
    name="user_features_view",
    source=aiplatform.FeatureView.BigQuerySource(
        uri="bq://project.dataset.user_features_table",
        entity_id_columns=["user_id"]
    )
)

Model Registry

The model registry is your single source of truth for what models exist, which version is in production, and how each version performed. Without it, model management becomes spreadsheets and tribal knowledge.

Uploading a Model

model = aiplatform.Model.upload(
    display_name="fraud-detector-v2",
    artifact_uri="gs://bucket/models/fraud-v2/",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
    labels={"team": "fraud", "version": "2"},
    description="Gradient boosting fraud detection model"
)

Model Versioning

# Upload as new version of existing model
model_v2 = aiplatform.Model.upload(
    display_name="fraud-detector-v2",
    parent_model=existing_model.resource_name,  # creates new version
    artifact_uri="gs://bucket/models/fraud-v2.1/",
    serving_container_image_uri="...",
    version_aliases=["champion"],
    version_description="Improved recall by 5%"
)

Endpoints and Serving

Endpoints handle the operational complexity of serving models at scale -- autoscaling, traffic splitting, and health checks. Use traffic splitting for A/B tests and canary deployments instead of risky all-at-once rollouts.

Deploy Model

endpoint = aiplatform.Endpoint.create(display_name="fraud-endpoint")

endpoint.deploy(
    model=model,
    deployed_model_display_name="fraud-v2",
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=5,
    traffic_percentage=100,
    accelerator_type=None
)

Traffic Splitting (A/B Testing)

# Deploy new model alongside existing one
endpoint.deploy(
    model=model_v3,
    deployed_model_display_name="fraud-v3-candidate",
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=3,
    traffic_percentage=10  # 10% to new model
)
# Existing model automatically gets 90%

Online Prediction

instances = [
    {"feature1": 1.0, "feature2": "category_a", "feature3": 42}
]
prediction = endpoint.predict(instances=instances)
print(prediction.predictions)

Batch Prediction

batch_job = model.batch_predict(
    job_display_name="batch-fraud-scoring",
    gcs_source="gs://bucket/input/batch_data.jsonl",
    gcs_destination_prefix="gs://bucket/output/",
    machine_type="n1-standard-4",
    max_replica_count=10,
    accelerator_count=0
)
batch_job.wait()

Model Monitoring

Models degrade silently as the real world drifts away from training data. Monitoring for feature skew and prediction drift catches these problems before they impact business metrics.

Monitoring Setup

from google.cloud.aiplatform import model_monitoring

# Define skew/drift thresholds
skew_config = model_monitoring.SkewDetectionConfig(
    data_source="bq://project.dataset.training_data",
    skew_thresholds={"feature1": 0.3, "feature2": 0.2}
)

drift_config = model_monitoring.DriftDetectionConfig(
    drift_thresholds={"feature1": 0.3, "feature2": 0.2}
)

# Create monitoring job
monitor_job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="fraud-monitoring",
    endpoint=endpoint,
    logging_sampling_strategy={"random_sample_config": {"sample_rate": 0.1}},
    schedule_config={"monitor_interval": {"seconds": 3600}},  # hourly
    objective_configs=[{
        "deployed_model_id": deployed_model_id,
        "objective_config": {
            "training_dataset": skew_config,
            "training_prediction_skew_detection_config": skew_config,
            "prediction_drift_detection_config": drift_config
        }
    }],
    alert_config={"email_alert_config": {"user_emails": ["team@company.com"]}}
)

Monitoring Metrics

Metric Type	What It Detects	Method
Feature skew	Training/serving data mismatch	Distribution comparison (Jensen-Shannon)
Feature drift	Input distribution shift over time	Distribution comparison over windows
Prediction drift	Output distribution shift	Distribution comparison
Attribution drift	Feature importance shift	Explanation-based monitoring

Compute Options

Choosing the wrong machine type wastes money on over-provisioning or causes OOM failures during training. Match your compute to your workload -- use GPUs for model training and right-size CPU machines for preprocessing and serving.

Machine Type	vCPUs	Memory	GPU	Use Case
n1-standard-4	4	15 GB	Optional	Small models, preprocessing
n1-standard-16	16	60 GB	Optional	Medium models
n1-highmem-8	8	52 GB	Optional	Memory-intensive
a2-highgpu-1g	12	85 GB	1x A100	Large model training
a2-highgpu-8g	96	680 GB	8x A100	Distributed training
g2-standard-4	4	16 GB	1x L4	Inference, light training

Common Pitfalls

Most Vertex AI failures come from skipping automation and monitoring -- the same mistakes that plague any ML platform. These shortcuts save time initially but create painful debugging sessions later.

Pitfall	Problem	Fix
No pipeline versioning	Can't reproduce results	Use pipeline templates with version tags
Manual deployments	Error-prone, slow	Automate via CI/CD pipeline
No monitoring	Silent model degradation	Set up skew/drift alerts
Over-provisioned endpoints	Wasted cost	Use autoscaling, right-size machines
Ignoring Feature Store	Feature recomputation, inconsistency	Centralize features
No A/B testing	Risky all-at-once deployments	Use traffic splitting
Training on local machine	Not reproducible	Use Vertex AI Training
Hardcoded project/location	Not portable	Use environment variables