Module 01 of 6

Designing and Planning a Cloud Solution Architecture

The highest-weighted exam section. Master business and technical requirements gathering, the Well-Architected Framework, compute/storage/network design, migration planning, and AI/ML solution architecture with Gemini, Agent Builder, and Model Garden.


01. Requirements Gathering

Business Requirements

Every architecture decision on the PCA exam traces back to business requirements. The exam presents case studies where you must identify constraints before jumping to technical solutions. Key dimensions include:

Dimension | Questions to Ask | Impact on Architecture
Cost | Budget constraints? OpEx vs CapEx preference? Growth rate? | Serverless vs provisioned, committed use discounts, storage class selection
Compliance | HIPAA? PCI-DSS? GDPR? SOC 2? Data residency? | Region selection, encryption (CMEK), VPC-SC, audit logging
Time to Market | MVP timeline? Team velocity? Existing codebase? | Managed services over self-hosted, App Engine/Cloud Run over GKE
Availability | SLA target? Acceptable downtime? Geographic reach? | Multi-region vs regional, global load balancing, failover design
Scalability | Expected users? Traffic patterns (bursty vs steady)? | Autoscaling config, database sharding, CDN placement
Security | Data sensitivity? Zero-trust requirements? Team access model? | IAM design, VPC-SC, IAP, encryption, Secret Manager

Exam Tip: Always read the business requirements before the technical ones. The exam often has answers that are technically correct but violate a business constraint (budget, compliance, timeline). The best answer satisfies both.

Technical Requirements

Technical requirements define how the system must behave. On the exam, you will translate these into specific GCP service selections and configurations.

Component | Description
⚡ Latency / Performance | Response time targets drive compute placement, caching strategy (Cloud CDN, Memorystore), and database choice (Spanner for global consistency vs Bigtable for low-latency reads).
📊 Throughput | Requests per second and data ingestion rate. Impacts load balancer type, Pub/Sub subscriber scaling, Dataflow pipeline sizing, and BigQuery slot reservations.
🔒 Durability | Data loss tolerance (RPO). Drives backup strategy, Cloud Storage class, database replication, and cross-region storage configurations.
🔧 Maintainability | Team size, skills, and operational overhead. Favors managed services, serverless, and IaC (Terraform) for reproducibility and reduced ops burden.
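
Availability targets are easier to compare once they are converted into a downtime budget. The sketch below is plain arithmetic, assuming a 30-day month; the function name is hypothetical and not part of any Google Cloud API.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes (assumed month length)


def monthly_downtime_minutes(sla_percent: float) -> float:
    """Return the downtime allowed per 30-day month for an availability target."""
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    unavailable_fraction = (100 - sla_percent) / 100
    return round(MINUTES_PER_30_DAY_MONTH * unavailable_fraction, 2)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.95, 99.99):
        print(f"{target}% -> {monthly_downtime_minutes(target)} min/month")
```

A 99.9% target leaves roughly 43 minutes of downtime per month, while 99.99% leaves under 5 minutes, which is usually what pushes a design from a single zone toward regional or multi-region deployment.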

02. Google Cloud Well-Architected Framework

The Five Pillars

The Google Cloud Well-Architected Framework provides design principles for building cloud-native systems. The exam heavily references these pillars when evaluating architecture decisions.

Pillar | Focus Areas | Key GCP Services
Operational Excellence | Automation, monitoring, incident response, CI/CD, IaC | Cloud Build, Cloud Deploy, Cloud Monitoring, Terraform
Security, Privacy & Compliance | Identity, encryption, network security, compliance controls | IAM, Cloud KMS, VPC-SC, Security Command Center, IAP
Reliability | High availability, disaster recovery, fault tolerance, resilience | Regional MIGs, Cloud Spanner multi-region, GKE multi-cluster
Cost Optimization | Right-sizing, committed discounts, autoscaling, storage lifecycle | Recommender, Billing budgets, preemptible/spot VMs, Coldline
Performance Optimization | Caching, CDN, database tuning, load balancing, compute selection | Cloud CDN, Memorystore, Cloud Load Balancing, TPUs/GPUs

Architecture Reviews

Google recommends periodic architecture reviews against the Well-Architected Framework. On the exam, you should recognize when a proposed architecture violates a pillar and suggest the correct remediation.

Key Concept: Trade-offs are expected. A multi-region Cloud Spanner deployment maximizes reliability and consistency but costs more than a regional Cloud SQL instance. The exam tests your ability to justify trade-offs based on stated requirements, not pick the most expensive option.
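
One way to reason about this trade-off is to filter options by the stated requirements first and only then compare cost. The sketch below assumes made-up capability tags and relative cost ranks (not real pricing) purely to illustrate the selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    relative_cost: int       # illustrative rank only, not real pricing
    capabilities: frozenset  # hypothetical capability tags


# Hypothetical capability tags and cost ranks, invented for this example.
OPTIONS = [
    Option("Cloud SQL (regional)", 1,
           frozenset({"sql", "acid", "regional_ha"})),
    Option("Cloud Spanner (multi-region)", 3,
           frozenset({"sql", "acid", "regional_ha", "global_consistency"})),
]


def cheapest_meeting(required, options=OPTIONS):
    """Return the lowest-cost option whose capabilities cover every requirement."""
    candidates = [o for o in options if set(required) <= o.capabilities]
    return min(candidates, key=lambda o: o.relative_cost, default=None)
```

With requirements `{"sql", "acid"}` the cheaper regional option wins; adding `"global_consistency"` leaves only the multi-region option, which is the justification the exam expects.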

03. Compute Design

GKE vs Cloud Run vs App Engine vs Compute Engine

Compute selection is one of the most-tested topics on the PCA exam. Each service has a sweet spot determined by workload type, team expertise, and operational requirements.

Criteria | Compute Engine | GKE | Cloud Run | App Engine
Abstraction | IaaS (full VM) | CaaS (containers + K8s) | Serverless containers | PaaS (managed runtime)
Scale to Zero | No | No (Autopilot scales pods) | Yes | Standard: Yes; Flex: No
Max Request Timeout | Unlimited | Unlimited | 60 min | Standard: 10 min; Flex: 60 min
Stateful Workloads | Yes (persistent disks) | Yes (StatefulSets, PVs) | No (stateless only) | No
GPU/TPU Support | Yes | Yes (node pools) | GPU preview | No
Ops Overhead | High (patch, scale) | Medium (K8s complexity) | Low (fully managed) | Low
Best For | Legacy apps, custom OS, Windows, HPC | Microservices, multi-container, service mesh | HTTP APIs, event-driven, rapid deploy | Web apps, simple APIs, rapid prototyping

Exam Tip: Default to the simplest compute that meets requirements. If a workload is a stateless HTTP API with bursty traffic, Cloud Run is almost always the answer. Only choose GKE when you need K8s-specific features (StatefulSets, service mesh, multi-container pods, DaemonSets).
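
The "simplest compute that meets requirements" rule can be written as an ordered set of checks. This is a sketch under simplifying assumptions: the boolean flags are hypothetical, and real selection also weighs cost, team skills, and timeouts from the table above.

```python
def choose_compute(*, needs_custom_os: bool = False,
                   needs_k8s_features: bool = False,
                   stateful: bool = False,
                   containerized: bool = True) -> str:
    """Pick a compute service, checking the most constraining needs first."""
    if needs_custom_os:
        # Custom OS, Windows, or HPC workloads need full VM control.
        return "Compute Engine"
    if needs_k8s_features:
        # StatefulSets, service mesh, multi-container pods, DaemonSets.
        return "GKE"
    if stateful:
        # Cloud Run and App Engine are treated as stateless in the table above.
        return "GKE" if containerized else "Compute Engine"
    # Stateless workloads default to the lowest-ops serverless option.
    return "Cloud Run" if containerized else "App Engine"
```

A stateless containerized HTTP API with no special needs resolves to Cloud Run, matching the tip above.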

Compute Engine Patterns

When VMs are the right choice, key architecture patterns include:

  • Managed Instance Groups (MIGs) — autoscaling, autohealing, rolling updates. Use regional MIGs for high availability across zones.
  • Sole-Tenant Nodes — physical server isolation for compliance (HIPAA, PCI). Dedicated hardware, no sharing with other tenants.
  • Spot/Preemptible VMs — up to 91% cheaper. Use for fault-tolerant batch, data processing, CI/CD runners. Not for production serving.
  • Custom Machine Types — right-size vCPU and memory independently. Cost optimization when standard types over-provision.

# Create an instance template, then a regional MIG with autoscaling
gcloud compute instance-templates create web-template \
    --machine-type=e2-medium \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --tags=http-server \
    --metadata=startup-script='#!/bin/bash
apt-get update && apt-get install -y nginx
systemctl start nginx'

gcloud compute instance-groups managed create web-mig \
    --template=web-template \
    --size=2 \
    --region=us-central1 \
    --target-distribution-shape=EVEN

gcloud compute instance-groups managed set-autoscaling web-mig \
    --region=us-central1 \
    --min-num-replicas=2 \
    --max-num-replicas=10 \
    --target-cpu-utilization=0.6 \
    --cool-down-period=90
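
To see what a CPU utilization target of 0.6 means in practice, the following sketch estimates the group size needed to bring average CPU back to the target. It is a simplified model, not the autoscaler's actual algorithm, which also applies stabilization and cool-down behavior; the function name and defaults are assumptions that mirror the flags above.

```python
def recommended_size(current_instances: int, avg_cpu_percent: int,
                     target_cpu_percent: int = 60,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Estimate the instance count that brings average CPU back to the target."""
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    total_load = current_instances * avg_cpu_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-total_load // target_cpu_percent)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, needed))
```

For example, 4 instances averaging 90% CPU against a 60% target suggest 6 instances, while a very light load never drops the group below the 2-replica floor.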

04. Storage Design

Storage Decision Framework

Storage selection depends on data model (structured/unstructured), access pattern (OLTP/OLAP/streaming), consistency requirements, and scale.

Storage Type | Service | Best For | Anti-Pattern
Object Storage | Cloud Storage | Media, backups, data lake, static web | Transactional data, low-latency lookups
Block Storage | Persistent Disk / Local SSD | VM file systems, databases on VMs | Shared file access, object storage use cases
File Storage | Filestore / Cloud Storage FUSE | NFS workloads, shared data, legacy lift-and-shift | High-throughput streaming, analytics
Relational (Regional) | Cloud SQL | Web apps, CMS, traditional OLTP <10TB | Global distribution, horizontal scaling >64TB
Relational (Global) | Cloud Spanner | Global finance, inventory, gaming leaderboards | Small workloads where Cloud SQL suffices
NoSQL (Document) | Firestore | Mobile/web apps, user profiles, game state | Analytics, joins, complex transactions
NoSQL (Wide-column) | Bigtable | IoT time-series, ad tech, financial ticks, >1TB | <1TB data, complex queries, multi-row transactions
Data Warehouse | BigQuery | Analytics, BI, ML, petabyte-scale OLAP | OLTP workloads, sub-second point lookups
In-Memory Cache | Memorystore (Redis/Memcached) | Session store, caching, leaderboards, real-time | Durable primary storage

Database Selection by Access Pattern

Decision Flow: Structured + ACID + <10TB + single region? Cloud SQL.
Structured + ACID + global? Cloud Spanner.
Document/hierarchical + mobile/web? Firestore.
Time-series / wide-column + >1TB? Bigtable.
Analytics/OLAP? BigQuery.
Unstructured blobs? Cloud Storage.
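
The decision flow above translates directly into a small function. This sketch encodes only the branches the flow states; the `data_model` labels are hypothetical, and combinations the flow does not cover raise an error instead of guessing.

```python
def choose_database(data_model: str, *, size_tb: float = 0.0,
                    global_scale: bool = False) -> str:
    """Map a workload onto a storage service using the decision flow above."""
    if data_model == "relational":
        if global_scale:
            return "Cloud Spanner"
        if size_tb < 10:
            return "Cloud SQL"
    elif data_model == "document":
        return "Firestore"
    elif data_model == "wide_column" and size_tb > 1:
        return "Bigtable"
    elif data_model == "analytics":
        return "BigQuery"
    elif data_model == "blob":
        return "Cloud Storage"
    # Anything else falls outside the stated flow and needs a closer look.
    raise ValueError("combination not covered by the decision flow")
```

Calling `choose_database("relational", size_tb=2)` returns Cloud SQL, while the same call with `global_scale=True` returns Cloud Spanner.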

# Cloud Storage lifecycle policy (cost optimization)
gsutil lifecycle set lifecycle.json gs://my-bucket

# lifecycle.json — transition to Nearline after 30 days, Coldline after 90
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}
    },
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
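
The effect of the policy above can be checked by modeling it as age thresholds. The sketch assumes an object created in the STANDARD class and ignores the fact that Cloud Storage applies lifecycle actions asynchronously; the function and constant names are hypothetical.

```python
from typing import Optional

# Mirrors lifecycle.json above: (minimum age in days, resulting storage class).
TRANSITIONS = [(90, "COLDLINE"), (30, "NEARLINE")]
DELETE_AFTER_DAYS = 365


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the expected storage class at a given age, or None once deleted."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= DELETE_AFTER_DAYS:
        return None  # the Delete rule applies
    # Check the highest age threshold first so the coldest matching class wins.
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            return storage_class
    return "STANDARD"
```

An object is STANDARD for its first 30 days, NEARLINE until day 90, COLDLINE until day 365, and removed after that.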

05. Network Design

VPC Architecture Patterns

Network design on GCP centers around VPC networks, which are global resources containing regional subnets. The exam tests your understanding of connectivity models, especially for multi-project and hybrid environments.

Pattern | Use Case | How It Works | Limitations
Shared VPC | Multi-project, centralized network admin | Host project shares VPC with service projects | Same org only; max 1 host project per service project
VPC Peering | Cross-org or cross-project connectivity | Direct RFC1918 connectivity between two VPCs | Non-transitive; no overlapping CIDR; max 25 peers
Cloud VPN | Encrypted tunnel to on-prem or other clouds | IPsec tunnels over public internet | Max 3 Gbps per tunnel (HA VPN); internet-dependent
Cloud Interconnect | High-bandwidth private on-prem connectivity | Dedicated (10/100 Gbps) or Partner (50 Mbps-50 Gbps) | Higher cost; requires colocation (Dedicated) or partner
Cloud NAT | Outbound internet for private VMs | NAT gateway for VMs without external IPs | Outbound only; not a load balancer

Exam Tip: Shared VPC vs VPC Peering: If all projects are in the same organization and you want centralized network control, use Shared VPC. If connecting across organizations or you need decentralized control, use VPC Peering. Remember: VPC Peering is non-transitive (A peers with B, B peers with C, but A cannot reach C through B).
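
Two peering constraints, non-transitivity and non-overlapping CIDR ranges, are easy to model. The sketch below uses hypothetical network names A, B, and C with Python's standard `ipaddress` module; it illustrates the rules and does not call any Google Cloud API.

```python
import ipaddress

# Hypothetical peerings: A<->B and B<->C, but no direct A<->C peering.
PEERINGS = {frozenset({"A", "B"}), frozenset({"B", "C"})}


def can_reach(src: str, dst: str, peerings=PEERINGS) -> bool:
    """Peering is non-transitive: only directly peered networks can communicate."""
    return src == dst or frozenset({src, dst}) in peerings


def can_peer(cidrs_a, cidrs_b) -> bool:
    """Two VPCs can peer only if none of their subnet ranges overlap."""
    nets_a = [ipaddress.ip_network(c) for c in cidrs_a]
    nets_b = [ipaddress.ip_network(c) for c in cidrs_b]
    return not any(a.overlaps(b) for a in nets_a for b in nets_b)
```

Here `can_reach("A", "C")` is false even though both networks peer with B, and `can_peer(["10.0.0.0/16"], ["10.0.128.0/20"])` is false because the second range sits inside the first.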

Hybrid Connectivity Decision

Requirement | Solution | Bandwidth | Encryption
Quick setup, low bandwidth | Classic VPN | Up to 3 Gbps/tunnel | IPsec (always)
Production, 99.99% SLA | HA VPN | Up to 3 Gbps/tunnel (multi-tunnel) | IPsec (always)
High bandwidth, consistent latency | Dedicated Interconnect | 10 or 100 Gbps per link | Not encrypted by default (add MACsec)
Moderate bandwidth, no colocation | Partner Interconnect | 50 Mbps to 50 Gbps | Not encrypted by default

# Create HA VPN gateway
gcloud compute vpn-gateways create my-ha-vpn \
    --network=my-vpc \
    --region=us-central1

# Create external VPN gateway (on-prem peer)
gcloud compute external-vpn-gateways create on-prem-gw \
    --interfaces 0=203.0.113.1,1=203.0.113.2

# Create Cloud Router for dynamic routing
gcloud compute routers create my-router \
    --network=my-vpc \
    --region=us-central1 \
    --asn=65001
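
The hybrid connectivity table can be reduced to two questions: is a private, high-bandwidth link required, and how many VPN tunnels would the bandwidth need otherwise? The sketch below assumes the 3 Gbps per-tunnel figure from the table and uses hypothetical flag names.

```python
import math


def tunnels_needed(required_gbps: float, per_tunnel_gbps: float = 3.0) -> int:
    """Estimate how many VPN tunnels are needed to carry the required bandwidth."""
    if required_gbps <= 0 or per_tunnel_gbps <= 0:
        raise ValueError("bandwidth values must be positive")
    return math.ceil(required_gbps / per_tunnel_gbps)


def choose_hybrid(*, needs_private_link: bool, can_colocate: bool = False,
                  production: bool = True) -> str:
    """Select a hybrid connectivity option following the table above."""
    if needs_private_link:
        # Dedicated Interconnect requires presence in a colocation facility.
        return "Dedicated Interconnect" if can_colocate else "Partner Interconnect"
    # VPN options: HA VPN for production SLAs, Classic VPN for quick setups.
    return "HA VPN" if production else "Classic VPN"
```

A 7 Gbps requirement would need three tunnels at the assumed per-tunnel rate, which is often the point where Interconnect becomes the simpler design.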

06. Migration Planning

Google Cloud Migration Phases

Google recommends a four-phase migration framework. The exam tests your ability to identify which phase activities belong to and recommend appropriate tools.

  1. Assess — Inventory existing workloads, evaluate TCO, identify dependencies, assess cloud readiness. Tools: Migration Center (formerly StratoZone), Application Discovery.
  2. Plan — Define migration strategy per workload, establish landing zone (org hierarchy, networking, IAM), set up foundation infrastructure. Tools: Cloud Foundation Toolkit, Terraform.
  3. Deploy — Execute migration, validate functionality, run parallel operations. Tools: Migrate to VMs, Migrate to Containers, Database Migration Service.
  4. Optimize — Right-size resources, implement monitoring, automate operations, modernize applications. Tools: Recommender, Active Assist, Cloud Monitoring.

The 6 Rs of Migration

Strategy | Definition | When to Use | GCP Tooling
Rehost (Lift & Shift) | Move as-is to cloud VMs | Legacy apps, quick migration, minimal changes | Migrate to VMs
Replatform | Minor optimizations during migration | Replace self-managed DB with Cloud SQL, move to managed services | Database Migration Service, Cloud SQL
Refactor | Re-architect for cloud-native | Microservices, containers, serverless target | GKE, Cloud Run, Cloud Functions
Repurchase | Replace with SaaS/managed service | Email to Workspace, CRM to SaaS | Google Workspace, Marketplace
Retire | Decommission | Unused or redundant applications | N/A
Retain | Keep on-premises | Regulatory, latency, or dependency constraints | Hybrid with Anthos, Cloud VPN/Interconnect

Common Mistake: Do not default to refactoring everything. The exam rewards pragmatism. A lift-and-shift with post-migration optimization is often the correct answer when the business requirement emphasizes speed or minimal disruption. Refactoring is justified only when the requirements demand cloud-native features (autoscaling, serverless) or when the existing architecture is fundamentally incompatible.
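
The pragmatic ordering described above can be sketched as a function that tries the least invasive strategies first and falls back to rehosting. The boolean flags are hypothetical simplifications of what a real assessment would produce.

```python
def choose_strategy(*, still_needed: bool = True,
                    must_stay_on_prem: bool = False,
                    saas_available: bool = False,
                    needs_cloud_native: bool = False,
                    managed_service_swap: bool = False) -> str:
    """Pick one of the 6 Rs for a single workload."""
    if not still_needed:
        return "Retire"
    if must_stay_on_prem:
        return "Retain"      # regulatory, latency, or dependency constraints
    if saas_available:
        return "Repurchase"
    if needs_cloud_native:
        return "Refactor"    # justified only by an explicit requirement
    if managed_service_swap:
        return "Replatform"  # e.g. self-managed database to Cloud SQL
    return "Rehost"          # default when speed and low disruption matter
```

With no flags set the function returns Rehost, which reflects the point that lift-and-shift is the default unless a requirement argues otherwise.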

# Terraform landing zone — project factory pattern
module "project-factory" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 15.0"

  name                 = "my-migration-project"
  org_id               = "123456789"
  folder_id            = "folders/456789012"
  billing_account      = "01ABCD-234567-EFGH89"

  activate_apis = [
    "compute.googleapis.com",
    "container.googleapis.com",
    "sqladmin.googleapis.com",
    "monitoring.googleapis.com",
  ]

  svpc_host_project_id = "host-project-id"
  shared_vpc_subnets   = [
    "projects/host-project-id/regions/us-central1/subnetworks/shared-subnet",
  ]
}

07. AI/ML Solution Architecture

Gemini and Agent Builder

The PCA exam now includes AI/ML solution design. As a Cloud Architect, you need to recommend the right AI approach based on the use case, not train models yourself.

Use Case | Recommended Service | Why
Chatbot grounded in company docs | Agent Builder + Vertex AI Search | Managed RAG, no custom ML pipeline needed
Code generation / developer assist | Gemini Code Assist | IDE integration, security-aware suggestions
Cloud operations assistance | Gemini Cloud Assist | Natural-language queries about GCP resources, troubleshooting
Custom image classification | Vertex AI AutoML Vision | Domain-specific labels, no ML expertise required
Text extraction from documents | Document AI | Pre-trained processors for invoices, receipts, forms
Translation at scale | Cloud Translation API | 100+ languages, batch and real-time
Custom LLM fine-tuning | Model Garden + Vertex AI Tuning | Supervised fine-tuning, RLHF, adapter tuning on Gemini/open models

Model Garden

Vertex AI Model Garden provides a catalog of Google models (Gemini, Imagen, Chirp), open models (Llama, Mistral, Gemma), and partner models (Claude, Cohere). As an architect, you choose based on task, latency, cost, and data governance requirements.

Architecture Decision: Pre-trained API vs AutoML vs Custom Training vs Foundation Model: Use pre-trained APIs for common tasks (Vision, NLP, Translation). Use AutoML when you have labeled data for a domain-specific task. Use custom training only when you need full control over architecture. Use foundation models (Gemini) for generative tasks, and ground them with RAG for enterprise data.
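
The four-way decision above can be expressed as an ordered check. This is a sketch with hypothetical flags, not an official selection tool; cases the paragraph does not address raise an error.

```python
def choose_ai_approach(*, generative: bool = False,
                       enterprise_data: bool = False,
                       common_task: bool = False,
                       needs_full_control: bool = False,
                       has_labeled_data: bool = False) -> str:
    """Choose an AI/ML approach following the architecture decision above."""
    if generative:
        # Ground the foundation model with RAG when answers depend on enterprise data.
        return "Foundation model + RAG" if enterprise_data else "Foundation model"
    if common_task:
        return "Pre-trained API"   # Vision, NLP, Translation
    if needs_full_control:
        return "Custom training"
    if has_labeled_data:
        return "AutoML"            # domain-specific task with labeled data
    raise ValueError("requirements do not match any listed approach")
```

A chatbot over company documents maps to a foundation model with RAG, whereas a domain-specific classifier with labeled data maps to AutoML.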

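# Create a Vertex AI endpoint for a custom or Model Garden open model.
# Gemini itself is called through the managed Vertex AI API and needs no endpoint.
gcloud ai endpoints create \
    --display-name=model-endpoint \
    --region=us-central1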

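# Agent Builder (Vertex AI Search) data stores are managed here through the
# Discovery Engine REST API; PROJECT_ID is a placeholder for your project.
API="https://discoveryengine.googleapis.com/v1"
PARENT="projects/PROJECT_ID/locations/global/collections/default_collection"
AUTH="Authorization: Bearer $(gcloud auth print-access-token)"

# Create an Agent Builder data store for unstructured content
curl -X POST "$API/$PARENT/dataStores?dataStoreId=my-datastore" \
    -H "$AUTH" -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    -d '{"displayName": "my-datastore", "industryVertical": "GENERIC",
         "solutionTypes": ["SOLUTION_TYPE_SEARCH"], "contentConfig": "CONTENT_REQUIRED"}'

# Import documents from Cloud Storage into the data store
curl -X POST "$API/$PARENT/dataStores/my-datastore/branches/0/documents:import" \
    -H "$AUTH" -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    -d '{"gcsSource": {"inputUris": ["gs://my-bucket/docs/*"], "dataSchema": "content"}}'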

08. Exam Tips

Scenario 1: "A company needs to migrate 500 VMs to GCP within 3 months. The apps are legacy .NET and Java with minimal cloud-native readiness..."
Answer: Rehost (lift-and-shift) using Migrate to VMs. Replatform databases to Cloud SQL after initial migration. Timeline rules out refactoring.

Scenario 2: "A financial services company needs a globally consistent database for real-time inventory tracking across 4 continents..."
Answer: Cloud Spanner with multi-region configuration. Only GCP database offering global strong consistency with SQL semantics. Cloud SQL read replicas do not provide strong consistency.

Scenario 3: "A startup with 2 developers needs to deploy a REST API that handles 0-10,000 RPS with unpredictable traffic..."
Answer: Cloud Run. Scales to zero (cost), handles burst traffic, minimal ops overhead for small team. GKE is overkill for this scenario.

Scenario 4: "An enterprise needs to connect 10 GCP projects with centralized network policies managed by a dedicated networking team..."
Answer: Shared VPC. Central host project, service projects for workloads. Networking team manages firewall rules and subnets centrally. VPC Peering would decentralize control.

Scenario 5: "A healthcare company wants to build a patient-facing chatbot that answers questions from medical records, with HIPAA compliance..."
Answer: Agent Builder with Vertex AI Search for RAG, grounded in Firestore or Cloud Healthcare API data. VPC-SC perimeter for data protection. BAA with Google Cloud for HIPAA. Do NOT fine-tune models on PHI without proper data governance.

General Strategy: The PCA exam values pragmatism over perfection. Choose managed services over self-hosted. Choose serverless when stateless. Choose the simplest solution that meets ALL stated requirements (business + technical + compliance). When two answers are technically valid, the one with lower operational complexity is usually correct.
