fix: simplify to single snapshot per workspace

- Remove rotating slots, use single snapshot
- Snapshot name: {owner}-{workspace}-snapshot
- Overwrites on each workspace stop
- Remove snapshot_retention_count variable
- Simpler user choice: restore or fresh
blink-so[bot] 2026-02-05 14:39:31 +00:00
parent 98c1767ffb
commit 2920f0517f
3 changed files with 53 additions and 193 deletions


@@ -1,6 +1,6 @@
 ---
 display_name: GCP Disk Snapshot
-description: Create and manage disk snapshots for Coder workspaces on GCP with automatic rotation
+description: Create and manage disk snapshots for Coder workspaces on GCP
 icon: ../../../../.icons/gcp.svg
 verified: false
 tags: [gcp, snapshot, disk, backup, persistence]
@@ -8,7 +8,7 @@ tags: [gcp, snapshot, disk, backup, persistence]
 # GCP Disk Snapshot Module
-This module provides disk snapshot functionality for Coder workspaces running on GCP Compute Engine. It automatically creates snapshots when workspaces are stopped and allows users to restore from previous snapshots when starting workspaces.
+This module provides disk snapshot functionality for Coder workspaces running on GCP Compute Engine. It automatically creates a snapshot when workspaces are stopped and allows users to restore from the snapshot when starting.
 ```tf
 module "disk_snapshot" {
@@ -24,22 +24,12 @@ module "disk_snapshot" {
 ## Features
-- **Automatic Snapshots**: Creates disk snapshots when workspaces are stopped
-- **Rotating Slots**: Maintains up to N snapshot slots (configurable, default: 3)
-- **Snapshot Selection**: Users can choose from available snapshots when starting workspaces
-- **Default to Newest**: Automatically selects the most recent snapshot by default
-- **Pure Terraform**: No external CLI dependencies (gcloud not required)
-- **Workspace Isolation**: Snapshots are labeled and filtered by workspace and owner
-## How It Works
-The module uses a **rotating slot** approach:
-1. Snapshots are named with predictable slot names: `{owner}-{workspace}-slot-1`, `slot-2`, `slot-3`
-2. When a workspace stops, a new snapshot is created in the next available slot
-3. Once all slots are full, the oldest slot is reused (round-robin)
-4. Users can select from any available snapshot when starting the workspace
-5. By default, the most recent snapshot is selected
+- **Automatic Snapshots**: Creates a disk snapshot when workspaces are stopped
+- **Single Snapshot**: Maintains one snapshot per workspace (overwrites on each stop)
+- **Restore Option**: Users can choose to restore from snapshot or start fresh
+- **Default to Restore**: Automatically selects restore if a snapshot exists
+- **Pure Terraform**: No external CLI dependencies
+- **Workspace Isolation**: Snapshots are named and labeled by workspace and owner
 ## Usage
@@ -72,25 +62,6 @@ resource "google_compute_disk" "workspace" {
 }
 ```
-### With Custom Retention
-```hcl
-module "disk_snapshot" {
-  source = "registry.coder.com/coder-labs/gcp-disk-snapshot/coder"
-  disk_self_link = google_compute_disk.workspace.self_link
-  default_image = "debian-cloud/debian-12"
-  zone = var.zone
-  project = var.project_id
-  snapshot_retention_count = 2 # Keep only 2 snapshot slots
-  labels = {
-    environment = "development"
-    team = "engineering"
-  }
-}
-```
 ### With Regional Storage
 ```hcl
@@ -101,74 +72,29 @@ module "disk_snapshot" {
   default_image = "debian-cloud/debian-12"
   zone = var.zone
   project = var.project_id
-  storage_locations = ["us-central1"] # Store snapshots in specific region
+  storage_locations = ["us-central1"] # Store snapshot in specific region
+  labels = {
+    environment = "development"
+    team = "engineering"
+  }
 }
 ```
-## Variables
-| Name | Description | Type | Default | Required |
-| ------------------------ | ----------------------------------------------------------- | ------------ | ------- | :------: |
-| disk_self_link | The self_link of the disk to create snapshots from | string | - | yes |
-| default_image | The default image to use when not restoring from a snapshot | string | - | yes |
-| zone | The zone where the disk resides | string | - | yes |
-| project | The GCP project ID | string | - | yes |
-| snapshot_retention_count | Number of snapshot slots to maintain (1-3) | number | 3 | no |
-| storage_locations | Cloud Storage bucket location(s) for snapshots | list(string) | [] | no |
-| labels | Additional labels to apply to snapshots | map(string) | {} | no |
-| test_mode | Skip GCP API calls for testing | bool | false | no |
-## Outputs
-| Name | Description |
-| ---------------------- | ------------------------------------------------------- |
-| snapshot_self_link | Self link of the selected snapshot (null if fresh disk) |
-| use_snapshot | Whether a snapshot is being used |
-| default_image | The default image configured |
-| selected_snapshot_name | Name of the selected snapshot |
-| available_snapshots | List of available snapshot names |
-| created_snapshot_name | Name of snapshot created on stop |
-| snapshot_slots | The snapshot slot names used for rotation |
+## How It Works
+1. When a workspace stops, a snapshot is created with a predictable name: `{owner}-{workspace}-snapshot`
+2. The snapshot is overwritten each time the workspace stops
+3. When starting, users can choose to restore from the snapshot or start fresh
+4. If a snapshot exists, restore is selected by default
 ## Required IAM Permissions
-The service account running Terraform needs the following permissions:
-```json
-{
-  "permissions": [
-    "compute.snapshots.create",
-    "compute.snapshots.delete",
-    "compute.snapshots.get",
-    "compute.snapshots.list",
-    "compute.snapshots.setLabels",
-    "compute.disks.createSnapshot"
-  ]
-}
-```
+The service account running Terraform needs:
+- `compute.snapshots.create`
+- `compute.snapshots.delete`
+- `compute.snapshots.get`
+- `compute.disks.createSnapshot`
 Or use the predefined role: `roles/compute.storageAdmin`
-## Considerations
-- **Cost**: Snapshots incur storage costs. The rotating slot approach limits the number of snapshots.
-- **Slot Naming**: Snapshots use predictable names (`-slot-1`, `-slot-2`, etc.) for rotation
-- **Time**: Snapshot creation takes time; workspace stop operations may take longer
-- **Permissions**: Ensure proper IAM permissions for snapshot management
-- **Region**: Snapshots can be stored regionally for cost optimization
-- **Lifecycle**: Use `ignore_changes = [snapshot, image]` on disks to prevent Terraform conflicts
-## Comparison with Machine Images
-This module uses _disk snapshots_ rather than _machine images_:
-| Feature | Disk Snapshots | Machine Images |
-| ----------- | ------------------------ | ---------------------------- |
-| API Status | GA (stable) | Beta |
-| Captures | Disk data only | Full instance config + disks |
-| Cleanup | Rotating slots (simple) | Manual or custom automation |
-| Cost | Lower | Higher |
-| Restore | Requires instance config | Full instance restore |
-| List/Filter | Limited in Terraform | Limited in Terraform |
-For most Coder workspace use cases, disk snapshots are recommended as they capture the persistent data while the instance configuration is managed by Terraform.


@@ -70,7 +70,6 @@ describe("gcp-disk-snapshot", async () => {
       zone: "us-central1-a",
       project: "test-project",
       test_mode: true,
-      snapshot_retention_count: 2,
       storage_locations: JSON.stringify(["us-central1"]),
       labels: JSON.stringify({
         environment: "test",
@@ -78,18 +77,4 @@ describe("gcp-disk-snapshot", async () => {
       }),
     });
   });
-  it("validates retention count range", async () => {
-    await expect(
-      runTerraformApply(import.meta.dir, {
-        disk_self_link:
-          "projects/test-project/zones/us-central1-a/disks/test-disk",
-        default_image: "debian-cloud/debian-12",
-        zone: "us-central1-a",
-        project: "test-project",
-        test_mode: true,
-        snapshot_retention_count: 5, // Invalid: max is 3
-      }),
-    ).rejects.toThrow();
-  });
 });


@@ -15,7 +15,6 @@ terraform {
 # Provider configuration for testing only
 # In production, the provider will be inherited from the calling module
-# Note: Using fake credentials for CI testing - Terraform will still validate syntax
 provider "google" {
   project = "test-project"
   region = "us-central1"
@@ -69,17 +68,6 @@ variable "labels" {
   default = {}
 }
-variable "snapshot_retention_count" {
-  description = "Number of snapshots to retain (1-3, default: 3). Uses rotating snapshot slots."
-  type = number
-  default = 3
-  validation {
-    condition = var.snapshot_retention_count >= 1 && var.snapshot_retention_count <= 3
-    error_message = "snapshot_retention_count must be between 1 and 3."
-  }
-}
 variable "storage_locations" {
   description = "Cloud Storage bucket location to store the snapshot (regional or multi-regional)"
   type = list(string)
@@ -96,52 +84,32 @@ locals {
   normalized_owner_name = lower(replace(replace(data.coder_workspace_owner.me.name, "/[^a-z0-9-_]/", "-"), "--", "-"))
   normalized_template_name = lower(replace(replace(data.coder_workspace.me.template_name, "/[^a-z0-9-_]/", "-"), "--", "-"))
-  # Base name for snapshots - uses rotating slots (1, 2, 3)
-  snapshot_base_name = "${local.normalized_owner_name}-${local.normalized_workspace_name}"
-  # Snapshot slot names (fixed, predictable names for rotation)
-  snapshot_slot_names = [
-    for i in range(var.snapshot_retention_count) : "${local.snapshot_base_name}-slot-${i + 1}"
-  ]
+  # Single snapshot name per workspace
+  snapshot_name = "${local.normalized_owner_name}-${local.normalized_workspace_name}-snapshot"
 }
-# Try to read existing snapshots to determine which slots are used
-# This data source will fail gracefully if snapshot doesn't exist
-data "google_compute_snapshot" "existing_snapshots" {
-  for_each = var.test_mode ? toset([]) : toset(local.snapshot_slot_names)
-  name = each.value
-  project = var.project
+# Try to read existing snapshot for this workspace
+data "google_compute_snapshot" "workspace_snapshot" {
+  count = var.test_mode ? 0 : 1
+  name = local.snapshot_name
+  project = var.project
 }
 locals {
-  # Determine which snapshots actually exist (have data)
-  existing_snapshot_names = var.test_mode ? [] : [
-    for name, snapshot in data.google_compute_snapshot.existing_snapshots : name
-    if can(snapshot.self_link)
-  ]
-  # Sort by creation timestamp to find newest (for default selection)
-  # Since we can't easily sort in Terraform without timestamps, we'll use slot order
-  # Slot with highest number that exists is likely newest
-  available_snapshots = reverse(sort(local.existing_snapshot_names))
-  # Default to newest available snapshot
-  default_snapshot = length(local.available_snapshots) > 0 ? local.available_snapshots[0] : "none"
-  # Calculate next slot to use (round-robin)
-  # Count existing snapshots and use next slot, or slot 1 if all are full
-  next_slot_index = length(local.existing_snapshot_names) >= var.snapshot_retention_count ? 0 : length(local.existing_snapshot_names)
-  next_snapshot_name = local.snapshot_slot_names[local.next_slot_index]
+  # Check if snapshot exists
+  snapshot_exists = var.test_mode ? false : can(data.google_compute_snapshot.workspace_snapshot[0].self_link)
+  # Default to using snapshot if it exists
+  default_restore = local.snapshot_exists ? "snapshot" : "none"
 }
-# Parameter to select from available snapshots
-# Defaults to the most recent snapshot
+# Parameter to choose whether to restore from snapshot
 data "coder_parameter" "restore_snapshot" {
   name = "restore_snapshot"
   display_name = "Restore from Snapshot"
-  description = "Select a snapshot to restore from. Defaults to the most recent snapshot."
+  description = "Restore workspace from the last snapshot, or start fresh."
   type = "string"
-  default = local.default_snapshot
+  default = local.default_restore
   mutable = true
   order = 1
@@ -152,26 +120,23 @@ data "coder_parameter" "restore_snapshot" {
   }
   dynamic "option" {
-    for_each = local.available_snapshots
+    for_each = local.snapshot_exists ? [1] : []
     content {
-      name = option.value
-      value = option.value
-      description = "Restore from snapshot: ${option.value}"
+      name = "Restore from snapshot"
+      value = "snapshot"
+      description = "Restore from: ${local.snapshot_name}"
     }
   }
 }
-# Determine which snapshot to use
 locals {
-  use_snapshot = data.coder_parameter.restore_snapshot.value != "none"
-  selected_snapshot = local.use_snapshot ? data.coder_parameter.restore_snapshot.value : null
+  use_snapshot = data.coder_parameter.restore_snapshot.value == "snapshot" && local.snapshot_exists
 }
-# Create snapshot when workspace is stopped
-# Uses the next available slot in rotation
+# Create/update snapshot when workspace is stopped
 resource "google_compute_snapshot" "workspace_snapshot" {
   count = !var.test_mode && data.coder_workspace.me.transition == "stop" ? 1 : 0
-  name = local.next_snapshot_name
+  name = local.snapshot_name
   source_disk = var.disk_self_link
   zone = var.zone
   project = var.project
@@ -183,19 +148,13 @@ resource "google_compute_snapshot" "workspace_snapshot" {
     coder_owner = local.normalized_owner_name
     coder_template = local.normalized_template_name
     workspace_id = data.coder_workspace.me.id
-    slot_number = tostring(local.next_slot_index + 1)
   })
-  lifecycle {
-    # Allow replacing snapshots in the same slot
-    create_before_destroy = false
-  }
 }
 # Outputs
 output "snapshot_self_link" {
-  description = "The self_link of the selected snapshot to restore from (null if using fresh disk)"
-  value = local.use_snapshot && !var.test_mode ? "projects/${var.project}/global/snapshots/${local.selected_snapshot}" : null
+  description = "The self_link of the snapshot to restore from (null if not using snapshot)"
+  value = local.use_snapshot ? data.google_compute_snapshot.workspace_snapshot[0].self_link : null
 }
 output "use_snapshot" {
@@ -208,22 +167,12 @@ output "default_image" {
   value = var.default_image
 }
-output "selected_snapshot_name" {
-  description = "The name of the selected snapshot (null if using fresh disk)"
-  value = local.selected_snapshot
+output "snapshot_name" {
+  description = "The name of the workspace snapshot"
+  value = local.snapshot_name
 }
-output "available_snapshots" {
-  description = "List of available snapshot names for this workspace"
-  value = local.available_snapshots
-}
-output "created_snapshot_name" {
-  description = "The name of the snapshot created when workspace stopped (if any)"
-  value = !var.test_mode && data.coder_workspace.me.transition == "stop" ? local.next_snapshot_name : null
-}
-output "snapshot_slots" {
-  description = "The snapshot slot names used for rotation"
-  value = local.snapshot_slot_names
+output "snapshot_exists" {
+  description = "Whether a snapshot exists for this workspace"
+  value = local.snapshot_exists
 }
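
For readers who want to reason about the new naming and restore logic outside Terraform, the locals in the last file can be mirrored in a small TypeScript sketch. This is an illustration only, not part of the module: the function names `normalize`, `snapshotName`, `defaultRestore`, and `useSnapshot` are hypothetical, and it assumes the workspace name is normalized the same way as the owner name, which this diff does not show.

```typescript
// Sketch only: mirrors the Terraform locals in this diff; not part of the module.

// Same steps as lower(replace(replace(s, "/[^a-z0-9-_]/", "-"), "--", "-")).
// The character class is applied before lowercasing, so an uppercase
// letter is replaced with "-" rather than lowercased.
function normalize(name: string): string {
  return name
    .replace(/[^a-z0-9\-_]/g, "-") // anything outside a-z, 0-9, "-", "_" becomes "-"
    .split("--")
    .join("-") // single pass over "--", like Terraform's literal replace
    .toLowerCase();
}

// Assumption: the workspace name goes through the same normalization as the owner name.
function snapshotName(owner: string, workspace: string): string {
  return `${normalize(owner)}-${normalize(workspace)}-snapshot`;
}

type RestoreChoice = "snapshot" | "none";

// Mirrors local.default_restore: prefer restoring when a snapshot exists.
function defaultRestore(snapshotExists: boolean): RestoreChoice {
  return snapshotExists ? "snapshot" : "none";
}

// Mirrors local.use_snapshot: the choice only takes effect if the snapshot exists.
function useSnapshot(choice: RestoreChoice, snapshotExists: boolean): boolean {
  return choice === "snapshot" && snapshotExists;
}
```

Under these assumptions, `snapshotName("alice", "dev.box")` yields `alice-dev-box-snapshot`. A stale `"snapshot"` parameter value falls back to a fresh disk when no snapshot is found, because `useSnapshot` also checks existence.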