Troubleshooting Image Upload Issues

Problem

This guide covers failures that occur while uploading an image. Symptoms include images stuck in the saving state, uploads that transition to error or killed, and large transfers (200 GB or more) that end with an active status but zero bytes of data.

Environment

  • Private Cloud Director Virtualization - v2025.10 and higher

  • Self-Hosted Private Cloud Director Virtualization - v2025.4 and higher

  • Component: Image Library

Deep Dive: Technical Image Upload & Staging Flow

Once an image record is queued, the data transfer begins. This is not a direct write to the backend; it is a multi-step streaming process managed by the Glance API and the pf9-glance-api service.

1. The PUT Data Stream

Note: Check the Glance logs on the host that has the Image Library role.

The client (PCD UI or CLI) initiates an HTTP PUT request to the Image Service. The Glance API validates the user's token with Keystone before accepting the stream. You can identify the start of this process in the logs by looking for the specific Request ID:

INFO glance.api.v2.image_data [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Use the existing user token.
INFO eventlet.wsgi.server [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] 127.0.0.1 - - [..] "PUT /v2/images/[IMAGE_UUID]/file HTTP/1.1" 204

2. Validation & Staging

As the bits arrive, Glance calculates the checksum and the virtual size. The virtual size is critical for ensuring the image fits the project quotas and the destination hypervisor. This can be manually verified on the host.
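One way to do this manual check is sketched below, assuming Python 3 and `qemu-img` are available on the host. The function names are illustrative. MD5 and SHA-512 are used because upstream Glance reports an MD5 `checksum` and, by default, a SHA-512 `os_hash_value`; confirm which algorithm your deployment reports before comparing values.

```python
import hashlib
import json


def stream_digests(path, chunk_size=64 * 1024):
    """Hash a file chunk by chunk, the way a streaming upload would see it."""
    md5 = hashlib.md5()
    sha512 = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha512.update(chunk)
    return md5.hexdigest(), sha512.hexdigest()


def virtual_size_from_qemu_img(info_json):
    """Extract the virtual size in bytes from `qemu-img info --output=json` text."""
    return json.loads(info_json)["virtual-size"]
```

The digests can be compared with the values shown for the image record, and the JSON printed by `qemu-img info --output=json <image-file>` can be passed to `virtual_size_from_qemu_img` to read the virtual size in bytes.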

The data is temporarily written to the Staging Area at /var/lib/glance/os_glance_staging_store/. This staging prevents partial or corrupted files from being committed to the main storage library.

3. Backend Commitment (pf9-glance-api)

After the full file is staged and validated, the pf9-glance-api service moves the data from the staging directory to the final backend storage location (typically /var/lib/glance/images).
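The stage-then-commit pattern from steps 2 and 3 can be illustrated with a small sketch. This is not the Glance or pf9-glance-api implementation. The function name and directory arguments are hypothetical, and the code only shows why a failed or corrupted transfer leaves nothing in the final location.

```python
import hashlib
import os
import shutil


def stage_then_commit(chunks, staging_dir, final_dir, name, expected_md5=None):
    """Write a stream to staging, validate it, then move it into place."""
    staged = os.path.join(staging_dir, name)
    digest = hashlib.md5()
    try:
        with open(staged, "wb") as handle:
            for chunk in chunks:
                handle.write(chunk)
                digest.update(chunk)
        if expected_md5 is not None and digest.hexdigest() != expected_md5:
            raise ValueError("checksum mismatch, refusing to commit")
        final = os.path.join(final_dir, name)
        shutil.move(staged, final)
        return final
    except Exception:
        # Remove the partial file so it never reaches the backend location.
        if os.path.exists(staged):
            os.remove(staged)
        raise
```

An image stuck in saving can correspond to a file that is still in the staging directory and has not yet been moved.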

4. Status Finalization & Audit

Once the move is confirmed, the image status transitions from saving to active, and the outcome is recorded in the audit logs.

Procedure

1. Verify Service Health

Confirm that the Image Services are active on the library host, either via the UI Service Health section or from the CLI.
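As one possible CLI-side check, the sketch below wraps `systemctl is-active`. It assumes the service runs as a systemd unit named `pf9-glance-api` (the service name used earlier in this guide); the helper name and the `runner` parameter are illustrative.

```python
import subprocess


def service_is_active(unit="pf9-glance-api", runner=subprocess.run):
    """Return True when `systemctl is-active <unit>` reports 'active'."""
    result = runner(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code just means "not active"
    )
    return result.returncode == 0 and result.stdout.strip() == "active"
```

The `runner` argument exists only so the check can be exercised without systemd; on the library host, calling `service_is_active()` with the defaults is enough.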

2. Interrogate the Image State

Check the image status and size to see if any data was actually received.

  • saving: Transfer is in progress or stalled at the staging-to-backend move.

  • killed: The transfer was aborted (check for DisconnectionError in logs).

  • Size 0: The metadata exists, but the data stream failed before the first byte was committed.
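The interpretation above can be expressed as a small helper. It is a sketch that assumes a dictionary with `status` and `size` keys, such as the parsed output of `openstack image show <IMAGE_UUID> -f json`; the function name and the returned messages are illustrative.

```python
def diagnose_image(image):
    """Map an image's status and size to the likely situation described above."""
    status = image.get("status")
    size = image.get("size") or 0  # size may be missing or null before data arrives
    if status == "saving":
        return "in progress, or stalled at the staging-to-backend move"
    if status == "killed":
        return "transfer aborted; look for DisconnectionError in the logs"
    if size == 0:
        return "record exists but no data was committed"
    if status == "active":
        return "upload completed"
    return "unexpected status: %s" % status
```

An active image with a size of 0 is the signature of the large-transfer timeout described under Most Common Causes.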

3. Check Capacity & Staging Space

Uploads will fail if there is no room to hold the file during the staging process.

Note: The <IMAGE_STORAGE_PATH> can be obtained from the cluster blueprint under the Image Library section in the cluster details.
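A minimal free-space check might look like the sketch below. It assumes both the staging directory and <IMAGE_STORAGE_PATH> need room for the full image; the helper name and the 10% headroom are arbitrary illustrative choices.

```python
import shutil


def has_room(path, image_bytes, headroom=0.10):
    """Return True if the filesystem holding `path` can fit the image plus a margin."""
    free = shutil.disk_usage(path).free
    return free >= image_bytes * (1 + headroom)
```

Run it once against /var/lib/glance/os_glance_staging_store/ and once against <IMAGE_STORAGE_PATH>, passing the size of the file you intend to upload.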

4. Trace the Upload via REQ-ID

Search the Glance API log for the specific failure reason (e.g., timeout, disk full, or auth error).
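The search can be sketched as a short filter over the log lines. The helper name is illustrative and the keyword list is a guess at useful failure markers, not an exhaustive set of Glance messages, so extend it to match what your logs actually contain.

```python
# Hypothetical markers; adjust to the messages seen in your environment.
FAILURE_HINTS = ("ERROR", "Traceback", "DisconnectionError",
                 "No space left", "timed out", "Unauthorized")


def trace_request(lines, req_id):
    """Return (all lines mentioning req_id, the subset that look like failures)."""
    matched = [line.rstrip("\n") for line in lines if req_id in line]
    suspicious = [line for line in matched
                  if any(hint in line for hint in FAILURE_HINTS)]
    return matched, suspicious
```

Pass an open file handle for the Glance API log on the library host as `lines`, together with the REQ-ID taken from the PUT line for the affected image.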

Most Common Causes

  • Large Volume-to-Image Timeouts: For volumes 200GB+, the transfer often exceeds the default 600s (10-minute) timeout between Cinder and Glance. The client disconnects, resulting in an active image record with 0 bytes.

  • Insufficient Disk Space: There is not enough free space on <IMAGE_STORAGE_PATH> to complete the upload.

  • Token Expiration: On slow networks, the Keystone authentication token may expire before a multi-hour transfer completes, causing the final "Commit" to the backend to be rejected.

  • Network Instability: Browser-based (UI) uploads of files larger than 10 GB are prone to failure; using the CLI (openstack image create --file ...) provides a more resilient stream.
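The timeout cause above comes down to simple arithmetic, worked through in the sketch below. The 600 s timeout and the 200 GB size come from this guide; the 100 MB/s throughput is an assumed figure for illustration, and decimal units (1 GB = 10^9 bytes) are used throughout.

```python
def transfer_seconds(size_bytes, throughput_bytes_per_s):
    """Estimate how long a transfer takes at a sustained throughput."""
    return size_bytes / throughput_bytes_per_s


def required_throughput(size_bytes, timeout_s=600):
    """Minimum sustained bytes per second needed to finish within the timeout."""
    return size_bytes / timeout_s


GB = 10**9
MB = 10**6

# A 200 GB volume must sustain about 333 MB/s to finish inside 600 s.
print(round(required_throughput(200 * GB) / MB))   # 333
# At an assumed 100 MB/s the same volume needs 2000 s, well past the timeout.
print(transfer_seconds(200 * GB, 100 * MB))        # 2000.0
```

The same estimate can be compared against the Keystone token lifetime configured in your environment to judge whether a slow transfer is at risk of token expiration.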
