# Image Service Troubleshooting Guide

## Problem

Image management in cloud environments frequently runs into issues such as image upload failures, slow performance, and incorrect metadata. This guide provides clear, actionable steps for diagnosing and resolving common image service errors so that the image service stays reliable and available.

## Environment

* Private Cloud Director Virtualization - v2025.4 and Higher
* Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
* Component - PCD Image service

## Deep Dive

### Image Creation Flow

The image creation process in Private Cloud Director is managed by the **Glance** service. The flow begins when a user uploads a new image file, which is then processed and stored.

1. **User Request:** A user initiates an image upload via the OpenStack CLI, the Private Cloud Director dashboard, or a direct API call. The request includes the image file and its metadata (e.g., name, container format, disk format).
2. **API Service:** The **Glance API** service receives the request, validates the user's authentication token with **Keystone**, and checks permissions and quotas. The following Glance API log entry shows that the existing user token is used:

   <pre class="language-bash" data-overflow="wrap"><code class="lang-bash">INFO glance.api.v2.image_data [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Unable to create trust: no such option collect_timing in group [keystone_authtoken] Use the existing user token.
   </code></pre>
3. **Image Validation and Upload Request:** The Glance service validates the [image format](https://platform9.com/docs/private-cloud-director/private-cloud-director/image-library---images#image-formats) and confirms that the image's virtual size meets the requirements (to view the virtual size, run `qemu-img info <IMAGE_NAME>.qcow2` on the Glance host). A `PUT /v2/images/<IMAGE_UUID>/file` request then uploads the image data to a temporary staging area, by default `/var/lib/glance/os_glance_staging_store/` (if the staging directory has been changed, check the custom location instead).

   <pre class="language-bash" data-title="Sample Logs:" data-overflow="wrap"><code class="lang-bash">INFO glance.location [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Image format matched and virtual size computed: 41126400
   INFO eventlet.wsgi.server [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] 127.0.0.1 - - [..] "PUT /v2/images/[IMAGE_UUID]/file HTTP/1.0" 204 468 2.400140
   </code></pre>
4. **Glance API to Registry:** The API service then communicates with the **Glance Registry**, which creates a new entry for the image in the Glance database. The status is set to `queued`.
5. **Glance Service:** The Glance API hands off the request to the **pf9-glance-api** service, which moves the image data from the staging area to the backend storage (e.g., Swift, Ceph, or a local file system). For a local file system, the default image storage location is `/var/opt/imagelibrary/data/glance/`.
6. **Status Update:** Once the image is successfully stored, the Glance Store service updates the image's status in the database from `queued` to `active`, and the image is ready for use. The Glance audit log on the host (`/var/log/pf9/glance-audit.log`) records details of the request, such as the user name, image UUID, and outcome.

   <pre class="language-bash" data-title="Sample Logs" data-overflow="wrap"><code class="lang-bash">INFO oslo.messaging.notification.audit.http.response [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] {"message_id": "[Audit_Message_ID]", "publisher_id": "glance-api", "event_type": "audit.http.response", "priority": "INFO", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "eventType": "activity", "id": "[Activity_ID]", "eventTime": "[..]", "action": "update", "outcome": "success", "observer": {"id": "target"}, "initiator": {"id": "[..]", "typeURI": "service/security/account/user", "name": "[USER_NAME]", "credential": {"token": "***", "identity_status": "Confirmed"}, "host": {"address": "127.0.0.1", "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36"}, "project_id": "[PROJECT_UUID]"}, "target": {"id": "unknown", "typeURI": "unknown", "name": "unknown"}, "requestPath": "/v2/images/[IMAGE_UUID]/file", "tags": ["correlation_id?value=[..]"], "reason": {"reasonType": "HTTP", "reasonCode": "204"}, "reporterchain": [{"role": "modifier", "reporterTime": "[..]", "reporter": {"id": "target"}}]}, "timestamp": "[..]"}
   </code></pre>
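
To confirm how far a specific upload got, the `PUT .../file` request in the Glance API log and the matching entry in `/var/log/pf9/glance-audit.log` can be extracted for a single image. The following is a minimal shell sketch, assuming the log lines follow the sample formats shown above; `upload_status` and `upload_audit_outcome` are hypothetical helper names, not part of Platform9 tooling.

```shell
# Hypothetical helpers. They assume the log line formats shown in the sample logs above.

# Print the HTTP status code of each image-data upload (PUT .../file) for one image.
# Usage: upload_status <IMAGE_UUID> < /var/log/pf9/glance-api.log
upload_status() {
  grep -F "\"PUT /v2/images/$1/file " |
    awk '{
      # The status code is the field right after the quoted HTTP version, e.g. HTTP/1.0" 204
      for (i = 1; i < NF; i++)
        if ($i ~ /^HTTP\/1\.[01]"$/) print $(i + 1)
    }'
}

# Print the audit outcome (for example "success") recorded for the same upload.
# Usage: upload_audit_outcome <IMAGE_UUID> < /var/log/pf9/glance-audit.log
upload_audit_outcome() {
  grep -F "\"requestPath\": \"/v2/images/$1/file\"" |
    sed -n 's/.*"outcome": "\([a-z]*\)".*/\1/p'
}
```

For the successful upload in the sample logs, `upload_status` would print `204` and `upload_audit_outcome` would print `success`. Empty output means no matching request was logged, which suggests the flow stopped before the upload step.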

***

### Image Deletion Flow

The deletion process also uses the Glance services to remove the image's data and its database entry.

1. **User Request:** A user sends a deletion request via the OpenStack CLI, the Private Cloud Director dashboard, or a direct API call. The request identifies the image (e.g., by image ID or name).
2. **API Service:** The **Glance API** service receives the request, validates the user's authentication token with **Keystone**, performs a permission check, and changes the image's status in the database to `deleting`. The following Glance API log entry shows that the existing user token is used:

   <pre class="language-bash" data-title="Sample Logs" data-overflow="wrap"><code class="lang-bash">INFO glance.api.v2.image_data [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Unable to create trust: no such option collect_timing in group [keystone_authtoken] Use the existing user token.
   </code></pre>
3. **Glance Service:** The Glance API hands off the deletion to the **pf9-glance-api** service, which removes the image data from the backend storage (e.g., Swift, Ceph, or a local file system; for a local file system, the default location is `/var/opt/imagelibrary/data/glance/`). This step frees up disk space.

   <pre class="language-bash" data-title="Sample Logs" data-overflow="wrap"><code class="lang-bash">INFO eventlet.wsgi.server [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] 127.0.0.1 - - [..] "DELETE /v2/images/[IMAGE_UUID] HTTP/1.0" 204 468 1.924124
   </code></pre>
4. **Final Status Update:** Once the data is confirmed to be deleted from the backend store, the Glance API removes the image's database entry, completing the deletion process.
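
After a deletion, two things can be checked: the status code of the `DELETE` request in the Glance API log, and whether image data is still present in the local file-system store. The shell sketch below assumes the log format from the sample above and, for the file check, assumes that the file-system backend stores each image as a file named after its image UUID under `/var/opt/imagelibrary/data/glance/`; verify that layout on your host before relying on it. Both function names are hypothetical.

```shell
# Hypothetical helpers for confirming that a deletion completed.

# Print the HTTP status code of each DELETE request for one image.
# Assumes the log line format shown in the sample logs above.
# Usage: delete_status <IMAGE_UUID> < /var/log/pf9/glance-api.log
delete_status() {
  grep -F "\"DELETE /v2/images/$1 " |
    awk '{ for (i = 1; i < NF; i++) if ($i ~ /^HTTP\/1\.[01]"$/) print $(i + 1) }'
}

# Report whether image data for <IMAGE_UUID> is still in the local file-system store.
# ASSUMPTION: the file-system backend stores each image as a file named after its UUID.
# Returns 0 if no file is found, 1 if the file is still present.
# Usage: leftover_image_file <IMAGE_UUID> [STORE_DIR]
leftover_image_file() {
  store_dir="${2:-/var/opt/imagelibrary/data/glance}"
  if [ -e "$store_dir/$1" ]; then
    echo "still present: $store_dir/$1"
    return 1
  fi
  echo "not present"
}
```

A `204` from `delete_status` together with `not present` from `leftover_image_file` is consistent with the successful deletion shown in the sample logs.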

## Procedure

The following steps outline how to troubleshoot image service issues.

1. Review the image details, such as its status and any errors, using the command below:

   <pre class="language-bash" data-overflow="wrap"><code class="lang-bash">$ openstack image show &#x3C;IMAGE_UUID>
   </code></pre>
2. Verify that the Glance endpoints are registered and that the public endpoint responds to a `curl` request. The `curl` request should return the Glance API version information.

   <pre class="language-bash" data-overflow="wrap"><code class="lang-bash">$ openstack endpoint list --service glance
   $ openstack endpoint list --service glance-cluster
   $ curl -s https://&#x3C;FQDN>/glance/
   </code></pre>
3. Check if the image service is enabled.

   <pre class="language-bash" data-overflow="wrap"><code class="lang-bash">$ openstack service list | grep -i image
   $ openstack service show &#x3C;GLANCE/GLANCE_CLUSTER_UUID>
   </code></pre>
4. The management plane runs a **glance-api** pod that provides the image service. Check that the glance-api pod is running in the workload region namespace, then review it as follows:

{% hint style="warning" %}
**NOTE**

Step 4 applies only to Self-Hosted Private Cloud Director.
{% endhint %}

* Check whether the pod is in a `CrashLoopBackOff`, `OOMKilled`, `Pending`, `Error`, or `Init` state.
* Verify that all containers in the pod are running.
* Review the Events section of the `kubectl describe` output.
* Search the pod logs for the `REQ_ID`, `IMAGE_UUID`, or `VM_UUID` to find relevant details.

  <pre class="language-bash" data-overflow="wrap"><code class="lang-bash">$ kubectl get pods -o wide -n &#x3C;WORKLOAD_REGION> | grep -i "glance"

  $ kubectl describe pod -n &#x3C;WORKLOAD_REGION> &#x3C;GLANCE_API_POD>

  $ kubectl logs -n &#x3C;WORKLOAD_REGION> &#x3C;GLANCE_API_POD> --all-containers=true | grep "&#x3C;REQ_ID>"
  </code></pre>
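
To spot unhealthy pods quickly, the `kubectl get pods` output from the first command can be piped through a small filter that prints only glance pods whose `STATUS` is not `Running` or whose `READY` count shows fewer ready containers than expected. This is a minimal sketch that relies only on the first three default columns (`NAME`, `READY`, `STATUS`); `unhealthy_glance_pods` is a hypothetical helper name. Pods in the `Completed` state are skipped on the assumption that they are finished jobs rather than failures.

```shell
# Hypothetical helper: print glance pods that are not fully healthy.
# Input: output of `kubectl get pods -o wide -n <WORKLOAD_REGION> | grep -i glance`
# Only the first three default columns are used: NAME, READY (ready/total), STATUS.
unhealthy_glance_pods() {
  awk '{
    if ($3 == "Completed") next            # finished job pods are not failures
    split($2, ready, "/")                  # READY looks like "1/2"
    if ($3 != "Running" || ready[1] != ready[2])
      printf "%s READY=%s STATUS=%s\n", $1, $2, $3
  }'
}
```

For example, `kubectl get pods -o wide -n <WORKLOAD_REGION> | grep -i glance | unhealthy_glance_pods`. Any pod it prints is a candidate for the `kubectl describe` and `kubectl logs` commands above.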

5. Verify that the **pf9-glance-api** service is running on the host where the Glance role is applied.

   <pre class="language-bash" data-overflow="wrap"><code class="lang-bash">$ sudo systemctl status pf9-glance-api
   </code></pre>
6. On the host, review `/var/log/pf9/glance-api.log` to track the events for a specific image, for example by filtering it with `grep <IMAGE_UUID> /var/log/pf9/glance-api.log`.
7. If these steps do not resolve the issue, contact the [Platform9 Support Team](https://support.platform9.com/hc/en-us) for additional assistance.
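
The status returned in step 1 usually indicates which part of the flow to investigate. The sketch below maps the standard Glance image statuses to a next step from this guide; the mapping is a suggested triage order rather than an official decision table, and `image_status_hint` is a hypothetical helper name.

```shell
# Hypothetical triage helper: map a Glance image status to a next step in this guide.
# Usage: image_status_hint "$(openstack image show <IMAGE_UUID> -f value -c status)"
image_status_hint() {
  case "$1" in
    active)
      echo "image data is stored and the image is ready for use" ;;
    queued)
      echo "image record exists but no data is stored; check the PUT .../file request and the staging area" ;;
    saving|uploading|importing)
      echo "image data is being transferred; if it stays in this state, review glance-api.log" ;;
    killed)
      echo "image data upload failed; search glance-api.log for the image UUID" ;;
    deleted|pending_delete)
      echo "image has been deleted; check the DELETE request in glance-api.log" ;;
    deactivated)
      echo "image data access is disabled for non-admin users" ;;
    *)
      echo "unrecognized status: $1"
      return 1 ;;
  esac
}
```

The `-f value -c status` options make `openstack image show` print only the status string, which keeps the input to the helper simple.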

## Most common causes

* The Glance [prerequisites](https://platform9.com/docs/private-cloud-director/private-cloud-director/image-library---images#prerequisites) are not met.
* The `admin.rc` file used to upload the image does not set the `OS_INTERFACE` variable to `admin`.
* The image format is not supported. See [Supported Image Formats](https://platform9.com/docs/private-cloud-director/private-cloud-director/image-library---images#image-formats).
* The `pf9-glance-api` service is down on the underlying host.
* On Self-Hosted PCD, the `--insecure` flag was not passed to the `openstack` command, even though the image hosts use self-signed certificates.
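
The `OS_INTERFACE` cause can be ruled out before running any command. The following is a minimal sketch, assuming the `admin.rc` file exports the usual OpenStack variables such as `OS_AUTH_URL`; `check_openstack_env` is a hypothetical helper name.

```shell
# Hypothetical pre-flight check for the OS_INTERFACE cause above.
# Usage: . ./admin.rc && check_openstack_env
check_openstack_env() {
  rc=0
  if [ -z "${OS_AUTH_URL:-}" ]; then
    echo "OS_AUTH_URL is not set; source the admin.rc file first"
    rc=1
  fi
  if [ "${OS_INTERFACE:-}" != "admin" ]; then
    echo "OS_INTERFACE is '${OS_INTERFACE:-unset}', expected 'admin'"
    rc=1
  fi
  return "$rc"
}
```

On Self-Hosted PCD, also pass the global `--insecure` option when the hosts use self-signed certificates, for example `openstack --insecure image list`.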
