0024: IAM Service as a Policy and Claims Enrichment Engine

Date: 2025-08-31

Status: Proposed

Context

The iam-service has historically drifted from its intended purpose. It began as a simple service for managing authorization policies but gradually accumulated responsibilities that belong to a dedicated Identity Provider (IdP). This scope creep included implementing OIDC protocol endpoints (/authorize, /token), managing OIDC state, and storing user profile information.

This drift has led to several problems:

Increased Complexity: The service became a complex, stateful "mini-IdP," making it difficult to maintain, test, and reason about.
Blurred Security Boundaries: The responsibility for user authentication, credential management, and session handling became ambiguous between the iam-service and the upstream IdP.
Developer Confusion: The "OIDC Facade" pattern was not well-understood, leading to incorrect implementations and a fragile system.

A fundamental architectural reset is required to simplify the service, clarify its responsibilities, and align it with a more robust and secure microservices architecture. This ADR defines that new, focused role for the iam-service.

Decision

The iam-service will be refactored to function as a Policy and Claims Enrichment Engine. Its role is strictly limited to administrative tasks and to supplying authorization policy data at runtime; enforcement decisions are made by the services that consume the enriched claims. The OIDC Facade model is abandoned.

This decision is guided by the following architectural principles:

The Upstream IdP is the Single Source of Truth for Identity. All user authentication, credential management, and profile information (PII) are handled exclusively by the upstream Identity Provider. The iam-service will never handle user-facing authentication flows.
The iam-service Database Stores Policy, Not Profiles. The local database is a cache and a place to store Citadel-specific authorization context, such as tenant_id and application-specific roles. It will not store user PII. This keeps the service lean and reduces its security footprint.
The iam-service is a Management API & Policy Engine. Its primary responsibilities are:
  a. Managing Citadel-specific authorization policies (e.g., which user belongs to which tenant and what roles they have).
  b. Enriching claims for the API Gateway with this policy information at runtime.
  c. Provisioning resources (e.g., OAuth2 clients) in the upstream IdP via administrative APIs.
The Upstream clientID is the Canonical Identifier. To avoid confusion and unnecessary mapping, there will be only one public ID for a client application, which is the one assigned by the upstream IdP.

Core Responsibilities

The iam-service has three core responsibilities:

1. Policy Management

The service is the authoritative source for managing Citadel-specific authorization policies. This includes the data models and administrative APIs for Tenants, Users (as policy objects), and Roles.

Data Model:
- Tenant: Represents a customer or a logical isolation boundary.
- UserPolicy: A local record mapping an external user ID to a Tenant and a set of Roles. It does not contain PII.
- Role: A collection of permissions within the Citadel platform.
Administrative APIs: The service exposes a set of RESTful endpoints for CRUD (Create, Read, Update, Delete) operations on these policy objects.
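
The data model above could be sketched as plain Python dataclasses. This is a minimal illustration only: the field names (`tenant_id`, `display_name`, `external_user_id`, `permissions`) are assumptions made for the example and are not defined by this ADR. The point it shows is that a UserPolicy carries only an external identifier, a tenant reference, and roles, with no PII fields.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """A named collection of permissions within the Citadel platform."""
    name: str
    permissions: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Tenant:
    """A customer or logical isolation boundary."""
    tenant_id: str
    display_name: str  # hypothetical field, for illustration only


@dataclass
class UserPolicy:
    """Maps an external (IdP-issued) user ID to a tenant and a set of roles.

    Deliberately has no name, email, or other PII fields.
    """
    external_user_id: str
    tenant_id: str
    roles: list[Role] = field(default_factory=list)

    def role_names(self) -> list[str]:
        """Return the role names in the order they were assigned."""
        return [role.name for role in self.roles]
```

For example, `UserPolicy("user-abc", "tenant-123", [Role("admin")]).role_names()` yields `["admin"]`.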

2. Claims Enrichment

At runtime, the service's key responsibility is to enrich authorization tokens with Citadel-specific context.

Process: The API Gateway, after validating a user's token from the upstream IdP, will make a sub-request to the iam-service. The iam-service will then look up the user's policy and return enriched claims.
Enriched Claims: The service will add claims such as x-tenant-id and x-user-roles to be used by downstream services for authorization.
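
As a rough sketch of the lookup step, the function below maps a user identifier to the enriched claims named above. The in-memory `POLICIES` dictionary is a hypothetical stand-in for the service's database, and the function name is an assumption for illustration.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the policy database:
# external user ID -> (tenant ID, role names).
POLICIES: dict[str, tuple[str, list[str]]] = {
    "user-abc": ("tenant-123", ["tenant-123:admin", "global-role"]),
}


def enrich_claims(user_id: str) -> Optional[dict[str, str]]:
    """Return the enriched claims for a user, or None if no policy exists."""
    policy = POLICIES.get(user_id)
    if policy is None:
        return None
    tenant_id, roles = policy
    return {
        "x-tenant-id": tenant_id,
        # Roles are joined into a single comma-separated value.
        "x-user-roles": ",".join(roles),
    }
```

Returning `None` for an unknown user keeps the "no policy" case explicit, so the caller can decide how to report it.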

3. IdP Provisioning (Administrative)

The service will perform limited, administrative interactions with the upstream IdP, exclusively for the purpose of provisioning and managing OAuth2 clients. This allows other platform services (like a developer portal) to programmatically manage their clients without having direct credentials to the IdP.
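
One possible shape for this provisioning layer is sketched below. The `IdpAdminApi` protocol, its `create_oauth2_client` method, and the `FakeIdp` class are all hypothetical; a real upstream IdP exposes its own administrative API. The sketch only illustrates that the caller never holds IdP credentials and that the client ID assigned by the IdP is returned unchanged.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ProvisionedClient:
    client_id: str  # assigned by the upstream IdP; the only public ID
    name: str


class IdpAdminApi(Protocol):
    """Hypothetical shape of the upstream IdP's administrative API."""

    def create_oauth2_client(self, name: str, redirect_uris: list[str]) -> ProvisionedClient:
        ...


class ClientProvisioner:
    """Lets platform services create clients without holding IdP credentials."""

    def __init__(self, idp: IdpAdminApi) -> None:
        self._idp = idp

    def provision(self, name: str, redirect_uris: list[str]) -> ProvisionedClient:
        if not redirect_uris:
            raise ValueError("at least one redirect URI is required")
        # The upstream client_id is passed through; no local ID is minted.
        return self._idp.create_oauth2_client(name, redirect_uris)


class FakeIdp:
    """In-memory fake, used only to exercise the sketch."""

    def create_oauth2_client(self, name: str, redirect_uris: list[str]) -> ProvisionedClient:
        return ProvisionedClient(client_id=f"idp-{name}", name=name)
```

A caller such as a developer portal would use `ClientProvisioner(idp).provision("portal", ["https://example.test/callback"])` and store only the returned `client_id`.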

Non-Responsibilities

To ensure clarity and a strict separation of concerns, the iam-service will not:

Handle User Authentication: All user-facing authentication and authorization flows (e.g., login pages, consent screens) are the exclusive responsibility of the upstream IdP.
Store Sensitive User PII: The service will not store any personally identifiable information beyond the user's unique identifier from the IdP.
Act as an OIDC Provider: The service will not implement any part of the OIDC specification. It is a consumer of identity from the upstream IdP, not a provider of it.

Interaction Model (API Gateway)

The primary runtime interaction is between the API Gateway and the iam-service.

`POST /v1/system/enrich-token`

This internal endpoint is called by the API Gateway's forwardAuth middleware after it has successfully validated a user's token.

Input: The request from the API Gateway should contain the full, validated upstream JWT. The iam-service will extract the necessary user identifier (e.g., the sub claim) from this token.
Output (Success): On success, the iam-service returns a 200 OK response with the enriched claims set as HTTP response headers. The response body is ignored.
- X-User-ID: user-abc
- X-Tenant-ID: tenant-123
- X-User-Roles: tenant-123:admin,global-role
Error Handling:
- 401 Unauthorized: Returned if the user's policy is not found in the iam-service database (i.e., the user has not been invited to a tenant).
- 500 Internal Server Error: Returned for any unexpected server-side errors.
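
The contract above can be illustrated with a small, framework-free handler that returns a status code and response headers. This is a sketch under several assumptions: how the token is transported (header or body) is not specified here, so it is passed as a plain argument; `POLICY_DB` is a hypothetical stand-in for the database; and the signature is not re-verified because the gateway is assumed to have validated the token already.

```python
import base64
import json

# Hypothetical policy store: sub claim -> (tenant ID, role names).
POLICY_DB = {"user-abc": ("tenant-123", ["tenant-123:admin", "global-role"])}


def subject_from_jwt(token: str) -> str:
    """Read the `sub` claim from a JWT payload WITHOUT verifying the signature.

    Assumes the API Gateway has already validated the token.
    """
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))["sub"]


def handle_enrich_token(token: str) -> tuple[int, dict[str, str]]:
    """Return (status code, response headers) for POST /v1/system/enrich-token."""
    try:
        sub = subject_from_jwt(token)
        policy = POLICY_DB.get(sub)
    except Exception:
        # A validated token should always parse, so this is treated as unexpected.
        return 500, {}
    if policy is None:
        return 401, {}  # user has no policy, i.e. was never invited to a tenant
    tenant_id, roles = policy
    return 200, {
        "X-User-ID": sub,
        "X-Tenant-ID": tenant_id,
        "X-User-Roles": ",".join(roles),
    }
```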

Data Flow and Component Interactions

The following sequence describes the runtime data flow for an authenticated request:

Sequence of Events:

1. Authentication: The end-user authenticates with the upstream IdP via the client application (e.g., the shell-app).
2. Token Issuance: The IdP issues a standard JWT to the client application.
3. API Request: The client application makes a request to a protected API endpoint, presenting the upstream JWT to the API Gateway.
4. Token Validation: The API Gateway validates the received JWT with the upstream IdP (e.g., via its introspection endpoint).
5. Validation Response: The IdP confirms the token is valid.
6. Claims Enrichment Request: The API Gateway sends the validated token to the iam-service's /v1/system/enrich-token endpoint.
7. Policy Lookup: The iam-service extracts the user's unique ID from the token and looks up their associated policies (tenant, roles) in its local database.
8. Enrichment Response: The iam-service returns the additional claims to the API Gateway.
9. Internal Token Creation: The API Gateway creates a new, short-lived internal JWT, adding the enriched claims received from the iam-service.
10. Downstream Request: The API Gateway forwards the request to the appropriate downstream service, replacing the original upstream token with the new internal, enriched token.

Consequences

Positive

Drastic Simplification: The iam-service becomes a much smaller, leaner, and more focused service. All complex OIDC protocol logic is removed, reducing the cognitive load on developers and making the service easier to maintain.
Clear Separation of Concerns: The roles of the upstream IdP, the iam-service, and the API Gateway are now unambiguous. The IdP handles identity; the iam-service handles authorization policy; the Gateway handles routing and token validation.
Improved Security: The attack surface is significantly reduced. The iam-service no longer needs to handle user sessions or sensitive OIDC state. By not storing PII, it becomes less of a target.
Increased Robustness: The runtime authentication flow is more resilient. The iam-service's administrative functions can fail without affecting users' ability to log in, because login is handled entirely by the upstream IdP; only the claims enrichment endpoint sits on the request path.
Gateway Agnostic: The claims enrichment pattern is a standard approach that can be implemented with any modern API Gateway (e.g., Traefik, Kong, APISIX), preserving architectural flexibility.

Negative

Requires Significant Refactoring: The existing codebase, tests, and documentation must be heavily refactored to align with this new model.
Additional Network Hop: The claims enrichment flow adds one internal network hop, from the API Gateway to the iam-service, to every incoming request. If tokens must be validated through the upstream IdP's introspection endpoint (instead of locally against its JWKS), that adds a second hop per request and further increases latency. This is a known and accepted trade-off for the increased security and separation of concerns.

Future Considerations

Advanced Policy Models: This ADR focuses on a foundational RBAC model. Future enhancements could include Attribute-Based Access Control (ABAC) or Relationship-Based Access Control (ReBAC) for more granular authorization.
Policy Administration UI: A dedicated user interface for managing policies (tenants, users, roles) would greatly improve usability.
Integration with External Systems: Consider integrations with external identity governance and administration (IGA) systems or security information and event management (SIEM) tools.
Performance Optimization: As the platform scales, further performance optimizations for the claims enrichment flow may be necessary, such as caching strategies or optimized database queries.
Technical Debt: The current refactoring addresses architectural drift. Future work should focus on code quality, test coverage, and potential performance bottlenecks identified during development.
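
To make the caching idea mentioned under Performance Optimization concrete, the following is a minimal sketch of a time-to-live cache wrapped around a policy lookup. The class name, the injected `clock`, and the choice to cache "no policy" results are assumptions for illustration, not part of this ADR. Note the trade-off: a changed or revoked policy can remain effective for up to `ttl_seconds`.

```python
import time
from typing import Callable, Optional


class TtlCache:
    """Caches policy lookups for a fixed number of seconds."""

    def __init__(self, lookup: Callable[[str], Optional[dict]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        # user ID -> (expiry time, cached value); None values are cached too.
        self._entries: dict[str, tuple[float, Optional[dict]]] = {}

    def get(self, user_id: str) -> Optional[dict]:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the underlying lookup
        value = self._lookup(user_id)
        self._entries[user_id] = (now + self._ttl, value)
        return value
```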
