0024: IAM Service as a Policy and Claims Enrichment Engine
Date: 2025-08-31
Status: Proposed
Context
The iam-service has historically drifted from its intended purpose. It began as a simple service for managing authorization policies but gradually accumulated responsibilities that belong to a dedicated Identity Provider (IdP). This scope creep included implementing OIDC protocol endpoints (/authorize, /token), managing OIDC state, and storing user profile information.
This drift has led to several problems:
- Increased Complexity: The service became a complex, stateful "mini-IdP," making it difficult to maintain, test, and reason about.
- Blurred Security Boundaries: The responsibility for user authentication, credential management, and session handling became ambiguous between the iam-service and the upstream IdP.
- Developer Confusion: The "OIDC Facade" pattern was not well-understood, leading to incorrect implementations and a fragile system.
A fundamental architectural reset is required to simplify the service, clarify its responsibilities, and align it with a more robust and secure microservices architecture. This ADR defines that new, focused role for the iam-service.
Decision
The iam-service will be refactored to function as a Policy and Claims Enrichment Engine. Its role is strictly limited to policy administration, runtime claims enrichment for the API Gateway, and administrative client provisioning in the upstream IdP. The OIDC Facade model is abandoned.
This decision is guided by the following architectural principles:
- The Upstream IdP is the Single Source of Truth for Identity. All user authentication, credential management, and profile information (PII) is handled exclusively by the upstream Identity Provider. The iam-service will never handle user-facing authentication flows.
- The iam-service Database Stores Policy, Not Profiles. The local database is a cache and a place to store Citadel-specific authorization context, such as tenant_id and application-specific roles. It will not store user PII. This keeps the service lean and reduces its security footprint.
- The iam-service is a Management API & Policy Engine. Its primary responsibilities are:
  a. Managing Citadel-specific authorization policies (e.g., which user belongs to which tenant and what roles they have).
  b. Enriching claims for the API Gateway with this policy information at runtime.
  c. Provisioning resources (e.g., OAuth2 clients) in the upstream IdP via administrative APIs.
- The Upstream clientID is the Canonical Identifier. To avoid confusion and unnecessary mapping, there will be only one public ID for a client application, which is the one assigned by the upstream IdP.
Core Responsibilities
The iam-service has three core responsibilities:
1. Policy Management
The service is the authoritative source for managing Citadel-specific authorization policies. This includes the data models and administrative APIs for Tenants, Users (as policy objects), and Roles.
- Data Model:
  - Tenant: Represents a customer or a logical isolation boundary.
  - UserPolicy: A local record mapping an external user ID to a Tenant and a set of Roles. It does not contain PII.
  - Role: A collection of permissions within the Citadel platform.
- Administrative APIs: The service exposes a set of RESTful endpoints for CRUD (Create, Read, Update, Delete) operations on these policy objects.
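The data model above could be sketched roughly as follows. This is a minimal illustration, not the service's actual schema: the field names (`tenant_id`, `display_name`, `external_user_id`, `permissions`) and the `role_names` helper are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """A named collection of permissions within the Citadel platform."""
    name: str
    permissions: frozenset[str] = frozenset()  # hypothetical permission strings


@dataclass(frozen=True)
class Tenant:
    """A customer or logical isolation boundary."""
    tenant_id: str
    display_name: str  # hypothetical field; describes the tenant, not a user


@dataclass
class UserPolicy:
    """Maps an external (IdP-issued) user ID to a tenant and roles. Holds no PII."""
    external_user_id: str  # e.g. the `sub` claim issued by the upstream IdP
    tenant_id: str
    roles: list[Role] = field(default_factory=list)

    def role_names(self) -> list[str]:
        """Return the role names in the order they were assigned."""
        return [role.name for role in self.roles]
```

Note that `UserPolicy` carries only the opaque external identifier; anything like a name or email address would stay in the upstream IdP.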
2. Claims Enrichment
At runtime, the service's key responsibility is to enrich authorization tokens with Citadel-specific context.
- Process: The API Gateway, after validating a user's token from the upstream IdP, will make a sub-request to the iam-service. The iam-service will then look up the user's policy and return enriched claims.
- Enriched Claims: The service will add claims such as x-tenant-id and x-user-roles to be used by downstream services for authorization.
3. IdP Provisioning (Administrative)
The service will perform limited, administrative interactions with the upstream IdP, exclusively for the purpose of provisioning and managing OAuth2 clients. This allows other platform services (like a developer portal) to programmatically manage their clients without having direct credentials to the IdP.
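One way to picture this responsibility is a thin provisioning layer that calls the IdP's administrative API and records the IdP-assigned client ID as the only public identifier. The sketch below is illustrative only: `IdpAdminClient`, `create_oauth2_client`, `ClientRecord`, and `ClientProvisioner` are hypothetical names, not the API of any particular IdP, and associating a client with a tenant is an assumption.

```python
from dataclasses import dataclass
from typing import Protocol


class IdpAdminClient(Protocol):
    """Hypothetical abstraction over the upstream IdP's administrative API."""

    def create_oauth2_client(self, name: str, redirect_uris: list[str]) -> str:
        """Create a client in the IdP and return the IdP-assigned client ID."""
        ...


@dataclass
class ClientRecord:
    client_id: str  # the upstream IdP's ID; no second, locally minted ID exists
    tenant_id: str  # assumed Citadel-specific context for the client


class ClientProvisioner:
    """Provisions OAuth2 clients on behalf of other platform services."""

    def __init__(self, idp: IdpAdminClient) -> None:
        self._idp = idp
        self._records: dict[str, ClientRecord] = {}

    def provision(self, tenant_id: str, name: str, redirect_uris: list[str]) -> ClientRecord:
        # The IdP assigns the ID; the service stores it as-is instead of mapping it.
        client_id = self._idp.create_oauth2_client(name, redirect_uris)
        record = ClientRecord(client_id=client_id, tenant_id=tenant_id)
        self._records[client_id] = record
        return record
```

Callers such as a developer portal would talk only to the provisioner, so the IdP's administrative credentials stay inside the iam-service.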
Non-Responsibilities
To ensure clarity and a strict separation of concerns, the iam-service will not:
- Handle User Authentication: All user-facing authentication and authorization flows (e.g., login pages, consent screens) are the exclusive responsibility of the upstream IdP.
- Store Sensitive User PII: The service will not store any personally identifiable information beyond the user's unique identifier from the IdP.
- Act as an OIDC Provider: The service will not implement any part of the OIDC specification. It is a consumer of identity from the upstream IdP, not a provider of it.
Interaction Model (API Gateway)
The primary runtime interaction is between the API Gateway and the iam-service.
POST /v1/system/enrich-token
This internal endpoint is called by the API Gateway's forwardAuth middleware after it has successfully validated a user's token.
- Input: The request from the API Gateway should contain the full, validated upstream JWT. The iam-service will extract the necessary user identifier (e.g., the sub claim) from this token.
- Output (Success): On success, the iam-service returns a 200 OK response with the enriched claims set as HTTP response headers. The response body is ignored.
  - X-User-ID: user-abc
  - X-Tenant-ID: tenant-123
  - X-User-Roles: tenant-123:admin,global-role
- Error Handling:
  - 401 Unauthorized: Returned if the user's policy is not found in the iam-service database (i.e., the user has not been invited to a tenant).
  - 500 Internal Server Error: Returned for any unexpected server-side errors.
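The endpoint's core logic might look like the following sketch. Several details are assumptions made for illustration: the token is taken from an `Authorization: Bearer` header, the policy store is an in-memory dictionary named `POLICIES`, and a malformed token is treated the same as a missing policy (401). The signature is deliberately not re-verified here, on the assumption that the API Gateway has already validated the token.

```python
import base64
import json
from typing import Mapping, Optional, Tuple

# Hypothetical in-memory policy store keyed by the IdP subject (`sub`).
POLICIES = {
    "user-abc": {"tenant_id": "tenant-123", "roles": ["tenant-123:admin", "global-role"]},
}


def extract_subject(jwt_token: str) -> Optional[str]:
    """Read `sub` from a JWT payload without verifying the signature."""
    try:
        payload_b64 = jwt_token.split(".")[1]
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
        payload = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return None
    sub = payload.get("sub") if isinstance(payload, dict) else None
    return sub if isinstance(sub, str) else None


def enrich_token(authorization: str, policies: Mapping[str, dict] = POLICIES) -> Tuple[int, dict]:
    """Return (status_code, response_headers) for POST /v1/system/enrich-token."""
    token = authorization.removeprefix("Bearer ").strip()
    sub = extract_subject(token)
    policy = policies.get(sub) if sub else None
    if policy is None:
        # No policy for this user (or no usable `sub`): not invited to a tenant.
        return 401, {}
    return 200, {
        "X-User-ID": sub,
        "X-Tenant-ID": policy["tenant_id"],
        "X-User-Roles": ",".join(policy["roles"]),
    }
```

Keeping the logic in a pure function like this makes it straightforward to wrap in whichever HTTP framework the service uses and to test without a running gateway.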
Data Flow and Component Interactions
The following sequence describes the runtime data flow for an authenticated request:
Sequence of Events:
- Authentication: The end-user authenticates with the upstream IdP via the client application (e.g., the shell-app).
- Token Issuance: The IdP issues a standard JWT to the client application.
- API Request: The client application makes a request to a protected API endpoint, presenting the upstream JWT to the API Gateway.
- Token Validation: The API Gateway validates the received JWT with the upstream IdP (e.g., via its introspection endpoint).
- Validation Response: The IdP confirms the token is valid.
- Claims Enrichment Request: The API Gateway sends the validated token to the iam-service's /v1/system/enrich-token endpoint.
- Policy Lookup: The iam-service extracts the user's unique ID from the token and looks up their associated policies (tenant, roles) in its local database.
- Enrichment Response: The iam-service returns the additional claims to the API Gateway.
- Internal Token Creation: The API Gateway creates a new, short-lived internal JWT, adding the enriched claims received from the iam-service.
- Downstream Request: The API Gateway forwards the request to the appropriate downstream service, replacing the original upstream token with the new internal, enriched token.
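The internal token creation step can be illustrated with a small standard-library sketch. It is not how any specific gateway implements this: the HS256 algorithm, the shared `secret`, the 60-second default lifetime, and the claim names `tenant_id` and `roles` are all assumptions chosen for the example. A real deployment would typically rely on the gateway's own plugin mechanism and might prefer asymmetric signing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used by the JWT compact format."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_internal_token(upstream_claims: dict, enriched_headers: dict, secret: bytes,
                        ttl_seconds: int = 60, now: Optional[int] = None) -> str:
    """Build a short-lived HS256 JWT that carries the enriched claims."""
    issued_at = int(time.time()) if now is None else now
    claims = {
        "sub": upstream_claims["sub"],
        "tenant_id": enriched_headers["X-Tenant-ID"],
        "roles": enriched_headers["X-User-Roles"].split(","),
        "iat": issued_at,
        "exp": issued_at + ttl_seconds,  # short lifetime limits replay exposure
    }
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

Downstream services would verify this internal token instead of the upstream one, so they need only the gateway's verification key and never contact the IdP directly.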
Consequences
Positive
- Drastic Simplification: The iam-service becomes a much smaller, leaner, and more focused service. All complex OIDC protocol logic is removed, reducing the cognitive load on developers and making the service easier to maintain.
- Clear Separation of Concerns: The roles of the upstream IdP, the iam-service, and the API Gateway are now unambiguous. The IdP handles identity; the iam-service handles authorization policy; the Gateway handles routing and token validation.
- Improved Security: The attack surface is significantly reduced. The iam-service no longer needs to handle user sessions or sensitive OIDC state. By not storing PII, it becomes less of a target.
- Increased Robustness: The runtime authentication flow is more resilient. The iam-service's administrative functions can fail without impacting the ability of users to log in and access the application.
- Gateway Agnostic: The claims enrichment pattern is a standard approach that can be implemented with any modern API Gateway (e.g., Traefik, Kong, APISIX), preserving architectural flexibility.
Negative
- Requires Significant Refactoring: The existing codebase, tests, and documentation must be heavily refactored to align with this new model.
- Additional Network Hop: The claims enrichment flow introduces one extra, internal network hop from the API Gateway to the iam-service. If the upstream IdP requires token introspection (instead of local JWKS validation), this adds another network hop for every incoming request, increasing latency. This is a known and accepted trade-off for the increased security and separation of concerns.
Future Considerations
- Advanced Policy Models: This ADR focuses on a foundational RBAC model. Future enhancements could include Attribute-Based Access Control (ABAC) or Relationship-Based Access Control (ReBAC) for more granular authorization.
- Policy Administration UI: A dedicated user interface for managing policies (tenants, users, roles) would greatly improve usability.
- Integration with External Systems: Consider integrations with external identity governance and administration (IGA) systems or security information and event management (SIEM) tools.
- Performance Optimization: As the platform scales, further performance optimizations for the claims enrichment flow may be necessary, such as caching strategies or optimized database queries.
- Technical Debt: The current refactoring addresses architectural drift. Future work should focus on code quality, test coverage, and potential performance bottlenecks identified during development.