0023: IAM Service as a Policy and Authorization Claims Engine

Date: 2025-08-17

Status: Accepted

Context

The previous architecture for the iam-service (defined in ADR-0006 and ADR-0020) positioned it as an "OIDC Facade". This design led to significant scope creep, where the service began to reinvent features of a true Identity Provider (IdP). It included its own OIDC protocol endpoints (/authorize, /token), managed OIDC state, and created confusion about its role versus the upstream IdP.

This approach was flawed, leading to:

  • Unnecessary Complexity: The service was becoming a stateful "mini-IdP" instead of a lean microservice.
  • Security Ambiguity: The lines of responsibility for credential management and session management were blurred.
  • Developer Confusion: The "facade" pattern was not clearly understood and was being implemented incorrectly, leading to a system that was difficult to maintain and reason about.

A fundamental reset is required to align the service with core microservice principles of simplicity, clear boundaries, and separation of concerns.

Decision

The iam-service will be refactored to function as a Policy and Authorization Claims Engine. Its role is strictly limited to administrative tasks and runtime authorization policy enforcement. The OIDC Facade model is abandoned.

  1. No OIDC Protocol Endpoints: The iam-service will no longer expose any user-facing OIDC endpoints. The /authorize, /token, /auth-callback, /userinfo, and /jwks.json endpoints will be removed.

  2. Direct Upstream Authentication: All client applications (e.g., shell-app) will perform OIDC authentication flows directly with the upstream IdP (e.g., Keycloak, R-Auth). The upstream IdP is the single source of truth for user identity and authentication.

  3. Token Verification and Header Enrichment Flow: The core runtime responsibility of the iam-service is to verify tokens and provide Citadel-specific authorization claims as HTTP headers. Note: The API Gateway does not modify or "enrich" the token itself. Instead, it delegates token verification to the iam-service and adds the returned authorization headers to the request. This flow operates in two modes:

    Standard Mode (API Gateway Verification):

      a. A client application receives a standard access token (JWT or opaque) from the upstream IdP after a successful login.
      b. The client makes a request to a protected Citadel API, presenting this token and an X-Active-Tenant-ID header to the API Gateway.
      c. The API Gateway's claims-enrichment middleware intercepts the request.
      d. The middleware makes a forwardAuth subrequest to the iam-service's internal /v1/system/enrich-token endpoint, forwarding the original Authorization and X-Active-Tenant-ID headers.
      e. The iam-service's enrich-token handler receives the subrequest. It is responsible for verifying or introspecting the token against the appropriate upstream IdP (using JWKS validation for JWTs, or the introspection endpoint for opaque tokens).
      f. If the token is valid, the iam-service looks up the user's policy (internal user ID, roles) and verifies that the user is a member of the hinted tenant.
      g. The iam-service returns a 200 OK with the authorization claims as HTTP response headers (e.g., X-User-ID, X-Tenant-ID, X-User-Roles). The original token is not modified.
      h. The API Gateway receives the 200 OK and copies these specific headers from the iam-service's response onto the original client request.
      i. The API Gateway forwards the original request, now with added authorization headers, to the downstream service (e.g., book-keeper-service).
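Steps e through h of Standard Mode can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation. The in-memory `POLICIES` store, the `verify_with_idp` stub, the 401/403 status codes, the comma-separated `X-User-Roles` encoding, and the stripping of client-supplied claim headers are all assumptions made for the example. Only the header names and the 200 OK response come from this ADR.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

CLAIM_HEADERS = ("X-User-ID", "X-Tenant-ID", "X-User-Roles")


@dataclass
class UserPolicy:
    user_id: str  # internal Citadel user ID
    global_roles: list[str] = field(default_factory=list)
    tenant_roles: dict[str, list[str]] = field(default_factory=dict)  # tenant_id -> roles


# Hypothetical policy store keyed by the IdP subject; a real service would use a database.
POLICIES = {
    "idp-sub-123": UserPolicy("u-1", ["Super Admin"], {"t-42": ["admin"]}),
}


def verify_with_idp(token: str) -> Optional[str]:
    """Stand-in for JWKS validation or introspection; returns the subject or None."""
    return {"valid-token": "idp-sub-123"}.get(token)


def enrich_token(
    headers: dict[str, str],
    verify: Callable[[str], Optional[str]] = verify_with_idp,
) -> tuple[int, dict[str, str]]:
    """Return (status, claim headers) for a forwardAuth subrequest."""
    auth = headers.get("Authorization", "")
    tenant_id = headers.get("X-Active-Tenant-ID", "")
    if not auth.startswith("Bearer ") or not tenant_id:
        return 401, {}  # assumed status for missing credentials or tenant hint
    subject = verify(auth[len("Bearer "):])
    if subject is None:
        return 401, {}  # token rejected by the upstream IdP
    policy = POLICIES.get(subject)
    if policy is None or tenant_id not in policy.tenant_roles:
        return 403, {}  # assumed status: user is not a member of the hinted tenant
    # Tenant-specific roles are scoped as tenant_id:role_name; global roles pass by name.
    scoped = [f"{tenant_id}:{role}" for role in policy.tenant_roles[tenant_id]]
    return 200, {
        "X-User-ID": policy.user_id,
        "X-Tenant-ID": tenant_id,
        "X-User-Roles": ",".join(policy.global_roles + scoped),
    }


def apply_claims(
    request_headers: dict[str, str], status: int, claims: dict[str, str]
) -> Optional[dict[str, str]]:
    """Gateway side: copy claim headers onto the request, or return None to reject it."""
    if status != 200:
        return None
    # Assumption: drop any client-supplied claim headers so they cannot be spoofed.
    # Header names are matched case-sensitively here for brevity.
    forwarded = {k: v for k, v in request_headers.items() if k not in CLAIM_HEADERS}
    forwarded.update({k: claims[k] for k in CLAIM_HEADERS if k in claims})
    return forwarded
```

In this sketch the Authorization header is left untouched in the forwarded request, which matches the statement that the original token is not modified.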

    Zero-Trust Mode (Downstream Verification):

      a. When zero-trust mode is enabled, the API Gateway performs no verification of the token.
      b. The API Gateway simply passes the Authorization header through to the downstream service without modification.
      c. Each downstream service is responsible for calling the iam-service directly to verify the token and retrieve authorization claims.
      d. This mode is useful for environments requiring end-to-end verification or when services need to handle token validation independently.
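In Zero-Trust Mode, a downstream service might call the iam-service along the following lines. This is a sketch under several assumptions: the ADR does not say which endpoint, HTTP method, or address a downstream service uses, so reusing /v1/system/enrich-token with GET, the `IAM_URL` hostname, and the two-second timeout are all hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Hypothetical internal address; the ADR only names the endpoint path.
IAM_URL = "http://iam-service.internal/v1/system/enrich-token"
FORWARDED_HEADERS = ("Authorization", "X-Active-Tenant-ID")
CLAIM_HEADERS = ("X-User-ID", "X-Tenant-ID", "X-User-Roles")


def build_iam_request(incoming: dict[str, str]) -> urllib.request.Request:
    """Forward only the token and the tenant hint to the iam-service."""
    forwarded = {name: incoming[name] for name in FORWARDED_HEADERS if name in incoming}
    return urllib.request.Request(IAM_URL, headers=forwarded, method="GET")


def fetch_claims(
    incoming: dict[str, str],
    opener: Callable = urllib.request.urlopen,
    timeout: float = 2.0,
) -> Optional[dict[str, str]]:
    """Return the claim headers, or None if the iam-service rejects the token."""
    request = build_iam_request(incoming)
    try:
        with opener(request, timeout=timeout) as response:
            return {
                name: response.headers[name]
                for name in CLAIM_HEADERS
                if response.headers.get(name) is not None
            }
    except urllib.error.HTTPError:
        # Any non-2xx answer is treated as "not authorized".
        # Connection failures (URLError) propagate so the caller fails closed.
        return None
```

The `opener` parameter exists only so the call can be replaced in tests; a service would normally keep the default.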

  4. Scoped Role Names for Security: To prevent a malicious tenant from creating a role named "Super Admin" and gaining elevated privileges, tenant-specific roles are scoped.

    • Global Roles (e.g., "Super Admin") are passed by name.
    • Tenant-Specific Roles (e.g., "admin") are passed in the format tenant_id:role_name.
    • Downstream services must parse these scoped names to make authorization decisions.
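A downstream service could parse the scoped names roughly as shown here. The `Role` type and the `parse_role` and `has_role` helpers are hypothetical names for this example, and the sketch assumes that neither tenant IDs nor global role names contain a colon.

```python
from typing import Iterable, NamedTuple, Optional


class Role(NamedTuple):
    tenant_id: Optional[str]  # None marks a global role
    name: str


def parse_role(raw: str) -> Role:
    """Split 'tenant_id:role_name' on the first colon; unscoped names are global."""
    tenant_id, separator, name = raw.partition(":")
    if separator:
        return Role(tenant_id, name)
    return Role(None, raw)


def has_role(roles: Iterable[str], name: str, tenant_id: Optional[str] = None) -> bool:
    """Check for a global role (tenant_id=None) or a role within a given tenant."""
    return Role(tenant_id, name) in {parse_role(raw) for raw in roles}
```

With this parsing, a role that a tenant names "Super Admin" arrives as, for example, t-42:Super Admin, so `has_role(roles, "Super Admin")` does not match it and the tenant gains no global privileges.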
  5. Administrative API: The iam-service's other primary role is to provide a stable administrative API for managing Citadel-specific resources. This includes:

    • Client Provisioning: Programmatically creating OAuth2 clients in the upstream IdP on behalf of other platform services (e.g., a Developer Portal).
    • Policy Mapping: Managing the mapping between external user identities and internal tenants and roles (e.g., via the POST /users "invitation" flow).
  6. Superseded ADRs: This decision explicitly supersedes ADR-0006: OAuth2 Client Strategy and ADR-0020: IAM Service Facade Philosophy & Boundaries.

Consequences

Positive

  • Drastic Simplification: The iam-service becomes a much smaller, leaner, and more focused service. All complex OIDC protocol logic is removed.
  • Clear Separation of Concerns: The upstream IdP handles authentication. The iam-service handles token verification and Citadel-specific authorization policy. The API Gateway handles routing and, in Standard Mode, delegates verification to the iam-service and attaches the returned headers. These roles are now unambiguous.
  • Improved Security: The attack surface of the iam-service is significantly reduced. It no longer manages sensitive OIDC state or user sessions. The introduction of scoped roles prevents privilege escalation vulnerabilities.
  • Increased Robustness: The runtime authentication flow is more resilient. The iam-service's admin functions can fail without impacting the ability of users to log in and access the application.
  • Gateway Agnostic: The claims enrichment pattern is a standard approach that can be implemented with any modern API Gateway, preserving architectural flexibility.

Negative

  • Requires Significant Refactoring: The existing codebase, tests, and documentation must be heavily refactored to align with this new model.
  • Additional Network Hop: The claims enrichment flow introduces one extra, internal network hop to the iam-service for every incoming request (from the API Gateway in Standard Mode, or from the downstream service in Zero-Trust Mode). This is a known and accepted trade-off for the increased security and separation of concerns.

This ADR represents a fundamental reset of the iam-service architecture, prioritizing simplicity, security, and clear separation of concerns over the previous, overly complex facade model.