0058: User Domain Model and PII Strategy

Date: 2025-12-21

Status: Accepted

Context

The user-directory-service is responsible for managing the lifecycle of users across multiple Identity Providers (IdPs). A critical architectural challenge is defining the local data model for a "User" and determining how much Personally Identifiable Information (PII) should be stored within the Citadel platform versus remaining exclusively in the upstream IdP.

Storing PII (Name, Email, Phone) locally creates:

  1. Compliance Liability: Expands the scope of GDPR/CCPA compliance obligations.
  2. Synchronization Complexity: Requires keeping local data in sync with the IdP (the source of truth).

However, not storing any user data locally creates:

  1. Performance Issues: Listing users requires querying the IdP API, which is often slow and rate-limited.
  2. Referential Integrity Issues: Other services (e.g., book-keeper) need a stable foreign key (created_by_user_id) to reference users. IdP identifiers (sub) can change if the IdP is migrated or if the user is re-created.
  3. Join Complexity: It becomes impossible to perform SQL joins between business entities (e.g., Invoices) and Users.

Decision

We will adopt the Hybrid / Pointer Model (Option C) for the User Domain.

  1. Internal "Pointer" Entity: The user-directory-service will maintain a local User entity that acts as a stable pointer to the external identity.

    • id (UUID): The internal, immutable primary key used by all other Citadel services.
    • external_id (String): The unique identifier from the IdP (e.g., sub, oid, uid). This is configurable per adapter.
    • idp_id (String): The routing key identifying which IdP adapter owns this user (e.g., keycloak-default, rauth-staging).
    • tenant_id (UUID): The tenant context for this user.
  2. Pragmatic PII Caching: We will operate in Pragmatic Mode, storing only minimal PII (specifically email and full_name) in the local database.

    • Purpose: This is strictly a read-only cache to enable performant UI displays (e.g., "Created by John Doe") and efficient searching/filtering within the Admin Portal without hammering the IdP API.
    • Source of Truth: The IdP remains the absolute source of truth. Authentication and profile updates must happen at the IdP.
    • Synchronization: The cache is updated during login (via token claims) or via webhooks from the IdP.
  3. Configurable Subject Claim: The specific claim used to populate external_id must be configurable per adapter, as different IdPs use different fields (e.g., Auth0 uses sub, Azure AD uses oid).
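
The three parts of the decision can be sketched together in a few lines of Python. This is an illustrative sketch only, not the service's actual implementation: the UserDirectory class, the SUBJECT_CLAIM_BY_IDP mapping, the azure-ad-example adapter name, and the choice of (idp_id, external_id) as the lookup key are all assumptions made for the example. The email and name claims are the standard OIDC claim names; the real adapters may map different ones.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical per-adapter configuration: which token claim populates external_id.
SUBJECT_CLAIM_BY_IDP: Dict[str, str] = {
    "keycloak-default": "sub",
    "azure-ad-example": "oid",  # hypothetical adapter name
}


@dataclass
class User:
    """Local pointer entity plus the minimal PII cache."""
    id: uuid.UUID            # internal, immutable primary key
    external_id: str         # identifier issued by the IdP
    idp_id: str              # routing key for the owning IdP adapter
    tenant_id: uuid.UUID     # tenant context
    email: Optional[str] = None      # cached, read-only copy
    full_name: Optional[str] = None  # cached, read-only copy


class UserDirectory:
    """In-memory stand-in for the user-directory-service table."""

    def __init__(self) -> None:
        # Assumption: (idp_id, external_id) uniquely identifies a user.
        self._by_external: Dict[Tuple[str, str], User] = {}

    def sync_on_login(self, idp_id: str, tenant_id: uuid.UUID, claims: dict) -> User:
        """Create the pointer on first login, then refresh the PII cache."""
        claim_name = SUBJECT_CLAIM_BY_IDP[idp_id]
        external_id = claims[claim_name]
        key = (idp_id, external_id)

        user = self._by_external.get(key)
        if user is None:
            # The internal id is generated once and never changes afterwards.
            user = User(
                id=uuid.uuid4(),
                external_id=external_id,
                idp_id=idp_id,
                tenant_id=tenant_id,
            )
            self._by_external[key] = user

        # The IdP is the source of truth: overwrite the cache from the claims,
        # keeping the old value only when a claim is absent from the token.
        user.email = claims.get("email", user.email)
        user.full_name = claims.get("name", user.full_name)
        return user
```

Calling sync_on_login twice with the same subject returns the same internal id while picking up any changed email or name, which is the behaviour downstream services depend on when they store created_by_user_id.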

Consequences

Positive

  • Stable References: Downstream services can rely on a stable, internal UUID (user_id) that never changes, even if the upstream IdP is swapped.
  • Performance: "List Users" screens in the Admin UI can be served directly from the local database with pagination and filtering, avoiding slow, rate-limited IdP API calls.
  • SQL Joins: Enables efficient queries like "Show all invoices created by users with email domain @acme.com".
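
The email-domain query mentioned above might look like the following sketch, which uses Python's built-in sqlite3 module so that it runs on its own. It assumes the users and invoices tables are reachable from the same database connection, which this ADR does not specify. The table names, the amount_cents column, the sample rows, and the invoices_by_email_domain helper are hypothetical; only created_by_user_id and the User fields come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id          TEXT PRIMARY KEY,   -- internal UUID
        external_id TEXT NOT NULL,
        idp_id      TEXT NOT NULL,
        tenant_id   TEXT NOT NULL,
        email       TEXT,               -- cached PII
        full_name   TEXT,               -- cached PII
        UNIQUE (idp_id, external_id)
    );
    CREATE TABLE invoices (
        id                 INTEGER PRIMARY KEY,
        created_by_user_id TEXT NOT NULL REFERENCES users(id),
        amount_cents       INTEGER NOT NULL
    );
    """
)

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("u-1", "sub-1", "keycloak-default", "t-1", "john@acme.com", "John Doe"),
        ("u-2", "sub-2", "keycloak-default", "t-1", "jane@example.org", "Jane Roe"),
    ],
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "u-1", 1200), (2, "u-2", 900), (3, "u-1", 500)],
)


def invoices_by_email_domain(conn: sqlite3.Connection, domain: str):
    """Return (invoice id, creator name, amount) for creators in one email domain."""
    # Note: LIKE treats '_' and '%' in the domain as wildcards; a real query
    # would escape them or store the domain in its own column.
    return conn.execute(
        """
        SELECT i.id, u.full_name, i.amount_cents
        FROM invoices AS i
        JOIN users AS u ON u.id = i.created_by_user_id
        WHERE u.email LIKE ?
        ORDER BY i.id
        """,
        ("%@" + domain,),
    ).fetchall()


rows = invoices_by_email_domain(conn, "acme.com")
```

Because the join key is the internal id rather than the IdP's identifier, the query keeps working if the upstream IdP is swapped.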

Negative

  • Data Duplication: We are duplicating PII, which requires strict access controls on the user-directory-service database.
  • Sync Latency: The local cache can be stale if a user updates their profile in the IdP and the webhook fails or has not yet arrived; in that case it is refreshed from token claims at the user's next login. This is an acceptable trade-off for display purposes.