
Design Decisions

This document captures key architectural decisions and answers open questions regarding the system's design, security, and user interaction models. It serves as a guide for development and ensures alignment with the project's goals.


1. What is the strategy for the User Interface?

The project will maintain an API-first approach. The core deliverable is a robust, well-documented REST API, with a potential Command-Line Interface (CLI) for administrative tasks.

Decision: A graphical user interface (UI) for making journal entries is a separate concern from the core engine.

  • Decoupling: A dedicated UI (e.g., a Single-Page Application using React, Vue, or Angular) will be developed as a separate project. This decouples the frontend and backend development cycles, allowing teams to work in parallel.
  • Interface Contract: The OpenAPI specification will serve as the strict contract between the backend and any potential UI client.
  • Priority: The development of the API as defined in the milestones is the highest priority. A UI can be considered a future milestone or a separate product that consumes this API.

2. How will Authentication and Authorization be handled?

A hybrid approach is recommended, leveraging the strengths of both an API Gateway/Sidecar and in-application logic. This provides layered security without sacrificing the ability to perform fine-grained checks.

Decision:

  1. Authentication (AuthN): This will be handled at the edge by an API Gateway or a service mesh sidecar. The gateway will be responsible for validating OIDC-compliant JWTs from an external Identity Provider. It will reject any unauthenticated requests.
  2. Authorization (AuthZ): This will be a two-step process:
    • Coarse-Grained (Gateway): The gateway can perform initial role checks if desired (e.g., reject requests from users who lack a basic book_keeper_user role).
    • Fine-Grained (In-Application): The gateway will pass the validated user claims (from the JWT) to the application. The application code, specifically within the command handlers and domain services, will use these claims to perform detailed, context-aware authorization.

This strategy is already supported by the get_current_user dependency in api/deps.py, which is designed to extract claims from a Bearer token.
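
For illustration, a dependency of this kind might be shaped roughly as in the sketch below. The key material, audience value, and claim names are placeholders, and python-jose is assumed here for JWT validation; the actual implementation in api/deps.py may differ.

```python
# Hypothetical sketch of an api/deps.py-style dependency; names and config are placeholders.
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from jose import JWTError, jwt  # python-jose, assumed here for token validation

bearer_scheme = HTTPBearer()

# These values would come from configuration for the external Identity Provider.
IDP_PUBLIC_KEY = "..."            # key material fetched from the IdP (e.g. its JWKS endpoint)
EXPECTED_AUDIENCE = "book-keeper-api"

async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(bearer_scheme),
) -> dict:
    """Extract and validate the claims from the Bearer token forwarded by the gateway."""
    try:
        claims = jwt.decode(
            credentials.credentials,
            IDP_PUBLIC_KEY,
            algorithms=["RS256"],
            audience=EXPECTED_AUDIENCE,
        )
    except JWTError:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
    # e.g. {"sub": ..., "roles": [...], "department": ..., "approval_limit": ...}
    return claims
```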


3. What will the interface for reports be?

The system will support a multi-faceted reporting strategy to cater to different needs, from programmatic access to interactive business intelligence.

Decision:

  • REST API for Structured Reports: The API endpoints (GET /api/book-keeper/v1/reports/...) are the primary interface for core, structured reports like the General Ledger and Trial Balance. This is essential for integrations and custom front-ends.
  • Direct Database Access for BI Tools: For ad-hoc analysis and visualization, Business Intelligence (BI) tools (e.g., Metabase, Tableau, Power BI) will be given direct, read-only access to the PostgreSQL read-side database. The denormalized projection tables are optimized for this exact purpose.
  • Elasticsearch for Analytics: The future integration with Elasticsearch will power advanced search and analytics, which can be connected to tools like Kibana or Grafana for powerful dashboarding.

4. What authorization model will be used internally?

A combination of RBAC and ABAC will be used to provide security that is both simple to manage and powerful enough for complex financial rules.

Decision:

  • RBAC (Role-Based Access Control) for Endpoint Access: Use roles from the user's token for coarse-grained protection at the API endpoint level. For example, a FastAPI dependency can ensure that only users with the Accountant role can call the POST /api/book-keeper/v1/journal-entries endpoint. This is a simple and effective first line of defense.

  • ABAC (Attribute-Based Access Control) for Business Logic: Inside the command handlers, implement fine-grained authorization using attributes from both the user and the data.

    • Example: A command handler would receive the RecordJournalEntryCommand and the current user's claims. It can then enforce a rule like:

      "A user can post a journal entry if the user's department attribute matches the entry's department and the entry's amount is less than the user's approval_limit attribute."

This layered approach ensures that business-critical rules reside within the domain and application layers, where they have the most context.
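
To illustrate the layering, a minimal sketch follows. The require_role dependency, the command attributes (department, total_amount), and the claim names are assumptions for illustration, not the actual implementation.

```python
# Illustrative sketch only; dependency, command, and handler names may differ in the codebase.
from fastapi import Depends, HTTPException, status

def require_role(role: str):
    """RBAC: coarse-grained endpoint protection based on roles carried in the token."""
    def checker(user: dict = Depends(get_current_user)) -> dict:
        if role not in user.get("roles", []):
            raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Missing role")
        return user
    return checker

# Usage on the endpoint: only Accountants may post journal entries.
# @router.post("/journal-entries", dependencies=[Depends(require_role("Accountant"))])

class RecordJournalEntryCommandHandler:
    """ABAC: fine-grained checks using attributes of both the user and the entry."""

    def handle(self, command, user_claims: dict) -> None:
        # Hypothetical attributes: command.department, command.total_amount.
        if command.department != user_claims.get("department"):
            raise PermissionError("User may only post entries for their own department")
        if command.total_amount >= user_claims.get("approval_limit", 0):
            raise PermissionError("Entry amount exceeds the user's approval limit")
        # ... proceed to create and persist the Journal aggregate ...
```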


5. What technology will be used for the authorization engine?

Context: The system requires a fine-grained authorization model supporting a mix of RBAC and ABAC for a multi-tenant (multi-national/company) environment. The decision involves choosing between building a custom solution or adopting an existing engine like Casbin, OPA, or SpiceDB.

Decision: Casbin will be adopted as the internal authorization engine.

  • Why Casbin?

    • Balanced Power & Simplicity: Casbin provides a powerful policy model (PERM) that can express the required RBAC and ABAC rules without the steep learning curve of OPA's Rego language.
    • Excellent Python Integration: It has a mature Python library that can be integrated directly into the application layer (e.g., within command handlers).
    • Flexible Policy Storage: Casbin policies can be stored in the existing PostgreSQL database using an official adapter. This avoids adding a new, separate service to the stack, simplifying deployment and management.
    • Multi-Tenancy Support: The policy model can be easily extended to support multi-tenancy by adding a tenant_id or company_id as a domain in the policy rules, which is ideal for a multi-national setup.
  • Comparison to Alternatives:

    • OPA (Open Policy Agent): While extremely powerful, it was considered too complex for the initial implementation phase. The overhead of learning and managing Rego policies was deemed a potential risk to development velocity.
    • SpiceDB (Zanzibar-style): This is a superior solution for relationship-based access control (ReBAC), but the project's immediate needs are more attribute-based. SpiceDB might be considered in the future if complex graph-based permissions become a primary requirement.
    • Custom Code: A custom solution was rejected to avoid coupling policy logic with application code, which makes policies hard to audit and update.
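
To make the Casbin choice concrete, here is a minimal sketch of an RBAC-with-domains setup in which the tenant acts as the Casbin domain. The model, policy rows, file names, and helper function are illustrative only; a PostgreSQL-backed adapter could replace the CSV policy file as noted above.

```python
# Illustrative Casbin setup using pycasbin; file names and policy rows are placeholders.
import casbin

# rbac_with_domains_model.conf -- tenant_id acts as the Casbin "domain":
#   [request_definition]
#   r = sub, dom, obj, act
#   [policy_definition]
#   p = sub, dom, obj, act
#   [role_definition]
#   g = _, _, _
#   [policy_effect]
#   e = some(where (p.eft == allow))
#   [matchers]
#   m = g(r.sub, p.sub, r.dom) && r.dom == p.dom && r.obj == p.obj && r.act == p.act

enforcer = casbin.Enforcer("rbac_with_domains_model.conf", "policy.csv")

# policy.csv could contain rows such as:
#   p, accountant, tenant-acme, journal-entries, post
#   g, alice, accountant, tenant-acme

def can_post_entry(user_id: str, tenant_id: str) -> bool:
    """Check whether the user may post journal entries within the given tenant."""
    return enforcer.enforce(user_id, tenant_id, "journal-entries", "post")
```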

6. How will other microservices interact with Book Keeper?

Context: Other business-critical microservices (e.g., Invoicing, Payroll, Orders) will need to record financial transactions in the general ledger managed by Book Keeper. The question is whether they should do this by calling Book Keeper's API directly (orchestration) or by emitting domain events that Book Keeper consumes (choreography).

Decision: The primary integration pattern will be Event-Driven Choreography, with the API serving as a secondary or administrative interface.

  • Primary Pattern (Choreography):

    • External services will publish business-significant domain events to the application's event bus (e.g., InvoiceIssued, PaymentReceived, PayrollRunCompleted). The primary implementation for this bus is NATS.
    • These events should contain business data, not pre-formatted accounting entries. For example, PayrollRunCompleted would contain employee_id, net_pay, tax_withheld, etc.
    • Book Keeper will have dedicated internal subscribers (event handlers) that listen for these external events.
    • These subscribers are responsible for translating the business event into the appropriate RecordJournalEntryCommand and dispatching it internally.
  • Why Choreography is Preferred:

    • Loose Coupling: The Invoicing service doesn't need to know about accounting principles or the Book Keeper's API schema. This respects the boundaries of each Bounded Context.
    • Resilience: If the Book Keeper service is temporarily unavailable, events will queue in NATS. Once Book Keeper is back online, it can process the backlog without data loss, and the originating service's primary function is not blocked.
    • Centralized Accounting Logic: All logic for creating journal entries remains centralized within the Book Keeper context, preventing accounting rules from leaking into other services.
  • Secondary Pattern (API Orchestration):

    • The POST /api/book-keeper/v1/journal-entries endpoint will still be available.
    • This is useful for administrative tools, manual entry UIs, or specific, tightly coupled scenarios where immediate feedback from the ledger is required. However, it should not be the default choice for inter-service communication.

7. How will Book Keeper consume and scale with external events?

Context: As an event-driven service, Book Keeper must consume events from numerous other microservices. This raises questions about scalability, maintenance, and how to prevent the internal domain from becoming polluted by external data models.

Decision: An Anti-Corruption Layer (ACL) will be implemented as the primary entry point for all external events.

  • Architecture:

    1. Event Dispatcher: A central component will subscribe to a list of required external event topics on the event bus. It will act as a router, forwarding incoming events to the correct translator based on event type.
    2. Translators: For each external event (e.g., PayrollRunCompleted), a dedicated Translator class will be created. This class is responsible for:
      • Validating the incoming external event's schema.
      • Translating the external data model into a Book Keeper-native RecordJournalEntryCommand.
      • Dispatching this internal command to the application's command bus.
    3. Domain Purity: This pattern ensures that the core Ledger Context (application and domain layers) remains completely decoupled from and ignorant of the structure of any external service.
  • Scalability:

    • The system will leverage the consumer group functionality of the event bus (e.g., "Durable Consumers" in NATS JetStream or standard "Consumer Groups" in Kafka).
  • Pluggable Architecture for Extensibility: To avoid tight coupling and enable deep customization, the system supports a plugin model for adding new Translator components (see the sketches after this list).
    • Discovery Mechanism: The application uses Python's standard entry_points mechanism to discover plugins. The specific entry point for this feature is book_keeper.translators.
    • Plugin Implementation: A plugin is a standard, installable Python package. It advertises one or more ITranslator classes via the entry point in its pyproject.toml.
    • Automatic Registration: During application startup, the dependency injection system automatically discovers these entry points, loads the translator classes, registers them with the DI container, and injects instances into the EventDispatcher.
    • Benefits: This approach fully decouples the core application from its integrations. New external events can be supported by simply installing a new plugin package, with no changes required to the Book Keeper codebase. This has been proven by refactoring all built-in translators into a single example plugin.
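
Two hedged sketches follow to make the ACL and the plugin model concrete. First, a possible Translator for the PayrollRunCompleted event described in Decision #6; the command fields, account codes, and command-bus interface are assumptions, not the actual implementation.

```python
# Illustrative translator; ITranslator contract, command fields, and account codes are assumed.
from dataclasses import dataclass

@dataclass
class RecordJournalEntryCommand:
    journal_id: str
    lines: list          # balanced debit/credit lines
    description: str

class PayrollRunCompletedTranslator:
    """ACL translator: external PayrollRunCompleted event -> internal command."""
    event_type = "PayrollRunCompleted"

    def __init__(self, command_bus):
        self._command_bus = command_bus

    async def translate(self, event: dict) -> None:
        # 1. Validate the incoming external event's schema (minimal check shown here).
        for field in ("payroll_run_id", "net_pay", "tax_withheld"):
            if field not in event:
                raise ValueError(f"Malformed PayrollRunCompleted event: missing {field}")

        # 2. Translate the business data into a Book Keeper-native command.
        command = RecordJournalEntryCommand(
            journal_id=event["payroll_run_id"],
            lines=[
                {"account_code": "salaries_expense", "debit": event["net_pay"] + event["tax_withheld"]},
                {"account_code": "cash", "credit": event["net_pay"]},
                {"account_code": "taxes_payable", "credit": event["tax_withheld"]},
            ],
            description="Payroll run completed",
        )

        # 3. Dispatch the internal command to the application's command bus.
        await self._command_bus.dispatch(command)
```

Second, a minimal sketch of plugin discovery through the book_keeper.translators entry point; the pyproject.toml snippet and helper name are illustrative.

```python
# Sketch of plugin discovery via entry points; registration details depend on the DI container.
from importlib.metadata import entry_points  # selection by group requires Python 3.10+

# A plugin package would advertise its translators in its pyproject.toml, for example:
#   [project.entry-points."book_keeper.translators"]
#   payroll = "my_plugin.translators:PayrollRunCompletedTranslator"

def discover_translator_classes() -> list[type]:
    """Load every ITranslator class advertised under the book_keeper.translators group."""
    return [ep.load() for ep in entry_points(group="book_keeper.translators")]

# At startup the DI container instantiates these classes and hands the instances
# to the EventDispatcher, which routes incoming events by event type.
```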

8. How will multi-national concerns (Countries, Currencies, Compliance, Taxes) be handled?

Context: The system must support operations in multiple countries, each with its own currency, tax laws, and compliance regulations. The design must be flexible enough to accommodate new countries and changing rules without requiring constant code changes.

Decision: A hybrid data-and-code approach will be implemented, centered within the Compliance Bounded Context. The principle is to store rules as data and the logic to execute those rules as code.

The Compliance context will operate asynchronously. It will not block the creation of journal entries but will instead react to them after they have been posted.

  • Rules as Data:

    • The application database will contain tables for storing configuration that varies by country or region. This data can be updated by administrators without a new application deployment.
    • Examples:
      • countries (ISO 3166-1 codes, default currency)
      • currencies (ISO 4217 codes, decimal precision)
      • tax_rates (country_code, tax_type, rate, effective_date)
      • fiscal_periods (country_code, start_date, end_date)
  • Logic as Code:

    • The Compliance Context will act as a "rules engine" by consuming events from the Ledger context.
    • It will subscribe to JournalEntryPosted events from the event bus.
    • Upon receiving an event, its internal services (e.g., ComplianceValidationService) will apply the relevant rules.
    • If a transaction is found to be non-compliant, this context is responsible for initiating a compensating action (e.g., publishing a TransactionFailedCompliance event or creating a reversing journal entry).
  • Benefit: This asynchronous, event-driven approach makes the system more resilient and scalable. The core ledger is not blocked by compliance checks, and the two domains are fully decoupled.

  • Management: The set of events the service listens to is explicitly managed by the collection of Translator classes within the ACL. To support a new external event, a developer simply adds a new translator.
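
As a sketch of the rules-as-data / logic-as-code split described above, a compliance handler might look roughly like the following; the service, repository, and event field names are assumptions for illustration.

```python
# Illustrative only; event shapes, rule objects, and repository access are assumed.
class ComplianceValidationService:
    """Reacts to JournalEntryPosted events and applies country-specific rules stored as data."""

    def __init__(self, rules_repository, event_bus):
        self._rules = rules_repository   # reads the countries / tax_rates / fiscal_periods tables
        self._event_bus = event_bus

    async def on_journal_entry_posted(self, event: dict) -> None:
        rules = await self._rules.for_country(event["country_code"], on_date=event["posting_date"])

        violations = [rule.code for rule in rules if not rule.is_satisfied_by(event)]
        if violations:
            # Compensating action: the entry is already posted, so publish a failure event
            # (or trigger a reversing entry) rather than blocking the write path.
            await self._event_bus.publish(
                "book_keeper.compliance.TransactionFailedCompliance",
                {"journal_entry_id": event["journal_entry_id"], "violations": violations},
            )
```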


9. What is the storage strategy for the write-side (Ledger & Event Store)?

Context: The system's write-side, based on Event Sourcing, has two critical storage needs: persisting the stream of immutable events (the Event Store) and, potentially, using a specialized ledger for high-performance transaction processing. The architecture must be flexible to accommodate different performance, operational, and cost requirements.

Decision: A swappable, adapter-based architecture will be used for all write-side storage, allowing the specific technology to be chosen via configuration.

  • Event Store Strategy:

    • Interface: An IEventStore interface will define the contract for saving and retrieving event streams for aggregates.
    • Default Implementation (PostgreSQL): The default implementation will use a PostgreSQL table as the event store. This provides strong transactional guarantees and simplifies the initial technology stack.
    • High-Throughput Options: For scenarios requiring higher performance and features specifically designed for event sourcing, adapters for dedicated event stores are provided.
      • KurrentDB: A mature, standalone event store that offers a rich feature set specifically for event sourcing patterns. An adapter has been implemented using the kurrentdbclient library, making it a fully supported and swappable backend.
  • Ledger Storage Strategy (for core transaction processing):

    • Interface: An ILedgerStorageService interface will define the contract for core, high-speed financial transaction operations.
    • Default Implementation (PostgreSQL): The PostgresLedgerAdapter uses standard SQL transactions to ensure atomicity. This is suitable for a wide range of applications.
    • High-Performance Option (TigerBeetle): For use cases demanding extreme throughput and financial-grade safety, the TigerBeetleLedgerAdapter has been implemented. This allows the core double-entry logic to be offloaded to a specialized, high-performance database.

Rationale: This approach provides maximum flexibility. A project can start with a simple, all-PostgreSQL setup for ease of deployment and later scale up by swapping in specialized databases like KurrentDB or TigerBeetle for specific components without rewriting the core application logic.
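
For orientation, the two write-side contracts might be shaped roughly as follows; the method names and signatures are illustrative, not the actual interfaces.

```python
# Illustrative interface shapes; the real IEventStore / ILedgerStorageService may differ.
from typing import Protocol, Sequence

class IEventStore(Protocol):
    async def append_to_stream(
        self, stream_id: str, events: Sequence[dict], expected_version: int
    ) -> None:
        """Append events, failing with a conflict error if expected_version does not match."""
        ...

    async def read_stream(self, stream_id: str) -> Sequence[dict]:
        """Return the full, ordered event stream for an aggregate."""
        ...

class ILedgerStorageService(Protocol):
    async def post_transaction(self, debits: Sequence[dict], credits: Sequence[dict]) -> None:
        """Atomically record a balanced set of debits and credits."""
        ...

# Adapters such as the PostgresLedgerAdapter or the TigerBeetleLedgerAdapter are
# selected via configuration and bound to these interfaces at startup.
```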


10. What is the strategy for the Event Bus?

Context: The system relies on an event bus for asynchronous communication between its own components (e.g., write-side to read-side projectors) and for consuming events from external microservices. The choice of event bus impacts performance, reliability, and operational complexity.

Decision: The system will be architected around a swappable IEventBus interface, allowing the concrete implementation to be chosen via configuration.

  • Primary Choice (NATS with JetStream): NATS is the recommended default due to its high performance, operational simplicity, and strong support for the required features like persistence, ordering, and durable consumer groups (JetStream).
  • Alternative Option (Apache Kafka): For environments where Kafka is already the established standard, an adapter will be provided. This ensures that Book Keeper can integrate seamlessly into existing enterprise ecosystems.

Rationale: By depending on an abstraction (IEventBus), the application's core logic remains decoupled from the specific messaging technology. This allows deployment flexibility and future-proofing against changing technology landscapes.
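
A rough sketch of what the abstraction could look like is shown below; method names and signatures are assumptions.

```python
# Illustrative abstraction; the actual IEventBus contract may differ.
from typing import Awaitable, Callable, Protocol

class IEventBus(Protocol):
    async def publish(self, topic: str, payload: dict) -> None:
        """Publish an event to a topic."""
        ...

    async def subscribe(
        self, topic: str, handler: Callable[[dict], Awaitable[None]], consumer_group: str
    ) -> None:
        """Register a handler within a durable consumer group for a topic."""
        ...

# Concrete adapters (e.g. a NATS JetStream implementation or a Kafka implementation)
# are chosen via configuration and injected wherever IEventBus is required.
```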


11. How do other services consume data from Book Keeper?

Context: For a truly integrated ecosystem, other microservices (e.g., a Treasury service or a FinancialPlanning service) may need to react to financial events as they are officially recorded in the ledger.

Decision: Book Keeper will publish its own validated domain events to the public event bus for other services to consume.

  • Event Publishing: After a command is successfully processed and its resulting domain events are persisted to the event store, a dedicated "Event Publisher" component will publish key events (e.g., JournalEntryPosted, AccountDebited, AccountCredited) to well-defined public topics on the event bus (NATS/Kafka).
  • External Read Models: This allows other services to build their own specialized read models based on the official financial record from Book Keeper. For example, a Treasury service could listen to AccountCredited events for a specific bank account to monitor cash flow in real-time.
  • Decoupling: This approach is highly decoupled. Book Keeper does not need to know anything about its consumers. It simply publishes its facts, and any authorized service can subscribe to them.
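
A hedged sketch of such a publisher, reusing the IEventBus abstraction from Decision #10, follows; the topic naming scheme and the to_dict() serialization are assumptions.

```python
# Illustrative publisher; topic names and event serialization are assumed.
class EventPublisher:
    """Publishes validated domain events to public topics after they are persisted."""

    def __init__(self, event_bus):  # any IEventBus implementation (NATS, Kafka, ...)
        self._event_bus = event_bus

    async def publish_domain_events(self, events: list) -> None:
        for event in events:
            # e.g. "book_keeper.ledger.JournalEntryPosted"
            topic = f"book_keeper.ledger.{type(event).__name__}"
            await self._event_bus.publish(topic, event.to_dict())

# A Treasury service could then subscribe to "book_keeper.ledger.AccountCredited"
# and filter on its bank account codes to monitor cash flow in real time.
```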

12. How is the Chart of Accounts (CoA) managed?

Context: Each organization (tenant) requires its own unique Chart of Accounts (CoA), which is often structured as a hierarchy. This is a classic "master data" problem. The book_keeper service needs to use account codes from the CoA without being responsible for managing the CoA itself.

Decision: The management of the Chart of Accounts will be handled by a dedicated, external Accounts Master microservice with its own codebase and repository. book_keeper will act as a client to this service.

  • Ledger Context Responsibility: The Ledger context's sole responsibility is to record balanced double-entry transactions. It treats an account_code as an opaque string identifier and does not validate its existence or position within a hierarchy. This adheres to the "do one thing well" principle.

  • Accounts Master Service Responsibility: This external service is the single source of truth for creating, managing, and querying the Chart of Accounts for all tenants. It will publish events like AccountCreated and AccountUpdated to the event bus.

  • Interaction Model (Composition & Choreography):

    1. Client-Side Composition: UIs or other client services are expected to first query the Accounts Master API to get a list of valid accounts. They then use the selected account_code when calling the book_keeper API to record a transaction.
    2. Backend Choreography: The Reporting context within book_keeper will subscribe to events from both its own Ledger context (JournalEntryPosted) and the external Accounts Master service (AccountCreated, AccountUpdated). This allows it to build rich, human-readable reports (e.g., a Trial Balance with full account names) and to perform asynchronous reconciliation.

Rationale: This approach provides maximum flexibility and scalability.

  • Flexibility: Tenants can define a strict, formal CoA or use simple, ad-hoc account codes. The core engine supports both.
  • Decoupling: The book_keeper and Accounts Master services are fully decoupled and can be developed, deployed, and scaled independently, increasing system resilience.
  • Domain Purity: The core Ledger domain is not polluted with the concerns of CoA management, keeping its logic clean and focused.

13. How are Account Codes validated during transaction recording?

Context: When a transaction is submitted to the book_keeper API, the account_codes provided must eventually correspond to valid accounts in the external Accounts Master service. The question is whether this validation should happen synchronously (at the time of API call) or asynchronously (after the transaction is recorded).

Decision: Validation will be performed asynchronously through event-driven reconciliation. The book_keeper API will not perform a synchronous, blocking call to the Accounts Master service to validate account codes.

  • Write-Side Behavior (book_keeper service): The POST /api/book-keeper/v1/journal-entries endpoint will accept any transaction as long as it is internally balanced (debits equal credits). It treats account_code as an opaque identifier and persists the JournalEntryPosted event without validating the code against an external system.

  • Read-Side Behavior (Reporting context): The Reporting context will subscribe to events from both book_keeper (JournalEntryPosted) and the Accounts Master service (AccountCreated, AccountUpdated). It will be responsible for building reports that join this data. This process naturally reveals any JournalEntryPosted events that reference an account_code for which no corresponding AccountCreated event has been received.

Rationale: This choice prioritizes service availability and loose coupling over immediate consistency.

  • High Availability: book_keeper can continue to record transactions even if the Accounts Master service is temporarily unavailable or slow. This is critical for a core system.
  • Performance: It avoids adding a synchronous network call to the critical path of recording a transaction, resulting in lower latency.
  • Architectural Consistency: This aligns perfectly with Decision #6 (Event-Driven Choreography) and Decision #12 (External Accounts Master), which establish a decoupled, autonomous service architecture.

Consequences:

  • The system operates under an "eventual consistency" model for account code validation.
  • A separate business process must be defined to handle reconciliation failures (e.g., an "Unrecognized Account" report that alerts an accountant to correct the entry). This is a known and accepted trade-off.

14. What is the development strategy for the Reporting Engine given external dependencies?

Context: The Reporting Engine (Milestone 2) is planned to consume events from the external Accounts Master service to provide rich, user-friendly reports (e.g., with account names). However, this service does not yet exist and its development is on a separate track.

Decision: The Reporting Engine will be developed in two stages to allow for incremental progress and de-risk the dependency.

  • Stage 1 (Initial Implementation): The projectors and APIs for reports (General Ledger, Trial Balance) will be built using only the events published by the internal Ledger context. These reports will be fully functional but will display raw account_codes instead of human-readable account_names. This delivers core value and validates the read-side architecture independently.

  • Stage 2 (Enhancement): Once the Accounts Master service is available and publishing events (AccountCreated, etc.), the existing projectors will be enhanced. They will be updated to subscribe to these new external events and enrich the read models with account_name and other master data.

Rationale: This agile approach allows book_keeper development to proceed without being blocked. It delivers a functional (if not perfect) reporting engine quickly and provides a clear path for future enhancement. It also aligns with the principle of building resilient systems that can function, albeit in a degraded mode, when dependencies are unavailable.


15. How is idempotency handled for external events?

Context: External events from other microservices (e.g., InvoicePaid) might be delivered more than once due to network retries or "at-least-once" delivery guarantees from the message bus. The system must not process the same event twice, as this would lead to duplicate journal entries and incorrect financial state.

Decision: Idempotency will be enforced at the command-processing layer by leveraging the optimistic concurrency features of the Event Store. This is a standard pattern in Event Sourcing architectures.

  • Mechanism:
    1. Unique Business Key: Every external event that triggers the creation of a new aggregate (like a Journal) must have a unique business identifier (e.g., invoice_id, payroll_run_id).
    2. Deterministic Aggregate ID: The Translator in the Anti-Corruption Layer (ACL) responsible for the external event will generate a deterministic UUID for the new Journal aggregate, derived from the event's unique business key.
    3. Stream Creation Check: When the RecordJournalEntryCommandHandler saves the new Journal aggregate, the repository attempts to create a new event stream with an expected_version of 0. This version number signifies that the stream must not already exist.
    4. Concurrency Control in Action:
      • On the first receipt of the event, the stream does not exist, so the write succeeds.
      • On any subsequent receipt of the same event, the system attempts to create a stream that already exists. The Event Store's optimistic concurrency control rejects this write with a conflict error (e.g., EventStoreConflictError).
    5. Graceful Handling: The application layer is designed to catch this specific conflict error. It interprets it as a successful-but-duplicate request, logs it, and acknowledges the message to the event bus without creating a duplicate transaction.
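
A condensed sketch of this flow is shown below. The Journal aggregate, repository, and logger are assumed to come from the domain and infrastructure layers, the namespace UUID is a placeholder, and EventStoreConflictError is the conflict error named above.

```python
# Sketch of the idempotency flow; aggregate, repository, and error names are illustrative.
import uuid

# A fixed namespace makes the derived UUID deterministic across retries and replays.
JOURNAL_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000001")  # placeholder value

def deterministic_journal_id(business_key: str) -> uuid.UUID:
    """Derive the Journal aggregate ID from the event's unique business key (e.g. payroll_run_id)."""
    return uuid.uuid5(JOURNAL_NAMESPACE, business_key)

async def handle_record_journal_entry(command, repository, logger) -> None:
    # Journal and EventStoreConflictError are assumed to come from the domain layer
    # and the event-store adapter, respectively.
    journal = Journal.record(command)  # aggregate ID produced by deterministic_journal_id()
    try:
        # expected_version=0 means "this stream must not exist yet".
        await repository.save(journal, expected_version=0)
    except EventStoreConflictError:
        # The stream already exists: this is a duplicate delivery of the same external event.
        logger.info("Duplicate event for journal %s ignored", journal.id)
        # Acknowledge the message on the event bus without creating a second transaction.
```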

Rationale:

  • Robustness: This pattern is highly robust and atomic, as the idempotency check is part of the same transaction as the state change itself.
  • Simplicity: It avoids the need for a separate, dedicated "processed messages" tracking table or cache, which can introduce its own consistency challenges.
  • Architectural Alignment: It is a natural and elegant solution that leverages the core capabilities of an Event Sourcing system.