Design Decisions
This document captures key architectural decisions and answers open questions regarding the system's design, security, and user interaction models. It serves as a guide for development and ensures alignment with the project's goals.
1. What is the strategy for the User Interface?
The project will maintain an API-first approach. The core deliverable is a robust, well-documented REST API and a potential Command-Line Interface (CLI) for administrative tasks.
Decision: A graphical user interface (UI) for making journal entries is a separate concern from the core engine.
- Decoupling: A dedicated UI (e.g., a Single-Page Application using React, Vue, or Angular) will be developed as a separate project. This decouples the frontend and backend development cycles, allowing teams to work in parallel.
- Interface Contract: The OpenAPI specification will serve as the strict contract between the backend and any potential UI client.
- Priority: The development of the API as defined in the milestones is the highest priority. A UI can be considered a future milestone or a separate product that consumes this API.
2. How will Authentication and Authorization be handled?
A hybrid approach is recommended, leveraging the strengths of both an API Gateway/Sidecar and in-application logic. This provides layered security without sacrificing the ability to perform fine-grained checks.
Decision:
- Authentication (AuthN): This will be handled at the edge by an API Gateway or a service mesh sidecar. The gateway will be responsible for validating OIDC-compliant JWTs from an external Identity Provider. It will reject any unauthenticated requests.
- Authorization (AuthZ): This will be a two-step process:
  - Coarse-Grained (Gateway): The gateway can perform initial role checks if desired (e.g., reject requests from users who lack a basic `book_keeper_user` role).
  - Fine-Grained (In-Application): The gateway will pass the validated user claims (from the JWT) to the application. The application code, specifically within the command handlers and domain services, will use these claims to perform detailed, context-aware authorization.
This strategy is already supported by the `get_current_user` dependency in `api/deps.py`, which is designed to extract claims from a Bearer token.
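For illustration, a minimal version of this dependency might look like the following sketch; the python-jose usage and key handling are assumptions, and the real `api/deps.py` may differ:

```python
# Sketch of a claims-extracting dependency; JWKS handling is simplified and the
# python-jose usage is an assumption about the implementation.
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from jose import JWTError, jwt

bearer_scheme = HTTPBearer()
PUBLIC_KEY = "..."  # placeholder: resolved from the Identity Provider's JWKS endpoint

async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(bearer_scheme),
) -> dict:
    """Validate the Bearer token and return its claims for downstream AuthZ checks."""
    try:
        return jwt.decode(credentials.credentials, PUBLIC_KEY, algorithms=["RS256"])
    except JWTError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token"
        )
```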
3. What will the interface for reports be?
The system will support a multi-faceted reporting strategy to cater to different needs, from programmatic access to interactive business intelligence.
Decision:
- REST API for Structured Reports: The API endpoints (`GET /api/book-keeper/v1/reports/...`) are the primary interface for core, structured reports like the General Ledger and Trial Balance. This is essential for integrations and custom front-ends (a sketch of such an endpoint follows this list).
- Direct Database Access for BI Tools: For ad-hoc analysis and visualization, Business Intelligence (BI) tools (e.g., Metabase, Tableau, Power BI) will be given direct, read-only access to the PostgreSQL read-side database. The denormalized projection tables are optimized for this exact purpose.
- Elasticsearch for Analytics: The future integration with Elasticsearch will power advanced search and analytics, which can be connected to tools like Kibana or Grafana for powerful dashboarding.
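For illustration, a report endpoint backed by a denormalized projection might look like this sketch (the `trial_balance` table, `get_session` provider, and SQLAlchemy usage are assumptions):

```python
# Sketch of a structured-report endpoint over a read-side projection table.
from fastapi import APIRouter, Depends
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

router = APIRouter(prefix="/api/book-keeper/v1/reports")

async def get_session() -> AsyncSession:
    ...  # placeholder: the real app would yield a configured AsyncSession

@router.get("/trial-balance")
async def get_trial_balance(session: AsyncSession = Depends(get_session)):
    # The projection is denormalized, so the report is a straight read.
    result = await session.execute(
        text("SELECT account_code, total_debits, total_credits FROM trial_balance")
    )
    return [dict(row._mapping) for row in result]
```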
4. What authorization model will be used internally?
A combination of RBAC and ABAC will be used to provide security that is both simple to manage and powerful enough for complex financial rules.
Decision:
- RBAC (Role-Based Access Control) for Endpoint Access: Use roles from the user's token for coarse-grained protection at the API endpoint level. For example, a FastAPI dependency can ensure that only users with the `Accountant` role can call the `POST /api/book-keeper/v1/journal-entries` endpoint. This is a simple and effective first line of defense.
- ABAC (Attribute-Based Access Control) for Business Logic: Inside the command handlers, implement fine-grained authorization using attributes from both the user and the data.
  - Example: A command handler would receive the `RecordJournalEntryCommand` and the current user's claims. It can then enforce a rule like: "A user can post a journal entry if the user's `department` attribute matches the entry's `department` and the entry's `amount` is less than the user's `approval_limit` attribute." A sketch of such a check follows this list.
This layered approach ensures that business-critical rules reside within the domain and application layers, where they have the most context.
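As an illustration of the ABAC rule above (the command shape, claim names, and exception type are assumptions, not the real API):

```python
# Sketch of an in-handler ABAC check using attributes from the user and the data.
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class RecordJournalEntryCommand:
    department: str
    amount: Decimal

class AuthorizationError(Exception):
    pass

def authorize_journal_entry(command: RecordJournalEntryCommand, claims: dict) -> None:
    """Allow posting only within the user's department and approval limit."""
    if claims.get("department") != command.department:
        raise AuthorizationError("User may only post entries for their own department")
    if command.amount >= Decimal(str(claims.get("approval_limit", "0"))):
        raise AuthorizationError("Entry amount exceeds the user's approval limit")
```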
5. What technology will be used for the authorization engine?
Context: The system requires a fine-grained authorization model supporting a mix of RBAC and ABAC for a multi-tenant (multi-national/company) environment. The decision involves choosing between building a custom solution or adopting an existing engine like Casbin, OPA, or SpiceDB.
Decision: Casbin will be adopted as the internal authorization engine.
- Why Casbin?
- Balanced Power & Simplicity: Casbin provides a powerful policy model (PERM) that can express the required RBAC and ABAC rules without the steep learning curve of OPA's Rego language.
- Excellent Python Integration: It has a mature Python library that can be integrated directly into the application layer (e.g., within command handlers).
- Flexible Policy Storage: Casbin policies can be stored in the existing PostgreSQL database using an official adapter. This avoids adding a new, separate service to the stack, simplifying deployment and management.
  - Multi-Tenancy Support: The policy model can be easily extended to support multi-tenancy by adding a `tenant_id` or `company_id` as a domain in the policy rules, which is ideal for a multi-national setup (see the sketch at the end of this section).
- Comparison to Alternatives:
- OPA (Open Policy Agent): While extremely powerful, it was considered too complex for the initial implementation phase. The overhead of learning and managing Rego policies was deemed a potential risk to development velocity.
- SpiceDB (Zanzibar-style): This is a superior solution for relationship-based access control (ReBAC), but the project's immediate needs are more attribute-based. SpiceDB might be considered in the future if complex graph-based permissions become a primary requirement.
- Custom Code: A custom solution was rejected to avoid coupling policy logic with application code, which makes policies hard to audit and update.
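For illustration, a multi-tenant check with pycasbin could look like this sketch; the model shown in the comment is Casbin's standard RBAC-with-domains model, while the file names, subjects, and tenants are assumptions:

```python
# Sketch of a domain-aware (multi-tenant) Casbin check.
import casbin

# model.conf (Casbin's standard RBAC-with-domains model):
#   [request_definition]
#   r = sub, dom, obj, act
#   [policy_definition]
#   p = sub, dom, obj, act
#   [role_definition]
#   g = _, _, _
#   [policy_effect]
#   e = some(where (p.eft == allow))
#   [matchers]
#   m = g(r.sub, p.sub, r.dom) && r.dom == p.dom && r.obj == p.obj && r.act == p.act
#
# policy.csv (illustrative rows; in production these live in PostgreSQL via an adapter):
#   p, accountant, tenant_acme, journal-entries, post
#   g, alice, accountant, tenant_acme

enforcer = casbin.Enforcer("model.conf", "policy.csv")

if enforcer.enforce("alice", "tenant_acme", "journal-entries", "post"):
    ...  # proceed with dispatching the command
```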
6. How will other microservices interact with Book Keeper?
Context: Other business-critical microservices (e.g., Invoicing, Payroll, Orders) will need to record financial transactions in the general ledger managed by Book Keeper. The question is whether they should do this by calling Book Keeper's API directly (orchestration) or by emitting domain events that Book Keeper consumes (choreography).
Decision: The primary integration pattern will be Event-Driven Choreography, with the API serving as a secondary or administrative interface.
- Primary Pattern (Choreography):
  - External services will publish business-significant domain events to the application's event bus (e.g., `InvoiceIssued`, `PaymentReceived`, `PayrollRunCompleted`). The primary implementation for this bus is NATS. (A publishing sketch appears at the end of this section.)
  - These events should contain business data, not pre-formatted accounting entries. For example, `PayrollRunCompleted` would contain `employee_id`, `net_pay`, `tax_withheld`, etc.
  - Book Keeper will have dedicated internal subscribers (event handlers) that listen for these external events.
  - These subscribers are responsible for translating the business event into the appropriate `RecordJournalEntryCommand` and dispatching it internally.
- Why Choreography is Preferred:
  - Loose Coupling: The `Invoicing` service doesn't need to know about accounting principles or the `Book Keeper` API schema. This respects the boundaries of each Bounded Context.
  - Resilience: If the `Book Keeper` service is temporarily unavailable, events will queue in NATS. Once `Book Keeper` is back online, it can process the backlog without data loss, and the originating service's primary function is not blocked.
  - Centralized Accounting Logic: All logic for creating journal entries remains centralized within the `Book Keeper` context, preventing accounting rules from leaking into other services.
- Secondary Pattern (API Orchestration):
  - The `POST /api/book-keeper/v1/journal-entries` endpoint will still be available.
  - This is useful for administrative tools, manual entry UIs, or specific, tightly-coupled scenarios where immediate feedback from the ledger is required. However, it should not be the default choice for inter-service communication.
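For illustration, a producer publishing such a business event with nats-py might look like this sketch (the subject name and payload fields are assumptions):

```python
# Sketch of a Payroll-side producer publishing a business event to JetStream.
import asyncio
import json

import nats

async def publish_payroll_run_completed() -> None:
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()
    event = {
        "event_type": "PayrollRunCompleted",
        "payroll_run_id": "pr-2024-06-001",
        "employee_id": "emp-42",
        "net_pay": "4200.00",
        "tax_withheld": "800.00",
    }
    # JetStream persists the message, so Book Keeper can drain the backlog
    # later even if it is offline right now.
    await js.publish("payroll.run.completed", json.dumps(event).encode())
    await nc.close()

asyncio.run(publish_payroll_run_completed())
```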
7. How will Book Keeper consume and scale with external events?
Context: As an event-driven service, Book Keeper must consume events from numerous other microservices. This raises questions about scalability, maintenance, and how to prevent the internal domain from becoming polluted by external data models.
Decision: An Anti-Corruption Layer (ACL) will be implemented as the primary entry point for all external events.
- Architecture:
  - Event Dispatcher: A central component will subscribe to a list of required external event topics on the event bus. It will act as a router, forwarding incoming events to the correct translator based on event type.
  - Translators: For each external event (e.g., `PayrollRunCompleted`), a dedicated `Translator` class will be created. This class is responsible for:
    - Validating the incoming external event's schema.
    - Translating the external data model into a `Book Keeper`-native `RecordJournalEntryCommand`.
    - Dispatching this internal command to the application's command bus.
  - Domain Purity: This pattern ensures that the core `Ledger Context` (application and domain layers) remains completely decoupled from and ignorant of the structure of any external service.
- Scalability:
  - The system will leverage the consumer group functionality of the event bus (e.g., "Durable Consumers" in NATS JetStream or standard "Consumer Groups" in Kafka).
- Pluggable Architecture for Extensibility: To avoid tight coupling and enable deep customization, the system supports a plugin model for adding new `Translator` components (see the sketch after this list).
  - Discovery Mechanism: The application uses Python's standard `entry_points` mechanism to discover plugins. The specific entry point for this feature is `book_keeper.translators`.
  - Plugin Implementation: A plugin is a standard, installable Python package. It advertises one or more `ITranslator` classes via the entry point in its `pyproject.toml`.
  - Automatic Registration: During application startup, the dependency injection system automatically discovers these entry points, loads the translator classes, registers them with the DI container, and injects instances into the `EventDispatcher`.
  - Benefits: This approach fully decouples the core application from its integrations. New external events can be supported by simply installing a new plugin package, with no changes required to the `Book Keeper` codebase. This has been proven by refactoring all built-in translators into a single example plugin.
- Management: The set of events the service listens to is explicitly managed by the collection of `Translator` classes within the ACL. To support a new external event, a developer simply adds a new translator.
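For illustration, a translator plugin and its discovery might look like this sketch; the `ITranslator` shape and the plugin module path are assumptions, while the entry-point group `book_keeper.translators` is the one named above:

```python
# Sketch of entry-point-based translator discovery (requires Python 3.10+ for
# the group= keyword). The ITranslator protocol shape is an assumption.
from importlib.metadata import entry_points
from typing import Protocol

class ITranslator(Protocol):
    event_type: str

    def translate(self, external_event: dict):
        """Return a Book Keeper-native RecordJournalEntryCommand."""
        ...

# In the plugin package's pyproject.toml:
#   [project.entry-points."book_keeper.translators"]
#   payroll = "my_plugin.translators:PayrollRunCompletedTranslator"

def discover_translators() -> list[ITranslator]:
    """Instantiate every translator advertised under book_keeper.translators."""
    return [ep.load()() for ep in entry_points(group="book_keeper.translators")]
```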
8. How will multi-national concerns (Countries, Currencies, Compliance, Taxes) be handled?
Context: The system must support operations in multiple countries, each with its own currency, tax laws, and compliance regulations. The design must be flexible enough to accommodate new countries and changing rules without requiring constant code changes.
Decision: A hybrid data-and-code approach will be implemented, centered within the Compliance Bounded Context. The principle is to store rules as data and the logic to execute those rules as code.
The Compliance context will operate asynchronously. It will not block the creation of journal entries but will instead react to them after they have been posted.
- Rules as Data:
  - The application database will contain tables for storing configuration that varies by country or region. This data can be updated by administrators without a new application deployment.
  - Examples:
    - `countries` (ISO 3166-1 codes, default currency)
    - `currencies` (ISO 4217 codes, decimal precision)
    - `tax_rates` (country_code, tax_type, rate, effective_date)
    - `fiscal_periods` (country_code, start_date, end_date)
- Logic as Code:
  - The `Compliance Context` will act as a "rules engine" by consuming events from the `Ledger` context.
  - It will subscribe to `JournalEntryPosted` events from the event bus.
  - Upon receiving an event, its internal services (e.g., `ComplianceValidationService`) will apply the relevant rules (a sketch appears at the end of this section).
  - If a transaction is found to be non-compliant, this context is responsible for initiating a compensating action (e.g., publishing a `TransactionFailedCompliance` event or creating a reversing journal entry).
- Benefit: This asynchronous, event-driven approach makes the system more resilient and scalable. The core ledger is not blocked by compliance checks, and the two domains are fully decoupled.
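For illustration, the compliance reaction could be sketched as follows (the event shape, rules repository, and bus API are assumptions):

```python
# Sketch of an asynchronous compliance check reacting to JournalEntryPosted.
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

@dataclass
class JournalEntryPosted:
    entry_id: str
    country_code: str
    amount: Decimal
    posted_on: date

class ComplianceValidationService:
    def __init__(self, rules_repo, event_bus):
        self._rules = rules_repo  # reads the rules-as-data tables
        self._bus = event_bus

    async def handle(self, event: JournalEntryPosted) -> None:
        # Apply the rules that were effective for this country on the posting date.
        rules = await self._rules.effective_rules(event.country_code, event.posted_on)
        violations = [r for r in rules if not r.is_satisfied_by(event)]
        if violations:
            # Compensating action: the posted entry stays immutable; downstream
            # processes react to the failure event (e.g., a reversing entry).
            await self._bus.publish(
                "compliance.transaction_failed",
                {"entry_id": event.entry_id, "violations": [r.code for r in violations]},
            )
```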
9. What is the storage strategy for the write-side (Ledger & Event Store)?
Context: The system's write-side, based on Event Sourcing, has two critical storage needs: persisting the stream of immutable events (the Event Store) and, potentially, using a specialized ledger for high-performance transaction processing. The architecture must be flexible to accommodate different performance, operational, and cost requirements.
Decision: A swappable, adapter-based architecture will be used for all write-side storage, allowing the specific technology to be chosen via configuration.
- Event Store Strategy:
  - Interface: An `IEventStore` interface will define the contract for saving and retrieving event streams for aggregates (a sketch follows this list).
  - Default Implementation (PostgreSQL): The default implementation will use a PostgreSQL table as the event store. This provides strong transactional guarantees and simplifies the initial technology stack.
  - High-Throughput Options: For scenarios requiring higher performance and features specifically designed for event sourcing, adapters for dedicated event stores will be developed.
    - KurrentDB: A mature, standalone event store that offers a rich feature set specifically for event sourcing patterns. An adapter has been implemented using the `kurrentdbclient` library, making it a fully supported and swappable backend.
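For illustration, the contract might be sketched as a Protocol like the one below; the method names and conflict exception are assumptions, though the `expected_version` semantics match Decision #15:

```python
# Sketch of the IEventStore contract implemented by the PostgreSQL and
# KurrentDB adapters; names are illustrative.
from typing import Protocol, Sequence

class DomainEvent: ...  # placeholder for the real event base class

class EventStoreConflictError(Exception):
    """Raised when expected_version does not match the stream's actual version."""

class IEventStore(Protocol):
    async def append(
        self, stream_id: str, events: Sequence[DomainEvent], expected_version: int
    ) -> None:
        """Append events; expected_version=0 asserts the stream must not yet exist."""
        ...

    async def load(self, stream_id: str) -> list[DomainEvent]:
        """Return all events of an aggregate's stream, in order."""
        ...
```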
- Ledger Storage Strategy (for core transaction processing):
  - Interface: An `ILedgerStorageService` interface will define the contract for core, high-speed financial transaction operations.
  - Default Implementation (PostgreSQL): The `PostgresLedgerAdapter` uses standard SQL transactions to ensure atomicity. This is suitable for a wide range of applications.
  - High-Performance Option (TigerBeetle): For use cases demanding extreme throughput and financial-grade safety, the `TigerBeetleLedgerAdapter` has been implemented. This allows the core double-entry logic to be offloaded to a specialized, high-performance database.
Rationale: This approach provides maximum flexibility. A project can start with a simple, all-PostgreSQL setup for ease of deployment and later scale up by swapping in specialized databases like KurrentDB or TigerBeetle for specific components without rewriting the core application logic.
10. What is the strategy for the Event Bus?
Context: The system relies on an event bus for asynchronous communication between its own components (e.g., write-side to read-side projectors) and for consuming events from external microservices. The choice of event bus impacts performance, reliability, and operational complexity.
Decision: The system will be architected around a swappable IEventBus interface, allowing the concrete implementation to be chosen via configuration.
- Primary Choice (NATS with JetStream): NATS is the recommended default due to its high performance, operational simplicity, and strong support for the required features like persistence, ordering, and durable consumer groups (JetStream).
- Alternative Option (Apache Kafka): For environments where Kafka is already the established standard, an adapter will be provided. This ensures that `Book Keeper` can integrate seamlessly into existing enterprise ecosystems.
Rationale: By depending on an abstraction (IEventBus), the application's core logic remains decoupled from the specific messaging technology. This allows deployment flexibility and future-proofing against changing technology landscapes.
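For illustration, the abstraction might be sketched as follows (the method names and handler signature are assumptions):

```python
# Sketch of the IEventBus contract behind the NATS and Kafka adapters.
from typing import Awaitable, Callable, Protocol

Handler = Callable[[bytes], Awaitable[None]]

class IEventBus(Protocol):
    async def publish(self, topic: str, payload: bytes) -> None:
        """Publish an event; only the adapter knows the concrete broker."""
        ...

    async def subscribe(self, topic: str, group: str, handler: Handler) -> None:
        """Attach a durable consumer-group subscription that invokes handler per event."""
        ...
```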
11. How do other services consume data from Book Keeper?
Context: For a truly integrated ecosystem, other microservices (e.g., a Treasury service or a FinancialPlanning service) may need to react to financial events as they are officially recorded in the ledger.
Decision: Book Keeper will publish its own validated domain events to the public event bus for other services to consume.
- Event Publishing: After a command is successfully processed and its resulting domain events are persisted to the event store, a dedicated "Event Publisher" component will publish key events (e.g., `JournalEntryPosted`, `AccountDebited`, `AccountCredited`) to well-defined public topics on the event bus (NATS/Kafka).
- External Read Models: This allows other services to build their own specialized read models based on the official financial record from `Book Keeper`. For example, a `Treasury` service could listen to `AccountCredited` events for a specific bank account to monitor cash flow in real-time (see the subscriber sketch below).
- Decoupling: This approach is highly decoupled. `Book Keeper` does not need to know anything about its consumers. It simply publishes its facts, and any authorized service can subscribe to them.
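For illustration, such a consumer might look like this sketch with nats-py (the subject and durable names are assumptions):

```python
# Sketch of a Treasury-side durable subscriber to Book Keeper's public events.
import asyncio
import json

import nats

async def main() -> None:
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    async def on_account_credited(msg) -> None:
        event = json.loads(msg.data)
        print("cash inflow:", event["account_code"], event["amount"])
        await msg.ack()  # acknowledge only after successful processing

    # The durable name lets the Treasury consumer group resume where it left off.
    await js.subscribe(
        "book_keeper.account.credited",
        durable="treasury",
        cb=on_account_credited,
        manual_ack=True,
    )
    await asyncio.Event().wait()  # keep the subscriber running

asyncio.run(main())
```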
12. How is the Chart of Accounts (CoA) managed?
Context: Each organization (tenant) requires its own unique Chart of Accounts (CoA), which is often structured as a hierarchy. This is a classic "master data" problem. The book_keeper service needs to use account codes from the CoA without being responsible for managing the CoA itself.
Decision: The management of the Chart of Accounts will be handled by a dedicated, external Accounts Master microservice with its own codebase and repository. book_keeper will act as a client to this service.
- `Ledger` Context Responsibility: The `Ledger` context's sole responsibility is to record balanced double-entry transactions. It treats an `account_code` as an opaque string identifier and does not validate its existence or position within a hierarchy. This adheres to the "do one thing well" principle.
- `Accounts Master` Service Responsibility: This external service is the single source of truth for creating, managing, and querying the Chart of Accounts for all tenants. It will publish events like `AccountCreated` and `AccountUpdated` to the event bus.
- Interaction Model (Composition & Choreography):
  - Client-Side Composition: UIs or other client services are expected to first query the `Accounts Master` API to get a list of valid accounts. They then use the selected `account_code` when calling the `book_keeper` API to record a transaction (sketched at the end of this section).
  - Backend Choreography: The `Reporting` context within `book_keeper` will subscribe to events from both its own `Ledger` context (`JournalEntryPosted`) and the external `Accounts Master` service (`AccountCreated`, `AccountUpdated`). This allows it to build rich, human-readable reports (e.g., a Trial Balance with full account names) and to perform asynchronous reconciliation.
Rationale: This approach provides maximum flexibility and scalability.
- Flexibility: Tenants can define a strict, formal CoA or use simple, ad-hoc account codes. The core engine supports both.
- Decoupling: The `book_keeper` and `Accounts Master` services are fully decoupled and can be developed, deployed, and scaled independently, increasing system resilience.
- Domain Purity: The core `Ledger` domain is not polluted with the concerns of CoA management, keeping its logic clean and focused.
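For illustration, client-side composition could look like this sketch (the URLs, payload shape, and Accounts Master response format are assumptions):

```python
# Sketch of a client composing the two APIs: pick an account, then post an entry.
import httpx

with httpx.Client() as client:
    # 1. Query the (assumed) Accounts Master API for valid accounts.
    accounts = client.get(
        "https://accounts-master.example.com/api/v1/accounts"
    ).json()
    cash_code = accounts[0]["account_code"]  # a UI would let the user choose

    # 2. Record a balanced transaction against book_keeper with the chosen code.
    client.post(
        "https://book-keeper.example.com/api/book-keeper/v1/journal-entries",
        json={
            "description": "Opening balance",
            "lines": [
                {"account_code": cash_code, "debit": "100.00"},
                {"account_code": "3000-EQUITY", "credit": "100.00"},
            ],
        },
    )
```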
13. How are Account Codes validated during transaction recording?
Context: When a transaction is submitted to the book_keeper API, the account_codes provided must eventually correspond to valid accounts in the external Accounts Master service. The question is whether this validation should happen synchronously (at the time of API call) or asynchronously (after the transaction is recorded).
Decision: Validation will be performed asynchronously through event-driven reconciliation. The book_keeper API will not perform a synchronous, blocking call to the Accounts Master service to validate account codes.
- Write-Side Behavior (`book_keeper` service): The `POST /api/book-keeper/v1/journal-entries` endpoint will accept any transaction as long as it is internally balanced (debits equal credits). It treats `account_code` as an opaque identifier and persists the `JournalEntryPosted` event without validating the code against an external system.
- Read-Side Behavior (`Reporting` context): The `Reporting` context will subscribe to events from both `book_keeper` (`JournalEntryPosted`) and the `Accounts Master` service (`AccountCreated`, `AccountUpdated`). It will be responsible for building reports that join this data. This process naturally reveals any `JournalEntryPosted` events that reference an `account_code` for which no corresponding `AccountCreated` event has been received.
Rationale: This choice prioritizes service availability and loose coupling over immediate consistency.
- High Availability: `book_keeper` can continue to record transactions even if the `Accounts Master` service is temporarily unavailable or slow. This is critical for a core system.
- Performance: It avoids adding a synchronous network call to the critical path of recording a transaction, resulting in lower latency.
- Architectural Consistency: This aligns perfectly with Decision #6 (Event-Driven Choreography) and Decision #12 (External `Accounts Master`), which establish a decoupled, autonomous service architecture.
Consequences:
- The system operates under an "eventual consistency" model for account code validation.
- A separate business process must be defined to handle reconciliation failures (e.g., an "Unrecognized Account" report that alerts an accountant to correct the entry). This is a known and accepted trade-off.
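For illustration, the reconciliation that feeds such an "Unrecognized Account" report could be sketched as follows (the projection-store API and event shapes are assumptions):

```python
# Sketch of read-side reconciliation between ledger and account-master events.
class ReconciliationProjector:
    def __init__(self, store):
        self._store = store  # read-model persistence, e.g. projection tables

    async def on_account_created(self, event: dict) -> None:
        await self._store.upsert_account(event["account_code"], event["name"])

    async def on_journal_entry_posted(self, event: dict) -> None:
        for line in event["lines"]:
            if not await self._store.account_exists(line["account_code"]):
                # Surfaces in the "Unrecognized Account" report for an accountant.
                await self._store.flag_unrecognized(
                    entry_id=event["entry_id"],
                    account_code=line["account_code"],
                )
```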
14. What is the development strategy for the Reporting Engine given external dependencies?
Context: The Reporting Engine (Milestone 2) is planned to consume events from the external Accounts Master service to provide rich, user-friendly reports (e.g., with account names). However, this service does not yet exist and its development is on a separate track.
Decision: The Reporting Engine will be developed in two stages to allow for incremental progress and de-risk the dependency.
- Stage 1 (Initial Implementation): The projectors and APIs for reports (General Ledger, Trial Balance) will be built using only the events published by the internal `Ledger` context. These reports will be fully functional but will display raw `account_code`s instead of human-readable `account_name`s. This delivers core value and validates the read-side architecture independently.
- Stage 2 (Enhancement): Once the `Accounts Master` service is available and publishing events (`AccountCreated`, etc.), the existing projectors will be enhanced. They will be updated to subscribe to these new external events and enrich the read models with `account_name` and other master data.
Rationale: This agile approach allows book_keeper development to proceed without being blocked. It delivers a functional (if not perfect) reporting engine quickly and provides a clear path for future enhancement. It also aligns with the principle of building resilient systems that can function, albeit in a degraded mode, when dependencies are unavailable.
15. How is idempotency handled for external events?
Context: External events from other microservices (e.g., InvoicePaid) might be delivered more than once due to network retries or "at-least-once" delivery guarantees from the message bus. The system must not process the same event twice, as this would lead to duplicate journal entries and incorrect financial state.
Decision: Idempotency will be enforced at the command-processing layer by leveraging the optimistic concurrency features of the Event Store. This is a standard pattern in Event Sourcing architectures.
- Mechanism:
  - Unique Business Key: Every external event that triggers the creation of a new aggregate (like a `Journal`) must have a unique business identifier (e.g., `invoice_id`, `payroll_run_id`).
  - Deterministic Aggregate ID: The `Translator` in the Anti-Corruption Layer (ACL) responsible for the external event will generate a deterministic UUID for the new `Journal` aggregate, derived from the event's unique business key.
  - Stream Creation Check: When the `RecordJournalEntryCommandHandler` saves the new `Journal` aggregate, the repository attempts to create a new event stream with an `expected_version` of 0. This version number signifies that the stream must not already exist.
  - Concurrency Control in Action:
    - On the first receipt of the event, the stream does not exist, so the write succeeds.
    - On any subsequent receipt of the same event, the system attempts to create a stream that already exists. The Event Store's optimistic concurrency control rejects this write with a conflict error (e.g., `EventStoreConflictError`).
  - Graceful Handling: The application layer is designed to catch this specific conflict error. It interprets it as a successful-but-duplicate request, logs it, and acknowledges the message to the event bus without creating a duplicate transaction. A sketch of the flow follows this list.
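For illustration (the `uuid5` derivation is standard library behavior; the repository API, message handle, and exception type are assumptions):

```python
# Sketch of the idempotent-creation flow for a duplicated external event.
import logging
import uuid

# Fixed, application-chosen namespace: the same business key always maps to
# the same aggregate ID (the namespace value here is illustrative).
JOURNAL_NAMESPACE = uuid.UUID("5f0c1a2e-0000-4000-8000-000000000000")

class EventStoreConflictError(Exception):
    """Raised by the event store when the stream already exists."""

def aggregate_id_for(business_key: str) -> uuid.UUID:
    """Deterministic UUID: the same payroll_run_id always yields the same stream."""
    return uuid.uuid5(JOURNAL_NAMESPACE, business_key)

async def handle_external_event(event: dict, repository, msg) -> None:
    journal_id = aggregate_id_for(event["payroll_run_id"])
    try:
        # expected_version=0 asserts that the stream must not already exist.
        await repository.create_stream(
            journal_id, events=[...], expected_version=0  # [...]: the new events
        )
    except EventStoreConflictError:
        # Duplicate delivery: the entry already exists, so just log and ack.
        logging.info("Duplicate event for journal %s; acknowledging", journal_id)
    await msg.ack()
```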
Rationale:
- Robustness: This pattern is highly robust and atomic, as the idempotency check is part of the same transaction as the state change itself.
- Simplicity: It avoids the need for a separate, dedicated "processed messages" tracking table or cache, which can introduce its own consistency challenges.
- Architectural Alignment: It is a natural and elegant solution that leverages the core capabilities of an Event Sourcing system.