
With our Context Diagram to support us, it’s time to move on to the last part of the kata – the Container Diagram, which gives us a more detailed view of the logical solution architecture.


This is part of a series of articles about how I use Architecture Kata as a capability building exercise and a recruitment tool at ThoughtWorks.


All these articles focus on the same kata: “I’ll Have the BLT“. I’ve chosen this one because it’s in a domain (retail and food ordering/delivery) that most people have some innate understanding of, from a consumer and/or employee perspective.

The C4 Model Container Diagram

Let’s start with a quick look again at where we ended up with our Context Diagram from the last article. Remember that we reduced this diagram from the initial draft and removed many of the actors that had been prominent in our thinking up until this point.

C4 Context Diagram

The following aspects are missing from this diagram, and we want to ensure they make it into our Container Diagram:

  • Showing the connections between the System and other relevant actors from previous articles
  • Decomposing System into a lower layer of technical components, including our thoughts around subsystems (which might result in separately deployable components)
  • More specificity about the nature of the integration between these components
  • Some hints to what technology could be used to realise the components.

The C4 Container Diagram

By the time you get to this level of thinking about the solution, you probably have some idea of what dominant architectural style you plan on using. These thoughts might be based on your experience, current industry trends or convention within your employer. In this case, it’s all three as I’ve decided to build the solution using microservices. Similarly, you have likely decided on a hosting platform and – in order to keep the up-front costs down for our cost-conscious customer – I’ve decided to use AWS. You’ll start to see these decisions reveal themselves in parts of the container diagrams to follow.

The first draft of our container diagram appears below. It still looks pretty rough, but there are some aspects to it which are worth highlighting:

  • There are three subsystems represented: Promotions, Ordering and Order Management. The diagram shows that each of these subsystems is largely used by only a single actor within our system: Marketing -> Promotions, Customer -> Ordering and Cashier -> Order Management. The ability to cleanly associate each actor with a single subsystem gives us confidence that we have drawn these boundaries correctly. We can also start to think about different cross-functional requirements for each of these subsystems… does the Promotions subsystem need to scale to the same level as Ordering? Does it need to be as highly available?
  • Each of the subsystems includes a Front-end and Back-end component, as is typical of modern web application architecture where browser-side components are served from a static content repository and communicate to back-end services via an API.
  • Where databases have been identified for the Promotions and Order Management subsystems, they are only connected to the corresponding back-end services. Specifically, the Customer back-end component persists order information via the Store back-end and not by integrating directly with the Store database.
Version 1: Components and actors
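The third point above – each database being private to its owning back-end – can be sketched in code. This is a hypothetical illustration only: the `placeOrder` handler and the `storeApi.createOrder` call are assumed names, not part of the actual design; the point is that persistence is delegated to the Store back-end’s API rather than reaching into the Store database.

```javascript
// Hypothetical sketch: the Customer back-end never opens a connection to the
// Store database. Instead it calls the Store back-end's (assumed) API,
// injected here as `storeApi` so the subsystem boundary is explicit.
async function placeOrder(order, storeApi) {
  if (!order.items || order.items.length === 0) {
    throw new Error("Order must contain at least one item");
  }
  // Delegate persistence: only the Store back-end knows about its database.
  const saved = await storeApi.createOrder(order);
  return { orderId: saved.id, status: "CONFIRMED" };
}

// Example usage with a stubbed Store back-end client.
const stubStoreApi = {
  createOrder: async (order) => ({ id: "ord-1", ...order }),
};

placeOrder({ items: ["BLT"] }, stubStoreApi).then((r) => console.log(r.status));
```

Injecting the Store client also makes the integration point easy to stub in tests, which matters later when we start worrying about failures along this path.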

The next version of this diagram makes a few changes to the components and adds a lot of detail around the implementation and integration decisions. Apart from adding depth to these aspects, I’ve also had a rethink about the Promotions subsystem. Let’s talk about this change…

Version 2: Integration and implementation details

Headless CMS and publishing Promotions

I decided to host an open-source Headless CMS behind a bespoke API to support the minimal promotions requirements. With the right choice of CMS, much of the drudgery of implementing promotions management will be included out-of-the-box. There might be some rough edges in the user experience, but given the budget constraints, hopefully these issues will not be blockers.

The Promotions API will also be responsible for publishing promotions to the Ordering system to make them available to customers. This publishing flow suggests some form of persistence in the Ordering system which I haven’t shown at this point. I plan to leave this as an implementation detail as the promotion data is only readable by the Ordering subsystem, although it would presumably need to be used both as input for rendered information in the front-end and also for validation and pricing decisions in the Customer back-end. To be honest, there is a potential hornet’s nest of requirements here that could make this quite tricky to design correctly, especially if both rich content (e.g., images) and data representations of the promotions are needed. Promotions are also often time-bounded (e.g., “every Tuesday…”, “for this month only…”), which implies either a time-based promotion publishing flow or having the Ordering subsystem being capable of expiring promotions.
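The time-bounded nature of promotions mentioned above can be made concrete with a small sketch. The field names (`startsAt`, `endsAt`) are assumptions of mine, not part of the Promotions API; the point is that the Ordering subsystem needs some rule for deciding whether a published promotion is currently live.

```javascript
// Hypothetical sketch of how the Ordering subsystem might decide whether a
// published promotion is live. Field names are assumed, not from the design.
function isPromotionActive(promotion, now = new Date()) {
  const started = !promotion.startsAt || now >= new Date(promotion.startsAt);
  const notExpired = !promotion.endsAt || now < new Date(promotion.endsAt);
  return started && notExpired;
}

// A "for this month only..." style promotion:
const mayOnly = {
  startsAt: "2024-05-01T00:00:00Z",
  endsAt: "2024-06-01T00:00:00Z",
};
console.log(isPromotionActive(mayOnly, new Date("2024-05-15T12:00:00Z"))); // true
console.log(isPromotionActive(mayOnly, new Date("2024-06-02T12:00:00Z"))); // false
```

Recurring promotions (“every Tuesday…”) would need a richer schedule model than this, which is exactly the hornet’s nest of requirements hinted at above.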

We’ll revisit the Promotions subsystem in the next version of this diagram.

Version 3: Promotions subsystem updates

The second iteration on the Promotions subsystem has resulted in further changes:

  • The removal of the Promotions front-end in favour of using the default UI provided by the CMS. Most headless CMSes still offer a Content Management capability, even if there is no matching Content Presentation capability.
  • An assumption that expiring promotions can be handled via notification directly from the CMS, which triggers the Promotions Publication API (note: formerly the more generically named Promotions API) to update the Customer back-end.
  • The addition of an explicit call from the Customer front-end to the Customer back-end (“Retrieve promotions”) to retrieve the live promotion data. This call has been added for further clarity around how this information is rendered.

Outside of the Promotions subsystem, there are a couple of areas I want to refine for the next version of this solution:

  • The integration with the Docket Printer. I don’t really know how this is going to work, as we want dockets to be printed automatically when confirmation of an order arrives from the Customer back-end, and the printer is a physical piece of hardware located inside the store itself.
  • The robustness of the critical path of Order -> Payment -> Confirmation -> Delivery… are there single points of failure along this path which we need to focus on?

Docket Printing

Below is the fourth version of our container diagram, with a particular focus on integration with the Docket Printer. I must admit the solution in this diagram is badly in need of a technical spike to validate it, but the idea is roughly:

  1. Create a new web app (Docket Printer F/E) which is always running on a browser in-store. This could be on an iPad, or any other device connected to a local network inside the store.
  2. Configure the local Docket Printer as a printer on the device hosting the Docket Printer F/E.
  3. The Docket Printer F/E constantly polls (i.e., refreshes) the Store B/E to find new orders and automatically prints them to the Docket Printer with no human intervention.
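The polling idea in step 3 could be sketched as below. Everything here is hypothetical (the `fetchConfirmedOrders` endpoint, the `printDocket` wrapper) and a spike is still needed to prove that browser-triggered printing works at all; the sketch just shows the shape of the loop and the deduplication it needs so orders aren’t printed twice.

```javascript
// Sketch of the Docket Printer F/E polling flow. All names are assumed.
function findNewOrders(fetchedOrders, printedIds) {
  // Only orders we haven't already printed should go to the printer.
  return fetchedOrders.filter((o) => !printedIds.has(o.id));
}

async function pollOnce(storeBackend, printer, printedIds) {
  const orders = await storeBackend.fetchConfirmedOrders();
  for (const order of findNewOrders(orders, printedIds)) {
    printer.printDocket(order); // in the browser, this might wrap window.print()
    printedIds.add(order.id);
  }
}

// In the browser this would run on an interval, e.g.:
// setInterval(() => pollOnce(storeApi, printer, printedIds), 10_000);
```

Keeping the printed-order bookkeeping in the front-end is one of the fragile parts: if the device restarts and loses that state, dockets could print twice, which is another thing the proof-of-concept should look at.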

Will this work? Who knows – as mentioned previously, a proof-of-concept is needed to determine the feasibility.

Version 4: Docket Printer integration

Robustness and Fault Tolerance

The critical part of most retail systems is that which connects directly to the revenue stream of the company. In this context, we must always provide a way for customers to order (and therefore pay), even if everything downstream of that process is held together with duct tape. In our solution diagrams, the systems/components on this critical path are:

  • Customer front-end (React app served from S3)
  • Customer back-end (Node JS Lambda functions behind API Gateway)
  • Payment Gateway (external hosted SaaS)
  • Store back-end (Node JS Lambda functions behind API Gateway)
  • Store database (S3 bucket)

Looking at these components, I’m happy (for now) with the inherent availability of the underlying infrastructure. All the AWS services being used scale elastically and are available in multiple Availability Zones. The Payment Gateway, by its very nature, will presumably have sufficient SLAs; providers in this space tend to make availability a #1 concern in order to attract and retain customers, because unreliable payment gateways tend not to last very long in the market.

All that said, we should at least consider these potential failures that lie along the critical ordering path:

| # | Failure | Impact | Mitigation |
|---|---------|--------|------------|
| 1 | Customer front-end -> Payment Gateway | Customer cannot provide payment details | Error page displayed to customer. Retry request to Payment Gateway. Create alerts on repeated failures. Confirm in-store payment option with customer. |
| 2 | Payment Gateway -> Customer back-end | System has no receipt of payment -> no sandwich made for Customer | Error page displayed to customer. Confirm in-store payment option with customer. |
| 3 | Customer front-end -> Customer back-end | ??? | Retry request. Create alerts on repeated failures. |
| 4 | Customer back-end -> Order back-end | No record of the order is kept -> no sandwich made for Customer | Retry request. Create alerts on repeated failures. |
| 5 | Order back-end -> Order database | No record of the order is kept -> no sandwich made for Customer | Retry request. Create alerts on repeated failures. |
| 6 | Docket Printer front-end -> Store back-end | Order is not printed -> no sandwich made for Customer | Retry request. Create alerts on repeated failures. |
| 7 | Docket Printer front-end unavailable | Order is not printed -> no sandwich made for Customer | Store back-end monitors polling of Docket Printer front-ends and alerts Cashier to a possibly offline front-end. |
Table 1: List of key failure scenarios in payment chain
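The “retry request, create alerts on repeated failures” mitigation recurs throughout Table 1, so it is worth sketching once. This is a minimal illustration, not the actual implementation: the `onExhausted` hook is an assumed callback that, in the final design, would publish to the Alerting subsystem.

```javascript
// Minimal sketch of "retry, then alert on repeated failure" from Table 1.
// `onExhausted` is a hypothetical hook for the Alerting subsystem.
async function withRetry(operation, { attempts = 3, onExhausted } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  // All attempts failed: raise an alert rather than failing silently.
  if (onExhausted) onExhausted(lastError);
  throw lastError;
}
```

A call such as the Customer back-end persisting an order via the Store back-end (scenario 4) could then be wrapped in `withRetry`, with the exhaustion hook wired to the alerting path discussed next. A production version would also want backoff between attempts, which Nygard’s stability patterns cover in depth.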

Many of these mitigation approaches are distributed-systems engineering 101, as detailed in Michael Nygard’s excellent book “Release It!”, but I’d still like to show some of the alerting components in the next version of the container diagram (see below).

Version 5: Alerting subsystem

I haven’t shown the full set of integration points to the Alerting subsystem because doing so would quickly make the diagram unreadable, but it does include all the points on the critical ordering path, as well as the path back to the Cashier via an SMS triggered from the AWS Simple Notification Service.

Why SMS, I hear you ask? Because I imagine the cashier is not going to be spending much time looking at a browser, so I wanted to make sure critical, time-sensitive alerts were being brought to their attention immediately.

Next Steps

More than in any of the previous articles, I’ve progressively built out the container diagram and tried to document my decision points along the journey. You may end up with a very different topology for your solution based on your own experience and preferences, but hopefully my thought process was at least logical.

I could continue to iterate on this diagram, bringing more focus to various parts of functional and cross-functional requirements, but we’ve done enough to show how much depth you could get to on any of these kata exercises.

Finally, there are two particular approaches to using kata which place a far higher premium on time and naturally result in considerably less mature solutions: kata workshops and using kata as a recruiting tool. I’ll look at each of these in turn in the next article.

Thanks for your patience 🙂
