Why the Nederland Dal Vrij subscription crashed the online store

I previously wrote about how organisations can scale both vertically and horizontally when success arrives unexpectedly: in other words, during periods of high demand, especially when that demand is sustained.

Yet it appears that the sale of the Nederland Dal Vrij subscription brought the Dutch Railways web shop to its knees. But how could that happen? It may have something to do with a so-called single point of failure, and from an IT perspective, that's interesting to examine a bit more closely.

A closer look at the basics

If we trace the history, we see that the domain name ns.nl was registered by NS as early as 1991 through Argeweb. As of 2026, this domain points to nameservers operated by Akamai Technologies Inc. Akamai is generally considered one of the largest enterprise CDNs in the world.

That's not the kind of company that stops working when 10,000 or 20,000 visitors show up. Akamai serves a large portion of the world's biggest websites and companies, operates a global network of servers, and processes enormous amounts of internet traffic every day. A traffic spike of several tens of thousands of visitors is, by itself, not something a company like Akamai would struggle with.

Based on the publicly available source code, the NS webshop appears to run on a custom frontend platform called “Nessie”, combined with a Java/Spring backend and Thymeleaf templates. A large part of the static components, such as stylesheets, scripts, and fonts, is delivered via Akamai’s global CDN. This suggests that the web layer itself was not the limiting factor, and that the cause should instead be sought in a back-end administrative system.

So what happened?

Akamai is a CDN, or content delivery network. It stores images, videos, and other static components of a website. Through intelligent routing, it delivers this content to visitors in an efficient and robust manner. In the case of the NS website, it's also fair to assume that a large portion of the traffic originates from the Netherlands and is therefore largely handled by Dutch or nearby Akamai servers.

But a web shop that sells subscriptions is more than just a collection of static files. There's also an administrative layer behind it. For every subscription sold, the system has to ensure that the subscription is activated on the buyer's OV-chipkaart number and/or OVpay account. In my view, that's most likely where the bottleneck sat that brought everything down: the weakest link, so to speak.

Let's assume that selling a Nederland Dal Vrij subscription requires an API call to a system operated by Trans Link Systems (TLS) in order to link the subscription to a traveller. That creates a single administrative system that has to perform a piece of work for every sale. If such an API system isn't designed to handle several tens of thousands of actions in a short period of time, everything can grind to a halt.

I should emphasise that this is merely a technical hypothesis. I don't know how the systems of NS and TLS are actually structured. However, in practice, you often see these kinds of dependencies becoming problematic during sudden traffic spikes.

How can you prevent something like this?

Problems like these are not new. Companies that sell concert tickets regularly run into the same issue. Traffic on their website is usually only a fraction of what suddenly appears the moment ticket sales open for a popular concert. It's entirely possible for 50,000 visitors to try to buy a ticket within the space of an hour.

On the other hand, as NS or a ticket seller, you can't keep the most powerful and expensive infrastructure on standby every single day. That would create unnecessary costs. Ideally, you only want additional capacity during peak moments. Those moments can often be anticipated and, with modern technology, can even be detected automatically, allowing systems to scale up automatically as well.
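As a rough illustration of scaling up automatically, the sketch below derives a target number of servers from the measured request rate. The function name, the capacity per instance, and the minimum and maximum limits are all hypothetical values chosen for the example, not figures from NS or any cloud provider.

```python
import math


def desired_instances(requests_per_minute, capacity_per_instance,
                      min_instances=1, max_instances=20):
    """Return how many instances are needed for the measured load.

    All parameters are illustrative assumptions: capacity_per_instance is
    the number of requests one instance can handle per minute.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_minute / capacity_per_instance)
    # Never drop below the baseline and never exceed the budgeted maximum.
    return max(min_instances, min(max_instances, needed))


quiet_day = desired_instances(50, capacity_per_instance=100)
sale_opens = desired_instances(950, capacity_per_instance=100)
extreme_peak = desired_instances(5000, capacity_per_instance=100)
```

On a quiet day the baseline is enough, while a peak raises the target up to the configured ceiling. That ceiling is what keeps costs bounded when demand exceeds expectations.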

But again: in this case, the problem does not appear to have been the website or web shop itself. Most likely, the limitation was in an underlying administrative API. And that's exactly where the scaling should have taken place.

Postpone and distribute

There is, however, a third option: postponement. When a subscription was purchased, the NS infrastructure could simply have recorded which traveller bought it and stored that information in its own database. A background process, for example a queue that is worked through in batches, could then have ensured that only a limited number of requests per hour were sent to the TLS API. That way, the API itself wouldn't need to scale immediately and the load would be spread out over a longer period of time.
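A minimal sketch of this postponement idea, under stated assumptions: purchases are recorded locally first, and a background job forwards only a limited number per run. The class and field names, and the `send` callable standing in for the call to the TLS API, are hypothetical. Nothing here reflects how NS or TLS actually work.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Purchase:
    order_id: int
    card_number: str  # hypothetical: the traveller's card or account ID


class ActivationQueue:
    """Stores purchases locally and forwards them in limited batches."""

    def __init__(self, send, max_per_batch):
        self._pending = deque()
        self._send = send  # hypothetical callable wrapping the downstream API
        self._max_per_batch = max_per_batch

    def record(self, purchase):
        # Selling only appends locally; no downstream call is made here.
        self._pending.append(purchase)

    def process_batch(self):
        """Forward at most max_per_batch purchases; return how many succeeded."""
        sent = 0
        for _ in range(min(self._max_per_batch, len(self._pending))):
            purchase = self._pending.popleft()
            try:
                self._send(purchase)
                sent += 1
            except Exception:
                # Re-queue failed items at the end so a later batch retries them.
                self._pending.append(purchase)
        return sent

    def backlog(self):
        return len(self._pending)


# Usage: 250 sales arrive at once, the downstream system sees 100 per run.
activated = []
queue = ActivationQueue(send=activated.append, max_per_batch=100)
for i in range(250):
    queue.record(Purchase(order_id=i, card_number=f"card-{i}"))

batches = 0
while queue.backlog():
    queue.process_batch()
    batches += 1
```

In a real setup a scheduler would call `process_batch` once per interval rather than in a tight loop, and the pending purchases would live in a durable database instead of in memory.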

The only downside is that these subscriptions were not only sold from 15 June onwards, but could also be used immediately from 15 June. Ideally, sales would have started earlier, allowing sufficient time for this administrative processing to be completed gradually. With delayed processing, there is always the possibility that some of the more than 31,000 subscriptions sold would only have become active later that day.
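To get a feeling for that delay, the sketch below estimates how long it takes to work through a backlog at a fixed rate. The figure of 31,000 comes from the text above. The hourly API limits are purely hypothetical, and the calculation assumes for simplicity that all purchases are already waiting when processing starts.

```python
import math


def hours_to_drain(backlog, requests_per_hour):
    """Whole hours needed to process a backlog at a fixed hourly rate."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return math.ceil(backlog / requests_per_hour)


SOLD = 31_000  # "more than 31,000" subscriptions, as mentioned in the text

# Hypothetical API limits, chosen only to show how the delay scales.
estimates = {rate: hours_to_drain(SOLD, rate) for rate in (2_000, 5_000, 10_000)}
for rate, hours in estimates.items():
    print(f"{rate} requests/hour -> about {hours} hours")
```

Under these assumed limits the last subscription would be activated between roughly 4 and 16 hours after the start of sales, which is why opening sales earlier than the first day of validity would have helped.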