Titanic Talk: This Ship Will Not Fail
December 4, 2025
It’s too big to fail! How many times have we heard that phrase? There’s another common expression that makes more sense: the bigger they are, the harder they fall. On his blog, Will Gallego writes about that idea in “Big Enough To Fail.” Through a lot of big words (intelligently used, BTW), Gallego explains that big stuff fails all the time.
It’s actually a common occurrence, because crap happens. Outages occur daily, Mother Nature shows her wrath, acts of God happen, and systems fail because of mistakes. Gallego observes that we’ve accepted these failures, and he explains why:
- “It’s so exceptional (or feels that way). This is less so about frequency but that when a company becomes so big you just assume they’re impervious to failure, a shock and awe to the impossible.
- The lack of choices in services informs your response. Are there other providers? Sure, but with the continuous consolidation of businesses, we have fewer options every day.
- You’re locked in on your choices. Are you going to knock on Google’s door and complain, take three years to move out of one virtual data center and into another, while retraining your staff, updating your internal documents, and updating your code? No, you’re likely not.
- Failover is costly. Similarly, those at the sharp end know that the level of effort in building failover for something like this is frequently impractical. It would cost too much to set up, to maintain as developers, it would remove effort that could be put towards new features, and the financial cost backing that might be considered infeasible.
- The brittleness is everywhere. The level of complexity and the highly coupled nature of interconnected services means we’ve become brittle to failures. Doubly so when those services are the underpinnings of what we build on. “The internet is down today” as the saying goes, despite the internet having no principal nucleus. This is considered acceptable.
- We’re all in it together. When a service as large as these goes down, there’s a good chance we’re seeing so many failures in so many places that it becomes reasonable to also be down. Your competitors are likely down, your customers might be – there might be too much failure to go around to cast it in any one direction.”
Ultimately, this leads into resilience engineering, which is “reframing how we look [at] incidents.” Gallego ends the article by saying we should take everything in stride, show some patience, and give a break to the smaller players in the game. His approach is more human (aka realistic), unlike the egotistical boasts that sank the Titanic. “It’s unsinkable!” or “It won’t fail!” Yes, it will. Prepare for the eventualities.
Whitney Grace, December 4, 2025