Imagine you own a bakery. You wake up one morning, walk to the shop, and find that the front door has changed its own lock overnight. Nobody told it to. It just decided. Your key - the one you've used for two years - no longer works. Your bread is inside. Your customers are outside. The "Open" sign is blinking, mocking you.
That is, more or less, what an expired TLS certificate feels like.
The site is still running. The servers are still humming. The code hasn't changed. But every visitor who tries to walk in gets a scary red browser warning that says, in big unfriendly letters, something like "Your connection is not private." And then they leave. They don't email you. They don't fill out a support ticket. They just close the tab and try your competitor.
If you think this only happens to small teams with sloppy processes, the next few examples are going to ruin your day.
When the giants fell over
In February 2020, Microsoft Teams went down for several hours. The cause was not a clever attack or a regional failure. It was an expired authentication certificate. It happened at the start of a workweek, just as much of the planet was about to start living on video calls. Microsoft has thousands of engineers. They still missed it.
In December 2018, O2 customers across the UK lost mobile data for most of a day after an expired certificate inside Ericsson equipment triggered failures in the carrier's network. Calls and texts were degraded too. One of the UK's largest mobile operators, knocked offline by a digital expiry date. The kind of incident that makes the BBC's six o'clock news.
In 2017, the Equifax breach wasn't just a story about an unpatched server. One reason the attackers had so much time inside the network was that a digital certificate on a device responsible for inspecting outbound traffic had been expired for nineteen months. The intrusion-detection tool was, in effect, blind. When the cert was finally renewed and the tool came back online, the evidence of the breach was already there to find. An expired cert didn't just cause downtime here. It muted the alarm while the burglars worked.
In May 2026, Let's Encrypt halted all certificate issuance for several hours after an incident with one of their cross-signed roots. Sites with healthy automation that needed a new cert during that window had a problem they hadn't planned for. The cron job worked. The CA didn't. A reminder that "we automate our renewals" and "we are protected" are not the same sentence.
I could go on. There are dozens of these - Cisco WebEx, GitHub's CA partner, various Let's Encrypt automation incidents, smaller telcos and SaaS apps that don't make the news. The pattern is always the same: a forgotten cert (or a CA having a bad day), hours of public embarrassment, a post-mortem full of phrases like "we have improved our internal processes." If it can happen to companies with thousands of engineers, it can happen to you. Especially because, deep down, you already know how your team tracks renewals: a calendar reminder, a sticky note on someone's monitor, and a quiet prayer.
What an outage actually costs
The thing nobody puts on the post-mortem slide is the real cost. Let me try.
Direct revenue loss. If you sell anything on your site, this is the easy math. Take last month's revenue, divide by the number of minutes in a month (43,200 for a 30-day month), and multiply by the minutes you were down. For a small SaaS doing $50,000 a month, every minute of outage is roughly $1.16 of revenue you're never getting back. For an e-commerce site doing $5 million a month, it's about $116 a minute. For a public company, the per-minute number gets silly fast.
Lost trust. This one is harder to measure, which is exactly why it hurts more. A new visitor who sees a browser warning on your homepage doesn't think "oh, they have a cert problem." They think "this site is not safe." Some of those people never come back. There is no log line for "user who silently decided you weren't worth the risk."
SEO penalty. Google notices when your site serves errors. It doesn't have to be a long outage. A few hours of 5xx errors or invalid TLS can reduce how often your pages get crawled, and that can take weeks to recover. A short outage can leave a long footprint in your rankings.
Engineering time. Every cert-driven outage is a fire drill. Two or three engineers stop what they're doing. Someone wakes up. A Slack incident channel gets created. A post-mortem gets written. A Jira ticket promises this will never happen again. Then, six months later, it happens again. None of this work ships features.
Regulatory exposure. If you're under PCI-DSS, HIPAA, SOC 2, ISO 27001, or any of the other audit frameworks, an outage caused by a basic hygiene failure is the worst kind of finding to explain. It's not technical sophistication that gets you in trouble. It's "the cert expired and we didn't notice."
Add it up. For most organisations of any real size, a single cert outage is a five-to-six-figure event. Some of that is direct. Some of it shows up six months later in retention numbers nobody connects back to that one bad Tuesday.
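If you want to run the direct-revenue part of that sum for your own numbers, here is a minimal sketch. It assumes a 30-day month and revenue spread evenly across every minute, which is rarely true (an outage at peak hours costs more), so treat the result as a floor. The function names are made up for this example.

```python
MINUTES_PER_DAY = 24 * 60


def revenue_per_minute(monthly_revenue: float, days_in_month: int = 30) -> float:
    """Average revenue per minute, assuming sales are spread evenly over the month."""
    return monthly_revenue / (days_in_month * MINUTES_PER_DAY)


def outage_cost(monthly_revenue: float, minutes_down: float,
                days_in_month: int = 30) -> float:
    """Direct revenue lost during an outage of the given length (no trust or SEO effects)."""
    return revenue_per_minute(monthly_revenue, days_in_month) * minutes_down


if __name__ == "__main__":
    # The two example businesses from the text, each with a three-hour outage.
    for monthly in (50_000, 5_000_000):
        per_minute = revenue_per_minute(monthly)
        lost = outage_cost(monthly, minutes_down=180)
        print(f"${monthly:,}/month: ${per_minute:,.2f} per minute, "
              f"${lost:,.2f} for three hours down")
```

This only covers the first item on the list. Lost trust, SEO, engineering time, and audit findings don't reduce to one formula, which is part of why they get left off the slide.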
Why the calendar reminder doesn't work
You already use calendars. So why do certs still slip through?
Three reasons.
One: the calendar is owned by a person. People leave jobs. They change teams. They go on holiday during renewal week. The calendar invite they made two years ago is sitting in a deactivated account. The reminder fires into the void.
Two: there are more certs than you think. You know about the main domain. You probably know about the API. But what about the staging environment whose renewal has lived on a sticky note since 2022? The internal admin tool nobody talks about? The certificate your mobile app pins? The SMTP relay? Once you start counting, the number is always at least three times higher than the first guess.
Three: even when renewals work, certs fail. Expiry is just one of many ways a TLS certificate goes bad. The chain can break. A cipher suite or protocol version can be deprecated. A new subdomain can be missing from the names on the cert. The CA itself can have a bad day (see: Let's Encrypt's various 2024 and 2026 incidents). A calendar reminder only protects you from one of these.
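To make the third reason concrete, here is a rough sketch of how a check from outside can tell some of these failure modes apart, using only Python's standard library. The numeric codes are OpenSSL's certificate verification error codes. The labels and the function names `classify_verify_code` and `check_host` are made up for this example, and the mapping is deliberately incomplete.

```python
import socket
import ssl

# A few OpenSSL X.509 verification error codes, mapped to plain-English labels.
# The labels are this sketch's own wording, not OpenSSL's.
VERIFY_CODE_LABELS = {
    10: "expired",
    18: "self-signed certificate",
    19: "self-signed certificate in chain",
    20: "broken or incomplete chain",
    62: "hostname mismatch",
}


def classify_verify_code(code: int) -> str:
    """Translate a verification error code into a label, with a fallback."""
    return VERIFY_CODE_LABELS.get(code, f"other verification failure (code {code})")


def check_host(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and report what, if anything, went wrong."""
    context = ssl.create_default_context()  # verifies chain, dates, and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        return classify_verify_code(exc.verify_code)
    except ssl.SSLError as exc:
        # Handshake problems that are not certificate verification failures,
        # such as no protocol version or cipher suite in common.
        return f"tls handshake failed: {exc.reason}"
    except OSError as exc:
        return f"unreachable: {exc}"
```

Calling `check_host("example.com")` returns "ok" on a healthy site. Only one of the labels is "expired", and that is the only one a calendar reminder was ever going to catch.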
What actually works
Three things, in order of impact.
An inventory you didn't write by hand. If you can't list every cert you have, in under five minutes, you don't have certificate monitoring. You have certificate hoping. Real monitoring starts with continuously discovering certs across all your domains and subdomains, not just the ones someone added to a spreadsheet.
External validation. Your own automation can fail silently. The cron job that renews can succeed but the web server reload can fail. The new cert can be valid but installed wrong. The only way to catch this is to check from outside, the way a user would. As one Hacker News commenter put it: "Even with automation, having external validation helps avoid blind spots."
Alerts that go to a team, not a person. Cert ownership is the most common single point of failure in this whole problem. Alerts need to go somewhere a team can see them, with a clear escalation path. Email to a personal address is not a system. Slack to a shared channel is.
This is, not coincidentally, exactly what TLS Radar is built to do. We continuously check certs from the outside, across all your domains, flag the expiry-and-everything-else problems, and route alerts to where your team will actually see them. We don't sell certificates, so we don't care which CA you use. We just care that nothing slips past you.
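If you want a feel for the moving parts before buying anything, the three ideas above fit in a few dozen lines of standard-library Python. To be clear, this is not how TLS Radar is implemented. It is a minimal sketch that assumes you already have a list of hostnames (discovery is the hard part and is not shown), that 30 and 7 days are sensible thresholds, and that "page" and "warn" map to whatever shared channel your team actually watches. All function names are made up for this example.

```python
import socket
import ssl
import time
from typing import Dict, Iterable, Optional

SECONDS_PER_DAY = 86_400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect from outside, as a visitor would, and return the cert's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until expiry, given a notAfter string like 'Jan  5 09:34:43 2018 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def severity(days: float, warn_at: float = 30, page_at: float = 7) -> str:
    """Map days remaining to an alert level. The thresholds are assumptions."""
    if days <= page_at:
        return "page"
    if days <= warn_at:
        return "warn"
    return "ok"


def check_inventory(hosts: Iterable[str]) -> Dict[str, str]:
    """Check every host in the inventory and return host -> alert level."""
    results = {}
    for host in hosts:
        try:
            results[host] = severity(days_left(fetch_not_after(host)))
        except (ssl.SSLError, OSError):
            # A failed handshake is what your users see too, so treat it as urgent.
            results[host] = "page"
    return results
```

Run `check_inventory` on a schedule from a machine outside your own network and send anything that isn't "ok" to a shared channel. The part this sketch skips, building and maintaining the host list, is where hand-rolled monitoring usually goes stale.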
One small ask
If you read this and felt a twitch of "hmm, who owns our renewals again?" - that's the feeling worth listening to. Take ten minutes today and find out. The bakery analogy at the top of this piece is funny until it's your shop.