Over a four-week investigation, I uncovered and proved a silent, platform-level crash in AWS Lambda (Node.js, VPC-attached) that terminates functions mid-execution during outbound HTTPS calls — with no logs, no exceptions, and no catchable errors. AWS denied the issue, blamed my code, and withheld telemetry — until their own test proved me right.
Read full post →As a CTO, I've learned that leadership under pressure means staying calm, assessing the situation, and moving quickly. In times of crisis, failure becomes an opportunity to improve systems, and creating a resilient culture ensures that your team is prepared for anything. Resilience is key to not just surviving, but thriving, during technical crises.
Read full post →
A fibre channel SAN failed silently. No logs, no alerts — just stalled workloads and flatlined I/O.
What looked like a healthy system was actually degraded at the physical layer.
The root cause? A hairline fracture in a fibre connector. Enough to pass keepalives, but not data.
This wasn’t a crash — it was a pause.
The fix came not from dashboards, but from walking the cable.