Everything Will Fail

Lesson of the day for engineers: whatever large system you design, some component of it will fail one day. A valuable exercise is to draw a diagram of every system, machine, load balancer, and DNS server, then draw a big X on each component in turn and ask yourself what happens when that component fails.

1) What is the impact on your users?
2) What is the impact on the business?
3) What is the impact on other dependent systems?
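The "big X" exercise can also be run against a dependency map in code. What follows is only a minimal sketch under assumptions: the component names (`website`, `load_balancer`, `web_1`, and so on) and the `REQUIRES` map are hypothetical, and the model is deliberately crude (a component is up if at least one member of each group it needs is up, and the graph has no cycles).

```python
# Hypothetical system map (not from any real deployment). Each component
# lists groups of alternatives; it stays up only if at least one member
# of every group is up. The map is assumed to contain no cycles.
REQUIRES = {
    "website": [{"dns"}, {"load_balancer"}],
    "load_balancer": [{"web_1", "web_2"}],  # redundant pair of web servers
    "web_1": [{"database"}],
    "web_2": [{"database"}],
    "dns": [],
    "database": [],
}


def is_up(component, failed, requires):
    """Return True if the component and everything it needs are available."""
    if component in failed:
        return False
    return all(
        any(is_up(alt, failed, requires) for alt in group)
        for group in requires[component]
    )


def impact_of(failed_component, requires):
    """List the other components that go down when one component fails."""
    failed = {failed_component}
    return sorted(
        name
        for name in requires
        if name != failed_component and not is_up(name, failed, requires)
    )


# Draw the X on each component in turn and report what else goes down.
for component in sorted(REQUIRES):
    affected = impact_of(component, REQUIRES)
    print(f"X {component}: takes down {affected or 'nothing else'}")
```

In this made-up map, losing `web_1` takes down nothing else because `web_2` covers for it, while losing `database` takes down everything except `dns`. That makes the database the single point of failure to worry about first.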

a) Design for failure
b) Code for failure
c) Design for 100% uptime for the end users
d) Design for 99.999% uptime for the business (don’t try to hit 100%; it’s not needed)
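To put the 99.999% figure in perspective, it helps to turn it into a downtime budget. The sketch below is simple arithmetic, assuming a 365-day year; the function name `allowed_downtime_minutes` is just an illustrative choice.

```python
# Assumption: a 365-day year, ignoring leap years.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent):
    """Return the yearly downtime budget, in minutes, for a given uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.2f} minutes of downtime per year")
```

At 99.999% the budget works out to roughly 5.26 minutes per year, which is why every component with an X on it needs a planned answer rather than a manual recovery.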

 

2 Comments

* Yahoo! User commented on 10/11/2006 10:53 pm
I like that one: Build for Change + Build for Failure
* Yahoo! User commented on 10/11/2006 12:53 pm
One common mantra that I repeat often is “Build for Change”. I think “Build for failure” and “Build for change” are two of the most important aspects of large-scale systems design. I can’t tell you the number of times I’ve seen these two violated and had to pay dearly for the consequences.

Tony Tam

Senior Principal Architect @ Splunk, Founder of ImpactfulEngineer.org & SFBadminton.org