Tuesday, July 9, 2013

T-SQL Tuesday #44 - Second Chances (or how not to be a Horse's Rear End)


"Mr Wazowski, what you lack simply cannot be taught. You're just not scary." - Dean Hardscrabble, 'Monsters University'
After a year of always realizing on the second Wednesday that T-SQL Tuesday was yesterday, I remembered this month and went searching early. This month I found Bradley Ball (blog|twitter) asking us to write about "Second Chances" - time to reflect on something we could have done better if given a second chance, or on a time we royally screwed up and wished we could have a "do over".

Since my "infrequently updated blog" has been idle for over 6 months now, it is probably due for an update.

I've been working in IT for over 25 years. Screwing up, particularly in production, can be pretty scary. My list of screw ups in that time is probably long and glorious. I'd like to say that I've never repeated the same screw up, but I'm sure that I have.

Making a mistake (unless your "mistake" is something criminal or your sense of humor when developing test data ends up in production) probably isn't going to be the defining point of your career - mistakes are things you can learn and grow from.

Looking back at a quarter of a century of assorted screw ups, I remember such things as:

  • The time I truncated a table in production - except it was the wrong table (all part of being an accidental DBA I suppose)
  • The time I defined the same physical disk on the SAN to two servers and wondered why I was having daisy-chained DB failures
  • The time I realized that my backups really weren't quite as good as I thought they were after the aforementioned disk issue
There's more, I'm sure, but we'll leave it there.

Today's post is not just about learning from my mistakes; it's also about how the company I work for celebrates those mistakes.

Before I go into detail on that, I should note that I work for a small company. We all seem to get along very well together. Management truly has an open door policy - walk into the office of the President, CEO or anyone else and air concerns or just chat. Every two weeks we have a brief all hands conference call where successes are celebrated, updates on new business are given and at the end of the meeting, failures are also celebrated.

When you fail, you can expect to be the "proud" recipient of an "HA" award.

What is an HA you ask? HA stands for "Horse's A**" (since we're dealing with a blog post probably read in places of employment, I'll employ a censor to sanitize my post :-) ).

I've been with this company for over 5 years. I had done pretty well for the first 3 years and managed to avoid earning an HA of my very own.

I was doing some testing on one of the development servers. I forget what I was doing at the time - but it was something that required me to stop and restart the SQL Server service (you can see where this is going already, right?).

During the testing of whatever it was that I was doing, a call came in where I needed to jump onto the production server to check something out. I completed that and forgot to log out of the server.

I went back to where I had left off in my testing and stopped and restarted the SQL Server service. Within a few seconds, emails started coming in about a down production SQL Server as the monitoring system started screaming, and in short order the phone started ringing as users called in wondering where the ERP system we support went.

Stopping the database caused a couple of other failures downstream: other services that relied on the database (and maintained an active connection to it) stopped as well because the database wasn't there anymore.

It wasn't a huge deal to correct it and systems were back up in a matter of minutes, but there was lots to do afterwards, since every outage requires a Root Cause Analysis to be presented to the customer etc.

I mentioned learning from your mistakes earlier. This particular outage taught me to always take the time to be double sure when doing things like stopping services. Now I also try to reconfigure my desktop on production servers (is the Start Button on the left of the screen? Must be a production server).
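The "be double sure" habit can also be sketched in script form. What follows is only an illustration under assumptions, not anything from the incident itself: the host name `erp-prod-sql01` and the functions `is_production` and `confirm_action` are made up for the example. The idea is that a disruptive action goes ahead immediately on a non-production machine, but on a production machine you have to type the host name back first.

```python
import socket

# Hypothetical list of production host names (an assumption for this
# sketch); replace with the names used in your own environment.
PRODUCTION_HOSTS = {"erp-prod-sql01"}


def is_production(hostname, production_hosts=PRODUCTION_HOSTS):
    """Return True if the short host name is in the production list.

    The comparison ignores case and any domain suffix.
    """
    short_name = hostname.split(".")[0].lower()
    return short_name in {h.lower() for h in production_hosts}


def confirm_action(action, hostname=None, ask=input):
    """Return True only if it is safe to go ahead with a disruptive action.

    On a non-production host the action is allowed straight away. On a
    production host the operator must type the host name back
    (case-insensitive) before the action is allowed.
    """
    hostname = hostname or socket.gethostname()
    if not is_production(hostname):
        return True
    typed = ask(f"{hostname} is PRODUCTION. Type the host name to {action}: ")
    return typed.strip().lower() == hostname.lower()
```

A wrapper script would call `confirm_action("restart the SQL Server service")` and only restart the service when it returns `True`. Typing the host name, rather than just "y", forces a second look at which server the session is actually on.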

Of course in the grand scheme of things, as far as second chances go, this is a minor thing. The main reason I wanted to write about this particular incident was to "show off" the little certificate that you can earn at my place of employment for those times when you wish you had a "Second Chance".

(For the record, these are awarded humorously and without ill intent, just in case you're wondering - and the transgression doesn't necessarily have to be huge to earn one, either. As Forrest Gump says, "It Happens" and when it happens to you here, you're probably getting one of these!).

Until next time - may you never find yourself facing the behind of the horse!