
Tales from Tech Support 02: The Server Room is Nuts

I’ve said previously that things come in threes. This particular month in 2020 gave us three quite severe issues in the server room at work.

One – as the tide comes rolling in

We had some heavy rain over a weekend in early October. I spent most of it in front of the fireplace until it was time for work on Monday morning.

Before I’d even made it through the office, I was alerted to a flood in our server room. I rushed down there to find an absolute mess. Unfortunately we’re strapped for space, so we also use the server room to store spare equipment, some peripherals and consumables. We lost quite a lot to water damage; however, most of it was old, so it didn’t hold much value. Some switches got soaked, as did my own personal DrayTek, which I was going to use as a secondary VPN during lockdown in case of emergencies (ours wasn’t great at the time).

You can see in the following image a “tide line” around the walls of the room: we had about an inch of water in there that slowly soaked away. That’s a phenomenal amount of water, given the size of the room and the sizeable gaps under both doors that let water escape into neighbouring rooms.

Somehow our servers didn’t get too wet. Humidity stayed above 90% for quite a while, though, and we’ve since had several hard drive failures that were likely related.

After three days of drying, ripping up carpet and sorting through our stuff, we could finally get back to the normal slog. However, we learned that the room’s fitted air-con unit has no cut-off when drying the air: it’s either on or off, and only when set manually. We can’t set a target humidity. In a server room you typically want to hover at about 50%; too high and the moisture in the air corrodes internals, too low and static electricity discharges more easily in the dry air. Before we ripped out the carpet and other detritus, and with no flood water in the room, we hovered at about 50-70%, but since then we’ve dropped as low as 30%.
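If we ever get proper sensors in there, the alerting side needn’t be complicated. Below is a minimal Python sketch of checking relative-humidity readings against a target band. The 40-60% band, the function names and the sample readings are my own assumptions for illustration, not anything our air-con or kit actually provides, and real guidance for server rooms is more nuanced than a single band.

```python
# Minimal humidity-band check. The 40-60% band is an ASSUMED target around
# the ~50% relative humidity mentioned above; tune it for your own room.
LOW_RH = 40.0   # below this, static discharge becomes more likely
HIGH_RH = 60.0  # above this, moisture-related corrosion becomes more likely


def classify_humidity(rh_percent: float, low: float = LOW_RH, high: float = HIGH_RH) -> str:
    """Return "too dry", "too humid" or "ok" for a relative-humidity reading."""
    if not 0.0 <= rh_percent <= 100.0:
        raise ValueError(f"relative humidity out of range: {rh_percent}")
    if rh_percent < low:
        return "too dry"
    if rh_percent > high:
        return "too humid"
    return "ok"


def out_of_band(readings):
    """Yield (label, rh, status) for each (label, rh) reading outside the band."""
    for label, rh in readings:
        status = classify_humidity(rh)
        if status != "ok":
            yield label, rh, status


if __name__ == "__main__":
    # Hypothetical readings loosely based on the figures in this post.
    sample = [
        ("before the flood", 65.0),
        ("just after the flood", 92.0),
        ("after drying out", 30.0),
        ("ideal", 50.0),
    ]
    for label, rh, status in out_of_band(sample):
        print(f"{label}: {rh:.0f}% RH -> {status}")
```

In practice something like this would sit behind whatever sensor we end up with, polling on a schedule and sending an email or alert whenever a reading drifts out of band.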

We need proper climate control and monitoring. And for the leak to be fixed… It’s been an issue for a while and comes down to the design of the roof: water does drain away, but if there’s too much too quickly, or the drainage is blocked (as it was this time), the water overflows and somehow finds its way to the ground through the second path of least resistance… which just so happens to pass right through the server room, about two feet in front of the cabinets.

Two – splashback

About two weeks after the first incident we had more heavy rain, and more water in the server room. Luckily there was less of it, and because the room was empty and carpetless, it dried out within a day. Unfortunately, it came in through a slightly different part of the ceiling, about half a foot closer to the server rack.

I’d love to get up onto the roof to try and figure out where this water is getting in. I’m told the fix is “new roof”, but I wonder whether it’s feasible to give the water an escape route that doesn’t pass through one of the most expensive rooms on the campus. Not a fix, certainly, but a workaround until we can move the servers (which we should be doing before the end of this year).

Three – this is nuts

Finally (at least, I hope) there was this, which happened about a week after flood #2:


We spotted him on the camera we’d set up after the first flood. We raced down there and eventually managed to coax him out of his temporary home, directly under the server rack, before he could chew through anything. More squirrels have since found their way into the building, but none has yet braved the server room. For fear of drowning, I suspect.

Although management doesn’t particularly care about incident management and response reports, I find the whole process fascinating, so I wrote up an incident report for the squirrel invasion. It was entertaining to type out, to say the least. So many puns.