Category: Technology

Anything tech-based – code and scripts, hardware and software, work-related or not. If it’s got electricity in it, it’ll probably be technical.

Implement but never move

I really appreciate well-written implementation guides for server-client software, but what really gets me excited is seeing migration guides for when you need to decommission that legacy OS and move on to something supported and current. Even the bad migration guides are great to have, but some that I’ve come across are works of art – clear and concise numbered-list instructions, incorrect assumptions that busy sysadmins make during the migration highlighted and then corrected, screenshots where required, mid-migration tests to ensure things are going as smoothly as they seem… ahh, dreamy.

One of my personal pet peeves, however, is purchased server-based software products that don’t come with any migration guidance or instructions.

I’m spending a lot of time at the moment moving niche (read: there are only a couple of options and they all suck) and poorly made software from older server OSes to newer ones, and whilst some of them are fine and can be figured out right away, most of them have… issues. Typically you can just install $dumpsterFire fresh on the new server, dump the existing database and throw it over the fence to the new server (yes, some of this terrible software rEqUiReS tHe DaTaBaSe Is oN tHe SaMe SeRvEr), and maybe update some config files to point the new install at the existing database. One CNAME change later for the clients and you’re golden. Simple stuff.
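
For that simple path, the whole move is scriptable. Here’s a rough PowerShell sketch – not taken from any product’s documentation, assuming a SQL Server backend and the SqlServer and DnsServer modules, with every server, database, service and zone name made up:

# Every name below is hypothetical. Run from an admin workstation with the
# SqlServer and DnsServer modules available, and adjust to whatever $dumpsterFire actually uses.

# 1. Stop the application service on the old box so nothing writes mid-copy.
Invoke-Command -ComputerName 'OLDSERVER' -ScriptBlock { Stop-Service -Name 'DumpsterFireSvc' }

# 2. Back up the database on the old server, copy it across, restore it on the new one.
Invoke-Sqlcmd -ServerInstance 'OLDSERVER' -Query "BACKUP DATABASE [DumpsterFire] TO DISK = N'D:\Backups\DumpsterFire.bak'"
Copy-Item -Path '\\OLDSERVER\D$\Backups\DumpsterFire.bak' -Destination '\\NEWSERVER\D$\Backups\'
Invoke-Sqlcmd -ServerInstance 'NEWSERVER' -Query "RESTORE DATABASE [DumpsterFire] FROM DISK = N'D:\Backups\DumpsterFire.bak'"

# 3. Install the software fresh on NEWSERVER, point its config at the restored database,
#    then swing the CNAME the clients use so nobody has to reconfigure anything.
Remove-DnsServerResourceRecord -ComputerName 'DC01' -ZoneName 'company.tld' -RRType CName -Name 'dumpsterfire' -Force
Add-DnsServerResourceRecordCName -ComputerName 'DC01' -ZoneName 'company.tld' -Name 'dumpsterfire' -HostNameAlias 'newserver.company.tld'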

Sometimes, however, you need to perform some forbidden magical incantations, and the kick in the teeth in many cases is that nobody in $org will write down what they are publicly. So you gotta pick the phone up and get some overworked and underpaid $tech at $org to walk you through the process, shooting down the constant stream of bugs and errors that occur along the way due to the shoddy quality of $dumpsterFire with bullets of solidified experience (which will no doubt be lost when $tech has had enough and leaves for greener pastures).

Or, even worse than that… the $org requires payment for a migration because migrations aren’t considered “support” and are “optional”. Yes, it is support. And no, it isn’t optional.

Dear organisations that explicitly hide their migration guides to force an already-paying customer to pay you yet again to migrate your horrible software from one server to another (immoral), that INSIST you “must TeamViewer in to do this” (in business hours only!?), that claim there’s no possible way anyone else can do it (false and stupid), that require DA/root accounts (you don’t), that have to be installed on a Domain Controller (I’m crying), that MUST have unmitigated 24/7 access via an unlicensed TeamViewer install (oh no you won’t)… There’s even one software solution here which “requires” a physical server. In 2023. The software won’t work in a virtual machine. (Oh, wait, spoiler: it works fine in a VM and has been working fine for over a decade.)

To those organisations I say: fuck you. Do better.

Because I’m writing this all down: I’m recording my screen when you connect to solve unhelpful errors in your hidden log files, I’m fixing your stupid permission requirements, and I’m immediately uninstalling any additional or third-party crap that isn’t a business requirement.

And I’m publishing those guides online for free.

Duct Tape

Back when I did end user support as my primary job I’d often ask myself “if there were no helpdesk tickets, what would I work on right now?”

The list I’d come up with was generally fairly short and changed each time I found myself pondering it, typically because the needs of the environment changed quite frequently, but it would also contain a few regulars – patch this, upgrade that, move that thing from there to there. Stuff that wouldn’t have an immediately beneficial impact but should probably happen. These are all good things to do and I encourage them to be part of the daily workload of a team.

The dynamic entries in the list were always something to do with making something better – getting rid of an annoyance or frustration. These are the things that actually do have immediate ROI, and I would argue they should have time spent on them even during periods of high ticket load, but they often find themselves being ignored because… well, it ain’t broken. It’s just not great.

The issue is, as hinted at, that this list changes often. Sure, you can keep notes, maintain an ideas board or submit a ticket, but when you do finally get an hour to look at the list, what’s on it doesn’t seem that critical.

I just read an older post on rachelbythebay.com that summarised this and explained what to do about it in five words:

Look for the duct tape

Read it (and whilst you’re there, if you don’t already consume that content, take a look at some of the other posts – very valuable information and opinions contained within).

Essentially, find the things people have whipped up a quick workaround for and are using and fighting with right now, whether that’s within your team or another. Spend some time making that thing better, or resolve the problem at the source if possible. You’ll not only make that person or team happy, you’ll also actively solve a problem that exists now and probably directly impacts success, however that’s measured.

Receiving unsolicited blogging advice

TL;DR: me post more!!1!

I recently read a blog post about a blog post and came to realise that I too struggle to post stuff on this very blog. I am conscious that I suffer from the curse of “I must make everything interesting!” but… maybe I don’t have to suffer any more. Maybe I don’t have to care. After all, if you don’t like it, you don’t have to read it.

I started this corner of the web to document my renovation and tech stuff for me and myself only. And Ben. (Hi Ben) But somewhere along the way it started to be about what other people find interesting, and… nothing really felt big or important or interesting enough. Hence the lack of posts recently.

So, a reset. Time to post more. The posts may not fit the predetermined categories, and they may be completely uninteresting, but it turns out that not only is that okay, it’s what I intended originally anyway. They’re interesting to me, and besides, I suspect I read these posts more than anyone else does. Maybe, just maybe, someone else will find something a little bit interesting or useful too? That’ll be a positive bonus.

The algorithms and view counts hold no sway. I forbid thee!

Don’t Trust Copy & Paste – even with JavaScript disabled

I’ve seen a few articles recently advising readers not to blindly copy and paste code from a website straight into their CLI because, with a small bit of JavaScript, the site can overwrite the clipboard.

This is something that has been known about for a while and for some reason has seemingly resurfaced recently. The advice is to always paste into a text editor first to ensure that what you think you have copied is actually what you have copied. However, I have seen comments suggesting you should disable JavaScript unless you need it, in order to prevent this from occurring.

As awkward as this “disabling JavaScript” advice is on the modern web (and it does require some technical knowledge to enable just what you need), I agree, and in fact I disable JavaScript by default. For this particular issue, however, it doesn’t help: you can achieve essentially the same thing without any JavaScript.

The stuff below isn’t new. In fact, the linked article points to a Reddit thread where someone outlines this exact problem. But I feel it can’t hurt to reiterate. And explore!

So, for the obligatory warning: don’t paste anything on this page into a PowerShell window. Don’t paste it into anything but a text editor. The examples below shouldn’t be harmful but… look, just don’t risk it, okay?

Oh, and disable JavaScript if you want.

Malicious string – copy and paste the example below into a text editor (NOT a PowerShell window):

echo 'hello,
copy c:\inetpub\www\config.php c:\inetpub\www\config.php.txt -whatif
clear
echo 'hello world!'

Let’s explore how we got here and what we can do about it.


Tales from Tech Support 02: The Server Room is Nuts

I’ve said previously that things come in threes. This particular month in 2020 gave us three quite severe issues in the server room at work.

One – as the tide comes rolling in

We had some heavy rain in early October over a weekend. I spent the majority of that time in front of the fireplace until work on Monday morning.

Before I’d even made it through the office I was alerted to a flood in our server room. I rushed down there to find an absolute mess. We are unfortunately strapped for space and also use the server room to store our spare equipment and some peripherals and consumables. We lost quite a lot of stuff to water damage; however, most of it was old and didn’t hold much value. Some switches got soaked, as did my own personal DrayTek that I was going to use as a secondary VPN during lockdown in case of emergencies (ours wasn’t great at the time).

You can see in the following image a “tide line” around the walls of the room – we had about an inch of water in there that slowly soaked away. That’s a phenomenal amount of water given the size of the room, and the fact that there are sizeable gaps under the two doors that allow water to escape to neighbouring rooms.

Somehow our servers didn’t get too wet. Humidity was at 90%+ for quite a while though, and we have experienced several hard drive failures since which were likely related.

After three days of drying, ripping up carpet and sorting through our stuff we could finally get back to the normal slog. However, we learned that the fitted air-con unit in the room doesn’t have a cut-off for drying the air – it’s either on or off, and only when manually set, so we can’t set a target humidity. In a server room you typically want to hover at about 50% – too high and the moisture in the air corrodes internals, too low and static electricity can more easily discharge in the dry air. Before ripping out the carpet and other detritus we’d hover at about 50-70% without any flood water in the room, but since then we’ve gone as low as 30%.

We need proper climate control and monitoring. And for the leak to be fixed… It has been an issue for a while and comes down to the design of the roof – water does drain away, but if there’s too much too quickly, or the drainage is blocked (as was the case this time), the water overflows and somehow finds its way to the ground via the second path of least resistance… which just so happens to pass right through the server room, about two feet in front of the cabinets.

Two – splashback

About two weeks after the first incident we had more heavy rain, and more water in the server room. Luckily there wasn’t as much water this time, and because the room was now empty and carpetless it dried out within a day. Unfortunately it came in through a slightly different part of the ceiling, about half a foot closer to the server rack.

I’d love to get up onto the roof to try and figure out where this water is getting in. I’m told the fix is “new roof”, but I wonder whether it’s feasible to give the water an escape path that doesn’t pass through one of the most expensive rooms on the campus. Not a fix, certainly, but a workaround until we can move the servers (which should happen before the end of this year).

Three – this is nuts

Finally (at least I hope) is this, which happened about a week after flood #2:

A blurry camera still of the intruder

We saw it on the camera we set up after the first flood, raced down there, and eventually managed to coax it out from its temporary home directly under the server rack before it could eat through anything. More squirrels have since found their way into the building, but none have yet braved the server room. For fear of drowning, I suspect.

Although management doesn’t particularly care about incident management and response reports, I find the whole process fascinating, so I wrote up an incident report for the squirrel invasion. It was, to say the least, quite entertaining to type out. So many puns.

Error b8bb2b3e on HP Office Jet Pro 8710

TL;DR: Try disabling “SNMP Status” for the port on the Windows print server in Print Management. No guarantee this will work everywhere but it appears to have worked for me. Give it a go, let me know!

Turn the SNMP Status Enabled box off
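
If you’d rather script the change than click through the Print Management console, something like this sketch should flip the same setting from PowerShell on the print server – the port name is a placeholder, so list yours first:

# List the standard TCP/IP printer ports and their current SNMP setting.
Get-CimInstance -ClassName Win32_TCPIPPrinterPort | Select-Object Name, HostAddress, SNMPEnabled

# Disable SNMP status monitoring on the port in question (the name below is an example).
$port = Get-CimInstance -ClassName Win32_TCPIPPrinterPort -Filter "Name='IP_192.168.1.50'"
$port.SNMPEnabled = $false
Set-CimInstance -InputObject $port

# Restart the spooler so the change is picked up straight away.
Restart-Service -Name Spooler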

Printer says… what, exactly?

Feel free to skip the rest of this post. It’s just the sequence of events that took me to the supposed conclusion/solution.

I arrived at a customer site to find a stack of issues, but one in particular caused some confusion to my smooth-brained self. An HP Office Jet Pro 8710 printer, which had worked up until the previous Wednesday, had a black screen with an error on it stating:

There is a problem with the printer. Turn the printer off, then on.

I of course tried a reboot. The printer switched on, booted up and appeared to be fine. But after 5 to 10 seconds it would make a clicking noise, the screen would flash blue, then go black with the same error. I hate printers.

It looked like there was some white text on the blue screen, so I recorded it with my phone in slow motion and picked up an error code: b8bb2b3e.

Googling this took me to one relevant result, a post on the HP forum with no replies. Great. I hate printers (I may say this several times) and I hate an error code with no official documentation anywhere on the internet. So, what is one to do in such a situation?

Well, poke about, push buttons, and hope you figure out a pattern, obviously!

My immediate thought was something PrintNightmare-related; however, as the post on the HP forum was made back in April, I moved that away from the forefront of my mind for now and started looking elsewhere. As luck would have it, I noticed that the error occurred just as the Wireless light finished flashing (because yes, this printer is connected to a domain network wirelessly. Literally the definition of evil.)

I plugged in a network cable that I pinched from a nearby desktop and the printer stopped crashing. Great, progress! This building had a new wireless network installed a couple of months ago so I began poking at that, but nothing had changed in the last couple of weeks. Instead, assuming the problem was wireless-related, I moved across to the print server to change the ports over to the new IP address the device had picked up from the wired network and… pop! Blue screen with b8bb2b3e again. My head flipped back around to PrintNightmare patch issues or something up with the server.

I didn’t really have a direction of travel from this point, as Event Viewer had nothing of interest in it as far as I could see, but I decided to send the printer a test print just before it connected. I’ve seen issues before where a printer will process an existing queue of work and then crash out or error, indicating that the core technology (network, server, printer) is functioning but some feature is causing a problem – and that’s exactly what I saw here. The device connected to the wired network, sent out the test print page, then immediately crashed out again. Historically I’ve fixed this (rare) occurrence by removing and re-adding the printer to the print server, but I decided to poke around some more and try to nail down which setting was causing this (if, indeed, any setting at all).

Luckily it didn’t take too many attempts to switch off SNMP Status Enabled, reboot the printer, and not see any crashes. The printer is still working a few hours later (back on the wireless… grr. We’ll run some cable into this particular office soon), and I have since checked the Windows Update installation dates – the printer stopped working right after updates were installed on the server. The server was up to date prior to this month’s (July 2021) patch release, so it does look like something in the most recent set of updates caused this. And yes, I have ensured the server is not vulnerable to PrintNightmare by checking for those registry keys, so it’s not that.
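
For the curious, that check is just a couple of registry values from Microsoft’s CVE-2021-34527 guidance. Roughly speaking (a sketch – values of 0, or the key not existing at all, are what you want to see):

# Check the Point and Print values referenced in Microsoft's CVE-2021-34527 guidance.
$key = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\Printers\PointAndPrint'
if (Test-Path $key) {
    Get-ItemProperty -Path $key |
        Select-Object NoWarningNoElevationOnInstall, UpdatePromptSettings
} else {
    'PointAndPrint key not present - the risky Point and Print relaxations are not configured.'
}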

If I uncover the exact cause I will update this post.

Kaseya Says Yes

I just read the Truesec analysis of the Kaseya VSA 0-day that hit the news earlier in the month. I love reading articles like this, but this one in particular I had to highlight.

The authentication… “bypass”… utilised as a first step: D’oh! How did something like that even get into production? The linked article has more details, but essentially, if all authentication checks fail (when querying this particular file, not generally), instead of saying “Nope, you are not authenticated!” it says “Oh, you didn’t supply a password we can verify? OK, let’s give you authenticated status anyway 👍”.
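
To be clear, the below is not Kaseya’s actual code – just a minimal PowerShell sketch of the difference between failing open and failing closed when no check succeeds:

# Illustrative only. The same credential check, written to fail open vs fail closed.
function Test-CredentialFailOpen {
    param([string]$Supplied, [string]$Expected)
    if ($Supplied -and $Supplied -ceq $Expected) { return $true }
    # ...any other checks would go here...
    return $true    # nothing verified the caller, but we authenticate them anyway - the bug
}

function Test-CredentialFailClosed {
    param([string]$Supplied, [string]$Expected)
    if ($Supplied -and $Supplied -ceq $Expected) { return $true }
    return $false   # default deny: anything we cannot positively verify is rejected
}

Test-CredentialFailOpen   -Supplied '' -Expected 'hunter2'   # True  - anyone gets in
Test-CredentialFailClosed -Supplied '' -Expected 'hunter2'   # False - as it should be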

Logic failures like this generally don’t happen in the ideal world because they’re blindingly obvious, so allow me to speculate for the rest of this paragraph. I can only assume that a developer temporarily set this up to diagnose a bug or test a feature and simply forgot to flip it back to “fail by default”. This is why peer review is important, though the rush to get things out to production works against this. It’s too easy to miss this kind of thing in the modern world. It shouldn’t be, but it is. There should be no blame here on any individual – I suspect a process/procedure needs to be looked at, or a team needs to be better resourced.

Visitor Information Disclosure in wp-statistics

Just noticed this, and a quick Google shows it has already been picked up elsewhere, so this isn’t new: the wp-statistics plugin (v13.0.8 for sure, but likely other versions too) seems to be logging information into a “wp-statistics.log” file in the root directory of the site it is installed on. If a site has the plugin enabled, you can therefore access that file – and in some cases read the IP addresses of the site’s visitors – by visiting domain.tld/wp-statistics.log.

You can block external access to it in the .htaccess file via:

<Files "wp-statistics.log">  
  Require all denied
</Files>
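
Once the rule is in place you can sanity-check it from outside – a quick sketch, with domain.tld standing in for your own site; a 403 (or 404) is what you want to see:

# Confirm the log file is no longer publicly readable (swap in your own domain).
try {
    $response = Invoke-WebRequest -Uri 'https://domain.tld/wp-statistics.log' -Method Head -UseBasicParsing
    "Still exposed: HTTP $($response.StatusCode)"
} catch {
    "Blocked or missing: $($_.Exception.Message)"   # a 403 here means the .htaccess rule is doing its job
}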

I’ve logged an issue on their GitHub page; hopefully they fix this soon. Update 2021-07-22: a fix will be pushed out this weekend, according to the latest update on the issue.

A quick Google dork will turn up a fair number of affected sites, including some… potentially embarrassing ones.

GLPI – Ripe for the Injecting

I’ll move on from GLPI eventually and start working on some interesting technical stuff, but today is not that day.

We ditched GLPI after we got hit by an accidental SQLi from HaveIBeenPwned – in short, version 9.4.5 is vulnerable to an SQL injection flaw. You can exploit it simply by sending an email (say, to helpdesk@company.tld); once the email gets automatically turned into a ticket and assigned, the SQL is executed. This affected us because the obscenely simple execution string was included in the header of the HaveIBeenPwned email notification.

I’ve just been poking around GLPI again (we have kept it around for non-end-user stuff, isolated and kept out of reach) and noticed a “telemetry” scheduled task in the list of Automatic Actions, which got me curious.

GLPI have decided to publish some of their telemetry data, which is nice of them. But it shows that there’s still a significant number of users running 9.4.5 and older.

Of the installs that reported telemetry in the last year (and only installs on 9.2 and above do this), 14,313 are on a version at or below 9.4.5, whilst 26,985 are on 9.4.6 and above. Over 34% of GLPI installs are potentially* vulnerable to this painfully simple exploit, but over 12% of installs absolutely are still vulnerable, as they’re on 9.4.5 exactly.

*Potentially, but I suspect only 9.4.5 is vulnerable – they fixed it by accident in 9.4.6 here which looks like a response to an issue that appeared in 9.4.5.
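
For anyone checking my maths, the 34% figure comes straight from those two telemetry counts:

# Share of telemetry-reporting installs on 9.4.5 or older.
$atOrBelow = 14313
$above     = 26985
[math]::Round(100 * $atOrBelow / ($atOrBelow + $above), 1)   # 34.7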

We learned a lesson with the GLPI issue – keep your software up to date. Though to be fair to us, it was up to date according to their website at the time. There was a newer version available (9.4.6) but that wasn’t advertised anywhere.

I hope these out of date installs get updated. We know there’s a lot of malicious activity out there, but at the same time… accidents can happen.

Tales From Tech Support 01: Percussive Maintenance

If you’ve worked in IT for more than a year you probably have some crazy tales to tell. I certainly do – with nearly 15 years in the field I have seen more insanity and hilarity than I could ever remember. So I thought to myself: why not write it all down somewhere?


This first tale comes from back in my early days.

As a PFY, I was trusted with little more than basic desktop repairs and printer toner replacements – a fairly common slice of life for many IT bods. I was relatively fresh-faced, with a few scars but nothing major. Eager to learn and eager to please, I was often the first to raise my hand and take on any challenging job, despite the vast gaps in my knowledge. Our team was small – three techs (two general helpdesk end-user support PFYs, one on mobile device repairs) plus one very isolationist “Network Guy” (who would now be called a sysadmin) and one “Database Guy” (whose only qualification was knowing what “SQL” stood for).

It came as quite a surprise when, early on a sunny Monday, we were told that the Network Guy was leaving. And he wasn’t being replaced.

It quickly became apparent that the entirety of his duties was to fall on me and PFY#2, who had been working at this place for a bit less time than me. Network Guy’s last day rolled around without much issue, nor much communication from him – in fact, at one point I asked for his help with a network issue and he told me “I don’t care, I’m leaving!” So it came as a bit of a surprise (and an equal amount of relief) when he called me and PFY#2 up to his office to give us his handover document.

His handover document consisted of a single sheet of A4 with handwritten notes about a few things barely qualifying as useful. A few IPs and other miscellaneous details about servers and switches, the odd issue he knew about but hadn’t fixed, and maybe a password or two.

We received the document at the door to his office (we weren’t tolerable enough to go inside on this occasion?) and were quickly shooed off back to our regular helpdesk support duties.

That evening, he left and we never saw him again.

The following weeks and months were absolutely insane, and I can’t recall much about what happened during this time. PFY#2 and I managed to keep on top of most of the helpdesk support calls and make a start on untangling the network. We quickly found that most switches had essentially been unboxed and plugged in without any config changes, servers were unpatched and had uptime into the hundreds of days (you could tell when the last power cut had been by looking at the uptime), and we were getting very close to the limit on resources – maxed CPU and/or RAM, HDDs filling up. Group Policy was a mess, roaming profiles were reaching into the tens of gigabytes with nothing preventing their growth… the whole thing was a mess.

To top it off, out the back of Network Guy’s office was another small closet containing almost all the servers we had. Neither of us had ever been back there, and when we finally did go in we found dust, cobwebs and equipment that seemed to be switched on, though we had absolutely no idea what any of it did. Helpfully, the single sheet of A4 told us some server names and serial numbers.

We were fuelled by Red Bull, and the long (long) days began to blur into one massive learning experience. To this day I have never learned so much so quickly as I did back then, as we fought to keep the place running, keep the users happy(ish) and learn as much as we could.

There was one event, though, that utterly stumped us.

Cheerily and awesomely smashing the helpdesk as we were, we suddenly had calls coming in about email being offline. After a quick check we realised that, yep, email was down. We couldn’t RDP into the Exchange (2003) server either – something was clearly wrong. Off to the dusty old Network Guy’s office we go.

We walk in, grab the single 4:3 CRT monitor in the room, stretch the cables across the room from one of the power extension cables with a spare socket to one of the nearby tables, and plug the crusty old VGA cable into the back of the absolute beast of an Exchange server. I mean, this thing was huge. An old tower, black, solid steel everything. No idea why it was up in this first-floor room when all the other servers were on the ground floor, but… whatever.

Flicking the monitor on, we quickly saw everyone’s favourite screen. Yep, it’s blue, and it signifies death. The BSOD.

We panicked a little, probably downed another Red Bull each, then got to work trying to bring this thing back online.

Remember: we were literally flying by the seat of our pants here, and had been for four months at this point. We had no idea what we were doing.

First things first – switch it off and back on again. We held the power button down, heard a massive CLUNK, the fans spun down and the screen went black. Take a breath. Switch it on again, wait for the BIOS, wait for Windows to start booting, wait some more… BSOD.

Crap.

We try again, switch it off and back on. This time, we hear a horrible grinding noise as the machine spins up. We get to Windows trying to boot and everything freezes – not even a BSOD. Off it goes once more, switch it on, grinding noise, BIOS doesn’t even finish loading.

Double crap.

Backups! Backups? There are backups, right? One of our jobs is to take the tapes out of the drive and swap them with the next numbered batch in the fire safe, surely we could restore this? But… Network Guy did the backups. Network Guy didn’t elect to write anything down about the backups! We don’t know anything about them! An oversight of the highest order!

We know the boss has the phone number of Network Guy, but we need to try fixing this ourselves first. We don’t want him shouting down the phone at us like the one and only time we called him for help before this…

More Red Bull, more diagnosing. On the odd occasion we can get Windows to try booting, and sometimes we can even get to the login screen using Safe Mode, but no matter how quick we are we can never get logged in, and even this doesn’t last forever. Eventually the server just stops trying to load Windows and we’re presented with some error about not finding an HDD. The grinding at this point is still going on, and we’re faced with our fear that the grinding isn’t a fan, but the single HDD that all of our email is stored on.

There’s nothing else for it. We’ve gotta call up Network Guy and ask his advice. Neither of us want to do this though – he wasn’t helpful to us when he worked here, and especially not when he was leaving.

Is there nothing we can do?

I remember the moment – we were stood either side of this huge hulking great server with no more options (that we are aware of at any rate). Our eyes meet, and without saying a word we both think the same thing at the same time.

PFY#2: “Shall we?”

Me: “I dunno… I mean it’s not working so maybe?”

PFY#2: “I think we should.”

Me: “Okay. Let’s do it.”

PFY#2: “Go on then, you can try first.”

Me: “No way, you do it.”

We look down at this ancient black server, lined with solid steel frame, sides and front, touched with the occasional bit of cheap plastic and the odd faded sticker.

I sigh.

PFY#2 raises his foot, and kicks the bastard right in the side, smack bang in the middle.

The grinding noise changes in pitch audibly. It’s still there, still buzzing away in that audio range that you think you can put up with but slowly sends you insane without you realising it. PFY#2 reaches down and holds the power button in to switch it off. He switches it back on.

BIOS loads up, boots fully, screen goes black.

Windows loads. And loads. We stare. And it continues to load. The login window appears after what must have been at least 25 minutes. We’re in shock. No BSOD. No lock up. Buzzing? Yeah that’s still there, but the server has booted. What the f-?

We rush to a nearby office and get the first user to open up Outlook. It connects. Emails from the upstream start flooding into their mailbox.

We check our own, same thing – it’s working.

We’re buzzing. Our blood is filled with adrenaline, caffeine, sugar, and whatever the hell else they put in Red Bull, but it’s also filled with joy. We fixed a server by kicking it.

After spreading the word that email is back, we get right back to our helpdesk calls, of which dozens have appeared since the Exchange issue surfaced.

Not long after this we did eventually employ a sysadmin, whom I work with to this day, but this Exchange server didn’t get replaced immediately. I can’t remember exactly when it was retired, but it whirred on for a good year or more after this. We tried very hard not to touch it – it wasn’t perfect, and we definitely had at least one more very confusing issue with it, which I’m sure I’ll write about at some point – but the beast chugged on and kept our email flowing until it was eventually replaced by a younger, sexier model.

Some say that if you can find your way into Network Guy’s old (and long repurposed) office, and if you can then manage to make your way into the back room, you can, on quiet days when email traffic is up, still hear the buzzing of that once-failed-but-then-recovered hard drive.

This is how I came to learn about and respect percussive maintenance.