Author: matt

Useless Windows bug #5462

Here’s a bug in Windows Explorer that I happened to notice today. It has zero value and, despite my trying to find a way to abuse it, is seemingly completely useless. Tested on Windows 10 22H2 & Server 2019/2022.

If you right click on the start menu icon in the task bar, the task bar across all screens freezes in time. The clock doesn’t update, applications that open after you right click don’t show up on the task bar as being open (except in Server 2022), but when you click away from the right-click menu everything snaps back to the correct state.

Right clicking on other things in the task bar doesn’t yield the same results. It’s only the start menu icon.

EDIT: Does not affect Windows 2012R2.

VMs not booting after October 2023 Windows Updates

Booting a VM and you get the error “Failed to power on with Error ‘Incorrect Function’”? Yeah, this hit us too. Looks like it hits VMs with secure boot enabled.

Quick fix: Move, delete or rename the .mrt & .rct files associated with the .vhdx file that the OS of the non-booting VM normally boots from. It’ll then boot normally.

Alternatively you can uninstall the updates KB5031364 and/or KB5031361.
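For anyone who’d rather script the quick fix than click through Explorer, here’s a rough Python sketch of the “rename” option. It assumes the tracking files sit next to the disk and are named like `<disk>.vhdx.mrt` and `<disk>.vhdx.rct` (check your own storage folder first), and the `.bak` suffix and the function name are just my choices for illustration. Run it with the VM powered off.

```python
from pathlib import Path


def sideline_tracking_files(vhdx_path, suffix=".bak"):
    """Rename the .mrt and .rct files that sit next to a .vhdx.

    Assumption: the tracking files are named '<disk>.vhdx.mrt' and
    '<disk>.vhdx.rct'. Nothing is deleted, so the rename can be undone.
    Returns the list of renamed paths.
    """
    vhdx = Path(vhdx_path)
    renamed = []
    for ext in (".mrt", ".rct"):
        sidecar = vhdx.with_name(vhdx.name + ext)
        if sidecar.exists():
            target = sidecar.with_name(sidecar.name + suffix)
            sidecar.rename(target)
            renamed.append(target)
    return renamed
```

Renaming rather than deleting means you can put the files back if this turns out not to be your problem.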

Migrate Sims.Net Server

Sims.Net is a very common and, in my opinion, a very badly written Management Information System used in the UK’s lower education sector – primary and secondary schools. It’s commonly used not because it’s good, but because it was one of the few options many years ago and a huge number of schools just haven’t bothered to move to something else. And I don’t blame them, it’s got a lot of history and data going back decades in that database. It’s not an easy thing to migrate off of.

ESS, the (current) caretakers of Sims.Net, have recently changed their licensing model to seemingly prevent third parties from hosting Sims.Net data on behalf of a school. I admit, I don’t know the details, but that’s the message as I understand it.

It looks like due to this, and the EOL of Server 2012R2, many school techs and sysadmins across the UK will be scrambling to get Sims.Net installed on an on-prem server.

Unfortunately, documentation is sparse. It’s a shame, but it costs a whole lot of cash to pay for third parties to do this for you. Right?

In the spirit of publishing guides for stuff organisations seemingly don’t want guides published for, I’ve written up a guide on installing or migrating the Sims.Net (and the horrible, horrible SOLUS3) server/database, because I struggled to find any well-written and up-to-date guides and had to figure it out myself recently.

So if you’re looking to install or migrate Sims.Net, hop in and come for a ride. It’s easier than you think! Until you get to SOLUS3. We won’t be migrating that. But I’ll go through how to install it fresh!


Implement but never move

I really appreciate well-written implementation guides for server-client software, but what really gets me excited is seeing migration guides for when you need to decommission that legacy OS and move onto something supported and current. Even the bad migration guides are great to have, but some that I’ve come across are works of art – numbered-list instructions that are clear and concise, incorrect assumptions busy sysadmins have during the migration are highlighted then corrected, screenshots where required, mid-migration tests to ensure things are going as smoothly as they seem… ahh, dreamy.

One of my personal pet peeves, however, is purchased server-based software products that don’t come with any migration guidance or instructions.

I’m spending a lot of time at the moment moving niche (read: there’s only a couple of options and they all suck) and poorly made software from older server OSes to newer ones, and whilst some of them are fine and can be figured out right away, most of them have… issues. Typically you can just install $dumpsterFire fresh to the new server, dump the existing database and throw it over the fence to the new server (yes, some of this terrible software rEqUiReS tHe DaTaBaSe Is oN tHe SaMe SeRvEr), then maybe update some config files and point the new install at the existing database. One CNAME change later for the clients and you’re golden. Simple stuff.

Sometimes, however, you need to perform some forbidden magical incantations, and the kick to the teeth in many cases is that nobody in $org will write down what they are publicly. So you gotta pick the phone up and get some overworked and underpaid $tech at $org to walk you through the process, shooting down the constant stream of bugs and errors that occur along the way due to the shoddy quality of $dumpsterFire with bullets of solidified experience (that’ll no doubt be lost when $tech has had enough and leaves for greener pastures.)

Or, even worse than that… the $org requires payment for a migration because migrations aren’t considered “support” and are “optional”. Yes, it is support. And no, it isn’t optional.

Dear organisations that explicitly hide their migration guides to force an already-paying customer to pay you yet again to migrate your horrible software from one server to another (immoral), that INSIST that you “must TeamViewer in to do this” (in business hours only!?), that there’s no possible way anyone else can do it (false and stupid), that require DA/root accounts (you don’t), that have to be installed on a Domain Controller (I’m crying), that MUST have unmitigated 24/7 access by installing unlicensed TeamViewer (Oh no you won’t)… There’s even one software solution here which “requires” a physical server. In 2023. The software won’t work in a virtual machine. (Oh, wait, spoiler: it works fine in a VM and has been working fine for over a decade.)

To those organisations I say: fuck you. Do better.

Because I’m writing this all down, I’m recording my screen when you connect to solve unhelpful errors in your hidden log files, I’m fixing your stupid permission requirements and immediately uninstalling any additional or third party crap that isn’t a business requirement.

And I’m publishing those guides online for free.

Duct Tape

Back when I did end user support as my primary job I’d often ask myself “if there were no helpdesk tickets, what would I work on right now?”

The list I’d come up with was generally fairly short and changed each time I found myself pondering it, typically because the needs of the environment changed quite frequently, but it would also contain a few regulars – patch this, upgrade that, move that thing from there to there. Stuff that wouldn’t have an immediately beneficial impact but should probably happen. These are all good things to do and I encourage them to be part of the daily workload of a team.

The dynamic entries in the list were always something to do with making something better, getting rid of an annoyance or frustration. These are the things that do actually have immediate ROI and I would argue they should have time spent on them even during high ticket load periods but often find themselves being ignored because… Well, it ain’t broken. It’s just not great.

The issue is, as hinted at, this list changes often. Sure you can keep notes, an ideas board or submit a ticket, but when you do finally get an hour to look at the list, what’s on it doesn’t seem that critical.

I just read an older post on rachelbythebay.com that summarises this and explains what it means in five words:

Look for the duct tape

Read it (and whilst you’re there, if you don’t already consume that content take a look at some other posts – very valuable information and opinions contained within)

Essentially, find the things people have whipped up a quick workaround for and are using and fighting with now, whether that’s within your team or another. Spend some time making that thing better, or resolve the problem at source if possible. You’ll not only make that person or team happy, you’ll also actively solve a now problem that will probably directly impact success, however that’s measured.

Interruptions in tech support

I recently read some posts on the damaging effects of interruptions and wanted to explore this in the context of my current job – sysadmin across multiple k-12/ks1-5 educational environments – and offer some thoughts on how to change things.

First off, the core educational environment itself is pretty much built around interruptions. You generally have a single teacher in a room of a few dozen kids. I’m simplifying here, but the teachers essentially have particular targets to hit each lesson (“teach this thing”, “make sure the group produces this result”, etc) and these lessons are typically somewhere around an hour long, though from what I’ve seen these lessons are shrinking in length to squeeze more different material into a day/week. Teachers have to hop between different targets every hour (or less) as the group they teach changes. (The students have to similarly hop, but across entire subject ranges. Is there any surprise that, combined with the usual suspects (advertising, media, social media’s infinite scroll, etc), young people have shorter attention spans?) Often these differing targets are in the same subject (Science, or English, or Mathematics, and so on). However, different classes are at different points in the curriculum, and even if some of those classes are at the same point in the curriculum (the students are of the same age) they’re often grouped by capability – some classes need more time and effort and revision of material than others, whilst the top groups delve into subjects at a deeper level.

A teacher has an hour to achieve a goal before moving on to the next goal with a different group. During this hour, they will typically outline the thing to be done or learned, then work with the class to help them get there. This naturally results in questions (interruptions) continuously. It’s in the teacher’s best interest to answer these questions well to ensure that individuals maintain pace with the group, but paradoxically the more questions that are asked the slower the group can move forward, as spending time explaining material again for the benefit of one means that those who do understand it already are not really learning for the time period it takes to get the one who asked caught up. Thus in my uneducated opinion, the priority for the teacher is to get the effectiveness of the first explanation of any material perfected (as closely as possible given the ability of the class) ensuring that most of the time most of the class (if not all) comprehends it to a suitable degree on the first attempt. This minimises interruptions and enhances learning for all, as the class learning velocity is kept high.

This problem is outside the scope of what I want to talk about but it is relevant, because one side effect of this we-must-acknowledge-interruptions behaviour is that I feel it becomes habit. Being interrupted a lot becomes the norm, and it is my opinion that this encourages the teaching individual to also be the one interrupting others more.

Let’s go back to the classroom. You have one hour to teach a group of 33 twelve year olds about some algebra. You fire up your laptop and switch the projector on to find that no matter how you try you can’t get an image to appear on the board. This unplanned interruption has immediately taken up brain power and, critically, time, even if you have backup plans for each and every lesson. As you have chosen to use the technology resource you have clearly decided that it is important, so we can assume that without that resource your teaching and the students’ learning will be less effective. You are after a fix for this problem quickly. The quality of your teaching is likely somewhat diminished and the longer this goes on the worse this gets.

So you do what anyone in your situation does – you try and get it fixed. This is where tech support comes in. You make a phone call or send an email to that tech you like, or if you’re a hero, log a ticket on the ticketing system. The hero did the right thing – logging a ticket. No complaints from me there. (Pro tip for techs reading this: make every avenue of communication a ticket generating event!)

However, emailing an individual or phoning interrupts the support techs. This is often warranted and is always understandable, and your job is a constant stream of interruptions so one more won’t hurt, right? This is where, unfortunately, tech support suffers. We operate in the opposite universe. Your tech problem, as a teacher, is probably one of your biggest current problems. And we totally get that. Promise. We really do. But… it probably isn’t our biggest problem and is almost certainly not something we’re going to want to deal with right away.

An interruption to us does not progress anything, it in fact stops everything. If we are in the middle of resolving another problem, being forced to stop for a reason (whether it’s to deal with a major issue or respond to a phone ringing or being pulled out of the moment by a name being called) causes us to disconnect from “the flow” and at best mentally change gears, at worst slow the brain CPU way the heck down. This is jarring and as evidenced by the many articles published about interruptions can in fact be damaging, both mentally and economically.

The more this happens the less effective overall an individual or a team can be. I argue that an ops team lead (Helpdesk Manager, IT Director, etc) should put work into minimising interruptions for the betterment of the team and yourself, regardless of your vocation or the objectives of the team. Here’s what I propose you can do to help within your tech support team regardless of work environment… Though you will likely need managerial sign off on some or all of these. It’s worth noting that I didn’t come up with these, they’re merely an amalgamation of things I’ve learned and tested which have worked for me. I am assuming you’re part of a team instead of a solo hero, too, though if you are solo then interruption reduction could potentially save you from burnout. Some of these might help.

Everything should (automatically) log a ticket

Make it easy to log tickets. Saying “no ticket no fix!” feels good but doesn’t actually achieve anything for the organisation, except to piss off someone with an issue (and let’s hope that person isn’t a C-Level.) Let people call, let people email, let people visit, let people knock on the door. You can more than likely have your ticketing system automatically create tickets from anything that comes through by email, and if you can’t, get a better ticketing system. Get a generic “techsupport@” inbox set up and configure your helpdesk to add everything sent to it as a ticket. Bonus points if you reply right away (automatically of course) telling the requester that their ticket has been logged. At least they know it’s in the queue instead of sitting unread in some mystery inbox.
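Most helpdesks do the email-to-ticket part for you, so treat this as a sketch of the idea rather than something to deploy. It turns a plain-text email into a ticket and builds the automatic “we’ve logged it” reply. The ticket fields, the in-memory ID counter and the `techsupport@example.org` address are all stand-ins I’ve made up; a real system would use its own.

```python
from email import message_from_string
from email.message import EmailMessage
from itertools import count

_ticket_ids = count(1)  # stand-in for the helpdesk's own ID sequence


def email_to_ticket(raw_email):
    """Turn a raw plain-text email into a ticket dict (hypothetical shape)."""
    msg = message_from_string(raw_email)
    body = msg.get_payload()
    return {
        "id": next(_ticket_ids),
        "requester": msg["From"],
        "subject": msg["Subject"] or "(no subject)",
        # Multipart mail returns a list here; this sketch only keeps plain text.
        "body": body.strip() if isinstance(body, str) else "",
    }


def acknowledgement(ticket, support_address="techsupport@example.org"):
    """Build the automatic reply telling the requester it has been logged."""
    reply = EmailMessage()
    reply["From"] = support_address
    reply["To"] = ticket["requester"]
    reply["Subject"] = f"[Ticket #{ticket['id']}] {ticket['subject']}"
    reply.set_content(
        f"Thanks, your request has been logged as ticket #{ticket['id']}."
    )
    return reply
```

Actually sending the reply (SMTP, an API, whatever your helpdesk offers) is left out on purpose.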

Ten CC’s of Triage, STAT!

Yikes, you now have a flood of tickets. This looks bad. And hey, maybe it is, but at least now you know it’s bad instead of it just feeling bad. Maybe management would be surprised to see that you’ve got 143% more tickets than you last reported, because everything is a ticket now. No more invisible work.

Anyway, I digress. Triaging tickets is essentially reviewing a ticket and deciding if it’s a priority or not. There are many ways to judge this, but typically you want to take into account how urgently this needs to be fixed and how much of an impact it would have if it didn’t get fixed.

A PCI Compliance audit deadline of two hours ago is pretty urgent – it’s something that needs to have happened already. But… the banks aren’t going to block all transactions immediately. Business doesn’t stop because you’re a few hours late on submitting a self eval form (this is not an endorsement to delay PCI compliance!)

A projector in a classroom has a pretty high impact – that’s 30+ students (plus one pissed off teacher) per hour per day. That adds up quickly. And yes, it’s pretty urgent too, but worst case the teacher can still teach like they did in 1985. Right?

Triage is important for two main reasons:

  1. It helps you and your team decide on what to work on without having to decide on what to work on
    Generally, you work on the thing with the smallest SLA time left, and if the SLA has already expired and is counting into the negative then you should probably get on that ASAP. Assign each Priority (the urgency and the impact) an SLA (say, high impact high urgency = 4 hour SLA, low impact low urgency = 14 day SLA – whatever you decide will be unique to you and the organisation and likely requires understanding management’s expectations as well as your team’s abilities).
  2. Reporting, evidence, promotions, wage increase, redundancy protection, efficiency gains, areas of focus and more!
    Having all this triage data to look back on will highlight where things are working and where they’re not, but bigger picture style. You can use the data and the knowledge that you’ve gained to more closely align the team with the objectives *and reality* of the organisation. Perhaps when triaging your tickets you could also categorise them – hardware, software, licensing, accounts? Or go one level deeper – projector, laptop, desktop, TV, printer. Office, Web browser, MIS. After time passes you can really easily see where your team spends most of its time and where resources (like money) can be allocated to reduce load in those areas.
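To make the first point concrete, here’s a small Python sketch of “impact plus urgency gives you an SLA, then work whatever has the least time left”. Only the 4 hour and 14 day corners come from the example above. The two middle values, the ticket fields and the function names are assumptions of mine, so swap in whatever you and management agree on.

```python
from datetime import datetime, timedelta

# Hypothetical matrix: (impact, urgency) -> SLA.
# The 4 hour and 14 day values are from the text; the other two are made up.
SLA = {
    ("high", "high"): timedelta(hours=4),
    ("high", "low"): timedelta(days=2),
    ("low", "high"): timedelta(days=1),
    ("low", "low"): timedelta(days=14),
}


def time_left(ticket, now):
    """Remaining SLA time; negative means the SLA has already expired."""
    deadline = ticket["logged"] + SLA[(ticket["impact"], ticket["urgency"])]
    return deadline - now


def work_order(tickets, now):
    """Least remaining SLA first, so expired tickets float to the top."""
    return sorted(tickets, key=lambda t: time_left(t, now))
```

Nobody has to decide what to work on next. The queue order does it for them.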

Triaging tickets is generally not a full time job, so you could make this person…

Hero Assignment: Interrupt Shield

Designate an “interrupt hero” – this can be one person or more and ideally they would triage tickets in the queue too. If you’re lucky enough to be able to do this, put them in another room and leave everyone else in the relative peace and quiet of the main office. Make everyone else go to this smaller interrupt room for direct in person support.

They deal with all phone calls and in person visits and group emails. They also deal with any quick support calls (when not on a phone call) that don’t require them to leave their desk and that are interruptable, for example password changes, group membership updates, etc. You should rotate this person very regularly so you’re not moving the “interrupt problem” to one person all the time, as that’s a recipe for disaster. Let them do a day at a time, or two days if you use a hero team, rotating half the team out every day so handover and “current events” knowledge transfer (what were yesterday’s big issues? What’s on the radar today?) can occur more naturally.
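The two-day rotation is easier to see written down. This is a tiny Python sketch, assuming a two-person hero team drawn in order from a list of at least three staff. The names and the function are made up for illustration. Each person does two consecutive days, and every day one hero stays on to hand over to the newcomer.

```python
def hero_team(staff, day):
    """Two-person interrupt team for a working day numbered 0, 1, 2, ...

    The first name is on their second day (and does the handover),
    the second name is starting their first day.
    """
    n = len(staff)
    return [staff[day % n], staff[(day + 1) % n]]


# Example with hypothetical names:
# hero_team(["Ann", "Bob", "Cat", "Dev"], 0) gives ["Ann", "Bob"]
# hero_team(["Ann", "Bob", "Cat", "Dev"], 1) gives ["Bob", "Cat"]
```

Holidays, sickness and people swapping shifts are ignored here. A shared calendar probably handles those better than code.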

Redirect everyone’s incoming calls to the interrupt hero team. This further reduces interruptions to the main body, though you may want to make exceptions for VIPs. Your interrupt hero team should be able to forward calls to you, but generally they will be able to deal with most quick things (and push to the focus team via a ticket if not, of course).

This has two advantages. First off, the toil is reduced for the team at large – your senior sysadmin isn’t doing the third password change this week for little Joe – and secondly (and most relevantly) the rest of your team can focus for long periods of time on their work. Better work is done by most of the team. Better work is better results (whether those results are money/profit or a higher quality of education.)

Don’t overload on active tickets

Don’t assign anyone (including yourself) too many tickets. There is no perfect number – it’s different depending on role, personality, type of work, and many more factors. But try and get a ballpark. I would suggest starting with no more than 10 non-pending tickets (for clarity, I define a pending ticket as one which is waiting for something outside the control of any member of the team) and only one should be worked on actively at a time. You can’t install a printer whilst also diagnosing wireless issues. Pick one. Focus.
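If your ticketing system can’t enforce a cap for you, the check itself is trivial. Here’s a sketch in Python using the ballpark of 10 from above and my definition of pending. The ticket fields (`assignee`, `status`) are hypothetical, so map them to whatever your system calls them.

```python
MAX_OPEN = 10  # the starting ballpark from the text; tune it per person


def can_take_another(tickets, assignee, limit=MAX_OPEN):
    """True if the assignee holds fewer than `limit` non-pending tickets."""
    open_count = sum(
        1
        for t in tickets
        if t["assignee"] == assignee and t["status"] != "pending"
    )
    return open_count < limit
```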

Low remaining (or negative/expired) SLA first

Plucking tickets from the queue because they look easy isn’t productive, long term. But it’s fine, short term. Don’t sweat it, manager. But keep an eye on it.

Tickets should be worked based on their SLA, accounting for the available suitable resources (don’t put the guy who sucks at DNS problems as the lead on critical DNS problems – train them? Yes. Rely on them? No.)

Sometimes it’s nice to jump 70% down the queue and grab a few easy tickets. It feels like a break.

Just don’t make it a habit.

Having repeat logs of the same call that has gone unanswered for three months is – you guessed it – yet another interruption. Old (the definition of which is, essentially, defined by your SLA – expired SLA? Ticket is old!) tickets should not exist (with very very few exceptions. Like budget availability.)

Don’t judge people on their call closure rates

How many tickets I’ve closed vs the other techs means absolutely nothing. Get honest feedback from the source – ask the staff and, yes, the students, to rate the support. Then pay it attention. This feedback is so valuable. Whenever a ticket is closed, automatically send a followup email asking for feedback, and make it quick. You want the emotional immediate feedback, not the “dwell on it for three days and [calm down/forget]” feedback.

  1. Was the support: [] Amazing! [] Ok! [] Shit!
  2. Optional comments here:

Negative feedback is the most valuable information you can possibly obtain

I argue that getting positive feedback means absolutely nothing, except as a “look how great we are, CEO!” line in the yearly review meeting. The feedback you really want, the feedback with actual damn value, is the negative feedback. And it’s all well and good getting it. But make sure to pay attention to it, don’t let it sit in the database doing nothing. Analyse it. Understand it. This is the closest you’ll get to dipping your hand in the metaphorical technological currents of your organisation. You’ll be able to feel the effect of the team’s work, and when something goes against the current, you can fix it immediately.
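“Analyse it” can start very small. A rough Python sketch, assuming each response is stored with the rating from the three-option form above (lowercased) plus the optional comment. The field names are my invention. It counts the ratings and pulls out the negative responses so somebody actually reads them.

```python
from collections import Counter

NEGATIVE = "shit"  # the lowest option on the feedback form above


def summarise_feedback(responses):
    """Return (rating counts, list of negative responses to follow up)."""
    counts = Counter(r["rating"] for r in responses)
    negatives = [r for r in responses if r["rating"] == NEGATIVE]
    return counts, negatives
```

The counts are for the yearly review slide. The list of negatives is the part worth your time.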

This makes better, happier staff and students. Better happier staff and students equals better results. Which equals… you guessed it, fewer interruptions.


Interruptions aren’t going away, but you can reduce them. And when you do, that firefighting feeling will decrease, work quality and rate will improve and as a direct result the stability of the environment should improve, too. And when the stability of the environment improves, the interruptions decrease.

At least, that’s my experience.

There’s plenty more about this on the internet at large, much better explained and with additional steps than what I have listed here, however I feel that these steps are the big hitters that might just give you enough time to step back and take a real close look at the bigger picture without worrying about where the next fire will start. And it’ll take time. But it will work.

Receiving unsolicited blogging advice

TL;DR: me post more!!1!

I recently read a blog post about a blog post and came to realise that I too struggle to post stuff on this very blog. I am conscious that I suffer from the curse of “I must make everything interesting!” but… maybe I don’t have to suffer any more. Maybe I don’t have to care. After all, if you don’t like it, you don’t have to read it.

I started this corner of the web to document my renovation and tech stuff for me and myself only. And Ben. (Hi Ben) But somewhere along the way it started to be about what other people find interesting, and… nothing really felt big or important or interesting enough. Hence the lack of posts recently.

So, a reset. Time to post more. They may not fit the predetermined categories, they may be completely uninteresting, but it turns out that not only is that okay, it’s what I intended originally anyway. They’re interesting to me, and besides I suspect I read these posts more than anyone else. Maybe, just maybe, someone else will find something a little bit interesting or useful too? That’ll be a positive bonus.

The algorithms and view counts hold no sway. I forbid thee!

Don’t Trust Copy & Paste – even with JavaScript disabled

I’ve seen a few articles recently advising readers to not blindly copy/paste code from a website into your CLI directly because, with a small bit of JavaScript, you can overwrite the clipboard.

This is something that has been known about for a while and for some reason has seemingly resurfaced recently. The advice is to always paste into a text editor first to ensure what you think you have copied is actually what you have copied. However, I have seen comments about how you should disable JavaScript unless you need it in order to prevent this from occurring.

As awkward as this “disabling JavaScript” advice is on the modern web (and it does require some technical knowledge to enable just what you need) I agree, and in fact disable JavaScript by default. However, for this particular issue, this doesn’t matter. You can achieve essentially the same thing without any JavaScript.

The stuff below isn’t new. In fact, in the linked article is a link to a reddit thread where someone outlines this exact problem. But I feel that it can’t hurt to reiterate. And explore!

So, for the obligatory warning: Don’t paste anything on this page into a PowerShell window. Don’t paste it into anything but a text editor. The examples below shouldn’t be harmful but… look, just don’t risk it, okay?

Oh, and disable JavaScript if you want.

Malicious String – copy and paste the below example into a text editor (NOT a PowerShell window)

echo ‘hello,
copy c:\inetpub\www\config.php c:\inetpub\www\config.php.txt -whatif
clear
echo ‘hello world!’
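The “paste into a text editor first” advice boils down to looking for things you didn’t expect, and that check can be sketched in a few lines of Python. This is my own illustration, not something from the linked article. It takes the copied text as a string (getting it off the clipboard is left to you) and flags extra lines and non-printable characters, which are the giveaways in the example above.

```python
def paste_warnings(text):
    """Return reasons to be suspicious of text before pasting it into a shell."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > 1:
        warnings.append(
            f"{len(lines)} lines: each newline can run a command on paste"
        )
    elif text.endswith(("\n", "\r")):
        warnings.append("trailing newline: the command may run as soon as you paste")
    hidden = [c for c in text if not c.isprintable() and c not in "\n\r\t"]
    if hidden:
        warnings.append(f"{len(hidden)} non-printable character(s) in the text")
    return warnings
```

An empty list doesn’t mean the command is safe, only that what you see is probably what you copied. You still have to read it.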

Let’s explore how we got here and what we can do about it.


Light-up Spinning Top in the Dark

Something about these toys appeals to me. The Kid got a few so I snapped some pictures, expecting a blurry mess. And that’s what I got, but in a good way! I love how the projected red light looks in the longer exposure photos.

Tales from Tech Support 02: The Server Room is Nuts

I’ve said previously that things come in threes. This particular month in 2020 gave us three quite severe issues in the server room at work.

One – as the tide comes rolling in

We had some heavy rain in early October over a weekend. I spent the majority of that time in front of the fireplace until work on Monday morning.

Before I even got through the office I was alerted to a flood in our server room. I rushed down there to find an absolute mess. We are unfortunately strapped for space and also use the server room to store our spare equipment and some peripherals and consumables. We lost quite a lot of stuff due to water damage however most of it was old so doesn’t hold much value. There were some switches that got soaked, and also my own personal Draytek I was going to use as a secondary VPN (ours wasn’t great at the time) during lockdown in case of emergencies.

You can see in the following image a “tide line” around the walls of the room – we had about an inch of water in there that slowly soaked away. A phenomenal amount of water given the size of the room and the fact that there are sizeable gaps under the two doors that allow water to escape to neighbouring rooms.

Somehow our servers didn’t get too wet. Humidity was at 90%+ for quite a while though, and we have experienced several hard drive failures since which were likely related.

After three days of drying, ripping up carpet and sorting through our stuff we could finally get back to the normal slog. However we learned that the fitted Aircon unit in the room doesn’t have a cut off for drying the air. It’s either on or off, and only when manually set. We can’t set a target humidity. In a server room you typically want to hover at about 50% – too high and the moisture in the air corrodes internals, too low and static electricity can more easily discharge in the dry air. Before ripping out the carpet and other detritus we’d hover at about 50-70% without any flood water in the room, but since then we’ve gone as low as 30%.
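Until there’s proper monitoring kit in the room, even a crude check on whatever humidity reading you can get is better than nothing. A minimal Python sketch follows. The 40 to 60% band is my assumption of a sensible window around the 50% target, not a standard, and where the reading comes from (a sensor, a UPS environmental probe) is up to you.

```python
def humidity_status(percent, low=40.0, high=60.0):
    """Classify a relative humidity reading against an assumed target band."""
    if percent > high:
        return "too humid"  # moisture in the air risks corroding internals
    if percent < low:
        return "too dry"  # static electricity discharges more easily
    return "ok"
```

By that band, both our 90% after the flood and our 30% after ripping the carpet out would have raised a flag.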

We need proper climate control and monitoring. And for the leak to be fixed… It has been an issue for a while which involves the design of the roof – water does drain away but if there’s too much too quickly, or the drainage is blocked as was the case this time, the water overflows and somehow finds its way to the ground through the second path of least resistance… Which just so happens to pass right through the server room about two feet in front of the cabinets.

Two – splashback

About two weeks after the first incident we had more heavy rain. More water in the server room. Luckily not as much water and because the room was empty and carpetless it dried out within a day. Unfortunately it came in through a slightly different part of the ceiling about half a foot closer to the server rack.

I’d love to get up on to the roof to try and figure out where this water is getting in. I’m told the fix is “new roof” but wonder if making a path for the water to escape by which doesn’t pass through one of the most expensive rooms on the campus is feasible. Not a fix, certainly, but a workaround until we can move the servers (which we should be moving before the end of this year.)

Three – this is nuts

Finally (at least I hope) is this, which happened about a week after flood #2:

Blurry

We saw him on the camera we set up after the first flood. We raced down there and eventually managed to coerce him out from his temporary home directly under the server rack before he could eat through anything. More squirrels have since found their way into the building but none have yet braved the server room. For fear of drowning, I suspect.

Although management doesn’t particularly care about incident management and response reports, I find the whole process fascinating. So I wrote up an incident report for the squirrel invasion and it was quite entertaining to type out to say the least. So many puns.