Category: Technology

Anything tech-based – code and scripts, hardware and software, work related or not. If it’s got electricity in it, it’ll probably be technical.

Tales From Tech Support 01: Percussive Maintenance

If you’ve worked in IT for more than a year you probably have some crazy tales to tell. I certainly do – with nearly 15 years in the field I have seen insanity and hilarity – more than I could ever remember. So I thought to myself, why not write it all down somewhere?


This first tale comes from back in my early days.

As a PFY, I was trusted with little more than basic desktop repairs and printer toner replacements – a fairly common slice of life for many IT bods. I was relatively fresh faced, a few scars but nothing major. Eager to learn and eager to please, I was often the first to raise my hand and take any challenging job, despite the vast gaps in my knowledge. Our team was small – three techs (two general helpdesk end user support PFYs, one mobile device repairs) with one very isolationist “Network Guy” (who would now be called a SysAdmin) and one “Database guy” (although their only qualification was knowing what “SQL” stood for.)

It came as quite a surprise when, early on a sunny Monday, we were told that the Network Guy was leaving. And he wasn’t being replaced.

It quickly became apparent that the entirety of his duties were to fall on myself and PFY#2, who had been working at this place for a bit less than me. Network Guy’s last day rolled around without much issue, nor much communication from him – in fact at one point I asked for his help with a network issue and he told me “I don’t care, I’m leaving!” So it came as a bit of a surprise (and equal amounts of relief) when he called me and PFY#2 up to his office to give us his handover document.

His handover document consisted of a single sheet of A4 with handwritten notes about a few things barely qualifying as useful. A few IPs and other miscellaneous details about servers and switches, the odd issue he knew about but hadn’t fixed, and maybe a password or two.

We received the document at the door to his office (we weren’t tolerable enough to go inside on this occasion?) and were quickly shooed off to our regular helpdesk support duties.

That evening, he left and we never saw him again.

The following weeks and months were absolutely insane. I can’t recall much about what happened during this time. Myself and PFY#2 managed to keep on top of most of the helpdesk support calls and make a start on untangling the network. We quickly found that most switches were essentially unboxed and plugged in without any config changes, servers were unpatched and had uptime into the hundreds of days (you could tell when the last powercut had been by looking at the uptime) and we were getting very close to the limit on resources – maxed CPU and/or RAM, HDDs filling up. Group policy was a mess, roaming profiles reaching into the tens of gigabytes with nothing preventing their growth… it was a mess.

To top it off, out the back of Network Guy’s office was another small closet containing almost all the servers we had. Neither of us had ever been back there, and when we did go in we found dust, cobwebs and equipment that seemed to be switched on, though we had absolutely no idea what it did. Helpfully the single sheet of A4 told us some server names and serial numbers.

We were fueled by Red Bull and the long (long) days began to blur into one massive learning experience. To this day I have never learned so much so quickly as I did back then as we fought to keep the place running, the users happy(ish) and continue to learn as much as we could.

There was one event, though, that utterly stumped us.

Cheerily and awesomely smashing the helpdesk as we were, we suddenly had calls coming in about email being offline. After a quick check we realised that, yep, email was down. We couldn’t RDP into the Exchange (2003) server either – something was wrong, clearly. Off to the dusty old Network Guy’s office we go.

We walk in, grab the single 4:3 CRT monitor in the room, stretch the cables across the room from one of the power extension cables with a spare socket to one of the nearby tables, and plug the crusty old VGA cable into the back of the absolute beast of an Exchange server. I mean, this thing was huge. An old tower, black, solid steel everything. No idea why it was up in this first floor room when all the other servers were on the ground floor, but… whatever.

Flicking the monitor on we quickly saw everyone’s favourite screen. Yep, it’s blue, and it signifies death. The BSOD.

We panicked a little, probably downed another Red Bull each, then got to work trying to bring this thing back online.

Remember: we were literally flying by the seat of our pants here, and had been four months at this point. We had no idea what we were doing.

First things first – switch it off and back on again. We held the power button down, heard a massive CLUNK, the fans span down and the screen went black. Take a breath. Switch it on again, wait for the BIOS, wait for Windows to start booting, wait some more…. BSOD.

Crap.

We try again, switch it off and back on. This time, we hear a horrible grinding noise as the machine spins up. We get to Windows trying to boot and everything freezes – not even a BSOD. Off it goes once more, switch it on, grinding noise, BIOS doesn’t even finish loading.

Double crap.

Backups! Backups? There are backups, right? One of our jobs is to take the tapes out of the drive and swap them with the next numbered batch in the fire safe, surely we could restore this? But… Network Guy did the backups. Network Guy didn’t elect to write anything down about the backups! We don’t know anything about them! An oversight of the highest order!

We know the boss has the phone number of Network Guy, but we need to try fixing this ourselves first. We don’t want him shouting down the phone at us like the one and only time we called him for help before this…

More Red Bull, more diagnosing. On the odd occasion we can get Windows to try booting, and sometimes we can get to the login screen using Safe Mode, but no matter how quick we are, we can never get logged in, and even this doesn’t last forever. Eventually the server just stops trying to load Windows and we’re presented with some error about not finding a HDD. The grinding at this point is still going on and we’re faced with our fear that the grinding isn’t a fan, but the single HDD that all of our email is stored on.

There’s nothing else for it. We’ve gotta call up Network Guy and ask his advice. Neither of us want to do this though – he wasn’t helpful to us when he worked here, and especially not when he was leaving.

Is there nothing we can do?

I remember the moment – we were stood either side of this huge hulking great server with no more options (that we are aware of at any rate). Our eyes meet, and without saying a word we both think the same thing at the same time.

PFY#2: “Shall we?”

Me: “I dunno… I mean it’s not working so maybe?”

PFY#2: “I think we should.”

Me: “Okay. Let’s do it.”

PFY#2: “Go on then, you can try first.”

Me: “No way, you do it.”

We look down at this ancient black server, lined with solid steel frame, sides and front, touched with the occasional bit of cheap plastic and the odd faded sticker.

I sigh.

PFY#2 raises his foot, and kicks the bastard right in the side, smack bang in the middle.

The grinding noise changes in pitch audibly. It’s still there, still buzzing away in that audio range that you think you can put up with but slowly sends you insane without you realising it. PFY#2 reaches down and holds the power button in to switch it off. He switches it back on.

BIOS loads up, boots fully, screen goes black.

Windows loads. And loads. We stare. And it continues to load. The login window appears after what must have been at least 25 minutes. We’re in shock. No BSOD. No lock up. Buzzing? Yeah that’s still there, but the server has booted. What the f-?

We rush to a nearby office, get the first user to open up Outlook. It connects. Emails from the upstream start flooding in to their mailbox.

We check our own, same thing – it’s working.

We’re buzzing. Our blood is filled with adrenaline, caffeine, sugar, and whatever the hell else they put in Red Bull, but it is also filled with joy. We fixed a server by kicking it.

After spreading the word that email is back, we get right back to our helpdesk calls, of which dozens have appeared since our Exchange issue surfaced.

Not long after this we do eventually employ a sysadmin who I work with to this day, but this Exchange server didn’t get replaced immediately. I can’t remember exactly when it was retired, but it whirred on for a good year or more after this. We tried very hard to not touch it – it wasn’t perfect, and we definitely had at least one more very confusing issue with it, which I’m sure I’ll write about at some point, but the beast chugged on and kept our email flowing until it was eventually replaced by a younger sexier model.

Some say that if you can find your way into Network Guy’s old (and long repurposed) office, and if then you can manage to make your way into the back room, you can, on quiet days when email traffic is up, still hear the buzzing of that once-failed-but-then-recovered hard drive.

This is how I came to learn about and respect percussive maintenance.

Ditching GLPI

The recent bad experience with GLPI we had at work was the final nail in the coffin and, after patching the issue, I quickly began looking for alternative ticket management systems. We have wanted a new helpdesk for a long time – the support we provide has evolved over the years since GLPI was first introduced and with the covid-19 pandemic this support has seen yet another shift in the way in which we work. Not only are we doing things differently now, some unrecognised or unrealised issues have surfaced which we all wish to resolve or automate away.

We struggled through the worst of the lockdown but quickly identified some limitations with our existing way of working, namely that whilst our helpdesk did fine enough when it came to tracking tickets, it should do more for us. We spent a lot of time on the management of tickets and attempting to contact people just to get them to perform simple tasks. Why do we have to fight our helpdesk to achieve a goal, and why can’t we have a system that would let us run these simple tasks (read: scripts) ourselves in the background, invisibly to the end user?

I did some reading and some thinking post-GLPI-exploit and realised that we are essentially an MSP. We provide support for departments within a school, each of which has different objectives, priorities, demands and tools. Plus, we support several primary schools on top of that, and they are their own beasts entirely with unique networks, hardware and software, let alone processes and requirements.

As we emerge from total lockdown to a lesser version, we are also going to need to do our normal jobs but much more efficiently. With a “remote first” approach to minimise risk to our end users (both staff and students, but also guardians and members of the public) I quickly decided to look into RMM tools instead of basic ticketing systems.

Given that we also had issues in other areas (namely monitoring and alerting, in that what we have is barebones and actually broke a month ago) I was looking out for a solution that would kill as many birds with as few stones as possible.

The requirements were that it had:

  • A ticketing system
  • Monitoring and alerting
  • Patching, installing and scripting capabilities
  • Easy remote support
  • A good price (hey, we’re a school and don’t pass the cost on to the end user, we don’t have a lot of money!)

I found a list of RMM tools and their features on the /r/MSP subreddit and went through each one, checking videos, documentation and feature lists, working out which would obviously not work, which might work, and which would absolutely work. I narrowed down the options to four potentials. In the end, only two of them had a “per technician” pricing model – the “per monitored device/agent” pricing model would end up costing us tens of thousands – so it came down to a war between the two. These were:

  • Atera
  • SyncroMSP

To be honest they were a close match. I preferred the look and feel of Atera although SyncroMSP was more feature rich. We didn’t really need all the features SyncroMSP boasted though, and Atera was cheaper (we’re on the cheapest plan) so ended up going with that.

Although we haven’t rolled out the agent to every device yet (we’re still mostly closed and, until we have the agent installed, have no way of deploying software out to machines not on our network) we have started using it heavily. So far, zero problems and we are all liking it. It has already enabled us to preempt some problems that would become tickets, solving them before they ever affect an end user or get reported. I’m looking forward to diving into it more, and I am especially excited about the recently announced Chocolatey support, which seems to work wonderfully.

It’s early days yet, hopefully we can become much more efficient and provide a better service. Only time will tell!

Haveibeenpwned.com pwned our helpdesk! GLPI 9.4.5 SQL Injection

TL;DR:

I should say before we get started, the fault for this lies entirely with GLPI, I place no blame at the feet of haveibeenpwned.com or Troy Hunt for this issue. It’s all good fun! Concerning? Oh, for sure. You can’t help but laugh though. Obligatory XKCD.

On GLPI 9.4.5, creating a call (via the standard interface or email, etc) that contains the basic SQL injection string ';-- " will be logged normally with no abnormal behaviour, however if a Technician assigns themselves to that call via the quick “Assign to me” button, the SQL query will be executed. In the case of the example string given above, all existing calls, open or closed, will be updated to have their descriptions deleted and replaced with any text that appears before the aforementioned malicious string. You can of course modify this to perform other SQL queries.

This is fixed in 9.4.6. However, at the time of writing the GLPI download page still links to 9.4.5 as the latest update available; you need to go to the GitHub releases page to see 9.4.6. [2020-06-03 update:] now available directly from the GLPI download page.

9.4.6 was released before I found this exploit, however the GLPI website still showed 9.4.5 as the latest version. As far as we were concerned, we were on the latest version. Credit goes to whoever submitted it first, however at the time I had no knowledge of this already being known and resolved. Here’s a video showcasing the issue:

Low quality to minimise filesize

The Long Version – how I found it, or how haveibeenpwned pwned our helpdesk

GLPI

We use GLPI as our technical support ticketing system at work. There are better solutions out there and we’re investigating others, however GLPI has served us well since 2009ish. It is a web based, self hosted PHP/MySQL application.

haveibeenpwned.com

A website that catalogues and monitors data dumps for email addresses. It collects leaked or stolen databases, analyses them, pulls out any email addresses and can be searched by anyone for free to see if your email address has been included in a breach. You can also subscribe to receive alerts if your email address, or an email address on a domain that you own, is included in any future breaches.

haveibeenpwned.com pwned GLPI

Around late April we upgraded from an ancient version to the latest of GLPI – 9.4.5. All was well until we received an email from haveibeenpwned to our helpdesk support address, which automatically got logged as a support ticket. This email alerted us to some compromised accounts on our domain which were included in the latest Wishbone data dump.

Spoiler: that header isn’t an image, it’s text!

I rushed to get the HIBP report generated to see whose data on our domain had been compromised by clicking a link in this email-turned-support-ticket. We got the report in a second email, which created a second ticket. I grabbed the data, deleted the second ticket (as we still had the original open) and perused the data. After doing the necessary work alerting any users to the breach of their data I went back to the original HIBP ticket and, realising I hadn’t assigned it to myself, did so and promptly solved it. All is well, time to move on?

Not quite. I and the other techs quickly noticed that every single ticket description had been deleted and replaced with partial header data from the HIBP email.

This immediately stank of some kind of SQL Injection flaw and my mind raced as to what the cause was. I had a suspicion I knew… Unfortunately we were in the middle of business hours and due to Covid-19 are fully remote – we need a working helpdesk, and I don’t have the privilege of working on potential security issues in the day job. We restored from a backup taken on the previous evening (not too much data was lost, thankfully) and carried on with our day supporting our users.

Understanding the flaw

As soon as work ended, I grabbed an Ubuntu .iso and built me a webserver VM. I had a feeling I knew what the cause of this SQLi was (check the header of the email shown above, you don’t need long to figure out where the ‘malicious’ code is!) but wasn’t sure how it got executed – the email was parsed correctly and tickets weren’t affected when the email came in; it wasn’t until around the time I deleted the second ticket and closed the first call that problems arose.

After building the VM with PHP and MySQL, I hopped onto the GLPI website and grabbed the latest version from their site, which is shown as 9.4.5.

The “Download” button took you to the 9.4.5 archive

After setting it up and adding some test calls, I forwarded our original HIBP email to a temporary account I linked this test GLPI install to. Once the email was pulled in I went through the same steps as I had done earlier in the day:

  1. Generate the report (which I didn’t do again via the link in the email, I just forwarded the original email to my test email account creating a second ticket in my install of GLPI)
  2. Delete the second ticket
  3. Assign the first ticket to myself
  4. Close it

I checked my test tickets I loaded in there beforehand and lo and behold, they had all been wiped and replaced with the same content from the HIBP email!

I restored the VM to an earlier snapshot and went through the process again, pausing to check the other tickets at each step. I quickly discovered that the issue only occurs when you assign yourself to the ticket using the handy “Associate myself” button.

Adding yourself as a watcher also triggers the query

Making it malicious

The email data already wipes the content of all tickets, but as it stands it leaves a lot of junk data behind. I wanted to minimise the data required to exploit the flaw yet retain the same behaviour.

Another restore of GLPI followed, with more tests trying to determine the minimum amount of data needed to execute the flaw. I spent some time cutting down the email from HIBP and quickly found that the opening lines of the HIBP email were indeed the culprit – I managed to shrink the exploit down to six characters (';-- " – the space and double-quote at the end appear to be required though this could do with more testing) to achieve the same kind of malicious behaviour, in this case deleting all content of the descriptions for every ticket in the database. If you log the malicious call with this string as the title (or leave the title field blank – GLPI will then automatically add the contents of the description to the title, in this case the malicious string we have identified gets added as the title) the title on all other calls gets wiped too, however if you do include a non-malicious title in the malicious ticket the original titles on the other calls do not get modified.

Success! This is a pretty severe issue, and although it does require some user interaction you can easily hide this exploit in an innocent looking support call. GLPI supports HTML emails, which get rendered (almost) normally within the interface. Simply hiding the text in an attribute or the <head> or something will keep it invisible to the tech. You’ve just gotta wait for them to assign it to themselves.

In the end, this isn’t a zero-click flaw but it is easily hidden. If you hide the exploit and it doesn’t work out the first time (a tech doesn’t assign it to themselves) you can easily try again with another ticket until it works. Odds are the techs aren’t going to read through the raw HTML of each ticket looking for problems.
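The GLPI code itself isn’t examined in this post, so what follows is only a generic sketch of how this class of bug behaves: a value is stored safely, then later read back and spliced into a new SQL statement. It uses Python’s built-in sqlite3 module rather than PHP/MySQL, and the table, column and function names are all made up for illustration. It is not GLPI’s actual query.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory ticket table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, content TEXT)")
    # Logging the tickets uses bound parameters, so the odd-looking
    # string in ticket 3 is stored without any side effects.
    conn.executemany(
        "INSERT INTO tickets (id, content) VALUES (?, ?)",
        [
            (1, "Printer is out of toner"),
            (2, "Cannot log in"),
            (3, "Header text';-- \""),
        ],
    )
    return conn


def contents(conn):
    """Return every ticket description, ordered by id."""
    return [row[0] for row in conn.execute("SELECT content FROM tickets ORDER BY id")]


def assign_unsafely(conn, ticket_id):
    """Re-save a ticket by gluing its stored content into a new statement."""
    (content,) = conn.execute(
        "SELECT content FROM tickets WHERE id = ?", (ticket_id,)
    ).fetchone()
    # The stored quote closes the string early, the semicolon ends the
    # statement, and "--" comments out the WHERE clause that followed.
    sql = "UPDATE tickets SET content = '" + content + "' WHERE id = " + str(ticket_id)
    conn.execute(sql)


def assign_safely(conn, ticket_id):
    """Same operation, but the content is passed as a bound parameter."""
    (content,) = conn.execute(
        "SELECT content FROM tickets WHERE id = ?", (ticket_id,)
    ).fetchone()
    conn.execute("UPDATE tickets SET content = ? WHERE id = ?", (content, ticket_id))


if __name__ == "__main__":
    for assign in (assign_unsafely, assign_safely):
        conn = make_db()
        assign(conn, 3)
        print(assign.__name__, contents(conn))
```

In the unsafe version every ticket ends up with the text that appeared before the injection string, which is the same symptom described above. In the safe version nothing else is touched, because the stored text is only ever treated as data.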

Reporting it – late to the party

I hopped over to GLPI’s GitHub page to check for an existing issue and log my own if one didn’t exist when what do I see but 9.4.6! I check the changelog and find this:

Looks like this may have already been fixed!

Well, darn. I downloaded and installed the update and can confirm the issue no longer exists. Congrats to whoever spotted it first! Edit 2020-06-04: Twitter user @thetaphi took a look at this and found that it was spotted and/or accidentally fixed by a developer whilst fixing a separate issue.

As it is already solved I don’t really want to dig through the code and find the offending line or develop the exploit further. Edit 2020-06-04: some people have taken a look after @troyhunt tweeted about this issue. It is interesting (concerning?) that something this simple got through to release, especially when you consider the way to initiate the exploit is by assigning yourself the call. Why does the call description get parsed at all here?

Either way – if you’re running GLPI, make sure you’re on the latest release. Or look for alternative software.

G Suite Migration for Outlook Error 0x8004106b

TL;DR: You’re logging in using a non-primary alias for the account. Use the primary alias and your email will migrate smoothly.

We’re migrating around 100 email accounts that were on Exchange over to Google. This has involved changing some people’s aliases so that they match up with everyone else’s (<initials>.<id>@<domain>), however we’ve added their old address as an alias so they can still receive emails sent to their old address (eg: <fullname>@<domain>).

Due to some political reasons we’re unable to touch the Exchange server, so we’re using the G Suite Migration for Microsoft Outlook tool. It’s been fine up until today, when we started seeing the following:

Notice that there are the same number of Processed and Failure items

What was odd was that this was only happening for one of the techs. He must have been doing something wrong! We monitored his steps and it all looked fine though. Taking a look into the trace file we could see the error appearing as Failed with 0x8004106b. Googling this didn’t turn up any usable results, but as we were looking into it another tech appeared and just happened to glance at the screen as the original tech was attempting to migrate a user again. Our spreadsheet contained some info, including the primary email address and the alias, for us to check, and the second tech noticed that the guy with the sync issues was using the non-primary alias to log in. After trying the primary alias, the emails migrated successfully.

Interestingly, the Contacts and Calendar data synced correctly.
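The spreadsheet check that caught this could be automated before each migration. Below is a minimal sketch, assuming the spreadsheet has been loaded into a dictionary of primary address to aliases. The function name, the data shape and the example.org addresses are all hypothetical, and nothing here talks to Google or the migration tool.

```python
def resolve_login(login, accounts):
    """Return (primary_address, is_alias) for a login address.

    accounts maps each primary address to a list of its aliases.
    Returns (None, False) when the address is not known at all.
    """
    wanted = login.strip().lower()
    for primary, aliases in accounts.items():
        if wanted == primary.lower():
            return primary, False
        if wanted in (alias.lower() for alias in aliases):
            return primary, True
    return None, False


if __name__ == "__main__":
    # Hypothetical rows from the migration spreadsheet.
    accounts = {"ab.1234@example.org": ["alice.bloggs@example.org"]}
    primary, is_alias = resolve_login("alice.bloggs@example.org", accounts)
    if is_alias:
        print("Log in as", primary, "instead, this address is only an alias")
```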

Switched ISP

I’ve changed ISP. We used to be with Zen who were, and remained, awesome. We have recently however changed to Andrews & Arnold.

Due to our location we’re unable to get FTTC, we’re stuck with ADSL. The main reason for the ISP switch initially was A&A could provide us the same service for nearly £10 cheaper. I’m a fan of the smaller organisations and use them over behemoths where possible. Unfortunately the bigger ISPs can go even cheaper than what we pay now but I’m happy to pay more to a smaller company.

Before switching I called up A&A with a few questions, and they answered everything – they know what they’re talking about. I work in IT (though not at an ISP or in networking) and could tell they knew their stuff, and I was only talking to their customer service team. They also sent me a new router (as I signed up for 12 months) which is nice – I had been running on a Draytek 2925n which, whilst generally fine, is a bit old now. No 5GHz wifi was hampering us somewhat. The new router seems to be good, though I’ve not poked around too much in it just yet. We still don’t get a wireless signal in all corners of the house but I have plans to fix that… eventually.

One cool thing that A&A do is provide access to some quite detailed line status and configuration settings via their Control Pages. I like the openness and control they provide to customers. The static IPv4 address and IPv6 support should be an option from all ISPs by now but frustratingly isn’t. A&A have this as standard at no extra cost.

An upside to switching to A&A is that our average download speed has increased slightly. We used to see 2.5Mbps download with Zen but this has increased to about 2.9Mbps with A&A. We’re still occasionally buffering when streaming (though Netflix is often fine, we’ve had issues with some other online streaming services) though the frequency at which this occurs has reduced a bit.

Overall, happy so far. The line has gone down a few times, which we didn’t really experience with Zen (or notice? I can see when it goes down on their control pages, plus I get an email about it if it’s down for a little while, which is awesome)

I would recommend anyone looking for an ISP to avoid the larger companies and look at either Zen or A&A. You can’t go wrong with these.

Hacktoberfest

I finally have a GitHub.com account.

I’ve been learning Git at work via an internal GitLab installation for my PowerShell/PHP scripts and it’s going quite well!

I have been told about Hacktoberfest – a few people I know have signed up and I thought “hey, why not give it a go?” It’s a perfect excuse to start contributing to projects I make use of. Though I’m no coder, I can probably do some documentation work, and maybe some basic PowerShell stuff too if I’m feeling brave.

Eventually I’d like to put some personal projects on there, give back to the world a little bit, but for now I’ll just be logging issues or fixing small easy problems in other people’s repos 😀

You never know, I might get a T-shirt!

SIMS(.Net) is slow

I should preface this technical post by saying that I am in no way a database dude. I understand what they are and how they work at a basic level and sure, I’ve written some basic SQL queries in PHP or queried some stuff in MSSQL but as a typical “jack-of-all-trades” type I am no expert by any stretch of the imagination.

This is also relevant for any database, not just SIMS.Net.

What is SIMS?

Slow.

For those of you who don’t know, SIMS(.Net) is a Capita application used by a whole bunch of schools to record data about their staff, students and their parents. It’s a school MIS.

It’s renowned for being very slow. There are many posts on the edugeek.net forum complaining about this, and there are many suggested ways of fixing it. Okay, that’s not entirely true. Lots of these fixes may grant you a slight boost, but the backend of SIMS was written a very long time ago (decades) and at the end of the day it’s just a slow bit of software. That said, there are some things that can be done to help, and this post outlines one of them.

The Database File

SIMS is a Windows client-server application. The server side runs from a single database, though there are some additional features which have their own databases.

Typically, the school’s IT department will not install the SIMS server. It’s quite finicky and likes things set up in a very particular way, and is often handled by a third party organisation.

The client application uses a shared network drive for some data storage. Typically, at least in every school I have seen, this is set up on the same drive on the server as the database is stored in. This is generally bad practice when it comes to databases – you should put your database file on a disk/drive that has nothing else on it. No OS, no file storage, no nuthin’!

There are a couple of reasons for this. The most obvious is that while a file on that same disk is being written to or read from, the disk can’t service the database at the same time. Your application must wait its turn before the database can be queried. Schools don’t really care about this as the slowdown is not noticeable by end users given the volume of non-database data being read or written, but on systems where every milli- or microsecond counts this can be a big bottleneck, depending on how often the other files on the disk are used.

The second reason is related to the first, and it has to do with how the database file itself is initially set up. You can define the size of a database when you create it. You should really set it to be bigger than the expected end size of the database when it’s time to migrate it to a new system or replace it. Typically, if you know that in 3 years you’re going to migrate away from the system and it will have approx 10GB of data in the database, you will probably want to add 20-50% extra on top of this when you first set up the database to allow the data to grow into the file. So you go make a 15GB database and can sit happy knowing that you’ll likely never touch the sides.

You can set the database up to dynamically grow once you’ve filled it up, too. This is called “Autogrowth”. You start with, say, a 20MB database, but this quickly fills up. MSSQL is configured to allow this to grow, so it increases the file size of the database in chunks. These chunks can be consistently sized (add 1MB each time) or a percentage of the current size (eg: grow by 10% of the current filesize. A 100MB database grows into a 110MB database, which then grows into a 121MB database, and so on).

The default growth increment for a database file is 1MB. This means that if you have a heavily used database with loads of data going in often, MSSQL will need to constantly grow the database by 1MB each time. This is obviously going to add some overhead to operations, however there’s another side effect to this when you have other files being used on the same drive as the database.

If you have a 50MB database, then write some files to the drive, then add more data to the database such that the database needs to increase in size, you’re going to end up with 50MB of database data on the disk in a chunk, a bunch of files next to it, then next to that another chunk of database file data. You see a single 51MB file, but at a disk level there’s one 50MB chunk, random file data, then another 1MB chunk. This is called fragmentation, and it means that for the hard drive to pull data out of the database it needs to move a little needle (if you’re using old spinning disks, like us. What about SSDs? Glad you asked.) a greater distance and more often to get at the data you want. This slows things down.
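The layout described above can be shown with a toy model. This sketch treats the disk as a simple list of blocks that is only ever appended to, which is a big simplification of how NTFS really allocates space, and the function names and block labels are invented for the example.

```python
def simulate_disk(db_growths, db_blocks_per_growth, other_blocks_between):
    """Lay out blocks in write order: a database growth, then other files."""
    disk = []
    for _ in range(db_growths):
        disk.extend(["db"] * db_blocks_per_growth)
        disk.extend(["file"] * other_blocks_between)
    return disk


def count_fragments(disk, owner):
    """Count the separate contiguous runs of blocks belonging to owner."""
    fragments = 0
    previous = None
    for block in disk:
        if block == owner and previous != owner:
            fragments += 1
        previous = block
    return fragments


if __name__ == "__main__":
    # Database alone on the disk: every growth lands next to the last one.
    print(count_fragments(simulate_disk(1000, 1, 0), "db"))
    # Shared disk: other files are written between each growth.
    print(count_fragments(simulate_disk(1000, 1, 1), "db"))
```

With nothing else on the disk the database stays in one piece however many times it grows. As soon as other writes land between growths, each growth becomes its own fragment.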

What the Frag!?

Where I work, I recently found that the database had been set up with unlimited growth in 1MB increments. We went from a near-zero sized database at initialisation to what is currently a 26GB file. Each time we added data that pushed the database file to full, MSSQL would only ever add 1MB to the size of the file.

This, combined with the use of a network drive which has been configured on the same disk, has given us a horribly fragmented database!
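As a rough back-of-the-envelope check on those numbers, the sketch below counts how many autogrowth events it takes to reach a given file size. It is a simplified model: sizes are in MB, the 20MB starting size for the percentage case is borrowed from the earlier example rather than from our server, and any rounding MSSQL applies to growth sizes is ignored.

```python
def growth_events(start_mb, target_mb, fixed_mb=None, percent=None):
    """Count growth events needed for a file to reach target_mb.

    Pass exactly one of fixed_mb (grow by a set amount each time) or
    percent (grow by a percentage of the current size each time).
    """
    if (fixed_mb is None) == (percent is None):
        raise ValueError("pass exactly one of fixed_mb or percent")
    if percent is not None and start_mb <= 0:
        raise ValueError("percentage growth needs a non-zero starting size")
    size = float(start_mb)
    events = 0
    while size < target_mb:
        if fixed_mb is not None:
            size += fixed_mb
        else:
            size += size * percent / 100
        events += 1
    return events


if __name__ == "__main__":
    target = 26 * 1024  # 26GB expressed in MB
    print("1MB steps:", growth_events(0, target, fixed_mb=1))
    print("10% steps:", growth_events(20, target, percent=10))
```

Growing from empty to 26GB in 1MB steps takes 26,624 separate growth events, each one a chance for other files to land in between. Percentage growth from a 20MB start gets there in fewer than a hundred.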

There’s a SysInternals tool called contig.exe which allows you to query an individual file or filesystem and show you how fragmented it is.

Following this guide I queried the database to see how fragmented it was. Keep in mind, here, that the author of that guide made a 3.5GB test database and purposefully fragmented it 2000 times to show a “heavily fragmented” example.

Here are the results from my test.

PS C:\temp\contig> .\Contig.exe -a "E:\MS SQL SERVER 2012\MSSQL11.SIMS2012\MSSQL\DATA\SIMS.mdf"

Contig v1.8 - Contig
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals

E:\MS SQL SERVER 2012\MSSQL11.SIMS2012\MSSQL\DATA\SIMS.mdf is in 102951 fragments

Summary:
     Number of files processed:      1
     Number unsuccessfully procesed: 0
     Average fragmentation       : 102951 frags/file

Yeah. 102951 fragments for a single file. That’s insane. Every time the database is queried it likely needs to navigate around the disks dozens of times to get all the relevant data, slowing things down considerably.
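If you want to check a handful of database files in one go, the fragment count can be pulled out of Contig’s output with a small script. Below is a minimal Python sketch. It assumes the “is in N fragments” line always looks like the v1.8 output shown above (I haven’t checked other versions) and that Contig has already been run once interactively so it doesn’t stop to ask about the licence. The function names are my own.

```python
import re
import subprocess

# Matches lines such as "E:\path\SIMS.mdf is in 102951 fragments"
FRAGMENT_LINE = re.compile(r"^(?P<path>.+) is in (?P<count>\d+) fragments?$", re.MULTILINE)


def parse_fragments(output):
    """Return {path: fragment_count} from the text Contig prints in -a mode."""
    return {
        match.group("path").strip(): int(match.group("count"))
        for match in FRAGMENT_LINE.finditer(output)
    }


def analyse(contig_exe, target):
    """Run Contig in analysis mode (-a) against target and parse what it prints."""
    result = subprocess.run(
        [contig_exe, "-a", target], capture_output=True, text=True, check=True
    )
    return parse_fragments(result.stdout)


if __name__ == "__main__":
    # Parse a copy of the output above instead of running Contig for real.
    sample = r"""Contig v1.8 - Contig
Sysinternals

E:\MS SQL SERVER 2012\MSSQL11.SIMS2012\MSSQL\DATA\SIMS.mdf is in 102951 fragments
"""
    for path, count in parse_fragments(sample).items():
        print(f"{count} fragments: {path}")
```

Files that Contig doesn’t report with an “is in N fragments” line simply won’t appear in the result.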

Fixing Fragmentation

We can use the contig.exe tool to fix this. It requires that the database is offline so I’ve had to wait until this weekend to do this.

I took the database offline (via the MSSQL Server Management Studio GUI) and attempted to defrag the database file.

PS E:\MS SQL SERVER 2012\MSSQL11.SIMS2012\MSSQL\DATA> C:\temp\contig\Contig.exe .\SIMS.mdf

Contig v1.8 - Contig
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals


Summary:
     Number of files processed:      1
     Number of files defragmented:   1
     Number unsuccessfully procesed: 0
     Average fragmentation before: 102951 frags/file
     Average fragmentation after : 3 frags/file

After waiting anxiously for 40 minutes the results of the operation came through. We went from a database split into 102951 fragments to a database split into just 3.

I revived the database by bringing it back online, verified it was working, then carried on with my weekend. All in all, it only took me about an hour and a half on a Saturday to sort this out.

Results

Before I did the work over the weekend I took some rudimentary speed tests of the SIMS application to determine if the change actually had any effect.

I performed two tests three times. They wouldn’t stand up in a court of law but they’re good enough for my purposes.

I timed how long it would take to log in – hitting ‘Enter’ on the username/password window to getting the logged in GUI loaded – and I also ran a report on our on-role students containing their forename, surname, year & registration group code. A simple report.

Both of these tests were performed at approx 11am on a weekday. Nothing else was going on at the time (no large reports) but SIMS was actively being used by a few dozen staff. The results are in seconds.

Before defrag – Friday

Logging in

  • 16.7
  • 7.1
  • 9.9

Report

  • 18.6
  • 15.7
  • 18.7

After defrag – Monday

Logging in

  • 10.6
  • 5.7
  • 5.8

Report

  • 14.1
  • 11.9
  • 14.3

As you can see, the defrag has likely (assuming there was no other unknown reason for the slow results initially) shaved a number of seconds off each run. The first login of the day is always the slowest as some stuff gets cached on your machine, but even that was sped up (by over 6 seconds!)

Overall, whilst this is neither conclusive nor scientifically sound, I am happy with this approximately 30% speed boost and have begun looking into our other databases to see if they’re suffering from the same fragmentation.
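For anyone who wants to check my maths, here is the arithmetic as a short Python sketch. It assumes the first three Friday figures are the logins and the last three are the report runs, and it compares the average of each set of three runs.

```python
from statistics import mean

# Timings in seconds, three runs of each test.
before = {"login": [16.7, 7.1, 9.9], "report": [18.6, 15.7, 18.7]}
after = {"login": [10.6, 5.7, 5.8], "report": [14.1, 11.9, 14.3]}


def percent_less_time(before_runs, after_runs):
    """Percentage reduction in the mean time between two sets of runs."""
    before_mean = mean(before_runs)
    after_mean = mean(after_runs)
    return (before_mean - after_mean) / before_mean * 100


if __name__ == "__main__":
    for test in before:
        saving = percent_less_time(before[test], after[test])
        print(
            f"{test}: {mean(before[test]):.1f}s -> {mean(after[test]):.1f}s "
            f"({saving:.0f}% less time)"
        )
```

On those averages the login comes out at roughly 34% less time and the report at roughly 24%.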

I would advise you take a look at your SIMS database, too. Hell, any database you have. If you’re a database guru you likely know about this already and are laughing at my realisation, but it was genuinely surprising news to me, though obvious in hindsight. I know about file fragmentation, I just didn’t think about it in the context of automatic database growth.

Being able to say “I’ve increased the speed of one of our most critical applications by between 24% and 40%” sure feels good. Though it should never have been set up this way in the first place…

Behind the Times? Git Gud

I’ve recently decided to learn Git.

Yes yes I know, I’m over a decade late to the party. I haven’t taken a look at source control since I first played with SVN many (many) moons ago. I haven’t bothered for a few reasons. Mainly, I’ve not had a use for it. Though I have written some scripts for work and whatnot, I’ve not needed the collaborative advantages of using the tool, and neither have I really needed the version control side either.

Don’t get me wrong, it probably would have been useful, however I’ve not missed it or wanted the features it boasts until recently.

However, times are a-changin’. Some of the techs at work have started using my scripts over the last year and they’re beginning to identify issues or quirks which I would ignore or didn’t encounter. I wrote these scripts, so I know how to use them almost instinctively. These issues just don’t show their head for me, or when they do it’s sort-of by design and I don’t hesitate to work around it.

Since I’m now making small changes sporadically, and looking ahead I’m beginning to automate even more things now that I occasionally have a sliver of free time, I’ve decided to jump head first into Git.

I’ve built an Ubuntu 18.04 VM at work and installed GitLab onto it. (Slight tangent, but their installation guides are very good.)

The Continuous Integration and Continuous Delivery stuff fascinates me, but I’ve disabled them from running on each project by default as I need to focus on learning Git first. I’m eager to learn more about this, though.

I’m going to make you cringe, but I’ve opted to use a GUI front end on my machine instead of relying on the CLI. This is because it seems to be a bit of a pain in the ass to run CLI git on a Windows machine (we’re a Windows network) and, although I will learn the commands eventually, I want to focus on the best practices of using Git rather than mess about with the command line syntax. The syntax is very simple, but I don’t trust my brain to remember it right now.

I’ve chosen to use the GitHub GUI for now. It works pretty well. I’ve moved six of my currently active scripts (all PowerShell) onto GitLab and have pushed commits to the projects.

I’ve also created a project for our network switch configs. I don’t know if this is something GitLab can do or if some other kind of automatic deployment technology is needed, but it’d be cool to make a change to the repository and for that change to be automagically applied to the relevant hardware. I can think of ways to script that myself, but is there a purpose built tool out there?

I’ve got lots of questions to ask and lots of avenues to explore. For now, though, I’m keeping it simple with version control & branching.

I’m considering eventually creating a public GitHub repo to put code out into the world. It would take some work to de-work-ify the existing scripts and remove any workplace data, but I could also eventually upload these scripts, too.

Boiler High Heat

We had a bit of a warm spell recently. One side effect for us was that some of our sites’ boiler control systems started to panic in an unexpected way.

Each boiler is fitted with a fire alarm which links into the site-wide system. Unfortunately it got so hot in a couple of the storage-heater boiler rooms that the temperature sensor started flipping out, thinking there were flames causing the heat. Luckily, we caught the alarm within the 3 minute silent alarm window, before emergency services were automatically contacted and the site-wide alarm sounded, prompting an evacuation.

Overriding a fire alarm long term is not a good idea, so we had to find a way to reduce the temperature in these small, cramped, dusty rooms.

Our solution was simple, but I liked it a lot – turn off the heating element, then run around the building turning on all the hot water taps.

This refilled the hot water storage tanks with cold water. The taps being on carried all the heat away from the room and within minutes the room had cooled significantly. It is a bit of a waste of water, which is a shame, and there was no hot water in the building for the afternoon, but that beats evacuating out from our cool offices into the high-thirty-C outdoors.

The simple fixes are the best.

Withholding number on external calls on Avaya IP Office R5 Manager

It took some googling to figure out, but eventually I pieced together the answer to this problem from a couple of old forum posts. I’m writing it here on the off chance someone else needs this. Or if I need it again.

  • Log into the IP Office R5 Manager
  • Navigate to the Short Code tree
  • Find the “?” entry – the feature name is “Dial”
  • Change the “Telephone Number” field to one of the following depending on what you want
    • “.” – this will withhold the number you’re dialing from
    • “.S<number>” – this will show <number> when you dial

<number> should be your number. If I set it to a number I don’t own, it just shows the one I’m calling from.


Sometimes a business has some system or piece of infrastructure that just works. This is a good thing! Unfortunately, some businesses don’t like to invest in technology unless that technology breaks.

In related news, I recently had to figure out how to toggle a setting on some old software, Avaya IP Office R5. As someone who is totally not a phone/VOIP person this was a challenge. Add on to that the fact that the software isn’t very well designed and you have me fumbling about trying to understand terminology and remember where obscure settings are just to do the basics.

This software was managing a 15 year old phone system, and since its inception had been showing as a withheld/private number when calling out to external phones. Not sure why this was ever set, but it’s off now.