fyr.io

SPF And DMARC At Primary Schools

A few months ago someone I know working at a primary school was having email troubles, and happened to mention this to me in passing. I took a look and noticed no SPF or DMARC records were present. Whilst unrelated to the issue, this was concerning. SPF and DMARC are pretty basic things to get in place which help prevent spam being sent from your domain, and they don't cost anything either. I passed on my thoughts and left it at that.

But it kept nagging me... if this primary school was pretty bad on the SPF/DMARC front, are any primary schools doing a decent job? Are most? None? I envisioned a malicious actor emailing from a nice friendly local school domain, convincing locals (including parents of course) to just click this link or donate for this trip... Or worse.

I decided to find out.

Skip to the results if you want the good stuff

Scanning all primary schools

The first task is, of course, knowing what to scan. What are the primary schools in the UK? Can I even determine their email domains?

Yes, because the UK government maintains a public list of educational institutions! Not only that, but they make it available as a .csv file which contains loads of data, primarily the school type (primary, secondary, etc.), a link to their website (if any) and whether they're currently open. Handy!

Now, I'm only doing this roughly. This isn't a scientific study, nor am I seeking perfection and complete data accuracy. It was a bodge of a few PowerShell scripts knocked together, so make allowances for the quality :)

I downloaded the biggest .csv file from the gov.uk site and generated a much nicer .csv file containing the 16708 currently operating primary schools. Of these, 308 didn't have a website, so I removed them from the list, leaving 16400.
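The filtering step described above can be sketched roughly as follows. This is Python rather than the original PowerShell, and the column names and values are assumptions about the export's layout, so check them against the header row of the file you actually download. The sample rows are made up.

```python
import csv
import io

# Assumed column names and values; verify against the real header row.
NAME_COL = "EstablishmentName"
PHASE_COL = "PhaseOfEducation (name)"
STATUS_COL = "EstablishmentStatus (name)"
WEBSITE_COL = "SchoolWebsite"


def open_primaries_with_website(rows):
    """Yield (name, website) for open primary schools that list a website."""
    for row in rows:
        if (row.get(PHASE_COL) or "").strip() != "Primary":
            continue
        if (row.get(STATUS_COL) or "").strip() != "Open":
            continue
        website = (row.get(WEBSITE_COL) or "").strip()
        if not website:
            continue  # no website listed, nothing to scrape
        yield (row.get(NAME_COL) or "").strip(), website


# Tiny in-memory stand-in for the real file, with made-up schools.
sample = io.StringIO(
    "EstablishmentName,PhaseOfEducation (name),EstablishmentStatus (name),SchoolWebsite\n"
    "Example Primary,Primary,Open,www.example-primary.test\n"
    "Closed Primary,Primary,Closed,www.closed.test\n"
    "No Site Primary,Primary,Open,\n"
    "Example Secondary,Secondary,Open,www.example-secondary.test\n"
)
targets = list(open_primaries_with_website(csv.DictReader(sample)))
print(targets)
```

For the real file you would pass `csv.DictReader(open(path, newline="", encoding=...))` instead of the in-memory sample; the right encoding depends on the download.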

I have my target list ready!

Though wait a moment, you eager beaver you. From experience, I know that a very large percentage of school website domains do not match up to their email domains. This is due to several reasons (renames, council-issued domains, Trust membership, etc.). Suffice it to say, I can't rely on the website domain being the same one that emails from that particular school come from. So I did the only thing I could think of: I wrote up another little script to grab the HTML contents of the school homepage and parse it for an email address.

Of the 16400 schools with a website listed, 3632 of them had some kind of error when I tried to grab their website data (website timeout, parsing problem, 500 error code, etc.) or I couldn't find an email address on the homepage. I should note that if I didn't find an email address on the homepage, I also parsed the HTML for a link with "contact" in the URL, grabbing that page's HTML and parsing it for email addresses, to try and catch sites that only publish an email address on the contact page (of which there are many).
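A minimal sketch of the "look for a contact link" fallback, assuming the homepage HTML has already been fetched. The class and function names here are hypothetical, and it skips mailto: links so that an address like contact@... isn't mistaken for a page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ContactLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags whose URL contains 'contact'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        lowered = href.lower()
        if "contact" in lowered and not lowered.startswith("mailto:"):
            self.links.append(href)


def first_contact_url(base_url, html):
    """Return the absolute URL of the first 'contact' link, or None."""
    finder = ContactLinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.links[0]) if finder.links else None


page = '<nav><a href="/about">About</a> <a href="/Contact-Us">Contact</a></nav>'
print(first_contact_url("https://school.example/", page))
```

The returned URL would then be fetched and scanned for an email address in the same way as the homepage.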

My approach wasn't perfect. A bunch of these websites look like they have valid email addresses available when viewed in the browser, but they hide the actual address behind JavaScript to deter spam (or deter annoying people like me). I couldn't be bothered to unpick or defeat these, so they're excluded from the results.

I originally just regex'd email addresses out of the HTML, however I was catching a load of obviously-unrelated email addresses (like in HTML comments containing license or contact info for a JavaScript library) as well as other random not-email-addresses, so I resorted to the easier but less complete method of looking for a 'mailto:' link. This has further excluded many results, as many sites I scanned don't wrap their email address in an anchor tag like this:

<a href="mailto:email@address.tld">email@address.tld</a>
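Pulling the domain out of the first 'mailto:' link might look something like this. It's a sketch using Python's standard HTML parser rather than the original script, and the names are hypothetical. Because it only looks at anchor tags, addresses sitting in HTML comments are ignored, which is the problem the regex approach ran into.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class FirstMailto(HTMLParser):
    """Remember the address from the first <a href="mailto:..."> on a page."""

    def __init__(self):
        super().__init__()
        self.address = None

    def handle_starttag(self, tag, attrs):
        if self.address is not None or tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        if href.lower().startswith("mailto:"):
            # Drop any ?subject=... query and undo percent-encoding.
            target = href[len("mailto:"):].split("?", 1)[0]
            self.address = unquote(target).strip()


def first_mailto_domain(html):
    """Return the lowercased domain of the first mailto: link, or None."""
    parser = FirstMailto()
    parser.feed(html)
    if not parser.address or "@" not in parser.address:
        return None
    return parser.address.rsplit("@", 1)[1].lower()


page = """
<!-- library contact: dev@some-js-lib.example -->
<p>Email the office: <a href="mailto:Office@School.Example?subject=Hi">office</a></p>
<a href="mailto:other@thirdparty.example">other</a>
"""
print(first_mailto_domain(page))
```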

That left me with 10344 (probably) email domains, which is pretty good. I should add that I only took the first 'mailto:' link, so the list includes some pages where the first published address isn't owned or used by the primary school and perhaps belongs to a third party. Unfortunately, since a large percentage of school website domains do not match their email domain, I can't spot these by comparing the two, so they shall remain in the data as I won't be going through unpicking them.

I wrote my own parser to scan these records, which worked sort of okay but not great. I quickly realised that someone else had already done the parser logic work for me though! So, using the DomainHealthChecker PowerShell module, I scanned all these 'mailto:' domains, 14 of which had errors of their own (pretty much "DNS record doesn't exist, yo!"), leaving us with 10330 domains checked for SPF and DMARC.
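For anyone unfamiliar with where these records live: SPF is a TXT record on the domain itself beginning with v=spf1, and DMARC is a TXT record on _dmarc.<domain> beginning with v=DMARC1. The sketch below shows that lookup shape only; it is not how DomainHealthChecker works internally. The `lookup_txt` callable is a hypothetical stand-in for whatever DNS library you use, and the zone data is fake.

```python
def is_spf(value):
    """True if a TXT value is an SPF record (first term is v=spf1)."""
    parts = value.split()
    return bool(parts) and parts[0].lower() == "v=spf1"


def is_dmarc(value):
    """True if a TXT value is a DMARC record (first tag is v=DMARC1)."""
    first_tag = value.split(";", 1)[0]
    return "".join(first_tag.split()).lower() == "v=dmarc1"


def find_spf_and_dmarc(domain, lookup_txt):
    """Collect SPF records at the domain and DMARC records at _dmarc.<domain>.

    lookup_txt is a hypothetical callable: name -> list of TXT strings
    (an empty list when nothing exists).
    """
    spf = [v for v in lookup_txt(domain) if is_spf(v)]
    dmarc = [v for v in lookup_txt("_dmarc." + domain) if is_dmarc(v)]
    return {"domain": domain, "spf": spf, "dmarc": dmarc}


# Fake zone data standing in for real DNS answers.
fake_dns = {
    "school.example": [
        "google-site-verification=abc",
        "v=spf1 include:_spf.mail.example -all",
    ],
    "_dmarc.school.example": ["v=DMARC1; p=none"],
}
result = find_spf_and_dmarc("school.example", lambda name: fake_dns.get(name, []))
print(result)
```

Injecting the lookup keeps the record-picking logic testable without touching real DNS.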

Results

Here are the results. Note that there is some overlap, as some records matched multiple fail states. There were also some failures during scanning, not sure why. Anyway, the important thing is that the totals won't match up to the number of scanned domains.

SPF

Generally, SPF was pretty decent! 67.4% are sufficiently strict (-all), with a further 26% being moderately okay (~all), but there's still work to be done. I haven't resolved or parsed individual mechanisms to ensure they're resolvable or formatted correctly (for example, misspelt domain names, incorrect IPs, no valid A or MX record on a domain, etc.).
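The strictness buckets above come down to the qualifier on the record's "all" mechanism. A rough sketch of that classification, with a hypothetical function name and made-up records, and, like the scan itself, making no attempt to validate the other mechanisms:

```python
def spf_all_qualifier(record):
    """Return the qualifier on the 'all' mechanism: '-', '~', '?', '+', or None."""
    for term in record.split()[1:]:  # skip the leading v=spf1
        term = term.lower()
        if term == "all":
            return "+"  # a bare 'all' has the default qualifier, '+'
        if len(term) == 4 and term[0] in "+-~?" and term[1:] == "all":
            return term[0]
    return None  # no 'all' mechanism (the record may use redirect= instead)


LABELS = {
    "-": "hard fail (strict)",
    "~": "soft fail",
    "?": "neutral",
    "+": "pass everything",
    None: "no all mechanism",
}

for rec in [
    "v=spf1 include:_spf.mail.example -all",
    "v=spf1 mx ~all",
    "v=spf1 a all",
    "v=spf1 redirect=_spf.mail.example",
]:
    print(LABELS[spf_all_qualifier(rec)], "<-", rec)
```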

DMARC

Oh boy. Just over 31% of domains have something decent going on that accounts for both domains and subdomains. Only 9% of primary school domains reject on both the domain and subdomain front. Come on, this isn't good enough! From worst to best (imo):
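In record terms, "reject on both the domain and subdomain front" comes down to two tags: p (the policy for the domain) and sp (the policy for subdomains, which falls back to p when it's absent). A minimal sketch of reading them, with a hypothetical function name and made-up records:

```python
def dmarc_policies(record):
    """Return (domain_policy, subdomain_policy) from a DMARC record string."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            # Lowercased for easy comparison; only p and sp are used below.
            tags[key.strip().lower()] = value.strip().lower()
    p = tags.get("p")
    sp = tags.get("sp", p)  # sp falls back to p when it is absent
    return p, sp


for rec in [
    "v=DMARC1; p=reject; sp=reject; rua=mailto:reports@school.example",
    "v=DMARC1; p=quarantine",
    "v=DMARC1; p=reject; sp=none",
    "v=DMARC1; p=none",
]:
    print(dmarc_policies(rec), "<-", rec)
```

A record like p=reject; sp=none protects the main domain while leaving every subdomain open, which is why both tags matter.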

Closing

So yeah, as expected, DMARC is pretty terrible, though SPF was better than I expected. There's clearly work to be done on this front. I wonder what the state of secondaries, colleges and universities is... hopefully better? It'd be pretty easy for me to find out now I have all these scripts sitting here...

I have the data if anyone wants it, just let me know. Though to be honest it's pretty easy to generate it yourself.

If you want to check SPF, DMARC, DKIM, and other related things on your (or anyone's) domain yourself, check out mxtoolbox.com.

Wait hold up, why no DKIM?!

Determining the DKIM subdomain (the selector) isn't really feasible without receiving an email from the domain in question, and I'm not about to spam all these domains hoping for a reply (well, a more sensible idea would be to email a junk mailbox and hope for an undeliverable reply; iirc that will have the required info in the message headers). You can probably guess a good 50-80% of them maybe, but again I'm not going to be the source of abuse, particularly when it comes to DNS. I've seen some random domain key subdomains when looking into this, and in the past. Things like ns427365412._domainkey.domain.tld, so good luck brute forcing those.
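To show why a received message is enough: the DKIM-Signature header carries the signing domain in its d= tag and the selector in its s= tag, and the public key lives at <selector>._domainkey.<domain>. A sketch of pulling that out of a raw message, where the function name is hypothetical and the message is made up:

```python
import re
from email import message_from_string


def dkim_dns_names(raw_message):
    """Return the DNS names holding the DKIM keys referenced by a message."""
    msg = message_from_string(raw_message)
    names = []
    for header in msg.get_all("DKIM-Signature", []):
        # Tags are separated by ';' and may be folded across lines.
        tags = {}
        for part in header.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                tags[key.strip().lower()] = re.sub(r"\s+", "", value)
        if "s" in tags and "d" in tags:
            names.append(f"{tags['s']}._domainkey.{tags['d']}")
    return names


raw = (
    "From: office@school.example\n"
    "DKIM-Signature: v=1; a=rsa-sha256; d=school.example; s=ns427365412;\n"
    " h=from:subject; bh=AAAA; b=BBBB\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(dkim_dns_names(raw))
```

The resulting name is what you would then query for a TXT record to see the published key.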