Creating a better wordlist for cracking Danish passwords

In my work I need to crack password hashes acquired during various types of security assessments for Danish companies. A lot of these password hashes can be cracked using existing dictionaries and leaked passwords, but brute-forcing the remaining ones is very time-consuming and not always fruitful.

A lot of dictionaries, password leaks and well-described methodology exist for cracking passwords in an English-language context, but very little material is available targeting a Danish one. This blog post will (hopefully) be the first in a series of random ramblings that explore password cracking in a Danish context.

An increasing number of users create long, secure passphrases using multiple dictionary words, with or without special characters. Various blog posts describe cracking this type of passphrase for English using Hashcat’s combination attack, rules and masks. Multiple free English dictionaries exist for these types of attacks, but the Danish ones I have found have been very limited in size or not freely available.
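To make the idea concrete, here is a minimal Python sketch of what a combination attack does conceptually: every word from one list is joined with every word from another to form candidate passphrases. The function name `combine`, the `separator` parameter and the sample words are assumptions made for illustration only; this is not Hashcat's implementation, just the principle behind it.

```python
from itertools import product


def combine(left_words, right_words, separator=""):
    """Yield every left + separator + right candidate passphrase."""
    for left, right in product(left_words, right_words):
        yield f"{left}{separator}{right}"


# Tiny hypothetical wordlists; a real attack would use far larger ones.
left = ["rød", "grød"]
right = ["med", "fløde"]

candidates = list(combine(left, right))
print(candidates)  # ['rødmed', 'rødfløde', 'grødmed', 'grødfløde']
```

The number of candidates is the product of the two list sizes, which is why the quality and coverage of the underlying wordlist matters so much for this kind of attack.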

A couple of years back I had an idea to try to change this, but the project ended up in the drawer of unfinished projects until now. Now it will serve as an excuse to blog for the first time in over a decade and force me to get better at sharing what I learn, for both personal reference and the benefit of others.

In a previous life I studied computational linguistics and was introduced to text corpora, which are used to study the actual use of words and language based on real human-written texts. These text corpora exist for most languages but, most interestingly for our purposes, a number of Danish research projects have created and released their data sets under very permissive licenses.

I approached Det Danske Sprog- og Litteraturselskab (The Danish Language and Literature Society) in 2018 and got permission to create a new freely available Danish wordlist based on a number of Danish text corpora they helped fund. This was on the condition that the wordlist be maintained with proper academic references to the original works.

In my next post I will describe the contents of the works I used and my first experiments with extracting dictionary words, which resulted in a wordlist with over one million unique Danish words.
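As a rough illustration of the general idea only (not the process used to build the actual wordlist, which is left for the next post), extracting unique words from a piece of corpus text could be sketched in Python like this. The function name `extract_words`, the `min_length` parameter and the sample sentence are hypothetical, and the regular expression assumes that only the letters a-z plus æ, ø and å are of interest, which would drop loanwords containing other accented characters.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Danish lowercase letters (a-z, æ, ø, å).
WORD_RE = re.compile(r"[a-zæøå]+")


def extract_words(text, min_length=2):
    """Return a dict mapping each unique lowercase word to its frequency."""
    counts = Counter(WORD_RE.findall(text.lower()))
    return {word: n for word, n in counts.items() if len(word) >= min_length}


# Hypothetical sample text standing in for a real corpus file.
sample = "Rødgrød med fløde. Fløde på rødgrød!"
words = extract_words(sample)

print(sorted(words))  # ['fløde', 'med', 'på', 'rødgrød']
```

Keeping the frequency counts around, rather than only the set of words, makes it possible to sort the final wordlist so the most common words are tried first.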

If you have any ideas for improvement or want to collaborate on creating Danish wordlists and methodology, please get in touch on Twitter or by email.