Cybersecurity and You

Another long gap between posts. I'm starting to accept that long gaps with occasional exceptions are simply the norm, rather than deluding myself into thinking I'm going to post regularly. My untimeliness aside, this post covers another project that is very important to me: the Capstone Project of my Bachelor’s Degree.

Going into this project my biggest issue was coming up with an idea. I was confident in my ability to execute whatever I came up with, but I regularly run headfirst into creative blocks when it comes to having ideas. While struggling to come up with anything, I remembered an old project for my Computer Security class and decided to take it even further.

In my Computer Security class, as part of a unit about obfuscation, my professor provided us with texts scrambled with various ciphers, including a Caesar Cipher and a Permutation Cipher. The only information we had was that the unscrambled texts were all famous speeches. While that would definitely have been enough to decode them by hand given a long afternoon, I decided to take a different route and instead spent the next 48 hours designing a (frankly user-antagonistic) tool capable of solving any¹ Permutation Cipher.

For those unfamiliar with a Permutation Cipher, it is the same as a Caesar Cipher except that the order of the alphabet is not preserved: each letter can be swapped for any other letter instead of being shifted by a fixed amount. With a Caesar Cipher, there are 26 possible alphabets, something any modern device can cycle through nearly instantaneously. With a Permutation Cipher, there are 26! = 4.032914611*10²⁶ valid alphabets. If you had a circle that many meters wide, it would be roughly half as wide as the observable universe. Dealing with numbers that large means brute forcing through the alphabets is not practical. Instead, this problem must be approached with strategy.
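To put those two key counts side by side, here is a quick TypeScript sketch. It is not code from the project; the guess rate of one billion alphabets per second is a made-up figure chosen only to show the scale, and ordinary floating-point numbers are used because only the magnitude matters here.

```typescript
// Sketch only: compares the number of alphabets for each cipher.
// Doubles cannot hold 26! exactly, but they are precise enough to show the scale.
const ALPHABET_SIZE = 26;

// n! computed iteratively.
function factorial(n: number): number {
  let result = 1;
  for (let i = 2; i <= n; i++) {
    result *= i;
  }
  return result;
}

const caesarKeys = ALPHABET_SIZE;                 // one alphabet per shift
const permutationKeys = factorial(ALPHABET_SIZE); // one alphabet per ordering of the 26 letters

// Hypothetical guess rate, assumed purely for illustration.
const guessesPerSecond = 1e9;
const secondsPerYear = 60 * 60 * 24 * 365;
const yearsToExhaust = permutationKeys / guessesPerSecond / secondsPerYear;

console.log(caesarKeys);                       // 26
console.log(permutationKeys.toExponential(4)); // "4.0329e+26"
console.log(yearsToExhaust.toExponential(1));  // years needed at the assumed rate
```

Even at that generous assumed rate, walking through every alphabet would take more than ten billion years.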

The original inspiration for my approach to this problem was a game known as Picross (also called Nonograms). Picross is a visual puzzle game played on a grid, where each row and column is labeled with a series of numbers. An example would be the label 3 1 4, which means there are three filled squares in a row, followed by one filled square, followed by four filled squares in a row. Those numbers reveal nothing about the empty spaces beyond the fact that neighboring groups are separated by at least one empty square: they just dictate how many groups there are, how big the groups are, and the overall order of the groups. When you have constraints like those across both the X and Y axes, a well designed puzzle only has one valid set of filled in squares.

I have played a fair bit of Picross, and while there are plenty of logic shortcuts you can take to find valid filled in squares, there is one absolute strategy that will always allow you to make progress on a well designed puzzle. If you simply plot all valid positions for a row or column, there will be some boxes that are filled in every single arrangement. Those boxes are filled in the final solution. This cannot be applied one-to-one to the Permutation Cipher, but the idea of looking at the possibilities to reduce the number of potential valid solutions can absolutely be transferred.
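The "plot every valid position and keep what overlaps" strategy can be sketched for a single row like this. The function names are my own for illustration, and this is a plain brute-force enumeration of one line, not an optimized Picross solver.

```typescript
// Sketch only: enumerate every way to place the clue groups in one line.
function placements(clues: number[], length: number): boolean[][] {
  const results: boolean[][] = [];

  function place(clueIndex: number, start: number, line: boolean[]): void {
    if (clueIndex === clues.length) {
      results.push(line.slice());
      return;
    }
    const size = clues[clueIndex];
    // Cells needed by this group and all later ones, with one gap between each pair.
    const needed =
      clues.slice(clueIndex).reduce((a, b) => a + b, 0) + (clues.length - clueIndex - 1);
    for (let pos = start; pos + needed <= length; pos++) {
      const next = line.slice();
      for (let i = 0; i < size; i++) next[pos + i] = true;
      place(clueIndex + 1, pos + size + 1, next); // +1 leaves the mandatory gap
    }
  }

  place(0, 0, new Array<boolean>(length).fill(false));
  return results;
}

// A cell is forced when it is filled in every valid placement.
function forcedCells(clues: number[], length: number): boolean[] {
  const all = placements(clues, length);
  return Array.from({ length }, (_, i) => all.length > 0 && all.every((p) => p[i]));
}

// The label 3 1 4 on a line that is 12 squares long.
const forced = forcedCells([3, 1, 4], 12);
const forcedIndexes = forced.map((f, i) => (f ? i : -1)).filter((i) => i >= 0);
console.log(forcedIndexes); // [2, 8, 9]
```

With only two squares of slack, the group of three and the group of four each overlap themselves in every arrangement, so squares 2, 8, and 9 (counting from zero) are guaranteed to be filled.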

I designed a series of scripts that could be used in tandem to unscramble text resulting from a Permutation Cipher. The starting set of possible alphabets is technically 26²⁶, which is larger than the actual valid number of alphabets referenced earlier. When we start out on a Permutation Cipher problem, we don’t know the replacements for any letter. Any of the 26 letters of the alphabet could go to A, any of them could go to B, etc. The one thing we know is that you will not have multiple letters mapping to A. Unfortunately, this does not allow us to eliminate any possibilities yet. Until we start analyzing the text, we have to assume every single letter could map to every other letter as we risk eliminating a valid possibility otherwise.

We need to get the number of possible alphabets far below both the 26²⁶ starting place and the 4.032914611*10²⁶ valid alphabets, as neither is a number of candidates a computer can check in any reasonable amount of time. We will reduce this to a manageable number of potential alphabets using the idea that some letters are restricted just based on language, just as Picross can be solved because boxes are restricted just based on logic.

While English is a large language that is still growing, for the purposes of everyday language and this tool we can use a snapshot of the language from a very large dictionary as a “definitive list of all words in the English language.” Then we can look at all the words present in the text and figure out possible translations. For example, take the phrase: HIJD J TSSA. This is a good set of words to analyze for a couple of reasons: it contains a one-letter word, and it contains a word with a repeated letter. The word J can reasonably be assumed to be one of I, A, O. This observation alone reduces the number of possibilities from 26²⁶ to 3*26²⁵, or 7.1032149*10³⁵. While that number is not small enough yet, we dropped nearly an entire power of 10 just by looking at one letter. A logical next step is to look at HIJD, as now we know it is a four letter word where the third letter is one of I, A, O. When examining all the four letter words for which that is true, we can then construct the possible solutions for the letters H, I, & D from those words. This addition again reduces the solution set by orders of magnitude. Finally, we look at the word TSSA, unique because it has a repeating letter in the middle. We can draw conclusions about potential solutions for every letter in the word based on all words that match that pattern.
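Here is a minimal sketch of that narrowing step on the HIJD J TSSA example. This is a reconstruction of the idea rather than the project's code: the tiny word list stands in for a real dictionary, and every name in it (`pattern`, `narrowByWord`, `searchSpace`) is hypothetical. It assumes the input only contains the letters a to z.

```typescript
// Sketch only: the word list and every name here are hypothetical.
type Candidates = Map<string, Set<string>>;

const ALPHABET = "abcdefghijklmnopqrstuvwxyz".split("");

// Tiny stand-in for a real dictionary.
const WORDS = ["a", "i", "o", "that", "this", "what", "have", "then", "from",
  "good", "need", "been", "book", "door"];

// Every cipher letter starts out able to map to every plaintext letter.
function initialCandidates(): Candidates {
  const candidates: Candidates = new Map();
  ALPHABET.forEach((letter) => candidates.set(letter, new Set(ALPHABET)));
  return candidates;
}

// Repetition pattern of a word: "tssa" and "good" both give "0.1.1.2".
function pattern(word: string): string {
  const seen = new Map<string, number>();
  return word.split("").map((ch) => {
    if (!seen.has(ch)) seen.set(ch, seen.size);
    return seen.get(ch);
  }).join(".");
}

// Keep only the plaintext letters that some matching dictionary word allows.
function narrowByWord(cipherWord: string, words: string[], candidates: Candidates): void {
  const letters = cipherWord.toLowerCase().split("");
  const target = pattern(letters.join(""));
  const matches = words.filter((w) =>
    w.length === letters.length &&
    pattern(w) === target &&
    letters.every((c, i) => candidates.get(c)!.has(w[i])));
  if (matches.length === 0) return; // word is not in the list, so learn nothing
  letters.forEach((c, i) => {
    const allowed = new Set(matches.map((w) => w[i]));
    const current = Array.from(candidates.get(c)!);
    candidates.set(c, new Set(current.filter((p) => allowed.has(p))));
  });
}

// Upper bound on remaining alphabets: the product of the candidate counts.
function searchSpace(candidates: Candidates): number {
  let product = 1;
  candidates.forEach((set) => { product *= set.size; });
  return product;
}

const candidates = initialCandidates();
narrowByWord("J", WORDS, candidates);
const afterOneLetter = searchSpace(candidates);
narrowByWord("HIJD", WORDS, candidates);
narrowByWord("TSSA", WORDS, candidates);

const show = (c: string) => Array.from(candidates.get(c)!).sort().join("");
console.log(afterOneLetter.toExponential(4)); // "7.1032e+35"
console.log(show("j"), show("i"), show("s")); // "aio" "hr" "eo"
```

With this toy word list, the single-letter word cuts J down to three options, HIJD then limits the cipher letter I to H or R, and the doubled letter in TSSA limits S to E or O. A real dictionary would leave far more candidates per letter after three words, which is why the process needs plenty of text to work with.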

Now we simply repeat that process over and over again until we lock a letter in. Once a letter only has one possibility, that has implications for every other letter, as none of them can include the locked-in letter any longer. This again limits the size of the solution set, and if you continue doing this you will either arrive at only one possible alphabet, or a small enough number that a computer can brute force through the remaining options.
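The locking step can be sketched as a small propagation loop. Again, this is an illustration under my own naming (`propagateLocks` is hypothetical), not the code from the project, and it only handles the "one possibility left" rule described above.

```typescript
// Sketch only: once a cipher letter has a single candidate, remove that
// plaintext letter from every other cipher letter, and repeat until stable.
type Candidates = Map<string, Set<string>>;

function propagateLocks(candidates: Candidates): Candidates {
  // Work on a copy so the caller's sets are left untouched.
  const result: Candidates = new Map();
  candidates.forEach((set, cipher) => result.set(cipher, new Set(set)));

  const processed = new Set<string>();
  let changed = true;
  while (changed) {
    changed = false;
    result.forEach((set, cipher) => {
      if (set.size !== 1 || processed.has(cipher)) return;
      processed.add(cipher);
      const locked = Array.from(set)[0];
      result.forEach((other, otherCipher) => {
        // Set.delete returns true only when something was actually removed.
        if (otherCipher !== cipher && other.delete(locked)) changed = true;
      });
    });
  }
  return result;
}

// Made-up starting point: x is already locked to e.
const before: Candidates = new Map<string, Set<string>>();
before.set("x", new Set(["e"]));
before.set("y", new Set(["e", "t"]));
before.set("z", new Set(["e", "t", "a"]));

const after = propagateLocks(before);
after.forEach((set, cipher) => console.log(cipher, Array.from(set).join("")));
// x e
// y t
// z a
```

Locking x to e leaves y with only t, which in turn leaves z with only a, so one lock cascades into a full solution for these three letters. If the input were contradictory, a set could end up empty, which a real tool would need to treat as a dead end.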

I have described this process without code snippets from the original, as the remnants of this project from my Computer Security class are lost to me. Those original terrible scripts were more than enough for the assignment, and they became the basis for my Capstone. I decided to make a website with very basic Cybersecurity information and a few little tools, including a revamped version of my Permutation Cipher cracking tools. The finished site can be seen here (be warned that though this site looks better than previous projects in terms of UI, it does still leave something to be desired, as my struggles with web design plagued me at that time and still do to this day). The source code for the site is available here.

We’ve mostly covered the Permutation Cipher related functionality of the site. It is essentially everything from that Computer Security project with a workable UI slapped on top of it. Beyond that, there are a few other bits of functionality I built in that I would like to spend some time discussing. Using similar UI elements from the Permutation Cipher section, there is a Caesar Cipher section. A Caesar Cipher tool requires no analysis, as its paltry 26 possible alphabets are easily brute forced. I’ve added functionality that creates an accuracy rating for every answer based on the percentage of words that are in either my default dictionary or a user-provided one, and then sorts the possible alphabets by accuracy. I also created a password complexity tool to help explain the effect that variety and length have on passwords, as well as a brute forcing tool to show how insecure or secure a sample password can be.
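The Caesar Cipher ranking described above can be sketched roughly like this. It is an approximation of the idea, not the site's source: the names are hypothetical and the four-word dictionary is a stand-in for a real one.

```typescript
// Sketch only: names and the tiny word list are hypothetical.
interface CaesarGuess {
  key: number;   // how far each letter was shifted back
  text: string;  // decoded candidate
  score: number; // fraction of words found in the dictionary, 0 to 1
}

// Shift every a-z letter back by `key`; everything else is left alone.
function decode(text: string, key: number): string {
  return text.toLowerCase().replace(/[a-z]/g, (ch) => {
    const shifted = (ch.charCodeAt(0) - 97 - key + 26) % 26;
    return String.fromCharCode(97 + shifted);
  });
}

// Fraction of the words in `text` that appear in the dictionary.
function score(text: string, dictionary: Set<string>): number {
  const words = text
    .split(/\s+/)
    .map((w) => w.replace(/[^a-z]/g, ""))
    .filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const known = words.filter((w) => dictionary.has(w)).length;
  return known / words.length;
}

// Try all 26 keys and sort the results from most to least plausible.
function rankCaesar(ciphertext: string, dictionary: Set<string>): CaesarGuess[] {
  const guesses: CaesarGuess[] = [];
  for (let key = 0; key < 26; key++) {
    const text = decode(ciphertext, key);
    guesses.push({ key, text, score: score(text, dictionary) });
  }
  return guesses.sort((a, b) => b.score - a.score);
}

const dictionary = new Set(["the", "quick", "brown", "fox"]);
const ranked = rankCaesar("wkh txlfn eurzq ira", dictionary);
console.log(ranked[0]); // { key: 3, text: "the quick brown fox", score: 1 }
```

Sorting by score means the user sees the most plausible decoding first but can still scroll through the other 25 if the dictionary misses a word or two.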

As far as tech stacks go, this was a big step up for me from previous projects as it was made using Svelte, a UI framework, with a UI template that I added. This did make a significant difference in the look of the project, as anything from any template far surpasses my visual design abilities. All of the scripts are done in TypeScript, which is as always my preferred web language.

Frankly, there’s not much else to be said for this project. Most of my love and appreciation for it comes entirely from the Permutation Cipher cracker, as it was the beginning of everything and I had a really good time coming up with it. Everything else on the site was just an excuse to call the whole thing big enough for my Capstone so I could remake the Permutation Cipher tools. I would give this project an 8/10, as it showed me that I could definitely step up my UI game just by using existing frameworks and allowed me to spend time on a problem and solution that were very fun for me.

This is only reasonably true so long as there is text of sufficient length to work with. The processing speeds up drastically with every additional word available for analysis. Word complexity and variety are also beneficial; when the words are of varying lengths and complexity there is more to analyze, and thus it is easier to reduce the solution space. ↩︎