Decryption Test


The researcher used a more complicated problem to test the ability of the genetic algorithm to search a large search space. The problem involved taking a document encrypted by a simple swap encryption and decrypting it.

Gene Type

The gene type for this problem was not complicated, but rather consisted of a 26 member array of characters representing a possible decryption string. The strings were not required to be an alphabet, however, and were allowed to have repeated letters. This was done because of the difficulty of maintaining the integrity of the strings during cross breeding had they been required to have 26 non-repeating characters. Crossbreeding for this problem was performed in the same way as in the integer test.

Determination of Fitness

The fitness of the string was determined by using the string to decrypt a 200 words from the encrypted document. The decrypted word was passed to a hash table of words and the closest match was found. The strings fitness was then increased by the number of letters that did not match between the word it sent to the hash table and the closest match found in the hash table. Therefore, those strings with low raw fitnesses were closer to the correct solution than those with high raw fitnesses. Since the generic genetic algorithm treats genes with higher fitneeses as better adapted, the raw fitnesses were normalized between 0 and 100 and their values were flipped(i.e. the gene with the largest raw fitness now had a fitness of 0 and the gene with the lowest now had a fitness of 100).

a 26 member array of characters representing a possible decryption string. The strings were not required to be an alphabet, however, and were allowed to have repeated letters. This was done because of the difficulty of maintaining the integrity of the strings during cross breeding had they been required to have 26 non-repeating characters. Crossbreeding for this problem was performed in the same way as in the integer test.

Genetic Algorithm Parameters

Cross breeding was performed on 60% of the population while straight breeding was performed on the remaining 40%. The population size was 5000 and the program ran for 50 generations. The genetic algorithm was given five letters to start with. This was to simulate the number of letters that could be easily deduced by finding words such as 'I' and 'the.' The fitness test was set to 0% tolerance. In other words if an exact match for a decrypted word was not found the word was counted as being entirely wrong. This test was run twice.

Result

In the first test, the algorithm found all letter but b, q, and x. In the second test, the algorithm found all letters except b, f, j, and x. Since both decryptions produced readable text from a search space of 26! possible combinations, the test was considered a success.