THE DESIGN AND DEVELOPMENT OF A GENERIC GENETIC ALGORITHM IN C++

By Michael Smit



me at the ISEF

The basic premise:


A genetic algorithm is a method of solving problems that is loosely based on the Darwinian model of evolution. Basically stated, this model says that in any given population each individual in the population has a �fitness� based on how well adapted they are to the environment in which they live. This fitness is directly related to how many opportunities they have to reproduce, thus causing a predominance of individuals with higher fitnesses after many generations. In the genetic algorithm there are two types of reproduction, straight breeding and cross breeding. In straight breeding, the gene string of a selected individual is simply copied into the next generation. In other words, a-sexual reproduction. In cross-breeding, however, the gene strings of two individuals are split and then recombined to produce two new individuals. This would be vaguely similar to sexual reproduction.

How this can be applied to every day problems:


Any problem that can be represented in such a way that
  1. It can be assigned a scalar fitness directly related to how close it is to the optimal solution for the search space and

  2. Its �gene string� is in a form that can be crossed with another string of the same type while maintaining the integrity of both strings produced can be �evolved� using a genetic algorithm in this way:

    1. A population of random �gene strings� is created.


    2. Each string�s measure of fitness is determined.


    3. Part of the population is selected probabilistically based on fitness and copied into the new generation.


    4. Part of the population is selected probabilistically based on fitness and then is cross-bred.


    5. The new generation is returned in place of the old one.


    6. steps 2-5 are repeated until a �gene string� that meets the termination criterion is found.



The purpose of this project:


The genetic algorithm is a powerful tool for solving problems. Unfortunately, it is not often used because the effort involved in programming a genetic algorithm for any one problem, defining the problem, and defining the �gene string� type is significant, causing most programmers to discard the genetic algorithm as an option.
The purpose of this project is to design and develop a generic genetic algorithm that can evolve any user defined types capable of being cross bred and straight bred and that comes with number of pre-programmed generic gene types. This would prevent the programmer from having to research, plan, program, and debug the genetic algorithm itself and in many cases even from having to program the gene string and thus half or more of the time usually required to use the genetic algorithm as a method of solving problems.
Back to the top

The design parameters


  1. The program had to run in Borland C++ 4.5


  2. The program had to be capable of handling user-defined types so long as they conformed to this template (Tree.h is a good example):

    class userClass{
    public:
    userClass( );
    ~userClass( ); //when necessary to free the memory taken by the object
    void go( int pos );
    userClass *cut( );
    void paste( userClass &auserClass );
    ..........
    ..........
    int numnodes( );
    const userClass &operator=( const userClass &auserClass );

    private:
    .........
    .........
    };

    userClass::userClass( )
    The class must have a default constructor which creates an empty instance of the object.

    userClass::~userClass( )
    When the default destructor is not sufficient to free all the memory allocated for the object, the class must include a destructor which does.

    userClass::void go(int pos)
    Moves the internal pointer of the class to the position pos between 0 and numnodes().

    userClass *userClass::cut( )
    Cuts the current instance of userClass at the position of the movable pointer and returns an instance of userClass containing the resultant piece. The movable pointer must stay in the same position.

    void userClass::paste(userClass &auserClass)
    Takes an instance of userClass, auserClass, created by the cut function and appends it to the current instance of userClass, having just undergone the cut function, at the point indicated by the movable pointer while maintaining the integrity of the current instance.

    int userClass::numnodes( )
    Returns the number of nodes or potential cut/paste points in this instance of userClass.

    const userClass &userClass::operator =(const userClass &auserClass)
    Makes the current instance of userClass an exact, entirely separate copy of auserClass.

  3. The program had to handle populations of any size within the memory limitations of the computer on which it was run.


  4. The program had to include functions that supported crossbreeding within the population.


  5. The program had to, when given all appropriate functions and an input population that has been evaluated in a static environment and that has individuals with different fitness values, produce an output population that had an average fitness that was higher than the average fitness of the input population when tested in the same environment.


Back to the top

Methodology


  1. Design both a simple static environment and a population type.


  2. Program the simple static environment for the genetic algorithm population as well as any functions required for an individual in the population to interact with the environment.


  3. Debug the environment.


  4. Program the population and those functions required to initialize and monitor the population.


  5. Test the population in the environment and debug both as necessary.


  6. Design the genetic algorithm engine and the program required to coordinate the environment, population, and genetic algorithm engine.


  7. program the genetic algorithm engine.


  8. Program the coordination program.


  9. Debug the genetic algorithm engine with test data.


  10. Combine all elements and debug as necessary.


  11. Repeat steps 1 - 5 and 10 for each new type of AI population and environment.


  12. Program and debug statistics package to interpret the data collected from test runs.


Back to the top

CONCLUSION

The program conformed to its design parameters as follows:

  1. The program had to run in Borland C++ 4.5. The genetic algorithm does not, technically, conform to this parameter. Because of the poor pointer debugging incorporated in the Borland compiler the researcher changed platforms to Microsoft Visual C++ 5.0. Programs will no longer run in Borland C++ 4.5 because that version of C++ does not include bool as a system data type. This is easily fixable, however, should another programmer wish to take the time and include the bool.h header file before all the include statements in their main file.


  2. The program had to support the use of independently programmed gene types and environments. The program does support user defined gene types that conform to the layout listed in the design parameters.


  3. The program had to handle populations of any size within the memory limitations of the computer on which it was run. Due to unavoidable logistical considerations, the generic genetic algorithm can only handle populations of up to half the available computer memory. This is because the genetic algorithm must store the old generation while creating the new one.


  4. The program had to include functions that supported crossbreeding within the population. The genetic algorithm does support crossbreeding.


  5. The program had to, when given all appropriate functions and an input population that has been evaluated in a static environment and that has individuals with different fitness values, produce an output population that has an average fitness that is higher than the average fitness of the input population when tested in the same environment. The integer test clearly showed that this parameter is satisfied.


Extras:

  1. The genetic algorithm has an overselection function.


  2. The genetic algorithm automatically detects if the strings are of fixed or variable length and acts accordingly.


  3. The Tree class is a generic, templated data tree type that can be used as a gene type with the genetic algorithm. All of the appropriate member functions to perform straight and cross reproduction have already been written. This in conjunction with the genetic algorithm cuts up to 80% of the time involved in setting up any genetic algorithm run that uses data trees as the gene type.


  4. The Gstring class is a generic, templated variable or fixed length gene string type that has all appropriate funtions included for use with the generic genetic algorithm. This type is not quite as useful as the Tree class, but it still saves a fair amount of programming time.


Back to the top

Future research

The researcher is currently working on improving the method by which the original random population is created with the data tree gene types. Though the current method is effective, the researcher wishes to take steps to produce more varied initial populations by combining a number of techniques. A more varied random starting population could significantly decrease the number of generations required to find a particular solution.

Another improvement will be the addition of more advanced time-saving methods. These include:

  1. Decimation, in which the program creates a very large starting population and then uses that population to create a smaller starting population based on fitness.


  2. Support for the random mutation of genes. The researcher is also currently developing additional generic gene types that can be used in conjunction with the genetic algorithm to solve many diverse problems. These types include:


    1. A generic multi-state automaton type. This type would be capable of evolving survival or game-playing strategies.


    2. A generic neural network type. A genetic algorithm could be used instead of a traditional learning algorithm to determine the weights of a neural network. Such a type could be used for such things as predicting stocks, evolving survival behavior, and basic image recognition and compression.


    3. A generic "creature" type to serve as a base type for organisms in artificial life simulations.


Future development opportunities also include the creation of generic programs. This would involve a program similar to the one used for the curve fit problem in which programmers would simply have to replace the code representing the original data points in the fitness evaluation function with their own and then run the program to find a curve fit. Such a generic program would virtually eliminate the need for programming and totally eliminate the time spent on researching genetic algorithm methods.

Other researchers can use this project for future development. This genetic algorithm can be used in any case where:

  1. Potential solutions can be represented in such a way as to conform to the gene type layout shown by the genetic algorithm para meters.


  2. Potential solutions can be assigned a scalar fitness that increases as the p otential solution approaches the correct answer.


Examples of such problems include data mining, time management systems, circuit building, development of improved aerodynamic design, artificial-life programs, virus protection, strategy development, and many other fascinating optimization problems. The ability of the genetic algorithm to search tremendous problem spaces with relatively few attempts makes it a highly lucrative solution technique for many of today's problems in not just computing, but in any number of other fields.

Back to the top