Genetics and Representation
Genetic algorithms are essentially an optimization strategy. However, unlike most domain-specific optimization techniques (for example, gradient descent), they do not rely on any particular representation. This doesn't mean that representation is unimportant for genetic algorithms; quite the contrary. It just means there is a lot of freedom in the choice of knowledge representation. A large proportion of chapter discusses knowledge representation, and this is another direct application.
Because biology is the inspiration for the field of genetic algorithms, much of its terminology is borrowed, too. This makes the field convenient to explain, but once again, this intuitiveness can lead to unfounded assumptions. These are pointed out when appropriate.
Genome and Genotype
The basis of evolution is genetic code, composed of a sequence of genes. This sequence is known as a genome. Each species has a different kind of genome, with various lengths and different genes. There is one important thing to note about the genome: It describes the overall structure of the genetic code (its meaning), not the actual values.
Genomics is the study of the meaning of these genes for a particular species. The genomes found within a species are almost identical. Each gene occurs at the same position (or locus) in the genome, as shown in Figure 32.1.
Figure 32.1. Conceptual representation of a genotype, displayed as a sequence of letters representing the genes.
In each individual creature, these genes carry specific values known as alleles; an allele is one of the possible forms a gene can take. The sequence of alleles is called the genotype. The genotype therefore holds actual information, not just the structure (which is the genome). The field of genetics studies the role of alleles (forms of genes) in biological inheritance.
As programmers, we can understand the link between genome and genotype as a class/instance relationship. The genotype is a particular instantiation of the genome, with each gene assigned values as alleles.
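To make the class/instance analogy concrete, here is a minimal sketch in Python (our choice of language; the text itself has no code). The names `Genome`, `Genotype`, `instantiate`, and the gene names are hypothetical. The genome stores only structure (which gene sits at each locus and which alleles it may take), while each genotype stores one concrete allele per locus.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Genome:
    """Species-wide structure: which gene sits at each locus and which
    alleles each gene may take. It holds no individual's values."""
    loci: tuple[str, ...]                 # gene name at each position
    alleles: dict[str, tuple[str, ...]]   # permitted forms of each gene

    def instantiate(self, values: dict[str, str]) -> Genotype:
        """Create one genotype, checking that every locus gets a legal allele."""
        if set(values) != set(self.loci):
            raise ValueError("a genotype must assign an allele to every locus")
        for gene, allele in values.items():
            if allele not in self.alleles[gene]:
                raise ValueError(f"{allele!r} is not an allele of {gene!r}")
        # Store the alleles in locus order.
        return Genotype(self, tuple(values[gene] for gene in self.loci))


@dataclass
class Genotype:
    """One individual: a concrete allele at every locus of its genome."""
    genome: Genome
    values: tuple[str, ...]

    def allele_at(self, gene: str) -> str:
        return self.values[self.genome.loci.index(gene)]


# Usage with hypothetical genes:
species = Genome(loci=("colour", "size"),
                 alleles={"colour": ("red", "blue"), "size": ("small", "large")})
individual = species.instantiate({"colour": "blue", "size": "small"})
print(individual.values)  # ('blue', 'small')
```

Many genotypes can be instantiated from the same genome, just as many objects are created from one class.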
The human genetic blueprint is a genome, and each of us comes from one genotype. In our case, the genotype is a set of chromosomes contained within each cell. Chromosomes are found in most living creatures, which explains why some people use the word "chromosome" in the context of evolutionary algorithms. Strictly speaking, this usage is often incorrect, because chromosomes are only one particular option (found by nature) for representing a genotype. Still, most people will understand this terminology as well. We'll use the term "genotype" for the sake of semantic accuracy, historical reference, and convenience (it's a shorter and unambiguous word).
Generally, the genome is a highly implicit way to store information. For humans, it contains all the information needed for our cells to develop into a fully grown body. We're still somewhat amazed at how the genetic code allows a collection of cells to grow into a "working" human being. Genetic algorithms borrow these concepts.
The information in a genotype is typically compact and must somehow be "interpreted." To do this, human cells use the genetic code to decide how to behave, which indirectly drives the growth of the body. However, such indirect development is very difficult to simulate in a way that still produces useful solutions to problems.
Genetic algorithms use much more explicit formalisms to represent the genotype, because these prove easier to work with in practice. However, the principles remain: the genotype contains the essential information about a solution, usually in the most compact form possible. The idea is that a solution to the problem can be built from the genotype.
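As a toy illustration of an implicit encoding, the sketch below assumes a simple string-rewriting scheme in the style of an L-system; this technique is our illustrative choice, not one named in the text. The genotype is only a small table of growth rules, and the phenotype (the "body") appears only after those rules are applied over several growth steps. The function `develop` and the cell symbols are hypothetical.

```python
from __future__ import annotations


def develop(axiom: str, rules: dict[str, str], steps: int) -> str:
    """Grow a phenotype from an implicit genotype (the rewriting rules).

    At every growth step, each symbol ("cell") is replaced by the string its
    rule produces; symbols without a rule are copied unchanged.
    """
    body = axiom
    for _ in range(steps):
        body = "".join(rules.get(cell, cell) for cell in body)
    return body


# Hypothetical genotype: "A" cells divide into "AB", "B" cells mature into "A".
genotype = {"A": "AB", "B": "A"}
print(develop("A", genotype, 4))  # ABAABABA
```

Two short rules produce a body whose size grows with every step, and changing a single rule alters the whole result. That is the compactness of an implicit encoding, and also why it is hard to steer toward useful solutions.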
Phenome and Phenotype
Phenetics is the study of the physical properties and morphology of creatures, regardless of their genetic background. Although this field of science isn't very popular, it attempts to provide an understanding of the concept of the phenome. The phenome is the general structure of a creature's body: cells, nerves, veins, brain, organs, skin, and so on. This can be seen as a set of physical attributes.
In reality, phenomes can be very complex and very detailed, especially for humans. There are different levels at which the phenome can be modeled, each with increasing complexity (cells, organs, limbs), as shown in Table 32.1. In fact, the human phenome is not fully understood by biology, anatomy, or medicine!
The phenotype is a particular instance of a phenome, namely a unique creature like each of us. Each of the physical attributes will be associated with a particular value.
Once again, these are very high-level properties. The phenotype itself is basically every cell in our body. Unlike the genotype, the phenotype changes quite rapidly over time. (Cells die, organs get injured, the body grows and ages.)
The phenotype is the product of the genotype and the constraints of the environment. For genetic algorithms, this is an explicit representation of the solution. In fact, it's the representation that will be applied to solve the problem, directly or using a third-party algorithm. As such, the structure of the phenotype is often designed to be convenient to manipulate, so that solving the problem is easy.
As discussed, in humans the genotype specifies the way each cell grows, indirectly creating the phenotype. As such, attributes of the phenotype aren't determined by a single gene; rather, they are the observable effect of all the genes on the organism.
The genotype has a strong influence on the outcome of the phenotype, but not an exclusive influence. Because the genes provide a very implicit form of controlling growth, the environment itself has a strong part to play in the equation. This too leads to observable characteristics in the organism (for instance, physical conditioning or accidents).
In artificial intelligence (AI), things are nowhere near as complex. There is usually a direct mapping between genotypes and phenotypes, and the AI engineer needs to devise a function to handle this conversion: the representation of the genotype must be converted to a suitable phenotype (see Figure 32.2). In most cases, this happens by decoding the same information into a different data structure.
Figure 32.2. The conversion from a genotype to a phenotype, showing the relationship to the evolutionary algorithm and the problem itself.
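A minimal sketch of such a genotype-to-phenotype conversion function, assuming a bit-string genotype in which each fixed-width group of bits encodes one real-valued parameter of the phenotype. The 8-bit width, the `decode` function, and the `Phenotype` class are hypothetical choices for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

BITS_PER_PARAM = 8  # hypothetical precision: 8 bits encode each parameter


@dataclass
class Phenotype:
    """Hypothetical phenotype: real-valued parameters that the problem
    (or a third-party algorithm) can use directly."""
    params: list[float]


def decode(genotype: list[int], ranges: list[tuple[float, float]]) -> Phenotype:
    """Map a flat bit-string genotype onto one value per (low, high) range."""
    if len(genotype) != BITS_PER_PARAM * len(ranges):
        raise ValueError("genotype length does not match the number of parameters")
    max_int = 2 ** BITS_PER_PARAM - 1
    params = []
    for i, (low, high) in enumerate(ranges):
        chunk = genotype[i * BITS_PER_PARAM:(i + 1) * BITS_PER_PARAM]
        as_int = int("".join(str(bit) for bit in chunk), 2)  # most significant bit first
        # Scale the integer 0..max_int linearly onto low..high.
        params.append(low + (high - low) * as_int / max_int)
    return Phenotype(params)


# Usage: 16 bits decode into two parameters.
print(decode([1] * 8 + [0] * 8, [(0.0, 1.0), (-5.0, 5.0)]).params)  # [1.0, -5.0]
```

The genetic operators only ever see the compact bit list; the problem only ever sees the decoded `Phenotype`.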
Because there is little distinction between genotype and phenotype in most simulations, why not disregard the genotypes and evolve the phenotypes directly? This would prevent the software from having to convert information from one format to another.
Using the representation of the solution for evolution is becoming common practice. This is sometimes referred to as an evolutionary data structure, although it's really about phenotype-specific operators. It's often much easier to adapt the genetic operators to work with different data structures than it is to encode and decode phenotypes. This is discussed in the next two sections.
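As a sketch of phenotype-specific operators, assume the phenotype is simply a list of real-valued parameters. Mutation and crossover can then act on the floats directly, with no encoding or decoding step. Gaussian mutation and blend (arithmetic) crossover are common choices for real-valued representations, but they are our choice here and not necessarily the operators the next sections describe; the function names are hypothetical.

```python
from __future__ import annotations

import random


def gaussian_mutation(params: list[float], ranges: list[tuple[float, float]],
                      sigma: float = 0.1, rng: random.Random | None = None) -> list[float]:
    """Perturb each parameter with Gaussian noise, then clamp it to its range.
    sigma is expressed as a fraction of each range's width."""
    if rng is None:
        rng = random.Random()
    child = []
    for value, (low, high) in zip(params, ranges):
        value += rng.gauss(0.0, sigma * (high - low))
        child.append(min(high, max(low, value)))
    return child


def blend_crossover(mum: list[float], dad: list[float],
                    rng: random.Random | None = None) -> list[float]:
    """Arithmetic crossover: each child value is a random weighted mix of the
    two parents' values, so it lies (up to rounding) between them."""
    if rng is None:
        rng = random.Random()
    child = []
    for a, b in zip(mum, dad):
        w = rng.random()
        child.append(w * a + (1.0 - w) * b)
    return child


# Usage: the operators work on the phenotype itself; no decode step needed.
rng = random.Random(42)
ranges = [(0.0, 1.0), (-5.0, 5.0)]
parent_a, parent_b = [0.2, -1.0], [0.8, 3.0]
child = gaussian_mutation(blend_crossover(parent_a, parent_b, rng), ranges, rng=rng)
```

The clamping step is where knowledge of the phenotype's structure pays off: a bit-flip on a genotype knows nothing about legal ranges, whereas this mutation can respect them directly.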
The simplicity of the conversion between genotypes and phenotypes is one of the reasons why genetic algorithms do not have the power of biological evolution; the representation is nowhere near as sophisticated. Nature has taken billions of years to evolve a robust representation, as well as a sophisticated way to transform it into living organisms. In AI, representations are chosen by engineers in a few hours, and even implicit encodings are designed by hand.