An Overview of Evolutionary Computation*

Xin Yao
Computational Intelligence Group, Department of Computer Science
University College, The University of New South Wales
Australian Defence Force Academy, Canberra, ACT, Australia 2600
Email: [email protected]

Abstract

This paper presents a brief overview of the field of evolutionary computation. Three major research areas of evolutionary computation will be discussed: evolutionary computation theory, evolutionary optimisation and evolutionary learning. The state of the art and open issues in each area will be addressed. It is indicated that while evolutionary computation techniques have enjoyed great success in many engineering applications, progress in theory has been rather slow. This paper also gives a brief introduction to parallel evolutionary algorithms. Three models of parallel evolutionary algorithms (the memory, island and cellular models) are described.

1 Introduction

The field of evolutionary computation has grown rapidly in recent years [1, 2, 3]. Engineers and scientists with quite different backgrounds have come together to tackle some of the most difficult problems using a very promising set of stochastic search algorithms: evolutionary algorithms (EAs). There are several different types of EAs: genetic algorithms (GAs) [4, 5], evolutionary programming (EP) [6, 7] and evolution strategies (ESs) [8, 9]. Each type has numerous variants due to different parameter settings and implementations. The answer to the question of which EA is best is problem dependent. There is no universally best algorithm which can achieve the best result for all problems. One of the challenges to researchers is to identify and characterise which EA is best suited to which problem.

EAs have two prominent features which distinguish them from other search algorithms. First, they are all population-based. Second, there are communications and information exchange among individuals in a population. Such communications and information exchange are the result of selection, competition and recombination. A simple EA can be summarised by Figure 1, where the search operators are often called genetic operators in GAs. Different representation or encoding schemes, selection schemes, and search operators define different EAs. For example, GAs normally use crossover and mutation as search operators, while EP only uses mutation. GAs often emphasise genetic evolution, while EP puts more emphasis on the evolution of behaviours.

*This work is partially supported by a University College Special Research Grant. The paper is based on an invited tutorial given at ICYCS'95. To be published in Chinese Journal of Advanced Software Research (Allerton Press, Inc., New York, NY 10011), Vol. 3, No. 1, 1996.

1. Generate the initial population P(0) at random, and set i = 0;
2. REPEAT
   (a) Evaluate the fitness of each individual in P(i);
   (b) Select parents from P(i) based on their fitness in P(i);
   (c) Apply search operators to the parents and get generation P(i+1);
3. UNTIL the population converges or the time is up

Figure 1: A Simple Evolutionary Algorithm.

1.1 Evolutionary Algorithms as Generate-and-Test Search

Although EAs are often introduced from the point of view of survival of the fittest and from the analogy to natural evolution, they can also be understood through the framework of generate-and-test search. The advantage of introducing EAs as a type of generate-and-test search algorithm is that the relationships between EAs and other search algorithms, such as simulated annealing (SA), tabu search (TS), hill-climbing, etc., can be made clearer and thus easier to explore. Under the framework of generate-and-test search, different search algorithms investigated in artificial intelligence, operations research, computer science, and evolutionary computation can be unified. Cross-disciplinary studies are expected to generate more insights into the search problem in general. A general framework of generate-and-test search is shown in Figure 2.

1. Generate the initial solution at random and denote it as the current solution;
2. Generate the next solution from the current one by perturbation;
3. Test whether the newly generated solution is acceptable:
   (a) Accept it as the current solution if yes;
   (b) Keep the current solution unchanged otherwise.
4. Goto Step 2 if the current solution is not satisfactory, stop otherwise.

Figure 2: A General Framework of Generate-and-Test.

It is quite clear that various hill-climbing algorithms can be described by Figure 2 with different strategies for perturbation. They all require the new solution to be no worse than the current one to be acceptable. SA does not have such a requirement: it regards a worse solution as acceptable with a certain probability. The difference among classical SA [10], fast SA [11], very fast SA [12], and a new SA [13] lies mainly in their perturbations, i.e., their methods of generating the next solution.

EAs can be regarded as a population-based version of generate-and-test search. They use search operators like crossover and mutation to perturb the current solutions, and use selection to decide whether a solution is acceptable. From this point of view, it is clear that we do not have to limit ourselves to crossover and mutation. In theory, we can use any search operators as long as they perform well on the given representation of the problem they are dealing with. This is also true for selection. In practice, a good way to tailor the generate-and-test search to the problem we are interested in is to incorporate problem-specific heuristic knowledge into search operators and selection schemes [14].
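To make the population-based generate-and-test view concrete, here is a minimal Python sketch of the loop in Figures 1 and 2. It is an illustration only: the sphere-like fitness function, binary tournament selection, and Gaussian perturbation are assumptions of this sketch, not prescriptions from the paper.

```python
import random

def evolve(fitness, dim=10, pop_size=20, generations=100, sigma=0.1):
    """A minimal population-based generate-and-test loop (cf. Figures 1 and 2):
    evaluate, select parents by binary tournament, perturb them by mutation."""
    pop = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(x), x) for x in pop]          # step (a): evaluate
        def parent():
            a, b = random.sample(scored, 2)              # step (b): binary tournament
            return a[1] if a[0] >= b[0] else b[1]
        pop = [[g + random.gauss(0, sigma) for g in parent()]   # step (c): perturb
               for _ in range(pop_size)]
    return max(pop, key=fitness)

# Usage: maximise a simple sphere-like function (optimum at the origin).
best = evolve(lambda x: -sum(g * g for g in x))
print(best)
```

Replacing the tournament with another acceptance test, or the Gaussian perturbation with another operator, yields the other members of the generate-and-test family discussed above.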

The rest of this paper is organised as follows: Section 2 discusses some theoretical issues in evolutionary computation, including the classical schema theorem, the convergence of EAs, neighbourhood search and landscape analysis, and computational complexity. Section 3 gives an overview of evolutionary optimisation. Some useful crossover and mutation operators, and selection schemes, will be discussed. Some hybrid algorithms which combine EAs with other search algorithms will also be discussed in terms of global versus local search and search biases. Section 4 is on evolutionary learning. An introduction to classifier systems is given, and population-based machine learning is mentioned with an emphasis on co-evolutionary learning. Section 5 describes three common models of parallel EAs. Finally, Section 6 concludes with a summary of the paper.

2 Some Theoretical Issues in Evolutionary Computation

One of the most important questions in evolutionary computation is: why does, and why doesn't, the evolutionary algorithm work? Many efforts have been made towards answering this question since the early days of EAs. One typical example is Holland's schema theorem for GAs [4]. Later, GA-hardness was investigated by many researchers in terms of GA deceptive problems [5]. In recent years, the analysis of neighbourhood structures and landscapes has attracted increasing attention from both the evolutionary computation field [15] and other fields like SA [16, 17, 18, 19].

2.1 The Schema Theorem

The schema theorem was first proposed by Holland [4] to explain how GAs work by propagating similarity templates in populations. A schema is a similarity template describing a subset of strings with similarities at certain string positions. Without loss of generality, let's assume that we are working with binary strings, i.e., our alphabet is {0, 1}. To generate schemata, we introduce a don't-care symbol * into the alphabet. Now we have an extended alphabet {0, 1, *}. Any string generated from this new alphabet is a schema. A schema matches a binary string if 1 matches 1, 0 matches 0, and * matches either 1 or 0 at all corresponding positions. Schemata offer us a more accurate way to describe and discuss similarities among strings.

In order to examine how schemata are propagated from one generation to another, two concepts need to be introduced. The order of a schema H, denoted as o(H), is the number of fixed (non-*) positions in the schema. The defining length of a schema H, denoted as $\delta(H)$, is the distance between the first and the last fixed position in the schema. It facilitates our discussion if we view one generation of a GA as having two steps: the first is selection and the second is crossover and mutation [20]:

Current Population ==(selection)==> Intermediate Population ==(crossover, mutation)==> New Population

2.1.1 Impact of Selection on Schema Propagation

Let $f(H)$ be the average fitness of the samples in the population which contain schema $H$, $\bar{f}$ be the average fitness of the population, and $m(H, t)$ be the number of samples in the population which contain schema $H$ at generation $t$. Then the impact of roulette wheel selection on schema propagation is

$$m(H, t + \text{intermediate}) = m(H, t)\,\frac{f(H)}{\bar{f}}$$

2.1.2 Impact of Crossover on Schema Propagation

Let $p_c$ be the one-point crossover probability. Then the impact of crossover on schema propagation is

$$m(H, t+1) = (1 - p_c)\,m(H, t)\,\frac{f(H)}{\bar{f}} + p_c\left[m(H, t)\,\frac{f(H)}{\bar{f}}\,(1 - \text{losses}) + \text{gains}\right]$$

The gains in the above formula represent the cases where a new sample containing schema $H$ is generated through crossover from two samples containing no $H$. For simplicity, we ignore this term. We also make a conservative estimate of the losses and assume that crossover within the defining length of the schema is always disruptive, although this is not always true. Hence we get the following inequality:

$$m(H, t+1) \geq (1 - p_c)\,m(H, t)\,\frac{f(H)}{\bar{f}} + p_c\,m(H, t)\,\frac{f(H)}{\bar{f}}\,(1 - \text{disruptions})$$

The probability of disruption can be calculated easily as

$$\text{Prob}_{\text{disruptions}} = \frac{\delta(H)}{l - 1}\left(1 - \frac{1}{n}\,m(H, t)\,\frac{f(H)}{\bar{f}}\right)$$

where $l$ is the string length and $n$ is the population size. After rearranging the formula we have

$$m(H, t+1) \geq m(H, t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{l - 1}\left(1 - \frac{1}{n}\,m(H, t)\,\frac{f(H)}{\bar{f}}\right)\right]$$

2.1.3 Impact of Mutation on Schema Propagation

Let $p_m$ be the bit-flipping mutation probability. We can calculate the probability of schema $H$'s survival (i.e., not being disrupted by mutation) as follows:

$$\text{Prob}_{\text{survival}} = (1 - p_m)^{o(H)}$$

2.1.4 Schema Theorem for the Simple Genetic Algorithm

Putting the previous three subsections together, we have the schema theorem for the simple GA with one-point crossover, bit-flipping mutation and roulette wheel selection [4, 5, 20]:

$$m(H, t+1) \geq m(H, t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{l - 1}\left(1 - \frac{1}{n}\,m(H, t)\,\frac{f(H)}{\bar{f}}\right)\right](1 - p_m)^{o(H)}$$

It shows that short, low-order, above-average schemata receive exponentially increasing numbers of samples in subsequent generations. While this is a very nice property of the GA in theory, the practical use of the theorem is limited. It is worth noting that the above schema theorem relies on a large population and a sufficient number of generations to obtain a satisfactory statistical estimate of a schema's fitness. In other words, the calculation of f(H) is very difficult and noisy in practice. It is unclear how this noise affects schema propagation.
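As a worked illustration of the final inequality (an example of ours, with made-up numbers rather than data from the paper), the bound can be evaluated directly:

```python
def schema_bound(m, f_H, f_avg, p_c, p_m, delta, order, l, n):
    """Lower bound on the expected count of schema H in the next generation,
    per the schema theorem for the simple GA."""
    growth = m * f_H / f_avg                           # effect of selection alone
    disrupt = (delta / (l - 1)) * (1 - growth / n)     # one-point crossover disruption
    survive = (1 - p_m) ** order                       # all fixed bits escape mutation
    return growth * (1 - p_c * disrupt) * survive

# Hypothetical schema of order 2 and defining length 5 on 20-bit strings.
print(schema_bound(m=10, f_H=1.2, f_avg=1.0, p_c=0.7, p_m=0.01,
                   delta=5, order=2, l=20, n=100))
```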

2.2 Convergence of Evolutionary Algorithms

Convergence discussed in this section refers to the global convergence of various EAs. It can be described as

$$\lim_{n \to \infty} \text{Prob}\{X_n \in S^*\} = 1, \qquad S^* = \{X \mid X \in S,\ f_X \geq f_Y\ \forall\, Y \in S\}$$

where $X_n$ is the solution at time $n$, $f_X$ is the fitness of $X$, $S$ is the whole search (solution) space, and $S^*$ is the set of global optima.

It has been shown that such global convergence can be established under certain conditions [21, 22, 23, 24, 25, 26]. As indicated by Rudolph [26], EAs can be classified into elitist and non-elitist. Elitist EAs are those which always copy the best individual in the current generation to the next generation. Such elitist EAs include EP for numerical optimisation [7, 23], (μ + λ)-ESs [9, 27], and elitist GAs [5, 28]. It is not difficult to analyse the global convergence of these elitist EAs using Markov chains [29]. The techniques used are very similar to those used to show the global convergence of SA [30, 31].

For non-elitist EAs, the analysis of global convergence is not straightforward. It has been shown that the simple GA [5] (also called the canonical GA) without elitism cannot converge to global optima, regardless of the objective function and crossover operators used [25]. In general, however, non-elitist EAs may still converge if certain conditions are satisfied [26].

While convergence is an important issue for EAs, it plays a rather limited role in practice and in guiding the design of new EAs. From the point of view of algorithm design, we are more interested in the computational complexity of EAs for a particular problem. A result similar to that for simulated annealing [32] would be very valuable.
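The elitism property on which these convergence proofs rest can be shown in a few lines; the following Python fragment is a hedged sketch of ours, not an implementation from the cited papers:

```python
def elitist_generation(pop, fitness, reproduce):
    """Copy the current best individual unchanged into the next generation.
    This makes the best fitness found so far monotonically non-decreasing,
    the key property exploited in Markov-chain convergence proofs."""
    best = max(pop, key=fitness)
    offspring = reproduce(pop)            # any stochastic selection + variation
    return [best] + offspring[:len(pop) - 1]
```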

2.3 Computational Complexity

Computational complexity is one of the most important issues in the analysis of algorithms. To the best of our knowledge, the only paper on this topic is by Hart and Belew [33]. The analysis of computational complexity for EAs is difficult because it must be done for a particular problem: it does not make sense to discuss the computational complexity of an EA without indicating which problem the EA is applied to. Fogel [34, 35, 36] has carried out some empirical studies on the computational complexity of his EP in solving selected travelling salesman problems (TSPs). He found that the time used to arrive at a good solution for TSPs increased polynomially in the number of cities. This is an interesting experimental result which needs further investigation.

It is well known that no polynomial approximate algorithm with a performance guarantee exists for the TSP unless P = NP, a very unlikely event [37]. This result does not contradict Fogel's experimental result, because the theoretical result concerns worst-case time complexity while the experimental result concerns estimated average-case time complexity, and no performance guarantee was given in Fogel's studies. It is still an open issue whether an EA offers any advantage over classical optimisation algorithms for TSPs in terms of average-case time complexity. This is a very difficult question, as average-case analysis is already difficult to conduct for a deterministic algorithm, let alone for a population-based stochastic algorithm.

2.4 GA Deceptive Problems and GA-Hardness

Although few researchers in the evolutionary computation field have been working on computational complexity, many people are working on GA deceptive problems and GA-hardness [5, 38, 39]. The motivation for such studies is to identify hard problems for GAs and to develop new algorithms to combat them.

Nearly all GA deceptive problems investigated so far are artificial, created by human beings. The research has provided some insights into how GAs work and what makes a problem hard for GAs to solve. However, the question of how to identify whether a given problem is hard or easy for GAs remains open. What we need to concentrate on is the characterisation of problems and the relationships between problems and the genetic operators and fitness functions used by GAs, so that we have a clear picture of whether a problem is hard for a particular GA. Recent work in the analysis of neighbourhood structures and landscapes is an effort in this direction.

2.5 Analysis of Neighbourhood Structures and Landscapes

The analysis of neighbourhood structures and landscapes has been carried out for many algorithms like SA [16, 17, 18, 19, 40] and TS [41]. Jones [15] recently proposed a formal model of landscapes for EAs. The model makes it possible to analyse various EAs with different search operators, especially crossover operators. It provides a convenient way to compare some hill-climbing algorithms. A measure of search difficulty, i.e., fitness distance correlation, is given. Jones [15] claimed that the measure

"is a remarkably reliable indicator of problem difficulty for a genetic algorithm on many problems taken from the genetic algorithms literature, even though the measure incorporates no knowledge of the operation of a genetic algorithm. This leads to one answer to the question 'What makes a problem hard (or easy) for a genetic algorithm?' The answer is perfectly in keeping with what has been well known in Artificial Intelligence for over thirty years."

It is interesting to see how well this model of landscapes and the measure work on problems other than those "taken from the genetic algorithms literature".

3 Evolutionary Optimisation

Evolutionary optimisation is by far the most active area in evolutionary computation. Most published papers and technical reports are on optimisation. EAs have been applied to both combinatorial and numerical optimisation problems, but the results are mixed. Most papers have reported very positive results obtained from EAs (too many to cite them all here). However, there are a few papers which contain some negative results about EAs [42, 43, 44]. This is not surprising, because no algorithm can be efficient for all problems [45]. The answer to the question of whether an EA is efficient is problem dependent.

Most problems attacked by evolutionary optimisation are unconstrained optimisation problems, for which Michalewicz [46, 47] has achieved some very good results with his EAs. Constraint handling is still one of the unsolved problems in evolutionary optimisation: simply adding a fixed penalty term to the fitness function does not work as well as one would hope. Michalewicz [46, 47, 48] has proposed a number of methods for handling constraints in evolutionary optimisation.

Research in evolutionary optimisation mainly concentrates on the encoding scheme, search operators (genetic operators), and selection mechanisms for an optimisation problem. Tuning the various parameters of an EA has been one of the major tasks in experimental studies.

It is generally accepted by most researchers that binary encoding is no longer the only or the best scheme for encoding either numerical or non-numerical values (e.g., permutations). The best encoding scheme differs from problem to problem. This is especially true for combinatorial problems, where a problem normally has an inherent structure which should be captured by the encoding scheme.

3.1 Mutation

Mutation has traditionally been regarded as a secondary operator for GAs and a primary operator for EP and ESs. For numerical optimisation, real-number encoding coupled with adequate mutation operators can perform at least as well as binary encoding. Some well-known mutation operators applicable to real numbers are Gaussian mutation and Cauchy mutation. These mutation operators all have the form

$$X_{\text{offspring}} = X_{\text{parent}} + \eta$$

where $\eta$ is a random number generated from a Gaussian, a Cauchy, or even a uniform distribution. Such mutation can be applied to each dimension of a real vector independently, by adding a one-dimensional random number with a certain distribution; it can also be applied to the real vector as a whole, by adding a random vector with a certain distribution. These two methods are different, although some people tend to mix them up. For example, adding an independent one-dimensional Gaussian random number to each dimension of an n-dimensional real vector is quite different from adding an n-dimensional Gaussian random vector to the same vector. Most implementations of EP and ESs use the former method.

The above mutation could also be implemented in binary-encoded EAs. All we need to do is decode a binary string into a real number first; the mutation operator is then applied to the real number, and after mutation the real number is encoded back into a binary string. This is quite different from traditional bit-flipping mutation for binary strings, which has a uniform probability of changing any bit in a binary string. Borrowing an idea proposed for SA [17, 18], we can use either a Gaussian or a Cauchy distribution to determine the number of bits to be flipped in a binary string.
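The two mutation distributions mentioned above can be sketched as follows. This is an illustrative Python fragment (the step sizes are arbitrary assumptions) applying an independent one-dimensional deviate to each component, i.e., the first of the two methods described:

```python
import math
import random

def gaussian_mutation(x, sigma=0.1):
    """Add an independent one-dimensional Gaussian deviate to each component."""
    return [xi + random.gauss(0.0, sigma) for xi in x]

def cauchy_mutation(x, scale=0.1):
    """Add an independent one-dimensional Cauchy deviate to each component;
    the Cauchy's heavy tails produce occasional large jumps."""
    return [xi + scale * math.tan(math.pi * (random.random() - 0.5)) for xi in x]

parent = [0.5, -1.2, 3.0]
print(gaussian_mutation(parent))
print(cauchy_mutation(parent))
```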

3.2 Crossover

Crossover is also known as recombination. It is a primary operator in GAs. There have been many discussions of the usefulness of crossover in the literature. The answer to the question of whether crossover is useful is problem dependent. The encoding scheme has a crucial impact on the effectiveness of a crossover operator, so the usefulness of a crossover operator has to be addressed with respect to an encoding scheme. In general, if the encoding scheme encourages the propagation and formation of useful "building blocks" under a crossover operator, then the crossover operator is effective for this encoding scheme. Jones' work [15] has provided us with a good framework for investigating various crossover operators.

For the binary encoding scheme, two-point crossover and uniform crossover have been empirically shown to be effective for many problems [49], especially for numerical optimisation problems. Some combinatorial optimisation problems, such as the TSP and various scheduling problems, require certain kinds of order-based crossover operators [5, 50, 51].

The crossover operator for a combinatorial optimisation problem often needs to be designed specifically for the problem. Such design may not be easy, because combinatorial problems usually have some constraints which must be satisfied. The crossover operator should handle these constraints and at the same time promote the formation of useful building blocks and reduce disruption to the building blocks. The design of crossover operators is still an art rather than a science. It is also less mature than mutation design.

3.3 Selection

The selection mechanisms used by most EAs can be classified into three major categories:

1. Fitness-proportional (roulette wheel) selection,
2. Rank-based selection, and
3. Tournament selection.

Roulette wheel selection is probably the oldest of the three. It was widely used in the early days of GAs. Let f_1, f_2, ..., f_n be the fitness values of the n individuals in a population. The roulette wheel selection scheme assigns probability P_i to individual i according to the following formula:

$$P_i = \frac{f_i}{\sum_{j=1}^{n} f_j}$$

Roulette wheel selection is very simple, but it has difficulty in dealing with super-individuals and tends to converge prematurely without additional scaling methods. These deficiencies lie in its dependence on the raw fitness value.

Rank-based selection does not depend on the raw fitness of an individual; it depends on the rank of an individual in the population. There are several variants of rank-based selection, such as Baker's [52, 53] and Whitley's [54] linear ranking and Yao's [51] nonlinear ranking. Different rank-based selection mechanisms have different selection pressures. These selection pressures are the same across generations because they do not depend on the raw fitness of individuals.

Both roulette wheel and rank-based selection are global in the sense that all the individuals in a population have to participate in the calculation of selection probabilities. This is undesirable for parallel EAs because it increases the communication overheads among individuals on different processors. Tournament selection is a better candidate for parallel EAs. It is a local selection mechanism because the probability of selecting an individual depends only on a subset of the whole population. In fact, tournament selection implements an approximate form of ranking [20]. One typical example of tournament selection is Boltzmann tournament selection proposed by Goldberg [5]. Fogel's competition scheme [7] is also a type of tournament selection.
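For contrast, here are minimal Python sketches of the three categories of selection; they are illustrations of ours (maximisation and, for the roulette wheel, strictly positive fitness values are assumed):

```python
import random

def roulette_wheel(pop, fitness):
    """Fitness-proportional selection: P_i = f_i / sum_j f_j.
    Assumes all fitness values are positive."""
    weights = [fitness(x) for x in pop]
    return random.choices(pop, weights=weights, k=1)[0]

def linear_rank(pop, fitness):
    """Rank-based selection: probability depends only on rank, so the
    selection pressure is independent of the raw fitness scale."""
    ranked = sorted(pop, key=fitness)          # worst first
    weights = range(1, len(pop) + 1)           # rank 1 .. n
    return random.choices(ranked, weights=weights, k=1)[0]

def tournament(pop, fitness, k=2):
    """Tournament selection: only k sampled individuals are compared,
    which makes it attractive for parallel EAs."""
    return max(random.sample(pop, k), key=fitness)
```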

3.4 Hybrid Algorithms

GAs are very good at coarse-grained global search but rather poor at fine-grained local search. One way to improve a GA's performance is to combine it with another algorithm which has better local search ability. SA has been used as such a local searcher for GAs [55]; the proposed genetic annealing algorithm [55] compared favourably with the GA or SA alone. Mühlenbein et al. [56] used a simple hill-climbing algorithm as the local searcher in their parallel GA and also achieved very good experimental results.

Every search algorithm, except uniform random search, introduces some kind of bias into its search. Different algorithms have different biases, and EAs with different operators also have different biases. There have been some discussions of the biases introduced by various crossover operators in the genetic algorithm literature [57]. It is such biases that make an algorithm very efficient for one class of problems (i.e., the biases lead to efficient search of a class of fitness landscapes) but not for others. Since we do not know the shape and characteristics of the fitness landscape of a given new problem, it is very difficult to decide which search algorithm (i.e., which bias) to use to solve the problem. We probably do not want any bias in such a case. Ideally, we would like some kind of adaptive search operators which can change their search bias dynamically during search.

Another way to reduce the possible detrimental effects of the bias introduced by a single algorithm is to use a hybrid algorithm which includes the different biases introduced by several different algorithms. Kido et al. [58] described a hybrid algorithm combining GAs, SA and TS and achieved very good results on a TSP. They showed experimentally that GA+SA+TS performed better than either GA+SA or GA+TS. This seems to indicate that a greater diversity of biases in GA+SA+TS enables search to be conducted efficiently in different regions of the search space.

4 Evolutionary Learning

4.1 An Introduction to Learning Classifier Systems

Learning Classifier Systems (LCS) are also known as Classifier Systems (CS). They are a particular class of message-passing, rule-based systems [59]. They can also be regarded as a type of adaptive expert system that uses a knowledge base of production rules in a low-level syntax that can be manipulated by a genetic algorithm [60]. In a CS, each low-level rule is called a classifier. The CS proposed by Holland [59] can be described by Figure 3.

The general operational cycle of the classifier system is as follows:

1. Allow the detectors (input interface) to code the current environment status and place the resulting messages on the message list.
2. Determine the set of classifiers that are matched by the current messages.
3. Resolve conflicts caused by limited message list size or contradictory actions.
4. Remove those messages which match the conditions of firing classifiers from the message list.
5. Add the messages suggested by the firing classifiers to the list.
6. Allow the effectors (output interface) that are matched by the current message list to take actions in the environment.
7. If a payoff signal is received from the environment, assign credit to classifiers.
8. Goto Step 1.

4.1.1 Rules and Messages

Each rule is a simple message processor: conditions look for certain kinds of messages and, when the conditions are satisfied, the action specifies a message to be sent. Messages are normally coded as symbolic strings over the alphabet {0, 1, #}, where # is a "don't care" symbol which matches either 1 or 0. For example, 1### matches any length-4 message starting with 1. An example classifier can be described as:

condition-1, condition-2 / action (message) [strength]
111##### 10010010 / 00101101 [56]
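The matching rule just described is simple enough to state directly in code; a small sketch (the function name is ours):

```python
def matches(condition, message):
    """A condition matches a message of the same length when every position
    is either '#' (don't care) or equal to the message bit."""
    return len(condition) == len(message) and all(
        c in ('#', m) for c, m in zip(condition, message))

print(matches("1###", "1010"))            # True: '#' matches either bit
print(matches("111#####", "11010010"))    # False: the third bit differs
```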

Figure 3: An overview of a classifier system [59]. (The original figure shows the environment coupled to a message list and rule list through detectors (input interface) and effectors (output interface); input messages are matched against rules, which post output messages; the bucket brigade adjusts rule strengths in response to payoff, and a genetic algorithm generates new rules.)

One classifier is more specific than another if its condition is more specific than the other's, and one condition is more specific than another if the set of messages that satisfy the former is smaller than the set of messages that satisfy the latter.

The usefulness of a classifier is indicated by its strength, which is updated by the credit assignment scheme based on the classifier's average usefulness in the contexts in which it has been tried previously. All classifiers whose conditions are satisfied have to compete for the right to post their messages. Competition provides a simple, situation-dependent means of resolving conflicts between classifiers. The actual competition is based on a bidding process: a bid can be treated as some proportion of the classifier's strength, with the bid ratio determined by the number of specific bits divided by the total number of bits in a message. That is, the competition favours more specific classifiers.

4.1.2 Default Hierarchies

An important feature of classifier systems is the possible adaptive formation of default hierarchies, i.e., layered sets of default and exception rules. A CS organises default hierarchies by favouring exception rules over defaults in its conflict resolution and credit assignment schemes. The simplest example of a classifier hierarchy consists of only two classifiers:

1. The first ("default") one has a relatively unspecific condition and provides an action that is correct in most cases and incorrect in some.
2. The second ("exception") one is satisfied only by a subset of the messages satisfying the first classifier, and its action generally corrects errors committed by the first classifier.

The specific classifier both provides the correct action and saves the general classifier from a mistake when it prevents the general classifier from winning. The exception classifier may in turn make mistakes that can be corrected by even more specific classifiers, and so on. Such default hierarchies can be learned by the classifier system.

4.1.3 Credit Assignment

Credit assignment is a very difficult task because credit must be assigned to early-acting classifiers that set the stage for a sequence of actions leading to a favourable situation. The most famous credit assignment algorithm is the bucket brigade algorithm, which uses metaphors from economics [59].

For a classifier, viewed as a middleman, its suppliers are those classifiers that have sent messages satisfying its conditions, and its consumers are those classifiers that both have conditions satisfied by its message and have won their competition in turn. When a classifier wins in competition, its bid is apportioned among its suppliers, increasing their strengths by the amounts apportioned to them. At the same time, because the bid is treated as a payment for the right to post a message, the strength of the winning classifier is reduced by the amount of its bid. Should a classifier bid but not win, its strength remains unchanged and its suppliers receive no payment. Winning classifiers can recoup their payments from either winning consumers or the environment payoff.
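The economics metaphor can be made concrete with a small sketch. The following Python fragment is our own illustration of one bucket brigade step (the bid ratio and the data layout are assumptions, not Holland's implementation):

```python
def bucket_brigade_step(winner, suppliers, strength, bid_ratio=0.1, payoff=0.0):
    """One credit-assignment step: the winning classifier pays its bid to the
    classifiers whose messages matched its conditions (its suppliers), and
    collects any payoff received from the environment."""
    bid = bid_ratio * strength[winner]
    strength[winner] -= bid                    # payment for the right to post
    for s in suppliers:                        # suppliers share the bid equally
        strength[s] += bid / len(suppliers)
    strength[winner] += payoff                 # recouped from the environment

strength = {"exception_rule": 56.0, "default_rule": 40.0}
bucket_brigade_step("exception_rule", ["default_rule"], strength, payoff=5.0)
print(strength)
```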

4.1.4 Classifier Discovery by GAs

A GA is used in a CS to discover new classifiers by crossover and mutation, with the strength of a classifier used as its fitness. The GA is only applied to the classifiers after a certain number of operational cycles, so that strengths can be approximated better. There are two approaches to CSs: the Michigan approach and the Pitts approach. In the Michigan approach, each individual in a population is a classifier, and the whole population represents a complete CS. In the Pitts approach, each individual in a population represents a complete CS, and the whole population comprises a number of competing CSs.

4.2 Population-Based Machine Learning

Population-based machine learning includes CSs, evolutionary artificial neural networks (EANNs) [61], and other evolutionary [62, 63] or non-evolutionary [64] systems.

EANNs can be considered a combination of artificial neural networks (ANNs) and EAs. EAs have been introduced into ANNs at three different levels: the evolution of connection weights, of architectures, and of learning rules. At present, most work on EANNs concentrates on the evolution of architectures, i.e., the connectivities of ANNs [65, 66, 67]. Very good results have been achieved on some artificial and real-world benchmark problems.

4.2.1 Co-evolutionary Learning

Co-evolutionary learning takes two different forms. The first refers to the situation where two populations are evolved at the same time [68]: the fitness of an individual in one population depends on the individuals in the other population, and there is no crossover or other information exchange between the two populations. This can be regarded as co-evolution at the population level. The second form of co-evolution is at the individual level. There is only one population involved, and the fitness of an individual depends on the other individuals in the same population [69, 62, 63]. For example, the same strategy for playing an iterated prisoner's dilemma game may receive quite different fitness values depending on what other strategies are in the population.

Both forms of co-evolution have a dynamic environment and a dynamic fitness function. Such learning problems are more interesting and challenging than learning problems in a fixed environment.

5 Parallel Evolutionary Algorithms

Existing parallel EAs belong roughly to three major models: the memory model, the island model, and the cellular model [20, 70]. The memory model is virtually the same as the simple GA except for the selection mechanism: tournament selection is used in the memory model of parallel EAs. A simple GA can be executed in parallel on N/2 processors, where N is the population size and is assumed to be even. Each processor is assigned two individuals. During each generation, each processor conducts two independent tournaments by sampling individuals in the population at random, and the two winners replace the two existing individuals on the processor. Then crossover and mutation are applied probabilistically to the two strings on each processor. In general, more than two individuals can be assigned to each processor, e.g., 4, 6, or 8.

The island model of parallel EAs is designed for coarse-grained parallel machines. It divides a large population into smaller sub-populations and assigns a sub-population to each processor. On each processor, the EA runs on the sub-population just like a sequential EA, except for the occasional migration of the best individuals from and to other processors: every certain number of generations, the processors swap their best individuals. Parallel GAs based on this model have been implemented on transputer arrays and PVM.
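The occasional migration of best individuals can be sketched as follows. This Python fragment simulates the islands sequentially and uses a deliberately simple update on each island purely for brevity, so every numeric choice in it is an assumption for illustration:

```python
import random

def island_model(fitness, islands=4, island_size=10, dim=5,
                 generations=100, migrate_every=10, sigma=0.1):
    """Island-model EA, simulated sequentially: each sub-population evolves
    independently; every `migrate_every` generations each island's best
    individual replaces a random member of the next island on a ring."""
    pops = [[[random.uniform(-1, 1) for _ in range(dim)]
             for _ in range(island_size)] for _ in range(islands)]
    for gen in range(1, generations + 1):
        for pop in pops:                       # independent evolution per island
            parent = max(pop, key=fitness)
            child = [g + random.gauss(0, sigma) for g in parent]
            pop[random.randrange(island_size)] = child
        if gen % migrate_every == 0:           # occasional migration on a ring
            bests = [max(pop, key=fitness) for pop in pops]
            for i, pop in enumerate(pops):
                pop[random.randrange(island_size)] = list(bests[i - 1])
    return max((max(pop, key=fitness) for pop in pops), key=fitness)

print(island_model(lambda x: -sum(g * g for g in x)))
```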

The cellular model of parallel EAs is designed for fine-grained or massively parallel machines. It assigns each individual to a processor. A neighbourhood is often defined for each processor so that communication between processors stays within the neighbourhood. Selection, crossover, and mutation are applied only within the neighbourhood in order to minimise the communication overheads. There have been several implementations of parallel GAs on the CM5.

6 Conclusions

This paper provides an overview of the evolutionary computation field. Four major areas are reviewed: evolutionary computation theories, evolutionary optimisation, evolutionary learning, and parallel EAs. It is indicated that while EAs have achieved a lot of success in solving practical problems, the question of why they work so well for these problems remains open. More theoretical work, such as the analysis of EAs' computational complexity, needs to be done.

Evolutionary optimisation is by far the most researched area in evolutionary computation. It is emphasised that hybrid algorithms which combine EAs with other local search algorithms can produce better results than either EAs or local search algorithms alone. An argument based on global versus local search and search biases has been presented.

This paper also gives an introduction to classifier systems and population-based machine learning. In particular, co-evolutionary learning has been singled out due to its dynamic learning environment, which cannot be handled easily by traditional learning approaches.

EAs are population-based algorithms and are well suited to parallel implementation. Three basic parallel models of EAs have been described in this paper. Tournament selection or neighbourhoods have been used in these models to restrict global communications.

References

[1] L. J. Eshelman (ed.). Proc. of the Sixth Int'l Conf. on Genetic Algorithms. Morgan Kaufmann, San Mateo, CA, 1995.

[2] J. R. McDonnell, R. G. Reynolds, and D. B. Fogel (eds.). Evolutionary Programming IV: Proceedings of the Fourth Annual Conference on Evolutionary Programming. MIT Press, 1995.

[3] X. Yao (ed.). Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, Vol. 956. Springer-Verlag, Heidelberg, Germany, 1995.

[4] J. H. Holland. Adaptation in Natural and Artificial Systems (1st MIT Press edn). The MIT Press, Cambridge, MA, 1992.

[5] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA, 1989.

[6] L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley & Sons, New York, NY, 1966.

[7] D. B. Fogel. System Identification Through Simulated Evolution: A Machine Learning Approach to Modeling. Ginn Press, Needham Heights, MA 02194, 1991.

[8] H.-P. Schwefel. Numerical Optimization of Computer Models. John Wiley & Sons, Chichester, 1981.

[9] H.-P. Schwefel. Evolution and Optimum Seeking. John Wiley & Sons, New York, 1995.

[10] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671-680, 1983.

[11] H. H. Szu and R. L. Hartley. Fast simulated annealing. Physics Letters A, 122:157-162, 1987.

[12] L. Ingber. Very fast simulated re-annealing. Mathl. Comput. Modelling, 12(8):967-973, 1989.

[13] X. Yao. A new simulated annealing algorithm. Int. J. of Computer Math., 56:161-168, 1995.

[14] J. J. Grefenstette. Incorporating problem specific knowledge into genetic algorithms. In L. Davis, editor, Genetic Algorithms and Simulated Annealing, chapter 4, pages 42-60. Morgan Kaufmann, San Mateo, CA, 1987.

[15] T. Jones. Evolutionary Algorithms, Fitness Landscapes and Search. PhD thesis, The University of New Mexico, Albuquerque, New Mexico, May 1995.

[16] G. B. Sorkin. Efficient simulated annealing on fractal energy landscapes. Algorithmica, 6:367-418, 1991.

[17] X. Yao. Simulated annealing with extended neighbourhood. Int. J. of Computer Math., 40:169-189, 1991.

[18] X. Yao. Dynamic neighbourhood size in simulated annealing. In Proc. of Int'l Joint Conf. on Neural Networks (IJCNN'92), Vol. I, pages 411-416, Beijing, November 1992. IEEE Press, Piscataway, NJ.

[19] X. Yao. Comparison of different neighbourhood sizes in simulated annealing. In P. Leong and M. Jabri, editors, Proc. of Fourth Australian Conf. on Neural Networks, pages 216-219, Sydney, Australia, 1993.

[20] D. Whitley. A genetic algorithm tutorial. Technical Report CS-93-103, Department of Computer Science, Colorado State University, Fort Collins, CO 8052, March 1993.

[21] T. E. Davis and J. C. Principe. A simulated annealing like convergence theory for the simple genetic algorithm. In R. K. Belew and L. B. Booker, editors, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pages 174-181. Morgan Kaufmann, San Mateo, CA, 1991.

[22] A. E. Eiben, E. H. L. Aarts, and K. M. van Hee. Global convergence of genetic algorithms: a Markov chain analysis. In H.-P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature, pages 4-12. Springer-Verlag, Heidelberg, 1991.

[23] D. B. Fogel. Evolving Artificial Intelligence. PhD thesis, University of California, San Diego, CA, 1992.

[24] J. Suzuki. A Markov chain analysis on a genetic algorithm. In S. Forrest, editor, Proc. of the Fifth Int'l Conf. on Genetic Algorithms and Their Applications, pages 146-153. Morgan Kaufmann, San Mateo, CA, 1993.

[25] G. Rudolph. Convergence properties of canonical genetic algorithms. IEEE Trans. on Neural Networks, 5(1), 1994.

[26] G. Rudolph. Convergence of non-elitist strategies. In Z. Michalewicz et al., editors, Proc. of the 1994 IEEE Int'l Conf. on Evolutionary Computation (ICEC'94), pages 63-66. IEEE Press, Piscataway, NJ, 1994.

[27] I. Rechenberg. Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart, Germany, 1973.

[28] K. A. De Jong. An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, University of Michigan, Ann Arbor, 1975.

[29] D. L. Isaacson and R. W. Madsen. Markov Chains Theory and Applications. John Wiley & Sons, 1976.

[30] S. Geman and D. Geman. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-6:721-741, 1984.

[31] X. Yao and G.-J. Li. General simulated annealing. J. of Computer Sci. and Tech., 6:329-338, 1991.

[32] G. H. Sasaki and B. Hajek. The time complexity of maximum matching by simulated annealing. Journal of the ACM, 35:387-403, 1988.

[33] W. E. Hart and R. K. Belew. Optimizing an arbitrary function is hard for the genetic algorithm. In R. K. Belew and L. B. Booker, editors, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pages 190-195. Morgan Kaufmann, San Mateo, CA, 1991.

[34] D. B. Fogel. An evolutionary approach to the traveling salesman problem. Biological Cybernetics, 60:139-144, 1988.

[35] D. B. Fogel. Applying evolutionary programming to selected traveling salesman problems. Cybernetics and Systems, 24:27-36, 1993.

[36] D. B. Fogel. Empirical estimation of the computation required to discover approximate solutions to the traveling salesman problem using evolutionary programming. In D. B. Fogel and W. Atmar, editors, Proc. of the Second Ann. Conf. on Evol. Prog. Evolutionary Programming Society, La Jolla, CA, to appear, 1993.

[37] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman Co., San Francisco, 1979.

[38] R. Das and D. Whitley. The only challenging problems are deceptive: global search by solving order-1 hyperplanes. In R. K. Belew and L. B. Booker, editors, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pages 166-173. Morgan Kaufmann, San Mateo, CA, 1991.

[39] A. J. Mason. Partition coefficients, static deception and deceptive problems. In R. K. Belew and L. B. Booker, editors, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pages 210-214. Morgan Kaufmann, San Mateo, CA, 1991.

[40] R. Lister. Annealing networks and fractal landscapes. In Proc. of the IEEE Int'l Conf. on Neural Networks (San Francisco), Vol. 1, pages 257-262. IEEE Press, Piscataway, NJ, March 1993.

[41] F. Glover and M. Laguna. Tabu search. In C. R. Reeves, editor, Modern Heuristic Techniques for Combinatorial Problems, chapter 3, pages 70-150. Blackwell Scientific, Oxford OX2 0EL, 1993.

[42] L. Ingber and B. Rosen. Genetic algorithms and very fast simulated reannealing: a comparison. Mathl. Comput. Modelling, 16(11):87-100, 1992.

[43] S. Baluja. An empirical comparison of seven iterative and evolutionary function optimization heuristics. Technical Report CMU-CS-95-193, School of Computer Science, Carnegie Mellon University, September 1995.

[44] G. McMahon and D. Hadinoto. Comparison of heuristic search algorithms for single machine scheduling problems. In X. Yao, editor, Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, Vol. 956, pages 293-304, Heidelberg, Germany, 1995. Springer-Verlag.

[45] D. H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA, July 1995.

[46] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin, Germany, 1992.

[47] Z. Michalewicz. A perspective on evolutionary computation. In X. Yao, editor, Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, Vol. 956, pages 73-89, Heidelberg, Germany, 1995. Springer-Verlag.

[48] Z. Michalewicz. Genetic algorithms, numerical optimisation, and constraints. In L. J. Eshelman, editor, Proc. of the Sixth Int'l Conf. on Genetic Algorithms, pages 151-158. Morgan Kaufmann, San Mateo, CA, 1995.

[49] L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York, NY 10003, 1991.

[50] D. Whitley, T. Starkweather, and D. Shaner. The traveling salesman and sequence scheduling: quality solutions using genetic edge recombination. In L. Davis, editor, Handbook of Genetic Algorithms, chapter 22, pages 350-372. Van Nostrand Reinhold, New York, NY, 1991.

[51] X. Yao. An empirical study of genetic operators in genetic algorithms. Microprocessing and Microprogramming, 38:707-714, 1993.

[52] J. E. Baker. Analysis of the effects of selection in genetic algorithms. PhD thesis, Vanderbilt University, Nashville, 1987.

[53] J. E. Baker. Adaptive selection methods for genetic algorithms. In J. J. Grefenstette, editor, Proc. of an Int'l Conf. on Genetic Algorithms and Their Applications, pages 101-111, 1985.

[54] D. Whitley. The GENITOR algorithm and selective pressure: why rank-based allocation of reproductive trials is best. In J. D. Schaffer, editor, Proc. of the Third Int'l Conf. on Genetic Algorithms and Their Applications, pages 116-121. Morgan Kaufmann, San Mateo, CA, 1989.

[55] X. Yao. Optimization by genetic annealing. In M. Jabri, editor, Proc. of Second Australian Conf. on Neural Networks, pages 94-97, Sydney, Australia, 1991.

[56] H. Mühlenbein, M. Schomisch, and J. Born. The parallel genetic algorithm as function optimizer. Parallel Computing, 17:619-632, 1991.

[57] L. Eschelman, R. Caruana, and J. D. Schaffer. Biases in the crossover landscape. In J. D. Schaffer, editor, Proc. of the Third Int'l Conf. on Genetic Algorithms and Their Applications, pages 10-19. Morgan Kaufmann, San Mateo, CA, 1989.

[58] T. Kido, K. Takagi, and M. Nakanishi. Analysis and comparisons of genetic algorithm, simulated annealing, tabu search, and evolutionary combination algorithm. Informatica, 18:399-410, 1994.

[59] J. H. Holland. Using classifier systems to study adaptive nonlinear networks. In D. L. Stein, editor, Lectures in the Sciences of Complexity, pages 463-499. Addison-Wesley, Redwood City, CA, 1988.

[60] R. E. Smith and D. E. Goldberg. Reinforcement learning with classifier systems: adaptive default hierarchy formation. Applied Artificial Intelligence, 6:79-102, 1992.

[61] X. Yao. Evolutionary artificial neural networks. International Journal of Neural Systems, 4(3):203-222, 1993.

[62] X. Yao and P. Darwen. An experimental study of N-person iterated prisoner's dilemma games. Informatica, 18:435-450, 1994.

[63] P. J. Darwen and X. Yao. On evolving robust strategies for iterated prisoner's dilemma. In X. Yao, editor, Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, Vol. 956, pages 276-292, Heidelberg, Germany, 1995. Springer-Verlag.

[64] B. W. Wah. Population-based learning: a method for learning from examples under resource constraints. IEEE Trans. on Knowledge and Data Engineering, 4:454-474, 1992.

[65] X. Yao and Y. Shi. A preliminary study on designing artificial neural networks using co-evolution. In Proc. of the IEEE Singapore Int'l Conf. on Intelligent Control and Instrumentation, pages 149-154, Singapore, June 1995. IEEE Singapore Section.

[66] X. Yao and Y. Liu. Evolving artificial neural networks for medical applications. In Proc. of 1995 Australia-Korea Joint Workshop on Evolutionary Computation. KAIST, Taejon, Korea, September 1995.

[67] Y. Liu and X. Yao. A population-based learning algorithm which learns both architectures and weights of neural networks. Chinese Journal of Advanced Software Research (Allerton Press, Inc., New York, NY 10011), 3(1), to appear, 1996.

[68] W. Daniel Hillis. Co-evolving parasites improve simulated evolution as an optimization procedure. In Santa Fe Institute Studies in the Sciences of Complexity, Volume 10, pages 313-323. Addison-Wesley, 1991.

[69] R. Axelrod. The evolution of strategies in the iterated prisoner's dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing, chapter 3, pages 32-41. Morgan Kaufmann, San Mateo, CA, 1987.

[70] J. Stender (ed.). Parallel Genetic Algorithms: Theory and Applications. IOS Press, Amsterdam, 1993.
