Steve Dower | Musings and Mutterings

Evolutionary System Definition Language (ESDL)

Contents

Introduction

ESDL is a conceptual model and language for describing evolutionary systems efficiently and unambiguously. It may be interpreted and executed by systems such as esec, allowing rapid modification and testing of different algorithms.

My technical report “Evolutionary System Definition Language.” PDF describes ESDL in specific detail with a number of examples, while the ESDL page on the esec wiki approaches the details from a programmer’s point-of-view.

The concept underlying ESDL is that evolutionary systems are searching for the best solution to a problem from a very large collection of potential solutions. In simple systems, each potential solution is an individual; in complex systems, a potential solution may be made up of multiple individuals.

Since the collection of possible solutions is too large to enumerate, relatively small subsets of individuals, groups, are used instead. In most examples of evolutionary systems, the main group is called the population. Since the population is small, every individual can be tested to determine its fitness for solving the given problem. When a good enough fitness (not necessarily an ideal fitness) is found, the problem is solved.

It is very unlikely that the initial population will contain a good enough individual, so a new population will need to be created. Unlike the first time, however, there is some extra information available: some individuals in the current population are better (more fit) than others, and this knowledge can be used to bias the generation of the next population. Exactly how is a matter of much research, debate and experimentation and there is generally accepted to be no “right” answer for every situation. However, there are some very common approaches.

Most new populations are generated by somehow modifying the previous generation. The potential range of modification operations is well beyond the scope of this article, except for the general approach that an operator takes one or more current individuals and produces one or more very slightly modified individuals. By only making small modifications it is possible to assume causation between the change and any resulting change to the fitness. If the fitness improves, the change is assumed to be helpful and it is retained in the population. Conversely, if the fitness reduces, the change that was made was not helpful and it is forgotten. This feedback process is implied by the selection methods used, which are usually based on fitness. Note, however, that always preferring the individual with better fitness is very greedy and not necessarily ideal for any particular problem.

ESDL defines the links between the groups, including specifying which operator is used. Since dealing with individuals on a one-by-one basis is cumbersome, operators are applied to all the individuals in a certain group, producing a new group. Because ESDL uses a general concept of “groups” rather than a specific “population-parents-offspring” structure, much more complicated systems can be expressed. For example, multi-population systems are nothing special, since each population is just another group.

This section bears a startling resemblance to the esec wiki page. That’s because I wrote both of them (actually, I wrote that one and copy-pasted into this one). Either way, no plagiarism to see here… move along now…

Examples

I’m going to dive straight into some simple examples. If you are familiar with the algorithms being shown (which are also briefly described) then the ESDL syntax should become obviously reasonably quickly. Otherwise, I recommend the ESDL page on the esec wiki or my technical report “Evolutionary System Definition Language.” PDF.

These examples use elements provided by esec. Other implementations (which currently don’t exist… just planning ahead) may use different names, in which case the definitions will need to be adjusted.

Genetic Algorithms

This system describes a single Genetic Algorithm (GA) that uses a population of 100 randomly initialised individuals of ten bits each. Fitness proportional selection (with replacement) is used to select parents, which are then crossed over at a single point and have approximately 10% of their genes replaced at random.

FROM random_binary(length=10) SELECT 100 population
YIELD population

BEGIN GENERATION
  FROM population SELECT 100 parents USING fitness_proportional(replacement=True)
  FROM parents    SELECT population  USING crossover_one, mutate_random(per_gene_rate=0.1)
  YIELD population
END GENERATION

Algorithm descriptions typically include a fitness function and termination condition, but these have been omitted here. The fitness function is abstracted as an evaluator, which is specified completely separately to the system definition. The reasoning behind this is that GA is still GA regardless of what problem it is solving (though it is possible to specify the evaluator in system for cases where it does matter, such as credit assignment).

The termination condition is also separate, since it is an experimental parameter rather than an algorithm parameter. esec uses a monitor object to determine when to terminate, based on iteration count or the contents of the population group (the YIELD command provides the group at that point to the monitor - parents is never yielded and hence never seen). The algorithm may only terminate at the end of the GENERATION block, with the alternative being to repeat the block.

Genetic Algorithm with Elitism

This system is identical to the one above with elitism added. The best individual from the population is retained each generation (assuming that not every offspring is better than it) by selecting it into a separate group and merging it into the population group at the end.

FROM random_binary(length=10) SELECT 100 population
YIELD population

BEGIN GENERATION
  FROM population SELECT 1 elitist   USING best
  FROM population SELECT 100 parents USING fitness_proportional(replacement=True)
  FROM parents    SELECT offspring   USING crossover_one, mutate_random(per_gene_rate=0.1)
  
  FROM parents, elitist SELECT 100 population USING best
  YIELD population
END GENERATION

Frequently Asked Questions

What is ESDL for?

In a word, sharing.

When I started my research in EC, one of the first things I wanted to do was to code up some algorithms (that’s one of the ways I know that I really understand things). So I set about doing this, and ran into problems almost immediately when the paper I was working from didn’t quite specify everything. So I picked another paper, and found the same thing. Over and over again, algorithms have been published, with results, but without enough information for someone to reproduce it.

ESDL provides a way to neatly and succinctly include the code for the algorithm (a system definition) in a paper. If the author has been using an ESDL interpreter/compiler (such as esec) this is a simple copy-paste.

So, it’s a way to avoid writing code yourself?

Pretty much. Unless, of course, you’re the author, in which case you’ve designed an awesome algorithm with great performance and you want to share it. From this point-of-view, you can look at ESDL in one of two ways. Providing a system definition in ESDL is either helping those too lazy to do their own coding, or it’s ensuring that anyone who wants to use your algorithm will implement it correctly.

Anyone publishing an algorithm obviously intends to describe it completely, but not everybody produces beautiful looking code. So either you release your code under an appropriate licence, or you choose another way of presenting it (for example, mathematical equations, flowcharts, pseudocode, prose, etc.). ESDL is a way of presenting your algorithm neatly. It also happens to be unambiguous enough that you can execute it using an interpreter and save yourself from ever having to write ugly code.

Does using ESDL mean you don’t have to write the rest of your paper?

Not at all. One of the critical design decisions made was to completely separate operator definitions (such as mutation schemes, selection models, etc.) from the system definition. So while the definition specifies which operator is used (by name), it doesn’t include any information on how it works. By separating these details out, the definition is kept focused and clear and ESDL is kept simple. Meanwhile, you can still use any style you like to describe how your operator works, except now it is in terms of one (or multiple) individual in, one (or multiple) individual out, rather than being tied up within the rest of the system.

Some examples I have put together, largely for the purpose of demonstrating how ESDL can be used, include Ant System PDF, Differential Evolution PDF and Particle Swarm Optimisation PDF. Each of these examples include equations and pseudocode as well as ESDL to describe the entire algorithm. I’ve also compared the results of running the ESDL description (in esec) to those in the papers that I sourced the algorithm descriptions, ensuring that my description is correct.

Did you steal the syntax from SQL?

Only a little bit. It would be more accurate to say that I stole it from English, and SQL stole theirs from English as well. There are enough fundamental differences between ESDL and SQL that it really doesn’t help explain how ESDL works by comparing it to SQL, though it is possible to find some similarities. Because if they were completely different, ESDL would use different keywords with different meanings.

I really really really want an IF statement in ESDL.

Not a question, but since I don’t have a section called “Frequently Made Statements” I’ll respond here:

That’s nice. I spent quite a bit of time thinking, designing and discussing ways to include conditional statements in ESDL and eventually decided that it wasn’t worth it. There are ways to achieve the same effect by clever selection, adjusting parameters and (soon) using multiple blocks. Allowing system definitions to change significantly at run-time makes them much more difficult to understand. ESDL is not a replacement for a general purpose programming language: it has only one purpose and application. Anything that distracts from that is fighting an uphill battle to be included.

If you feel really strongly about adding feature X to ESDL, make your case publicly, in effect, write it up. Put together a brief paper describing what changes need to be made to ESDL to support your feature, provide examples of code using your feature and suggest some alternatives that you considered and rejected (you did consider alternatives, right?). Put your name on it, publish it (I will happily put (half-decent) efforts up on my own site if you don’t have your own) and send me a link. At the moment, I have full control over the language, but I have no intention of keeping that forever, particularly if it becomes popular. (This concept is similar to the Python Enhancement Proposal (PEP) process.)

My ESDL Multiblock Extension Proposal PDF should serve as an example of what to include (the code patch is probably unnecessary except as proof that it is possible to implement the feature).

Can I include ESDL in LaTeX documents?

Please do. I’ve published my settings for the listings package, which has been sufficient for my use (so far).