
In my last (real) post I wrote about the effects of elitism on a population, and concluded that a small number of elites, who cloned themselves, produced the greatest increase in population fitness. Just like in real life.

I also wrote:

I also don’t have the time to run everything multiple times (takes days at the moment).

I am pleased to say that I rewrote the program a weekend ago using the CUDA runtime library, allowing me to use my GPU and speed things up quite a bit. Which is good, because I screwed up running this test 3 times and had to restart it – if I was on the old version I would not have the time for such second chances.

This program is mostly similar to the first, with a few differences:

No Elitism

Using the findings of the previous post, I decided to set this one up using the “Only the Best” method. As a reminder, in this heuristic parents and kids are both evaluated and only the top N are passed on to the next generation. Flexible elitism.

Pixel Fitness

The fitness metric was changed slightly – it’s still RMSE – but at the pixel level instead of the color channel level. Conceptually there is almost no difference; what’s important to know is that fitness still increases without bound the more similar the images are, and that the fitness levels in this post cannot be directly compared to the CPU version’s.

Haploid Chromosomes

In the previous model, the “chromosomes” had two genomes each, a “dominant” one and a “recessive” one. During crossover, the dominant and recessive genomes would cross over. This all happens within a single artist. Babies are made by combining the dominant genome from one parent with the recessive genome of the other.

In the haploid model of crossover, the parents combine their genomes at the crossover point to make a new, child genome. i.e. the child might get the first 20% of its genome from parent 1 and the remaining 80% from parent 2.
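A minimal Python sketch of this haploid, single-point crossover (the function name is mine, not from the actual program):

```python
import random

def haploid_crossover(parent_a, parent_b, rng=random):
    """Single-point haploid crossover: the child takes parent_a's genome
    up to a random cut point and parent_b's genome after it."""
    assert len(parent_a) == len(parent_b)
    cut = rng.randrange(len(parent_a) + 1)
    return parent_a[:cut] + parent_b[cut:]
```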

I also changed genome initialization so that all triangles are set to invisible in the beginning (previously the first triangle was made visible).

No Bit Level Crossover

I didn’t have time to implement it and I don’t think it makes a difference. Bit level crossover is basically byte level crossover with an extra point mutation.

No Location Simulation

In this version genomes are free to mate with any other genome (in stochastic proportion to their fitness).

Randomness Fixed

The random seed actually works in this version – I can guarantee that all runs started with the same set of genomes and all runs are repeatable.


Another difference is that you no longer specify the number of generations you want to run, but instead the effort you want the algorithm to put into searching for the best image.

$$effort = \text{population size} + \left(\text{number of children} \times \text{number of generations}\right)$$

This allows me to compare settings with different numbers of population and children on even footing.
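Rearranging the formula gives the number of generations a given effort budget buys (a quick Python sketch; the function name is mine):

```python
def generations_for_effort(effort, population_size, children_per_generation):
    """Invert effort = population_size + (children * generations).
    Integer division: any remainder is effort left unspent."""
    return (effort - population_size) // children_per_generation
```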


This time I ran the program on three different types of images: a picture, a drawing and a drawing with a transparent background. I wanted to see if a different type of image would affect the results.

From left to right: a picture of a robotic girl, a picture of a red panda, and a transparent image of Lina from DotA 2.

I ran the program with 100,000 effort, a population of 100 that produces 100 children each round, with 100 triangles in their genomes, and adjusted the crossover chance between [0,1] in 0.1 increments. I ran the program 10 times per image per increment to produce the averages used for the graph below:

The error bars represent the 95% confidence interval.

From this light testing it seems that crossover does indeed improve fitness. This is a relief because, despite there being theory on why it should improve fitness, I had not personally discerned a difference. This is probably because the difference is only dramatic between 0 and 0.25. It seems that it’s more important to have some crossover than to worry about how much crossover.

I’ll be using a 100% crossover chance going forward and I’ll try to tease out the difference between crossover at the byte level and crossover at the triangle level.



Elitism in genetic algorithms refers to the top N members of a population P getting a free pass to the next generation, leaving \(\vert P \vert - N\) spaces to be filled with the children of high performing parents. The goal of elitism is to add additional selective pressure to the population, causing earlier convergence (and higher fitness) but reducing genetic diversity. For those of you familiar with simulated annealing, adding elitism is like reducing the step size of your gradient descent algorithm. You’re going to climb the hill faster, but you’re less likely to find the biggest hill to climb.

In my program, there is an additional option for elitism: cloning. If cloning is set to true, the top N members of the population get to skip into the next round, but they are also exempt from crossover and mutation in the next round. This has two effects:

  1. This lowers the genetic diversity even more
  2. This makes the max fitness a monotonically increasing function over the generations, because the highest fitness of each generation cannot be less than that of the last generation
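One generation step with cloning elitism can be sketched in a few lines of Python. All the names here are illustrative, and `make_children` is a stand-in for the whole selection/crossover/mutation pipeline:

```python
def next_generation(population, fitness, num_elites, make_children):
    """One generation with cloning elitism: the top num_elites artists are
    copied into the next round untouched; the remaining slots are filled
    by breeding."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:num_elites]
    children = make_children(ranked, len(population) - num_elites)
    return elites + children
```

Because the elites pass through untouched, the best fitness in the returned population can never be lower than in the previous one — the monotonicity described in point 2.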

Cloning Comparison

I ran the program with 10 triangles, 50,000 effort, simulating location with a 4 dimensional adjacency matrix and 25 members in the population on the below image:


“Effort” is population size \(\times\) number of generations. This is so different population sizes can be compared fairly, and it allows us to ask the question: which is better, 1 artist that evolves over 100 generations or 100 artists evolved over 1? In this example, 50,000 effort translates to 2,000 generations of 25 artists.

Using these settings, I varied elitism over [0,25], permuted with cloning set to either “true” or “false.” I ran five trials for each elitism setting and took the average to plot the following graph:

Graph comparing the runs of a genetic algorithm with varying levels of elitism, with cloning of elites set to true or false

The first thing to notice is that having any number of elites is better than having no elites. This makes sense when you notice that with no elites the population stabilizes at lower levels, where it is unlikely that any one artist will mutate beneficially enough to dominate the gene pool and force fitness upwards before it regresses to the mean. This can be thought of as a stable period in the punctuated equilibrium model of evolution, which is not what we want; we want constant pressure to increase fitness. Elitism artificially magnifies small differences by anchoring them in the gene pool so they can’t regress to the mean.

At elitism=25, cloning=true the fitness should never change after the first generation. For cloning=false it is basically an exercise in naive, parallel hill climbing. However, I just came to the terrible realization that I was doing elitism wrong: instead of taking the top N members of the population directly, I was selecting them after proportionally allocating reproductive preference by fitness. This means there is a small chance for some elite artists to not be cloned, and that in all my previous tests some artists were being over-cloned, further tilting the balance in favor of convergence over diversity.


I’ll end up redoing everything but I’ve spent too much time to not write this.

The second thing to notice is that cloning produces strictly better results than non-cloning. This is probably because cloning forces the fitness to increase, whereas elitism without cloning simply makes it more likely.

Cloning: True - Elitism 2
Cloning: False – Elitism 2

Another thing to notice is that when cloning is set to true, the gap between the elite and average artists grows. In this example, the top artists in the cloning=true run are, on average, 3.7 standard deviations above the average fitness, whereas in the cloning=false run the elites are, on average, 2 standard deviations above. In addition, in the cloning=false run the difference remains constant, whereas in the cloning=true run it seems to grow linearly with the number of generations: the average gap over the last 10 generations is 4.7 standard deviations.

Elitism Proportion

The next question I had was: how many elites does one need in the population? In the first study, 4 was the optimal number. Does this mean ~4 elites are optimal, or does this mean ~16% of the population being elite is optimal?

To find out I ran the same study but with population size 100. I kept all other options the same (like effort 50,000, resulting in 500 generations). I only ran with cloning=true since the results of the previous study showed that to be the dominant setting.

Elitism Test: Population 100

You can immediately see the similarities to the population 25 run. Max fitness shoots up, then gradually decreases and flatlines as elitism increases. In this sample the peak occurs at the elitism=8 mark. The first thing that comes to my mind is that we quadrupled the population and only needed to double the number of elites, suggesting a half or square-root relationship. Further testing is required.

Only The Best

The last thing I tried was a bit different. In this method, I bred the next generation, scored them and compared them with their parents. This means there were \(2\times\vert P\vert\) artists alive at a time. Then the top \(\vert P \vert\) were spared and the bottom half were culled before starting the process over again.
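The selection step described above can be sketched like so (names are mine; this is essentially what evolutionary-computation texts call (μ + λ) selection):

```python
def only_the_best(parents, children, fitness):
    """Score parents and children together and keep the top |P|, so a
    sufficiently fit parent can survive indefinitely."""
    combined = sorted(parents + children, key=fitness, reverse=True)
    return combined[:len(parents)]
```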

This is, in no way, in the spirit of genetic algorithms. You’re not allowed to have two generations alive at once, or to let parents live forever if they’re good enough. It’s basically parallel hill climbing with a stupid step function, and it’s sort of comparable to classic elitism because of this.

So, how does it compare? Really well, actually. I ran it 6 times and it had an average max fitness of 6.68, which is comparable to, and certainly within the margin of error of, the max elitism=8, population=100 max fitness of 6.71. It is interesting to note that the average fitness of the artists was much higher as well: 6.65 versus 6.00. This suggests that this method has the best of both worlds (cloning true/false). It’s also incredibly disheartening, because it seems that the best method for arranging triangles is some sort of parallel hill climbing and not a classic GA.

The more I investigate GAs, the more it seems that they are just bad versions of hill climbing. Provably? Not always, but probably.


Now, I didn’t do the stats properly because I’m lazy, so I don’t know what the margin of error is on these lines, but I imagine it’s substantial. I also don’t have the time to run everything multiple times (it takes days at the moment). So in the future I’d like to run these tests to significance.

There is the chance that if I ran these with more effort, the greater diversity of cloning=false would generate higher fitnesses in the long run. Based on every run I’ve ever done, I seriously doubt this is the case, but it is a possibility.

Also, as stated above, I screwed everything up, so this is some weird stochastic form of elitism instead of pure elitism.




Triangles Revamped

Last weekend I finished v1.0.0 of my new triangles program.

It was depressing to see that there were 3 other versions collecting dust on my neglected GitHub so I deleted them. I choose to forget. I live the noble lie that existence is worth the cost.

What Does It Do?

It takes an image as input, and tries to draw it using a fixed number of triangles to the best of its abilities. It uses a genetic algorithm to arrange the size, shape and color of the triangles. It also takes in a bunch of different parameters, but we’ll get into those later.

At least I tried

How Well Does It Work?

Not that well, which is supremely disappointing. I have two theories on why:

  1. Pure Genetic Algorithms may not work well for this use case
  2. The hyper-parameters might need tuning (which is step II)

I’ll be trying to optimize the hyper-parameters as step two of this program, and I’ll try a pseudo-GA approach to see if the output can be improved; I’m certain that it can.

How Does It Work?

The way genetic algorithms work is that they encode the solution space as a string of bits, and manipulate these bits to search through the solution space for a satisficing solution. In other words, genetic algorithms are a nature-inspired version of parallel hill climbing/gradient descent. To get started, we have to know how to encode our solution space as a string of bits, and how to manipulate this string of bits to improve our solution.

Encoding the Solution Space

The end goal is to have an image drawn with a finite number of translucent triangles. I chose to represent this as a background color, followed by a list of triangles. This would then be drawn on a “blank” image of the same size. Encoding it this way reduces the number of bits needed to represent the image, and thus the solution space we need to search. Another thing important to the success of genetic algorithms is that the bit locations themselves have meaning. Since the triangles will be drawn in order, and order matters when you have transparency, a list is an ideal way to provide these bit-location semantics.

Basic representation of triangle image genome

Background Color

The first 32 bits of my image encode the RGBA value of the image background color. Each byte (8 bits) represents one channel – red, green, blue and alpha (translucency) – giving us 8-bit color depth, or ~16.7 million colors to choose from (not including transparency). Unfortunately, this also means finding the right background color has ~4 billion possible solutions.

Background Color Genome Representation

For those not in the know, “alpha” is how the translucency parameter is typically referred to. Adding an alpha channel means that this program can handle transparent images in addition to images with a defined background.

The program will read this value first, transform it into a color and set every pixel in the image to that color. For example, if the bit string looked like this:


We’d break it up into the component RGBA bytes:

\begin{align} r &= [11000100] \\ &= 196 \\\\ g &= [01100001] \\ &=  97 \\\\ b &= [01101111] \\ &= 111 \\\\ a &= [11111111] \\ &= 255 \end{align}

The rgb value is (196,97,111), which renders a nice red color. Opacity is determined as a value between 0 and 1, with 0 being invisible and 1 being completely opaque. The alpha value can be any number in [0,255]. To derive opacity we simply divide by the maximum, 255. In this case:

$$255 \div 255 = 1$$

So the background is totally opaque. If our source image was 500×500 pixels, this bit string would produce an image that looks like this:

Red RGBA example image

This RGBA encoding is also how each triangle’s color will be represented.
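The decoding described above is simple enough to sketch in Python (the function name is mine):

```python
def decode_background(bits):
    """Decode a 32-character bit string into (r, g, b, a, opacity),
    reading one byte per channel in R, G, B, A order."""
    assert len(bits) == 32
    r, g, b, a = (int(bits[i:i + 8], 2) for i in range(0, 32, 8))
    return r, g, b, a, a / 255

# The worked example above: a red background, fully opaque
r, g, b, a, opacity = decode_background("11000100" "01100001" "01101111" "11111111")
```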


Triangles are encoded as a set of 3 (x,y) points, an RGBA color and a bit that encodes whether or not the triangle is active. Each triangle can be encoded in 11 bytes, or 88 bits.

Triangle bit encoding


As you may know, triangles have three points; the question is how to render them onto an image. Images can be thought of as a grid of pixels. Using discrete Cartesian coordinates, we can address pixels on this 2-dimensional grid.

In this program, the origin is at the upper-left corner of an image, with the x-axis indicating width offset and the y-axis the height offset. This means that the upper-left pixel can be addressed by the point (0,0). If the image was 100 pixels tall, and 100 pixels wide; the bottom-right pixel could be addressed by the point (99,99).

Here is some pixel art I made that is 16×16 pixels (scaled up to 128×128):


The point (0,0) is purple. The point (0, 2) is gray. The point (1,4) is yellow. You get the idea.

Each point is represented by an (x,y) value:


This is actually a lie. I originally tried to directly encode the points using ten bits each, allowing me to directly address pixels in images up to 1024×1024 in size. I later switched to encoding the width/height percentage. Why?

  1. If I used images smaller than that, the bits would be wasted (while still expanding the search space).
  2. If I used images smaller than 1024×1024, the addressing of pixels would no longer be distributed evenly, distorting the algorithm.
  3. I could not use the algorithm on images of arbitrary size.
  4. Making the numbers 8 bit gives nice byte alignment (without limiting me to 256×256 images, which are quite small).

So instead the (x, y) point indicates the percent width and height of the image. Since each coordinate is 8 bits, each step represents 1/256th of the image. For example, if the (x,y) values were (123, 16) and the source image was 512×512 pixels:

\begin{align} x &= [01111011] \\ &= 123 \\\\ y &= [00010000] \\ &= 16 \\\\  pixel_{x} &= (123 \div 256) \times 512 \\ &= (\sim 0.48) \times 512  \\ &= 246 \\\\  pixel_{y} &= (16 \div 256) \times 512 \\ &= (0.0625) \times 512  \\ &= 32 \\\\ pixel_{x,y} &= (246,32) \end{align}

Using these 3 points, we can define which pixels are inside of the triangle, and therefore color them appropriately to “draw” them.
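The percentage mapping above fits in one line of Python (the function name is mine):

```python
def to_pixel(coord_byte, image_dim):
    """Map an 8-bit coordinate (0-255) to a pixel index by treating it
    as a fraction of the image dimension: pixel = (value / 256) * dim."""
    assert 0 <= coord_byte <= 255
    return int((coord_byte / 256) * image_dim)
```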


Color is done much the same as the background color, with one major difference.


Like before, triangles have an 8-bit-per-channel RGB color profile. Unlike before, they only have 7 bits to determine their opacity. I did this so I could sneak in a bool value while maintaining byte alignment. I don’t think the triangles suffer from only having 128 opacity states versus 256.

The “visible” bit determines if the triangle is drawn at all. Although this could also be achieved by setting all the opacity bits to 0, we will see below why a dedicated bit is necessary.
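One way to pack the 7-bit opacity and the visible bit into a single byte might look like this in Python. Note that putting visibility in the low bit is my assumption for illustration, not necessarily how the program lays it out:

```python
def pack_alpha_visible(alpha7, visible):
    """Pack a 7-bit opacity (0-127) and a visibility flag into one byte.
    Visibility occupies the low bit in this sketch."""
    assert 0 <= alpha7 < 128
    return (alpha7 << 1) | int(bool(visible))

def unpack_alpha_visible(byte):
    """Recover (opacity, visible) from the packed byte."""
    return byte >> 1, bool(byte & 1)
```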

Putting it all together

Here’s what a bit string – or “genome” – of a red image with a centered, 50% transparent, light blue triangle would look like:


Again, notice how the dimensions aren’t specified. They depend on the source image. The encoding is dimension agnostic – which is a nice feature.

The size of a genome, in bytes: \(4 + ( \text{number of triangles} \times 11)\)

Multiply by 8 to get the number of bits. This means a drawing with 2 triangles has \(2^{208}\) possible configurations.
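The size formula as a quick sanity check (function names are mine):

```python
def genome_size_bytes(num_triangles):
    """4 background bytes plus 11 bytes per triangle."""
    return 4 + num_triangles * 11

def genome_size_bits(num_triangles):
    """Multiply by 8 bits per byte."""
    return genome_size_bytes(num_triangles) * 8
```

With 2 triangles this gives 208 bits, hence the \(2^{208}\) configurations mentioned above.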

Manipulating the Bits

So now we know how to encode solutions as bit strings, which I will affectionately name “genomes.” Given that even simple genomes can represent an incredibly large search space, we need an intelligent way to manipulate the strings and narrow the search space. Unfortunately, this is not an AI post, this is a genetic algorithms post – we only have the power of natural selection. Natural selection is not very smart. It can be effective at finding solutions, but only if you harness the power of the only thing it cares about – reproductive fitness. Eliezer Yudkowsky puts it nicely:

within each species, the blind idiot god is purely obsessed with inclusive genetic fitness. No quality is valued, not even survival, except insofar as it increases reproductive fitness. There’s no point in an organism with steel skin if it ends up having 1% less reproductive capacity.

from his blog post


This means that the most important part of the algorithm, besides the encoding, is the fitness function. We’ll get there eventually, but first we need to know the life of an Artist.


To keep the nature analogy going, artists are the creatures produced by the genome. They are very similar – since an artist is defined by its genome – but they are not the same. An artist is an extension of the genome in three ways.

  1. Artists are finite. They, not their genomes, are the ones that live and die, and they are the ones that reproduce. The same genome may appear in multiple artists, in different places and at different times, but each artist lives only once.
  2. Artists live in a place. They have a location within their environment.
  3. Artists have two copies of their genome, a dominant and recessive strand.

When an artist is created, it is given two, equal length, random bit strings. One we’ll label “dominant” and the other “recessive.” These random strings are modified so that the “visible” bit of each triangle, except the first, is set to 0.

Population & Location

The first thing we do is create a population of artists. As stated above, these artists start with totally random bit-strings, or genotypes.

These artists are distributed on a map. This map is really an adjacency matrix of \(N\) dimensions, meaning each artist is adjacent to \(N\) other artists; e.g. if \(N=4\) it simulates a 2D wraparound grid. Artists are only allowed to mate with artists that are adjacent to them.


Each artist has a chance \((\sim 70\%)\) to crossover. A random point in the genome is picked, and the dominant and recessive genomes swap their bits after that point.


This crossover can be set to happen at the bit, byte or triangle alignment; i.e. if set to triangle alignment, triangles are never split, and are always swapped intact.
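An illustrative Python sketch of this aligned, tail-swapping crossover between the dominant and recessive strands (names and signature are mine; in the real genome a triangle is 88 bits):

```python
import random

def aligned_crossover(dominant, recessive, alignment, rng=random):
    """Swap the two strands' tails after a cut point chosen on a given
    alignment: 1 = bit level, 8 = byte level, 88 = triangle level."""
    assert len(dominant) == len(recessive)
    assert len(dominant) % alignment == 0
    cut = rng.randrange(len(dominant) // alignment + 1) * alignment
    return dominant[:cut] + recessive[cut:], recessive[:cut] + dominant[cut:]
```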


Good question. The reason seems to be “because nature does it.” To be fair, that’s why anybody does genetic algorithms in the first place, but there is another reason why crossover might be valuable. Since GAs operate on bit strings, it stands to reason that there would be valuable subsequences of bits. Crossover allows these subsequences to be recombined and swapped around.


Each artist has a chance \((\sim 0.05\%)\) to mutate at each bit. If a mutation occurs, the bit is flipped; i.e. if it is a 1 it becomes a 0, and if it is a 0 it becomes a 1.
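A per-bit mutation pass might be sketched like this in Python (the function name and the default rate are illustrative):

```python
import random

def mutate(bits, rate=0.0005, rng=random):
    """Flip each bit independently with probability `rate` (~0.05% here)."""
    out = []
    for b in bits:
        if rng.random() < rate:          # mutation fires: flip the bit
            out.append("1" if b == "0" else "0")
        else:
            out.append(b)                # bit passes through unchanged
    return "".join(out)
```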


Mutation is important because it prevents the algorithm from getting stuck. If the initial population of artists did not have a particular sequence present then, without mutation, that sequence could never be created. Just like in nature, mutation is usually harmful, but seemingly necessary for advancement.


The most important part of a genetic algorithm is how “fitness” is measured and how reproduction is handled. This makes sense: natural selection is purely concerned with “reproductive fitness,” which has both of these words in it.

Fitness is a measure of how close a solution is to ideal. This means you have to know what a good, or perfect, solution looks like before you start. This is tricky, because if you knew what a good solution was, you wouldn’t need to find one. In this case we need another function, one that takes the bit string and maps it back to the original domain. This domain lends itself to evaluation in a way the genome doesn’t.

Expressing Yourself

In biology, a genotype is the DNA: the sequence of nucleic acids that makes up a gene. The phenotype is the expression of the DNA, or gene. In this algorithm, the bit string, or list of triangles, is like the genotype. If we take this information and convert it into an image, that is like the phenotype.

In this case, the background color for the image is determined by the RGBA background color value encoded in the dominant genome. Then, each triangle is examined in order from both the dominant and recessive genomes. If the triangle from the dominant genome is “visible” (its visible bit is set to 1), it is drawn. If the dominant triangle is not visible and the recessive triangle is, then the recessive triangle is drawn. If neither is visible, no triangle is drawn. Repeat until we run out of triangles. This gives us our artist’s image.
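The dominant/recessive visibility rule can be sketched like so (the triangle representation here is a simplified stand-in for the real 11-byte encoding):

```python
def visible_triangles(dominant_tris, recessive_tris):
    """Walk both strands in order; draw the dominant triangle if its
    visible bit is set, else the recessive one, else nothing.
    Triangles here are simplified (visible_bit, payload) pairs."""
    drawn = []
    for dom, rec in zip(dominant_tris, recessive_tris):
        if dom[0]:
            drawn.append(dom[1])
        elif rec[0]:
            drawn.append(rec[1])
    return drawn
```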

Calculating Image Distance

We can compare this image to the source image to see how close it is. The closer it is to the source image, the better. In this program I settled on using the per-pixel root mean squared error for my fitness function. The larger the error, the more different the images, and the lower the fitness. If you’re interested in learning more, I have a previous post you might want to check out.
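Here is a sketch of per-pixel RMSE on flattened images, plus one hypothetical way to turn it into the unbounded fitness described earlier. The inversion is my illustration; the post doesn't specify the exact error-to-fitness mapping:

```python
import math

def rmse(pixels_a, pixels_b):
    """Per-pixel root mean squared error between two images,
    given here as flat lists of pixel values."""
    assert len(pixels_a) == len(pixels_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b))
                     / len(pixels_a))

def fitness(candidate, target):
    """One possible unbounded fitness: the inverse of the error, which
    grows without bound as the images become identical (my mapping,
    not necessarily the program's)."""
    return 1.0 / (rmse(candidate, target) + 1e-9)
```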

I also tried adding a custom image hash and an image hash based off of a discrete cosine transform – but neither seemed to improve the fitness function so I dropped them as they are expensive to calculate.


Once all the artists have been assigned a fitness, it is time for them to reproduce. Again, to harness natural selection’s powers we must allow more fit artists to reproduce with greater frequency. The question is, how much more often?

If we allow high fitness artists to reproduce too prolifically, we may converge too quickly as they take over the gene pool in rapid succession. If we don’t allow them any opportunity to reproduce more often then the algorithm will make progress only by blind luck.

The method I chose was sigma scaling. The standard deviation of fitness is calculated, and artists are allowed to reproduce in proportion to their z-score. This has two benefits: when the variance is very high (in the beginning) a lucky artist won’t prematurely dominate the gene pool. In later generations, when the program has converged, the variance will be low, but there will still be artists that are several standard deviations from the mean and so selective pressure will remain.

$$ExpectedReproduction(a, t) = 1 + \frac{f(a)-\bar f(t)}{2\sigma(t)}$$

Where \(a\) is the artist, \(f(a)\) is the fitness of the artist, \(\bar f(t)\) is the average fitness of the population at time t, \(\sigma (t)\) is the standard deviation of the fitness of the population at time t. If \(\sigma (t) =0\) then ExpectedReproduction is 1. If ExpectedReproduction is less than 0.1 it is set to 0.1 out of kindness.

Using a stochastic process, artists are selected to mate such that they will reproduce at least \( \lfloor ExpRepr(a,t) \rfloor \) times and at most \( \lceil ExpRepr(a,t)\rceil\) times.
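The sigma-scaling rule above, sketched in Python (using the population standard deviation; names are mine):

```python
import statistics

def expected_reproduction(fitnesses):
    """Sigma scaling: 1 + (f - mean) / (2 * sigma), floored at 0.1;
    if the population has zero variance everyone expects exactly 1 child."""
    mean = statistics.mean(fitnesses)
    sigma = statistics.pstdev(fitnesses)
    if sigma == 0:
        return [1.0] * len(fitnesses)
    return [max(0.1, 1 + (f - mean) / (2 * sigma)) for f in fitnesses]
```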


Artists are only allowed to mate with artists adjacent to them, and these are chosen at random. Their offspring inherit the dominant genome from the mating parent, and the recessive genome from the selected mate.

There is also an elitism parameter that can be set. If set, the top \(X\) members of each generation are cloned into the next generation, bypassing mating. This can help the program converge faster.


All the offspring of the current generation are distributed onto the map, roughly where their parents were; the current generation is deleted and the process starts over.

What’s Next?

I’ve been running the program on images and I’m not super impressed with the results. I plan on tinkering with it to see if there are any obvious improvements that I can make (that are still in the spirit of genetic algorithms). Most important is to do some actual research on which hyper-parameters perform best. I’ve never machine-learned parameters before, so wish me luck!

  1. Test which hyper-parameters are the best
    1. This is the science part where I run the program many times to determine which hyper parameters produce the best looking image
  2. Switch to OpenCV so I can use CUDA to compare images and rasterize them
    1. This should make the code much faster – probably requires rewriting nearly all of it though.
  3. Hook it up to a webcam
    1. Have the “environment” change each frame so that the “artists” are constantly evolving.


I’m re-re-re-re-starting trying to build a program that uses a genetic algorithm to draw images using triangles. Like my blog that I keep “restarting.” Like everything in my pathetic life.

This sushi is like your life, many things started and nothing finished

Last week I read An Introduction to Genetic Algorithms by Melanie Mitchell as a primer. It’s a good book that covers the history, various techniques and the mathematics behind understanding GAs. That said, I could never shake the feeling that the central theme of the book was “They’re an interesting way to program things, but we’re not sure what they’re good for and are most likely just a bad form of hill climbing.” I wasn’t too bothered by this, because my goal with this program is to learn more about GAs, not to build scalable, efficient machine learning algorithms.

I had an old version of this program that made pretty good images (see the panda on this earlier post, or this one). I then remade a shitty version of the program, like 2 years ago, that I managed to get working again as well. It takes longer and produces worse results – more proof that I am declining in my old age.

Crystal maiden image compared to the same image rendered only in triangles

I sketched out a rough plan of my revised algorithm over the weekend, and now it’s finally time to implement. Too bad it’s been ages since I’ve written any code let alone ceepeepee. Wish me luck.