1. Introduction
“Like a moth to a flame” is an old expression that compares a strong attraction to something with the way moths are drawn to flames. The flying behavior of moths makes it seem as if they converge on a flame.
Since, in optimization problems, we look for algorithms that can discover and converge towards optimal solutions, perhaps moths can help us design one such algorithm. Like many solutions to optimization problems, Moth Flame Optimization (MFO) is inspired by the natural world. The algorithm we explore exploits the spiraling behavior of moths around flames.
In this tutorial, we’ll cover the MFO algorithm in detail.
2. The Optimization Problem
Moths often appear to cluster around bright lights at night. It very much seems like they are attracted to these lights. In reality, it is a consequence of their natural navigation strategy, known as transverse orientation. Moths rely on a distant light source in order to fly in a straight line. They do this by keeping the light source at a fixed angle to themselves. Without artificial light, this would usually be the moon. However, artificial light sources much closer to the moths distort this behavior. In order to keep a constant angle to such a nearby light source, the moths end up spiraling around it.
We frame our optimization algorithm by considering a population of moths. As a population-based algorithm, we represent our moths using an n x d matrix called M. Here, n is the number of moths, and d is the dimensionality of the solution vector that a moth represents.
As we do not know where the solutions are, we need to approximate them. We do this using another matrix of moths, called the flame matrix F. F is also an n x d matrix, the same size as M. F represents the best solutions found so far. We think of F as the flame matrix since these are the beacons that our search agents in M will orbit. Each of M and F has an associated vector, OM and OF respectively, holding the fitness values of the proposed solutions.
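To make this concrete, here is a minimal sketch of how we might hold these matrices in Python, assuming NumPy, arbitrary search-space bounds, and a placeholder fitness function (none of which are fixed by MFO itself):

import numpy as np

n, d = 5, 2                    # number of moths and dimensionality of a solution
lower, upper = -10.0, 10.0     # assumed bounds of the search space

def fitness(position):
    """Placeholder objective to minimize (the sphere function)."""
    return float(np.sum(position ** 2))

# The moth matrix M (n x d) and its fitness vector OM
M = np.random.uniform(lower, upper, size=(n, d))
OM = np.array([fitness(m) for m in M])

# The flame matrix F holds the best solutions found so far, sorted by fitness,
# and OF holds the corresponding fitness values
order = np.argsort(OM)
F, OF = M[order], OM[order]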
2.1. Algorithm Structure
We can visualize the algorithm as a flowchart. The algorithm is relatively compact: first, we initialize the moth population; then we initialize or update the flames; after updating the flames, the moths move towards their respective flames. This process continues until convergence, or until the process reaches the step limit. What remains is to define the update behavior for the moths and flames.
2.2. Spiral Behavior
Moths need to be able to fly around and approach light sources. We achieve this by defining a logarithmic spiral. Other options for spirals are also possible. Let’s now define the formula for the logarithmic spiral:
S(M_i, F_j) = D_i \cdot e^{bt} \cdot \cos(2\pi t) + F_j \qquad (1)

D_i = |F_j - M_i| \qquad (2)

The spiral depends on the distance D_i between a moth M_i and its corresponding flame F_j. We also include b as a user parameter that defines the shape of the spiral, and t as a random variable that decreases the range of the spiral over time.
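As a rough illustration, we can sketch the spiral move for a single moth in Python. Drawing t per dimension from the interval (r, 1], with r shrinking over the iterations, follows the common MFO implementation rather than anything mandated by the formula itself:

import numpy as np

def spiral_move(moth, flame, b=1.0, r=-1.0, rng=None):
    """Move a moth along a logarithmic spiral around its flame (equation 1)."""
    if rng is None:
        rng = np.random.default_rng()
    D = np.abs(flame - moth)                          # equation 2: distance to the flame
    t = (r - 1.0) * rng.random(moth.shape) + 1.0      # t drawn per dimension from (r, 1]
    return D * np.exp(b * t) * np.cos(2.0 * np.pi * t) + flame

spiral_move(np.array([1.0, 2.0]), np.array([3.0, 3.0]))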
2.3. Exploration vs. Exploitation
As in many agent-based optimization paradigms, exploration/exploitation must be considered. In our case, moths are forced to exploit their one corresponding flame. This prevents over-exploitation early on and promotes exploration. On the other hand, if we kept the number of flames constant over time, we may end up under-exploiting the best solutions discovered so far. So, to avoid this, the number of flames is also decreased over time according to equation 3:
\text{flame no} = \text{round}\left(N - l \cdot \frac{N - 1}{T}\right) \qquad (3)

Here, l is the current iteration, N is the maximum number of flames, and T is the maximum number of iterations.
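A minimal sketch of this flame-count schedule, assuming simple rounding, could look like this:

def number_of_flames(l, N, T):
    """Equation 3: linearly shrink the flame count from N down to 1 over T iterations."""
    return round(N - l * (N - 1) / T)

number_of_flames(0, 10, 100)     # -> 10 flames at the start
number_of_flames(100, 10, 100)   # -> 1 flame at the final iteration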
We update the moths in relation to their corresponding best flames. Since we progressively decrease the number of flames, the moths that no longer have their own flame are updated with respect to the last remaining flame.
2.4. Features and Applications
We have seen how we can model moth behavior and use it to solve optimization problems. We can now re-iterate some of the features of the algorithm and discuss why they work.
Local optima can be a problem for optimization algorithms. Because MFO is population-based, many potential solutions are found and iterated upon, which helps avoid many local optima. Further, assigning moths to flames and updating the sequence of flames also helps avoid local optima.
Solutions can be found by searching near our existing good solutions, represented by flames. The adaptive decrease in the number of flames encourages the exploitation of good solutions. The algorithm, therefore, converges over time while also being able to explore alternative areas of the search space.
3. MFO Pseudocode
Let’s see the pseudocode for MFO:
algorithm MothFlameOptimization(NumMoths, MaxIter):
    // INPUT
    //    NumMoths = the number of moths in the population
    //    MaxIter = the maximum number of iterations
    // OUTPUT
    //    The best solution found by the optimization process

    M <- initialize the moth population
    OM <- Fitness(M)
    t <- 0

    while t < MaxIter:
        NumFlames <- round(NumMoths - t * (NumMoths - 1) / MaxIter)   // equation 3
        if t = 0:
            // the initial flames are the moths sorted by fitness
            F <- sort M by OM
            OF <- sort OM
        else:
            // merge the previous flames with the moved moths and keep the best
            F, OF <- best NumMoths entries of (F + M), sorted by fitness
        for i <- 0 to NumMoths - 1:
            calculate D between Moth(i) and its corresponding flame   // equation 2
            update Moth(i) along the spiral                           // equation 1
        OM <- Fitness(M)
        t <- t + 1

    F <- sorted(F + M) by fitness
    return F[0]   // best solution
We can see the outline of the algorithm in the pseudocode above. First, we initialize a population of moths and evaluate their fitness. We then initialize the flames and perform an update step. After that, we combine the previous flames and the newly updated moths to generate a new list of flames. Finally, we repeat these update steps for a total of MaxIter time-steps.
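To tie the pieces together, here is a compact Python sketch of the full loop. It follows the pseudocode above, but the search-space bounds, the shape constant b, the clipping of moth positions, and the linear schedule for r are assumptions of this sketch rather than details fixed by MFO itself:

import numpy as np

def mfo(fitness, dim, num_moths=30, max_iter=100, lower=-10.0, upper=10.0, b=1.0, seed=0):
    """A rough sketch of Moth Flame Optimization for a minimization problem."""
    rng = np.random.default_rng(seed)

    # Initialize the moth population M and its fitness vector OM
    M = rng.uniform(lower, upper, size=(num_moths, dim))
    OM = np.array([fitness(m) for m in M])

    # Initial flames: the moths sorted by fitness (best first)
    order = np.argsort(OM)
    F, OF = M[order], OM[order]

    for t in range(max_iter):
        # Equation 3: shrink the number of flames over the iterations
        num_flames = round(num_moths - t * (num_moths - 1) / max_iter)

        if t > 0:
            # Merge the previous flames with the moved moths and keep the best
            positions = np.vstack([F, M])
            fitnesses = np.concatenate([OF, OM])
            order = np.argsort(fitnesses)[:num_moths]
            F, OF = positions[order], fitnesses[order]

        # r shrinks from -1 towards -2, narrowing the spiral over time
        r = -1.0 - t / max_iter

        for i in range(num_moths):
            j = min(i, num_flames - 1)                   # extra moths share the last flame
            D = np.abs(F[j] - M[i])                      # equation 2
            tt = (r - 1.0) * rng.random(dim) + 1.0       # random t per dimension in (r, 1]
            M[i] = D * np.exp(b * tt) * np.cos(2.0 * np.pi * tt) + F[j]   # equation 1
            M[i] = np.clip(M[i], lower, upper)
        OM = np.array([fitness(m) for m in M])

    # Final merge, so the best moth from the last update can also be returned
    positions = np.vstack([F, M])
    fitnesses = np.concatenate([OF, OM])
    best = np.argmin(fitnesses)
    return positions[best], fitnesses[best]

# Example: minimize the sphere function in two dimensions
best_position, best_value = mfo(lambda x: float(np.sum(x ** 2)), dim=2)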
4. Example
Suppose there is a campfire somewhere in the wilderness, and we want to locate it using the MFO algorithm. Further, suppose that moths benefit from the campfire’s heat, and that benefit is inversely proportional to their distance from the campfire. This gives us an easy-to-compute fitness function: we simply use each moth’s Euclidean distance to the campfire as its fitness, with lower values being better. Our wilderness is a two-dimensional plane, and the campfire is located at coordinates (5, 5) on our grid. We also fix a maximum number of iterations T.
4.1. Generate the Moths and Flames
We begin by generating moths. For this example, we set n = 3. We see the moth positions in the table below:
Moth | X-coordinate | Y-coordinate | Fitness | Sorted Rank
1    | 1            | 2            | 5       | 3
2    | 3            | 3            | 2.83    | 1
3    | 7            | 7            | 2.83    | 2
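As a quick check of the fitness column, we can compute the distances to the campfire in Python (the campfire location is the one assumed above):

import numpy as np

campfire = np.array([5.0, 5.0])
moths = np.array([[1.0, 2.0], [3.0, 3.0], [7.0, 7.0]])

fitness = np.linalg.norm(moths - campfire, axis=1)
# -> [5.0, 2.83, 2.83] after rounding, matching the fitness column above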
4.2. Move the Moths
We have our moths, and we have scored them. We now need flames to optimize towards. These are the moths that perform the best, obtained by sorting the first set of moths by their fitness values. We show the flames in the table below:
Flame | X-coordinate | Y-coordinate | Fitness
1     | 3            | 3            | 2.83
2     | 7            | 7            | 2.83
3     | 1            | 2            | 5
We now want to calculate the movement of each moth using equation 1. This requires first calculating the distance between the searching moth and its flame, that is to say, moth M_1 and flame F_1. The distance is simply the absolute value of the difference between the two vectors. We show this in equation 4:

D_1 = |F_1 - M_1| = |(3, 3) - (1, 2)| = (2, 1) \qquad (4)

We then multiply this distance by our logarithmic spiral term and finally add the position of our flame. For our purposes, we fix the shape constant b and draw a random value of t for each coordinate. We work this out in equation 5. The new position is closer to the moth representing the flame. It is also closer to our target campfire:

S(M_1, F_1) = D_1 \cdot e^{bt} \cdot \cos(2\pi t) + F_1 = (2, 1) \cdot e^{bt}\cos(2\pi t) + (3, 3) \approx (2.7, 2.4) \qquad (5)
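We can verify the distance and the fitness of the new position numerically; the spiral multiplier itself depends on the sampled values of b and t, so we only check the parts that do not:

import numpy as np

campfire = np.array([5.0, 5.0])
moth_1 = np.array([1.0, 2.0])
flame_1 = np.array([3.0, 3.0])

D_1 = np.abs(flame_1 - moth_1)            # equation 4 -> [2., 1.]

new_position = np.array([2.7, 2.4])       # result of the spiral move in equation 5
np.linalg.norm(new_position - campfire)   # -> about 3.47 (3.46 in the next table, after rounding)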
4.3. Update and Repeat
We repeat this calculation for each moth. After that, we update the number of flames, and then we rebuild our list of flames from the old flames and the new moth positions:
Moth | X-coordinate | Y-coordinate | Fitness | Sorted Rank
1    | 2.7          | 2.4          | 3.46    | 1
2    | 1.84         | 1.84         | 4.47    | 2
3    | -0.75        | 0.54         | 7.28    | 3
In this case, we do not decrease the number of flames this iteration: with N = 3 flames and l = 1, equation 3 still rounds to 3 as long as the maximum number of iterations T is not very small. So, when we combine our old flames and new moths, we are still looking for three flames. This means our new flame matrix contains two flames from the previous table and one flame from the new moths matrix:
Flame | X-coordinate | Y-coordinate | Fitness
1     | 3            | 3            | 2.83
2     | 7            | 7            | 2.83
3     | 2.7          | 2.4          | 3.46
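The same selection can be written out in a few lines of Python, using the fitness values from the tables above:

import numpy as np

# Previous flames and their fitness values (from the first flame table)
old_flames = np.array([[3.0, 3.0], [7.0, 7.0], [1.0, 2.0]])
old_fitness = np.array([2.83, 2.83, 5.0])

# Moths after the spiral move and their fitness values
new_moths = np.array([[2.7, 2.4], [1.84, 1.84], [-0.75, 0.54]])
new_fitness = np.array([3.46, 4.47, 7.28])

# Combine, sort by fitness, and keep the best three as the new flames
positions = np.vstack([old_flames, new_moths])
fitnesses = np.concatenate([old_fitness, new_fitness])
best = np.argsort(fitnesses)[:3]
new_flames = positions[best]   # -> (3, 3), (7, 7), (2.7, 2.4)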
Finally, we repeat these calculations until convergence or until we reach the maximum number of iterations.
5. MFO Complexity
MFO relies on sorting the moths. The original paper suggests quicksort, which has a worst-case run time of O(n^2). Other sorting algorithms with better worst-case run times are possible; however, we will use quicksort. For more on this topic, see our article on choosing a sorting algorithm.
We compose the overall run-time in equation 6. We see from this that we have to run a sort at each iteration and then run the MFO update loop. Consequently, this gives us the big-O complexity shown in equation 7:

O(\text{MFO}) = O(T \cdot (O(\text{Quicksort}) + O(\text{Position Update}))) \qquad (6)

O(\text{MFO}) = O(T \cdot (n^2 + n \cdot d)) \qquad (7)
6. Conclusion
In this article, we have covered the Moth Flame Optimization algorithm. We further described the inspiration for the algorithm and then illustrated how to formalize that inspiration.
MFO is a numerical optimization algorithm with many potential uses, from hyper-parameter selection to optimizing multi-layer perceptrons. There are many similar nature-inspired population-based optimization algorithms. For further reading on this topic, we can see some of our other articles on Gray Wolf Optimization, Grasshopper Optimization, and Ant Colony Optimization.
To conclude, we summarize the benefits of MFO. As a population-based algorithm, it has some robustness against getting stuck in local minima. The approach maintains the best solutions discovered so far as flames and progressively exploits areas with good solutions. Adaptively updating the number of flames balances the exploration/exploitation trade-off. So, on the whole, MFO provides an algorithm that balances exploration against exploitation while remaining easy to understand and implement.