Optimization – How to Use Particle Swarm Optimization (PSO)


I am learning Particle Swarm Optimization.

The problem is to find the pair of numbers whose sum is lowest.

Let's say we have four numbers z1, z2, z3, z4

Number  Value
 z1     -2
 z2     -3
 z3      3
 z4     -5

The goal is to find the pair of numbers whose sum is minimum (z2, z4).

Initially we have 3 available pairs to choose from:
(z1,z2), (z2,z3), (z3,z4).

Questions

  1. How can I formulate this as an optimization problem? I also want to be able to include larger combinations, such as (x1,x2,x3,x4), (x4,x5,x6,x7), etc.

  2. Can PSO work with this situation?

Best Answer

I took the liberty of adapting a portion of my thesis work to more clearly articulate what Particle Swarm Optimization is and how it works.

Many existing resources on PSO are not really that great or useful, in my experience. Hopefully this is more clear. It's comprehensive and has some additional references which might be helpful.

The following is the approach to take when working with optimization problems.

tl;dr:

  1. Understand what you are actually doing and what optimization is. You need to have context for what's going on and what you are trying to do.
    • The sections below explain, step by step, the theory required to understand and formulate an optimization problem
    • You might want to skip the Kuhn-Tucker conditions section unless you enjoy the theory deep end
    • Several sections deal with constraints, which you (or future readers) may not care about. Again, feel free to skip as needed
  2. Formulate your situation as an actual optimization problem
    • Mathematically, not in words
    • Follow the sequence of this post while doing so.
  3. Implement PSO using your formal optimization problem

In your case, the hard parts are the first and second steps. Once you can define your problem and design space, how PSO is implemented becomes a relatively trivial "implementation detail", as you can adapt any variant of PSO to accommodate your situation.
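Concretely, your pair-selection problem can be written with one discrete design variable (an index into the list of allowed pairs) and an objective equal to the pair's sum. A minimal sketch, with variable names that are mine, for illustration:

```python
# Sketch: casting the pair-selection question as an optimization problem.
# Design variable: k, an index into the list of allowed pairs (a discrete DV).
# Objective: F(k) = sum of the k-th pair's values, to be minimized.

values = {"z1": -2, "z2": -3, "z3": 3, "z4": -5}
pairs = [("z1", "z2"), ("z2", "z3"), ("z3", "z4")]  # the 3 available pairs

def objective(k):
    """Objective function: sum of the k-th pair."""
    a, b = pairs[k]
    return values[a] + values[b]

# With only 3 candidate designs, exhaustive evaluation already solves it:
best_k = min(range(len(pairs)), key=objective)
print(pairs[best_k], objective(best_k))  # -> ('z1', 'z2') -5
```

Note that over the three listed pairs the minimum is actually (z1, z2) with sum -5; the pair (z2, z4) is only reachable if all pairs are allowed as candidates. The same formulation extends to larger combinations by listing them as candidate tuples.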

The trick is converting a word problem into equations, and that is exactly what this approach walks you through.


Generalized Optimization Problem Statement

A general optimization problem contains one or more functions to be minimized or maximized. Also included are design variables and constraints. Design variables are independent parameters from which all system equations are built. Constraints are equations or numerical routines that limit certain combinations of design variables (e.g., a prescribed cost or stress value).

The generalized form for a nonlinear constrained optimization problem is given as

  Minimize:    F_q(x),            q = 1, ..., p
  Subject to:  g_j(x) <= 0,       j = 1, ..., m
               h_k(x) = 0,        k = 1, ..., l
               x_i^lower <= x_i <= x_i^upper,   i = 1, ..., n

with x representing the design variables for the optimization problem [1].

This answer will address only problems with a single objective function (p=1).

Design Variables

Each optimization problem has a variety of input parameters that can be changed. These inputs are known as design variables (DVs). For example, if attempting to minimize weight for a pop can, design variables might be can height, material thickness, and can diameter.

Objective Function

The objective function is the function being optimized. Its calculated value is often referred to as a fitness value, and objective functions are generally represented as functions of some or all design variables. For the pop can example, the objective function might be an expression for the total weight of the can as a function of its height, thickness, and diameter.

An objective function can either be minimized or maximized, but convention within formal optimization is to minimize objective functions.
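As a sketch of what such an objective might look like in code (the thin-walled cylinder approximation and the aluminum density value are my own illustrative assumptions, not part of the pop can example above):

```python
import math

DENSITY = 2.70e-3  # g/mm^3, assumed aluminum density for illustration

def can_weight(height, thickness, diameter):
    """Objective: approximate can weight (g) for a thin-walled cylinder
    with a top and bottom, as a function of the three design variables (mm)."""
    wall = math.pi * diameter * height * thickness
    ends = 2.0 * math.pi * (diameter / 2.0) ** 2 * thickness
    return DENSITY * (wall + ends)

# Minimization convention: to maximize some f(x), minimize -f(x) instead.
def to_minimize(f):
    return lambda *x: -f(*x)
```

Wrapping a maximization objective with `to_minimize` is the standard way to honor the minimize-by-convention rule without changing the optimizer.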

Constrained Optimization

Engineering problems often have limitations imposed on the system. These may result from resource limits, geometric or physical constraints, or other sources. Continuing the pop can example, constraints might be a maximum total cost, a target can volume, or a target diameter-to-height ratio. Constraints are reflected in optimization problems through numeric equations or computational routines.

An example constrained optimization problem is shown below:

  Minimize:    F(x1, x2)
  Subject to:  x1 + x2 - 5 <= 0
               x1^2 + x2^2 - 10 = 0
               0 <= x1 <= 5
               0 <= x2 <= 5

There are three primary types of constraints within optimization problems: inequality, equality, and side. The example contains a single inequality constraint (x1 + x2 - 5 <= 0), a single equality constraint (x1^2 + x2^2 - 10 = 0), side constraints for both x1 and x2, and a total of two design variables.

Inequality Constraints

Constraints expressed as inequalities are simply referred to as inequality constraints. The inequality constraint above is x1 + x2 - 5 <= 0.

Inequality constraints are usually converted to be of the form g(x) <= 0 and normalized. In a generalized case, inequality constraints within optimization are represented by:

  • g_j(x) <= 0,   j = 1, ..., m

where j represents each of m constraints, and x is the independent design variable vector.

When a constraint g(x) = 0, the constraint is considered to be an active constraint. When g(x) < 0, it is considered a satisfied constraint.
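In code, the normalization and the active/satisfied distinction for the example constraint x1 + x2 - 5 <= 0 might look like this (the numeric tolerance used to decide "active" is an assumed choice):

```python
def g(x1, x2):
    """Inequality constraint x1 + x2 <= 5, rewritten as g(x) <= 0
    and normalized by the bound value 5: (x1 + x2)/5 - 1 <= 0."""
    return (x1 + x2) / 5.0 - 1.0

def constraint_status(gval, tol=1e-6):
    """Classify a constraint value (the tolerance is an assumed choice)."""
    if abs(gval) <= tol:
        return "active"     # g(x) = 0
    return "satisfied" if gval < 0 else "violated"

print(constraint_status(g(2.0, 3.0)))  # x1 + x2 = 5 -> active
print(constraint_status(g(1.0, 1.0)))  # -> satisfied
```

Normalizing by the bound keeps constraints of very different magnitudes comparable, which matters once penalties are applied later.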

Equality Constraints

Equality constraints are similar to inequality constraints, except of the form h(x) = 0.

The equality constraint from above is

  • x1^2 + x2^2 - 10 = 0

Generally, equality constraints are enforced within a small tolerance to help optimization methods obtain feasible solutions.
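A sketch of the tolerance idea, using the example equality constraint from above (the tolerance value is an assumption; in practice it is problem dependent):

```python
def h(x1, x2):
    """Equality constraint x1^2 + x2^2 - 10 = 0 from the example above."""
    return x1**2 + x2**2 - 10.0

def equality_satisfied(hval, tol=1e-3):
    """|h(x)| <= tol stands in for h(x) = 0; equivalently, the two
    inequalities h(x) - tol <= 0 and -h(x) - tol <= 0."""
    return abs(hval) <= tol

print(equality_satisfied(h(1.0, 3.0)))  # 1 + 9 - 10 = 0 -> True
print(equality_satisfied(h(2.0, 2.0)))  # 4 + 4 - 10 = -2 -> False
```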

Side Constraints

Side constraints are lower and upper bounds for the independent design variables. From above, side constraints are

  • 0 <= x1 <= 5
  • 0 <= x2 <= 5

Design Space

Within a given problem, the region of the coordinate system in which the design variables can exist is called the design space. It is the numerical domain where the design variables, objective function, and constraints all coincide. This space is multidimensional: its dimensionality equals the number of design variables, and is often described as n-D, with n representing the number of dimensions. A specific combination of design variables is known as a design point and marks a location within the design space. Using the values at a design point, outputs for the objective function and constraints can be computed.

An example design space with a single design variable, c, is shown in Figure 1.1 (Kalivarapu [8]). The x-axis represents the design space, while the y-axis represents the objective function value.

[Figure 1.1: example design space with a single design variable, c (Kalivarapu [8])]

A 2-D problem can be thought to have a design space occupying the XY plane. The Z-axis represents the objective function value for each point in the XY plane. The objective function is represented by contours for arbitrary objective function values plotted on the XY plane. Inequality constraints are plotted in the XY plane and represent boundaries where the constraint equals zero.

A 3-D problem would have its design space as the XYZ volume. A design space for pop can design might be different combinations of height, material thickness, and can diameter.

Problems with higher dimensionality cannot be directly visualized.

Solving Constrained Optimization Problems

Kuhn-Tucker Conditions

(Skip this section unless you really, really want to understand constrained optimization)

An important tool for solving constrained optimization problems is the set of Kuhn-Tucker conditions. A solution must satisfy the three conditions described by Kuhn and Tucker in the 1950s to represent a local optimum for a constrained optimization problem [2]. The three conditions are described below.

  Equation 1.3:  g_j(x*) <= 0,   j = 1, ..., m   (the design point x* satisfies all constraints)
  Equation 1.4:  lambda_j * g_j(x*) = 0,   lambda_j >= 0,   j = 1, ..., m
  Equation 1.5:  ∇F(x*) + Σ_j lambda_j ∇g_j(x*) + Σ_k lambda_(m+k) ∇h_k(x*) = 0

The first Kuhn-Tucker condition, shown in Equation 1.3, is simply a requirement that a design point representing a local optimum must satisfy all constraints.

Next, the product of the Lagrange multiplier (lambda_j) and the constraint value g_j(x) must equal zero, as shown in Equation 1.4. A Lagrange multiplier reflects the rate of change of the objective function with respect to the rate of change of a constraint. Lagrange multipliers are equal to zero for satisfied, non-active constraints, and when a constraint is active, the constraint value g_j(x) is equal to zero. Therefore, when all constraints are either satisfied or active, the second Kuhn-Tucker condition in Equation 1.4 equals zero.

[Figure 1.3: the usable-feasible sector for a design point near an active constraint]

Finally, Figure 1.3 shows the region into which an example design point can move to improve, the usable-feasible sector. This is the set of directions in the design space that decrease the objective function while remaining feasible. At optimality, the direction of improved search represented by ∇F(x) will be perpendicular to all active constraints. This is the third Kuhn-Tucker condition, represented in Equation 1.5. Satisfied constraints are not included, as their Lagrange multipliers are 0, as discussed previously.

Kuhn-Tucker conditions do not guarantee a global optimum unless the design space is convex, but some methods use them to determine whether a point is a local minimum.
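The three conditions can be checked numerically. Here is a sketch on an assumed toy problem of my own choosing: minimize F(x) = x1^2 + x2^2 subject to g(x) = 1 - x1 <= 0, whose optimum is x* = (1, 0) with Lagrange multiplier lambda = 2:

```python
# Numerically checking the three Kuhn-Tucker conditions at a candidate point,
# for an assumed example: minimize F(x) = x1^2 + x2^2  s.t.  g(x) = 1 - x1 <= 0.

def grad_F(x):          # gradient of the objective
    return [2 * x[0], 2 * x[1]]

def g(x):               # inequality constraint, g(x) <= 0
    return 1.0 - x[0]

def grad_g(x):
    return [-1.0, 0.0]

def kuhn_tucker_satisfied(x, lam, tol=1e-9):
    feasible = g(x) <= tol                                  # condition 1 (Eq 1.3)
    complementary = lam >= 0 and abs(lam * g(x)) <= tol     # condition 2 (Eq 1.4)
    stationary = all(abs(dF + lam * dg) <= tol              # condition 3 (Eq 1.5)
                     for dF, dg in zip(grad_F(x), grad_g(x)))
    return feasible and complementary and stationary

print(kuhn_tucker_satisfied([1.0, 0.0], 2.0))  # True: (1, 0) is a local optimum
print(kuhn_tucker_satisfied([2.0, 0.0], 0.0))  # False: gradient not stationary there
```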

Sequential Unconstrained Minimization Techniques

(Skip this section unless you really, really want to understand constrained optimization)

Sequential Unconstrained Minimization Techniques (SUMT) are methods to convert a constrained optimization problem into an unconstrained one. This is generally done by adding a penalty for design points that violate constraints, which yields a new objective function, sometimes referred to as a pseudo-objective function, shown in Equation 1.6 below. P(x) represents the penalty component added to the objective function based on constraint violations.

  Equation 1.6:  Phi(x, r_p) = F(x) + r_p * P(x)

where F(x) is the original objective function, P(x) represents the penalty function, and rp is a scalar multiplier that is sometimes used by the penalty function. The scalar rp is generally set initially and changed throughout subsequent optimization runs to improve solution characteristics.

SUMT methods solve several unconstrained minimization problems in sequence, adjusting the penalty term until convergence is achieved.

Penalty Functions

(Skip this section unless you really, really want to understand constrained optimization)

Many of the SUMT algorithms are commonly known as penalty functions because they incorporate a numeric penalty when constraints are violated. Adding a penalty to the objective function for infeasible points causes infeasible points to be less desirable as this increases the pseudo-objective function value.

Two of the most common ways to calculate P(x) are the Exterior Penalty Function (EPF) and the Interior Penalty Function (IPF). EPF adds a penalty term based on the sum of squares of constraint violations. IPF imposes a penalty based on the inverse of the constraint value. Both are discussed in more detail by Vanderplaats [1]. Because the way a penalty is imposed can lead to numerically ill-conditioned problems, care must be taken when applying a penalty factor to avoid divergence.

When using penalty functions, constraint violations are used to create an additional term that is added to the quantity being optimized. Consequently, this allows techniques for unconstrained optimization to be used on constrained problems.
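A minimal exterior-penalty sketch (the quadratic violation terms follow the usual EPF definition; the example values and the rp escalation factor in the comment are my own assumptions):

```python
def exterior_penalty(gs, hs):
    """EPF: sum of squared violations. gs are inequality constraint values
    (violated when > 0), hs are equality constraint values (violated when != 0)."""
    p = sum(max(0.0, gv) ** 2 for gv in gs)
    p += sum(hv ** 2 for hv in hs)
    return p

def pseudo_objective(f, gs, hs, rp):
    """Equation 1.6: Phi(x) = F(x) + rp * P(x), given precomputed values."""
    return f + rp * exterior_penalty(gs, hs)

# A SUMT run would minimize the pseudo-objective for rp = 1, 10, 100, ...
# (an escalation factor of 10 is a common but assumed choice), re-solving
# the unconstrained problem at each step.
print(pseudo_objective(3.0, gs=[0.5, -1.0], hs=[0.0], rp=10.0))  # -> 5.5
```

Only the violated inequality (0.5) contributes: 3.0 + 10 * 0.5^2 = 5.5.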

Direct Methods

(Skip this section unless you really, really want to understand constrained optimization)

While the previous sections discussed modifications to the problem formulation that allow unconstrained methods to be applied to problems containing constraints, another class of methods deals with constraints directly. Direct methods tend to be efficient and offer improved precision, but they generally require solutions to remain feasible, and both the constraints and the objective function must be differentiable [1].

One of these methods is the Method of Feasible Directions. Relying heavily on gradients, this method identifies a region within the design space into which a point can move to reduce the objective function while remaining feasible [1]. Iterations progress by forcing the design point to consistently decrease the objective function while remaining feasible. This is shown in Figure 1.4 (Vanderplaats [1]). The usable-feasible sector represents the area into which the current design point can move while simultaneously decreasing the objective function and satisfying the constraints.

[Figure 1.4: the usable-feasible sector in the Method of Feasible Directions (Vanderplaats [1])]

The Generalized Reduced Gradient method effectively converts all constraints into equality constraints by adding slack variables to all inequality constraints. It then uses gradients to calculate a reduced gradient and performs a 1-D search (see Vanderplaats [1] for more information). The basic search logic of the Generalized Reduced Gradient method is shown in Figure 1.5.

[Figure 1.5: basic search logic of the Generalized Reduced Gradient method]

Particle Swarm Optimization

Particle swarm optimization (PSO) is a heuristic-based method developed in 1995 to solve optimization problems [3]. The PSO method was inspired by the social and nesting behaviors exhibited in nature, e.g., by flocks of birds, schools of fish, and swarming insects [4]. It uses an array of mathematical particles that simultaneously examine the design space in search of improved locations, analogous to the search for food or nesting locations in nature. PSO is designed to solve unconstrained optimization problems.

PSO maintains multiple design points simultaneously. These are called particles, and each explores the design space individually. Typically there are 10 times as many particles as design variables. Collectively, the particles are referred to as a swarm, and each particle possesses the following attributes:

  • Design variables - current location of the particle represented by design variables values
  • Fitness - objective function value at the current location
  • Velocity - speed the particle is moving in each direction, with a velocity component for each design variable dimension
  • pBest - the best design point in the particle's own history as it has moved through the design space
  • gBest - current best particle in the swarm, evaluated by fitness value

The basic format of each PSO iteration is shown in Figure 1.6. After random initialization, particles first move to new design points within the design space. Once particles have moved, fitness is evaluated and each particle's pBest is updated. The swarm's gBest is then updated, and each particle's velocity is calculated for the next iteration. Position and velocity are updated using Equations 1.7 through 1.9.

[Figure 1.6: the basic format of a PSO iteration: move particles, evaluate fitness, update each pBest, update gBest, compute new velocities]

Equation 1.7 is the velocity update equation. It determines the new velocity for each particle i for the next iteration (v_(iter+1, i)). The current velocity (v_(iter, i)), the current particle location (x_i), and the current gBest and pBest locations (x_gBest and x_pBest) are used to calculate the new velocity. Parameters c1 and c2 are user defined and typically set to 2.0 [5]. Each particle then moves according to Equation 1.8.

  Equation 1.7:  v_(iter+1, i) = w_iter * v_(iter, i) + c1 * r1 * (x_pBest - x_(iter, i)) + c2 * r2 * (x_gBest - x_(iter, i))
  Equation 1.8:  x_(iter+1, i) = x_(iter, i) + v_(iter+1, i)

where r1 and r2 are uniform random numbers between 0 and 1.

Figure 1.7 (Kalivarapu [8]) shows how the components of the velocity update affect the particle's location each iteration.

[Figure 1.7: the effect of each velocity update component on a particle's location (Kalivarapu [8])]

To aid convergence while still allowing broad exploration during early iterations, Equation 1.9 applies an inertia weight (w_iter) to the velocity. As iterations progress, the damping factor lambda_w, typically a value slightly below 1.0, causes the velocity to be slowly damped (w_(iter+1) = lambda_w * w_iter) [6]. Additional discussion of the inertia weight factor can be found in [7].

Convergence criteria are user defined and checked each iteration. An example criterion is 50 sequential iterations with no more than 0.0001 improvement in gBest.
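Putting the pieces together, a minimal PSO sketch for an unconstrained 2-D minimization might look like the following. The swarm behavior, c1 = c2 = 2.0, and the inertia damping follow the description above; the sphere test function, swarm size, and convergence settings are my own assumed choices:

```python
import random

def pso(objective, bounds, n_particles=20, max_iter=200,
        c1=2.0, c2=2.0, w=0.9, w_damp=0.99, tol=1e-4, stall_iters=50):
    """Minimal PSO for minimization. bounds is a list of (low, high)
    side constraints, one per design variable."""
    dim = len(bounds)
    # Random initialization of positions; velocities start at zero
    x = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_f = [objective(xi) for xi in x]
    gi = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[gi][:], pbest_f[gi]

    stall = 0
    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Equation 1.7: velocity update
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Equation 1.8: position update, clipped to the side constraints
                x[i][d] = min(max(x[i][d] + v[i][d], bounds[d][0]), bounds[d][1])
            f = objective(x[i])
            if f < pbest_f[i]:                # update pBest
                pbest[i], pbest_f[i] = x[i][:], f
        best = min(range(n_particles), key=lambda i: pbest_f[i])
        improvement = gbest_f - pbest_f[best]
        if pbest_f[best] < gbest_f:           # update gBest
            gbest, gbest_f = pbest[best][:], pbest_f[best]
        w *= w_damp                           # Equation 1.9: inertia damping
        stall = 0 if improvement > tol else stall + 1
        if stall >= stall_iters:              # example convergence criterion
            break
    return gbest, gbest_f

# Assumed test function: the sphere, whose minimum of 0 is at the origin
best_x, best_f = pso(lambda x: x[0]**2 + x[1]**2, [(-5, 5), (-5, 5)])
print(best_x, best_f)  # near [0, 0] with a small objective value
```

For your pair-selection problem, the same loop applies once you encode a candidate combination as the design variables (e.g., rounding continuous positions to indices), which is why formulating the problem is the hard part, not the PSO itself.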

References

[1] Garret N. Vanderplaats. "Multidiscipline Design Optimization." Vanderplaats Research & Development, Inc., 2007.

[2] Harold W. Kuhn and Albert W. Tucker. "Nonlinear programming." Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, volume 5, California, 1951.

[3] Russell Eberhart and James Kennedy. "A new optimizer using particle swarm theory." Proceedings of the Sixth International Symposium on Micro Machine and Human Science (MHS '95), pages 39-43. IEEE, 1995.

[4] Craig W. Reynolds. "Flocks, herds and schools: A distributed behavioral model." ACM SIGGRAPH Computer Graphics, 21(4):25-34, August 1987.

[5] Yuhui Shi and Russell Eberhart. "Parameter selection in particle swarm optimization." Evolutionary Programming VII, pages 1-12, 1998.

[6] Yuhui Shi and Russell Eberhart. "A modified particle swarm optimizer." 1998 IEEE International Conference on Evolutionary Computation Proceedings (IEEE World Congress on Computational Intelligence), pages 69-73.

[7] Russell C. Eberhart and Yuhui Shi. "Comparing inertia weights and constriction factors in particle swarm optimization." Proceedings of the 2000 Congress on Evolutionary Computation, 1:84-88, 2000.

[8] Vijay Kiran Kalivarapu. "Improving solution characteristics of particle swarm optimization through the use of digital pheromones, parallelization, and graphical processing units (GPUs)." Iowa State University, 2008.

