Complexity – Understanding P vs NP vs NP Complete vs NP Hard

big ocomplexitydefinitionnp-complete

I am trying to understand these classifications and why they exist. Is my understanding right? If not, what?

P is polynomial complexity, or O(n^k) for some non-negative real number k, such as O(1), O(n^1/2), O(n²), O(n³), etc. If a problem belongs to P, then there exists at least one algorithm that can solve it from scratch in polynomial time. For example I can always figure out if some integer n is prime by looping over 2 <= k <= sqrt(n) and checking at each step if k divides n.
NP is non-deterministic polynomial complexity. I don't really know what it means for it to be non-deterministic. I think it means it is easy to verify in polynomial time, but may or may not be polynomial time to solve from scratch if we didn't already know the answer. Since it may be solvable in polynomial time, all P problems are also NP problems. Integer factorization gets quoted as an example of NP, but I don't understand why it's not P, personally, since trial factorization takes O(sqrt(n)) time.
NP-Complete I don't understand at all, but the Traveling Salesman Problem is quoted as an example of this. But in my opinion the TSP problem might just be NP, because it takes something like O(2ⁿ n²) time to solve, but O(n) to verify if you are given the path up front.
NP-Hard I assume is just full of unknowns. Hard to verify, hard to solve.

Best Answer

You're basically correct about P and NP, but not about NP-hard and NP-complete.

For starters, here are the super-concise definitions of the four complexity classes in question:

P is the class of decision problems which can be solved in polynomial time by a deterministic Turing machine.
NP is the class of decision problems which can be solved in polynomial time by a non-deterministic Turing machine. Equivalently, it is the class of problems which can be verified in polynomial time by a deterministic Turing machine.
NP-hard is the class of decision problems to which all problems in NP can be reduced to in polynomial time by a deterministic Turing machine.
NP-complete is the intersection of NP-hard and NP. Equivalently, NP-complete is the class of decision problems in NP to which all other problems in NP can be reduced to in polynomial time by a deterministic Turing machine.

And here's a Euler diagram from Wikipedia showing the relationships between these four classes (assuming that P is not equal to NP):

The part that I assume you're most unfamiliar with or confused by is the notion of a "polynomial time reduction" from problem X to problem Y. A reduction from X to Y is simply an algorithm A which solves X by making use of some other algorithm B which solves problem Y. This reduction is called a "polynomial time reduction" if all parts of A other than B have a polynomial time complexity. As a trivial example, the problem of finding the smallest element in an array is constant-time reducible to the sorting problem, since you can sort the array and then return the first element of the sorted array.

One thing that's easy to miss about the NP-hard definition is that the reduction goes from NP problems to the NP-hard problem, but not necessarily vice versa. This means that NP-hard problems might be in NP, or in a much higher complexity class (as you can see from the Euler diagram), or they might not even be decidable problems. That's why people often say something like "NP-hard means at least as hard as NP" when trying to explain this stuff informally.

The halting problem is a good example of an NP-hard problem that's clearly not in NP, as Wikipedia explains:

It is easy to prove that the halting problem is NP-hard but not NP-complete. For example, the Boolean satisfiability problem can be reduced to the halting problem by transforming it to the description of a Turing machine that tries all truth value assignments and when it finds one that satisfies the formula it halts and otherwise it goes into an infinite loop. It is also easy to see that the halting problem is not in NP since all problems in NP are decidable in a finite number of operations, while the halting problem, in general, is undecidable.

Related Solutions

Unit-testing – How to simplify the complex stateful classes and their testing

The State design pattern might be of use, if you're not already using it.

The core idea is that you create an inner class for each distinct state - so to continue your example, SleepingRobot, AwakeRobot, RechargingRobot and DeadRobot would all be classes, implementing a common interface.

Methods on the Robot class (like sleep() and isSleepAvaliable()) have simple implementations that delegate to the current inner class.

State changes are implemented by swapping out the current inner class with a different one.

The advantage with this approach is that each of the state classes is dramatically simpler (as it represents just one possible state), and can be independently tested. Depending on your implementation language (not specified), you may still be constrained to having everything in the same file, or you may be able to split things out into smaller source files.

Design Patterns – Reducing the Complexity of a Class

The long sequence of operations seems like it should be its own class, which then needs additional objects to perform its duties. It sounds like you are close to this solution. This long procedure can be broken down into multiple steps or phases. Each step or phase could be its own class that exposes a public method. You would want each phase to implement the same interface. Then your assembly line keeps track of a list of phases.

To build on your car assembly example, imagine the Car class:

public class Car
{
    public IChassis Chassis { get; private set; }
    public Dashboard { get; private set; }
    public IList<Seat> Seats { get; private set; }

    public Car()
    {
        Seats = new List<Seat>();
    }

    public void AddChassis(IChassis chassis)
    {
        Chassis = chassis;
    }

    public void AddDashboard(Dashboard dashboard)
    {
        Dashboard = dashboard;
    }
}

I'm going to leave out the implementations of some of the components, like IChassis, Seat and Dashboard for the sake of brevity.

The Car is not responsible for building itself. The assembly line is, so let's encapsulate that in a class:

public class CarAssembler
{
    protected List<IAssemblyPhase> Phases { get; private set; }

    public CarAssembler()
    {
        Phases = new List<IAssemblyPhase>()
        {
            new ChassisAssemblyPhase(),
            new DashboardAssemblyPhase(),
            new SeatAssemblyPhase()
        };
    }

    public void Assemble(Car car)
    {
        foreach (IAssemblyPhase phase in Phases)
        {
            phase.Assemble(car);
        }
    }
}

The important distinction here is that the CarAssembler has a list of assembly phase objects that implement IAssemblyPhase. You now are dealing explicitly with public methods, meaning each phase can be tested by itself, and your code quality tool is happier because you aren't stuffing so much into one single class. The assembly process is dead simple too. Just loop over the phases and call Assemble passing in the car. Done. The IAssemblyPhase interface is also dead simple with just a single public method that takes a Car object:

public interface IAssemblyPhase
{
    void Assemble(Car car);
}

Now we need our concrete classes implementing this interface for each specific phase:

public class ChassisAssemblyPhase : IAssemblyPhase
{
    public void Assemble(Car car)
    {
        car.AddChassis(new UnibodyChassis());
    }
}

public class DashboardAssemblyPhase : IAssemblyPhase
{
    public void Assemble(Car car)
    {
        Dashboard dashboard = new Dashboard();

        dashboard.AddComponent(new Speedometer());
        dashboard.AddComponent(new FuelGuage());
        dashboard.Trim = DashboardTrimType.Aluminum;

        car.AddDashboard(dashboard);
    }
}

public class SeatAssemblyPhase : IAssemblyPhase
{
    public void Assemble(Car car)
    {
        car.Seats.Add(new Seat());
        car.Seats.Add(new Seat());
        car.Seats.Add(new Seat());
        car.Seats.Add(new Seat());
    }
}

Each phase individually can be pretty simple. Some are just one liners. Some might be just blinding adding four seats, and another has to configure the dashboard before adding it to the car. The complexity of this long drawn out process is split into multiple, reusable classes that are easily testable.

Even though you said the order doesn't matter right now, it might in the future.

To finish this off, let's look at the process for assembling a car:

Car car = new Car();
CarAssembler assemblyLine = new CarAssembler();

assemblyLine.Assemble(car);

You can sub class CarAssembler to assemble a specific kind of car, or truck. Each phase is one unit within a larger whole, making it easier to reuse code.

Is this design hopelessly flawed, and I'd better admit defeat and ditch - this architecture altogether? Not good for my career, but would writing ill-designed code be any better in the long term?

If deciding what I wrote needs to be rewritten was a black mark on my career, then I would be working as a ditch digger right now, and you shouldn't take any of my advice. :)

Software engineering is an eternal process of learning, applying, and then re-learning. Just because you decide to rewrite your code doesn't mean you're going to get fired. If that were the case, there would be no software engineers.

Is my current choice actually the One True Way, and I need to fight to get better quality metrics (and/or instrumentation) installed?

What you have outlined is not the One True Way, however bear in mind that code metrics are also not the One True Way. Code metrics should be viewed as warnings. They can point to maintenance problems in the future. They are guidelines, not laws. When your code metrics tool points something out, I would first investigate the code to see if it properly implements the SOLID principals. Many times refactoring code to make it more SOLID will satisfy the code metrics tools. Sometimes it doesn't. You have to take these metrics on a case by case basis.

Best Answer

Related Solutions

Unit-testing – How to simplify the complex stateful classes and their testing

Design Patterns – Reducing the Complexity of a Class

Related Topic