High Cohesion in Programming – Definition and Importance

I am a student who recently joined a software development company as an intern. Back at the university, one of my professors used to say that we have to strive to achieve "Low coupling and high cohesion".

I understand the meaning of low coupling. It means keeping the code of separate components separate, so that a change in one place does not break the code in another.

But what is meant by high cohesion? If it means integrating the various pieces of the same component well with each other, I don't understand how that becomes advantageous.

What is meant by high cohesion? Could someone give an example that illustrates its benefits?

Best Answer

One way of looking at cohesion in OO terms is to ask whether the methods of a class use any of its private attributes. Using metrics such as LCOM4 (Lack of Cohesion of Methods), as pointed out by gnat in this answer here, you can identify classes that could be refactored. The reason you want to refactor methods or classes to be more cohesive is that it makes the code design simpler for others to use. Trust me; most tech leads and maintenance programmers will love you when you fix these issues.
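
To make the metric less abstract, here is a rough sketch of the idea behind it, not the exact algorithm any particular tool uses. The function name countCohesionGroups and the input shape (a map from method name to the fields it touches) are made up for illustration, and the sketch only links methods through shared fields, whereas LCOM4 also links methods that call each other. A result of 1 suggests a cohesive class; anything higher hints that the class could be split.

```javascript
// Hypothetical helper: counts groups of methods connected through shared fields.
// `fieldsByMethod` maps each method name to the fields it reads or writes.
// Simplification: real LCOM4 also connects methods that call each other.
function countCohesionGroups(fieldsByMethod) {
    const methods = Object.keys(fieldsByMethod);
    const visited = new Set();
    let groups = 0;

    for (const start of methods) {
        if (visited.has(start)) continue;
        groups += 1;
        // Depth-first walk over every method reachable through shared fields.
        const stack = [start];
        visited.add(start);
        while (stack.length > 0) {
            const current = stack.pop();
            for (const other of methods) {
                if (visited.has(other)) continue;
                const shares = fieldsByMethod[current].some(
                    (field) => fieldsByMethod[other].includes(field)
                );
                if (shares) {
                    visited.add(other);
                    stack.push(other);
                }
            }
        }
    }
    return groups;
}

// A class with one private field and three methods, one of which touches no field.
const usage = {
    eat: ['_foodValue'],
    replenish: ['_foodValue'],
    discharge: []
};
console.log(countCohesionGroups(usage)); // 2
```

The method that uses no field ends up in a group of its own, which is the signal that it may belong somewhere else.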

You can use tools in your build process such as Sonar to identify low cohesion in the code base. There are a couple of very common cases that I can think of where methods are low in "cohesiveness":

Case 1: Method is not related to the class at all

Consider the following example:

public class Food {
   private int _foodValue = 10;

   public void Eat() {
     _foodValue -= 1;
   }

   public void Replenish() {
     _foodValue += 1;
   }

   public void Discharge() {
     Console.WriteLine("Nnngghhh!");
   }
}

One of the methods, Discharge(), lacks cohesion because it doesn't touch any of the class's private members. In this case there is only one private member: _foodValue. If it doesn't do anything with the class internals, then does it really belong there? The method could be moved to another class, which could be named e.g. FoodDischarger.

// Non-cohesive function extracted to another class, which can
// be potentially reused in other contexts
public class FoodDischarger {
  public void Discharge() {
    Console.WriteLine("Nnngghhh!");
  }
}

If you're doing it in JavaScript, since functions are first-class objects, the discharge can be a free function:

function Food() {
    this._foodValue = 10;
}
Food.prototype.eat = function() {
    this._foodValue -= 1;
};
Food.prototype.replenish = function() {
    this._foodValue += 1;
};

// This
Food.prototype.discharge = function() {
    console.log('Nnngghhh!');
};
// can easily be refactored to:
var discharge = function() {
    console.log('Nnngghhh!');
};
// making it easily reusable without creating a class

Case 2: Utility Class

This is actually a common case that breaks cohesion. Everyone loves utility classes, but they usually indicate design flaws and, most of the time, make the codebase trickier to maintain (because of the high number of dependencies associated with utility classes). Consider the following classes:

public class Food {
    public int FoodValue { get; set; }
}

public static class FoodHelper {

    public static void EatFood(Food food) {
        food.FoodValue -= 1;
    }

    public static void ReplenishFood(Food food) {
        food.FoodValue += 1;
    }

}

Here we can see that the utility class needs to access a property of the class Food. The methods in the utility class have no cohesion at all in this case because they need outside resources to do their work. In this case, wouldn't it be better to have the methods in the class they're working with (much like in the first case)?
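
As a sketch of what that could look like (in JavaScript, and assuming the same starting value of 10 used in the first case), the two helper methods move into Food and the value becomes private. The read-only foodValue getter is my own addition so the result can be inspected:

```javascript
// Sketch: the behaviour formerly in the helper now lives next to the data it changes.
class Food {
    #foodValue = 10; // private field, not reachable from outside the class

    eat() {
        this.#foodValue -= 1;
    }

    replenish() {
        this.#foodValue += 1;
    }

    // Assumed read-only accessor, added only for inspection.
    get foodValue() {
        return this.#foodValue;
    }
}

const food = new Food();
food.eat();
food.eat();
food.replenish();
console.log(food.foodValue); // 9
```

Callers no longer need to know about a helper class, and nothing outside Food can put the value into an inconsistent state.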

Case 2b: Hidden objects in Utility Classes

There is another case of utility classes where there are unrealized domain objects. The first knee-jerk reaction a programmer has when programming string manipulation is to write a utility class for it. Like the one here that validates a couple of common string representations:

public static class StringUtils {

  public static bool ValidateZipCode(string zipcode) {
    // validation logic
  }

  public static bool ValidatePhoneNumber(string phoneNumber) {
    // validation logic
  }

}

What most don't realize here is that a zip code, a phone number, or any other string representation can be an object itself:

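public class ZipCode {
    private readonly string _zipCode;

    // The wrapped string is supplied once, when the object is created
    public ZipCode(string zipCode) {
      _zipCode = zipCode;
    }

    public bool Validates() {
      // validation logic for _zipCode
    }
}

public class PhoneNumber {
    private readonly string _phoneNumber;

    // The wrapped string is supplied once, when the object is created
    public PhoneNumber(string phoneNumber) {
      _phoneNumber = phoneNumber;
    }

    public bool Validates() {
      // validation logic for _phoneNumber
    }
}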

The notion that you shouldn't "handle strings" directly is detailed in this blogpost by @codemonkeyism. It is closely related to cohesion because of the way programmers tend to handle strings: by putting the logic in utility classes.
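
For a concrete feel of such a string-wrapping object, here is a minimal JavaScript sketch. The validation rule is purely an assumption for illustration (a five-digit US ZIP code with an optional four-digit extension); real validation rules depend on your domain.

```javascript
// Assumption: five digits, optionally followed by "-" and four more digits.
const ZIP_PATTERN = /^\d{5}(-\d{4})?$/;

class ZipCode {
    #value; // the raw string stays private to the object

    constructor(value) {
        this.#value = String(value).trim();
    }

    // The validation logic sits with the data it validates.
    validates() {
        return ZIP_PATTERN.test(this.#value);
    }

    toString() {
        return this.#value;
    }
}

console.log(new ZipCode('90210').validates());      // true
console.log(new ZipCode('90210-1234').validates()); // true
console.log(new ZipCode('9021').validates());       // false
```

Code that receives a ZipCode no longer has to remember which utility method applies to which kind of string.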
