Java – Tradeoff between clean code, duplicate code and code efficiency in java

clean code, coding-style, java

I have a question on writing clean code. I’m trying to refactor the following method:

private static Map<String, String> createMapOfAttributes(
        final String Id,
        final String attributes, 
        final Map<String, String> invalidLines
) {          
    final String[] arrayOfAttributes = attributes.split(";");
    final int numberOfAttributes = arrayOfAttributes.length;
    final Map<String, String> mapOfAttributes = 
        new HashMap<String, String>(numberOfAttributes);

    for (int i = 0; i < numberOfAttributes; i++) {
        final String attributeEntry = arrayOfAttributes[i];
        if (attributeEntry == null || attributeEntry.isEmpty()) {
            continue;
        }

        //extract family and attribute
        final int attributeEntryDelimitPosition =  
            attributeEntry.indexOf("=");

        final String family = 
            attributeEntry.substring(0, attributeEntryDelimitPosition).trim();

        final String attribute =
            attributeEntry.substring(attributeEntryDelimitPosition + 1).trim();

        final String familyAndAttribute = family + '=' + attribute;

        final String previousFamilyAndAttribute = 
            mapOfAttributes.put(family,familyAndAttribute);

        if (previousFamilyAndAttribute != null) {
            invalidLines.put(Id, family);
        }
    }
    return mapOfAttributes;
}

So the first two arguments are input arguments, the last argument is an output argument that the method modifies, and on top of that the method returns a value.

One guideline for writing clean code is that a method should do only one thing, and this method does not follow it.

When I try to separate the things the method does, I run into a problem: in the last four lines of the for loop, mapOfAttributes is filled and, in the same step, each entry is tested for whether it already existed in the map; if so, the entry is collected in the invalidLines map.

If I separate those two things, I end up with the following: one method returns mapOfAttributes and a second method returns the invalidLines map. In the second method I would need to test each entry for being a duplicate, possibly by adding it to a map again, which repeats what the first method does (leading to duplicate code and duplicate computation). Furthermore, I would need the code that extracts family and attribute in both methods, which also leads to duplicate code.

So my question is: what would be your take on this? How would you refactor the method?
And, in more general terms, are readable code, efficient code and the avoidance of duplicate code conflicting goals that sometimes cannot all be satisfied at the same time? (Which in this case might mean that there is no satisfying solution?)

Best Answer

I think you're taking the "do one thing" advice too literally. In my view, this method does do only one thing: it parses input in a text format into an internal data structure. That your data structure has two parts (the valid data and the list of invalid items) is, in my view, irrelevant. The parsing process is a single, atomic concern.

Things may be clearer if you collect the resulting map of valid items and the list of invalid ones in a class so that you can return them both as one object. Output parameters are a design smell; the results of a function should be in its return value wherever possible.
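A minimal sketch of that idea, assuming the same `family=attribute;...` input format as in the question. The names `ParseResult`, `parseAttributes` and `duplicateFamilies` are made up for illustration, and skipping entries without an `=` is an assumption (the original code does not handle that case):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AttributeParsing {

    /** Hypothetical result type: the parsed attributes plus the families seen more than once. */
    static final class ParseResult {
        private final Map<String, String> attributes;
        private final List<String> duplicateFamilies;

        ParseResult(Map<String, String> attributes, List<String> duplicateFamilies) {
            this.attributes = Collections.unmodifiableMap(attributes);
            this.duplicateFamilies = Collections.unmodifiableList(duplicateFamilies);
        }

        Map<String, String> attributes() { return attributes; }

        List<String> duplicateFamilies() { return duplicateFamilies; }
    }

    /** Parses "family=attribute;..." in a single pass and returns both outcomes together. */
    static ParseResult parseAttributes(String attributes) {
        Map<String, String> mapOfAttributes = new HashMap<>();
        List<String> duplicateFamilies = new ArrayList<>();

        for (String entry : attributes.split(";")) {
            int delimiter = entry.indexOf('=');
            if (delimiter < 0) {
                continue; // assumption: silently skip empty or malformed entries
            }
            String family = entry.substring(0, delimiter).trim();
            String attribute = entry.substring(delimiter + 1).trim();

            // put() returns the previous value, so the duplicate check costs nothing extra
            String previous = mapOfAttributes.put(family, family + '=' + attribute);
            if (previous != null) {
                duplicateFamilies.add(family);
            }
        }
        return new ParseResult(mapOfAttributes, duplicateFamilies);
    }

    public static void main(String[] args) {
        ParseResult result = parseAttributes("color=red; size=L; color=blue");
        System.out.println(result.attributes().get("size")); // prints "size=L"
        System.out.println(result.duplicateFamilies());      // prints "[color]"
    }
}
```

The caller, which knows the line's id, can then decide what to do with `duplicateFamilies()`, for example recording it in its own invalidLines map. The parsing still happens once, so nothing is duplicated.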

Related Solutions

Language-Agnostic, Clean Code – Is This Code a ‘Train Wreck’ Violating the Law of Demeter?

The problem here is the signature of setLocation. It's stringly typed.

To elaborate: Why would it expect String? A String represents any kind of textual data. It can potentially be anything but a valid location.

In fact, this poses a question: what is a location? How do I know without looking into your code? If it were a URL, then I would know a lot more about what this method expects.
Maybe it would make more sense for it to be a custom class Location. OK, I wouldn't know at first what that is, but at some point (probably before writing this.configuration.getLocation()) I would take a minute to figure out what it is this method returns.
Granted, in both cases I need to look somewhere else to understand what is expected. However, in the latter case, once I understand what a Location is, I can use your API; in the former case, even if I understand what a String is (which can be expected), I still don't know what your API expects.
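To make the difference concrete, here is a rough sketch of what a typed signature could look like. Everything in it is hypothetical: I don't know what a valid location is in your code, so the `Location` class, its non-blank check and the `Session` class are assumptions for illustration only:

```java
import java.util.Objects;

class LocationExample {

    /** Hypothetical value type; the rule for what counts as valid is an assumption. */
    static final class Location {
        private final String value;

        private Location(String value) {
            this.value = value;
        }

        /** Validation happens once, here, instead of in every method that takes a location. */
        static Location of(String raw) {
            Objects.requireNonNull(raw, "raw");
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("location must not be blank");
            }
            return new Location(trimmed);
        }

        @Override
        public String toString() {
            return value;
        }
    }

    /** Hypothetical stand-in for the object that owns setLocation. */
    static final class Session {
        private Location location;

        // The signature now says what is expected, and no toString() call is needed by the caller.
        void setLocation(Location location) {
            this.location = Objects.requireNonNull(location, "location");
        }

        Location getLocation() {
            return location;
        }
    }

    public static void main(String[] args) {
        Session session = new Session();
        session.setLocation(Location.of(" /var/data "));
        System.out.println(session.getLocation()); // prints "/var/data"
    }
}
```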

In the unlikely scenario that a location is any kind of textual data, I would reinterpret this as any kind of data that has a textual representation. Given that Object has a toString method, you could go with that, although this demands quite a leap of faith from the clients of your code.

You should also consider that this is Java you're talking about, which has very few features by design. That's what's forcing you to actually call toString at the end.
If you take C#, for example, which is also statically typed, you would be able to omit that call by defining an implicit conversion.
In dynamically typed languages, such as Objective-C, you don't really need the conversion either, because as long as the value behaves like a string, everybody is happy.

One could argue that the last call to toString is less a call than just noise generated by Java's demand for explicitness. You're calling a method that every Java object has, so you do not actually encode any knowledge about a "distant unit" and thereby don't violate the Principle of Least Knowledge. No matter what getLocation returns, it will have a toString method.

But please, do not use strings unless they are really the most natural choice (or unless you're using a language that doesn't even have enums ... been there).

Clean Code – Comments vs Class Documentation

As others have said, there's a difference between API-documenting comments and in-line comments. From my perspective, the main difference is that an in-line comment is read alongside the code, whereas a documentation comment is read alongside the signature of whatever you're commenting.

Given this, we can apply the same DRY principle. Is the comment saying the same thing as the signature? Let's look at your example:

Retrieves a product by its id

This part just repeats what we already see from the name GetById plus the return type Product. It also raises the question of what the difference between "getting" and "retrieving" is, and why the code uses one word and the comment the other. So it's needless and slightly confusing. If anything, it's getting in the way of the actually useful second part of the comment:

returns null if no product was found.

Ah! That's something we definitely can't know for sure just from the signature, and provides useful information.

Now take this a step further. When people talk about comments as code smells, the question isn't whether the code as it is needs a comment, but whether the comment indicates that the code could be written better, so that it expresses the information in the comment itself. That's what "code smell" means: it doesn't mean "don't do this!", it means "if you're doing this, it could be a sign there's a problem".

So if your colleagues tell you this comment about null is a code smell, you should simply ask them: "Okay, how should I express this then?" If they have a feasible answer, you've learned something. If not, it'll probably kill their complaints dead.

Regarding this specific case, generally the null issue is well known to be a difficult one. There's a reason code bases are littered with guard clauses, why null checks are a popular precondition for code contracts, why the existence of null has been called a "billion-dollar mistake". There aren't that many viable options. One popular one, though, found in C# is the Try... convention:

public bool TryGetById(int productId, out Product product);

In other languages, it may be idiomatic to use a type (often called something like Optional or Maybe) to indicate a result that may or may not be there:

public Optional<Product> GetById(int productId);

So in a way, this anti-comment stance has gotten us somewhere: we've at least thought about whether this comment represents a smell, and what alternatives might exist for us.

Whether we should actually prefer these over the original signature is a whole other debate, but we at least have options for expressing through code rather than comments what happens when no product is found. You should discuss with your colleagues which of these options they think is better and why, and hopefully help move on beyond blanket dogmatic statements about comments.
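For what it's worth, here is roughly how the Optional variant could look in Java, using `java.util.Optional`. The `Product` and `ProductRepository` classes and the in-memory map are placeholders I made up for the sketch, not anything from your code base:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ProductLookup {

    /** Hypothetical product type with just enough fields for the example. */
    static final class Product {
        final int id;
        final String name;

        Product(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Hypothetical repository backed by an in-memory map. */
    static final class ProductRepository {
        private final Map<Integer, Product> productsById = new HashMap<>();

        void add(Product product) {
            productsById.put(product.id, product);
        }

        // The return type itself says "there may be no product"; no comment about null is needed.
        Optional<Product> getById(int productId) {
            return Optional.ofNullable(productsById.get(productId));
        }
    }

    public static void main(String[] args) {
        ProductRepository repository = new ProductRepository();
        repository.add(new Product(1, "kettle"));

        String found = repository.getById(1).map(p -> p.name).orElse("no product");
        String missing = repository.getById(2).map(p -> p.name).orElse("no product");

        System.out.println(found);   // prints "kettle"
        System.out.println(missing); // prints "no product"
    }
}
```

The caller has to deal with the empty case explicitly (here via `orElse`), which is exactly the information the original comment was carrying.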
