PHP – Why is PHP’s method of comparing different types bad?

comparison, dynamic-typing, language-design, PHP, programming-languages

I'm designing a new programming language and trying to decide how variable comparisons should work. I've used many kinds of languages, including PHP for years, and I have personally had zero bugs related to its comparison operations other than situations where 0 == false. Despite this, I've heard a lot of negativity towards its method of comparing types.

For example, in PHP:

 2  <  100      # True
"2" < "100"     # True
"2" <  100      # True

In Python, string comparison goes like this:

 2  <  100      # True
"2" < "100"     # False
"2" <  100      # False in Python 2; raises TypeError in Python 3

I don't see any value in Python's implementation (how often do you really need to see which of two strings is lexicographically greater?), and I see almost no risk in PHP's method and a lot of value. I know people claim it can create errors, but I don't see how. Is there ever really going to be a situation where you are testing if (100 == "100") and you don't want the string to be treated as a number? And if you really did, you could use === (which I've also heard people complain about, but without any substantial reason).

So, my question is, not counting some of PHP's weird conversion and comparison rules dealing with 0's and nulls and strings mixed with characters and numbers, are there any substantial reasons that comparing ints and strings like this is bad, and are there any real reasons having a === operator is bad?

Best Answer

The biggest problem is that an equivalence relation, the mathy term for things like ==, is supposed to satisfy three laws:

reflexivity: a == a
symmetry: a == b means b == a
transitivity: a == b and b == c means a == c

All of these are very intuitive and expected. And PHP doesn't follow them.

'0'==0 // true
 0=='' // true (before PHP 8; false since PHP 8)
'0'==''// false, AHHHH

So it's not actually an equivalence relation, which is a pretty distressing realization for some mathy people (including me).
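The failure can be checked mechanically. The following is a minimal sketch, not PHP itself: `loose_eq` is a hypothetical, simplified Python model of how `==` treated ints and strings before PHP 8 (two numeric strings compare as numbers, an int and a string compare as numbers with non-numeric strings counting as 0). It covers only the cases in the example above and searches the three values for transitivity violations.

```python
from itertools import product


def is_numeric(s):
    """Rough stand-in for PHP's notion of a numeric string (assumption)."""
    try:
        float(s)
        return True
    except ValueError:
        return False


def to_number(x):
    """Convert an int or string to a number; non-numeric strings become 0."""
    if isinstance(x, str):
        return float(x) if is_numeric(x) else 0.0
    return float(x)


def loose_eq(a, b):
    """Simplified model of pre-PHP 8 `==` for ints and strings only."""
    if isinstance(a, str) and isinstance(b, str):
        # Two numeric strings are compared as numbers, otherwise as strings.
        if is_numeric(a) and is_numeric(b):
            return float(a) == float(b)
        return a == b
    # At least one operand is an int: compare both as numbers.
    return to_number(a) == to_number(b)


values = ["0", 0, ""]

# Collect every triple where a == b and b == c hold but a == c does not.
violations = [
    (a, b, c)
    for a, b, c in product(values, repeat=3)
    if loose_eq(a, b) and loose_eq(b, c) and not loose_eq(a, c)
]

print(violations)  # [('0', 0, ''), ('', 0, '0')]
```

Both violations go through the integer 0, which is "equal" to two strings that are not equal to each other.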

It also hints at one of the things that people really hate about implicit casts: they often behave unexpectedly when combined with the mundane. Because the conversions are not derived from any underlying principle, they amount to an arbitrary set of rules; weird stuff happens, and it all needs to be specified case by case.

Basically, we sacrifice consistency, and the developer has to shoulder the extra burden of making sure there are no funny (and expensive) conversions happening behind the scenes. To quote this article:

Language consistency is very important for developer efficiency. Every inconsistent language feature means that developers have one more thing to remember, one more reason to rely on the documentation, or one more situation that breaks their focus. A consistent language lets developers create habits and expectations that work throughout the language, learn the language much more quickly, more easily locate errors, and have fewer things to keep track of at once.

EDIT:

Another gem I stumbled across

  NULL == 0   // true
  NULL < -1   // true

So if you try to sort anything, the result depends entirely on the initial order of the elements and on the order in which the comparisons are made. For example, with bubble sort:

  bubble_sort([NULL, -1, 0]) // [NULL, -1, 0]
  bubble_sort([0, -1, NULL]) // [-1, 0, NULL]
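The two results above can be reproduced with a small model. This is a sketch under stated assumptions, not PHP: `php_greater` is a hypothetical helper that imitates PHP's rule of comparing both operands as booleans whenever one of them is NULL (represented here by `None`), and `bubble_sort` is an ordinary textbook bubble sort that uses it.

```python
def php_greater(a, b):
    """Model of PHP's `a > b` for ints and NULL (None) only.

    Assumption: when either operand is NULL, both sides are compared
    as booleans, so NULL == 0 and NULL < -1 both hold.
    """
    if a is None or b is None:
        return bool(a) > bool(b)
    return a > b


def bubble_sort(items):
    """Plain bubble sort that swaps neighbours when php_greater says so."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        for i in range(end):
            if php_greater(result[i], result[i + 1]):
                result[i], result[i + 1] = result[i + 1], result[i]
    return result


print(bubble_sort([None, -1, 0]))  # [None, -1, 0]
print(bubble_sort([0, -1, None]))  # [-1, 0, None]
```

The same three values end up in different orders because the comparison is not a consistent ordering: NULL sorts before -1, -1 sorts before 0, yet 0 does not sort after NULL.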

Related Solutions

Coding Style – Is Changing Variable Types Mid-Procedure Bad?

I'll go out on a limb and say: No, this is a terrible idea.

It's just a special case of reusing a variable, which is a bad idea - mainly because it makes it hard to understand what a variable contains at any given point in the program flow. See e.g. Should I reuse variables?

About your points: The points you raise are valid, it's just that reusing the variable is not a good solution :-).

a) it conveys that the response variable contains basically the same information, just 'transformed' into a different type

Providing this information is a good idea. However, don't do this by using the same variable, because then you obscure the fact that the information was transformed. Rather, use names with a common pre-/postfix. In your example:

rawResponse = urlopen(some_url)
[...]
jsonResponse = rawResponse.read()
[...]
responseData = json.loads(jsonResponse)
[...]

This makes it clear that the variables are closely related, but also that they do not contain the same data.

b) it conveys that the earlier objects aren't going to be needed any further down the function, since by reassigning over their variable I've made them unavailable to later code.

Again, communicating this "no longer needed" is good, but don't do it by reusing the variable: the reassignment will usually be hard to see, so you only confuse the reader.

Rather, if a variable lives long after its last use, that is an indication the method/function is too long. Split the part with the short-lived variables into a sub-function; that makes the code easier to read, and limits the variable lifetime.
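As a rough illustration of that split, here is a minimal sketch. The helper name `fetch_json`, the `opener` parameter, and the `fake_opener` stand-in are all hypothetical and exist only so the example runs without network access; the point is that the short-lived intermediates never leave the helper.

```python
import io
import json
from urllib.request import urlopen


def fetch_json(url, opener=urlopen):
    """Hypothetical helper: download a URL and decode its JSON body.

    The intermediate values get their own names and go out of scope
    when the function returns, so the caller only sees the final data.
    """
    raw_response = opener(url)        # file-like response object
    json_text = raw_response.read()   # raw bytes of the body
    return json.loads(json_text)      # decoded Python data


def fake_opener(url):
    """Offline stand-in for urlopen (assumption, for demonstration only)."""
    return io.BytesIO(b'{"status": "ok"}')


response_data = fetch_json("http://example.invalid/api", opener=fake_opener)
print(response_data)  # {'status': 'ok'}
```

In the calling code only `response_data` exists, which communicates "the earlier objects are no longer needed" without reassigning any variable.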

Note: I usually go one step further than not reusing variables and try to assign each variable only once (i.e. never change the value, make it immutable). This is an idea mainly from functional languages, but I found it can make code much clearer. Of course, in non-functional languages, you sometimes need to change a variable (the obvious example being a loop variable), but once you start looking, you'll see that in most cases a "fresh" variable makes for more readable and less bug-prone code.

Java Operators – Why == Operator is Not Used for String Comparison

I guess it's just consistency, or the "principle of least astonishment". String is an object, so it would be surprising if it were treated differently than other objects.

At the time when Java came out (~1995), merely having something like String was total luxury to most programmers who were accustomed to representing strings as null-terminated arrays. String's behavior is now what it was back then, and that's good; subtly changing the behavior later on could have surprising, undesired effects in working programs.

As a side note, you could use String.intern() to get a canonical (interned) representation of the string, after which comparisons could be made with ==. Interning takes some time, but after that, comparisons will be really fast.

Addition: contrary to what some answers suggest, it's not about supporting operator overloading. The + operator (concatenation) works on Strings even though Java doesn't support operator overloading; it's simply handled as a special case in the compiler, resolving to StringBuilder.append(). Similarly, == could have been handled as a special case.

Then why astonish with a special case for + but not for ==? Because + simply doesn't compile when applied to non-String objects, so that's quickly apparent. The different behavior of == would be much less apparent and thus much more astonishing when it hits you.
