Programming Languages – Declaring Variables in Python and PHP

Tags: debugging, declarations, errors, programming-languages

The question is how to cope with the absence of variable declarations in Python, PHP, and the like.

In most languages there is a way to let the compiler know whether I am introducing a new variable or referring to an existing one: my in Perl (with use strict), or \newcommand vs. \renewcommand in LaTeX. This prevents two major sources of errors and headaches:

(1) accidentally using the same variable name for two different purposes, such as in (PHP)

$v = $square * $height;
<...lots of code...>
foreach ($options as $k => $v)
    echo "For key $k the value is $v\n";
<...lots of code...>
echo "The volume is $v";

or a lot nastier (PHP)

$a = [1, 2, 3];
foreach ($a as $k => &$v)
    $v++;
foreach ($a as $k => $v)
    echo "$k => $v\n";

(can you see a bug here? try it!); and

(2) typos in variable names (PHP):

$option = 1;
$numberofelements = 1;
if ($option)
{
    $numberofelenents = 2;
}
echo $numberofelements;

(can you see a bug here? PHP will execute silently).

Using something like my (Perl)

use strict;
use feature 'say';
my $option = 1;
my $numberofelements = 1;
if ($option)
{
    $numberofelenents = 2;
}
say $numberofelements;

(Perl will immediately report the bug) is a tiny effort with a HUGE benefit, both in debugging time and (much more importantly) in the potentially huge losses caused by incorrect programs.

However, some languages, notably Python, PHP, and JavaScript, do not give any protection from these types of bugs.

My question is how can we effectively cope with this?

The only way I can think of is to create two functions (PHP-like pseudocode; it is not valid PHP as written):

function a ($x)
{
    if (isset ($x))
        die();
    else
        return &$x;
}

and

function the ($x)
{
    if (isset ($x))
        return &$x;
    else
        die();
}

and use them always:

a($numberofelements) = 1;
the($numberofelenents)++;
say the($numberofelements);

but of course this is extremely cumbersome. Is there a better way to protect effectively against such errors?

No, "use another language", "be careful and don't make errors", and "split your code into tiny functions" are not good answers (the latter may protect from errors of type 1 but not type 2).

Best Answer

In my experience, there are three ways to prevent the problems you described above:

Limit the scope of your variables
Name your variables something meaningful and descriptive
Use a static analysis tool (a linter) to notify you of errors (Doval mentioned pylint for Python)

1) Limiting the scope of your variables helps with the first kind of error. Fewer variables are visible at any point, so there are fewer opportunities for two of them to share a name, and collisions become unlikely. You can limit scope by declaring each variable only in the scope where it is used. This works because a variable is discarded when the scope that declares it ends. I've provided an example below, in pseudocode, for clarity.

class:
    classVariable = "classVar";

    function ThisIsAFunction(functionVar) {
        var functionVar2 = "functionVar2";
        if functionVar > functionVar2 :
            var ifStatementVar = "ifStatementVar";

            for i in range(0,2):
                ifStatementVar += i;
            // i will go out of scope here
        // ifStatementVar will go out of scope here
    // functionVar and functionVar2 will go out of scope here
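
Note that Python itself scopes variables to the enclosing function, not to if or for blocks, so in Python this advice mostly comes down to keeping functions small. A minimal sketch of that behavior follows; the function names are made up for illustration, and the last two functions mirror the $v collision from the question.

```python
def loop_variable_leaks():
    for i in range(3):
        pass
    # Python has no block scope: i is still bound after the loop ends.
    return i


def sum_options(options):
    # This v exists only while sum_options is running.
    total = 0
    for v in options.values():
        total += v
    return total


def volume(square, height):
    # This v is a separate variable from the one in sum_options,
    # so the call below cannot overwrite it.
    v = square * height
    sum_options({"a": 1, "b": 2})
    return v


print(loop_variable_leaks())
print(volume(2, 5))
```

Because each function has its own namespace, reusing the short name v in both places is harmless here, whereas the same two uses in one long script would collide.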

2) Giving your variables meaningful names goes a long way toward preventing reuse of the same variable name. The key is to make each name specific enough that it is unlikely to be reused for something else. When refactoring code, it is a good idea to look for function, variable and class names that can be renamed to better reflect their purpose and meaning. An example of good variable names is the following:

def GetSumOfTwoIntegers(intFirstNum, intSecondNum):
    return intFirstNum + intSecondNum

There is a lot of disagreement about what makes a good name. Everyone has their own style. The main thing to ensure is that it is clear to yourself and others what the method, parameter or class is supposed to do and be used for. GetSumOfTwoIntegers as a method name tells anyone calling this method that they need to pass in two integers and that they will receive the sum as a result.

3) Finally, you can use a static analysis tool to flag mistakes before the code runs. A good IDE will highlight many such errors as you type; Visual Studio, for example, uses IntelliSense to let the developer know of errors before compiling. Most languages have an IDE or linter that supports this functionality. Using one should catch most instances of your second problem.
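
To give a feel for the kind of check such a tool performs, here is a toy sketch built on Python's standard ast module. It is not how pylint is implemented; the helper name assigned_but_never_read and the SAMPLE snippet (a Python transcription of the typo example from the question) are assumptions made for illustration, and the check ignores scopes entirely.

```python
import ast


def assigned_but_never_read(source):
    """Return names that are assigned somewhere in source but never read."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return sorted(stored - loaded)


# Python version of the typo example: note the misspelled name in the if body.
SAMPLE = """
option = 1
numberofelements = 1
if option:
    numberofelenents = 2
print(numberofelements)
"""

print(assigned_but_never_read(SAMPLE))
```

A name that is written but never read is a strong hint of a typo, which is why real linters report unused variables; they also track scopes and control flow, which this sketch does not.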

The reason someone might choose to design the syntax of a language in a specific way is hard to determine. I would guess that in Python's case the creator wanted to type less when writing code. It only takes a print statement to create a Hello World program in Python. Creating a comparable program in Java requires a lot more typing. Anyway, I don't really know why the creator chose this syntax.

Related Solutions

How should compilers report errors and warnings

Your question doesn't seem to actually be about how we report compiler errors - rather, it's about the classification of problems and what to do about them.

If we start by assuming, for the moment, that the warning/error dichotomy is correct, let's see how well we can build on top of that. Some ideas:

Different "levels" of warning. A lot of compilers sort-of implement this (for example GCC has lots of switches for configuring exactly what it will warn about), but it needs work - for example, reporting what severity a reported warning is, and the ability to set "warnings are errors" for only warnings above a specified severity.
Sane classification of errors and warnings. An error should only be reported if the code doesn't meet the specification, and hence cannot be compiled. Unreachable statements, while probably a coding error, should be a warning, not an error - the code is still "valid", and there are legitimate instances in which one would want to compile with unreachable code (quick modifications for debugging, for instance).

Now things I disagree with you on:

Making extra effort to report every problem. If there's an error, that breaks the build. The build is broken. The build will not work until that error is fixed. Hence, it's better to report that error immediately, rather than "carrying on" in order to try and identify everything else "wrong" with the code. Especially when a lot of those things are probably caused by the initial error anyway.
Your specific example of a warning-that-should-be-an-error. Yes, it's probably a programmer mistake. No, it shouldn't break the build. If I know the input to the function is such that it will always return a value, I should be able to run the build and do some tests without having to add those extra checks. Yes, it should be a warning. And a damn high-severity one at that. But it shouldn't break the build in and of itself, unless compiling with warnings-are-errors.

Thoughts?

Php – Why is PHP’s method of comparing different types bad

The biggest problem is that an equivalence relation, the mathy term for things like ==, is supposed to satisfy three laws:

reflexivity: a == a
symmetry: a == b means b == a
transitivity: a == b and b == c means a == c

All of these are very intuitive and expected. And PHP doesn't follow them.

'0'==0 // true
 0=='' // true (before PHP 8.0; false since)
'0'==''// false, AHHHH

So it's not actually an equivalence relation, which is a pretty distressing realization for some mathy people (including me).

It also hints at one of the things that people really hate about implicit casts: they often behave unexpectedly when combined with the mundane. The result is basically an arbitrary set of rules; because it's unprincipled in this sense, weird stuff happens and it all needs to be specified case by case.

Basically, we sacrifice consistency, and the developer has to shoulder the extra burden of making sure there are no funny (and expensive) conversions happening behind the scenes. To quote this article:
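
The failure can be checked mechanically. The sketch below does not run PHP; it copies the three comparison results quoted above into a hand-written table (OBSERVED, a name chosen for illustration), assumes reflexivity and symmetry, and searches for triples that break transitivity.

```python
from itertools import product

# The three results as quoted above, keyed by unordered pair of PHP literals.
OBSERVED = {
    frozenset({"'0'", "0"}): True,
    frozenset({"0", "''"}): True,
    frozenset({"'0'", "''"}): False,
}
VALUES = ["'0'", "0", "''"]


def loosely_equal(a, b):
    # Reflexivity is assumed; symmetry follows from the unordered keys.
    if a == b:
        return True
    return OBSERVED[frozenset({a, b})]


def transitivity_violations():
    # Collect every (a, b, c) with a == b and b == c but not a == c.
    return [
        (a, b, c)
        for a, b, c in product(VALUES, repeat=3)
        if loosely_equal(a, b) and loosely_equal(b, c) and not loosely_equal(a, c)
    ]


print(transitivity_violations())
```

Every violation it finds goes through the integer 0, which is "equal" to both strings even though the strings differ from each other.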

Language consistency is very important for developer efficiency. Every inconsistent language feature means that developers have one more thing to remember, one more reason to rely on the documentation, or one more situation that breaks their focus. A consistent language lets developers create habits and expectations that work throughout the language, learn the language much more quickly, more easily locate errors, and have fewer things to keep track of at once.

EDIT:

Another gem I stumbled across:

  NULL == 0   // true
  NULL < -1   // true

So if you try to sort anything, the result is effectively nondeterministic: it depends entirely on the order in which comparisons are made. E.g., with bubble sort:

  bubble_sort([NULL, -1, 0]) // [NULL, -1, 0]
  bubble_sort([0, -1, NULL]) // [-1, 0, NULL]
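
The order dependence can be reproduced outside PHP. In the sketch below, php_greater is a rough model (an assumption, covering only null and integers) of PHP's rule that null compared with a non-string operand is compared as a boolean; bubble_sort is a plain textbook bubble sort using that comparison.

```python
def php_greater(a, b):
    """Rough model of PHP's > for null (None) and ints only."""
    if a is None or b is None:
        # Assumed rule: both sides are converted to bool, and false < true.
        return bool(a) > bool(b)
    return a > b


def bubble_sort(items):
    items = list(items)  # sort a copy, leave the input untouched
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if php_greater(items[i], items[i + 1]):
                items[i], items[i + 1] = items[i + 1], items[i]
    return items


print(bubble_sort([None, -1, 0]))
print(bubble_sort([0, -1, None]))
```

Both inputs contain the same three values, yet each comes back in a different "sorted" order, because the comparison is not a consistent ordering.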
