PHP architecture, and pass-by-reference vs pass-by-value

pass-by-reference, PHP

Seeking suggestions from PHP architects!

I'm not terribly familiar with PHP, but I have taken over maintenance of a large analytics package written in it. The architecture reads reported data into large key/value arrays, which are passed through various parsing modules; each module extracts the report parameters it knows about. Known parameters are removed from the master array, and any leftovers that no module recognized are dumped into a catch-all report showing the "unknown" data points.

There are a few different methods being used to call these parser modules, and I would like to know which, if any, is considered "proper" PHP structure. Some use pass-by-reference, others pass-by-value; some are functions, some are objects. All of them modify the input parameter in some way.

A super-simplified example follows:

#!/usr/bin/php
<?php

$values = Array("a"=>1, "b"=>2, "c"=>3, "d"=>4 );


class ParserA {
    private $a = null;
    public function __construct(&$myvalues) {
        $this->a = $myvalues["a"];
        unset($myvalues["a"]);
    }
    public function toString() { return $this->a; }
}

// pass-by-value
function parse_b($myvalues) {
    $b = $myvalues["b"];
    unset($myvalues["b"]);
    return Array($b, $myvalues);
}

// pass-by-reference
function parse_c(&$myvalues) {
    echo "c=".$myvalues["c"]."\n";
    unset($myvalues["c"]);
}

// Show beginning state
print_r($values);

// will echo "1" and remove "a" from $values
$a = new ParserA($values);
echo "a=".$a->toString()."\n";
print_r($values);

// will echo "2" and remove "b" from $values
list($b, $values) = parse_b($values);
echo "b=".$b."\n";
print_r($values);

// will echo "3" and remove "c" from $values
parse_c($values);
print_r($values);

?>

The output will be:

Array
(
    [a] => 1
    [b] => 2
    [c] => 3
    [d] => 4
)
a=1
Array
(
    [b] => 2
    [c] => 3
    [d] => 4
)
b=2
Array
(
    [c] => 3
    [d] => 4
)
c=3
Array
(
    [d] => 4
)

I'm really uncomfortable having so many different calling conventions in use: some have hidden effects on the caller's variables through "&"-style reference parameters, some require the main body to write their output, and some write their output independently.

I would prefer to choose a single methodology and stick with it. To do so, I would also like to know which is most efficient; my reading of the PHP documentation indicates that, since PHP uses copy-on-write, there shouldn't be much performance difference between passing the array by reference and passing it by value and re-reading a return value. I would also prefer to use the object-oriented structure, but I am uncomfortable with the hidden changes the constructor makes to its input parameter.
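A rough way to check the copy-on-write assumption is to time both styles on a large array. The sketch below assumes PHP 7.1 or later; the helpers drop_by_value() and drop_by_reference() and the array size are made up for illustration and are not part of the package. One thing to keep in mind: copy-on-write only avoids the copy while the array is read. Because every parser here writes to it (via unset), the by-value version does copy the array at the first write, so any difference should grow with array size. Actual timings depend on the PHP version and the machine.

```php
<?php
// Hypothetical helpers, not part of the original package.

// By value: unset() writes to the local variable, so PHP separates
// (copies) the array here and the caller's array is left untouched.
function drop_by_value(array $values, string $key): array {
    unset($values[$key]);
    return $values;
}

// By reference: the caller's array is modified in place.
function drop_by_reference(array &$values, string $key): void {
    unset($values[$key]);
}

// Build a large key/value array (size chosen arbitrarily).
$data = [];
for ($i = 0; $i < 200000; $i++) {
    $data["k$i"] = $i;
}

$start = microtime(true);
$remaining = drop_by_value($data, "k0");
$byValueSeconds = microtime(true) - $start;

$start = microtime(true);
drop_by_reference($data, "k1");
$byReferenceSeconds = microtime(true) - $start;

printf("by value:     %.6f s\n", $byValueSeconds);
printf("by reference: %.6f s\n", $byReferenceSeconds);
```

A single call is a noisy measurement; repeating each call in a loop and comparing totals would give a steadier picture.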

Of the three calling methods, ParserA(), parse_b(), and parse_c(), which, if any, is the most appropriate style?

Best Answer

I'm not really an expert in PHP, but in my experience passing by value is better. That way the code has no side effects, which means it is easier to understand and maintain, and you can do all sorts of things with it, like using it as a callback for a map function. So I'm all for the parse_b way of doing things.
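To make the "no side effects" point concrete, here is a minimal sketch of how parse_b-style parsers can be chained, assuming PHP 7.1 or later. The names make_parser() and run_parsers() are hypothetical, chosen for illustration only. Each parser takes the remaining values and returns the value it extracted plus whatever is left, so the whole chain can be driven by array_reduce() and the caller's array is never touched.

```php
<?php
// Hypothetical helper: builds a parser for one known key.
// The returned closure never touches the caller's array; it returns
// [extracted value or null, remaining values].
function make_parser(string $key): callable {
    return function (array $values) use ($key): array {
        $found = $values[$key] ?? null;
        unset($values[$key]);          // modifies only the local copy
        return [$found, $values];
    };
}

// Runs every parser in turn, threading the remaining values through.
// Returns [report of known values, leftover "unknown" values].
function run_parsers(array $parsers, array $values): array {
    return array_reduce(
        array_keys($parsers),
        function (array $state, string $name) use ($parsers): array {
            [$report, $remaining] = $state;
            [$found, $remaining] = $parsers[$name]($remaining);
            $report[$name] = $found;
            return [$report, $remaining];
        },
        [[], $values]
    );
}

$values  = ["a" => 1, "b" => 2, "c" => 3, "d" => 4];
$parsers = [
    "a" => make_parser("a"),
    "b" => make_parser("b"),
    "c" => make_parser("c"),
];

[$report, $unknown] = run_parsers($parsers, $values);

print_r($report);   // known parameters: a, b, c
print_r($unknown);  // leftovers: d
print_r($values);   // the input array is unchanged
```

The trade-off is that the caller has to pick up the returned remainder explicitly, which is exactly what makes the data flow visible at the call site.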