Design Patterns – How to Decouple Configuration Data from Programs

design-patterns

I am a beginning programmer who has written a spider application in PHP. Currently there are three parts:

1) The Spider (spider.php)
2) The Harvester (harvest.php)
3) The Configuration file (for example, craigslist_config.php)

I use the spider to search the web for items I want to buy. An item can be found on any website, like ebay, craigslist, etc. The Harvester provides three functions to spider so it can act on the data it finds – get_title_from($markup), get_description_from($markup), and get_price_from($markup).

Each web site that I want to spider has, of course, different markup surrounding the data that I want to extract. My config file contains a configuration array that holds the regex patterns for each of the items I want to find. The structure of the file is always the same, the only thing that changes is the regex patterns. So, I would have craigslist_config.php, ebay_config.php, etc.

$conf = array(
    'title' => 'specific_site title pattern',
    'description' => 'specific_site description pattern',
    'price' => 'specific_site price pattern'
);

My problem comes when I want to add a new website. I have to edit spider.php and add to an ever-growing "if, elseif" statement that detects which site is currently being read and loads the correct config file, which in turn feeds the correct regex patterns to the harvester functions.

How can I decouple my configuration from my spider.php file? What I have designed does not feel like a flexible, scalable solution, and I don't want to have to touch spider.php every time I add or remove a site.

Ultimately, what I am trying to achieve is the ability to simply drop a new configuration file into my config directory and move the 'if, elseif' logic somewhere else, so that the spider and harvester functions never have to worry about which files are or are not in the config directory. It's the "somewhere else" I am having trouble figuring out. Actually, it would be even better if I could get rid of the 'if, elseif' logic altogether so that everything just 'works.'

My current design is not an OOP approach; however, I am not opposed to one. I am currently reading "PHP Objects, Patterns, and Practice" to get up to speed on OOP and related design patterns, so feel free to suggest something in that direction if you think it is a solution.

EDIT: Based on Doc Brown's direction, I have come up with the following. I have individual configuration files with content like so:

$conf['specificwebsite1.com'] = array(
    'title' => 'title pattern',
    'price' => 'price pattern',
    // etc...
);

In my Harvester file I have a new function called load_config($url, $config). As suggested, I loop through all the configuration files and load them into one large $conf array. Then the load_config function checks whether each key is a substring of the url I'm currently reading. If so, it returns the values needed to continue parsing. This is the function:

function load_config($url, $config){
  // Default when no key matches (also covers an empty $config).
  $conf = FALSE;
  foreach($config as $key => $value){
    // See if a key in our config array is a substring of our url.
    if(stripos($url, $key) !== FALSE){
      $conf = $value;
      break;
    }
  }

  return $conf;
}

This is working really well, so I'll accept that as the answer. But please feel free to make suggestions for improvements in the comments or as another answer.
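
The step that loops through the configuration files is described above but not shown. A minimal sketch of how it might look, assuming the files live in one directory and follow the *_config.php naming used earlier; the function name load_all_configs is hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper: merge every *_config.php file in $dir into one array.
// Each file is assumed to add its own entry to $conf, for example:
//   $conf['specificwebsite1.com'] = array('title' => '...', 'price' => '...');
function load_all_configs($dir) {
    $conf = array();
    $files = glob(rtrim($dir, '/') . '/*_config.php');
    // glob() returns FALSE on error, so fall back to an empty list.
    foreach ($files ?: array() as $file) {
        // The included file runs in this function's scope and writes to $conf.
        include $file;
    }
    return $conf;
}
```

With something like this in place, the result could be handed straight to load_config($url, $config), so adding a site would only mean dropping a new file into the directory.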

Best Answer

Ultimately, what I am trying to achieve is the ability to simply drop in a new configuration file into my config directory

First, add the related web address to your config file:

$conf = array(
    'url' => 'specific_site url or url pattern',
    'title' => 'specific_site title pattern',
    'description' => 'specific_site description pattern',
    'price' => 'specific_site price pattern'
);

Now change the code in your spider.php so that it checks all .php files in the config directory, loads them dynamically one by one, and stores each "$conf" content in a dictionary with the url as the key. Then it should be easy to replace your "if-else" list with a simple loop over that dictionary.
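
As a rough sketch of that idea, assuming each config file defines $conf with a 'url' entry as shown above. The function names build_site_index and find_site_config are made up for illustration, and the match is a plain substring test rather than a real url pattern:

```php
<?php
// Sketch only: build_site_index() and find_site_config() are hypothetical names.
// Each .php file in $dir is assumed to define $conf with a 'url' entry.
function build_site_index($dir) {
    $sites = array();
    $files = glob(rtrim($dir, '/') . '/*.php');
    foreach ($files ?: array() as $file) {
        $conf = null;   // reset so a file that sets nothing is skipped
        include $file;  // defines $conf in this function's scope
        if (is_array($conf) && !empty($conf['url'])) {
            $sites[$conf['url']] = $conf;  // url is the dictionary key
        }
    }
    return $sites;
}

// Replaces the if/elseif chain: loop over the dictionary until a url matches.
function find_site_config($url, array $sites) {
    foreach ($sites as $site_url => $conf) {
        // Case-insensitive substring test; a url *pattern* would need preg_match.
        if (stripos($url, $site_url) !== false) {
            return $conf;
        }
    }
    return false;  // no configured site matches this url
}
```

The spider would then call build_site_index() once at startup and find_site_config() for each page it reads, without knowing which config files exist.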
