PHP cURL and preg_match: extract text from XHTML

curl, PHP, preg-match

I'm trying to extract the price from the HTML page linked below using PHP cURL and preg_match. I expect this code to output 4,550, but for some reason I get:

 Notice: Undefined offset: 1 in C:\wamp\www\test.php on line 22

I think the pattern is correct, because if I put the HTML itself in a variable and escape the double quotes, it works.
Also, if I output the result (echo $result;), it properly displays the HTML grabbed from the Foxtons website, so I can't figure out why the whole thing doesn't work. I need to make this work, and I would also appreciate it if you could explain why that notice is generated and why my current script doesn't work.

<?php
$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_exec($ch);
curl_close($ch);
$result2 = str_replace('"', '\"', $result);

$tagname1 = ");</script> ";
$tagname2 = "</noscript> per month</a>";

$pattern = "/$tagname1(.*?)$tagname2/";
preg_match($pattern, $result, $matches);
$prices = $matches[1];
print_r($prices);
?>
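The notice means `$matches` has no element at index 1. That happens whenever preg_match() returns 0 (no match) or false (pattern error) and the code reads `$matches[1]` anyway. Two things in the pattern above are worth checking: the raw HTML interpolated into it contains `/` (the pattern delimiter) and `)` (a regex metacharacter), and `.` does not match newlines unless the `s` modifier is set. The sketch below escapes the markers with preg_quote(), adds `s`, and checks the return value before reading the capture. The `$html` string is a hypothetical fragment shaped like the question's markers, not the real Foxtons markup.

```php
<?php
// Hypothetical fragment shaped like the markers in the question; the real
// page markup is an assumption and is not reproduced here.
$html = "<a><script>price(1);</script> 4,550</noscript> per month</a>";

$tagname1 = ");</script> ";
$tagname2 = "</noscript> per month</a>";

// preg_quote() escapes regex metacharacters such as ')' and, because '/' is
// passed as the second argument, the '/' delimiter as well.
// The 's' modifier lets '.' also match newlines, in case the markers are
// split across lines in the fetched HTML.
$pattern = '/' . preg_quote($tagname1, '/') . '(.*?)' . preg_quote($tagname2, '/') . '/s';

$price = null;
// preg_match() returns 1 on a match, 0 on no match and false on error;
// only read $matches[1] after a successful match.
if (preg_match($pattern, $html, $matches) === 1) {
    $price = trim($matches[1]);
}

echo $price === null ? "no match\n" : $price . "\n"; // prints 4,550
```

Checking the return value turns a silent "Undefined offset" into an explicit "no match" branch, which makes it easier to tell a wrong pattern apart from HTML that differs from what was expected.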

Best Answer

I rewrote the script a bit to account for more than one <noscript> element on the page. You need preg_match_all(), which finds all the matches instead of stopping at the first one.



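<?php
$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);          // exclude response headers from the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body as a string instead of printing it
$result = curl_exec($ch);                     // one request is enough; a second curl_exec() is not needed
curl_close($ch);

// Capture the text inside every <noscript>...</noscript> pair on the page.
preg_match_all("/<noscript>(.*)<\/noscript>/", $result, $matches);
print_r($matches);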

Output:



Array
(
    [0] => Array
        (
            [0] => <noscript>£1,050</noscript>
            [1] => <noscript>4,550</noscript>
        )

    [1] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

)

I tried this on my machine and it worked; let me know if it works for you.
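One caveat about `(.*)` in the answer's pattern: it is greedy, so if two <noscript> elements ever sit on the same line, a single match runs from the first opening tag to the last closing tag. The two separate matches in the output above imply that each element is on its own line in the fetched HTML. A lazy `(.*?)` stops at the nearest `</noscript>` instead. A minimal comparison on a hypothetical one-line fragment:

```php
<?php
// Hypothetical one-line fragment containing two <noscript> elements.
$html = '<p><noscript>£1,050</noscript></p><p><noscript>4,550</noscript></p>';

// Greedy: '.*' runs to the LAST '</noscript>' on the line, so both prices
// end up inside a single match.
preg_match_all('/<noscript>(.*)<\/noscript>/', $html, $greedy);

// Lazy: '.*?' stops at the FIRST following '</noscript>', giving one
// match per element.
preg_match_all('/<noscript>(.*?)<\/noscript>/', $html, $lazy);

print_r($greedy[1]); // one element: "£1,050</noscript></p><p><noscript>4,550"
print_r($lazy[1]);   // two elements: "£1,050" and "4,550"
```

The lazy form gives the same result as the greedy one when each element is on its own line, so it is the safer default for this kind of extraction.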
