PHP – Tor Web Crawler

php, proxy, tor, transparent-proxy, web-crawler

Ok, here's what I need.
I have a PHP based web crawler.
It is accessible here:
http://rz7ocnxxu7ka6ncv.onion/
Now, my problem is that the spider that actually crawls pages needs to connect through the SOCKS proxy on port 9050. I have to tunnel its connections through Tor so that it can resolve .onion domains, which is what I'm indexing. (Only domains ending in .onion.)
I call this script from the command line using php crawl.php, and I add the appropriate parameters to crawl the page.
Here is what I think:
Is there any way to force it to use Tor?
Or can I force my ENTIRE MACHINE to tunnel everything through Tor, and how?
(Like forcing all traffic through 127.0.0.1:9050.)
Perhaps if I set up global proxy settings, PHP would respect them?

If any of my solutions work, how would I do it? (Step by step instructions please, I am a noob.)

I just want to create my own Tor search engine. (Please don't recommend P2P search engines; they're not what I want for this. I know they exist, I did my homework.)
Here is the crawler source, if you would like to take a look:
http://pastebin.com/kscGJCc5
Perhaps someone with a kind heart can modify it to use 127.0.0.1:9050 for all crawling requests?

Best Answer

cURL also supports SOCKS connections; try this:

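<?php

$ch = curl_init('http://google.com');
curl_setopt($ch, CURLOPT_HEADER, 1); // include the response headers in the output

// SOCKS5 through Tor's local proxy. The *_HOSTNAME variant lets Tor resolve
// the hostname, which is required for .onion addresses; plain CURLPROXY_SOCKS5
// resolves DNS locally. CURLOPT_HTTPPROXYTUNNEL is only meant for HTTP
// proxies, so it is not set here.
curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:9050');
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);

// Without CURLOPT_RETURNTRANSFER, curl_exec() prints the response directly.
if (curl_exec($ch) === false) {
    echo 'cURL error: ' . curl_error($ch) . "\n";
}
curl_close($ch);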
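
If the crawler fetches pages in more than one place, it may be easier to wrap the proxy settings in a small helper and call that everywhere. The following is only a sketch under some assumptions: Tor is listening on its default SOCKS port at 127.0.0.1:9050, the PHP cURL extension is new enough to define CURLPROXY_SOCKS5_HOSTNAME, and PHP is 7.1 or later. The names is_onion_url, tor_curl_options and fetch_via_tor are made up for illustration and are not taken from the pastebin source. The timeout values are guesses, since Tor circuits tend to be slow.

```php
<?php

// Assumed default address of Tor's local SOCKS listener.
const TOR_PROXY = '127.0.0.1:9050';

/**
 * Returns true if the URL's host ends in ".onion" (hypothetical helper).
 */
function is_onion_url(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && substr(strtolower($host), -6) === '.onion';
}

/**
 * Builds the cURL options that send a request through Tor.
 * SOCKS5_HOSTNAME makes the proxy (Tor) resolve the hostname,
 * which is needed for .onion addresses.
 */
function tor_curl_options(string $proxy = TOR_PROXY): array
{
    return [
        CURLOPT_PROXY          => $proxy,
        CURLOPT_PROXYTYPE      => CURLPROXY_SOCKS5_HOSTNAME,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 30,   // assumed values; tune for your setup
        CURLOPT_TIMEOUT        => 60,
    ];
}

/**
 * Fetches a page through Tor. Returns the body, or null on failure.
 */
function fetch_via_tor(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, tor_curl_options());
    $body = curl_exec($ch);
    if ($body === false) {
        error_log('cURL error for ' . $url . ': ' . curl_error($ch));
        $body = null;
    }
    curl_close($ch);
    return $body;
}
```

In the crawler, the place that currently downloads a page could then check is_onion_url($url) first and call fetch_via_tor($url) instead of its existing fetch code. A null return means the request failed and the URL can be skipped or retried later.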