How to configure Apache’s mod_proxy_html to work as an AJAX proxy

apache-2.2 mod-proxy

I'm trying to build a website that lets you view and manipulate data from any page on any other website. To do that, I have to get around same-origin ('Allow Origin') restrictions: I'm loading the other domain's content in an iframe and need to manipulate it with JavaScript served from my domain.

My first attempt was to write a simple proxy myself: the other domain's page is requested through a server-side proxy written in Java that not only serves the content but also rewrites the links in it (src and href attributes), so that the resources those links reference are downloaded through my handmade proxy as well. The result is not bad, but it has problems with URLs in CSS and scripts.
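One way to handle the CSS part in a hand-rolled proxy is to rewrite every url(...) token before serving the stylesheet. The following is only a minimal sketch in Java, not the code of the proxy described above: the class name CssUrlRewriter and the mapLink callback are hypothetical, and it ignores @import rules written without url() as well as escaped characters inside the URL.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the targets of url(...) tokens in a stylesheet.
final class CssUrlRewriter {
    // Matches url(...) with optional matching single or double quotes around the target.
    private static final Pattern CSS_URL = Pattern.compile(
            "url\\(\\s*(['\"]?)([^'\")]*)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    // mapLink decides what each target becomes (for example, a proxied URL).
    static String rewrite(String css, UnaryOperator<String> mapLink) {
        Matcher m = CSS_URL.matcher(css);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String quote = m.group(1);
            String target = m.group(2).trim();
            String replacement = "url(" + quote + mapLink.apply(target) + quote + ")";
            // quoteReplacement keeps '$' and '\' in the mapped URL literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For instance, CssUrlRewriter.rewrite(css, link -> "/proxy?url=" + link) would prefix every target; a real mapper would first resolve each link against the stylesheet's own URL and leave data: URIs alone.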

That's when I realized that mod_proxy_html is supposed to do exactly this job. The problem is that I can't figure out how to make it work as expected.

Let's suppose my server runs at my-domain.com, and to proxy and transform content from another domain I'd make a request like this:

my-domain.com/proxy?url=http://another-domain.com/some/content

I'd want mod_proxy_html to serve the content and rewrite the URLs found in http://another-domain.com/some/content as follows:

  1. Absolute URLs not on another-domain.com: no rewriting
  2. Root-relative URLs: /other/content -> /proxy?url=http://another-domain.com/other/content
  3. Relative URLs: other/content -> /proxy?url=http://another-domain.com/some/content/other/content
  4. Parent-relative URLs: ../other/content -> /proxy?url=http://another-domain.com/some/other/content
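
The four rules above can be sketched in Java with java.net.URI, which performs standard relative-reference resolution. This is only an illustration of the desired mapping, not something mod_proxy_html provides: the class name ProxyUrlMapper and the /proxy?url= prefix are assumptions taken from the example request. Note that rules 3 and 4 only come out as listed if the page URL is treated as a directory, that is, with a trailing slash (http://another-domain.com/some/content/); without it, other/content resolves to /some/other/content instead.

```java
import java.net.URI;

// Hypothetical helper that applies the four rewriting rules listed above.
final class ProxyUrlMapper {
    // Assumed proxy entry point, taken from the example request in the question.
    static final String PROXY_PREFIX = "/proxy?url=";

    // pageUrl: absolute URL of the proxied page; link: a URL found inside that page.
    // Throws IllegalArgumentException if either string is not a valid URI.
    static String map(String pageUrl, String link) {
        URI base = URI.create(pageUrl);
        URI resolved = base.resolve(link); // resolves "/x", "x" and "../x" against the page URL
        String baseHost = base.getHost();
        // Rule 1: links that end up on another host (or have no host at all,
        // such as mailto: or data: URIs) are left untouched.
        if (baseHost == null || !baseHost.equalsIgnoreCase(resolved.getHost())) {
            return link;
        }
        // Rules 2-4: same-host links are routed back through the proxy.
        // The target is left unencoded to match the question's examples.
        return PROXY_PREFIX + resolved;
    }
}
```

With the page URL given as http://another-domain.com/some/content/, calling ProxyUrlMapper.map on /other/content, other/content and ../other/content yields the three proxied forms from rules 2 to 4, while an absolute link to a third host is returned unchanged.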

The URL should be specified at runtime, not at configuration time.

Can this be achieved with mod_proxy_html? Could anyone provide a simple working configuration to start with?

EDIT 1 - First approach

The following site config works fine with sites that use absolute URLs everywhere, like http://www.huffingtonpost.es/. You can try this config on localhost: http://localhost/asset/http://www.huffingtonpost.es/

<VirtualHost *:80>
    ServerName localhost

    LogLevel debug

    ProxyRequests off
    RewriteEngine On
    RewriteRule ^/asset/(.*) $1 [P]
    ProxyHTMLURLMap $1 /asset/

    <Location /asset/>
        ProxyPassReverse /
        ProxyHTMLURLMap / /asset/
    </Location>
</VirtualHost>

But as explained in the documentation, if I hit a site that uses relative URLs, I'd like to have them rewritten in the HTML by mod_proxy_html. So I should change the Location block as follows:

    <Location /asset/>
        ProxyPassReverse /

        # Depending on your system, use one line or the other
        # Ubuntu:
        #SetOutputFilter proxy-html
        # Any other system:
        ProxyHTMLEnable On

        ProxyHTMLURLMap / /asset/
    </Location>

…which doesn't seem to work. Comments, hints and ideas welcome!

Best Answer

Here's a thought on how to do this - it's a bit more complex in setup, but I think it would be secure. I'm currently unable to test this since I can't get at my test server, but it's a start.

The main problem is that if you just set up a ProxyPassReverse, you also need to specify which servers you connect to. Since you want to be able to use this on multiple servers, that would be a bit of a pain, to put it mildly. So here's a two-step approach to neatly sidestep this problem.

First, set up a separate Apache instance that listens only on 127.0.0.1 and a specific port; I've gone with 2323 in my example. This instance should be configured as a plain forward proxy and do no rewriting. Example:

# Accept connections on the loopback interface only
Listen 127.0.0.1:2323
# Act as a forward proxy
ProxyRequests On

<Proxy *>
   Order Deny,Allow
   Deny from all
   Allow from 127.0.0.1
</Proxy>

On your main server, set up a reverse proxy, something like this:

<Location /proxy/>
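    # Hand requests under /proxy/ to the local proxy instance
    ProxyPass http://127.0.0.1:2323/
    ProxyPreserveHost On
    # Rewrite links in the returned HTML so they point back through /proxy/
    ProxyHTMLEnable On
    ProxyHTMLURLMap / /proxy/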
</Location> 

This means that the actual proxying works as a regular proxy, while the rewriting happens in the same Apache instance where the script runs. Again, note that this is untested, but I think it's the right direction to start looking.
