Reverse Proxy Server: mod proxy html

MPG UniCenter

Uniform Server 3.5-Apollo
Reverse Proxy.

If you are going to do any serious work with reverse proxies, you will require one essential module: mod_proxy_html. It allows links embedded in a page to be rewritten on-the-fly. It is a third-party module, not included in the standard Apache package. This page explains where to obtain it and how to use it.

To show why mod_proxy_html is essential, the examples on this page proxy real applications (MediaWiki, WordPress and Joomla) using the ready-to-go mini-servers.

Downloads

Mini-Servers

I have assumed you have downloaded the following mini-servers. For details, check out Mini Servers Ready To Go.

  • Mini Server 15 - MediaWiki 1.12.0
  • Mini Server 16 - WordPress 2.6.1
  • Mini Server 18 - Joomla 1.5.6
  • Mini Server 20 - Reverse Proxy (note: includes Mini Server 6)

mod proxy html

Mini Server 20 includes mod_proxy_html 3.0.1. For the latest version, check out Apache Lounge.

The mini server uses the files extracted from:

mod_proxy_html-3.0.1-w32.zip 27 Jun '08 485K - output filter to rewrite HTML links in a proxy situation

Found on the download page

Installing - mod proxy html

Extract all files from mod_proxy_html-3.0.1-w32.zip. This creates a new folder, mod_proxy_html-3.0.1-w32. Inside it you will find a folder named mod_proxy_html containing the files that need to be copied to Apache as follows:

  • Copy file mod_proxy_html.so to folder \udrive\usr\local\apache2\modules
  • Copy file libxml2.dll to folder \udrive\usr\local\apache2\bin
  • Copy file proxy_html.conf to folder \udrive\usr\local\apache2\conf
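
The three copy steps above could also be scripted. The following is a minimal sketch, not part of the mini-server package: the helper name install_files and its two arguments are hypothetical, and the only details taken from this page are the three file names and their destination folders.

```python
import shutil
from pathlib import Path

# File name -> destination folder under the Apache root (from the list above).
DESTINATIONS = {
    "mod_proxy_html.so": "modules",
    "libxml2.dll": "bin",
    "proxy_html.conf": "conf",
}


def install_files(src_dir, apache_root):
    """Copy the three extracted files into the Apache tree.

    src_dir     -- the extracted mod_proxy_html folder (hypothetical argument)
    apache_root -- the apache2 folder of the server (hypothetical argument)
    Returns the list of destination paths that were written.
    """
    copied = []
    for name, folder in DESTINATIONS.items():
        target_dir = Path(apache_root) / folder
        target_dir.mkdir(parents=True, exist_ok=True)  # harmless if it already exists
        target = target_dir / name
        shutil.copy(Path(src_dir) / name, target)
        copied.append(target)
    return copied
```

It would be called with the extracted mod_proxy_html folder as the first argument and the server's \udrive\usr\local\apache2 folder (with whatever drive letter your installation uses) as the second.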

httpd.conf

Open Apache's configuration file, httpd.conf, located in folder \udrive\usr\local\apache2\conf

# ===================================================
# Modules
# ===================================================
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule dir_module modules/mod_dir.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule mime_module modules/mod_mime.so
LoadModule php5_module "/usr/local/php/php5apache2_2.dll"
Loadfile "/usr/local/php/libmysql.dll"

LoadModule proxy_module modules/mod_proxy.so
#LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
#LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
#LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
LoadModule proxy_http_module modules/mod_proxy_http.so

LoadFile /usr/local/apache2/bin/libxml2.dll
LoadModule proxy_html_module modules/mod_proxy_html.so

LoadModule rewrite_module modules/mod_rewrite.so

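Add the two lines that load libxml2.dll and mod_proxy_html.so, shown above just below the other proxy modules.

LoadFile /usr/local/apache2/bin/libxml2.dll - loads the libxml2 library, which mod_proxy_html depends on, before the module itself is loaded.

LoadModule proxy_html_module modules/mod_proxy_html.so - loads the Apache module.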

Towards the end of the configuration file, add the Include line just above the virtual host:

# ===================================================

Include conf/proxy_html.conf

NameVirtualHost *

<VirtualHost *>

Note: The examples use the default settings in proxy_html.conf.

Proxy commands - revisited

When using mod_proxy_html, the two basic commands ProxyPass and ProxyPassReverse are still required. However, ProxyPassReverse is now contained in a Location block. This is more robust and allows additional commands to be added that specifically target the requested location.

From the previous page the following shows server 6 with original proxy commands and the new location block.

ProxyPass /info/ http://localhost:8086/
ProxyPassReverse  /info/  http://localhost:8086/

ProxyPass: Any requests to folder info are passed to the back-end server localhost:8086. This command maps a remote server into your proxy server’s name space.

ProxyPassReverse: Has the same syntax as ProxyPass. It rewrites the response URI to keep it pointing at the same place as seen by a browser.

ProxyPass  /info/  http://localhost:8086/

<Location /info/>
 ProxyPassReverse http://localhost:8086/
</Location>

ProxyPassReverse http://localhost:8086/ is actually shorthand for ProxyPassReverse /info/ http://localhost:8086/. This shorthand is possible because the ProxyPassReverse directive is inside a <Location> block.

This command masquerades one server as another. Apache adjusts the URLs in the Location, Content-Location and URI headers of responses from the reverse-proxied server so they look as if they came from the server running the proxy engine.

Both of the above produce the same result: when a user types http://some_domain/info/ into a browser, the contents from server http://localhost:8086/ will seamlessly appear in folder info.
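
To make the two directions concrete, here is a minimal Python sketch of what the pair of commands does for the /info/ mapping. It is a simplified model and not Apache's implementation: the function names are made up for the illustration, only the Location header is handled, and the paths test1/index.html and test1/ are hypothetical.

```python
PROXY_HOST = "http://some_domain"          # proxy address used in the text above
FRONT_PREFIX = "/info/"                    # path on the proxy
BACKEND_BASE = "http://localhost:8086/"    # back-end server from the example


def proxy_pass(request_path):
    """Request direction: map a path on the proxy to the back-end URL."""
    if not request_path.startswith(FRONT_PREFIX):
        return None  # this mapping does not apply
    return BACKEND_BASE + request_path[len(FRONT_PREFIX):]


def proxy_pass_reverse(location_header):
    """Response direction: point a back-end redirect back at the proxy."""
    if location_header.startswith(BACKEND_BASE):
        tail = location_header[len(BACKEND_BASE):]
        return PROXY_HOST + FRONT_PREFIX + tail
    return location_header  # redirects to other hosts are left untouched


print(proxy_pass("/info/test1/index.html"))
print(proxy_pass_reverse("http://localhost:8086/test1/"))
```

Note that only response headers are touched in this model. Links inside the page body are not rewritten, which is the gap mod_proxy_html fills.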

Example 4 - Standard Apache reverse proxy

An interesting experiment is to proxy our test servers using only ProxyPass and ProxyPassReverse. Change Uniform Server’s config file httpd.conf as shown below.

If using Mini Server 20, open folder example 4 and copy file httpd.conf up one level into folder \udrive\usr\local\apache2\conf, letting it overwrite the existing file.

NameVirtualHost *

<VirtualHost *>
 ServerName localhost:80
 DocumentRoot /www

ProxyRequests off
<Proxy *>
  Order deny,allow
  Deny from all
  Allow from 127.0.0.1
</Proxy>

ProxyPass /info/ http://localhost:8086/
<Location /info/>
  ProxyPassReverse  http://localhost:8086/
</Location>

ProxyPass /wiki/ http://localhost:8095/wiki/
<Location /wiki/>
 ProxyPassReverse http://localhost:8095/wiki/
</Location>

ProxyPass /wordpress/ http://localhost:8096/wordpress/
<Location /wordpress/>
 ProxyPassReverse http://localhost:8096/wordpress/
</Location>

ProxyPass /joomla/ http://localhost:8098/joomla/
<Location /joomla/>
 ProxyPassReverse http://localhost:8098/joomla/
</Location>

</VirtualHost>

Start servers

  • Uniform Server first or Mini Server 20 - followed by:
  • Mini Server 6 - Info
  • Mini Server 15 - MediaWiki 1.12.0
  • Mini Server 16 - WordPress 2.6.1
  • Mini Server 18 - Joomla 1.5.6

Results as follows

Info

With the exception of Test 3, everything is proxied with no problems. Test 3 fails because it uses web-root relative links (links beginning with /), while Test 1 and Test 2 use relative links.

MediaWiki

Check out some of the Wiki links; note that they work, as do images. I would conclude from this that MediaWiki uses relative links.

WordPress

At first sight it appears to work correctly. On the Hello World page, click either of the two links, Uncategorized or 1 Comment, and note what is displayed in your browser’s address bar. It is the real WordPress server address (domain), which clearly shows that absolute addresses break out of the proxy server.

Joomla

Check out some of the links; note that they work, as do images. I would conclude from this that Joomla uses relative links.

With the configuration structure in place, it is easy to apply mod_proxy_html to the servers showing link problems, Info and WordPress. The other servers do not require rewriting; if you apply it to them anyway, it costs a small amount of extra processing time.
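
The different outcomes in the results come down to how a browser resolves each kind of link against the address of the proxied page. The sketch below uses urljoin from Python's standard library, which resolves links in much the same way a browser does. The page address and the three link targets are hypothetical examples, not the actual links used by the mini-servers.

```python
from urllib.parse import urljoin

# Address of a page as the browser sees it through the proxy (hypothetical file name).
page = "http://some_domain/info/index.html"

# 1. Relative link: resolved against the current folder, so it stays under /info/.
relative = urljoin(page, "test1/index.html")

# 2. Web-root relative link: resolved against the server root, so /info/ is lost.
root_relative = urljoin(page, "/test3/index.html")

# 3. Absolute link: names the back-end server directly and bypasses the proxy.
absolute = urljoin(page, "http://localhost:8096/wordpress/?cat=1")

print(relative)
print(root_relative)
print(absolute)
```

Only the first link still points into the proxied folder. The second asks the proxy for a path it does not map, and the third goes straight to the back-end server.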

Example 5 - mod proxy html

For completeness, and to show how symmetrical the code is, I have applied mod_proxy_html to all our test servers.

Change Uniform Server’s config file httpd.conf as shown below.

If using Mini Server 20, open folder example 5 and copy file httpd.conf up one level into folder \udrive\usr\local\apache2\conf, letting it overwrite the existing file.

Include conf/proxy_html.conf

NameVirtualHost *

<VirtualHost *>
 ServerName localhost:80
 DocumentRoot /www

ProxyRequests off
<Proxy *>
  Order deny,allow
  Deny from all
  Allow from 127.0.0.1
</Proxy>

ProxyPass /info/ http://localhost:8086/
ProxyHTMLURLMap http://localhost:8086 /info
<Location /info/>
  ProxyPassReverse  http://localhost:8086/
  SetOutputFilter proxy-html
  ProxyHTMLURLMap /          /info/
  ProxyHTMLURLMap /info      /info
</Location>

ProxyPass         /wiki/  http://localhost:8095/wiki/
ProxyHTMLURLMap http://localhost:8095/wiki /wiki
<Location /wiki/>
 ProxyPassReverse http://localhost:8095/wiki/
 SetOutputFilter proxy-html
 ProxyHTMLURLMap /           /wiki/
 ProxyHTMLURLMap /wiki       /wiki
</Location>

ProxyPass /wordpress/ http://localhost:8096/wordpress/
ProxyHTMLURLMap http://localhost:8096/wordpress /wordpress
<Location /wordpress/>
 ProxyPassReverse http://localhost:8096/wordpress/
 SetOutputFilter proxy-html
 ProxyHTMLURLMap /              /wordpress/
 ProxyHTMLURLMap /wordpress     /wordpress
</Location>

ProxyPass /joomla/ http://localhost:8098/joomla/
ProxyHTMLURLMap http://localhost:8098/joomla /joomla
<Location /joomla/>
 ProxyPassReverse http://localhost:8098/joomla/
 SetOutputFilter proxy-html
 ProxyHTMLURLMap /            /joomla/
 ProxyHTMLURLMap /joomla      /joomla
</Location>

</VirtualHost>

Note: Each block contains three ProxyHTMLURLMap directives and a SetOutputFilter directive.

Looking at the first block:

ProxyPass /info/ http://localhost:8086/
ProxyHTMLURLMap http://localhost:8086 /info
<Location /info/>
  ProxyPassReverse  http://localhost:8086/
  SetOutputFilter proxy-html
  ProxyHTMLURLMap /           /info/
  ProxyHTMLURLMap /info      /info
</Location>

The first ProxyHTMLURLMap directive informs mod_proxy_html to rewrite any occurrences of http://localhost:8086 to /info.

This means any absolute URLs pointing at the back-end server will be rewritten to location /info/, which will then get proxied appropriately.

SetOutputFilter informs Apache to pass the proxied content through the proxy-html filter, which performs all the rewriting. The two ProxyHTMLURLMap directives inside the Location block work as follows:

The next ProxyHTMLURLMap directive handles URLs that are web-root relative, for example /test3/. This will be rewritten to /info/test3/ and proxied correctly.

The last ProxyHTMLURLMap directive prevents infinite looping.
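
The effect of the three rules on individual links can be pictured with a small Python model. This is only a sketch under stated assumptions, not the module's actual algorithm: it treats each rule as a plain prefix replacement, applies the first rule that matches, and checks the no-op /info rule before the catch-all / rule. That ordering is chosen for the illustration; consult the mod_proxy_html documentation for how the module really orders and applies its rules. The sample links are hypothetical.

```python
# (from_prefix, to_prefix) pairs for the /info/ block.
# ASSUMPTION: the order below is chosen for this sketch, with the no-op
# /info rule ahead of the catch-all "/" rule.
RULES = [
    ("http://localhost:8086", "/info"),  # absolute links to the back-end
    ("/info", "/info"),                  # already correct, leave as-is
    ("/", "/info/"),                     # web-root relative links
]


def rewrite_link(url):
    """Apply the first rule whose from_prefix matches the start of url."""
    for from_prefix, to_prefix in RULES:
        if url.startswith(from_prefix):
            return to_prefix + url[len(from_prefix):]
    return url  # plain relative links match no rule and pass through


for link in ["http://localhost:8086/test1/", "/test3/", "/info/test3/", "test1/page.html"]:
    print(link, "->", rewrite_link(link))
```

In this model a link that already starts with /info is caught by the no-op rule and is not prefixed a second time, which is one way to picture the role of the last directive.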

Summary

Apache’s basic reverse proxy is ideal for sites that use only relative links. However, it was never intended to proxy sites containing absolute links or web-root relative links. Proxying these sites requires a third-party module, mod_proxy_html, which rewrites the links within a page before the page is served.

References: Running a Reverse Proxy in Apache

  Ric