When you use a proxy server together with Clean URLs, the URL the visitor enters in their browser is modified twice before it reaches Drupal.
The URL is modified once by the proxy server, and once by the local mod_rewrite directives:
You enter /XYZ
The proxy server changes that to /group/foo/cgi-bin/drupal/XYZ
The mod_rewrite directives in .htaccess change that to /group/foo/cgi-bin/drupal/index.php?q=XYZ
Drupal doesn't know the first change took place, because it happened on a different server. When it asks its own Apache for the URL the user entered (REQUEST_URI), it gets the one that has already been modified by the proxy server: /group/foo/cgi-bin/drupal/XYZ, not /XYZ.
Drupal then uses REQUEST_URI as the value of the action attribute for forms like the one used to turn on Clean URLs.
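To make the two rewrites concrete, here is a minimal sketch that imitates them in plain PHP. The function names are made up for illustration; the real work is done by the proxy server and by mod_rewrite, not by PHP code like this.

```php
<?php
// Hypothetical stand-in for the proxy server: it prepends the install path.
function example_proxy_rewrite($path, $prefix = '/group/foo/cgi-bin/drupal') {
  return $prefix . $path;
}

// Hypothetical stand-in for the Clean URLs mod_rewrite rule in .htaccess:
// whatever follows the install path is handed to index.php as q.
function example_mod_rewrite($uri, $prefix = '/group/foo/cgi-bin/drupal') {
  $q = ltrim(substr($uri, strlen($prefix)), '/');
  return $prefix . '/index.php?q=' . $q;
}

$entered     = '/XYZ';                             // what the visitor typed
$request_uri = example_proxy_rewrite($entered);    // what REQUEST_URI holds
$internal    = example_mod_rewrite($request_uri);  // what index.php is called with
```

The point is that $request_uri already carries the proxy's prefix, and that is the value Drupal sees.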
So, the chain of events unfolds this way:
You create the mod_rewrite directives. You can now go to: http://techcommons.stanford.edu/admin/settings/clean-urls and it looks like you can turn on Clean URLs using the form, but when you hit update, you get a 404 error. If you look at the source of the form you can see the form is set to submit to:
/group/extech/cgi-bin/drupal/admin/settings/clean-urls
instead of
/admin/settings/clean-urls
When that request goes through the proxy server, the proxy prepends its path again, so "/group/extech/cgi-bin/drupal" appears twice in the URL. Nothing exists at that doubled path, hence the 404.
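The doubling can be sketched in a few lines of PHP. The variable names are illustrative only, and the string concatenation merely imitates what the proxy does to the submitted form.

```php
<?php
// The prefix the proxy adds to every incoming request.
$prefix = '/group/extech/cgi-bin/drupal';

// Drupal builds the form action from REQUEST_URI, which already has the prefix.
$form_action = $prefix . '/admin/settings/clean-urls';

// Submitting the form sends that path back through the proxy,
// which prepends the prefix a second time.
$after_proxy = $prefix . $form_action;
```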
The fix is pretty simple: remove the extra path information that arrives from the proxy server. Unfortunately, you have to specify exactly what to strip out. In includes/bootstrap.inc:
/**
 * Since $_SERVER['REQUEST_URI'] is only available on Apache, we
 * generate an equivalent using other environment variables.
 */
function request_uri() {
  if (isset($_SERVER['REQUEST_URI'])) {
    $uri = $_SERVER['REQUEST_URI'];
    // mrmarco - removing the extra path information found in REQUEST_URI
    $uri = preg_replace('/\/group\/extech\/cgi-bin\/drupal/', '', $uri);
  }
  else {
    if (isset($_SERVER['argv'])) {
      $uri = $_SERVER['SCRIPT_NAME'] .'?'. $_SERVER['argv'][0];
    }
    else {
      $uri = $_SERVER['SCRIPT_NAME'] .'?'. $_SERVER['QUERY_STRING'];
    }
  }
  return $uri;
}
The new lines are:
// mrmarco - removing the extra path information found in REQUEST_URI
$uri = preg_replace('/\/group\/extech\/cgi-bin\/drupal/', '', $uri);
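One thing to be aware of: the pattern above is not anchored, so it removes the path wherever it occurs in the URI, not only at the front. A slightly more cautious variant might strip the prefix only when the URI starts with it. The helper below is a sketch of that idea; example_strip_proxy_prefix() is a made-up name, not a Drupal function, and I have not tested it inside bootstrap.inc.

```php
<?php
// Hypothetical helper (not part of Drupal): remove a known proxy prefix
// only when it appears at the very start of the URI.
function example_strip_proxy_prefix($uri, $prefix) {
  if ($prefix !== '' && strpos($uri, $prefix) === 0) {
    // The cast guards against older PHP versions returning FALSE
    // when nothing is left after the prefix.
    $rest = (string) substr($uri, strlen($prefix));
    return $rest === '' ? '/' : $rest;
  }
  // No leading prefix: leave the URI untouched.
  return $uri;
}
```

Inside request_uri() it would be called as $uri = example_strip_proxy_prefix($uri, '/group/extech/cgi-bin/drupal'); in place of the preg_replace() line.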
The extra path information appears on Drupal installations using a proxy server even when Clean URLs are not involved.
Forms would still post to:
http://techcommons.stanford.edu/group/extech/cgi-bin/drupal/?q=admin/set...
rather than
http://techcommons.stanford.edu/?q=admin/settings/clean-urls
But Drupal doesn't seem to care. I think that's because it can still tell where the query part of the URL begins (thanks to the ?).
In fact, you could have complete junk in front of the ? and Drupal would still work:
http://techcommons.stanford.edu/qwertyqwerty/?q=drupal
and
https://www.stanford.edu/group/ic/cgi-bin/drupal/thisisignoredstuff?q=faq
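Here is a small sketch of why the junk doesn't matter, assuming Drupal takes the page path from the q query parameter. example_query_path() is a made-up helper that imitates how PHP fills in $_GET['q']; it is not Drupal code.

```php
<?php
// Hypothetical helper: extract the q parameter from a URL,
// ignoring whatever path precedes the "?".
function example_query_path($url) {
  $query = parse_url($url, PHP_URL_QUERY);
  if (!is_string($query)) {
    return NULL;  // no query string, or a URL that could not be parsed
  }
  parse_str($query, $params);
  return isset($params['q']) ? $params['q'] : NULL;
}
```

Run against the two example URLs above, this picks out the q value and never looks at the path in front of the ?.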
Cheers,
- marco