Web Content Compression FAQ
Compression of outgoing traffic from web servers is beneficial for clients who get quicker responses, as well as for providers who experience less consumption of bandwidth.
Recently content compression for web servers has been provided mainly through use of the gzip format.
Other (non-Perl) modules are available that provide
so-called deflate compression.
Both approaches are currently very similar and use the LZ77 algorithm
combined with Huffman coding.
Luckily for us, there is no real need to understand all the details
of the obscure underlying mathematics in order to compress
outbound content.
Apache handlers available from CPAN can usually do the dirty work for us.
Content compression is addressed through
the proper configuration of appropriate handlers in the httpd.conf file.
Compression by its nature is a content filter:
it takes plain ASCII data as input, converts it to a binary
form, and outputs the result to some destination.
That's why every content compression handler usually belongs
to a particular chain of handlers within the content generation phase
of the request-processing flow.
A chain of handlers is another common term worth knowing about
when you plan to compress data.
Two such chains have recently been developed for Apache 1.3.X:
Apache::OutputChain and Apache::Filter.
Keep in mind that a compression handler developed for one chain
usually fails inside the other.
Another important point deals with the order of execution of handlers
in a particular chain.
It's pretty straightforward in Apache::Filter.
For example, when you configure
  PerlModule Apache::Filter
  <Files ~ "*\.blah">
      SetHandler perl-script
      PerlSetVar Filter On
      PerlHandler Filter1 Filter2 Filter3
  </Files>
the content will go through Filter1 first,
then the result will be filtered by Filter2,
and finally Filter3 will be invoked to make the final changes
in the outgoing data.
However, when you configure
  PerlModule Apache::OutputChain
  PerlModule Apache::GzipChain
  PerlModule Apache::SSIChain
  PerlModule Apache::PassHtml
  <Files *.html>
      SetHandler perl-script
      PerlHandler Apache::OutputChain Apache::GzipChain Apache::SSIChain Apache::PassHtml
  </Files>
execution begins with Apache::PassHtml.
Then the content will be processed with Apache::SSIChain
and finally with Apache::GzipChain.
Apache::OutputChain
will not be involved in content processing at all.
It is there only for the purpose of joining other handlers within the chain.
It is important to remember that the content compression handler should always be the last executable handler in any chain.
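For orientation, a handler that participates in an Apache::Filter chain registers itself and then reads the upstream handler's output through a filehandle; compression handlers such as Apache::Dynagzip plug into the chain in the same way. The following is only a minimal sketch (the package name My::UpperCaseFilter is made up for illustration):

  package My::UpperCaseFilter;
  use strict;
  use Apache::Constants qw(OK);

  sub handler {
      # Join the Apache::Filter chain and obtain a filehandle
      # that delivers the output of the previous handler.
      my $r = shift->filter_register;
      my ($fh, $status) = $r->filter_input();
      return $status unless $status == OK;

      # Transform the upstream content; whatever we print becomes the
      # input of the next handler in the chain (or the response body
      # if this filter happens to be the last one).
      print uc while <$fh>;

      return OK;
  }
  1;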
Another important practical problem of web content compression deals with the fact
that some buggy web clients declare in their HTTP requests the ability to receive
and decompress gzipped data,
but fail to keep that promise when an actual compressed response arrives.
This problem is addressed through the implementation of
the Apache::CompressClientFixup
handler.
This handler serves the fixup
phase of the request-processing flow.
It is compatible with all known compression handlers and is available from CPAN.
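As a hedged illustration only, registering it for a whole site could look something like this in httpd.conf (the <Location /> scope is just one possible choice; the complete configurations later in this document show it combined with the compression handlers):

  PerlModule Apache::CompressClientFixup
  <Location />
      PerlFixupHandler Apache::CompressClientFixup
  </Location>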
Web content compression noticeably increases delivery speed to clients and may allow providers to serve higher content volumes without increasing hardware expenditures. It visibly reduces actual content download time, a benefit most apparent to users of dialup and high-traffic connections.
The compression ratio is highly content-dependent. For example, if the compression algorithm is able to detect repeated patterns of characters, compression will be greater than if no such patterns exist. You can usually expect to realize an improvement of between 3 and 20 times on regular HTML, JavaScript, and other ASCII content. I have seen peak HTML file compression improvements in excess of 200 times, but such occurrences are infrequent. On the other hand, I have never seen ratios of less than 2.5 times on text/HTML files. Image files normally employ their own compression techniques that reduce the advantage of further compression.
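If you want a quick estimate of the ratio for your own pages, you can compare raw and gzipped sizes offline. This is just a sketch using Compress::Zlib; the file name index.html is an arbitrary example:

  #!/usr/bin/perl
  use strict;
  use Compress::Zlib;

  # Read a local file and report how well it gzips.
  my $file = shift || 'index.html';
  open my $fh, '<', $file or die "Cannot open $file: $!";
  binmode $fh;
  my $content = do { local $/; <$fh> };
  close $fh;

  my $gzipped = Compress::Zlib::memGzip($content);
  printf "raw: %d bytes, gzipped: %d bytes, ratio: %.1f : 1\n",
      length($content), length($gzipped),
      length($content) / length($gzipped);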
On May 21, 2002 Peter J. Cranstone wrote to the mod_gzip@lists.over.net mailing list:
"...With 98% of the world on a dial up modem, all they care about is how long it takes to download a page. It doesn't matter if it consumes a few more CPU cycles if the customer is happy. It's cheaper to buy a newer faster box, than it is to acquire new customers."
This approach works in most of the cases I have seen. In some special cases you will need to take extra care with respect to the global architecture of your web application, but such cases may generally be readily addressed through various techniques. To date I have found no fundamental barriers to practical implementation of web content compression.
All modern browser makers claim to be able to handle compressed content and are able to decompress it on the fly, transparent to the user. There are some known bugs in some old browsers, but these can be taken into account through appropriate configuration of the web server.
I strongly recommend use of the Apache::CompressClientFixup
handler
in your server configuration in order to prevent compression
for known buggy clients.
Apache::Compress: a mod_perl handler developed by Ken Williams (U.S.).
Apache::Compress is capable of gzipping
output through Apache::Filter.
This module accumulates all incoming data and then compresses
the whole content body at once.
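A typical configuration, compressing the output of Apache::Registry scripts through the Apache::Filter chain, might look roughly like the following sketch (check the Apache::Compress POD for the authoritative version):

  PerlModule Apache::Filter
  PerlModule Apache::Compress
  <FilesMatch "\.pl$">
      SetHandler perl-script
      PerlSetVar Filter On
      PerlHandler Apache::RegistryFilter Apache::Compress
  </FilesMatch>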
Apache::Dynagzip: a mod_perl handler developed by Slava Bizyayev, a
Russian programmer residing in the U.S.
Apache::Dynagzip uses the gzip format to compress
output through the Apache::Filter chain or through an internal
Unix pipe.
Apache::Dynagzip
is most useful when one needs to compress dynamic
outbound web content (generated on the fly from databases, XML, etc.)
when content length is not known at the time of the request.
Apache::Dynagzip's features include:
- control over the Vary HTTP header;
- control over the Expires HTTP header;
- optional "light compression": removal of leading blank spaces and/or blank lines, which works for all browsers, including older ones that cannot uncompress the gzip format.
Apache::Gzip: an example of a mod_perl filter developed by Lincoln Stein and Doug
MacEachern (U.S.) for their book Writing Apache Modules with Perl and C,
which like Apache::Compress
works through Apache::Filter.
Apache::Gzip
is not available from CPAN.
The source code may be found on the book's companion web site at
http://www.modperl.com/
Apache::GzipChain: a mod_perl handler developed by Andreas Koenig (Germany), which
compresses output through Apache::OutputChain
using the gzip format.
Apache::GzipChain
currently provides in-memory compression only.
Using this module under perl-5.8
or higher is appropriate for Unicode data.
UTF-8 data passed to Compress::Zlib::memGzip()
are converted to raw
UTF-8 before compression takes place.
Other data are simply passed through.
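If you ever feed Perl strings that carry the UTF-8 flag to the compressor by hand, it is safest to downgrade them to raw octets first, which is essentially what Apache::GzipChain does for you. A small illustrative sketch (the sample string is made up):

  use Compress::Zlib;
  use Encode qw(encode_utf8);

  my $text   = "price: 100 \x{20ac}";   # contains a wide (Unicode) character
  my $octets = encode_utf8($text);      # raw UTF-8 bytes, safe to compress
  my $gz     = Compress::Zlib::memGzip($octets);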
If your page/application is initially configured like
  <Directory /path/to/subdirectory>
      SetHandler perl-script
      PerlHandler Apache::Registry
      PerlSendHeader On
      Options +ExecCGI
  </Directory>
you might want just to replace it with the following:
  PerlModule Apache::Filter
  PerlModule Apache::Dynagzip
  PerlModule Apache::CompressClientFixup
  <Directory /path/to/subdirectory>
      SetHandler perl-script
      PerlHandler Apache::RegistryFilter Apache::Dynagzip
      PerlSendHeader On
      Options +ExecCGI
      PerlSetVar Filter On
      PerlFixupHandler Apache::CompressClientFixup
      PerlSetVar LightCompression On
  </Directory>
That should usually be all you need.
In the more general case, you need to replace the line
PerlHandler Apache::Registry
in your initial configuration file with the following set of lines:
  PerlHandler Apache::RegistryFilter Apache::Dynagzip
  PerlSetVar Filter On
  PerlFixupHandler Apache::CompressClientFixup
Optionally, you might want to add
  PerlSetVar LightCompression On
to reduce the size of the stream even for clients that cannot handle gzip (like Microsoft Internet Explorer over HTTP/1.0).
Finally, make sure you have declared somewhere:
  PerlModule Apache::Filter
  PerlModule Apache::Dynagzip
  PerlModule Apache::CompressClientFixup
This basic configuration uses many defaults.
See the Apache::Dynagzip POD for further fine tuning if required.
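Note that the Apache::Registry scripts themselves do not need any changes; a plain script like the following sketch will have its output compressed by the chain above:

  #!/usr/bin/perl
  # An ordinary Apache::Registry-style script; Apache::RegistryFilter runs it
  # and hands its output to Apache::Dynagzip for compression.
  use strict;

  print "Content-type: text/html\n\n";
  print "<html><body>\n";
  print "<p>line $_ of some highly compressible output</p>\n" for 1 .. 100;
  print "</body></html>\n";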
HTML::Mason::ApacheHandler is compatible with
the Apache::Filter chain.
If your application is initially configured like
  PerlModule HTML::Mason::ApacheHandler
  <Directory /path/to/subdirectory>
      <FilesMatch "\.html$">
          SetHandler perl-script
          PerlHandler HTML::Mason::ApacheHandler
      </FilesMatch>
  </Directory>
you might want just to replace it with the following:
  PerlModule HTML::Mason::ApacheHandler
  PerlModule Apache::Dynagzip
  PerlModule Apache::CompressClientFixup
  <Directory /path/to/subdirectory>
      <FilesMatch "\.html$">
          SetHandler perl-script
          PerlHandler HTML::Mason::ApacheHandler Apache::Dynagzip
          PerlSetVar Filter On
          PerlFixupHandler Apache::CompressClientFixup
          PerlSetVar LightCompression On
      </FilesMatch>
  </Directory>
That should usually be all you need.
In the more general case, you need to replace the line
PerlHandler HTML::Mason::ApacheHandler
in your initial configuration file with the following set of lines:
  PerlHandler HTML::Mason::ApacheHandler Apache::Dynagzip
  PerlSetVar Filter On
  PerlFixupHandler Apache::CompressClientFixup
Optionally, you might want to add
  PerlSetVar LightCompression On
to reduce the size of the stream even for clients that cannot handle gzip (like Microsoft Internet Explorer over HTTP/1.0).
Finally, make sure you have declared somewhere:
  PerlModule Apache::Dynagzip
  PerlModule Apache::CompressClientFixup
This basic configuration uses many defaults.
See the Apache::Dynagzip POD for further fine tuning.
Apache::Dynagzip
is the only handler to date
that begins transmission of compressed data as soon
as the initial uncompressed pieces of data arrive
from their source, at a time when the source process
may not even have completed generating the full document
it is sending.
Transmission can therefore take place concurrently
with the creation of later document content.
This feature is mainly beneficial for HTTP/1.1 requests, because HTTP/1.0 does not support chunks.
I would also mention
that the internal buffer in Apache::Dynagzip
prevents Apache from creating chunks that are too short over HTTP/1.1,
or from transmitting pieces of data that are too short over HTTP/1.0.
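To check from the client side that responses really go out compressed, you can request a page with an Accept-Encoding header and inspect the response; a minimal sketch using LWP (the URL is hypothetical):

  use strict;
  use LWP::UserAgent;

  my $ua   = LWP::UserAgent->new;
  # Advertise gzip support the way a browser would.
  my $resp = $ua->get('http://www.example.com/index.html',
                      'Accept-Encoding' => 'gzip');

  print "Status:           ", $resp->status_line, "\n";
  print "Content-Encoding: ", $resp->header('Content-Encoding') || 'none', "\n";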
mod_deflate: an Apache handler written in C by Igor Sysoev (Russia).
mod_gzip: an Apache handler written in C. Original author: Kevin Kiley, Remote Communications, Inc. (U.S.)
Both of these modules support HTTP/1.0 only.
For example, vanilla Apache 1.3.X is CGI/1.1 compatible. It allows the development of CGI scripts/programs that generate compressed outgoing streams accompanied by the appropriate HTTP headers.
Alternatively, on mod_perl-enabled Apache some application environments carry their own compression code that can be activated through the appropriate configuration:
Apache::ASP does this with the CompressGzip setting;
Apache::AxKit uses the AxGzipOutput setting to do this.
See the particular package documentation for details.
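As far as I recall their documentation, these settings are enabled roughly like this (treat the fragment as a sketch and verify against each package's docs):

  # Apache::ASP
  PerlSetVar CompressGzip 1

  # AxKit
  AxGzipOutput On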
mod_deflate has recently become available for Apache-2.
mod_deflate for Apache-2 is written by Ian Holsman (USA).
This module supports HTTP/1.1 and is compatible with Apache-2 filters.
Despite its name, mod_deflate for Apache-2 provides gzip-encoded content.
It contains a set of configuration options sufficient to keep control
over all recently known buggy web clients.
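For comparison with the mod_perl configurations above, a typical Apache-2 setup uses the DEFLATE output filter together with BrowserMatch rules for the known buggy clients; the following sketch is adapted from the standard mod_deflate documentation:

  LoadModule deflate_module modules/mod_deflate.so

  # Compress common text formats only.
  AddOutputFilterByType DEFLATE text/html text/plain text/css

  # Work around known buggy clients.
  BrowserMatch ^Mozilla/4         gzip-only-text/html
  BrowserMatch ^Mozilla/4\.0[678] no-gzip
  BrowserMatch \bMSIE             !no-gzip !gzip-only-text/html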
Regarding porting Apache::Dynagzip to Apache-2:
mod_deflate for Apache-2 seems to be capable of providing all the basic functionality
required for dynamic content compression.
The rest of the main Apache::Dynagzip options could easily be addressed
through the implementation of small, specific complementary filters.
The gzip format is published as RFC 1952,
and the deflate format is published as RFC 1951.
You can find many mirrors of the RFC archives on the Internet. Try, for instance, my favorite at http://www.ietf.org/rfc.html
You can use Apache::CompressClientFixup
to disable compression
for these files dynamically.
Apache::Dynagzip
is capable of providing
so-called light compression
for these files.
Try it at http://www.schroepl.net/projekte/mod_gzip/browser.htm
I highly appreciate the efforts of Dan Hansen in making the English of this text better.
The maintainer is the person you should contact with updates, corrections and patches.
Maintainer: Slava Bizyayev <slava (at) cpan.org>
Author: Slava Bizyayev <slava (at) cpan.org>
Only the major authors are listed above. For contributors see the Changes file.