

Web Content Compression FAQ

Basics of Content Compression

Compression of outgoing traffic from web servers benefits clients, who get quicker responses, as well as providers, who consume less bandwidth.

To date, content compression for web servers has been provided mainly through use of the gzip format. Other (non-Perl) modules are available that provide so-called deflate compression. The two approaches are very similar: both use the LZ77 algorithm combined with Huffman coding. Luckily for us, there is no real need to understand the details of the underlying mathematics in order to compress outbound content. Apache handlers available from CPAN can usually do the dirty work for us; content compression is addressed through the proper configuration of appropriate handlers in the httpd.conf file.

Compression is by its nature a content filter: it takes plain text data as input, converts it to a compressed binary form, and outputs the result to some destination. That is why every content compression handler usually belongs to a particular chain of handlers within the content generation phase of the request-processing flow.

A chain of handlers is another common term that is good to know about when you plan to compress data. Two chaining mechanisms have been developed for Apache 1.3.X: Apache::OutputChain and Apache::Filter. Keep in mind that a compression handler developed for one chain usually fails inside the other.

Another important point deals with the order of execution of handlers in a particular chain. It's pretty straightforward in Apache::Filter. For example, when you configure

  PerlModule Apache::Filter
  <Files ~ "\.blah$">
    SetHandler perl-script
    PerlSetVar Filter On
    PerlHandler Filter1 Filter2 Filter3
  </Files>

the content will go through Filter1 first, then the result will be filtered by Filter2, and finally Filter3 will be invoked to make the final changes to the outgoing data.

However, when you configure

  PerlModule Apache::OutputChain
  PerlModule Apache::GzipChain
  PerlModule Apache::SSIChain
  PerlModule Apache::PassHtml
  <Files *.html>
    SetHandler perl-script
    PerlHandler Apache::OutputChain Apache::GzipChain Apache::SSIChain Apache::PassHtml
  </Files>

execution begins with Apache::PassHtml. The content is then processed with Apache::SSIChain and finally with Apache::GzipChain. Apache::OutputChain is not involved in content processing at all; it is there only to join the other handlers within the chain.

It is important to remember that the content compression handler should always be the last executable handler in any chain.

Another important problem of the practical implementation of web content compression deals with the fact that some buggy web clients declare in their HTTP requests the ability to receive and decompress gzipped data, but fail to keep that promise when an actual compressed response arrives. This problem is addressed by the Apache::CompressClientFixup handler, which runs during the fixup phase of the request-processing flow. It is compatible with all known compression handlers and is available from CPAN.




Q: Why is it important to compress web content?




A: Reduced equipment costs and the competitive advantage of dramatically faster page loads.

Web content compression noticeably increases delivery speed to clients and may allow providers to serve higher content volumes without increasing hardware expenditures. It visibly reduces actual content download time, a benefit most apparent to users of dialup and high-traffic connections.




Q: How much improvement can I expect?




A: Effective compression can achieve increases in transmission efficiency from 3 to 20 times.

The compression ratio is highly content-dependent. For example, if the compression algorithm is able to detect repeated patterns of characters, compression will be greater than if no such patterns exist. You can usually expect an improvement of 3 to 20 times on regular HTML, JavaScript, and other ASCII content. I have seen peak HTML file compression in excess of 200 times, but such occurrences are infrequent. On the other hand, I have never seen a ratio of less than 2.5 times on text/HTML files. Image files normally employ their own compression techniques that reduce the benefit of further compression.
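If you want to estimate the ratio for your own content before deploying anything, you can gzip a representative file offline. Here is a minimal sketch using Compress::Zlib (the file name is whatever you pass on the command line):

  #!/usr/bin/perl
  # Estimate the gzip compression ratio of a single file (offline sketch).
  use strict;
  use warnings;
  use Compress::Zlib ();

  my $file = shift or die "usage: $0 file.html\n";
  open my $fh, '<', $file or die "cannot open $file: $!\n";
  binmode $fh;
  my $data = do { local $/; <$fh> };
  close $fh;

  my $gzipped = Compress::Zlib::memGzip($data);
  printf "original: %d bytes, gzipped: %d bytes, ratio: %.1f times\n",
      length($data), length($gzipped), length($data) / length($gzipped);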

On May 21, 2002 Peter J. Cranstone wrote to the mod_gzip@lists.over.net mailing list:

"...With 98% of the world on a dial up modem, all they care about is how long it takes to download a page. It doesn't matter if it consumes a few more CPU cycles if the customer is happy. It's cheaper to buy a newer faster box, than it is to acquire new customers."




Q: How hard is it to implement content compression on an existing site?




A: Implementing content compression on an existing site typically involves no more than installing and configuring an appropriate Apache handler on the web server.

This approach works in most of the cases I have seen. In some special cases you will need to take extra care with the global architecture of your web application, but even those cases can generally be addressed with well-known techniques. To date I have found no fundamental barriers to practical implementation of web content compression.




Q: Does compression work with standard web browsers?




A: Yes. No client-side changes or settings are required.

All modern browsers claim to be able to handle compressed content and decompress it on the fly, transparently to the user. There are known bugs in some older browsers, but these can be taken into account through appropriate configuration of the web server.

I strongly recommend use of the Apache::CompressClientFixup handler in your server configuration in order to prevent compression for known buggy clients.




Q: What software is required on the server side?




A: There are four known mod_perl modules/packages for web content compression available to date for Apache 1.3.X (in alphabetical order):

Apache::Compress

Apache::Dynagzip

Apache::Gzip

Apache::GzipChain




Q: Is it possible to compress the output from Apache::Registry with Apache::Dynagzip?




A: Yes, and it is usually quite easy:

If your page/application is initially configured like this:

  <Directory /path/to/subdirectory>
    SetHandler perl-script
    PerlHandler Apache::Registry
    PerlSendHeader On
    Options +ExecCGI
  </Directory>

you might simply replace it with the following:

  PerlModule Apache::Filter
  PerlModule Apache::Dynagzip
  PerlModule Apache::CompressClientFixup
  <Directory /path/to/subdirectory>
    SetHandler perl-script
    PerlHandler Apache::RegistryFilter Apache::Dynagzip
    PerlSendHeader On
    Options +ExecCGI
    PerlSetVar Filter On
    PerlFixupHandler Apache::CompressClientFixup
    PerlSetVar LightCompression On
  </Directory>

That should usually be all you need.

More generally, you need only replace the line

    PerlHandler Apache::Registry

in your initial configuration file with the following set of lines:

    PerlHandler Apache::RegistryFilter Apache::Dynagzip
    PerlSetVar Filter On
    PerlFixupHandler Apache::CompressClientFixup

Optionally, you might add

    PerlSetVar LightCompression On

to reduce the size of the stream even for clients that are unable to accept gzip (like Microsoft Internet Explorer over HTTP/1.0).

Finally, make sure you have declared somewhere:

  PerlModule Apache::Filter
  PerlModule Apache::Dynagzip
  PerlModule Apache::CompressClientFixup

This basic configuration relies on many defaults. See the Apache::Dynagzip POD for further fine-tuning if required.




Q: Is it possible to compress the output from a Mason-driven application with Apache::Dynagzip?




A: Yes. HTML::Mason::ApacheHandler is compatible with the Apache::Filter chain.

If your application is initially configured like this:

  PerlModule HTML::Mason::ApacheHandler
  <Directory /path/to/subdirectory>
    <FilesMatch "\.html$">
      SetHandler perl-script
      PerlHandler HTML::Mason::ApacheHandler
    </FilesMatch>
  </Directory>

you might simply replace it with the following:

  PerlModule HTML::Mason::ApacheHandler
  PerlModule Apache::Dynagzip
  PerlModule Apache::CompressClientFixup
  <Directory /path/to/subdirectory>
    <FilesMatch "\.html$">
      SetHandler perl-script
      PerlHandler HTML::Mason::ApacheHandler Apache::Dynagzip
      PerlSetVar Filter On
      PerlFixupHandler Apache::CompressClientFixup
      PerlSetVar LightCompression On
    </FilesMatch>
  </Directory>

After that you should be all set.

More generally, you need only replace the line

    PerlHandler HTML::Mason::ApacheHandler

in your initial configuration file with the following set of lines:

    PerlHandler HTML::Mason::ApacheHandler Apache::Dynagzip
    PerlSetVar Filter On
    PerlFixupHandler Apache::CompressClientFixup

Optionally, you might add

    PerlSetVar LightCompression On

to reduce the size of the stream even for clients that are unable to accept gzip (like Microsoft Internet Explorer over HTTP/1.0).

Finally, make sure you have declared somewhere:

  PerlModule Apache::Dynagzip
  PerlModule Apache::CompressClientFixup

This basic configuration relies on many defaults. See the Apache::Dynagzip POD for further fine-tuning.




Q: Why is it important to keep control over chunk size?




A: It helps to reduce the latency of the response.

Apache::Dynagzip is the only handler to date that begins transmission of compressed data as soon as the initial uncompressed pieces arrive from their source, at a time when the source process may not even have finished generating the full document. Transmission can therefore take place concurrently with the creation of later document content.

This feature is mainly beneficial for HTTP/1.1 requests, because HTTP/1.0 does not support chunks.

I would also mention that the internal buffer in Apache::Dynagzip prevents Apache from creating chunks that are too small over HTTP/1.1, and from transmitting pieces of data that are too small over HTTP/1.0.
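The streaming principle can be illustrated with the incremental deflate interface of Compress::Zlib. This is only a sketch of the idea, not Apache::Dynagzip's actual code; next_piece_of_content() and send_chunk() are hypothetical stand-ins for the content source and the client connection:

  use strict;
  use warnings;
  use Compress::Zlib;

  # Create an incremental compressor (zlib/deflate format by default;
  # Apache::Dynagzip itself produces gzip, but the principle is the same).
  my ($d, $status) = deflateInit();
  die "deflateInit failed: $status" unless $status == Z_OK;

  # Compress and ship each piece as soon as it becomes available,
  # instead of waiting for the whole document to be generated.
  while (defined(my $piece = next_piece_of_content())) {  # hypothetical source
      my ($out, $st) = $d->deflate($piece);
      send_chunk($out) if defined $out && length $out;    # hypothetical sender
  }

  # Flush whatever is still buffered inside the compressor.
  my ($tail, $st2) = $d->flush();
  send_chunk($tail) if defined $tail && length $tail;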




Q: Are there any content compression solutions for vanilla Apache 1.3.X?




A: Yes. There are two compression modules written in C that are available for vanilla Apache 1.3.X:

mod_gzip, written by Kevin Kiley of Remote Communications, Inc.

mod_deflate for Apache 1.3.X, written by Igor Sysoev

Both of these modules support HTTP/1.0 only.
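For illustration, a minimal mod_gzip configuration might look like the following (a sketch only; consult the module's documentation for the complete directive set):

  # httpd.conf, after mod_gzip has been loaded
  mod_gzip_on Yes
  # compress HTML documents only; images are already compressed
  mod_gzip_item_include mime ^text/html$
  mod_gzip_item_include file \.html$
  # skip very small files where the gzip overhead is not worth it
  mod_gzip_minimum_file_size 500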




Q: Can I compress the output of my site at the application level?




A: Yes, if your web server is CGI/1.1 compatible and allows you to create specific HTTP headers from your application, or if you use an application framework that carries its own handler capable of compressing outbound data.

For example, vanilla Apache 1.3.X is CGI/1.1 compatible. It allows CGI scripts/programs to generate compressed outgoing streams accompanied by the appropriate HTTP headers.
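For instance, a plain CGI script can inspect the client's Accept-Encoding request header and gzip its own output. A minimal sketch using Compress::Zlib (the page content is just a placeholder):

  #!/usr/bin/perl
  # CGI/1.1 sketch: send gzip only when the client declares support for it.
  use strict;
  use warnings;
  use Compress::Zlib ();

  my $html = "<html><body><h1>Hello, compressed world!</h1></body></html>\n";

  my $accept = $ENV{HTTP_ACCEPT_ENCODING} || '';
  if ($accept =~ /\bgzip\b/) {
      my $gzipped = Compress::Zlib::memGzip($html);
      print "Content-Type: text/html\r\n";
      print "Content-Encoding: gzip\r\n";
      print "Content-Length: ", length($gzipped), "\r\n\r\n";
      binmode STDOUT;
      print $gzipped;
  }
  else {
      print "Content-Type: text/html\r\n\r\n";
      print $html;
  }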

Alternatively, on a mod_perl-enabled Apache, some application environments carry their own compression code that can be activated through the appropriate configuration:

Apache::ASP does this with the CompressGzip setting;

Apache::AxKit uses the AxGzipOutput setting to do this.

See the particular package's documentation for details.
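For example (sketches based on each package's documented setting):

  # Apache::ASP
  PerlSetVar CompressGzip 1

  # Apache::AxKit
  AxGzipOutput On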




Q: Are there any content compression solutions for Apache-2?




A: Yes, a core compression module written in C, mod_deflate, has recently become available for Apache-2.

mod_deflate for Apache-2 is written by Ian Holsman (USA).

This module supports HTTP/1.1 and is compatible with the Apache-2 filter architecture.

Despite its name, mod_deflate for Apache-2 provides gzip-encoded content. It contains a set of configuration options sufficient to keep control over all known buggy web clients.
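A minimal mod_deflate setup for Apache-2 might look like this (a sketch including the widely used workarounds for buggy clients; see the module's documentation for the complete directive set):

  LoadModule deflate_module modules/mod_deflate.so

  # Compress common text formats only; images carry their own compression.
  AddOutputFilterByType DEFLATE text/html text/plain text/css

  # Netscape 4.x has problems with compressed CSS and JavaScript
  BrowserMatch ^Mozilla/4 gzip-only-text/html
  # Netscape 4.06-4.08 have even more problems
  BrowserMatch ^Mozilla/4\.0[678] no-gzip
  # MSIE masquerades as Netscape, but is actually fine
  BrowserMatch \bMSIE !no-gzip !gzip-only-text/html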




Q: When will Apache::Dynagzip be ported to Apache-2?




A: There are no current plans to port Apache::Dynagzip to Apache-2.

mod_deflate for Apache-2 seems capable of providing all the basic functionality required for dynamic content compression.

The rest of the main Apache::Dynagzip options could easily be addressed by implementing small, purpose-specific companion filters.




Q: Where can I read the original descriptions of gzip and deflate formats?




A: The gzip format is published as RFC 1952, and the deflate format is published as RFC 1951.

You can find many mirrors of RFC archives on the Internet. Try, for instance, my favorite at http://www.ietf.org/rfc.html




Q: Are there any known compression problems with specific browsers?




A: Yes, Netscape 4 has problems with compressed cascading style sheets and JavaScript files.

You can use Apache::CompressClientFixup to disable compression for these files dynamically, as sketched below. Apache::Dynagzip is capable of providing so-called light compression for these files.
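For example, a minimal sketch that attaches the fixup handler to stylesheet and JavaScript files (the FilesMatch pattern is an assumption about where such files live on your server):

  PerlModule Apache::CompressClientFixup
  <FilesMatch "\.(css|js)$">
    PerlFixupHandler Apache::CompressClientFixup
  </FilesMatch>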




Q: Where can I find more information about the compression features of modern browsers?




A: Michael Schroepl maintains a highly valuable site.

Try it at http://www.schroepl.net/projekte/mod_gzip/browser.htm




Acknowledgments

I highly appreciate the efforts of Dan Hansen to improve the English of this text.




Maintainers

The maintainer is the person you should contact with updates, corrections and patches.




Authors

Only the major authors are listed above. For contributors see the Changes file.






