Apache 2.2.0: use or wait?

Apache 2.2.0 is a major release of the Apache HTTP Server and contains many important changes. Many of them are improvements to existing modules; several new modules have also been added, and the server architecture has been improved in places. This article covers the most important changes and considers whether to upgrade to the new version now or wait.

Version 2.2.0 is not just a refresh of the existing source tree. A lot of new code has been added, both to provide new functionality and to extend what was already there.

configuration changes

Apache users have always had an uneasy love-hate relationship with its configuration file. Some like the single all-in-one-file approach. Others prefer to split the configuration into several files and pull them in with the Include mechanism. Splitting does not change the configuration itself, but multiple files are easier to read and maintain; for example, each virtual host can be kept in its own file.

Until now, Apache shipped with a single configuration file by default, and it often contained many directives that some users neither used nor understood. Some Linux distributions (for example, Gentoo) already split the configuration file by default. This has now become a standard feature of the official server distribution.

The main configuration file, httpd.conf, remains. In addition, standard configuration files covering the following areas can optionally be included:

– multi-processing module (MPM) configuration;

– multilingual error messages;

– directory index;

– language settings;

– user directories;

– server status and configuration information (/server-status and /server-info);

– configuration of virtual hosts;

– access to Apache documentation;

– WebDAV;

– various default settings;

– SSL configuration.

Splitting the file is optional: you can still use either a single file or several files without any problems. The standard configuration, however, is now split into several files by default.
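
As an illustration, the optional files are pulled into httpd.conf with Include directives. The sketch below assumes the default layout of the official distribution, in which the extra files live under conf/extra; the paths may differ on your system. A file is switched off by commenting out its Include line.

```apache
# Excerpt from httpd.conf (paths assume the default conf/extra layout)

# MPM configuration
Include conf/extra/httpd-mpm.conf

# Virtual hosts
Include conf/extra/httpd-vhosts.conf

# SSL configuration, left disabled in this sketch
#Include conf/extra/httpd-ssl.conf
```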

authentication/authorization modules

Authentication and authorization themselves have not changed, but the modules responsible for them have been redesigned and, in some cases, renamed to make it easier to choose the right components. A new module has also been added that provides authentication and authorization via LDAP.

The standard modules have been reorganized so that each module's name reflects the kind of authentication or authorization it provides. For example, the original mod_auth module was split into mod_auth_basic and mod_authn_file, the latter providing authentication against password files. The module prefix now shows the role the module plays in the authentication/authorization process:

– mod_auth_* indicates that the module provides an HTTP authentication mechanism (for example, mod_auth_basic and mod_auth_digest);

– mod_authn_* indicates that the module provides a back-end authentication mechanism (for example, mod_authn_file and mod_authn_dbm);

– mod_authz_* indicates that the module provides an authorization mechanism (for example, mod_authz_dbm and mod_authz_host);

– mod_authnz_* indicates that the module provides both authentication and authorization mechanisms (for example, the new mod_authnz_ldap module).

As a result, we have an intuitive set of modules with which you can fully manage authentication and authorization on your server. It also makes it easier to develop custom authentication and authorization modules that integrate with the existing components.
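
To show how the pieces fit together, here is a minimal sketch of password-file authentication built from the renamed modules. The directory and file paths are placeholders, and mod_authz_user (which handles "Require valid-user") is not mentioned above; it is included on the assumption that the modules are built as shared objects.

```apache
# Assumption: modules are built as DSOs under modules/
LoadModule auth_basic_module modules/mod_auth_basic.so
LoadModule authn_file_module modules/mod_authn_file.so
LoadModule authz_user_module modules/mod_authz_user.so

# Placeholder directory and password file
<Directory "/usr/local/apache2/htdocs/private">
    # mod_auth_basic: the HTTP authentication mechanism
    AuthType Basic
    AuthName "Restricted area"
    # mod_authn_file: the back end that checks the password
    AuthBasicProvider file
    AuthUserFile /usr/local/apache2/conf/passwords
    # mod_authz_user: the authorization rule
    Require valid-user
</Directory>
```

Switching to another back end, such as mod_authn_dbm, should only require changing the provider and its file directive while the rest of the block stays the same.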

proxy/cache

The new mod_proxy_balancer module adds load balancing to the main proxy module, mod_proxy. It distributes requests among back-end workers using one of two methods: request counting or traffic counting. The first method counts requests and shares them out so that each worker serves the same number of requests. The second works on the same principle but measures the volume of traffic, and each worker can be given its own share so that some workers handle more traffic than others.

Unlike the first method, the load here is measured in bytes rather than in requests. For example, you can configure one worker to handle twice as many bytes as the others; the number of requests it serves is not taken into account.
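
A minimal sketch of such a setup follows. The balancer name, the back-end host names and the /app path are made up for illustration, and the example assumes that mod_proxy, mod_proxy_http and mod_proxy_balancer are loaded.

```apache
# Hypothetical cluster of two back-end workers
<Proxy balancer://mycluster>
    # app1 is given twice the share of app2
    BalancerMember http://app1.example.com:8080 loadfactor=2
    BalancerMember http://app2.example.com:8080 loadfactor=1
</Proxy>

# lbmethod=byrequests counts requests; lbmethod=bytraffic counts bytes
ProxyPass /app balancer://mycluster lbmethod=bytraffic
```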

The new proxy load-balancing module also provides a status page similar to /server-status and /server-info.

The caching modules (mod_cache, mod_disk_cache and mod_mem_cache) are rarely examined in detail, and many organizations run them without seeing any benefit, even though they are designed to improve performance.

Version 2.2.0 also includes a new program, htcacheclean, which cleans up the store of cached documents. It can be run for a one-off cleanup or left running as a daemon. It also reports statistics on the size of the cache directory.
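
For reference, a disk cache is switched on with only a few directives. This is a sketch assuming that mod_cache and mod_disk_cache are loaded; the cache directory is a placeholder, and it is this directory that htcacheclean would be pointed at.

```apache
<IfModule mod_disk_cache.c>
    # Placeholder location for the cached documents
    CacheRoot /var/cache/apache2
    # Cache everything below / on disk
    CacheEnable disk /
    # Depth and name length of the cache subdirectories
    CacheDirLevels 2
    CacheDirLength 1
</IfModule>
```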

filtering

To allow filters to be applied conditionally, the filtering system has been substantially redesigned around the mod_filter module. This replaces the old model, in which documents were filtered unconditionally according to the AddOutputFilter or AddOutputFilterByType directive. Now, instead of attaching filters to specific file types, you declare a filter chain, and the output is processed by each filter in the chain in turn. You declare the filter, specify the conditions under which it applies (for example, the content type of the data), and then add it to the chain.

Here is an example taken from the standard documentation. It shows how the old way of enabling SSI has changed.

Before:

AddOutputFilter INCLUDES .shtml

After:

FilterDeclare SSI

FilterProvider SSI INCLUDES resp=Content-Type $text/html

FilterChain SSI

Declaring a filter chain lets you insert filters at specific points in the chain and even remove a filter under certain conditions. For example, suppose you want SSI applied to all content except the output of CGI scripts. You can add the SSI filter to the chain and then use a chain without SSI for requests that are handled by CGI scripts.
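
One cautious way to get this effect, sketched below, is to attach the chain only to the document tree, so that a CGI directory mapped elsewhere never receives the SSI filter. The directory path is a placeholder, and the sketch assumes the CGI scripts live outside it (for example, in a directory mapped with ScriptAlias). FilterChain also accepts a filter name prefixed with - to remove it from a chain.

```apache
# Declare the filter and the condition under which it runs
FilterDeclare SSI
FilterProvider SSI INCLUDES resp=Content-Type $text/html

# Placeholder document root: only content served from here gets the chain
<Directory "/usr/local/apache2/htdocs">
    FilterChain SSI
</Directory>
```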

database support

Previously, using a database from an Apache module required extra programming to write a wrapper for each specific database. For example, to authenticate users against MySQL or PostgreSQL, you needed a module that provided its own interface to that SQL database. Because of the extra programming and the performance problems, this approach did not appeal to many people.

Apache 2.2 includes the mod_dbd module, which provides database connections through a standard interface. The module is built on the apr_dbd interface, so on threaded servers it can maintain a pool of connections that are shared between threads. This should make modules that need database access both more flexible and faster.

Note that the module is not intended to give dynamic websites access to databases, although in the future this could be offered through the interfaces of modules such as mod_perl and mod_php.
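
A minimal sketch of a mod_dbd configuration follows. The driver, database name and credentials are placeholders, and the example assumes that mod_dbd is loaded and that the PostgreSQL driver for apr_dbd is available.

```apache
# Placeholder driver and connection string
DBDriver pgsql
DBDParams "dbname=apacheauth user=apache password=xxxxxx"

# Connection pool sizing on threaded servers
DBDMin 4
DBDKeep 8
DBDMax 20
# Seconds an idle connection above DBDKeep is kept open
DBDExptime 300
```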

changes in module development

Several internal changes have also been made to the interface used for writing Apache modules.

Connection error logging: a new function, ap_log_cerror, lets modules log errors that relate to a client connection.

A new configuration test hook lets modules run special code when the configuration is only being tested.

In threaded MPMs, the stack size can now be changed with the ThreadStackSize directive.

Protocol handling for output filters has changed: filters can now hand over common protocol management, such as setting the correct response headers, to mod_filter.

A new monitor hook allows modules to run regular or scheduled tasks.

The regular expression interface has been changed, and the bundled Perl Compatible Regular Expressions (PCRE) library has been updated to version 5.0.

The new DBD framework makes working with SQL databases easier, but existing modules of your own have to be reworked to use it.

changes in the program

Several improvements have been made to the programs that ship with the server. A new command-line option has been added to httpd to make it easier to debug module configurations.

The -l option has always listed the modules compiled into httpd, for example:

$ httpd -l

Compiled in modules:

core.c

prefork.c

http_core.c

mod_so.c

But this list shows only the modules that are compiled into the server binary. Dynamically loaded modules (DSOs) are not displayed. The -L option lists the available configuration directives together with the modules that provide them, so modules that define no directives do not appear in it. The new command-line option -M displays a list of all modules, both those that are statically linked into the server and those that are loaded at startup. The module type (static or shared) is also shown:

$ httpd -M

Loaded Modules:

core_module (static)

mpm_prefork_module (static)

http_module (static)

so_module (static)

authn_anon_module (shared)

env_module (shared)

expires_module (shared)

headers_module (shared)

mime_module (shared)

negotiation_module (shared)

setenvif_module (shared)

log_config_module (shared)

logio_module (shared)

cgi_module (shared)

alias_module (shared)

rewrite_module (shared)

userdir_module (shared)

info_module (shared)

status_module (shared)

actions_module (shared)

autoindex_module (shared)

dir_module (shared)

ext_filter_module (shared)