Chapter 5

5.1 Goal definition

Before the analysis of the web application can be started, the goal of the tuning has to be defined, in order to find the points where caching mechanisms should be installed.

Primarily, the web application should be optimized with regard to the load on the web server; that is, the generation of a single web page should not be very CPU intensive. There are mainly three spots where the CPU is heavily involved:

Reducing the CPU load mainly works through the following two schemes:

For web servers, caching seems to be the best and biggest chance not only to reduce the need for CPU time but also to speed up the service. Still, this is not simple, because there are many points where caching can be applied.

More importantly, it is often not trivial to implement a caching module, because of the complexity of the problem or the application.

5.2 Processing a Request

The sources of load, as well as the recurring processes, can best be understood by taking a closer look at how a script is requested by a client and returned by the server. This is shown in Figure 5.1.

Figure 5.1: Processing a Request

  1. The client establishes a connection to the web server via TCP, typically to port 80, where the web server is listening for connections.
  2. The request for a document is sent to the server using the HTTP protocol. A minimal request typically looks like this:
    GET /index.php HTTP/1.1  
    Host: www.example.com  
    Accept-Language: en

    (The first two lines – the request line and the Host header – are the minimum request for HTTP/1.1 and are what makes NameVirtualHosts – multiple web sites residing on one IP address – possible.)

  3. As soon as the request is fully transmitted (this is only a short GET request, but especially POST requests can become very long, e.g. when transmitting files), it is processed by the web server: the correct virtual host is selected, preprocessing modules (e.g. URL rewriting) are executed, the existence of the requested file is checked, and eventually the action for the request is chosen and applied.
  4. If the file is of a MIME type for which a responsible module exists, that module is loaded and processes the file. Otherwise the web server simply delivers the file (skip to step 8). Here is an example of a MIME type definition in /etc/apache2/mods-available/php4.conf (the default location on Debian-based systems):
    <IfModule mod_php4.c>  
      AddType application/x-httpd-php .php .phtml .php3  
      AddType application/x-httpd-php-source .phps  
    </IfModule>

  5. The web server module (a script interpreter in this case) reads, parses and executes the file.
  6. A database connection can be invoked – either by establishing a new connection or by reusing a persistent one. Other third-party tools (libraries, etc.) required by the script are also invoked at this point.
  7. Once the script has been executed, its output is delivered as if it was the content of a file.
  8. The file (or output of the script) is returned to the client, reusing the TCP connection established by the client.
  9. HTTP 1.1 [FGM+99] allows the client to reuse this TCP connection for further requests (this is called “keep alive” [Mog95]; go back to step 2) if the server is configured to allow this behaviour.

5.3 Possible Hooking Points

The description of the request process reveals the following points where caching might have a good effect (to be evaluated). These considerations are kept fairly general at first; later, in Section 5.4, they will be focussed on the example application.

5.3.1 Client Request

A client most commonly requests the home page (or index page) first, usually the page named / or /index.php. This sounds like a good opportunity to cache these pages and deliver a previously stored copy without even touching the script. Unfortunately, this is not easily possible with a script-generated page: if it randomly displays parts of the page or contains other highly dynamic content, the page most commonly cannot be cached as a whole (but it can in parts, see Section 9).

The requests of clients are very similar, but the most important thing they have in common is the slow connection to the server. This does not necessarily add to the server load, but it keeps the web server program in memory longer than necessary: the program has to wait until the TCP connection is closed, which takes longer on a slow connection (data can usually only be transmitted as fast as the slowest part of the connection allows).

A solution would be a cache that handles the transmission of the data. Strictly speaking, this is a one-time cache, or buffer, as the data is discarded after transmission; still, it fits in the field of caching.

A proxy program consuming very little memory acts on behalf of the web server: it listens on port 80, accepts the connection, and waits for the client to submit its request (however long this takes). It then forwards the request to the web server at full speed – via the loop-back interface when residing on the same machine, or over a fast LAN connection. The result of the request is transferred back to the proxy at equally high speed. The proxy now handles the slow transmission of the data to the client, while the web server process can be removed from memory or handle the next request.

5.3.2 PHP Module

Each time a script is requested by a client, PHP needs to go through this sequence: it reads the script, parses it, converts it to executable intermediate code and executes it. If the script requires additional (“included”) files, this procedure has to be repeated for each file. Commonly there are many includes: especially when connecting to databases, the login data resides in an included file, a wrapper library is loaded, and so on.

In the procedure necessary for executing a PHP script, the first steps recur heavily. The ratio between how often a re-“compilation” is really needed – namely when the script has been modified – and how often it is actually performed – every time the script is executed – is very unfortunate. For smaller scripts, the compilation may even take longer than the execution: a script only outputting a few lines runs considerably faster if the compilation steps are skipped.

The repeated compilation can be avoided quite easily by storing its result – the runnable intermediate code – in a cache. To preserve the characteristics of a scripting language, the script file then only has to be checked for modifications when it is run – far less expensive (in terms of time) than a re-compilation. If the script has been modified, though, a small delay is added, as the script is both checked for modification and then compiled.

5.3.3 Database

When using databases in combination with web servers, there are also opportunities for caching. Similar queries are executed over and over, as most of the content of a web site is not personalized, which means most queries do not differ. As the database changes comparatively seldom, caching the database output evidently yields a good gain in speed.

Caching of results would commonly be assigned to the application, because the database knows little about how the application uses the retrieved data. Moving the cache from application to database, however, allows more efficient cache invalidation, because table modifications can be caught by internal triggers. The cache does not need to be very “intelligent”, as queries are re-issued by a computer program: repeated queries match byte-wise and are consequently easy to detect.
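The byte-wise matching and the invalidation can be sketched like this. The sketch is in Python with assumed names and uses a deliberately coarse strategy (any write clears the whole cache); a real database-side cache such as MySQL's query cache invalidates per table:

```python
import sqlite3

# Hypothetical query cache: results are keyed on the exact SQL text,
# so a repeated query matches byte-wise; any modification clears the cache.

class CachingDB:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.cache = {}

    def query(self, sql):
        if sql in self.cache:               # identical byte sequence -> hit
            return self.cache[sql]
        rows = self.conn.execute(sql).fetchall()
        self.cache[sql] = rows
        return rows

    def execute(self, sql):                 # INSERT/UPDATE/DELETE/DDL
        self.conn.execute(sql)
        self.cache.clear()                  # coarse invalidation on any write

db = CachingDB()
db.execute("CREATE TABLE bn_band (name TEXT)")
db.execute("INSERT INTO bn_band VALUES ('A')")
r1 = db.query("SELECT name FROM bn_band")   # miss: hits the database
r2 = db.query("SELECT name FROM bn_band")   # hit: served from the cache
```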

Another hooking point is the establishment of a connection to the database. This can be quite expensive and has to be done every time a script needs to connect to the database. The use of persistent connections can help in this scenario: the connection between application and database is not closed when the script ends but instead lives on nearly forever. The drawbacks of this connection type must be kept in mind: bugs in the application can block a persistent connection forever, causing the pool of available connections – which is limited by definition – to shrink.
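The idea of such a limited pool can be sketched in a few lines. This is a Python sketch with hypothetical names; real persistent connections live inside the web server module or database driver:

```python
# Minimal sketch of a fixed-size pool of persistent connections.
# A connection that is acquired but never released is "leaked" and
# permanently shrinks the pool, which is the drawback described above.

class ConnectionPool:
    def __init__(self, connect, size):
        self.free = [connect() for _ in range(size)]  # opened once, reused

    def acquire(self):
        if not self.free:
            raise RuntimeError("pool exhausted")      # effect of leaks
        return self.free.pop()

    def release(self, conn):
        self.free.append(conn)  # connection stays open for the next script
```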

5.3.4 Application

For the application itself, general improvements can hardly be suggested: whether and which methods of improvement can be applied depends heavily on the application and its requirements. Here, the knowledge and experience of the application programmer is called for. Still, there are some “standard” approaches that lead to gains in speed.

An important point in caching is the recognition of recurring patterns: the more often a pattern appears, the better the chances for efficient caching. Good candidates are data combinations arranged by the application, for example a result set consisting of combined database queries. Intermediate results of algorithms are also often worth caching: for instance, data structures used by an algorithm have to be built up first – frequently an expensive step.
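Caching such an intermediate result amounts to memoization, illustrated here with a short Python sketch (function and data names are purely illustrative):

```python
import functools

# An expensive build-up step - here, an index over a tuple of words -
# is performed once and then served from the cache; lru_cache keys on
# the arguments, so a repeated call with the same input is a cache hit.

@functools.lru_cache(maxsize=None)
def build_index(words):
    return {w: i for i, w in enumerate(words)}  # the "expensive" structure

words = ("metal", "rock", "punk")
idx = build_index(words)            # built once ...
assert build_index(words) is idx    # ... and reused from the cache
```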


5.4 Example Application

Evaluating opportunities for application-level improvement requires a specific look at the application itself. As stated in the introduction (see Section 1.2), a skeleton of a page as well as a few key pages will be examined.

When analyzing tuning potentials, we take a two-stage approach: first, the page is investigated regarding its structure and elements; from this overview, caching opportunities are identified. The second stage involves a profiler, which identifies further bottlenecks that suggest searching for better solutions in the critical regions.

5.4.1 Skeleton page

A skeleton page is a page common to every other page of the application. For web sites, a separation between two different types of skeletons can be made: an “ultimate” skeleton and a “normal” skeleton that depends on the ultimate one. This can also be shown using a layer model, see Figure 5.2.

Figure 5.2: Script layers of an application script

When looking at Listing 5.1, a separation into 4 documents is clearly visible.

Listing 5.1: A Presentation Skeleton – pres-skel.php

If only the first file was included, the script would represent the application skeleton.

The additional files load the corresponding page parts, including database calls when needed.

5.4.2 Index page index.php

The index page belongs to a group of three page types (described below) and is a good representative of a common page of the site. In addition to the presentation skeleton, news items are displayed, which are subject to constant modification. The selection of the news items (i.e. the SELECT statement and its WHERE clause of the SQL database query) is based on the page type:

The index page usually receives the most hits on a server and is therefore worth close consideration.

5.4.3 Search page search.php

A search page can be separated into two parts:

Both points can be covered by the testing of the index page, because the MySQL query cache will be included in those tests.

5.4.4 Links page links.php

The links page is somewhat unlike the other page types, but rather because of that it is worth taking a closer look at it: for each letter of the alphabet, the matching bands are displayed on one page, including all relevant information:

The interesting thing about the page is that, for database design reasons, the information is split across several tables: band name and homepage address are stored in the bn_band table, the country – for translation reasons – in bn_country, and the genres in bn_band_genre and bn_genre respectively, as a band can have more than one genre assigned (an m:n relation, modeled using bn_band_genre as a weak entity).

Many (expensive) queries are needed to build this page. It is therefore important to know how the various caching strategies can optimize the speed of this page.
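To make the split layout concrete, the following sketch rebuilds the four tables with assumed column names (the thesis only names the tables) and shows how a single JOIN can gather all the information in one query instead of many per-band queries:

```python
import sqlite3

# Assumed columns; only the table names come from the thesis.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bn_band      (id INTEGER PRIMARY KEY, name TEXT,
                               homepage TEXT, country_id INTEGER);
    CREATE TABLE bn_country   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bn_genre     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bn_band_genre(band_id INTEGER, genre_id INTEGER);
    INSERT INTO bn_country    VALUES (1, 'Sweden');
    INSERT INTO bn_band       VALUES (1, 'Abra', 'http://example.org', 1);
    INSERT INTO bn_genre      VALUES (1, 'Doom'), (2, 'Death');
    INSERT INTO bn_band_genre VALUES (1, 1), (1, 2);
""")

# One query per letter instead of one per band and attribute;
# the m:n relation yields one row per (band, genre) pair.
rows = conn.execute("""
    SELECT b.name, b.homepage, c.name, g.name
      FROM bn_band b
      JOIN bn_country    c  ON c.id  = b.country_id
      JOIN bn_band_genre bg ON bg.band_id = b.id
      JOIN bn_genre      g  ON g.id  = bg.genre_id
     WHERE b.name LIKE 'A%'
""").fetchall()
```

Whether the joined query or several cached small queries is faster is exactly what the benchmarks below are meant to decide.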

5.5 Testing

When testing the capacity of a web server, there are several things to be considered [BD99]. Aspects like the latency of a WAN connection are not taken into consideration in this thesis; all tests are done via the loopback interface.

In our scenario, we are not testing a web server program delivering text files, but the output of a script instead. This significantly decreases the speed. In a basic test on the author’s system, the Apache web server is able to serve about 3,000 copies of a 2-kilobyte document per second, while a script generating 2 kB of random data only achieves a throughput of about 190 requests per second (without any tuning).

For each test a certain sequence of steps is maintained to produce comparable results. This is done by using a script developed for this purpose. More information and source code can be found in the appendix.

  1. The configuration files are modified to reflect the changes to be tested.
  2. The web server and database server are restarted to provide a fresh environment.
  3. The script to be tested is requested – without measuring – one or more times (depending on the test case). This is used to load caches. This step can be skipped if the generation of the cache is to be included in the test.
  4. The process is paused for a certain amount of time, for instance 10 seconds, to ensure no trailing requests block anything.
  5. The test run is started. The document is requested between some 10 and some 1,000 times; parallel requests (concurrent requests, CCR) are also possible.
  6. The log file of the test is parsed and converted to a format used for generating a chart.

5.5.1 Preparations

The author’s benchmarking script can automate many tests, as it automatically generates the possible test cases with each tool switched on or off. A few steps have to be taken before a tool can be used with the benchmark.

Usually, changes to one or more text files have to be made to configure a tool for use, and the appropriate server has to be restarted. Unfortunately it is not possible to simply exchange the whole configuration file, as two tools might require changes in the same file. The configuration files are therefore patched, using a file representing the changes to be made.

When the benchmark script is started, it expects all configuration changes to be disabled. This can be ensured by prepending a restoring script. The tools need to be turned off anyway for the process of generating the file that integrates a tool with the benchmark. See Listing 5.2 for how such a patch file is generated.

Listing 5.2: Creating a patch file for the MySQL query cache
cd /tmp
cp /etc/mysql/my.cnf .
vi my.cnf                                # do the necessary editing
diff -c /etc/mysql/my.cnf my.cnf > mqc

The result is a file containing only the modified lines (plus some contextual lines, so that the file can even be patched if the line numbers do not match). What such a file exactly looks like can be found in the appendix, Listing A.4.

Figure 5.3: Typical output while benchmarking

When all necessary patch files have been generated, the testing can be started. The benchmark script is invoked with the patch files as command line arguments; a single option can be tested by specifying just one argument. All in all, 2^n test cases (with n being the number of arguments) will be run. The scripts to be tested (k of them) are hardcoded but can be overridden (see Appendix A.1). As a grand total, k·2^n benchmarks are run.
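The case generation behind the k·2^n figure can be illustrated in a few lines (a Python sketch; the tool and script names are purely illustrative):

```python
from itertools import product

# With n on/off tools and k scripts, k * 2**n combinations are run.
tools   = ["mqc", "apc", "squid"]      # n = 3 patch files
scripts = ["index.php", "links.php"]   # k = 2 scripts to be tested

cases = [(script, dict(zip(tools, flags)))
         for script in scripts
         for flags in product([False, True], repeat=len(tools))]

assert len(cases) == len(scripts) * 2 ** len(tools)  # k * 2^n = 16
```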

5.5.2 Testing environment

As the testing environment, a Pentium 4 2.8 GHz system with 512 MB DDR RAM is used, running Ubuntu Linux Hoary 5.04 as the operating system. The versions of the programs used are shown in Listing 5.3.

Listing 5.3: Program versions
$ uname -a
Linux main 2.6.10-5-686-smp #1 SMP Tue Apr 5 12:41:40 UTC 2005 i686 GNU/Linux

$ apache2 -v
Server version: Apache/2.0.53
Server built:   Apr  1 2005 18:17:53

$ php -v
PHP 4.3.10-10ubuntu4 (cli) (built: Apr  1 2005 14:16:27)
Copyright (c) 1997-2004 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2004 Zend Technologies

$ mysqld -V
mysqld  Ver 4.1.10a-Debian_2-log for pc-linux-gnu on i386 (Source distribution)

$ grep @version libs/Smarty.class.php
 * @version 2.6.7

$ squid -v | head -1
Squid Cache: Version 2.5.STABLE8

$ pear info APC | grep Version
Version            2.0.4

$ pear info APD | grep Version
Version            0.9.2

$ ab -V
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd,
Copyright (c) 1998-2002 The Apache Software Foundation,

The SMP kernel was installed because of the HyperThreading feature of the Pentium 4.