Chapter 1
Introduction

1.1 Motivation

As the Internet, and the World Wide Web (WWW) in particular, gains popularity, servers have to handle a growing number of requests. The more clients request resources (in this case files) from web servers, the faster the servers have to accept and process these requests. To cope with these requirements, programmers as well as system administrators must take countermeasures.

Since the beginning of the WWW, the requirements for servers have changed not only in terms of traffic but also in the type of content they deliver to the client. Initially, static pages had to be served; today, in 2005, content is usually taken from a database and pages are generated dynamically before they are transferred.

This development shifts the main source of load away from the operating system, which is responsible for reading the files from the hard disk or another type of storage, to the program that dynamically generates the page.

Computer hardware has evolved as well, which makes it possible to generate web pages the way they are generated today. Generally speaking, servers are capable of serving most pages in a reasonable amount of time. This holds as long as only a small number of visitors request pages to be generated. The larger the number of clients, the more pages have to be generated simultaneously. Multi-tasking enables servers to do so, but CPU capacity is limited.

If it were up to system administrators alone, they would add more hardware power (for instance by clustering servers or load balancing). Often this can be done only to a certain extent, mainly for financial but also for logistical reasons. From a programmer’s view, however, performance can be improved by optimizing algorithms (consider an O(n^2) algorithm on a fast computer, which can easily be overtaken by an O(n) algorithm on a slower one) and also by caching techniques.
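The remark about a quadratic algorithm on a fast computer being overtaken by a linear one on a slower computer can be made concrete with a small calculation. The following is only a sketch: the two machine speeds are hypothetical figures chosen for illustration, and the cost model (exactly n^2 and n elementary operations) is a simplifying assumption, not a measurement from this thesis.

```python
# Hypothetical machine speeds (assumptions for illustration only).
FAST_OPS_PER_SECOND = 1_000_000_000  # fast computer, runs the O(n^2) algorithm
SLOW_OPS_PER_SECOND = 10_000_000     # slow computer, runs the O(n) algorithm


def quadratic_is_slower(n):
    """Return True if n*n operations on the fast machine take longer
    than n operations on the slow machine.

    The comparison n*n / FAST > n / SLOW is cross-multiplied so that
    only integer arithmetic is used.
    """
    return n * n * SLOW_OPS_PER_SECOND > n * FAST_OPS_PER_SECOND


def crossover():
    """Smallest input size at which the slow machine wins."""
    n = 1
    while not quadratic_is_slower(n):
        n += 1
    return n


if __name__ == "__main__":
    print("slow machine wins from n =", crossover())
```

Under these assumptions the fast machine is 100 times faster, so its advantage is used up as soon as the input exceeds 100 elements; beyond that point additional hardware no longer compensates for the weaker algorithm.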

The basis for this diploma thesis is the analysis of caching strategies for this scenario. These strategies are used to speed up an existing application. Various methods are combined, tested, and benchmarked until the application runs at reasonable speed even under high load.

1.2 Method

We will explore the topic of this thesis using an existing web site (Bandnews.org) as an example to which the caching strategies are applied.

The site consists of an underlying structure which is common to each page. Therefore the examination is not restricted to standard pages; a skeleton page is taken into account as well. To compare the pages we measure the time for delivery on a single system (i.e. on an Intel PC, see Section 5.5.2). Because computer systems differ, these results are valid only in relative terms. The method still produces meaningful results because the differences between versions are at a similar level on faster or slower systems.

We simulate high load on the page using a load generator which effectively makes the server deliver pages simultaneously.

We examine single pages using a profiler – a tool that measures not only the overall performance of page generation, but also the time consumed by single function calls.

1.3 Expected Results

As a result of this work we expect a web application that delivers pages several times faster than an uncached version of the site (measured over repeated calls, so that caching can take effect).

Because methods for revealing bottlenecks within the application are applied as well, faster delivery is also expected for the first call of a web page. This is considered only a side effect; the thesis concentrates on caching pages or parts of pages.
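Why the expected speed-up depends on repeated calls can be illustrated with a minimal page cache. This is a sketch under stated assumptions, not the implementation used for Bandnews.org: the class PageCache, the function render_page, and the URL are hypothetical names introduced only for this example, and real caches additionally have to handle expiry and invalidation.

```python
class PageCache:
    """Keeps generated pages in memory, keyed by URL."""

    def __init__(self, generate):
        self._generate = generate  # function that builds a page from scratch
        self._pages = {}           # url -> generated page
        self.generated = 0         # how often a page had to be built

    def get(self, url):
        # Only the first request for a URL pays the generation cost;
        # later requests are answered from the stored copy.
        if url not in self._pages:
            self._pages[url] = self._generate(url)
            self.generated += 1
        return self._pages[url]


def render_page(url):
    """Hypothetical stand-in for an expensive, database-driven page build."""
    return "<html><body>Content for " + url + "</body></html>"


if __name__ == "__main__":
    cache = PageCache(render_page)
    for _ in range(3):
        cache.get("/news")
    print("pages generated:", cache.generated)
```

The first call is no faster than without the cache, which is why a benchmark of a single request would not show any benefit from caching.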

1.4 Outline of the Thesis

The thesis is organized as follows:

In the first part we present the application as well as the tools used. The web site Bandnews.org (see Section 3.1) was chosen as the application. The tools used are the Apache HTTP Server (Section 4.1), PHP (Section 4.2), MySQL (Section 4.3), Smarty (Section 4.4), Squid (Section 4.5), APC (Section 4.6), APD (Section 4.7), and ab (Section 4.8).

The second part, the central part of this diploma thesis, describes and evaluates the caching strategies to be applied.

In Section 5 we test the original site and choose pages for later evaluation.

The following sections deal with each technique in detail and provide benchmarking results which are analyzed and discussed. These sections include Squid (Section 6), APC (Section 7), MySQL (Section 8), and Smarty Caching (Section 9).

In the conclusion (Section 10) we review the results as a whole. Section 10.1 gives an outlook on how future work can further improve performance.

The appendix includes source listings and lists of figures, tables and listings.