An important aspect in most web application is the database. As the content is dynamic, it is commonly stored in a database. Therefore, the speed of the database is nearly as important as the speed of the script. Or, to say it differently, a tuned database will also speed up the scripts.
There are two concepts that will be tested in this thesis: the MySQL Query Cache and Persistent Connections. The third concept of Query Tuning should also be taken into consideration but is a too extensive topic.
Version 4.0.1 of MySQL – the database used in this thesis, see Section 4.3 – supports a caching mechanism that allows quicker retrieval of common queries.
MySQL is commonly used as a database for a web application. Charactaristic for this scenario are little changes for the stored data and many identical queries. The MySQL query cache reserves a given amount of storage for saving queries plus their results. A script typically requests exactly the same query (the SQL command has to be the same – byte-by-byte) several times and MySQL can use the cache to instantly return the results.
This concept works as long as nothing is changed in the database that affects the query. The most common, “dangerous” commands are INSERT/REPLACE, UPDATE, and DELETE. When tables are modified, any relevant entries in the query cache are flushed; no stale data is ever returned.
When using the MySQL query cache, we want to consider certain queries not to be cached at all. There is a special command for this case that will be discussed in Section 8.2.
A script that wants to access the database has to connect to the database daemon first. This is called “establishing a link”. Usually this is done via a TCP connection and an authentication mechanism.
The cost for connection to a database and establishing a link depends heavily on the environment. Most important factors for the speed for connecting are the speed (and/or latency) of the network interface (can also be the very fast loop back interface if the DBMS resides on the same machine) and load on the database machine. Dependent on the configuration a certain overhead for connecting will slow the script down.
The concept of persistent connections is somewhat similar to preforking of Apache (see Section 4.1): a set of connections is ready to be reused without having to go through the whole connection phase. The higher the overhead for a connection is, the higher the gain from persistent connections will be.
The drawbacks of persistent connections are caused by their persistency. If for some reason a link is ruined (e.g. by a connection loss or a faulty script) it cannot be reused any more. There exist concepts to detect broken connections and re-establish them, but they require extra overhead. A greater problem are table lockings that have been turned on by mistake. A programmer can avoid this by using a so called “shutdown function” that clears all locks when the script finishes.
Apart from caching and speeding up the environment, one also has to make the queries behave well: take care that they are not (too) wasteful.
Often indices and good database layouts can improve the speed even more than caching techniques. In combination these techniques result in the highest speed, of course.
Tuning queries is very application dependent, but [ZB04] gives a good introduction and leads to good starting points for optimizing the queries. You can commonly start with the slowest queries of our application. They can be automatically logged by MySQL if we specify a threshold of x seconds.
The changes needed for this testing can both be made in the configuration files of MySQL (for the query cache) and PHP (for generally enabling the persistent connection feature), but it has also to be ensured that the connection script code takes advantage of this feature.
To turn on the MySQL Query Cache (abbreviated MQC), in the MySQL configuration file /etc/mysql/my.cnf simply the size of the cache has to be set to a non-zero value (see also appendix, Listing A.4):
The size is specified in bytes. Here a cache of 25 mega bytes is created.
Persistent connections are activated, by enabling them in /etc/php4/
apache2/php.ini. The number of possible links and persistent connections are
usually set to unlimited (i.e. -1).
| Listing 8.2: | Activate persistent connections – /etc/php4/apache2/php.ini |
Additionally we have to asure that the scripts really use the persistent connections. When using plain PHP we need to use the function mysql_pconnect instead of mysql_connect to connect to the database. Wrapper APIs (like PEAR::DB, see Section 4.3.1) need individual care. In PEAR::DB either the construct
$db = DB::connect($dsn, true);
|
works. Listing 8.3 shows a more verbose and extensible solution.
| Listing 8.3: | Use persistent connections with PEAR::DB – db/db.php |
When using other Wrapper APIs the appropriate steps (usually well documented) need to be taken too, of course.
For these tests, APC was turned on. This allows the MySQL query cache to show its full potential and makes the following results to be the most useful ones so far.
As can be seen in figures 8.2–8.4 (p. §–§) and Table 8.1, the important scripts now really gain speed and move to interesting regions regarding the possible requests per second. index.php moves up by 471% for 10 CCR and even by 784% for 100 concurrent requests. Also links.php gets faster by 3-digit percentage numbers, an increase between 104% and 178%. The underlying pres-skel-t.php receives similar gain.
|
| |||||||||||||||||||||||||||||||||||||||||||
The only exception is skeleton-t.php (Figure 8.1, page §). There is no gain, but loss instead, although the numbers can be taken as equal due to measurement inaccuracies: The number of tested requests was only as low as 1,000, and the original request rate already started around 400 requests/s.
If a test took for any reason (e.g. an I/O event) 0.1 seconds longer than the original one (2.5s vs. 2.6s) the measured request rate would already drop from 400 to 385 requests per second. Another point is that due to the query cache of 25 mega bytes there is less memory available.
|
|
Table 8.2 (page §) shows for the requests per second (rps) the corresponding
(mean) number of seconds that it takes to generate a page (gt). This follows the
simple formula
= gt.
This test was done with an activated MySQL query cache. skeleton-t.php was not tested since it does not use the database.
The results (Table 8.3) for this test do not show any significant change in speed. Only index.php (Figure 8.6) profits a little. This result is quite evident, though. None of the pro-persistent connection arguments really fits our scenario. The database server resides on the same machine, there is no network latency, the system load is low. Nevertheless, as the use of persistent connection does not slow down anything significantly, it is arbitrary to use them or not. The author feels more comfortable with reusing stuff that ain’t broke1 .
|
| ||||||||||||||||||||||||||||||||||||||
Web applications such as Bandnews.org profit enormously from the MySQL query cache. Especially for sequential testing the results are amazing. According to [AB04] the overhead is minimal even for frequently changed tables. It should always be considered to turn on this feature.
The power of persistent connections did not quite show in the benchmarks. This is primarily due to the fact that script and database run on the same machine so that expensive factors for link establishing such as network latency do not come into play. Still the use of persistent connections can be recommended. Because of detection techniques for damaged connections and no real limits in connection numbers the minimal advantages of spontaneous connections do not weigh much.