Spamhaus.org no longer lists Austrian Registry on its Block List

It came to my attention today that the well-known spam block list provider has put the IP addresses of the Austrian registry nic.at on its block list.

The list that Spamhaus provides is actually something good: it allows mail server administrators to automatically block mails arriving from servers that are known to be operated by phishers.

At this point, though, Spamhaus took a wrong turn. They demanded that the Austrian registry delete 15 domains they consider to be used by phishers, apparently without providing (enough) evidence to nic.at. nic.at responded that, because of Austrian law, they cannot simply delete domains without proof of bogus WHOIS data.

I cannot judge who is ultimately right in this dispute (e.g. whether Spamhaus provided enough evidence or not), but I can definitely say that Spamhaus made the wrong decision when they started listing the IP addresses of nic.at.

Welcome to the Kindergarten, guys.

nic.at is bound by Austrian law, and as a foreign company you can't just come along and demand that they remove certain domains. What if someone went to your registry and requested the deletion of spamhaus.org without providing any legitimate reason?

Dear Spamhaus, you need to stick to your policy. Your block list is about phishers, and nic.at did not send out any phishing mails. You can't just put someone on there because you want to pressure them.

As a result, mail server administrators should no longer rely on block lists from a provider that misuses its own list to put other companies and organizations under pressure. This is the right moment to remove sbl-xbl.spamhaus.org from your server configuration.

Coverage on the German Heise.de.

Update 2007-06-20: They have stopped listing nic.at. Finally they see reason. (They changed the listing to 193.170.120.0/32, which matches no addresses.) Also see the German futurezone coverage.


Blummy wins Web 2.0 Bookmarking Award!

SEOmoz Web 2.0 Awards - Winner

I'm proud to announce that Blummy has been voted into first place in the bookmarking category of the Web 2.0 Awards by seomoz.org.

Blummy placed ahead of Looksmart's Furl.net and its similar-sounding competitor spurl.net.

Like all the other category winners, I was also interviewed. I can highly recommend reading the other interviews as well.

SEOmoz Web 2.0 Awards - Blummy Bookmarking Winner

Thanks for the award, and congratulations on the excellent choice of nominees and winners.


Looking into the Skype Protocol

As you all know, Skype is a very popular Voice-over-IP application. Skype also claims that all of its communication is encrypted, which raised some discussion (also dugg) about whether you should be considered a criminal if you "hide away" from eavesdropping.

Philippe Biondi and Fabrice Desclaux from EADS gave a talk at the Blackhat Europe conference where they presented their latest discoveries.

The talk is rather technical and might be hard to understand. I picked some of the most interesting points:

  • Almost everything is obfuscated (looks almost random)
    This is a sign of good use of encryption.
  • Proxy credentials are reused automatically
    Once Skype learns how to use your proxy, it passes that information on to other Skype instances.
  • Traffic even when the software is not in use (pings, relaying)
    I have heard quite a few times of office PCs being promoted to supernodes, generating enormous traffic.
  • No clear identification of the destination peer
    The destination IP is not disclosed to a firewall, for example, so network administrators can't block certain IPs.
  • Many protections, anti-debugging tricks, and ciphered code
    This is an attempt to protect themselves from spies (e.g. hackers, governments), but it might also conceal secret backdoors or spying on Skype's part. This is a common problem with closed software.
    Using these techniques also hinders open-source or third-party developers from building compatible clients.

In Skype's FAQ they state that they use AES encryption. This appears to be confirmed and is a good thing, but they embed the data in a proprietary protocol, which has its drawbacks and is incompatible with everything else. It's their right to do so, but it gives a lot of power to those who know about the inner workings (which does not necessarily include only Skype).

Their conclusions:

  • Impossible to protect from attacks (which would be obfuscated)
    This basically means that we have to trust Skype to keep their secrets safe. There are a great many users, which makes the Skype audience an interesting target.
  • Total black box. Lack of transparency. No way to know if there is or will be a backdoor.
  • Skype was made by clever people; good use of cryptography
    They admit that it was built well. But think of a government that grows suspicious if you encrypt all your communication: Skype encrypts everything, including itself. Should we be suspicious?

Further reading: Skype network structure, Skype's Guide for Network Administrators


digg it

Posted in web

10 Realistic Steps to a Faster Web Site

I have complained before about bad guides on improving the performance of your website.

digg it, add to delicious

I'd like to give you a more realistic guide on how to achieve that goal. I wrote my master's thesis in computer science on this topic and will refer to it throughout the guide.

1. Determine the bottleneck
When you want to improve the speed of your website, it is usually because it feels slow in some way. Various factors can affect the performance of your page; here are the most common ones.

Before we move on: always remember to answer each of the following questions with your target audience in mind.

1.1. File Size
How much data does the user have to load before (s)he can use the page?

A frequent question is how much data a web page is allowed to have. You cannot answer this unless you know your target audience.

In the early years of the internet the suggestion was a maximum of 30 kB for the whole page (including images, etc.). Now that many people have a broadband connection, I think we can push that to somewhere between 60 and 100 kB. However, you should consider lowering the size if you also target modem users.

Still, the less data you require to download, the faster your page will appear.

1.2. Latency
The time between sending your request to the server and the data reaching your PC.

This time is the sum of twice the network latency (which depends on the hosting provider's uplink, the geographical distance between server and user, and some other factors) and the time the server needs to produce the output.

Network latency can hardly be optimized without moving the server, so this guide will not cover it.
The server's processing time, on the other hand, combines many complex factors and most often leaves much room for improvement.

2. Reducing the file size
First, you need to know how large your page really is. There are some useful tools out there; I picked Web Page Analyzer, which does a nice job at this.

I suggest not spending too much time on this unless your page size is larger than 100 kB; if it is not, skip ahead to step 3.

Large page sizes are nowadays often caused by large JavaScript libraries. Often you only need a small part of their functionality, so you could use a cut-down version. For example, when using prototype.js just for Ajax, you could use pt.ajax.js (also see moo.ajax), or moo.fx as a script.aculo.us replacement.

Digg, for example, used to weigh in at about 290 kB; they have since reduced that to 160 kB by leaving out unnecessary libraries.

Large images can also cause large file sizes, often because the wrong image format was chosen. A rule of thumb: JPG for photos, PNG for most everything else, especially if flat colors are involved. Also use PNG for screenshots; JPGs are not only larger there but also look ugly. You can use GIF instead of PNG when the image has only a few colors and/or you want to create an animation.

Large images are also often scaled down via the HTML width and height attributes. Do the scaling in your graphics editor instead; this reduces the file size as well.

Old-style HTML can also inflate the file size. There is no need for thousands of presentational tags anymore. Use XHTML and CSS!

A further important step toward a smaller size is compressing your content on the fly. Almost all browsers support gzip compression. For an Apache 2 web server, for example, the mod_deflate module can do this transparently for you.
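
A minimal configuration sketch for mod_deflate could look like this (the set of MIME types is just an example; adjust it to your site):

# compress text-based responses on the fly (mod_deflate must be loaded)
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css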

If you don't have access to your server's configuration, you can use zlib compression from within PHP; for Django (Python) there is GZipMiddleware, and Ruby on Rails has a gzip plugin, too.
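
In PHP, a minimal sketch is to wrap the output in a compressing output buffer at the very top of the script, before anything is sent:

<?php
// compress this script's output if the browser announces gzip support
// (alternatively, set zlib.output_compression = On in php.ini)
ob_start('ob_gzhandler');
?>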

Beware of compressing JavaScript; there are quite a few bugs in Internet Explorer related to it.

And for heaven's sake, you can also strip the white space after you've completed the previous steps.

3. Check what's causing a high latency
As mentioned, latency comes from two major factors.

3.1. Is it the network latency?
To determine whether network latency is the limiting factor, you can ping your server. This can be done from the command line via ping servername.com.

If your server admin has disabled ping responses, you can also use a traceroute, which uses another method to determine the time: tracert servername.com (Windows) or traceroute servername.com (Unix).

If you address an audience that is geographically not very close to you, you can also use a service such as Just Ping, which pings the given address from 12 different locations around the world.

3.2. Does it take too long to generate the page?
If the ping times are okay, the page itself might take too long to generate. Note that this applies to dynamic pages, for example ones written in a scripting language such as PHP; static pages are usually served very quickly.

You can measure the time it takes to generate the page quite easily: save a timestamp at the beginning of the page and subtract it from the timestamp taken once the page has been generated. In PHP, for example, you can do it like this:

<?php
// Start of the page: remember when we began
$start_time = explode(' ', microtime());
$start_time = $start_time[1] + $start_time[0];
?>

and at the end of the page:

<?php
// End of the page: compute and display the elapsed time
$end_time = explode(' ', microtime());
$total_time = $end_time[0] + $end_time[1] - $start_time;
printf('Page loaded in %.3f seconds.', $total_time);
?>

The time needed to generate the page is now displayed at the bottom of it.

You can also compare the load time of a static page (often a file ending in .html) with that of a dynamic one. I'd advise using the first method, though, because you are going to need it when you go on to optimize the page.

You can also use a profiler, which usually offers even more information about the generation process.

For PHP you can, as a first easy step, enable output buffering and rerun the test.
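
A minimal sketch of what that looks like (the buffering must start before any output is sent):

<?php
// at the very top of the script: collect all output in a buffer
// instead of sending it to the client in many small chunks
ob_start();
?>
... page content ...
<?php
// at the very end of the script: send the buffered page in one go
ob_end_flush();
?>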

You should also consider testing your page with a benchmarking program such as ApacheBench (ab), which stresses the server by requesting the page many times, several at once.
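
For example, something along these lines requests the page 100 times with 10 concurrent connections (the URL and the numbers are just placeholders):

ab -n 100 -c 10 http://www.example.com/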

It is difficult to say what generation time is acceptable for a web page; it depends on your own requirements. Try to keep it under one second, as that is a delay users can usually cope with.

3.3. Is it the rendering performance?
This plays only a minor role in this guide, but it can still be a reason why your page takes long to load.

If you use a complex table layout (which can render slowly), you are most probably using old-style HTML; try switching to XHTML and CSS.

Don't use overly complex JavaScript; slow scripts in combination with onmousemove events, for example, make a page really sluggish. If your JavaScript makes the page load slowly (you can measure it with a technique similar to the PHP timing above, using (new Date()).getTime()), you are doing something wrong. Rethink your concept.

4. Determine the lagging component(s)
As your page usually consists of more than one component (header, login box, navigation, footer, etc.), the next step is to check which one needs tuning. You can do this by integrating a few of the measuring fragments into the page, which will show you several split times throughout the page, as sketched below.
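
A minimal sketch of such a split-time helper, building on the $start_time from the snippet above (split_time() and the component name are made up for illustration):

<?php
// print how long it took to reach this point of the page
function split_time($label) {
    global $start_time;
    $now = explode(' ', microtime());
    $now = $now[1] + $now[0];
    printf('<!-- %s done after %.3f seconds -->', $label, $now - $start_time);
}

// call it after each component, e.g.:
split_time('navigation');
?>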

The following steps can now be applied to the slowest parts of the page.

5. Enable a Compiler Cache
Scripting languages recompile a script upon each request. As the vast majority of requests hit an unchanged script, it makes no sense to compile it over and over (especially once core development has finished).

For PHP there is, among others, APC (which will probably be integrated into PHP 6); Python stores a compiled version (.pyc files) by itself.

6. Look at the DB Queries
At university you are taught the most complex queries with lots of JOINs and GROUP BYs, but in real life it can often pay off to avoid JOINs between (especially large) tables. Instead you issue multiple simple SELECTs, which the SQL server can cache. This is especially true if you don't need the joined data for every row. It really depends on your application, but trying it without a JOIN is often worth it, as in the sketch below.
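
To illustrate the idea (the table names and the query() wrapper are made up): instead of one JOIN, fetch the rows with two simple statements that the query cache can answer again and again:

<?php
// instead of:
//   SELECT a.*, u.name FROM articles a JOIN users u ON u.id = a.user_id WHERE a.id = 42
// use two simple SELECTs; the second one is identical for every article
// by the same author and can come straight from the query cache
$article = query('SELECT * FROM articles WHERE id = 42');
$author  = query('SELECT name FROM users WHERE id = ' . (int) $article['user_id']);
?>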

Ensure that you use a query cache (such as the MySQL Query Cache). In a web environment the same SELECT statements are executed over and over; this almost screams for a cache (and explains why avoiding JOINs can be much faster).

7. Send the correct Modification Data
Dynamic web pages often make one big mistake: they don't set their date of last modification. This means the browser always has to load the whole page from the server and cannot use its cache.

In HTTP there are various headers that matter for caching: in HTTP 1.0 there is the Last-Modified header, which works together with the browser-sent If-Modified-Since (see the specification). HTTP 1.1 adds the ETag (entity tag), which identifies a specific version or variant of a page (e.g. per language) independently of its modification date. Other relevant headers are Cache-Control and Expires.

Read on about how to set these headers correctly and respond to them, for HTTP 1.0 and for 1.1.
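
As a minimal sketch of the HTTP 1.0 part in PHP (here the modification time simply comes from a file; in practice you would derive it from your data):

<?php
// tell the browser when the content was last changed
$last_modified = filemtime('content.html');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $last_modified) . ' GMT');

// if the browser's copy is still current, answer with 304 and no body
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $last_modified) {
    header('HTTP/1.0 304 Not Modified');
    exit;
}
?>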

8. Consider Component Caching (advanced)
If optimizing the database does not improve your generation time enough, you are most likely doing something complex ;)
For public pages it is very likely that you will present two users with the same content (at least for a specific component). So instead of running the complex database queries again, you can store a pre-rendered copy and use that when needed, to save time.

This is a rather complex topic, but it can be the ultimate solution to your performance problems. You need to make sure that you don't deliver a stale copy to the client, and you need to think about how to organize your cache files so you can invalidate them quickly.

Most web frameworks give you a hand with component caching: for PHP there is Smarty's template caching, Perl has Mason's data caching, Ruby on Rails has page caching, and Django supports it as well.
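
If you want to roll a simple version yourself, a file-based sketch could look like this (the path, the ten-minute lifetime, and render_navigation() are placeholders):

<?php
$cache_file = '/tmp/cache/navigation.html';

if (file_exists($cache_file) && time() - filemtime($cache_file) < 600) {
    // fresh enough: serve the pre-rendered copy
    readfile($cache_file);
} else {
    // regenerate the component, send it, and keep a copy for the next request
    ob_start();
    render_navigation();
    $html = ob_get_flush();
    file_put_contents($cache_file, $html);
}
?>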

This technique can eventually lead to a point where loading your page does not require any request to the database at all. That is a favorable result, as the database connection is often the most obvious bottleneck.

If your page is not that complex, you could also consider caching the whole page. This is easier, but it usually makes the page feel less up to date.

One more thing: if you have enough RAM, you should also consider storing the cache files on a RAM drive. As the data is discardable (it can be regenerated at any time), losing it on a reboot does not matter. Keeping disk I/O low can boost the speed once again.

9. Reducing the Server Load
Suppose your page loads quickly and everything looks alright, but when too many users access it at once, it suddenly becomes slow.

This is most likely due to a lack of resources on the server. You cannot add an indefinite amount of CPU power or RAM to the server, but you can handle what you've got more carefully.

9.1. Use a Reverse Proxy (needs access to the server)
Whenever a request is handled, a whole copy (or child process) of the web server needs to be held in memory, not only while the page is generated but also until it has been transferred to the client. Slow clients therefore cost performance: when many users connect, you can be sure that quite a few slow ones will block the line for somebody else just to transfer the data back.

There is a solution for this. The well-known Squid proxy has an HTTP acceleration mode which takes over the communication with the client. It acts like a secretary handling all correspondence.

It waits patiently until the client has filed its request, asks the web server for the response, quickly receives it (while the web server moves on to the next request), and then patiently delivers the file to the client.

The Squid server is also small, lightweight, and specialized for that task. Therefore you need less RAM to handle more clients, which allows a higher throughput (in terms of clients served per unit of time).

9.2. Take a lightweight HTTP Server (needs access to the server)
People also often say that Apache is quite huge and does not do its work quickly enough. Personally I am satisfied with its performance, but when it comes to scripting languages that talk to the web server via the (Fast)CGI interface, Apache is easily trumped by a lightweight alternative.

It's called LightTPD (pronounced "lighty") and does a good job at that particular task, very quickly. You can already see from its configuration file that it keeps things simple.

I suggest testing both scenarios to see whether you gain from using LightTPD or whether you should stay with your old web server. The Apache web server is stable and built on long experience in the web server business, but LightTPD is taking its chance.

10. Server Scaling (extreme technique)
Once you have gone through all the steps and your page still does not load fast enough (most obviously because of too many concurrent users), you can duplicate your hardware. Thanks to the previous steps there isn't too much work left to do.

The reverse proxy can act as a load balancer by sending each request to one of the web servers, either more or less randomly (round robin) or driven by server load.

Conclusion
All in all, the main strategy for a large page is a combination of caching and intelligent handling of resources. While the first seven steps apply to any page, the last three are usually only useful (and needed) on sites with many concurrent users.

The guide shows that you don't need a special server to withstand slashdotting or digging.

Further Reading
For more detail on each step I recommend taking a look at my diploma thesis.

MySQL tuning is nicely described in Jeremy Zawodny's High Performance MySQL. There is also a presentation about how Yahoo tunes its Apache servers, and some tips for websites running on Java. George Schlossnagle gives some good caching advice in his Advanced PHP Programming; his tips are not restricted to PHP as a scripting language.

digg it, add to delicious


Announcing Wizlite: Collaborative Page Highlighting

So I'm not the first to write about my project? Well. Nice ;)

Wizlite takes the good old highlighting marker from paper to the web. People get different colors and mark important sections on any web page.

Users can create groups and wizlite away on a certain topic (either private or public).

You have to use it to experience how much fun it is ;) So: http://wizlite.com/


Introducing: Wish-O-Matic

Christmas is approaching and you need to buy some presents for your friends. So far, so good, but what to buy?

I've created a little web app for that, called Wish-O-Matic. You choose a few things that you know your friend likes. That's it. The app will tell you what would go well with those items.
It uses the Amazon Web Service API, so the suggestions are more or less restricted to the products Amazon offers.

Give it a shot: Wish-O-Matic


Squid's HTTP Acceleration Mode

I recently configured one of my servers to use the Squid cache in HTTP acceleration mode. So what is this, anyway?

A typical request to a web server looks like this: the client browser opens a connection to server port 80, and the server sends the data back through that connection. For the duration of the transfer the server "loses" one child process. So if a client with a slow connection requests a large file, this can take several minutes. If many slow clients tie up child processes, eventually too few will be left for "ordinary" clients.

A solution is to put a proxy server in front of the HTTP server. The proxy server is lightweight and handles the communication with the client browser. The communication with the web server happens over a high-speed interface (either loopback when it's a single machine, or a LAN with 100/1000 Mbit), so almost no time is spent waiting for a transfer to finish.

Setup is easy, and I've covered this in my thesis already.

But I've got some more real-life info for you.

There are two usual ways of setting this up.

  1. Set the web server to listen on port 81, Squid on 80.
  2. Web server still listens on port 80 but just for 127.0.0.1, the loopback interface. Squid listens on port 80 on the external interface.

What makes number two the favourable option is that you don't have a server process listening on an unconventional port, and for redirects (Location: /somewhereelse) the port number stays correct (see the corresponding question in the Squid FAQ). For existing configurations with virtual hosts there is no need to change a <VirtualHost *:80> to <VirtualHost *:81>.

So in ports.conf of Apache, for example, you change this:

# Listen 80
Listen 127.0.0.1:80

In squid.conf you make these changes (apart from those listed in my thesis):

# http_port 3128
http_port ip.add.re.ss:80

This already works nicely, but there is one more thing: the source address of an HTTP request is now 127.0.0.1. So if you want to do some processing with REMOTE_ADDR, for example in PHP, you have to insert something like this before you can use the address again.

if (isset($_SERVER["HTTP_VIA"])) {
    // request came through the Squid accelerator:
    // take the real client address from the X-Forwarded-For header
    $_SERVER["REMOTE_ADDR"] = $_SERVER["HTTP_X_FORWARDED_FOR"];
}

The log files now also show 127.0.0.1 as the source instead of the real IP address. The following change (in apache2.conf) sets things back to normal:

# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

This should be all for now. Happy speed-boosting ;)

Introducing: Blummy

So the project I've been working on lately is called blummy.

blummy is a tool for quick access to your favorite web services via your bookmark toolbar. It consists of small widgets, called blummlets, which make use of JavaScript to provide rich functionality (such as bookmarklets).

blummy in action

You can create your own blummy by drag-n-dropping blummlets onto it.

It's hard to explain unless you've tried it yourself, so come along (it's free, of course).


Competitive Reproduction

We've seen it several times in history: company A launches a new, innovative product, and company B takes it, copies it, and wins the competition.

Not that we have come that far in the short lifetime of AJAX apps, but there are already quite a few examples:

A nice conversation, or rather a little war of words, has started between the writing apps listed: an article on TechCrunch reported about Zoho Writer, and in the comments Jason Fried of 37signals (maker of Writeboard) and Sam Schillace (of Writely) seem very annoyed about "such blatant rips".

But haven't we seen this before? Competition keeps the market alive. You cannot rest on your laurels; you have to prove that you deserve the leading position. So if such a competitor beats you with your own product, you have done something wrong, or simply waited too long. See IBM vs. Microsoft in 1980, or Microsoft vs. Google in 2005. If you're better anyway, you don't have to fear a thing.

Some interesting thoughts about this can be found at Bokardo: Web-based Office Competition Heats Up.

Posted in web

Office Web Apps are just Proof-of-Concepts

AJAX applications are far from replacing desktop office apps. So is Flash by the way.

Several projects are trying to prove the opposite. I still think that it will not happen.

The current development is just a rise of quite sophisticated JavaScript applications. We had such applications before, but now it's "in", or rather acceptable, to use JavaScript extensively. No, it even seems to be a must to use JavaScript in new applications now.

I created JS-based applications back in 1997, when I couldn't afford web space with server-side scripting. As soon as I started working with PHP I gave that up, because servers were clearly faster at generating pages than browsers were at interpreting JavaScript.

Rich interfaces were left to Flash at that time. As the Flash Player resides in the browser as a plug-in and runs as a natively compiled program for its platform, it provides more speed and is only partly subject to browser restrictions. Additionally, it is optimized for multimedia, which made it the first choice for complex navigation.

Browsers (or rather the PCs they run on) are now fast enough to support JavaScript apps, and the XMLHttpRequest behind AJAX provided the kick-off. We are now seeing rich interfaces done in JavaScript, with the possibility of real-time server communication and fallbacks for when it fails.

There are a few points that keep AJAX apps from taking over; they largely overlap with the arguments against Flash.

  • We are still caught in a browser. Ordinary web apps sit, by definition of course, in a web browser. There is no way to access local storage, which is in principle a good thing. But when it comes to web apps, you need to do all this uploading and downloading to use them, or you store everything on their server.
  • We are still caught in a browser. This is also a problem of user interface. "Normal" users have slowly adopted a particular way of using interfaces when surfing inside a browser (single click vs. double click). With new interfaces we challenge them to change their habits once again. We should think about that thoroughly.
  • Web apps want your data (see What is Web 2.0 by Tim O'Reilly). When using web applications, you need to trust the app and hand over all your data. For security reasons there is also no way to properly store the data on the client side. But even if there were, the web app would already have all your data anyway, as it needs it for processing.
  • Running complex apps in JavaScript is a waste of CPU power. Our computers have become faster, that's true. But I don't think it's a good idea to spend that speed on having a browser execute an app in JavaScript when we have stronger equivalents on the desktop.
  • Flash is a plug-in. On the one hand that's a good thing: we get more CPU power. On the other hand it just does not feel right. I cannot even use the browser's find function. Brr.

For these reasons I stick to my opinion that most of the web-based office apps we see now are just proofs of concept. In the near future they will not replace real office apps.

We also need to find ways to share data with our desktop computers effectively. The current solutions I know of are far from usable and keep ordinary users away from such projects.

All in all, I am far from being against AJAX apps. But we need to keep the focus on applications where the technique can be applied in a useful way; I see those in the fields of collaboration and communication.


Posted in web