JavaScript Tricks And Good Programming Style – Original Version

Note that there is an updated version

I have been programming for about 10 years now, and I am always longing for improving my code. Throughout time I added a few habbits that I consider to be good practices and increase the quality of my code.

In a loose series I’d like to point out a few of them. As I am currently mostly programming in JavaScript, I will write most of my samples in that language; also some of the tricks I mention only apply to JavaScript. But most of them apply to most programming languages around.

Optional parameter and default value #
When defining a function in PHP you can declare optional parameters by giving them a default value (something like function myfunc($optional = "default value") {}).

In JavaScript it works a bit differently:

var myfunc = function(optional) {
if (typeof optional == "undefined") {
optional = "default value";
}
alert(optional);
}

This is a clean method to do it. Basically I pretty much recommend the use of typeof operator. Some people would do the above with a if (!optional), but my version works cross browser (e.g. Safari will throw an error when you try to negate null).

Parameters Hints #
The larger your app gets, the more functions you get which you would use throughout the app. It also creates a problem with maintenance. As each function can contain multiple arguments it is not unlikely that you forget what those parameters were for (especially for boolean variables) or mix up their sequence (I am especially gifted for that).

So what I do is this:

var myfunc2 = function(title, enable_notify) {
// [...]
}
myfunc2(title = "test", enable_notify = true);

This piece of code relies on the functionality of programming languages that the return value of an assignment is the assigned value. (This is something that you should also maintain in your app, for example with database storage calls, give the assignment value as a return value. It’s minimal effort and you might be glad at some point that you did it).

If you do this you can see at any point in the code, what parameters the function takes. Of course this is not always useful, but especially for functions with many parameters it gets very useful.

Also be careful that you would override the variable names in the scope of which you are calling the function. You might mini-namespace the variables, e.g. with letter+underscore (p_title, p_enable_notify).

Search JavaScript documentation #
When I need some documentation for JavaScript I use the mozilla development center (mdc). To quickly search for toLocaleString, I use Google: http://google.com/search?q=toLocaleString+mdc

As I am a German speaker I also use the excellent (though a bit out-dated) JavaScript section SelfHTML. I use the downloaded version on my own computer for even faster access.

The self variable #
This technique comes from Private Members in JavaScript by Douglas Crockford. By assigning a value in a function to this.value it will be publically accessible afterwards.

function Animal(name) {
this.name = name;
var self = this;
this.hello = function() {
alert("hello " + self.name);
//alert("hello " + this.name); // would fail
}
}
var dog = new Animal("Jake");
button = {
onclick = dog.hello;
}
button.onclick();

The cause of this problem is that the this keyword receives different values in different contexts. See here for a closer explanation.

Problem with this solution is that I am not absolutely sure if this creates a memory leak in internet explorer

Reduce indentation amount #
One of the most annoying things I find in other people’s code is this: (multiple) nested if clauses. Something like this:

var arr = ["dog", "cat"];
var action = 'greet';
for(i = 0, ln = arr.length; i < ln; i++) { animal = arr[i]; if (animal == "cat") { alert("hello " + animal); } }

This is only a short example, but I often saw this going deep into 10 levels of nested clauses. I suggest using the break and continue (and next in Perl):

var arr = ["dog", "cat"];
for(i = 0, ln = arr.length; i < ln; i++) { animal = arr[i]; if (animal != "cat") continue; alert("hello " + animal); }

This accomplishes the same with only one level of indentation. One more example for a function:

function greet_animal(animal) {
if (typeof animal == "undefined") return;
if (animal != "cat") return;
alert("hello " + animal);
}

Javascript is one of the few languages where you can leave the return value empty (i.e. typeof greet_animal() == "undefined"). You might want to rather use return false so that you can easily determine if the function failed for some reason.

, ,

Unknown Blummy Treasures

Steve Rubel re-mentions blummy on his blog. Thanks a lot!

I’d like to take this chance to show you some of the lesser known blummlets (or bookmarklets if you prefer). Take a look at how I have configured my blummy.

blummy: alexa

Alexa

The favorite of my blummlets (and in this case it deserves to be called like that, as it is an image), is the Alexa blummlet on the bottom. I just have to open blummy on any page on the web and I can instantly see how much traffic it has (well as far as you can count on Alexa on that).

When you add the blummlet you might want to resize it to fit yours. So first add it to your blummy, then resize the blummlet by holding shift and drag from within that blummlet. Voilá.

Click2Zap

blummy: click2zap

Click2Zap by Steve Clay (site is currently down) is something for you if you print web pages a lot. I dislike reading long texts on-screen (waiting for the Sony PRS-500), I print blog texts a lot.

Using click2zap you can remove unnecessary text or links from a web page before you print it. You can click on a paragraph, a red dotted border will appear. You click again, it’s deleted. Very neat.

BlummyWiki

What Google Notebook has added to Firefox, has been possible through Blummy for a long time. Cross-browser.

blummywiki: google notebook

When you surf around you can use the BlummyWiki to store information you find on a page for later use. Or maintain a list of your most often visited pages. Most useful thing about it is that can use the same blummy account on multiple browsers and multiple computers.

Andy Roberts already had some ideas what to do with the BlummyWiki. Check it out.

, , , , ,

Firefox 1.5, XmlHttpRequest, req.responseXML and document.domain

Recently I have been working on a web application, extending it with an iframe on another subdomain.

When you set up communication with an iframe on another subdomain, it works by setting document.domain in both pages. Pretty nice and straight forward.
But it can mess up the rest of your page.

As soon as you have set document.domain you should be able to do an XHR to your original domain according to the same domain policy.

This will work in IE, Safari, and Opera.
This will not work in Firefox 1.0. This is very awkward but at least it has been fixed in 1.5.
So it will work in Firefox 1.5. But:

The responseXML object is useless. You can’t access it, you receive a Permission Denied when trying to access it’s content (e.g. documentElement). Very annoying.
Even stranger that responseText is still readable. What’s the reason for this? Is there some security risk i am unaware of or is it a plain bug?

As the responseText is available there is a pretty simple fix: re-parse the XML, which is kinda stupid and cpu intense if you have a lot of them. (something like: var doc =
(new DOMParser()).parseFromString(req.responseText, "text/xml");
)

I have some sample code available here.

Apparently a bug report has been filed at 1.5.0.1. No response from developers. Great.
Unfortunately it has only been filed for OSX, but it also afffects Windows Firefox.

Mozilla guys, fix this ASAP.

Update 2007-06-21: Things seem to start moving, we will likely have a fix for Firefox 3.

, , ,

Misuse of the Array Object in JavaScript

There is a very good post about Associative Arrays considered harmful by Andrew Dupont.

The title is a bit misleading but correct. When coming accross a piece of JavaScript like this
foo["test"] = 1;
there is nothing wrong about it. It’s the basic usage scheme of assoziative arrays. Or should i rather say objects?

While in languages such as PHP arrays used like this $foo = array("test" => 1); is perfectly correct.

In JavaScript
var foo = new Array();
foo["test"] = 1;

works but does not do what you want.

I don’t need to repeat Andrew’s really great post, but basically you should use Object instead of Array.

var foo = new Object(); // same as var foo = {};
foo["test"] = 1; // same as foo.test = 1;

Now go and read Andrew’s post.

via Erik Arvidsson.

btw: that post lead me to Object.prototype is verboten which explains for me why my for (i in myvar) {} loops never worked correctly. I was using prototype.js version < 1.4 (which messed with Object.prototype).

, , , ,

Blummy wins Web 2.0 Bookmarking Award!

SEOmoz Web 2.0 Awards - WinnerI’m proud to announce that Blummy has been voted to first place of the bookmarking category by the Web 2.0 Awards by seomoz.org.

Blummy placed in front of Looksmart’s Furl.net and it’s similar sounding competitor spurl.net.

As all the other winners of each category, I have also been interviewed. I can highly recommend reading the other interviews, with

SEOmoz Web 2.0 Awards - Blummy Bookmarking Winner

Thanks for the award and congratulations for the excellent choice of nominees and winners.

, ,

Looking into the Skype Protocol

As you all know, Skype is a very popular Voice-over-IP software. Skype also claims that all its communication is encrypted (which raised some discussion whether you should considered a criminal (digged also) if you “hide away” from eavesdropping).

Philippe Biondi and Fabrice Desclaux from EADS held a talk at Blackhat Europe conference where they show their latest discoveries.

The talk is rather technical and might be hard to understand. I picked some of the most interesting points:

  • Almost everything is obfuscated (looks almost random)
    This is a sign for good use of encryption.
  • Automatically reuse proxy credentials
    When Skype gets to know how to use your proxy, it will hand on the information to other Skypes.
  • Traffic even when the software is not used (pings, relaying).
    I heard quite a few times of some office PCs being promoted to Supernodes, generating enormous traffic.
  • No clear identification of the destination peer
    The destination IP is not disclosed to a firewall for example, network administrators can’t block certain IPs.
  • Many protections, antidebugging tricks, and ciphered code
    This is an attempt to protect themselves from spies (i.e. hackers, government) but it might also hide away secret backdoors or them spying. This is often a problem of closed software.
    Using this techniques also hinders open source or simply 3rd party software from building compatible clients.

In Skype’s FAQ they state that they use AES encryption. This seems to be proved and seems a good thing, but they embed the data into a proprietary protocol which may have its drawbacks and is incompatible to others. It’s their right to do so, but this gives much power to those who know about the inner workings (this does not necessarily only include Skype).

They give as a conclusion:

  • Impossible to protect from attacks (which would be obfuscated)
    It basically means that we have to trust Skype that they keep up their secrets. There are very many users which makes the Skype audience an interesting target.
  • Total blackbox. Lack of transparency. No way to know if there is/will be a backdoor
  • Skype was made by clever people; Good use of cryptography
    They admit that it was built in a good way. But it’s like the government that may be suspicious if you encrypt all your communication. Skype encrypts everything and itself. Should we be suspicious?

Further readings: Skype network structure, Skype’s Guide for Network Administrators

, ,

digg it

Posted in web

Delicious Interface Updates

Today del.icio.us did some really nice interface updates.

In the first place, they announced inline editing which is very slick. You just click on “edit” on the “your bookmarks” page and you can edit the item right away.

They also updated the URL page which looks very nice and tidy now.

These updates don’t affect blummy, you can still use it to add your bookmarks from any page. If you haven’t seen it, give it a try.

The announcement also says that private bookmarking (one of the big missing features) will be released next week.

digg it, add to del.icio.us

, ,

A better understanding of JavaScript

I’ve been working with JavaScript for years. It was my replacement for a server side language when I couldn’t afford to buy web space in the mid-90’s. Still, as the language becomes popular again, I recognized that I did understand the basics but there was much more to the language.

digg it, add to delicious

So I dug into the topic a little deeper. I can highly recommend reading the blogs of all the great JavaScript guys like Alex Russell (of Dojo), Aaron Boodman, Erik Arvidsson (both at Google), Douglas Crockford (at Yahoo). (Give me more in the comments ;)

So, JavaScript is easy to start with. You can take a procedural approach like in C. You declare a function, you call a function.

A Survey of the JavaScript Programming Language (by Douglas Crockford) does an amazing job at explaining the notable aspects of the language on a quite short page.

I want to point out the most interesting points for me:

Subscript and dot notation
You can access a member of an object by using two different notations:

var y = { p: 1 };
alert(y["p"]); // subscript notation
alert(y.p); // dot notation

The great difference is that with subscript notation you can also access member vars that contain reserved words (of which there quite a few in JavaScript). Dot notation is shorter and more convenient.

Different meanings of the this keyword
Consider this piece of code creating a small object.

click here
<script type="text/javascript">
var myobject = {
id: 'obj',
method: function() {
alert(this.id);
}
};
myobject.method();
var l = document.getElementById("link");
l.onclick = myobject.method;
</script>

When you call myobject.method();, this points to the current object and you receive an alert box with the text ‘obj’. But there are exceptions:

If you call this function from within a HTML page via an onclick event, this is refers to the calling object (i.e. the link). You will therefore receive and alert box containing ‘link’ as message.

This can be useful in many cases, but if you want to access “your” object, you can’t. Aaron Boodman proposed a function that was eventually named hitch:

function hitch(obj, meth) {
return function() { return obj[meth].apply(obj, arguments); }
}

You’d use it like this: l.onclick=hitch(myobject, 'method'); Now the this keyword points at the correct object.

You could also change the function to something like this and still use the previous notation:

method = function() {
if (this != myobject) { return myobject.method(arguments); }
alert(this.id);
}

Creating objects with new
I was always wondering how to create objects from a class as I am used to with other programming languages, which means that by instanciating the object is created according to the “building instructions” of a class.

Douglas shows this in more detail on his Private Members in JavaScript page.

I’ve quickly hacked together this example:

var x = function () {
var created = new Date();
this.when = function () { alert(created); }
}
var p, u = new x();
window.setTimeout("n()", 1000);
function n () {
p = new x();
p.when();
u.when();
alert(typeof p.created);
}

You receive 2 objects p and u that have different creation times. They also have a private variable created which is only accessible via the public function when (because specified via this).

So even as you create an object by using the new Object() or {} notation, you only receive a static object. If you want to instanciate it, you need to create it as function.

Closures
The example above already demonstrated closures. The fact that closures exist in JavaScript make it only possible to create private variables.

A closure is, to put it simply, a function within another function. The inner function has access to it’s parents variables but not the other way round.

All together a function is just another data type that can be assigned to a variable. Therefore these two notations can be used interchangably:

function test() { alert(new Date()); }
var test = function() { alert(new Date()); }

The ominous prototype “object” is a way of using the this keyword from “outside”.
Modifying the piece of code from before:

var x = function () {
var created = new Date();
}
x.prototype.when = function () { alert(created); }

But there’s a pitfall. The created variable is private. Even though the function when now is a member of the object x it does not “see” the variable created. So in the original example the function when had privileged access (see Private Members in JavaScript).

Concluding
All in all I see that JavaScript is a powerful language. Many things that can be accomplished in an elegant (and sometimes quite unusual) way. (Curried JavaScript demonstrates even how to use it as a functional programming language)

I realize that there is a nice and clean solution for almost every problem you come across. This is where libraries come into play. The downside: you can quickly add tons of libraries, leading to large page sizes and memory consumption.

dojo for example is a really great library that provides you with numerous well thought-out functions, making your life a lot easier. But the size is 132 KB, just for the basic functions. More than a mega byte all in all. It circumvents needing to load everything by an in time loading mechanism (dojo.require).

In my opinion we’d need something like a local library storage. A Firefox extension would be a nice first step.
As far as I have looked into that topic, though, there are some difficulties. Foremost there is a problem with namespaces. Firefox clearly separates JS code by extensions from those coming from the web. A good thing, security-wise, but hindering in this case.

Maybe some Firefox guru can tell a way how to circumvent this, I think it might be worth a shot.

digg it, add to delicious

, , ,

Feature Updates (close and turn off password query)

Some new features now, mainly fulfilling public requests which I had to admit opened a few annoyances.

  • [Feature] Close Blummy after clicking on a blummlet. Blummy will disappear after you have clicked a blummlet. This option is under advanced view on the config screen and under Preferences.
  • [Feature] You can now turn off the password request when using blummy. Although most browsers should give you the option to remember the username/password combination, still here and there the query poped up. So now you can make your blummy available to everyone again (so it will omit the query).

I also created a small donation page. If you think that blummy is good value, I’d appreciate a small donation. Thanks!

10 Realistic Steps to a Faster Web Site

I complained before about bad guides to improve the performance of your website.

digg it, add to delicious

I’d like to give you a more realistic guide on how to achieve the goal. I have written my master thesis in computer sciences on this topic and will refer to it throughout the guide.

1. Determine the bottleneck
When you want to improve the speed of your website, you feel that it’s somehow slow. There are various points that can affect the performance of your page. Here are the most common ones.

Before we move on, you should always remember that you answer each question with your target audience in mind.

1.1. File Size
How much data is the user required to load before (s)he can use the page.

It is a frequent question, how much data your web page is allowed to have. You cannot answer this unless you know your target audience.

In the early years of the internet one would suggest a size of 30k max for the whole page (including images, etc.). Now that many people have a broadband connection, I think we can push the level to a value between 60k and 100k. Although, you should consider lowering the size if you also target modem users.

Still, the less data you require to download, the faster your page will appear.

1.2. Latency
The time it takes between your request to the server and when the data reaches your PC.

This time adds together from twice the network latency (which depends on the uplink of the hosting provider, the geographical distance between server and user, and some other factors) and the time it takes until the server produces the output.

Network latency can hardly be optimized without moving the server, so this guide will not cover this.
The processing time of the server combines complex time factors and contains most often much room for improvement.

2. Reducing the file size
First, you need to know how large your page really is. There are some useful tools out there. I picked Web Page Analyzer which does a nice job at this.

I suggest not spending too much time on this, unless your page size is larger than 100kb. So skip to step 3.

Large page sizes are nowadays often caused by large JavaScript libraries. Often you only need a small part of their functionality, so you could use a cut-down version of it. For example when using prototype.js just for Ajax, you could use pt.ajax.js (also see moo.ajax), or the moo.fx as a script.aculo.us replacement.

Digg for example used to have about 290kb, they now have reduced the size to 160kb by leaving out unnecessary libraries.

Also large images can cause large file sizes, this is often caused by the wrong image format. A rule of thumb: JPG for photos, PNG for most other aspects, especially if plain colors are involved. Also: use PNG for screen shots, JPGs are not only larger but also look ugly. You can also use GIF instead of PNG when the image has only few colors and/or you want to create an animation.

Also often large images are scaled via the HTML width and height attributes. You should do this in your graphical editor and scale it there. This will also reduce the size.

Old HTML style can also cause large file size. There is no need for thousands of tags anymore. Use XHTML and CSS!

A further important step to smaller size is on-the-fly compressing of your content. Almost all browsers already support gzip compression. For an Apache 2 web server, for example, there is the mod_deflate module can do this transparently for you.

If you don’t have access to your server’s configuration, you can use the zlib for PHP or for Django (Python) there is GZipMiddleware, Ruby on Rails has a gzip plugin, too.

Beware of compressing JavaScript, there are quite some bugs with Internet Explorer.

And for heaven’s sake, you can also strip the white space after you’ve completed the previous steps.

3. Check what’s causing a high latency
As mentioned, the latency can be caused by two large factors.

3.1. Is it the network latency?
To determine whether the network latency is the blocking factor you can ping your server. This can be done from the command line via the command ping servername.com

If your server admin has disabled the pinging function you can also use a traceroute which uses another method to determine the time tracert servername.com (Windows) or traceroute servername.com (Unix).

If you address an audience that is geographically not very close to you, you can also use a service such as Just Ping which pings the given address from 12 different locations in the world.

3.2. Does it take too long to generate the page?
If the ping times are ok, it might take too long to generate the page. Note that this applies to dynamic pages, for example written in a scripting language such as PHP. Static pages are usually served very quickly.

You can measure the time it takes to generate the page quite easily. You just need to save an time stamp at the beginning of the page and subtract it from the time stamp when the page has been generated. For example in PHP you do it like this (due to technical restrictions a space is inserted before the question mark):

< ?php // Start of the Page $start_time = explode(' ', microtime()); $start_time = $start_time[1] + $start_time[0]; ?>

and at the end of the page:

< ?php $end_time = explode(' ', microtime()); $total_time = $end_time[0] + $end_time[1] - $start_time; printf('Page loaded in %.3f seconds.', $total_time); ?>

The time needed to generate the page is now displayed at the bottom of it.

You can also compare the time between loading a static page (often a file ending in .html) and a dynamic one. I’d advise to use the first method because you are going to need that method to go on optimizing the page.

You can also use a Profiler which usually offers even more information on the generation process.

For PHP you can, as a first easy step, enable Output Buffering and restart the test.

Also you should consider testing your page with a benchmarking program such as ApacheBench (ab). This will stress the server via requesting several copies at once.

It is difficult to say what time suffices for generating a web page. It depends on your own requirements. You should try to keep the generation time under 1 second, as this is a delay which users usually can cope with.

3.3. Is it the rendering performance?
This plays only a minor role in my guide, but still this can be a reason why your page takes long to load.

If you use a complex table structure (which can render slowly), you most probably are using old style HTML, try to switch to XHTML and CSS.

Don’t use overly complex JavaScript, like slow scripts in combination with onmousemove events make a page real sluggish. If your JavaScript makes the page load slowly (you can use a similar technique as the PHP time measuring, using the (new Date()).getMilliseconds()), you are doing something wrong. Rethink your concept.

4. Determine the lagging component(s)
As your page usually consists of more than one component (such as header, login window, navigation, footer, etc.) you should next check which one needs tuning. You can do this by integrating a few of the measuring fragments to the page which will show you several split times throughout the page.

The following steps can now be applied to the slowest parts of the page.

5. Enable a Compiler Cache
Scripting languages recompile their script upon each request. As there are far more requests to the unchanged script, it makes no sense to compile the script over and over (especially when core development has finished).

For PHP there is amongst others APC (which will probably be integrated with PHP 6), Python stores a compiled version by itself.

6. Look at the DB Queries
At university most complex queries with lots of JOINs and GROUPs are taught, but in real life it can often be useful to avoid JOINs between (especially large) tables. Instead you do multiple selects which can be cached by the SQL server. This is especially true if you don’t need the joined data for every row. It really depends on your application, but trying without a JOIN is often worth it.

Ensure that you use query folding (also called query cache; such as the MySQL Query Cache). Because in a web environment the same SELECT statements are executed over and over. This almost screams for a cache (and explains why avoiding JOINs can be much faster).

7. Send the correct Modification Data
Dynamic Web pages often make one big mistake: They don’t have their date of last modification set. This means that the browser always has to load the whole page from the server and cannot use its cache.

In HTTP there are various headers important for caching: for 1.0 there is the Last-Modified header which plays together with the browser-sent If-Modified-Since (see specification). HTTP 1.1 uses the ETag (so called Entity Tag) which allows different last modification dates for the same page (e.g. for different languages). Other relevant headers are Cache-Control and Expires.

Read on about how to set the headers correctly and respond to them (1.0) and 1.1.

8. Consider Component Caching (advanced)
If optimizing the database does not improve your generation time enough, you are most likely doing something complex ;)
So for public pages it’s very likely that you will present two users with the same content (at least for a specific component). So instead of doing complex database queries, you can store a pre-rendered copy and use that when needed, to save time.

This is a rather complex topic but can be the ultimate solution to your performance problems. You need to make sure that you don’t deliver a stale copy to the client, you need think about how to organize your cache files so you can invalidate them quickly.

Most web frameworks give you a hand when doing component caching: for PHP there is Smarty’s template caching, Perl has Mason’s Data Caching, Ruby’s Rails has Page Caching, Django supports it as well.

This technique can eventually lead to a result when loading your page does not need any request to the data base. This can be a favorable result as a connection to the database is often the most obvious bottleneck.

If your page is not that complex you could also consider just caching the whole page. This is easier but makes the page usually feel less up-to-date.

One more thing: If you have enough RAM you should also consider storing the cache files in a RAM drive. As the data is discardable (as it can be re-generated at any time) a loss when rebooting would not matter. Keeping disk I/O low can boost the speed once again.

9. Reducing the Server Load
Consider that your page loads quickly and everything looks alright, but when too many users access the page, it suddenly becomes slow.

This is most likely due to a lack of resources on the server. You cannot add an indefinite amount of CPU power or RAM into the server but you can handle what you’ve got more carefully.

9.1. Use a Reverse Proxy (needs access to the server)
Whenever a request needs to be handled, a whole copy (or child process) of the web server executable needs to be held in memory. Not only for the time of generating the page but also until the page has been transferred to the client. Slow clients can cost performance. When you have many users connecting, you can be sure that quite a few slow ones will block the line for somebody else just for transferring back the data.

So there is a solution for this. The well known Squid proxy has a HTTP Acceleration mode which handles communication with the client. It’s like a secretary that handles all communication.

It waits patiently until the client has filed his request. Asks the web server to respond, quickly receives the response (while the web server can move on to the next request) and then will patiently return the file to the client.

Also the Squid server is small, lightweight, and specialized for that task. Therefore you need less RAM for more clients which allows a higher throughput (regarding served clients per time unit).

9.2. Take a lightweight HTTP Server (needs access to the server)
Often people also say that Apache is quite huge and does not do it’s work quickly enough. Personally I am satisfied with its performance, but when it comes to dealing with scripting languages that handle their web server communication via the (fast)CGI interface, Apache is easily trumped by a lightweight alternative.

It’s called LightTPD (pronounced “lighty”) and does a good job at doing that special task very quickly. You can already see from a configuration file that it keeps things simple.

I suggest testing both scenarios if you gain from using LightTPD or if you should stay with your old web server. The Apache Web Server is stable and is built on long lasting experience in the web server business, but LightTPD is taking it’s chance.

10. Server Scaling (extreme technique)
Once you have gone through all steps and your page still does not load fast enough (most obvious because of too many concurrent users), you can now duplicate your hardware. Because of the previous steps there isn’t too much work left.

The Reverse Proxy can act as a load balancer by sending its requests to one of the web servers, either quite-randomly (Round Robin) or server load driven.

Conclusion
All in all you can say that the main strategy for a large page is a combination of caching and intelligent handling of the resources helps you reach the goal. While the first 7 steps apply to any page, the last 3 points are usually only useful (and needed) at sites with many concurrent users.

The guide shows that you don’t need a special server to withstand slashdotting or digging.

Further Reading
For more detail on each step I recommend taking a look at my diploma thesis.

MySQL tuning is nicely described in Jeremy Zawodny’s High Performance MySQL. A presentation about how Yahoo tunes its Apache Servers. Some tips for Websites running on Java. George Schlossnagle gives some good tips for caching in his Advanced PHP Programming. His tips are not restricted to PHP as a scripting language.

digg it, add to delicious

, ,

Blummy: Major Update

I’m proud to announce a major update of Blummy.

Blummy now includes blummyWiki, a small notebook-wiki that can be displayed within the current page. So you could store favorite URLs or commonly used information there which will be available on any page just like Blummy. And of course it’s crossbrowser and crosscomputer.

In my opinion this is a very unique and useful feature.

There is now also custom color and css support. You can browse other users’ Blummys via the “Explore” link in the menu.
Furthermore Blummy is now secured by password at last (so private blummlets are quite secure now).

A few words on the blummyWiki: It is invoked by switching an option on the Prefs page or by using one of these blummlets: open on the left, open below and open on the right. (display all three blummlets in configuration view).

, ,

Announcing Wizlite: Collaborative Page Highlighting

So I’m not the first to write about my project? Well. Nice ;)

Wizlite takes the good old highlighting marker from paper to web. People get different colors and mark important sections on any homepage.

Users can create groups and wizlite away on a certain topic (either private or public).

You’d have to use it to experience how fun it is ;) So: http://wizlite.com/

, , ,

Speed up your page, but how?

Today I ran accross the blog entry by Marcelo Calbucci, called "Web Developers: Speed up your pages!".

It’s a typical example of good idea, bad execution. Most of the points he mentions are really bad practice.

He suggests reducing traffic (and therefore loading time) by removing whitespace from the source code, to write all code in lower case (for better compression?!?), reduce code by writing invalid xhtml, and to keep javascript function names and variables short. This is nit-picking. And results in a maintenance nightmare.

For big sites, e.g. Google, the white space reduction tricks make sense. But they have enourmous numbers of page impressions. Saving 200 bytes tops by stripping whitespace is nearly worthless for smaller sites. And not worth the trouble. Additionally I bet that Google does not maintain that page as such, but has created some kind of conversion script.

Other thoughts are quite nice but commonplace. Most of the comments (e.g. by Sarah) posted at that article reflect my opinion quite well and deal with each point in more detail.

For most dynamic pages the bottleneck for responding to a client request is the script loading (or running) time. I suggest the writer to read some articles about server caching (my thesis also deals with that topic) and optimization.

Often also the latency between client and server can be held responsible for considerable delays. As the client has to parse the HTML file to decide what files to load next, delays can sum up.

All in all, it’s a good idea to deal with the loading time of a page. But you have to search at the right place.

, , , ,

Further Speed Improvements

I have decided to turn on caching to full throttle.

This means that I don’t request the browser any more to check back with blummy.com (to see if something has changed in the configuration) when you click on your blummy bookmarklet. So if you have high latency times between you (e.g. when you are living on the opposite site of the planet) and blummy.com this will most probably be good for you.

The bad side — and that’s why I haven’t done this before — is that you might run into a stale copy of blummy more easily when you do some re-configuration. So here are two hints if you don’t see what you have configured:

  • Delete the blummy bookmarklet and re-drag it from the configuration page. There is a little random value that should prevent a stale copy.
  • Clear your browser cache.

I hope that you encounter a further speed increase with blummy and keep on spreading the word ;)

Introducing: Wish-O-Matic

So Christmas is approaching and you need to buy some presents for your friends. So good for the idea, but what to buy?

I’ve created a little web app for that, called Wish-O-Matic. You choose a few things of which you know that your friend likes them. That’s it. That app will tell you what would match these items.
The Amazon Web Service API is used, so the products are somewhat restricted to the products they offer.

Give it a shot: Wish-O-Matic

, , ,

Create your blummlet: Wizard

Do you miss a service on blummy? Well, create it for yourself. Don’t have a clue how Javascript works? Does not matter anymore.

  • [Feature] Wizard for create your own blummlet

This should make it quite easy to create a blummlet for any service yourself.

Because space is a little limited there, I give a more detailed introduction here:

Suppose there was no blummlet for searching at Google and you wanted to create one:

  1. First, log in and go to the create a blummlet page. Open the wizard there.
  2. Open a new window or tab and navigate to the service (i.e. http://www.google.com/).
  3. Enter the query BLUMMY and submit the form. Wait for the results to appear.
  4. Click the location bar of your browser and copy the whole URL (mark the text and press Ctrl-C or Cmd-C). This would be http://www.google.com/search?hl=en&q=BLUMMY&btnG=Google+Search for me.
  5. Go back to the create blummlet page and type in the service name (i.e. Google) and paste the URL to the corresponding field (Ctrl-V or Cmd-V).
  6. Click “Create”.
  7. The wizard should have disappeared and most of the fields have been filled out. Now enter a description and click “Save”. Done.

,

Loading Blummy…

People have been complaining that it took more than one click to open blummy. The problem is that it may take some time to transfer the blummy script source to your computer (due to slow server, network, whatever). So after the first click, blummy is loaded from blummy.com, when you click again (before blummy appears), you instantly close blummy again, so nothing happens. Only after the third click blummy opens.

Now our user fuska had the great idea to display a small “loading blummy” text, so the user knows what’s going on.

  • [Feature] “Loading blummy…” informs the user about the current ongoings.

The downside of this update is that you need to replace your blummy bookmarklet (as the loading text is displayed by the bookmarklet to do that instantaneously). So go to blummy.com, log in, and drag blummy to your toolbar again. That should do the trick.

Thanks, fuska!

, ,

Squid’s HTTP Acceleration Mode

I have recently configured a server of mine to use the Squid Cache in HTTP Acceleration mode. So what’s this anyway?

A typical request to a webserver looks like this: Client browser opens connection to server port 80, server sends back the data through that connection. For the time of the transfer the server “loses” one child process. So if a client with a slow connection requests a large file this can take some minutes. If many slow clients block child processes, eventually too few will be left for “ordinary” clients.

A solution for this is to prepend a proxy server to the HTTP server. The proxy server is lightweight and does the communication with the client browser. The communication with the web server is done via a high speed interface (either loopback when it’s just one server or an lan with 100(0) mbit), so almost no time is spent waiting for a transfer to finish.

Setup is easy, and I’ve covered this in my thesis already.

But I’ve got some more real-life info for you.

There are two usual ways for setting this up.

  1. Set the web server to listen on port 81, Squid on 80.
  2. Web server still listens on port 80 but just for 127.0.0.1, the loopback interface. Squid listens on port 80 on the external interface.

What makes number two the favourable is that you are not haveing a server process listening on an unconventional port, and, for redirects (Location: /somewhereelse) the port number is correct (see the corresponding question in the Squid FAQ). For existing configurations with virtual hosts there is no need to change a < VirtualHost *:80> to < VirtualHost *:81>.

So in ports.conf of Apache, for example, you change this:

# Listen 80
Listen 127.0.0.1:80

In squid.conf you do these changes (apart from those listed in my thesis):

# http_port 3128
http_port ip.add.re.ss:80

So this works nice already, but there is one more thing. Now the source address for a http request is 127.0.0.1. So if you want to do some processing with the REMOTE_ADDR, for example in PHP, you’d have to insert something like this before you’d could use the address again.

if (isset($_SERVER["HTTP_VIA"])) {
// squid http accel
$_SERVER["REMOTE_ADDR"] = $_SERVER["HTTP_X_FORWARDED_FOR"];
}

Also in the log files there is now a 127.0.0.1 as source instead of the real ip address. The following changes things back to normal (in apache2.conf):

# LogFormat "%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" combined

This should be all for now. Happy speed-boosting ;)

Favicon caching

Hello there,

You may have noticed that your blummy now opens a bit faster. This is due to a new feature:

  • [Feature] Favicons are cached on the blummy server.

So now your blummy will not hesitate to open because some server is not ready to serve his icon.

I hope this causes some speed-ups ;)

PS: In the first few hours there could be some hick-ups with caching, as some servers might be down while filling up the cache.

Using a Feedback Form for Spam

Have you ever received weird spam via the feedback form of your site? Something with your own address as sender or with some Mime stuff in the body? Your form is likely to be misused for spamming.

How does it work?

For PHP, for example, there is the mail function that can be used to easily send an e-mail. Most probably you’d use some code like this to send the message from your feedback form.

< ?php $msg .= "Name: " . $_POST["name"] . "\n"; $msg .= "E-Mail: " . $_POST["email"] . "\n"; $msg .= $_POST["msg"]; mail("my.e.mail@addr.es", "feedback from my site", $msg); ?>

That’s simple and works well, but it’s a little annoying if you want to answer that e-mail. You click the e-mail address to open a new message and have to paste the whole message into the new window for quoting. There’s an easy solution: Pretend that the e-mail comes from the customer requesting some info. This can be simply done via the additional_headers parameter of mail.

< ?php $sender = "From: " . $_POST["email"] . "\r\n"; $sender = "From: " . $_POST["name"] . " <" . $_POST["email"] . ">\r\n"; // even nicer, shows the name, not the address
mail("my.e.mail@addr.es", "feedback from my site", $msg, $sender);
?>

Well. We’ve just introduced 2 potential spamming opportunities. Why? Let’s see. For mail transport we use SMTP. Our outgoing mail might look like this (generated by mail).

From: tester < test@test.com>
To: my.e.mail@addr.es
Subject: feedback from my site

this is my message

(Before, the From would have looked something like From: webserver@mydomain.com)
So if the spammer manages to insert another field (like To, CC, or BCC), not only we would receive that e-mail but also the guy entered as CC. This works by inserting a line break into the name or e-mail address. For example, for a given name such as

Alex
CC: other@e.mail.addr.es

that would be the case.
Although this is usually not possible through a normal textbox () a post request can easily be constructed containing that linebreak and the malicious CC.

So be sure to strip out at least the characters \r and \n from name or e-mail address or just strip out any non-latin characters (people with german umlauts in their names, for example, will have to live with that).

So a quite good method would be to use this piece of code:

$name = preg_replace("|[^a-z0-9 \-.,]|i", "", $_POST["name"]);
$email = preg_replace("|[^a-z0-9@.]|i", "", $_POST["email"]);
$sender = "From: " . $name . " < " . $email . ">\r\n";
mail("my.e.mail@addr.es", "feedback from my site", $msg, $sender);

The conclusion is simple (and always the same one): Never trust any data you receive from a user.
Verify all data you receive and strip potentially harmful characters. Common bad characters are:

  • for mails: \r, \n,
  • for HTML: < , > (you could use htmlspecialchars for that),
  • for URLs: &, =,
  • complete the list in the comments ;)

Ah, the conclusion. Never trust any data you receive from a user.

, , ,

Moderation in place

The latest update to blummy covers moderation.
I have cleaned up the blummlets so that there should not be too many blummlets doing the same thing left. While cleaning up I had to merge a few blummlets; if everything went okay you wouldn’t have noticed anything, if not, a blummlet might have disappeared from your blummy. Sorry for that.

  • [GUI] A check next to a blummlet means that it has been verified by a moderator.
  • [Feature] In preferences you can check whether you want to see unverified blummlets, too (default is no)
  • [Feature] You can now choose to open a blummlet in a new Window or Tab (Preferences under advanced). This is actually a not seen feature for bookmarklets. It is established by using the Blummy library function Blummy.href() which resembles location.href = url as Blummy.href(url).
  • [GUI] Cleaned up the preferences window a bit.

With cleaning up I moved almost each blummlet using location.href to use Blummy.href so that for most of them the “open in tab” feature should work. Additionally I have converted all the document.selection/window.getSelection etc. to Blummy.getSelection so that using the selection should work in almost any browser now.

,

Now with Preferences

So the latest release is online. The additions this time:

  • [GUI] There is now a simple and advanced mode.
  • [Feature] A Preferences page is now available.
  • [App] You can now change blummy’s opacity in the Preferences.
  • [App] You can now change the close button position in blummy in the Preferences.
  • [Feature] There is now a Save button if XHR does not work for you in configuration.
  • [GUI] A small Flash movie on the Start Page shows how blummy works.