Prototype: Create a Website from a Screenshot and Refine It, All in the Browser

I’ve been working on this experiment, combining OpenAI’s gpt-4-vision-preview with WordPress Playground to create a website based on a screenshot.

This follows on the heels of Matt Mullenweg’s announcement at the 2023 State of the Word that in 2024, the WordPress project wants to work on Data Liberation.

While the typical approach to migrating data is to build importers for specific services, a truely universal migration could happen through screenshots. With AI vision this might now be in reach.

So I built this prototype that combines a OpenAI-powered chat interface with WordPress Playground. First a screenshot, a screen recording further down.

So this is only a start. The website somewhat resembles the screenshot but it’s far from being pixel perfect.

My idea is that you’ll work together with the assistant in refining the site. It can help you update, you can ask it questions. An import is rarely perfect from the start but you can see and test the result right away in the browser and refine it.

I imagine that when done, you can then transfer the site to a web hoster who from then on can host your website for everyone.

You can try this yourself here, you “just” need an OpenAI API key that will be stored in your local storage: https://akirk.github.io/website-liberator/

Source (very unpolished) at https://github.com/akirk/website-liberator/

Some notes on this first implementation:

  • Every message is a new conversation. Modifying a website can be token intense, so for now it cannot refer to previous messages.
  • It’s using function calling to allow OpenAI to gather more information to fulfil the request.
  • Chosing the right functions to provide can be tricky.
  • It uses different models depending on the task. gpt4-vision-preview for the screenshot, gpt-3.5-turbo for the rest. I need to experiment more with gpt-4 for the latter tasks.

Finally, a screen recording of an early iteration, I have since moved the chat to the right side.

As I’ve been working on the prototype, it has shown to be interesting to have the bot be there just for customizing sites, it can create and modify pages, update settings of the website. Maybe install plugins.

So starting with a basis from screenshots and imported data, it might just be able to assist you to arrive at a comparable WordPress website, and all with the ease and effortless setup of WordPress Playground. I wonder where we can take this!

Bonus

Some screenshots from a recent version:

New Unit Tests Need to Fail (Running the Old Code)

When possible, I very much recommend adding unit tests to a Pull Request when you fix something as a way to prevent that it breaks again in future. This is no news.

One important step of adding that unit test, though, is to make sure it actually tests the bug you are fixing. Specifically, this means that you need to test that fact like this:

Keep the new unit test, undo the other code changes. The unit test now needs to fail.

If your newly introduced unit test still passes, then you didn’t expose the bug in the test. A simple check but often neglected.

A tooling solution

Here is how you can automatically test this in Github. I have created a sample repo with 3 pull requests https://github.com/akirk/unit-test-failure/pulls:

3 Pull requests of which one fails the unit tests

You can see the first one failed:

A screenshot of a Github Action that has a unit test passing when it shouldn't

The problem is that the old code already passes the new tests. This means the tests don’t test the change.

The second one passes but only because no new tests were added at all.

A screenshot of a Github Action that skipps testing without the unit test changes because no new unit test was added

And the third one does it correctly. The old code fails the new tests as desired:

A screenshot of a Github Action that has correctly fails the unit test when running the old code

Here is the code for the Github Action to test this:

name: Pull Request Unit Test Validation

on:
  pull_request:
    types:
      - opened
      - synchronize

jobs:
  test-changes:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - name: Run unit tests with changes
        run: python -m unittest discover -s tests

  test-without-changes:
    needs: test-changes
    runs-on: ubuntu-latest
    if: ${{ github.event_name == 'pull_request' }}

    steps:
      - name: PR commits + 1
        run: echo "PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 1 ))" >> "${GITHUB_ENV}"

      - name: Checkout PR branch and all PR commits
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: ${{ env.PR_FETCH_DEPTH }}

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - name: 'Fetch the other branch with enough history for a common merge-base commit'
        run: git fetch origin ${{ github.event.pull_request.base.ref }}

      - name: Check if tests were changed
        run: |
          git restore --source=$(git merge-base origin/${{ github.event.pull_request.base.ref }} HEAD) --worktree tests/
          if git diff --quiet; then
            echo "TESTS_CHANGED=0" >> "${GITHUB_ENV}"
          else
            echo "TESTS_CHANGED=1" >> "${GITHUB_ENV}"
          fi
          git restore .

      - name: Revert code changes (excluding tests)
        run: git restore --source=$(git merge-base origin/${{ github.event.pull_request.base.ref }} HEAD) --worktree :^tests/
        if: ${{ env.TESTS_CHANGED }} == '1'

      - name: Run unit tests without changes
        run: |
          if [[ ${{ env.TESTS_CHANGED }} -eq 0 ]]; then
            echo "No unit test changes detected, skipping"
            exit 0
          fi

          if python -m unittest discover -s tests; then
            echo "Unit test should not pass"
            exit 1
          else
            echo "Unit test failed as expected"
          fi

Thanks @jesusamieiro-com for pointing out that you need to be careful with this for code that adds new test coverage. Probably best to use a Github label to activate or deactivate this action.

Using Jetpack Blocks without a connection to WordPress.com

Recently I wanted to use a Jetpack block for the WordPress Gutenberg editor, specifically the Slideshow Block, which I think is very slim and elegant compared to other Slideshow blocks on the WordPress.org plugin directory.

The one thing I didn’t want to do, though, is to connect my site to WordPress.com. For the most part because I didn’t intend to use any of the functionality that requires the WordPress.com connection.

The difficulty in this: normally, you can’t actually use Jetpack until you connect the site to your WordPress.com account. This behavior is in place so you can enjoy a friction-less experience after the connection is established (i.e. you can activate any of its features whether or not it needs the connection).

In my case, the connection is not needed for using the Slideshow block since it works with the local media library.

So, I present not a hack but an official way to use Jetpack without a WordPress.com connection: it’s called Offline Mode.

You activate it by adding a constant or filter to your wp-config.php, for example:

define( 'JETPACK_DEV_DEBUG', true );

While it is intended for local installs and actually development, it is useful if you just want to use a single feature of Jetpack that doesn’t require the connection.

Many people say that Jetpack is slow because it’s so big. It is indeed huge from a file size and feature perspective, but also it’s very optimized, so that it will only load the PHP or JS code it needs. The other files lay dormant in your file system and don’t contribute to the loading or rendering time of your site.

SSL Certificate Expiry Warning Script

With the increasing trend of SSL on the web, where Google values SSL sites higher and you can have your site be added to the HSTS preload list (the browser will first try HTTPS before trying HTTP), it is a good idea to start using SSL yourself.

The downside: you need to get a certificate through a (pre-trusted by the browser) CA, or certificate authority. This usually costs money, though there are some services that give you a certificate for free. The free certificates only last for one year or less, this means you need to request and install a new certificate frequently, especially when you have multiple domains.

Now it can happen to anyone, even Microsoft (Windows Azure Service Disruption from Expired Certificate), that you forget to renew (and update) your certificate in time.

There is a nice service called certalert.me (interestingly enough not over HTTPS) that will send you an e-mail when a certificate is due to be updated. But as with any web service, unfortunately you can never be sure how long it’s going to live.

So, I have created a script that I run through a cronjob every day that will send me a notification e-mail several times in advance (1 day and 2 7 14 30 60 days ahead), so that you are not dependent on a third party to get notified about expiries. As it is supposed to be with cronjobs, there is no output when there is nothing to report (thus no e-mail).

Here is the script (download warn_about_certificate_expiry.sh):


#!/bin/sh 

CertExpiries=$(mktemp)
for i in /etc/certificates/*.pem; do
	echo $(basename $i): $(openssl x509 -in $i -inform PEM -text -noout -enddate | grep "Not After" | tail -1 | awk '{print $4, $5, $7}') >> $CertExpiries
done

Date=$(date -ud "+1 day" | awk '{print $2, $3, $6}')
Expiries=$(grep "$Date" $CertExpiries)
if [ $? -eq 0 ]; then 
	echo These Certificates expire TOMORROW!
	echo $Expiries
	echo
fi
for i in 2 7 14 30 60; do
	Date=$(date -ud "+$i day" | awk '{print $2, $3, $6}')
	Expiries=$(grep "$Date" $CertExpiries)
	if [ $? -eq 0 ]; then 
		echo These Certificates expire in $i days:
		echo $Expiries
		echo
	fi
done
rm $CertExpiries;

Add a Rate Limit to Your Website

Suppose you have a ressource on the web (for example an API) that either generates a lot of load, or that is prone to be abused by excessive use, you want to rate-limit it. That is, only a certain number of requests is allowed per time-period.

A possible way to do this is to use Memcache to record the number of requests received per a certain time period.

Task: Only allow 1000 requests per 5 minutes

First attempt:
The naive approach would be to have a key rate-limit-1.2.3.4 (where 1.2.3.4 would be the client’s IP address) with a expiration time of 5 minutes (aka 300 seconds) and increment it with every request. But consider this:

10:00: 250 reqs -> value 250
10:02: 500 reqs -> value 750
10:04: 250 reqs -> value 1000
10:06: 100 reqs -> value 1250 -> fails! (though there were only 850 requests in the last 5 minutes)

Whats the problem?

Memcache renews the expiration time with every set.

Second attempt:
Have a new key every 5 minutes: rate-limit-1.2.3.4-${minutes modulo 5}. This circumvents the problem that the key expiration but creates another one:

10:00: 250 reqs -> value 250
10:02: 500 reqs -> value 750
10:04: 250 reqs -> value 1000
10:06: 300 reqs -> value 300 -> doesn’t fail! (though there were 1050 requests in the last 5 minutes)

Solution:
Store the value for each minute separately: rate-limit-1.2.3.4-$hour$minute. When checking, query all the keys in the last 5 minutes to calculate the requests in the last 5 minutes.

Sample code:


foreach ($this->getKeys($minutes) as $key) {
    $requests += $this->memcache->get($key);
}

$this->memcache->increment($key, 1);

if ($requests > $allowedRequests) throw new RateExceededException;

For your convenience I have open sourced my code at github: php-ratelimiter.

munin smart plugin: ignore error in the past

As a hard drive in my server failed, my hosting provider exchanged the drive with another one which obviously had some sort of error in its past, but now seems to be fully ok again. I would have wished to receive a drive without any problems but as my server is RAID 1, I can live with that.

I do my monitoring with Munin and for monitoring my hard drives I use the smart plugin. Now this plugin also monitors the exit code of smartctl, where smartctl sets bit no 6 if there was an error in the past, so now while everything is alright, the exit code is always numeric 64.

Now the smart plugin reports this as an error, if the exit code is > 0, i.e. now it always reports a problem.

I could set the threshold to 65, but then I wouldn’t be notified of other errors which essentially makes the plugin useless.

I asked at Serverfault but no one seems to have a solution for that.

So I attacked the problem on my own and patched the plugin. In the source code the important line is here:


if exit_status!=None :
# smartctl exit code is a bitmask, check man page.
num_exit_status=int(exit_status/256)

which I have modified to look like this:

if exit_status!=None :
# smartctl exit code is a bitmask, check man page.
num_exit_status=int(exit_status/256)
# filter out bit 6
num_exit_status &= 191
if num_exit_status<=2 : exit_status=None if exit_status!=None :

Now it doesn't bug me anymore when bit 6 is set, but if any other bit goes on again, I will still be notified. The most interesting part is the line where there is a bitwise operation with 191: this is 0x11011111 in binary, so doing an AND operation with the current value it will just set bit no 6 to 0 while letting the other values untouched.

Therefore a value of 64 (as mine does) will be reported as 0 while a value of 8 would remain at 8. But also, very importantly, a value of 72 (bit 6 set as always and bit 3 set because the disk is failing) it would also report 8.

And there we have another reason, why it is good to be firm with knowledge about how bits and bytes behave in a computer. Saved me from a warning message every 5 minutes :-)

preg_match, UTF-8 and whitespace

Just a quick note, be careful when using the whitespace character \s in preg_match when operating with UTF-8 strings.

Suppose you have a string containing a dagger symbol. When you try to strip all whitespace from the string like this, you will end up with an invalid UTF-8 character:

$ php -r 'echo preg_replace("#\s#", "", "?");' | xxd
0000000: e280

(On a side note: xxd displays all bytes in hexadecimal representation. The resulting string here consists of two bytes e2 and 80)

\s stripped away the a0 byte. I was unaware that this character was included in the whitespace list, but actually it represents the non-breaking space.

So actually use the u (PCRE8) modifier as it will be aware of the a0 “belonging” to the dagger:

$ php -r 'echo preg_replace("#\s#u", "", "?");' | xxd
0000000: e280 a0

By the way, trim() doesn’t strip non-breaking spaces and can therefore safely be used for UTF-8 strings. (If you still want to trim non-breaking spaces with trim, read this comment on PHP.net)

Finally here you can see the ASCII characters matched by \s when using the u modifier.

$ php -r '$i = 0; while (++$i < 256) echo preg_replace("#[^\s]#", "", chr($i));' | xxd 0000000: 090a 0c0d 2085 a0 $ php -r '$i = 0; while (++$i < 256) echo preg_replace("#[^\s]#u", "", chr($i));' | xxd 0000000: 090a 0c0d 20

Functions operating just on the ASCII characters (with a byte code below 128) are generally safe, as the multi-byte characters of UTF-8 have a leading bit of one (and are therefore above 128).

Restoring single objects in mongodb

Today I had the need to restore single objects from a mongodb installation. mongodb offers two tools for this mongodump and mongorestore, both of which seem to be designed to only dump and restore whole collections.

So I’ll demonstrate the workflow just to restore a bunch of objects. Maybe it’s a clumsy way, but we’ll improve this over time.

So, we have an existing backup, done with mongodump (for example through a daily, over-night backup). This consists of several .bson files, one for each collection.

  1. Restore a whole collection to a new database: mongorestore -d newdb collection.bson
  2. Open this database: mongo newdb
  3. Find the items you want to restore through a query, for example: db.collection.find({"_id": {"$gte": ObjectId("4da4231c747359d16c370000")}});
  4. Back on the command line again, just dump these lines to a new bson file: mongodump -d newdb -c collection -q '{"_id": {"$gte": ObjectId("4da4231c747359d16c370000")}}'
  5. Now you can finally import just those objects into your existing collection: mongorestore -d realdb collection.bson

trac Report for Feature Voting

I use trac for quite a few projects of mine. Recently I tried to find a plugin for deciding which features to implement next. Usually trac hacks has something in store for that, but not this time.

I wanted to be able to create a ticket and then collect user feedback as comments for the feature, with each piece of feedback being a vote for that feature, like this:

After searching for a bit I came up with a solution by using just a report with a nicely constructed SQL query.

SELECT p.value AS __color__,
   t.type AS `type`, id AS ticket, count(tc.ticket) as votes, summary, component, version, milestone,
   t.time AS created,
   changetime AS _changetime, description AS _description,
   reporter AS _reporter
  FROM ticket t, ticket_change tc, enum p
  WHERE t.status <> 'closed'
AND tc.ticket = t.id and tc.field = 'comment' and tc.newvalue like '%#vote%'
AND p.name = t.priority AND p.type = 'priority'
GROUP BY id, summary, component, version, milestone, t.type, owner, t.time,
  changetime, description, reporter, p.value, status
HAVING count(tc.ticket) >= 1
 ORDER BY votes DESC, milestone, t.type, t.time

So just by including “#vote” in a comment, it would count towards the number of votes. You can change this text to anything you want, of course. For example like this:

I hope this can be useful for someone else, too.