Welcome to Geoffrey Swift's βlog. Please subscribe to the Atom feed.


ITV Player adverts skipped

Watching the adverts on the ITV Player was taking up enough time that it seemed worth blocking them in the web proxy.

I considered various ways of doing this, and found it's quite easy to block any URLs which start with http://sam.itv.com/XTSERVER/

ITV Player requests VAST XML files from there. The XML files describe each advert, and are created by the "Atlas AdManager" server software.

Since I started blocking the files which contain the URLs for the advert videos, I don't see any adverts when watching this station online either.

Demand Five Adverts skipped

In an old post I detailed how to avoid advert videos on the 4oD service. I have again made use of the redirect_program feature of the Squid web proxy, but this time the adverts from Channel Five's Demand Five service are being removed.

I set about analysing how Demand Five works, by using the Fiddler web debugger from Microsoft. It became apparent that similar to the Channel Four service, there was a play list of (Flash) video files in an XML format.

So I attempted using the exact same technique as for 4oD. I created a "man in the middle" to filter out the adverts from the Demand Five play list, using my own PHP and XSLT code. Unfortunately the doctored play list I generated was rejected by the Flash player as invalid, presumably because required elements (adverts) were lacking. It's not clear whether this is an intentional feature in Demand Five or not! Either way a different technique was needed.

I have avoided seeing banner adverts for some time, by having Squid serve up a 1 pixel by 1 pixel transparent GIF instead. Since modifying the XML play list on the fly didn't work out, I decided to get Squid to serve up a video of 0 seconds duration in place of an advertisement.

I searched the web and found a suitable dummy Flash video of 0 seconds length to use. So I just needed to let Squid know I wanted the dummy video instead of the advert videos. To do this I added the following line to my existing Perl script squid-redirector.pl:

s@http://[^ ]*\.akamai\.net/[^ ]*five\.tv[^ ]*\.flv@http://www.trollied.org/~blimey/fake/dummy.flv@;
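
To see what that substitution does to a request, here is a rough JavaScript rendering of the same pattern. It is only an illustration: the real redirector is the Perl script mentioned above, the function name rewriteLine is invented, and the two request lines are made-up examples that assume Squid hands the redirector the URL first, followed by space-separated details about the client.

```javascript
// JavaScript rendering of the Perl substitution above (illustration only).
const ADVERT_PATTERN = /http:\/\/[^ ]*\.akamai\.net\/[^ ]*five\.tv[^ ]*\.flv/;
const DUMMY_VIDEO = "http://www.trollied.org/~blimey/fake/dummy.flv";

// Rewrite one redirector input line. [^ ]* never crosses a space, so only
// the URL at the start of the line can be replaced.
function rewriteLine(line) {
  return line.replace(ADVERT_PATTERN, DUMMY_VIDEO);
}

// Hypothetical request lines, made up for this example.
const advert = "http://a1.g.akamai.net/7/1/five.tv/adverts/spot.flv 192.168.0.2/- - GET";
const other = "http://www.example.com/index.html 192.168.0.2/- - GET";

console.log(rewriteLine(advert)); // advert URL swapped for the dummy video
console.log(rewriteLine(other));  // unrelated URL passes through untouched
```

Anything that does not match the pattern is echoed back unchanged, which is what lets one redirector script carry many independent rules.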

Now I can enjoy Neighbours without having to wait for the adverts to finish. This is particularly useful when the playback gets interrupted and you have to start from the beginning.

Website pet peeves part 4

It's been a while since the last rant, so here are a few serious concerns for web developers to consider!

Resending POST data

The "back" button is a very useful feature to have in web browsers. Unfortunately you can be prevented from going back to a page that is generated in response to filling in a form.

When you try to do this, the browser explains that the page can't be displayed without resubmitting the form data. This is because there are special rules which generally prevent the response to an HTTP POST from being cached.

So you can resubmit the form, but you might not get the same results the next time around. There is no option of just seeing the original version of the page. The browser could ask you whether you wanted to see the old version, and could perhaps warn you that showing it would be in defiance of web standards. That would have to be an improvement over what is essentially a flat refusal to behave intuitively.

Web programmers can work around this problem, by making a web server respond to a web browser's posted form data appropriately. The solution is to instruct the web browser to be redirected to another web page, which can be retrieved conventionally. The page you're redirected to appears in the browser's history instead, so your back button can work as normal.

For further technical details, I recommend reading the HTTP RFCs and the Wikipedia article on the "Post Redirect Get" pattern.
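
As a rough sketch of that pattern, the request handler below answers a form POST with a 303 See Other redirect, so the page that ends up in the browser's history is one fetched with GET. It is written against Node.js style request and response objects; the paths /order and /order/thanks and the name handleRequest are invented for this example.

```javascript
// Minimal Post/Redirect/Get sketch. The paths and the function name are
// hypothetical; only the 303 redirect is the point.
function handleRequest(req, res) {
  if (req.method === "POST" && req.url === "/order") {
    // ... process the submitted form data here ...
    // 303 See Other tells the browser to fetch the next page with GET.
    res.writeHead(303, { Location: "/order/thanks" });
    res.end();
  } else if (req.method === "GET" && req.url === "/order/thanks") {
    // This is the page that lands in the history, so "back" and
    // "refresh" never need to resend the form.
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("Thanks, your order was received.");
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found");
  }
}
```

In Node.js the handler could be served with require("http").createServer(handleRequest).listen(8080), where the port number is arbitrary.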

Click to close window

Many web pages, particularly those in popup windows, have a hypertext link entitled "Click to close window". I feel this is redundant, since web browser windows can be closed like any other. Given that web browsers open new "windows" as tabs, the wording itself isn't quite right either.

The only case for having such a link is for popup windows created in such a way as to specifically hide the standard close window option. Even so this appears to be a case of trying to invent a rounder wheel.

Print friendly version

When viewing websites like Google Maps, it can be useful to make a hard copy so you can review the information later. Unfortunately what you see on screen doesn't necessarily work so well on paper. This is what prompts web developers to create a "printer friendly" version.

While this seems like a great idea, I feel this is really just a sorry excuse that they couldn't figure out how to make a web page that appears correctly both on screen and on paper.

There exists a useful feature in CSS2, which lets you define formatting based on the "media" being used. You can use this to make your web page look totally different when printed.

For an example of this in action, try using the "print preview" option in your browser when looking at my website. You should see immediately that the navigation menu disappears and background colours are turned off.

A useful article on this can be found on about.com

Bypass JavaScript with Squid

I mentioned previously how I used the redirect_program feature in the Squid web proxy to filter out adverts from Channel 4's 4oD service. I did that by intercepting and modifying ASX files. In this article I explain how similarly one can deactivate unwanted JavaScript code.

One motivation for intercepting JavaScript is to avoid advertising, and I'm also bypassing Google Analytics. For an entirely self contained JavaScript, your own blank script can be substituted. This works well for example with Google's advertising script show_ads.js.

To use Google Analytics, web site authors must write their own code to invoke the functions Google supply in their JavaScript files. So if Google's code is simply replaced by a blank file, JavaScript errors will result as required functions have not been defined. I've therefore written my own versions of their scripts which mimic the necessary code entry points, but don't actually do anything.

A replacement for the original Google file urchin.js was trivial:

function urchinTracker(){}

but the recently updated Google Analytics script ga.js was a bit more complicated:

var _gat = { _getTracker:function(s) { return { _initData:function(){}, _trackPageview:function(){} } } }

The scripts mentioned above are very popular on many websites. My replacement versions are obviously going to help speed up load times on several web pages, as the web browser has less work to do. Another bonus is that I avoid being included in my own Analytics data, and I get a better idea of who else is looking at my site!
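
To illustrate why the stub is enough, here is a sketch that pairs the ga.js replacement above with the kind of tracking snippet that pages typically included at the time. The account ID is a placeholder, and runPageSnippet is a hypothetical wrapper added for this example, not part of Google's API.

```javascript
// The ga.js replacement from above, spread over several lines.
var _gat = {
  _getTracker: function (s) {
    return {
      _initData: function () {},
      _trackPageview: function () {}
    };
  }
};

// Page-side code as a site author might have written it. With a blank
// ga.js this would fail because _gat is undefined; with the stub every
// call resolves to a function that does nothing.
function runPageSnippet() {
  var pageTracker = _gat._getTracker("UA-XXXXXX-X"); // placeholder ID
  pageTracker._initData();
  pageTracker._trackPageview();
  return true;
}

console.log(runPageSnippet());
```

A page that calls other tracker methods would still raise an error, so the stub may need extra empty functions added as they turn up.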

XHTML DTD in XML catalog

I read an article, "W3C Gets Excessive DTD Traffic", on Slashdot last month. This struck a chord since I use XSLT to generate XHTML, and had noticed that the XHTML DTD is downloaded each time it is referenced. This both wastes bandwidth and slows down the XSLT considerably.

To remedy this problem, I downloaded the DTD files to my Linux box and added some XML catalogue entries. This ensures local copies of these files are used instead of repeatedly downloading them from w3.org.

Here are the shell commands I used on my Slackware 12 box to set things up:

mkdir /usr/share/xml/xhtml1
cd /usr/share/xml/xhtml1
wget http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd \
  http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd \
  http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd \
  http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent \
  http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent \
  http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
xmlcatalog --noout --add rewriteURI http://www.w3.org/TR/xhtml1/DTD/ \
  file:///usr/share/xml/xhtml1/ \
  /etc/xml/catalog

To confirm this was working as expected I ran xsltproc under strace. I did however discover that what I'd done had no effect on libxslt in PHP. I looked through the source code for PHP, and found no code to initialise libxml to use XML catalogues. This would be a nice feature; hopefully it will be included in a future release of PHP.

Atom to RSS 2.0 feed conversion

I am pleased to say that there is now a valid RSS 2.0 feed available for my blog. Previously there was just the Atom 1.0 feed, since this is the format I am using for storage.

I couldn't find any available Atom to RSS conversion utilities, except for atom2rss.xsl from http://atom.geekhood.net/. I had tried using this a while ago, but found it would not produce valid RSS and even corrupted parts of my text through inappropriate handling of escape sequences.

Feeling a bit more confident in my XSLT, I set about addressing these issues. The feed validation service from w3.org now tells me that "this is a valid RSS feed", and so I consider this to be good enough for my purposes.

It is slightly disappointing that RSS limits you to a single enclosure (podcast) for each item in a channel, whereas Atom allows for multiple enclosures. Because of this it is not recommended to use the RSS feed since you will actually lose out on some of the mp3 content.
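
The loss can be pictured with a small sketch. It assumes, purely for illustration, that a converter keeps the first enclosure link of an Atom entry and drops the rest; the function name, the object shape and the example.com URLs are all made up, and the real atom2rss.xsl may choose differently.

```javascript
// Hypothetical entry shape: each Atom <link> is represented as
// { rel, href, type }. Keeping the first enclosure is an assumption.
function pickRssEnclosure(links) {
  const enclosures = links.filter((link) => link.rel === "enclosure");
  return {
    kept: enclosures.length > 0 ? enclosures[0] : null,
    dropped: enclosures.slice(1),
  };
}

// Made-up entry with two mp3 attachments.
const entryLinks = [
  { rel: "alternate", href: "http://example.com/post", type: "text/html" },
  { rel: "enclosure", href: "http://example.com/part1.mp3", type: "audio/mpeg" },
  { rel: "enclosure", href: "http://example.com/part2.mp3", type: "audio/mpeg" },
];

const result = pickRssEnclosure(entryLinks);
console.log(result.kept.href);      // the one enclosure the RSS item can carry
console.log(result.dropped.length); // how many attachments are left out
```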

My Atom feed has brought to light some cases that cause a problem with the XSLT code, and there are probably other valid Atom feeds that would not be translated to a valid RSS feed. Nonetheless I feel this is an improvement and am releasing this updated version into the public domain.

I've emailed my fixes over to porneL who originally wrote atom2rss.xsl, and he's kindly updated his site to include my changes.

Radio Tuner in IE7 - XSLT meets conditional comments

Getting Internet Explorer 7 to work with the HTML version of my internet radio tuner was no walk in the park. Read further to learn how to avoid the same pitfalls.

The first problem was that using <object type="application/x-mplayer2" data= … > yields the error "Internet Explorer has blocked this site from using an ActiveX control in an unsafe manner. As a result, this page may not display correctly." Puzzlingly I found that there is no such warning when using <embed type="application/x-mplayer2" src= … > instead.

Using <embed> rather than <object> just for the sake of Internet Explorer didn't make me feel entirely happy, as this proprietary tag is not valid HTML. To solve this problem and retain the validity of my HTML, I chose to use the even more proprietary trick specific to Internet Explorer - conditional comments. This means that the IE specific <embed> is effectively commented out, and the <object> tag is ignored by IE. See some example HTML below:

<!--[if IE]>
<embed type="application/x-mplayer2" src="" width="290" height="64"></embed>
<![endif]-->
<!--[if !IE]><!-->
<object type="application/x-mplayer2" data="" width="290" height="64"></object>
<!--<![endif]-->

This was slightly tricky to get right in the XSLT. I could have tried using xsl:comment and CDATA sections, but xsl:text and disable-output-escaping seemed like less trouble:

<xsl:text disable-output-escaping="yes">&lt;!--[if IE]&gt;</xsl:text>
<embed type="application/x-mplayer2" src="{@href}" width="290" height="64"/>
<xsl:text disable-output-escaping="yes">&lt;![endif]--&gt;</xsl:text>
<xsl:text disable-output-escaping="yes">&lt;!--[if !IE]&gt;&lt;!--&gt;</xsl:text>
<object type="application/x-mplayer2" data="{@href}" width="290" height="64"/>
<xsl:text disable-output-escaping="yes">&lt;!--&lt;![endif]--&gt;</xsl:text>

Problem number two relates to the "fix" Microsoft have made to avoid infringement of Eolas Technologies' patent "distributed hypermedia method for automatically invoking external application providing interaction and display of embedded objects within a hypermedia document." This means that Internet Explorer displays a message saying "Click to activate and use the control". Surprisingly the music starts playing automatically anyway, but this is still a minor annoyance. The problem was solved by calling this JavaScript function via document.onload():

function eolasWorkaround() {
  var i;
  if ('Microsoft Internet Explorer' != navigator.appName) {
    return;
  }
  if (typeof document.getElementsByTagName == 'undefined') {
    return;
  }
  var embeds = document.getElementsByTagName("embed");
  for (i = 0; i < embeds.length; i++) {
    embeds[i].outerHTML = embeds[i].outerHTML;
  }
}

This complete solution seems to work quite well on all the Windows based browsers I have available now, even Internet Explorer! I have no facility to develop and test this on other platforms, which may not know which plugin to use for the MIME type "application/x-mplayer2". I intend to address this in due course.

In any case the MIME type should strictly speaking be "audio/mpegurl", but this is not supported by Windows. But in case of any difficulty, you can always just click the hypertext link to download the radio.m3u playlist and play it in an external application.

Eolas is a registered trademark of Eolas Technologies Inc. Microsoft, Windows, ActiveX and Internet Explorer are registered trademarks of Microsoft Corporation in the United States and other countries.

Goodbye 4oD adverts

DISCLAIMER: THIS HACK DOES NOT WORK WITH 4oD ANY LONGER

I quite like using the 4oD service from Channel 4, as I can watch their programs at a time that suits me without the regular ad breaks on live TV. The adverts appear just before the program starts instead, and although you can skip to the right part of a program once it's playing, you can't fast forward the adverts.

The program is delivered to the 4oD player as a play list for Windows Media Player in the ASX file format. The first few items in the play list are the URLs for adverts, and the last one is the actual TV program. To avoid having to sit through these adverts, I saw there was the potential to intercept this play list and remove the entries pertaining to advertising.

To achieve this, I made use of the redirect_program feature of the Squid web proxy. In this case, it allows me to tell Squid to fetch the play list from my website rather than directly from Channel 4. I wrote the following Perl script based on the example in the Squid documentation:

#!/usr/bin/perl
use URI::Escape;
$|=1;
while (<>) {
  s@(http://vodapp\.grid\.channel4\.com/c4site-web/playlist\.do\?[^ ]*)@"http://www.trollied.org/~blimey/4oD.php?url=" . uri_escape($1)@e;
  print;
}

So when the 4oD client requests a play list from the vodapp.grid.channel4.com server, it instead requests the playlist from my website. My website then downloads the desired play list, and does the required filtering using PHP and XSLT.

This PHP very simply downloads the play list into a DOM document and applies the required XSLT to it.

<?php
$basename = basename($_SERVER['SCRIPT_NAME'], '.php');
$xslfile = $basename . '.xsl';
$xmlfile = $_GET['url'];
$xml = new DOMDocument;
$xml->load($xmlfile);
$xsl = new DOMDocument;
$xsl->load($xslfile);
$xslProc = new XSLTProcessor();
$xslProc->importStylesheet($xsl);
header('Content-Type: video/x-ms-asf');
$doc = $xslProc->transformToDoc($xml);
echo $doc->saveXML($doc->firstChild);
?>

Here's the XSL, which simply copies everything in the document except for any ENTRY that has a TITLE starting with the word "Advert".

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" method="xml" media-type="video/x-ms-asf"/>
  <xsl:template match="/ASX/ENTRY">
    <xsl:if test="not(starts-with(TITLE, 'Advert'))">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>
  <xsl:template match="/ASX/*[name() != 'ENTRY']">
    <xsl:copy-of select="."/>
  </xsl:template>
  <xsl:template match="/">
    <ASX VERSION="3.0">
      <xsl:apply-templates/>
    </ASX>
  </xsl:template>
</xsl:stylesheet>

Website pet peeves part 3

Text size changing underneath mouse pointer

It might seem like a nice idea to have text become bold, or even larger when you position the mouse over it. But this can cause problems when the layout of the page is affected.

Suppose I position my mouse over such text, and it gets bigger. This can cause the text to be moved away from the mouse pointer, which means that the text reverts to its original size. Once the text has shrunk back to its original location under the mouse, it then grows and the cycle repeats until the mouse is repositioned.

My suggestion is to only do this if you can be sure you will not be affecting the layout of the web page. For example make sure the text is bounded within a box that is large enough for the text, regardless of whether it is enlarged by the position of the mouse.

Redirecting to an error page

Occasionally websites have problems, and so it is appropriate to display an error to the user. Rather than having the desired web page display an error, some websites redirect to a dedicated web page that explains an error has occurred.

Errors are quite typically due to a temporary glitch, and so you might expect to be able to use the refresh button in your browser to reload the web page. If you've been redirected to a separate page this isn't possible; even clicking back will redirect you to the error page again.

Instead it seems necessary to click back in rapid succession, before the redirection kicks in. That way you can once again attempt to repeat the steps required to load up the problematic page. This may involve filling in a form once again, but either way this is not user friendly! It would be preferable to display error messages on the web page where the error occurred, so that it's possible to simply click refresh.

Website pet peeves part 2

What country am I in today?

Lots of websites have forms that let you select your country. Unfortunately these websites generally have a large number of different countries to choose from. I find this means a regular chore of guessing which country to look for, since the right one could be any of the following: Britain, England, Great Britain, United Kingdom or UK. The only option I haven't come across is the actual name of the country where I reside, the United Kingdom of Great Britain and Northern Ireland.

So, to anyone writing web pages with forms allowing a choice of countries: it would be nice to see a default country value determined using an IP-to-country database. That would make things a bit easier.

"Back to" … the future

A lot of sites have links with text such as "Back to homepage", or "Return to search results". But you may have accessed the page via a link on an external site or even a bookmark. So the wording of such links presumes that a visitor to the site came via the web page that's linked "back" to. My suggestion is to leave out the word "back", as it is fairly redundant and potentially inappropriate.

Website pet peeves

From my referer logs I've noticed people looking up problems with XHTML, JavaScript and CSS. Hopefully some web designer will read my pet peeves below, and mend their ways!

Bad support for font resizing

I use a relatively high resolution of 1600*1200, and so most websites look small to me. When I increase the font size in my browser (Firefox) to compensate, the layout of most sites becomes a complete mess. Typically words are crammed into columns that just look too narrow, or text becomes illegible because it no longer fits inside its containing box. This problem typically occurs because dimensions have been stipulated using pixels in the CSS code, but it will go wrong for any other fixed unit of measurement too. So when you're coding up your layout using the "box model", try using only em, ex or percentages instead of pixels and the layout should resize in proportion with the text.

Login forms setting focus away from password

Many websites have login pages, which set the focus to the user name field when the page finishes loading. This might sound like a great idea, but not while I'm in the middle of typing in my password! I don't really want half my password in the password box, and the other half plainly visible in the user name field. This is another quite common problem. A simple fix might be to only set focus on the user name field, when it is empty.
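
A minimal sketch of that fix might look like the following. The helper name focusIfEmpty is hypothetical, and it only assumes the field object has a value property and a focus() method, as a text input element does.

```javascript
// Only move focus to the user name field while it is still empty, so a
// visitor who has already started typing is left alone.
function focusIfEmpty(field) {
  if (field.value === "") {
    field.focus();
    return true;
  }
  return false;
}
```

On a real page this could be called from the load handler, for example window.onload = function () { focusIfEmpty(document.getElementById("username")); }; where the id "username" is an assumption about the form's markup.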

Click to enter site

There still seems to be no shortage of sites, with a main page that just says "click to enter site." The fact that such a page actually is the website, seems to be lost on the web designer. Unless there's a good reason for having such an entrance page (like a disclaimer) just let people surf straight in to meaningful content without any such inconvenience.