Last month I read the article "W3C Gets Excessive DTD Traffic" on Slashdot. It struck a chord, since I use XSLT to generate XHTML and had noticed that the XHTML DTD is downloaded each time it is referenced. This wastes bandwidth and slows down the XSLT processing considerably.
To remedy this problem, I downloaded the XHTML DTDs and their entity files to my Linux box and added an XML catalogue entry. This ensures local copies of these files are used instead of repeatedly downloading them from w3.org.
Here are the shell commands I used on my Slackware 12 box to set things up:
mkdir -p /usr/share/xml/xhtml1
cd /usr/share/xml/xhtml1
wget http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd \
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd \
http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd \
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent \
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent \
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
xmlcatalog --noout --add rewriteURI http://www.w3.org/TR/xhtml1/DTD/ \
file:///usr/share/xml/xhtml1/ \
/etc/xml/catalog
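The rewrite rule can also be tried out in isolation before touching the system catalogue. The following is only a sketch: it builds a throwaway catalogue in a temporary directory (the scratch path and variable names are my own, not part of the setup above), adds the same rewriteURI entry, prints it, and asks xmlcatalog to resolve the strict DTD's URL. What the lookup prints seems to vary between libxml2 versions, so no expected output is shown.

```shell
#!/bin/sh
# Throwaway catalogue so nothing under /etc is modified.
# The scratch location and variable names are hypothetical.
scratch=$(mktemp -d)
catalog="$scratch/catalog"
dtd_url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

if command -v xmlcatalog >/dev/null 2>&1; then
    # Create an empty catalogue file, then add the same rewrite rule.
    xmlcatalog --noout --create "$catalog"
    xmlcatalog --noout --add rewriteURI \
        http://www.w3.org/TR/xhtml1/DTD/ \
        file:///usr/share/xml/xhtml1/ \
        "$catalog"

    # Show the entry that was written.
    cat "$catalog"

    # Ask the catalogue to resolve the DTD URL. Older xmlcatalog
    # builds may only consult SYSTEM entries and report no match.
    xmlcatalog "$catalog" "$dtd_url" ||
        echo "lookup not resolved by this xmlcatalog version"
else
    echo "xmlcatalog not found; skipping catalogue check"
fi
```

Pointing the last lookup at /etc/xml/catalog instead of the scratch file checks the real entry in the same way.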
To confirm this was working as expected, I ran xsltproc under strace. I did, however, discover that what I'd done had no effect on libxslt in PHP. I looked through the PHP source code and found no code to initialise libxml to use XML catalogues. This would be a nice feature; hopefully it will be included in a future release of PHP.
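For anyone wanting to repeat the strace check, something along these lines should do. It is a rough sketch rather than the exact commands I ran: the test document, the identity stylesheet and the file names are all made up for illustration. The idea is to log only connect() calls while xsltproc processes an XHTML file that references the strict DTD; if the catalogue is being used, no connection to w3.org should appear in the log.

```shell
#!/bin/sh
# Scratch files for the check; every file name here is hypothetical.
workdir=$(mktemp -d)

# A minimal XHTML document that references the strict DTD.
cat > "$workdir/page.xhtml" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Catalogue test</title></head>
  <body><p>test</p></body>
</html>
EOF

# An identity stylesheet, just to give xsltproc something to apply.
cat > "$workdir/identity.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
EOF

if command -v strace >/dev/null 2>&1 &&
   command -v xsltproc >/dev/null 2>&1; then
    # Record connect() calls made by xsltproc and any child processes.
    strace -f -e trace=connect -o "$workdir/trace.log" \
        xsltproc "$workdir/identity.xsl" "$workdir/page.xhtml" >/dev/null
    # Print any logged connections; none is the hoped-for result.
    if [ -f "$workdir/trace.log" ]; then
        grep 'connect(' "$workdir/trace.log" ||
            echo "no connect() calls logged"
    fi
else
    echo "strace or xsltproc not found; skipping trace"
fi
```

A blunter alternative is to run xsltproc with its --nonet option, which refuses to fetch DTDs or entities over the network, so a missing catalogue entry shows up as a load error rather than a silent download.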