Wednesday, August 10, 2005 ★ 17:29 ★ Category Programming
If you’re a web developer using XHTML, you have to make sure your pages are valid XHTML (which also implies well-formed XML). This is easy to do using the command-line utility xmllint:
$ xmllint --valid --noout somefile.html
This works fine for all static content. However, if you use server-side scripts to create dynamic pages, you won’t have any real files for xmllint to check. Once again, xmllint to the rescue:
$ xmllint --valid --noout http://example.com/dynamic/page
This one works fine if you want to check just one page. Since xmllint is just an XML validator and not a web crawler, it doesn’t interpret HTML and is not able to follow links and recursively retrieve pages.
That’s where wget comes in. This nifty command-line download utility has a -r flag to recursively grab a site off the net. Be careful what you’re doing, because wget can quickly generate a lot of network traffic and can dramatically increase your server load. Oh, I don’t think I need to tell you that you shouldn’t try this on other people’s websites.
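One way to keep such a crawl gentle is to have wget pause between requests and cap its bandwidth. Here is a minimal sketch, assuming GNU wget. The function name polite_mirror and the values used (a one-second wait, 200 KB/s) are my own illustrative choices, so tune them for your server.

```shell
#!/bin/sh
# polite_mirror is a hypothetical helper name; the wait and rate values
# are arbitrary examples, not recommendations.
polite_mirror() {
    # -r                  follow links recursively
    # -np                 never ascend to the parent directory
    # --wait=1            pause one second between requests
    # --limit-rate=200k   cap the download speed at about 200 KB/s
    wget -r -np --wait=1 --limit-rate=200k "$1"
}
```

Calling polite_mirror http://example.com/subdir/ should then crawl that subdirectory at a pace most servers won't notice.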
To automate the task of recursively retrieving pages from a server and having xmllint check those pages for validity, I wrote a small script. It grabs all pages from a website and runs xmllint on the retrieved data. All errors are displayed on your screen.
The script, named check-valid-xhtml, should be called with the top URL you want to check as its first and only parameter. So, if you specify http://example.com/subdir/ it will only check URLs in and below that subdirectory.
#!/bin/bash
# Uses bash because of "read -n1 -p"; a plain /bin/sh may not support it.

# Work in a fresh temporary directory.
DUMPDIR=$(mktemp -d) || exit 1
echo "Using $DUMPDIR to store files"
cd "$DUMPDIR" || exit 1

echo "Downloading files..."
# -r recurse, -np stay below the start URL, -nH no host directory,
# -nv less verbose, -E save pages with an .html extension
wget -r -np -nH -nv -E --header="Accept: text/xml,application/xml,application/xhtml+xml,text/html" "$1"
echo "All files downloaded"

echo "Checking files..."
find . -type f -name '*.html' -print0 | xargs -0 -r -n1 xmllint --valid --noout
echo "All files checked."

read -n1 -p "Clean up? (y/n) " CLEAN_UP
echo
if [ "$CLEAN_UP" = "y" ] ; then
    echo -n "Cleaning up $DUMPDIR... "
    rm -r "$DUMPDIR"
    echo "done"
else
    echo "$DUMPDIR not cleaned."
fi
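The script shows xmllint's errors but gives no total and no exit status, which makes it awkward to use from a cron job or a Makefile. A possible add-on, as a rough sketch only: check_tree is a name I made up, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# check_tree is a hypothetical helper: it validates every *.html file
# below a directory, prints a summary, and returns non-zero if any
# file fails. Assumes no newlines in file names.
check_tree() {
    dir=$1
    total=0
    failed=0
    list=$(mktemp) || return 2

    # Collect the file list first so the counters below are updated
    # in the current shell rather than in a pipeline subshell.
    find "$dir" -type f -name '*.html' > "$list"

    while IFS= read -r file; do
        total=$((total + 1))
        # xmllint exits non-zero on parse or validation errors.
        if ! xmllint --valid --noout "$file" </dev/null; then
            failed=$((failed + 1))
            echo "INVALID: $file" >&2
        fi
    done < "$list"
    rm -f "$list"

    echo "$failed of $total files failed validation"
    [ "$failed" -eq 0 ]
}
```

Run it as check_tree "$DUMPDIR" after the download step; the return status can then decide whether a build or a nightly job should fail.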
Happy fixing! I didn’t need to, because all sites I tried this script on produced perfectly valid XHTML. Yay for me!
Wouter Bolsterlee, also known as uws, a postmodern geek living in the Netherlands. Read more about me…
Unless stated otherwise, all material on this site is available under a Creative Commons Share-Alike license.