Mooquackwooftweetmeow

Concatenating zoölogical onomatopœia since 1999

About Mooquackwooftweetmeow

Hi! This is Mooquackwooftweetmeow, a collection of stuff by Greg K Nicholson.

The Twaddlebot has been unleashed

Last night version 1.0 of The Twaddle went live. It uses arbitrary XML and XSLT to generate valid XHTML pages... offline.

The idea of uploading bare-bones articles and an XSLT template, allowing the browser to generate pages as they're required, was a no-go. But I managed to rig up the transformation offline, to be run as a batch.

Following the tradition of giving XML languages names that are barely-logical acronyms beginning with X, I call the language XTw, which stands for XML... Twaddle... something.

Here's how I worked the magic (borrowing liberally from a newsgroup posting I made on the subject):

This assumes no programming experience, but enough computer savvy to have created the XML and XSL files that need transforming in the first place, and a Windows (XP) machine.

First off, you'll need Xalan, available from http://xml.apache.org/xalan-j/ (and the requisite Java runtime, which you probably already have)

The actual file I downloaded was http://apache.rmplc.co.uk/dist/xml/xalan-j/xalan-j-current-bin.tar.gz

There's also http://apache.rmplc.co.uk/dist/xml/xalan-j/xalan-j-current-bin.zip if you prefer a zip.

The version I got was 2.6.0 (the Java version).

Unzip Xalan into a folder. I used C:\Program Files\xalan-j_2_6_0

Now the code from http://evc-cit.info/cit041x/batchfiles.html#transform:

echo off
java -cp h:\java\xmljar\xalan-j_2_5_1\bin\xml-apis.jar;h:\java\xmljar\xalan-j_2_5_1\bin\xercesImpl.jar;h:\java\xmljar\xalan-j_2_5_1\bin\xalan.jar;. org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3 %4 %5 %6 %7 %8 %9

The only line break should be after echo off.

Copy this into a plain text editor (e.g. Notepad), and save it as filename.bat (I used ANSI encoding, if it matters)

You should now have an MS-DOS Batch File.

(Apparently some versions of Notepad append .txt to filenames, even if they already contain a file extension. In these cases, quoting the filename in the Save dialogue - e.g. "filename.bat" - allegedly solves the problem.)

You'll most likely have to modify the code to point to the actual locations of your Xalan installation and files.

I only plan on using one XSL stylesheet with multiple files; the input files will be filename.xml. The output files will be filename.htm and will be kept in the folder above the one where the input and XSL files are kept. So, I modified the code a little:

java -cp "c:\program files\xalan-j_2_6_0\bin\xml-apis.jar";"c:\program files\xalan-j_2_6_0\bin\xercesImpl.jar";"c:\program files\xalan-j_2_6_0\bin\xalan.jar";. org.apache.xalan.xslt.Process -IN %1.xml -XSL "c:\path\to\an\xsl\file\xsl.xml" -OUT ..\%1.htm

This should all be on one line. %1 in the code will be replaced by the first argument passed to the batch file, %2 by the second argument, etc. ..\ means up one folder. The quotation marks around the filenames cause them to be treated as one item, despite their containing spaces.
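
If batch-file substitution feels opaque, here's roughly the same command assembled in Python instead. It's only a sketch: the function name build_xalan_command and its parameters are made up for illustration, and the paths are the same placeholder ones as above. Passing the arguments as a list does the job of the quotation marks, so spaces in paths need no special treatment.

```python
import ntpath  # Windows path rules, whatever system this runs on


def build_xalan_command(name, xalan_bin, stylesheet):
    """Return the argument list for one Xalan run (hypothetical helper).

    name       -- file name without its extension, i.e. the batch file's %1
    xalan_bin  -- folder holding the Xalan jars (an assumed location)
    stylesheet -- path to the XSL file
    """
    jars = ["xml-apis.jar", "xercesImpl.jar", "xalan.jar"]
    # Classpath entries are separated by semicolons on Windows;
    # the trailing "." adds the current folder, as in the batch file.
    classpath = ";".join([ntpath.join(xalan_bin, jar) for jar in jars] + ["."])
    return [
        "java", "-cp", classpath,
        "org.apache.xalan.xslt.Process",
        "-IN", name + ".xml",
        "-XSL", stylesheet,
        "-OUT", ntpath.join("..", name + ".htm"),  # ..\ means up one folder
    ]


if __name__ == "__main__":
    command = build_xalan_command(
        "filename",
        r"c:\program files\xalan-j_2_6_0\bin",
        r"c:\path\to\an\xsl\file\xsl.xml",
    )
    print(command)
```

The list could then be handed to subprocess.run to actually perform the transformation, assuming java is on the PATH.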

You can add @echo off on a line of its own above the java command, if you prefer not to have masses of textual output in the command console. e.g.:

@echo off
java -cp "c:\...

echo off turns off the display of subsequent commands; @ hides the echo off command.

To perform the transformation, open a command console (Start > Run > "cmd") and navigate to the location of your XML, XSL and batch files, by typing

cd "c:\path\to\files"

(including the quotes)

For simplicity's sake, I've shoved everything in the same folder, and used absolute paths for the programs. You could probably also mess around with relative paths or the PATH environment variable, but I can't be bothered.

I ended up having to use HTML Tidy to contort the output into valid XHTML. My final batch file reads:

java -cp "c:\program files\xalan-j_2_6_0\bin\xml-apis.jar";"c:\program files\xalan-j_2_6_0\bin\xercesImpl.jar";"c:\program files\xalan-j_2_6_0\bin\xalan.jar";. org.apache.xalan.xslt.Process -IN %1.xtw -XSL "XTw2XHTML.xsl" -OUT ..\thetwaddle\%1.htm

"C:\Program Files\HTMLTidy\tidy.exe" -q -m -c --show-warnings no --output-xml yes --output-xhtml yes -latin1 --doctype strict --tidy-mark no --wrap 0 --ascii-chars no --drop-proprietary-attributes yes --fix-bad-comments no ..\thetwaddle\%1.htm

echo Done %1.

(Line breaks have been doubled for clarity.)

The input XML files are all labelled filename.xtw; the XSL stylesheet is XTw2XHTML.xsl, and the output files are cacked into the folder thetwaddle, a sibling of the folder where the batch file lives, and assigned a suffix of .htm.

Those options shown for Tidy are the result of trial and error, or rather, trial and testing and reading Tidy's Quick Reference - no warranty implied. The echo command prints out a message for each finished file.
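
For comparison, the two-step batch file could be sketched in Python along these lines. This isn't what I actually run; the names commands_for and transform are invented for the example, and the paths and Tidy options are simply copied from the batch file above. Like the batch file, it doesn't check whether either step succeeded.

```python
import ntpath
import subprocess

# All three locations are copied from the batch file above; adjust to taste.
XALAN_BIN = r"c:\program files\xalan-j_2_6_0\bin"
TIDY = r"C:\Program Files\HTMLTidy\tidy.exe"
STYLESHEET = "XTw2XHTML.xsl"

# The Tidy options, verbatim from the batch file, split into separate arguments.
TIDY_OPTIONS = (
    "-q -m -c --show-warnings no --output-xml yes --output-xhtml yes "
    "-latin1 --doctype strict --tidy-mark no --wrap 0 --ascii-chars no "
    "--drop-proprietary-attributes yes --fix-bad-comments no"
).split()


def commands_for(name):
    """Return the Xalan and Tidy argument lists for one .xtw file."""
    jars = ["xml-apis.jar", "xercesImpl.jar", "xalan.jar"]
    classpath = ";".join([ntpath.join(XALAN_BIN, jar) for jar in jars] + ["."])
    # The output lands in the sibling folder thetwaddle, with a .htm suffix.
    output = ntpath.join("..", "thetwaddle", name + ".htm")
    xalan = ["java", "-cp", classpath, "org.apache.xalan.xslt.Process",
             "-IN", name + ".xtw", "-XSL", STYLESHEET, "-OUT", output]
    tidy = [TIDY] + TIDY_OPTIONS + [output]
    return [xalan, tidy]


def transform(name, run=subprocess.run):
    """Run Xalan, then Tidy, then report, in the same order as the batch file."""
    for command in commands_for(name):
        run(command)
    print("Done %s." % name)
```

Calling transform("afile") from the folder containing afile.xtw would then stand in for xtw2xhtml afile. The run parameter is only there so the commands can be inspected without launching anything.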

This batch file is wrapped up in another one, which repeatedly calls the first, thus:

@echo off
echo Transforming XTw into XHTML...
call xtw2xhtml afile
call xtw2xhtml otherfiles
echo Done.

The text output is just to make the command console more interesting while the batch program is running. It also helps pinpoint any errors, such as typos, which show up as blobs of text in the command console.
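
A Python take on the wrapper might look like the following. Note the one deliberate difference: rather than listing each file by hand as my batch file does, this sketch hunts down every .xtw file in the folder. The function names are invented for the example, and it assumes the first batch file is saved as xtw2xhtml.bat in that same folder.

```python
import os
import subprocess


def xtw_names(folder="."):
    """List the .xtw files in folder, minus their extensions, in sorted order."""
    names = []
    for entry in os.listdir(folder):
        base, extension = os.path.splitext(entry)
        if extension.lower() == ".xtw":
            names.append(base)
    return sorted(names)


def transform_all(folder=".", run=subprocess.run):
    """Call the (assumed) xtw2xhtml.bat once for each .xtw file found."""
    print("Transforming XTw into XHTML...")
    for name in xtw_names(folder):
        # cmd /c runs the batch file and then exits, much like "call".
        run(["cmd", "/c", "xtw2xhtml.bat", name], cwd=folder)
    print("Done.")
```

Running transform_all() in the folder with the batch file would then process everything in one go, at the cost of no longer being able to leave a file out simply by not listing it.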

The result of all this fiddling is that I can change pages' contents more easily; I've already implemented a few minor changes that would have taken real effort before. The final product lives here.

In semi-related news, it turns out that PURLs such as purl.org/mooquackwooftweetmeow, without the trailing slash, are possible - it's just partial redirects that have to end with slashes. The Twaddle's now on PURLs, too - purl.org/thetwaddle - with or without the slash.

While uploading Unleash The Twaddlebot! (The Twaddle v1.0), I was reminded that we're approaching the 50-file limit; that's not including styles, which are kept in a separate account. This means we'll probably have to change hosts.

Fortunately, ntl provide 55 megabytes of space, so I'm planning to shift everything there. This shouldn't be too troublesome now that everything's on PURLs.

About this entry

I published this entry on 7 June 2004 at around tea-time. That means this is pretty old. Beware parachronisms.

Questions? Comments? Plaudits? Microblog at identi.ca/gregknicholson, or with the tag #mqwtm; or email me at weblog035@gkn.me.uk.