This script could have been quite useful for a project I'm working on, but from my experiments with it and from what I read in the source code, it's nothing more than a tag soup parser for XML.
It seems to simply ignore well-formedness errors and carry on. For example, it accepts unescaped ampersands and less-than signs in text. It also shows some odd behaviour whenever it encounters anything beginning with <!, even when that isn't a comment, DOCTYPE declaration, CDATA section, etc.
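For comparison, here is a minimal sketch of what a conforming XML parser does with those inputs. It doesn't use the script itself (it isn't named here). Instead it assumes Python's standard-library `xml.etree.ElementTree` (backed by expat) as the reference parser, and `is_well_formed` is just a hypothetical helper for the demo. A conforming parser rejects a bare `&`, a bare `<` and unrecognised `<!` markup in element content instead of skipping over them:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if a conforming XML parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # expat reports well-formedness errors as ParseError
        return False

samples = {
    "escaped ampersand": "<p>Fish &amp; chips</p>",
    "unescaped ampersand": "<p>Fish & chips</p>",
    "unescaped less-than": "<p>1 < 2</p>",
    "bogus <! markup": "<p><!foo></p>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the first sample is accepted. The other three are fatal errors under the XML 1.0 spec, which is the behaviour a real XML parser (as opposed to a tag soup parser) should show.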
Character encodings seem to be ignored entirely, too. I couldn't find anything in it indicating proper support for UTF-8, or for any encoding other than ISO-8859-1.
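To illustrate why that matters, here's a small sketch, again assuming Python's `xml.etree.ElementTree` as a stand-in for a conforming parser. A UTF-8 document whose text contains "é" is stored as the two bytes 0xC3 0xA9. A parser that honours the encoding declaration decodes that back to "é", while a tool that assumes ISO-8859-1 turns each byte into its own character and produces mojibake:

```python
import xml.etree.ElementTree as ET

# "café" encoded as UTF-8: the "é" becomes the two bytes 0xC3 0xA9.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'.encode("utf-8")

# A conforming parser reads the declaration and decodes the bytes as UTF-8.
parsed = ET.fromstring(utf8_doc).text

# Assuming ISO-8859-1 instead maps every byte to exactly one character.
latin1_guess = utf8_doc.decode("iso-8859-1")

print(parsed)                   # café
print("cafÃ©" in latin1_guess)  # True
```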
I'm sure I could keep listing problems, but at that point I gave up looking.