This class can be used to remove unwanted tags and data from an HTML document.
It takes a string with the HTML document to clean and parses it assuming a given character set encoding.
The class can perform several types of clean-up operations like:
- Remove style definitions
- Remove tags or attributes based on whitelists or blacklists
- Optimize code (merge inline tags, strip empty inline tags, trim excess new lines)
- Use the HTML Tidy extension to clean the document, format the output as XHTML, and drop proprietary attributes from Microsoft Word HTML documents
- Drop empty paragraphs
- Remove needless white space
- Fill empty table cells
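
To make a few of these operations concrete, here is a minimal sketch in Python using only the standard library. It is not this class's code or API: the names `SketchCleaner`, `clean_html`, `ALLOWED_TAGS` and `ALLOWED_ATTRS` are hypothetical, the whitelists are arbitrary examples, and the Tidy-based clean-up is not covered. It only illustrates removing style definitions, whitelist filtering, dropping empty paragraphs, removing needless white space and filling empty table cells.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists for illustration; not the defaults of the class.
ALLOWED_TAGS = {"p", "b", "i", "a", "table", "tr", "td"}
ALLOWED_ATTRS = {"href"}


class SketchCleaner(HTMLParser):
    """Rebuilds the document, keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.style_depth = 0  # greater than 0 while inside a <style> element

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.style_depth += 1
        elif tag in ALLOWED_TAGS and not self.style_depth:
            # Attributes outside the whitelist (for example "style") are dropped.
            kept = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs
                if name in ALLOWED_ATTRS
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag == "style":
            self.style_depth = max(0, self.style_depth - 1)
        elif tag in ALLOWED_TAGS and not self.style_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept even when its enclosing tag is removed,
        # except for the CSS inside <style> elements.
        if not self.style_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html_text):
    parser = SketchCleaner()
    parser.feed(html_text)
    parser.close()
    cleaned = "".join(parser.out)
    cleaned = re.sub(r"\s+", " ", cleaned)  # remove needless white space
    cleaned = re.sub(r"<p>\s*</p>", "", cleaned)  # drop empty paragraphs
    cleaned = re.sub(r"<td>\s*</td>", "<td>&nbsp;</td>", cleaned)  # fill empty cells
    return cleaned.strip()
```

Collapsing every run of white space, as done here, would also alter preformatted text, so a real cleaner would need to treat elements such as `pre` separately.
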
| Ratings | Utility | Consistency | Documentation | Examples | Tests | Videos | Overall | Rank |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All time: | Good (95.0%) | Sufficient (75.0%) | Good (85.0%) | Good (90.0%) | - | - | Sufficient (72.5%) | 96 |
| Month: | Not yet rated by the users |