PHP Classes

HTML SQL: Parse and extract information from HTML using SQL

  Author  
Name: J. <contact>
Classes: 1 package
Country: Germany
Age: ???
All time rank: 51634 in Germany
Week rank: 541 (down), 22 in Germany (up)
Innovation award
Nominee: 1x

  Detailed description  
This class can be used to parse and extract information from HTML documents using a query language similar to SQL to define the information to be extracted.

The class can open HTML documents stored as local files or as remote pages using the Snoopy class.

The class can execute a query with a syntax similar to SQL SELECT statements to find tags in the opened document whose attributes match the query condition.

The occurrences that it finds are returned as result set rows that may contain a given list of attributes of the matched tags.

htmlSQL - Version 0.5 - README
AUTHOR: Jonas John (

htmlSQL is an experimental PHP class that lets you access HTML
values with an SQL-like syntax. This means that you don't have to write
complex functions (regular expressions) to extract specific values.
htmlSQL queries look like this:

SELECT href,title FROM a WHERE $class == "list"
       ^ Attributes    ^       ^ search query (can be empty)
         to return     ^ 
                       ^ HTML tag to search in 
                         "*" is possible = all tags
This query returns an array with all links that contain
the attribute class="list".
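
To make the shape of such a result concrete, here is a minimal sketch that builds the same kind of rows with PHP's built-in DOM extension. It is not htmlSQL code and says nothing about how htmlSQL works internally; the function name select_links_by_class and the sample markup are hypothetical and only serve as illustration.

```php
<?php
// Sketch only: uses PHP's DOM extension, NOT htmlSQL itself.
// select_links_by_class() and the sample markup are hypothetical.

function select_links_by_class(string $html, string $class): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    // "FROM a": look at every <a> tag in the document.
    foreach ($doc->getElementsByTagName('a') as $a) {
        // "WHERE $class == "list"": keep tags whose class equals the value.
        if ($a->getAttribute('class') === $class) {
            // "SELECT href,title": return only these attributes.
            $rows[] = [
                'href'  => $a->getAttribute('href'),
                'title' => $a->getAttribute('title'),
            ];
        }
    }
    return $rows;
}

$html = '<html><body>'
      . '<a class="list" href="/one" title="One">1</a>'
      . '<a href="/two" title="Two">2</a>'
      . '</body></html>';

$rows = select_links_by_class($html, 'list');
print_r($rows);
```

Only the first link has class="list", so only that link ends up in the result; each row is an array keyed by the selected attribute names.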

All web transfers in htmlSQL use the awesome Snoopy class 
(package version 1.2.3 - URL:
Snoopy is not required for file or string queries. You will find all
Snoopy-related documents (copyright, readme, etc.) in the snoopy_data/ folder.

Just include the "snoopy.class.php" and the "htmlsql.class.php" files 
into your PHP scripts and look at the examples (examples/) to get an
idea of how to use the htmlSQL class. It should be very simple :-)

I had this idea while extracting some data from a website. I realized
that the algorithms and functions for extracting links and other tags are 
often the same, so I decided to combine them into one universally
usable class. While drinking a coffee and thinking about the problem, I 
thought it would be cool to access HTML elements by using SQL. So I 
started creating this class... 

The eval() function is used for the WHERE statement. Make sure that all 
user data is checked and filtered against malicious PHP code. 
Never trust user input! 
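
One way to follow this advice is to accept only a narrow set of characters before a user-supplied value is placed into a WHERE clause. The sketch below is an assumption about how a caller might do that; the function name safe_class_condition is hypothetical and not part of htmlSQL, and the allowed character set should be adapted to your own data.

```php
<?php
// Sketch only: a hypothetical helper, not part of htmlSQL.
// It builds a WHERE condition from user input after a strict allow-list check.

function safe_class_condition(string $userValue): string
{
    // Allow only letters, digits, underscore and hyphen.
    // Quotes, semicolons, parentheses and so on are rejected,
    // so the value cannot break out of the string literal.
    if (!preg_match('/\A[A-Za-z0-9_-]+\z/', $userValue)) {
        throw new InvalidArgumentException('Unexpected characters in class name');
    }

    // Single quotes keep "$class" as literal text in the query string.
    return '$class == "' . $userValue . '"';
}

$where = safe_class_condition('list');
$query = 'SELECT href,title FROM a WHERE ' . $where;
echo $query, "\n";
```

An allow-list is used here rather than escaping because the condition ends up in eval(), where any missed special character could run as PHP code.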

TODO:
- enhance the HTML parser
- test htmlSQL with invalid and bad HTML files
- replace the ugly eval() method for the WHERE statement
  with a custom method
- more error checks
- include the LIMIT function/method like in SQL

htmlSQL uses a modified BSD license; you will find the full license text 
in "htmlsql.class.php". 

Base name: htmlsql
Description: Parse and extract information from HTML using SQL
Version: -
PHP version: -
License: BSD License
All time users: 5507 users
All time rank: 426
Week users: 1 user
Week rank: 699 Down
  Groups  
HTML: HTML generation and processing
Text processing: Manipulating and validating text data

  Screenshots  
File                        Role    Description
htmlsql_syntax_example.png  Screen  htmlSQL syntax example

  Innovation Award  
PHP Programming Innovation award nominee
May 2006
Number 2

Prize: One subscription to the PHP Magazine
Certain types of applications need to retrieve HTML pages and extract information from them to be processed for specific purposes.

Often, parsing HTML pages to extract only the relevant information is not an easy task. On the other hand, most Web developers are very familiar with SQL and can use it to define what information they want from their database tables.

This class provides a means to extract data from HTML pages using a query language very similar to SQL. It greatly simplifies the implementation of scripts that need to process data from HTML pages.

Manuel Lemos

  User ratings  
Ratings   Utility     Consistency  Documentation  Examples    Tests  Videos  Overall           Rank
All time  Good (97%)  Good (93%)   Good (88%)     Good (89%)  -      -       Sufficient (75%)  107
Month     Not yet rated by the users

  Pages that reference this package  
HTML SQL » Burak Kanmaz
For 3-4 years I have always dreamed of building a web page that would analyze the content of a web page...
Navigating an HTML page with SQL queries.
HTML_SQL is the name of a PHP class that lets us navigate an HTML page with queries written in SQL. The class won second place in the PHPClasses innovation awards. This is without doubt an interesting advance. It could give extra capabilities to MVC frameworks. Some examples are at the link. Regards

  Applications that use this package  
No pages of applications that use this class were specified.


  Related pages  
A detailed htmlSQL description
htmlSQL live demo
Test the htmlSQL class...

  Files  
File               Role   Description
examples           -      Folder (15 files)
htmlsql.class.php  Class  Contains the main htmlSQL class
snoopy.class.php   Class  The famous Snoopy class by Monte Ohrt - v1.01
readme.txt         Doc.   English readme with description and todo list
readme_german.txt  Doc.   The same as readme.txt, in German

  Files  /  examples  
File                Role     Description
demo_01.php         Example  Example 1 - Shows a simple query
demo_02.php         Example  Example 2 - Shows a simple query and the "href as url" usage
demo_03.php         Example  Example 3 - Shows how to connect to a file and a simple query
demo_04.php         Example  Example 4 - Shows an advanced query with preg_match
demo_05.php         Example  Example 5 - Shows an advanced query (with substr)
demo_06.php         Example  Example 6 - Shows how to connect to a string
demo_07.php         Example  Example 7 - Shows a complex query
demo_08.php         Example  Example 8 - Shows how to parse an RSS/XML file with htmlSQL
demo_09.php         Example  Example 9 - Shows how to use the "select" function
demo_10.php         Example  Example 10 - Shows how to use the "isolate_content" function
demo_11.php         Example  Example 11 - Shows how to query a simple XML file
demo_12.php         Example  Example 12 - Shows how to replace the user agent and the referer with custom values
demo_data.htm       Example  Demo HTML data (used for parsing examples)
demo_xml.xml        Example  Example XML file (to test parsing)
query_examples.txt  Doc.     Some query examples for copy and paste
