File: README.markdown

Role: Documentation
Content type: text/markdown
Description: Documentation
Class: basset-ir
Retrieve, transform and process text documents
Author: Jericko Tejido
Last change: Formalized ResultSet and removed trie structure. Added MetaData class for doc infos. Refactored feedback. Removed IndexSearch to make way for IndexManager. Updated ReadMe
Date: 2 years ago
Size: 1,580 bytes





Basset is a full-text Information Retrieval (IR) library for PHP. It is a collection of developments from the field of IR, ported to PHP for research purposes.

Basset provides different ways of searching through the documents in a collection (ad-hoc retrieval) by applying advanced and experimental IR algorithms and techniques drawn from research studies and conferences, most notably:

  1. TREC
  2. SIGIR
  3. ECIR
  4. ACM


You can read about it here

Using the Cranfield Collection and the sample.php file

The Cranfield Collection is the pioneering test collection in information retrieval and is still used to validate a system's effectiveness.

I've included the 1,400-abstract Cranfield Collection as an XML file that you can parse into separate files.

The test file at tests/sample.php can be executed right away: it parses the collection and runs a search for a single test query. Customize it as needed.
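If you want to see what splitting the collection into separate files could look like outside tests/sample.php, here is a minimal sketch using PHP's SimpleXML extension. It is not Basset's own code: the function name `splitCollection` and the element names `<doc>`, `<docno>` and `<text>` are assumptions made for illustration, so check them against the bundled XML file first.

```php
<?php
// Sketch only: <doc>, <docno> and <text> are assumed element names,
// not taken from the bundled file. Adjust them to the real XML layout.
function splitCollection(string $xml, string $outDir): int
{
    $collection = simplexml_load_string($xml);
    if ($collection === false) {
        throw new InvalidArgumentException('Could not parse the XML input.');
    }
    if (!is_dir($outDir) && !mkdir($outDir, 0777, true)) {
        throw new RuntimeException("Could not create $outDir");
    }

    $written = 0;
    foreach ($collection->doc as $doc) {
        $id   = trim((string) $doc->docno);
        $body = trim((string) $doc->text);
        // One plain-text file per abstract, named after its document number.
        file_put_contents($outDir . DIRECTORY_SEPARATOR . $id . '.txt', $body);
        $written++;
    }

    // Number of files written.
    return $written;
}
```

Calling `splitCollection(file_get_contents($pathToXml), $targetDir)` would write one `.txt` file per abstract into `$targetDir`, where both variables are placeholders for your own paths.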

See Cranfield/cranfield-collection/cranqrel for Glasgow's relevance judgments (qrels).
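As a rough illustration of how the relevance judgments can be used to score a result list, the sketch below parses qrels text and computes precision at k. It assumes each line has the form "queryId docId relevance" separated by whitespace, and it treats every judged document as relevant, which is a simplification. The function names `parseQrels` and `precisionAtK` are hypothetical and not part of Basset.

```php
<?php
// Assumption: each qrels line is "queryId docId relevance", whitespace-separated.
// Verify this against the bundled cranqrel file before relying on it.
function parseQrels(string $contents): array
{
    $qrels = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 3) {
            continue; // skip blank or malformed lines
        }
        [$queryId, $docId, $relevance] = $fields;
        $qrels[(int) $queryId][(int) $docId] = (int) $relevance;
    }
    return $qrels;
}

// Precision at k: the fraction of the top-k ranked documents that have a judgment.
// Simplification: any judged document counts as relevant, whatever its grade.
function precisionAtK(array $rankedDocIds, array $judged, int $k): float
{
    if ($k <= 0) {
        return 0.0;
    }
    $hits = 0;
    foreach (array_slice($rankedDocIds, 0, $k) as $docId) {
        if (isset($judged[$docId])) {
            $hits++;
        }
    }
    return $hits / $k;
}
```

Given the document ids returned for query 1 in rank order, `precisionAtK($ranked, $qrels[1], 10)` would give the share of the first ten results that appear in the judgments for that query.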

I've also included the SMART system's stopword list for standardization (see stopwords/stopwords.txt).
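To show how a stopword list of this kind is typically applied before indexing or querying, here is a small sketch. It assumes the file holds one word per line, and the helper names `loadStopwords` and `removeStopwords` are made up for this example rather than taken from Basset.

```php
<?php
// Assumption: the stopword file holds one word per line.
function loadStopwords(string $contents): array
{
    $words = array_filter(
        array_map('trim', preg_split('/\R/', $contents)),
        fn (string $word): bool => $word !== ''
    );
    // Use the words as array keys so membership checks are constant time.
    return array_fill_keys(array_map('strtolower', $words), true);
}

// Drop every token whose lowercased form is in the stopword table.
function removeStopwords(array $tokens, array $stopwords): array
{
    $kept = [];
    foreach ($tokens as $token) {
        if (!isset($stopwords[strtolower($token)])) {
            $kept[] = $token;
        }
    }
    return $kept;
}
```

For example, `removeStopwords($tokens, loadStopwords(file_get_contents('stopwords/stopwords.txt')))` would filter a token array against the bundled list, provided the one-word-per-line assumption holds.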
