Text Cat: Guess the language of a given text

Version: libtextcat 1.0.0
License: Public Domain
PHP version: 3.0
Categories: Text processing

This class can be used to guess the language of a given text.

The class reads data files that contain ranking information about characters that are most likely to be found in texts of several languages.

The text being analyzed is converted to Unicode to be compared with the language character ranking data.
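
As an illustration of that conversion step, here is a minimal Python sketch, not the class's own PHP code. The function name `to_code_points` and the list of fallback encodings are assumptions made for this example; the page does not say which input encodings the class accepts.

```python
def to_code_points(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Decode raw bytes with the first encoding that succeeds and
    return the text as a list of Unicode code points.

    The encoding list is an assumption for this sketch, not something
    documented for the class.
    """
    for enc in encodings:
        try:
            text = raw.decode(enc)
            break  # stop at the first encoding that decodes cleanly
        except UnicodeDecodeError:
            continue
    else:
        # Reached only if every encoding in the list failed.
        raise ValueError("could not decode input with the given encodings")
    return [ord(ch) for ch in text]


# Usage: the same word arrives as UTF-8 or as a legacy single-byte encoding,
# and both end up as the same sequence of code points.
utf8_points = to_code_points("día".encode("utf-8"))
legacy_points = to_code_points(b"d\xeda")
```

Normalizing to code points first means the later comparison does not depend on how the input happened to be encoded.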

The class returns an array of languages sorted by ranking.
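
The ranking idea can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the class's implementation: it assumes a rank-order ("out-of-place") comparison over character n-grams, builds the language profiles in memory from short sample sentences instead of reading the `.lm` files (whose format is not described here), and all function names, the n-gram length, and the penalty value are hypothetical.

```python
from collections import Counter


def build_profile(text, max_n=3, top=300):
    """Map each of the most frequent character n-grams to its rank (0 = most frequent)."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    counts = Counter(
        text[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(text) - n + 1)
    )
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, penalty=300):
    """Sum of rank differences; n-grams missing from the language profile cost `penalty`."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def guess_language(text, lang_profiles):
    """Return language names ordered from best match (lowest distance) to worst."""
    doc = build_profile(text)
    scores = {lang: out_of_place(doc, prof) for lang, prof in lang_profiles.items()}
    return sorted(scores, key=scores.get)


# Toy profiles built from single sentences (assumed sample data, far too small for real use).
profiles = {
    "english": build_profile(
        "the quick brown fox jumps over the lazy dog and the cat sits on the mat"
    ),
    "spanish": build_profile(
        "el rápido zorro marrón salta sobre el perro perezoso y el gato se sienta en la alfombra"
    ),
}
ranking = guess_language("the dog and the cat are in the house", profiles)
```

In this sketch the best match comes first in the returned list; real profiles would be trained on much larger samples per language.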

The currently supported languages are: Arabic, Belarusian, Chinese, Czech, Danish, Dutch, English, Esperanto, French, German, Greek, Hebrew, Italian, Japanese, Russian, and Spanish.

Have a lot of fun with this!

Innovation Award
PHP Programming Innovation award nominee, June 2006, Number 5
Prize: One book of choice by SAMS

A text can be written in many different languages. Without prior knowledge of the language in which a text is written, it is hard for a reader to guess it and then choose an appropriate translation tool.

This class can be used to guess the language of a text. It takes prebuilt data files that give different weights to characters in a text that are more strongly associated with a particular language.

This way the class can give a good idea of the languages in which a given text is most likely written.

Manuel Lemos
Name: Cesar D. Rodas
Classes: 38 packages
Country: Paraguay
Age: 32
All time rank: 71 in Paraguay
Week rank: 84 1 in Paraguay
Innovation award
Nominee: 25x
Winner: 5x

Files
File                  Role     Description
arabic.lm             Data     arabic
belarus.lm            Data     belarus
chinese.lm            Data     chinese
czech.lm              Data     czech
danish.lm             Data     danish
dutch.lm              Data     dutch
english.lm            Data     english
esperanto.lm          Data     esperanto
french.lm             Data     french
german.lm             Data     german
greek.lm              Data     greek
hebrew.lm             Data     hebrew
italian.lm            Data     italian
japanese.lm           Data     japanese
russian.lm            Data     russian
saddorlibtextcat.php  Class    This is the main class
spanish.lm            Data     spanish
test.php              Example  test
