PHP Classes

Text Cat: Guess the language of a given text

Last Updated: 2006-06-04 (8 years ago)
Ratings: Not enough user ratings
Unique User Downloads: Total: 880
Download Rankings: All time: 3,692; This week: 1,433 (up)
Version: libtextcat 1.0
License: Public Domain
PHP version: 3.0
Categories: Text processing
Description

This class can be used to guess the language of a given text.

The class reads data files that contain ranking information about characters that are most likely to be found in texts of several languages.

The text being analyzed is converted to Unicode to be compared with the language character ranking data.

The class returns an array of languages sorted by ranking.
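
The ranking comparison described above can be sketched roughly as follows. This is a minimal, language-agnostic illustration in Python, not the class's PHP code: the function names (rank_profile, out_of_place, guess_language), the toy training strings, and the "out-of-place" rank distance are all assumptions made for the example. The real class reads its rankings from the .lm data files, whose format is not shown on this page.

```python
from collections import Counter

def rank_profile(text, top=50):
    """Map each character to its frequency rank (0 = most frequent)."""
    counts = Counter(ch for ch in text.lower() if not ch.isspace())
    # Sort by descending count, then by character so ties are deterministic.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: rank for rank, ch in enumerate(ordered[:top])}

def out_of_place(sample, reference, penalty):
    """Sum of rank differences; characters missing from the reference cost `penalty`."""
    return sum(
        abs(rank - reference[ch]) if ch in reference else penalty
        for ch, rank in sample.items()
    )

def guess_language(text, profiles, top=50):
    """Return language names sorted from most to least likely."""
    sample = rank_profile(text, top)
    scores = {lang: out_of_place(sample, ref, top) for lang, ref in profiles.items()}
    return sorted(scores, key=lambda lang: (scores[lang], lang))

# Toy reference profiles built from short strings (hypothetical stand-ins
# for the prebuilt ranking data the class loads from its data files).
profiles = {
    "english": rank_profile("the quick brown fox jumps over the lazy dog and then the other dog"),
    "russian": rank_profile("съешь же ещё этих мягких французских булок да выпей чаю"),
}

ranking = guess_language("выпей чаю", profiles)
print(ranking)
```

A lower score means the character ranks of the text sit closer to those of the reference profile, so the first element of the returned list is the best guess. With realistic data, the reference profiles would be built from much larger texts per language.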

It currently supports the following languages: Arabic, Belarusian, Chinese, Czech, Danish, Dutch, English, Esperanto, French, German, Greek, Hebrew, Italian, Japanese, Russian, and Spanish.

Have a lot of fun with this!

Innovation Award  
PHP Programming Innovation award nominee
June 2006
Number 5


Prize: One book of choice by SAMS
A text can be written in many different languages. Without prior knowledge of the language in which a text is written, it is hard for a person to identify it and then choose an appropriate translation tool.

This class can be used to guess the language of a text. It takes prebuilt data files that assign different weights to the presence of characters that are more strongly associated with a given language.

This way the class can give a good idea of the languages in which a given text is most likely to be written.

Manuel Lemos
Name: Cesar D. Rodas (available for paid consulting)
Classes: 37 packages
Country: Paraguay
Age: 27
All time rank: 71 in Paraguay
Week rank: 20 (down), 1 in Paraguay (equal)
Innovation award
Nominee: 24x

Winner: 5x

Files

File                   Role     Description
arabic.lm              Data     arabic (accessible without login)
belarus.lm             Data     belarus (accessible without login)
chinese.lm             Data     chinese (accessible without login)
czech.lm               Data     czech (accessible without login)
danish.lm              Data     danish (accessible without login)
dutch.lm               Data     dutch (accessible without login)
english.lm             Data     english (accessible without login)
esperanto.lm           Data     esperanto (accessible without login)
french.lm              Data     french (accessible without login)
german.lm              Data     german (accessible without login)
greek.lm               Data     greek (accessible without login)
hebrew.lm              Data     hebrew (accessible without login)
italian.lm             Data     italian (accessible without login)
japanese.lm            Data     japanese (accessible without login)
russian.lm             Data     russian (accessible without login)
saddorlibtextcat.php   Class    This is the main class
spanish.lm             Data     spanish (accessible without login)
test.php               Example  test (accessible without login)
