This class can be used to guess the language of a given text.
The class reads data files that contain ranking information about characters that are most likely to be found in texts of several languages.
The text being analyzed is converted to Unicode to be compared with the language character ranking data.
The class returns an array of languages sorted by ranking.
Currently it supports the following languages: Arabic, Belarusian, Chinese, Czech, Danish, Dutch, English, Esperanto, French, German, Greek, Hebrew, Italian, Japanese, Russian, and Spanish.
Have a lot of fun with this!
Prize: One book of choice by SAMS
A text can be written in many different languages. Without prior knowledge of the language in which a text is written, it is hard for a human to guess it and then choose an appropriate translation tool.
This class can be used to guess the language of a text. It uses prebuilt data files that assign different weights to the presence of characters that are more strongly associated with a given language.
This way the class can give a good idea of the languages in which a given text is most likely written.
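
The character-weighting idea described above can be sketched roughly as follows. This is a minimal Python illustration, not the class's actual code: the `CHAR_WEIGHTS` table, its values, and the `guess_language` function are all hypothetical, and the real class reads its weights from prebuilt data files whose format is not shown here.

```python
from collections import Counter

# Hypothetical, hand-made weights for illustration only. The real class
# loads its character ranking data from prebuilt files.
CHAR_WEIGHTS = {
    "english": {"e": 3, "t": 2, "h": 2, "w": 2},
    "german": {"e": 3, "n": 2, "ä": 5, "ö": 5, "ü": 5, "ß": 6},
    "russian": {"и": 3, "н": 2, "т": 2, "л": 2, "д": 2, "я": 3},
    "greek": {"α": 3, "ε": 3, "τ": 2, "ι": 2, "σ": 2, "λ": 2},
}


def guess_language(text, weights=CHAR_WEIGHTS):
    """Return (language, score) pairs sorted from most to least likely."""
    # Count only letters, case-insensitively; Python strings are already Unicode.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return []  # nothing to score

    scores = {}
    for lang, table in weights.items():
        # Sum the weight of every character the text shares with this
        # language, normalized by text length so long texts do not dominate.
        scores[lang] = sum(table.get(ch, 0) * n for ch, n in counts.items()) / total

    # Highest score first, mirroring the "array sorted by ranking" result.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for lang, score in guess_language("Привет, как дела"):
        print(f"{lang}: {score:.2f}")
```

With a table this small the ranking is only meaningful for texts whose characters are distinctive, such as Cyrillic or Greek letters; a realistic data set would need weights derived from large samples of each language.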