PHP Similar Text Percentage: Compare two strings to compute a similarity score

Ratings: 69%
Unique User Downloads: Total: 352
Download Rankings: All time: 6,858, This week: 233
Version: similar-text 4.0.0
License: MIT/X Consortium ...
PHP version: 5
Categories: Algorithms, PHP 5, Text processing
Description Author

This class can compare two strings to compute a similarity score.

It takes the text of two strings and analyzes it with pure PHP code to evaluate how similar they are.

The class returns a percentage that indicates the level of similarity between the two strings.

It achieves that by sorting words, ignoring white space and punctuation, removing or adding words, stripping URLs, replacing words with acronyms or expanding acronyms into the original words, comparing words with similar sounds using stems, checking parts of the strings, replacing words with abbreviations, and detecting anagrams.

Recommendations

Check similarities between text files
I want to check different text documents to find similarities

Innovation Award
PHP Programming Innovation award nominee
April 2018
Number 6
PHP comes with built-in functions for comparing strings and determining how similar they are.

This package provides a pure PHP solution that works in a more sophisticated way by comparing text sentence by sentence, rather than word by word.

Manuel Lemos
Name: zinsou A.A.E.Moïse is available for providing paid consulting. Contact zinsou A.A.E.Moïse.
Classes: 50 packages by
Country: Benin
Age: 30
All time rank: 7971 in Benin
Week rank: 48, 1 in Benin
Innovation award
Nominee: 23x

Winner: 2x

Details
PHP Similar Text Percentage: Compare two strings to compute a similarity score
==============================================================================

[![Build Status](https://travis-ci.org/manuwhat/similar-text.svg?branch=master)](https://travis-ci.org/manuwhat/similar-text)
[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/manuwhat/similar-text/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/manuwhat/similar-text/?branch=master)
[![Build Status](https://scrutinizer-ci.com/g/manuwhat/similar-text/badges/build.png?b=master)](https://scrutinizer-ci.com/g/manuwhat/similar-text/build-status/master)
[![Code Intelligence Status](https://scrutinizer-ci.com/g/manuwhat/similar-text/badges/code-intelligence.svg?b=master)](https://scrutinizer-ci.com/code-intelligence)

### A library that helps compare two strings to compute a similarity score and get stats on how closely the strings are related.


**Requires**: PHP 5.3+


### What exactly does this library do?
This library compares two strings to compute a similarity score.

It takes the text of two strings and analyzes it with pure PHP code to evaluate how similar they are.

The class returns a percentage that indicates the level of similarity between the two strings.

Based on the stats it provides, it can help detect similarity even when the strings differ in one of these ways:
WORD REORDER, WHITESPACE AND PUNCTUATION, REMOVE WORDS, ADD WORDS, URL STRIPPING,
FORM ACRONYM, EXPAND ACRONYM, STEMMING, SUBSTRING, SUPERSTRING, ABBREVIATION, ANAGRAM
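
To make the first two cases concrete, here is a minimal sketch using only built-in PHP functions. It is not the library's implementation. `normalizedWords` is a hypothetical helper introduced for illustration. It shows how word reordering and punctuation or whitespace differences can be neutralized before comparing.

```php
<?php
// Hypothetical helper, not part of the library: lowercase the text,
// replace punctuation with spaces, split on whitespace, and sort the words.
function normalizedWords($text)
{
    $clean = preg_replace('/[^\p{L}\p{N}\s]+/u', ' ', strtolower($text));
    $words = preg_split('/\s+/u', trim($clean), -1, PREG_SPLIT_NO_EMPTY);
    sort($words);
    return $words;
}

$a = 'Hello,   world!';
$b = 'world hello';

var_dump($a === $b);                                   // bool(false)
var_dump(normalizedWords($a) === normalizedWords($b)); // bool(true)
```

A plain string comparison treats the two inputs as different, while the normalized word lists are identical.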


### How to use it

Require the library by issuing this command:

```bash
composer require manuwhat/similar-text
```

Add `require 'vendor/autoload.php';` to the top of your script.

Next, use it in your script, just like this:

```php
use Ezama\similar_text;

// Class and method names follow the anagram example in this README.
var_dump(100.0 === similar_text::similarText('qwerty', 'ytrewq'));
```
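
For comparison, here is how PHP's built-in `similar_text()` handles the same pair of strings. This sketch uses only the built-in function, not this library. The built-in returns the number of matching characters and writes a percentage into its third argument.

```php
<?php
// Built-in PHP function: returns the count of matching characters and
// stores the similarity percentage in $percent (passed by reference).
$matching = similar_text('qwerty', 'ytrewq', $percent);

var_dump($matching);        // number of matching characters
var_dump($percent < 100.0); // bool(true): the built-in does not score an anagram as 100%
```

The built-in only reports 100% for identical strings, so a reversed string scores lower.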

The following example shows how to use the stats to check a special case, here whether two strings are anagrams.
(This check is already implemented in the library; see the file similar_text.php for all available implementations.)

```php
function areAnagrams($a, $b)
{
    // $check receives the comparison stats (fifth argument).
    if (!Ezama\similar_text::similarText($a, $b, 2, true, $check)) {
        return false;
    }

    return $check['similar'] === 100.0 && $check['contain'] === true;
}

areAnagrams('qwerty', 'ytrewq'); // returns true

```
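
As a point of reference, an anagram check can also be written without the library. This is a minimal sketch and `areAnagramsPlain` is a hypothetical name. It is byte-based and case-sensitive, so it assumes single-byte text and does not ignore whitespace or punctuation.

```php
<?php
// Hypothetical stand-alone check, independent of the library:
// two strings are anagrams when every byte occurs equally often in both.
function areAnagramsPlain($a, $b)
{
    // Mode 1 lists only the bytes that occur, keyed by byte value in ascending order.
    return count_chars($a, 1) === count_chars($b, 1);
}

var_dump(areAnagramsPlain('qwerty', 'ytrewq')); // bool(true)
var_dump(areAnagramsPlain('qwerty', 'qwertz')); // bool(false)
```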

Note:
Some functions and methods are subtler than they may seem.
For example, the method `simpleCommonTextSimilarities::aIsSuperStringOfB` and its helper `aIsSuperStringOfB`
are not equivalent to the usual checks built on top of `preg_match`, `stripos`, and similar PHP functions.

A simple example:

```php
require './vendor/manuwhat/similar-text/similar_text.php';

// Naive check: is $b a contiguous, case-insensitive substring of $a?
function aIsSuperStringOfB_stripos($a, $b)
{
    return false !== stripos($a, $b);
}

// Same naive check with PCRE; '#' is passed to preg_quote() so the delimiter is escaped too.
function aIsSuperStringOfB_PCRE($a, $b)
{
    return 1 === preg_match('#' . preg_quote($b, '#') . '#i', $a);
}

aIsSuperStringOfB('mum do you want to cook something', 'do you cook something mum'); // returns true
aIsSuperStringOfB_stripos('mum do you want to cook something', 'do you cook something mum'); // returns false
aIsSuperStringOfB_PCRE('mum do you want to cook something', 'do you cook something mum'); // returns false
```
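
One way to see why the results differ is a word-based check. The sketch below is an assumption about the general idea, not the library's code. `containsAllWords` is a hypothetical function that returns true when every word of the second string appears in the first, in any order.

```php
<?php
// Hypothetical sketch, not the library's implementation: $a "contains" $b
// when every word of $b also appears somewhere in $a, regardless of order.
function containsAllWords($a, $b)
{
    $wordsA = preg_split('/\s+/', strtolower(trim($a)), -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/\s+/', strtolower(trim($b)), -1, PREG_SPLIT_NO_EMPTY);

    // array_diff() returns the words of $b that are missing from $a.
    return count(array_diff($wordsB, $wordsA)) === 0;
}

$a = 'mum do you want to cook something';
$b = 'do you cook something mum';

var_dump(containsAllWords($a, $b));  // bool(true)
var_dump(false !== stripos($a, $b)); // bool(false)
```

`stripos` looks for the second string as one contiguous run of characters, which fails as soon as the words are reordered or separated.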


### How to run unit tests
```bash
phpunit  ./tests
```
  Files  
File Role Description
src (9 files)
tests (1 file)
.travis.yml Data Auxiliary data
composer.json Data Auxiliary data
LICENSE Lic. License text
phpunit.xml Data Auxiliary data
README.md Doc. Documentation
readme.txt Doc. readme
similar_text.php Aux. Auxiliary script

  Files  /  src  
File Role Description
   complexCommonTextSimilarities.php Class Class source
   complexCommonTextSimilaritiesHelper.php Class Implements common distance algorithms with some custom behavior, so they may not perform as well as the originals: Levenshtein without string length limit, Damerau-Levenshtein, Dice, Hamming, Jaro-Winkler. Also improves existing methods.
   diceDistance.php Class Implements common distance algorithms with some custom behavior, so they may not perform as well as the originals: Levenshtein without string length limit, Damerau-Levenshtein, Dice, Hamming, Jaro-Winkler. Also improves existing methods.
   distance.php Class Implements common distance algorithms with some custom behavior, so they may not perform as well as the originals: Levenshtein without string length limit, Damerau-Levenshtein, Dice, Hamming, Jaro-Winkler. Also improves existing methods.
   hammingDistance.php Class Implements common distance algorithms with some custom behavior, so they may not perform as well as the originals: Levenshtein without string length limit, Damerau-Levenshtein, Dice, Hamming, Jaro-Winkler. Also improves existing methods.
   jaroWinklerDistance.php Class Implements common distance algorithms with some custom behavior, so they may not perform as well as the originals: Levenshtein without string length limit, Damerau-Levenshtein, Dice, Hamming, Jaro-Winkler. Also improves existing methods.
   levenshteinDistance.php Class Implements common distance algorithms with some custom behavior, so they may not perform as well as the originals: Levenshtein without string length limit, Damerau-Levenshtein, Dice, Hamming, Jaro-Winkler. Also improves existing methods.
   similar_text.php Class Class source
   simpleCommonTextSimilarities.php Class Class source
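
The file descriptions above mention Levenshtein "without string length limit" and Hamming distance. For readers unfamiliar with them, here is a minimal sketch of both in plain PHP. These are not the package's own classes, and the function names are hypothetical. The length limit presumably refers to the built-in `levenshtein()`, which before PHP 8.0 returned -1 when an argument was longer than 255 characters.

```php
<?php
// Hypothetical sketches, not the package's own classes.

// Levenshtein distance using two rolling rows, with no limit on string length.
function levenshteinUnbounded($a, $b)
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    $prev = range(0, $lenB); // distances from the empty prefix of $a

    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array($i);
        for ($j = 1; $j <= $lenB; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // deletion
                $curr[$j - 1] + 1,     // insertion
                $prev[$j - 1] + $cost  // substitution
            );
        }
        $prev = $curr;
    }

    return $prev[$lenB];
}

// Hamming distance: number of positions at which equal-length strings differ.
function hammingDistanceSketch($a, $b)
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Strings must have equal length.');
    }

    $distance = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }

    return $distance;
}

var_dump(levenshteinUnbounded('kitten', 'sitting'));   // int(3)
var_dump(hammingDistanceSketch('karolin', 'kathrin')); // int(3)
```

Both functions compare bytes, so multibyte text would need to be split into characters first.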

  Files  /  tests  
File Role Description
   Similar_textTest.php Class Class source

 Version Control: 94%
 Unique User Downloads: Total: 352, This week: 0
 Download Rankings: All time: 6,858, This week: 233
 User Ratings  
 
 All time
Utility:100%
Consistency:100%
Documentation:91%
Examples:-
Tests:-
Videos:-
Overall:69%
Rank:491
  
