Author: Khaled Al-Shamaa
Package: Al-Kashi
Read this article to learn more about this PHP package and examples of its application.
Contents
Introduction to Al-Kashi PHP Statistics Class
Example Data and Statistics
Summary Statistics
Statistical Graphics
Correlation, Regression, and t-Test
Distributions
Chi-square test or Contingency tables (A/B testing)
Diversity index
Analysis of Variance (ANOVA)
Cluster Analysis
Time Series Analysis
To-do list
Introduction to Al-Kashi PHP Statistics Class
Al-Kashi is a PHP statistics project that can be used in applications such as online log file analysis, advertising campaign statistics, or on-the-fly analysis of survey or voting results.
Example Data and Statistics
The data in this example was extracted from the 1974 Motor Trend US magazine. It comprises fuel consumption and 10 aspects of automobile design and performance for 32 cars (1973-74 models). You can download the example data file from here.
Description (Motor Trend Car Road Tests)
Format: A data frame with 32 observations on 12 variables.
ID | Title | Description |
1 | model | Car models |
2 | mpg | Miles/(US) gallon |
3 | cyl | Number of cylinders |
4 | disp | Displacement (cu.in.) |
5 | hp | Gross horsepower |
6 | drat | Rear axle ratio |
7 | wt | Weight (lb/1000) |
8 | qsec | 1/4 mile time |
9 | vs | V/S |
10 | am | Transmission (0 = automatic, 1 = manual) |
11 | gear | Number of forward gears |
12 | carb | Number of carburetors |
Example code: read the example data and feed it to Al-Kashi
- $sep = "\t"; $nl = "\n";
- $content = file_get_contents('data.txt');
- $records = explode($nl, $content);
- $header = explode($sep, trim(array_shift($records)));
- $data = array_fill_keys($header, array());
- foreach ($records as $id=>$record) {
- $record = trim($record);
- if ($record == '') continue;
- $fields = explode($sep, $record);
- $titles = $header;
- foreach ($fields as $field) {
- $title = array_shift($titles);
- $data[$title][] = $field;
- }
- }
- $x = $data['wt'];
- $y = $data['mpg'];
- require('kashi.php');
- $kashi = new Kashi();
PHP Statistical Functions Summary
Rank (x) | 9, 12, 7, 16, 18, 21, 23, 15, 13, 18, 18, 29, 25, 26, 30, 32, 31, 6, 2, 3, 8, 22, 17, 27, 28, 4, 5, 1, 14, 10, 23, 11 |
- // $x is an array of values
- echo 'Arithmetic Mean: ' . $kashi->mean($x) . '<br>';
- echo 'Geometric Mean: ' . $kashi->mean($x, "geometric") . '<br>';
- echo 'Harmonic Mean: ' . $kashi->mean($x, "harmonic") . '<br>';
- echo 'Mode: ' . print_r($kashi->mode($x), true) . '<br>';
- echo 'Median: ' . $kashi->median($x) . '<br>';
- echo 'Variance: ' . $kashi->variance($x) . '<br>';
- echo 'SD: ' . $kashi->sd($x) . '<br>';
- echo '%CV: ' . $kashi->cv($x) . '<br>';
- echo 'Skewness: ' . $kashi->skew($x) . '<br>';
- echo 'Is it significant (i.e. test it against 0)? ';
- var_dump($kashi->isSkew($x));
- echo 'Kurtosis: ' . $kashi->kurt($x) . '<br>';
- echo 'Is it significant (i.e. test it against 0)? ';
- var_dump($kashi->isKurt($x));
- echo 'Rank (x): ';
- echo implode(', ', $kashi->rank($x)) . '<br>';
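To make the three `mean()` variants above concrete, here is a minimal plain-PHP sketch of the textbook definitions. It is not the Al-Kashi implementation; the function names and the sample values are hypothetical, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Illustrative helpers only; these names are not part of Al-Kashi.

function arithmeticMean(array $v): float
{
    return array_sum($v) / count($v);
}

function geometricMean(array $v): float
{
    // exp(mean of logs) avoids overflow from multiplying many values.
    // Only defined for strictly positive values.
    return exp(array_sum(array_map('log', $v)) / count($v));
}

function harmonicMean(array $v): float
{
    // n divided by the sum of reciprocals; values must be non-zero.
    $reciprocals = array_map(fn($x) => 1 / $x, $v);
    return count($v) / array_sum($reciprocals);
}

$sample = [2.0, 4.0, 8.0]; // hypothetical values

printf("Arithmetic: %.4f\n", arithmeticMean($sample));
printf("Geometric:  %.4f\n", geometricMean($sample));
printf("Harmonic:   %.4f\n", harmonicMean($sample));
```

For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean, so the three values give a quick sanity check on each other.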
Statistical Graphics
- echo 'Boxplot: <br><pre>';
- print_r($kashi->boxplot($x));
- echo '</pre><br>';
- echo 'Histogram: <br><pre>';
- print_r($kashi->hist($x, 8));
- echo '</pre><br>';
- echo 'Normal Q-Q Plot: <br>';
- $qq = $kashi->qqnorm($x);
- echo 'x = ' . implode(', ', $qq['x']) . '<br>';
- echo 'y = ' . implode(', ', $qq['y']) . '<br>';
- echo 'Ternary Plot: <br>';
- $xy = $kashi->ternary($data['wt'], $data['mpg'], $data['qsec']);
- echo 'x = ' . implode(', ', $xy['x']) . '<br>';
- echo 'y = ' . implode(', ', $xy['y']) . '<br>';
Correlation, Regression, and t-Test
- echo 'Covariance: ' . $kashi->cov($x, $y) . '<br>';
- echo 'Correlation: ' . $kashi->cor($x, $y) . '<br>';
- $r = $kashi->cor($x, $y);
- $n = count($x);
- echo 'Significance of Correlation: ' . $kashi->corTest($r, $n) . '<br>';
- echo 'Regression: ' . print_r($kashi->lm($y, $x), true) . '<br>';
- echo 't-Test unpaired: ' . $kashi->tTest($x, $y, false) . '<br>';
- // Unpaired test: degrees of freedom assume a pooled-variance t-test (n1 + n2 - 2)
- echo 'Test: ' . $kashi->tDist($kashi->tTest($x, $y, false),
- count($x) + count($y) - 2) . '<br>';
- echo 't-Test paired: ' . $kashi->tTest($x, $y, true) . '<br>';
- echo 'Test: ' . $kashi->tDist($kashi->tTest($x, $y, true),
- count($x)-1) . '<br>';
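The quantities behind the correlation and regression calls above can be sketched in plain PHP as follows. This is an illustration of the standard formulas (Pearson's r, least-squares slope and intercept, and the t statistic for testing r against zero), not Al-Kashi's own code; the function names and sample data are hypothetical, and no claim is made about what `corTest()` or `lm()` return.

```php
<?php
// Illustrative helpers only; these names are not part of Al-Kashi.

// Returns the sums of squares and cross-products around the means.
function centeredSums(array $x, array $y): array
{
    $n = count($x);
    $mx = array_sum($x) / $n;
    $my = array_sum($y) / $n;
    $sxy = $sxx = $syy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $mx;
        $dy = $y[$i] - $my;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    return ['mx' => $mx, 'my' => $my, 'sxy' => $sxy, 'sxx' => $sxx, 'syy' => $syy];
}

function pearsonR(array $x, array $y): float
{
    $s = centeredSums($x, $y);
    return $s['sxy'] / sqrt($s['sxx'] * $s['syy']);
}

// Least-squares fit of y = intercept + slope * x.
function simpleRegression(array $x, array $y): array
{
    $s = centeredSums($x, $y);
    $slope = $s['sxy'] / $s['sxx'];
    return ['slope' => $slope, 'intercept' => $s['my'] - $slope * $s['mx']];
}

// t statistic for H0: r = 0, with n - 2 degrees of freedom.
function tFromR(float $r, int $n): float
{
    return $r * sqrt(($n - 2) / (1 - $r * $r));
}

$x = [1, 2, 3, 4]; // hypothetical predictor
$y = [2, 4, 5, 8]; // hypothetical response

$r = pearsonR($x, $y);
$fit = simpleRegression($x, $y);
printf("r = %.4f, t = %.4f\n", $r, tFromR($r, count($x)));
printf("y = %.4f + %.4f * x\n", $fit['intercept'], $fit['slope']);
```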
Distributions
- echo 'Normal distribution (x=0.5, mean=0, sd=1): ' .
- $kashi->norm(0.5, 0, 1) . '<br>';
- echo 'Probability for the Student t-distribution (t=3, n=10)',
- ' one-tailed: ';
- echo $kashi->tDist(3, 10, 1) . '<br>';
- echo 'Probability for the Student t-distribution (t=3, n=10)',
- ' two-tailed: ';
- echo $kashi->tDist(3, 10, 2) . '<br>';
- echo 'F probability distribution (f=2, df1=12, df2=15): '.
- $kashi->fDist(2, 12, 15) . '<br>';
- echo 'Inverse of the standard normal cumulative distribution',
- ' (p=0.95): ';
- echo $kashi->inverseNormCDF(0.95) . '<br>';
- echo 't-value of the Student\'s t-distribution (p=0.05, n=29): ';
- echo $kashi->inverseTCDF(0.05, 29) . '<br>';
- echo 'Standardize (x) (i.e. mean=0 & variance=1): ';
- echo implode(', ', $kashi->standardize($x)) . '<br>';
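As an illustration of what standardizing to mean 0 and variance 1 involves, the sketch below computes z-scores in plain PHP. It assumes the sample standard deviation (n - 1 denominator); whether `standardize()` uses the sample or population form is not stated here, so treat that choice, the function name, and the sample values as assumptions.

```php
<?php
// Illustrative helper only; this name is not part of Al-Kashi.
// Assumes the sample standard deviation (n - 1 in the denominator).
function zScores(array $v): array
{
    $n = count($v);
    $mean = array_sum($v) / $n;

    $sumSquares = 0.0;
    foreach ($v as $value) {
        $sumSquares += ($value - $mean) ** 2;
    }
    $sd = sqrt($sumSquares / ($n - 1));

    // Subtract the mean, then divide by the standard deviation.
    return array_map(fn($value) => ($value - $mean) / $sd, $v);
}

$sample = [2, 4, 6]; // hypothetical values: mean 4, sample SD 2
echo implode(', ', zScores($sample)), "\n";
```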
Chi-square test or Contingency tables (A/B testing)
Probability that the distribution of the number of cylinders is the same in automatic and manual transmission cars | 0.012646605046107 |
- $table['Automatic'] = array('4 Cylinders' => 3, '6 Cylinders' => 4,
- '8 Cylinders' => 12);
- $table['Manual'] = array('4 Cylinders' => 8, '6 Cylinders' => 3,
- '8 Cylinders' => 2);
- $result = $kashi->chiTest($table);
- $probability = $kashi->chiDist($result['chi'], $result['df']);
- echo 'Chi-square test probability: ' . $probability . '<br>';
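The chi-square statistic for this contingency table can also be worked out by hand, which helps when checking the result. The sketch below is a plain-PHP illustration of Pearson's chi-square test of independence, not the Al-Kashi code; `chiSquare()` is a hypothetical name. The closed-form p-value `exp(-chi / 2)` is valid only for 2 degrees of freedom, which happens to be the case for this 2 x 3 table.

```php
<?php
// Illustrative helper only; this name is not part of Al-Kashi.
function chiSquare(array $table): array
{
    // Row totals keep the row labels as keys.
    $rowTotals = array_map('array_sum', $table);

    $colTotals = [];
    foreach ($table as $row) {
        foreach ($row as $col => $count) {
            $colTotals[$col] = ($colTotals[$col] ?? 0) + $count;
        }
    }
    $n = array_sum($rowTotals);

    $chi = 0.0;
    foreach ($table as $r => $row) {
        foreach ($row as $c => $observed) {
            // Expected count under independence of rows and columns.
            $expected = $rowTotals[$r] * $colTotals[$c] / $n;
            $chi += ($observed - $expected) ** 2 / $expected;
        }
    }

    $df = (count($rowTotals) - 1) * (count($colTotals) - 1);
    return ['chi' => $chi, 'df' => $df];
}

$table = [
    'Automatic' => ['4 Cylinders' => 3, '6 Cylinders' => 4, '8 Cylinders' => 12],
    'Manual'    => ['4 Cylinders' => 8, '6 Cylinders' => 3, '8 Cylinders' => 2],
];

$result = chiSquare($table);
printf("chi = %.4f, df = %d\n", $result['chi'], $result['df']);

// Upper-tail probability; this shortcut holds for df = 2 only.
if ($result['df'] === 2) {
    printf("p = %.6f\n", exp(-$result['chi'] / 2));
}
```

For this table the p-value should agree with the probability quoted for this section (about 0.0126).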
Diversity index
Shannon index for number of forward gears | 1.0130227035447 |
Simpson index for number of cylinders | 0.357421875 |
- $gear = array('3' => 15, '4' => 12, '5' => 5);
- $cyl = array('4' => 11, '6' => 7, '8' => 14);
- echo 'Shannon index for gear: ' . $kashi->diversity($gear) .
- '<br>';
- echo 'Simpson index for cyl: ' . $kashi->diversity($cyl, 'simpson').
- '<br>';
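The two index values listed for this section can be reproduced with a few lines of plain PHP. The sketch below assumes the Shannon index is -Σ p·ln(p) and the Simpson index is Σ p², where p is each category's share of the total; both definitions match the figures quoted above. The function names are hypothetical and are not part of Al-Kashi.

```php
<?php
// Illustrative helpers only; these names are not part of Al-Kashi.

// Shannon index: -sum(p * ln(p)) over the category proportions.
function shannonIndex(array $counts): float
{
    $total = array_sum($counts);
    $h = 0.0;
    foreach ($counts as $count) {
        if ($count > 0) { // empty categories contribute nothing
            $p = $count / $total;
            $h -= $p * log($p);
        }
    }
    return $h;
}

// Simpson index: sum(p^2), the chance that two random draws
// (with replacement) fall in the same category.
function simpsonIndex(array $counts): float
{
    $total = array_sum($counts);
    $d = 0.0;
    foreach ($counts as $count) {
        $d += ($count / $total) ** 2;
    }
    return $d;
}

$gear = ['3' => 15, '4' => 12, '5' => 5];
$cyl  = ['4' => 11, '6' => 7, '8' => 14];

printf("Shannon index for gear: %.7f\n", shannonIndex($gear));
printf("Simpson index for cyl:  %.9f\n", simpsonIndex($cyl));
```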
Analysis of Variance (ANOVA)
- require('kashi_anova.php');
- // $obj = new KashiANOVA($dbname, $dbuser, $dbpass, $dbhost);
- $obj = new KashiANOVA('test', 'root', '', 'localhost');
- $str = file_get_contents('anova_data.txt');
- $obj->loadString($str);
- // mpg ~ cyl
- $result = $obj->anova('cyl', 'mpg');
- print_r($result);
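For readers who want to see what a one-way ANOVA such as `mpg ~ cyl` computes, here is a minimal plain-PHP sketch of the F statistic. It works on in-memory arrays rather than a database, uses hypothetical group data, and makes no claim about the structure of the result returned by `KashiANOVA`.

```php
<?php
// Illustrative helper only; this name is not part of Al-Kashi.
// $groups maps each factor level to its list of observations.
function oneWayAnova(array $groups): array
{
    $all = array_merge(...array_values($groups));
    $grandMean = array_sum($all) / count($all);

    $ssBetween = 0.0; // variation of group means around the grand mean
    $ssWithin = 0.0;  // variation of observations around their group mean
    foreach ($groups as $values) {
        $mean = array_sum($values) / count($values);
        $ssBetween += count($values) * ($mean - $grandMean) ** 2;
        foreach ($values as $value) {
            $ssWithin += ($value - $mean) ** 2;
        }
    }

    $dfBetween = count($groups) - 1;
    $dfWithin = count($all) - count($groups);
    $f = ($ssBetween / $dfBetween) / ($ssWithin / $dfWithin);

    return ['F' => $f, 'dfBetween' => $dfBetween, 'dfWithin' => $dfWithin];
}

// Hypothetical data: three groups of three observations each.
$groups = [
    'a' => [1, 2, 3],
    'b' => [2, 3, 4],
    'c' => [5, 6, 7],
];

print_r(oneWayAnova($groups));
```

The F value is then compared against the F distribution with the two degrees of freedom, which is what a call such as `fDist()` in the Distributions section is for.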
Cluster Analysis
- require('kashi_cluster.php');
- $obj = new KashiCluster();
- $obj->dataLoad($data);
- $result = $obj->kMean(2);
- print_r($result);
- // Hierarchical tree output has no header and consists of four columns. For each row, the first column is the
- // identifier of the node, the second and third columns are the identifiers of its child nodes, and the fourth column is used
- // to determine the height of the node when rendering a tree.
- $tree = $obj->hClust();
- echo "<pre>$tree</pre>";
Time Series Analysis
- echo 'Moving Average for x: ' . implode(', ', $kashi->movingAvg($x, 5)) . '<br>';
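A moving average with a window of 5 replaces each point by the mean of 5 neighbouring values. The sketch below shows one common variant, a trailing simple moving average that emits a value only once a full window is available. Whether `movingAvg()` is trailing or centered, and how it handles the edges, is not stated here, so this is an assumption; the function name and sample values are hypothetical.

```php
<?php
// Illustrative helper only; this name is not part of Al-Kashi.
// Trailing simple moving average: one output per full window.
function simpleMovingAverage(array $values, int $window): array
{
    $values = array_values($values); // ensure 0-based numeric keys
    $averages = [];
    $sum = 0.0;

    foreach ($values as $i => $value) {
        $sum += $value;
        if ($i >= $window) {
            // Drop the value that just left the window.
            $sum -= $values[$i - $window];
        }
        if ($i >= $window - 1) {
            $averages[] = $sum / $window;
        }
    }
    return $averages;
}

$series = [1, 2, 3, 4, 5]; // hypothetical series
echo implode(', ', simpleMovingAverage($series, 3)), "\n"; // prints "2, 3, 4"
```

Keeping a running sum makes the cost linear in the series length instead of recomputing each window from scratch.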
To-do list
Principal Component Analysis (PCA)
Multiple Linear Regression & Relative Weights
Analysis of Covariance
Extra Clustering Methods (i.e. Linkage Criteria)