PHP Classes

CSV Pair File: Manage values for training a machine learning system

Version: csv_pair_file 1.0.0
License: Custom (specified...
PHP version: 5
Categories: Algorithms, PHP 5, Files and Folders, A...
Description 

This package can manage the values in CSV dataset files used to train machine learning systems.

It takes two files in CSV format that hold the values for training an artificial intelligence system: one with the data for the system input and one with the expected output.

The class can perform several types of operations on the data files. Each operation generates new files with the transformed data values. Currently, it can:
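To illustrate the paired layout, the sketch below reads two small CSV files and pairs row i of the input file with row i of the desired output file. It relies only on PHP's built-in `fgetcsv`. The helper name `read_csv_rows`, the temporary files and the sample values are hypothetical and are not part of the csv_pair_file class.

```php
<?php
// Hypothetical helper (not part of csv_pair_file): read every row of a CSV file.
function read_csv_rows($path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException('Cannot open ' . $path);
    }
    $rows = array();
    // Delimiter, enclosure and escape characters are passed explicitly.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue; // fgetcsv returns array(null) for a blank line
        }
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}

// Sample data (assumption): XOR inputs and their desired outputs.
$input_path   = tempnam(sys_get_temp_dir(), 'in_');
$desired_path = tempnam(sys_get_temp_dir(), 'out_');
file_put_contents($input_path, "0,0\n0,1\n1,0\n1,1\n");
file_put_contents($desired_path, "0\n1\n1\n0\n");

$inputs  = read_csv_rows($input_path);
$desired = read_csv_rows($desired_path);

// Both files must describe the same samples, one sample per row.
if (count($inputs) !== count($desired)) {
    throw new RuntimeException('Input and desired files have different row counts');
}

// Pair row i of the input file with row i of the desired output file.
$pairs = array();
foreach ($inputs as $i => $input_row) {
    $pairs[] = array($input_row, $desired[$i]);
}

unlink($input_path);
unlink($desired_path);

echo count($pairs) . ' pairs loaded' . PHP_EOL;
```

Checking the row counts before pairing catches the most common mistake with this layout, which is two files that have drifted out of step.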

- Shuffle the data values into a random order
- Split the data values into two sets, given the percentage of rows to place in the first set
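The two operations listed above can be sketched on in-memory rows as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function names `shuffle_pairs` and `split_rows` are hypothetical, and rounding the cut point down is an assumption. The important detail is that both row lists are reordered with the same permutation, so every input row stays next to its desired output.

```php
<?php
// Hypothetical helpers; a sketch, not the csv_pair_file implementation.

// Reorder two row lists with the same random permutation so pairs stay aligned.
function shuffle_pairs(array $inputs, array $desired)
{
    if (count($inputs) !== count($desired)) {
        throw new InvalidArgumentException('Row counts differ');
    }
    $order = array_keys($inputs);
    shuffle($order);

    $shuffled_inputs  = array();
    $shuffled_desired = array();
    foreach ($order as $i) {
        $shuffled_inputs[]  = $inputs[$i];
        $shuffled_desired[] = $desired[$i];
    }
    return array($shuffled_inputs, $shuffled_desired);
}

// Put the first $percent percent of rows in one set and the rest in another.
// Assumption: the cut point is rounded down.
function split_rows(array $rows, $percent)
{
    $cut = (int) floor(count($rows) * $percent / 100);
    return array(array_slice($rows, 0, $cut), array_slice($rows, $cut));
}

// Sample data (assumption): 10 rows where desired = 3 * first input value.
$inputs  = array();
$desired = array();
for ($i = 0; $i < 10; $i++) {
    $inputs[]  = array($i, $i * 2);
    $desired[] = array($i * 3);
}

list($shuffled_inputs, $shuffled_desired) = shuffle_pairs($inputs, $desired);

// Use the same percentage on both lists so the sets stay paired.
list($train_inputs, $rest_inputs)   = split_rows($shuffled_inputs, 70);
list($train_desired, $rest_desired) = split_rows($shuffled_desired, 70);

echo count($train_inputs) . ' training rows, ' . count($rest_inputs) . ' remaining rows' . PHP_EOL;
```

Shuffling before splitting matters when the source file is ordered, for example by class or by date, because otherwise the second set would contain only the tail of the data.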

Innovation Award
PHP Programming Innovation award winner, July 2021
Machine learning is a process used in many artificial intelligence systems by which the software tries to learn patterns of values that it will later try to recognize autonomously.

The machine learning process starts with training the system with some values that it should recognize and the desired values that the system should output when it has learned to recognize the patterns correctly.

This package helps to prepare different sets of values to train a machine learning system from the same set of input values and the respective desired output values.

Manuel Lemos
Author: Rafael Martin Soto
Classes: 13 packages
Country: Spain
Innovation award: Nominee 7x, Winner 4x

Example

<?php

/** csv_pair_file example file
 *
 * Class for managing CSV pair files
 *
 * Normally used for neural networks, deep learning, machine learning, artificial intelligence, ...
 *
 *
 *
 * @author Rafael Martin Soto
 * @author {@link http://www.inatica.com/ Inatica}
 * @since July 2021
 * @version 1.0.0
 * @license GNU General Public License v3.0
*/


require_once( 'csv_pair_file_class.php' ); // Class for managing CSV pair files

$csv_original_dataset = new csv_pair_file('original_dataset.csv', 'desired_dataset.csv');


echo 'Randomize original and desired Datasets'.PHP_EOL;

$csv_original_dataset->randomize();

$perc_Train = 70;
$perc_rest_Test = 50;

echo 'Splitting Randomized Dataset in '.$perc_Train.'% for Train and '.(100-$perc_Train).'% for Test & Validation'.PHP_EOL;

$RandomizedName = $csv_original_dataset->get_csv_randomized_file_names();

$csv = new csv_pair_file( $RandomizedName[0], $RandomizedName[1] );
$SplittedNames = $csv->split( $perc_Train );


// We now have the new Train input & desired data files
$csv_train_dataset = new csv_pair_file( $SplittedNames[0][0], $SplittedNames[0][1] );


// The rest needs to be split into 2 parts (Test & Validation data)
echo 'Splitting Rest '.(100-$perc_Train).'% Dataset in 2 files of '.$perc_rest_Test.'% for Test and '.(100-$perc_rest_Test).'% for Validation'.PHP_EOL;

$csv = new csv_pair_file( $SplittedNames[1][0], $SplittedNames[1][1] );
$SplittedNames = $csv->split( $perc_rest_Test ); // This csv holds the remaining 30% of the global data (100% - 70%). Splitting it at 50% gives 15% + 15% (70% + 15% + 15% = 100%)

// We now have the new Test & Validation data files.
// Plain variables are used here: $this is not available outside a class method.
$csv_test_dataset = new csv_pair_file( $SplittedNames[0][0], $SplittedNames[0][1] );
$csv_validation_dataset = new csv_pair_file( $SplittedNames[1][0], $SplittedNames[1][1] );

?>
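The example splits twice, so the second percentage applies to what is left after the first split rather than to the whole dataset. The short sketch below works through that arithmetic with the same values the example uses (70 and 50). The variable names are illustrative and the snippet does not call the csv_pair_file class.

```php
<?php
$perc_train     = 70; // share of all rows used for training
$perc_rest_test = 50; // share of the REMAINING rows used for testing

// What is left after the first split: 100 - 70 = 30
$perc_rest = 100 - $perc_train;

// Second split, expressed as a share of the whole dataset: 30 * 50 / 100 = 15
$perc_test = $perc_rest * $perc_rest_test / 100;

// Validation gets whatever the test set did not take: 30 - 15 = 15
$perc_validation = $perc_rest - $perc_test;

echo 'Train: ' . $perc_train . '%, Test: ' . $perc_test . '%, Validation: ' . $perc_validation . '%' . PHP_EOL;
// prints "Train: 70%, Test: 15%, Validation: 15%"
```

With these values the three sets add up to 70% + 15% + 15% = 100% of the rows.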


Files
File | Role | Description
example.php | Example | Example script
