File: example.txt

Role: Sample output
Content type: text/plain
Description: usage examples
Class: PHP Split File by Pattern
Split files into chunks divided by pattern strings
Author: Philipp Strazny
Date: 1 year ago
Size: 2,974 bytes

Example for using FilePatternSplitter

max_und_moritz.txt contains a text-only version of a famous German cartoon.
For some purposes, it may be useful to have each chapter in its own file.
Luckily, each chapter heading starts with an underscore in this file, so
we issue:

$ php FilePatternSplitter.php split max_und_moritz.txt '/^_/'
./fps00001_max_und_moritz.txt
./fps00002_max_und_moritz.txt
./fps00003_max_und_moritz.txt
./fps00004_max_und_moritz.txt
./fps00005_max_und_moritz.txt
./fps00006_max_und_moritz.txt
./fps00007_max_und_moritz.txt
./fps00008_max_und_moritz.txt
./fps00009_max_und_moritz.txt
./fps00010_max_und_moritz.txt

When we check the contents of these files, we see that each chapter has been put into its own file, separate from the Gutenberg preamble:

$ head -n 1 fps*
==> fps00001_max_und_moritz.txt <==
The Project Gutenberg EBook of Max und Moritz, by Wilhelm Busch
==> fps00002_max_und_moritz.txt <==
_VORWORT._
==> fps00003_max_und_moritz.txt <==
_Erster Streich._
==> fps00004_max_und_moritz.txt <==
_Zweiter Streich._
==> fps00005_max_und_moritz.txt <==
_Dritter Streich._
==> fps00006_max_und_moritz.txt <==
_Vierter Streich._
==> fps00007_max_und_moritz.txt <==
_Fünfter Streich._
==> fps00008_max_und_moritz.txt <==
_Sechster Streich._
==> fps00009_max_und_moritz.txt <==
_Letzter Streich._
==> fps00010_max_und_moritz.txt <==
_SCHLUSS._
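
The splitting shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of FilePatternSplitter: the function name split_on_pattern and the sample text are hypothetical, and it assumes the pattern is tested against each line, with a matching line opening a new chunk.

```python
import re

def split_on_pattern(text, pattern):
    """Split text into chunks; each line matching pattern starts a new chunk."""
    regex = re.compile(pattern)
    chunks = [[]]
    for line in text.splitlines(keepends=True):
        # A matching line opens a new chunk, unless the current chunk is still empty.
        if regex.search(line) and chunks[-1]:
            chunks.append([])
        chunks[-1].append(line)
    return ["".join(chunk) for chunk in chunks]

# Hypothetical miniature input: a preamble followed by two headed chapters.
sample = "Preamble line\n_VORWORT._\nverse one\n_Erster Streich._\nverse two\n"
chunks = split_on_pattern(sample, r"^_")
for chunk in chunks:
    print(chunk.splitlines()[0])
```

Because the matching line is kept at the start of its chunk, joining the chunks in order gives back the input unchanged.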

However, the last file contains not only the final chapter but also the Gutenberg license. To separate the license as well, we simply add a second pattern:

$ php FilePatternSplitter.php split max_und_moritz.txt '/^_/' '/^End of the/'
./fps00001_max_und_moritz.txt
./fps00002_max_und_moritz.txt
./fps00003_max_und_moritz.txt
./fps00004_max_und_moritz.txt
./fps00005_max_und_moritz.txt
./fps00006_max_und_moritz.txt
./fps00007_max_und_moritz.txt
./fps00008_max_und_moritz.txt
./fps00009_max_und_moritz.txt
./fps00010_max_und_moritz.txt
./fps00011_max_und_moritz.txt

$ head -n 1 fps*
==> fps00001_max_und_moritz.txt <==
The Project Gutenberg EBook of Max und Moritz, by Wilhelm Busch
==> fps00002_max_und_moritz.txt <==
_VORWORT._
==> fps00003_max_und_moritz.txt <==
_Erster Streich._
==> fps00004_max_und_moritz.txt <==
_Zweiter Streich._
==> fps00005_max_und_moritz.txt <==
_Dritter Streich._
==> fps00006_max_und_moritz.txt <==
_Vierter Streich._
==> fps00007_max_und_moritz.txt <==
_Fünfter Streich._
==> fps00008_max_und_moritz.txt <==
_Sechster Streich._
==> fps00009_max_und_moritz.txt <==
_Letzter Streich._
==> fps00010_max_und_moritz.txt <==
_SCHLUSS._
==> fps00011_max_und_moritz.txt <==
End of the Project Gutenberg EBook of 
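
A sketch of how several patterns and the numbered output names might fit together, again in Python and again only as an assumption about the behaviour seen above: a line starts a new chunk if any of the patterns matches it, and chunks are written as fpsNNNNN_<original name>. The function split_file and the demo file are hypothetical.

```python
import re
import tempfile
from pathlib import Path

def split_file(path, patterns, out_dir):
    """Write chunks of path into out_dir as fpsNNNNN_<name>; return the new paths."""
    regexes = [re.compile(p) for p in patterns]
    chunks = [[]]
    # newline="" keeps the original line endings untouched.
    with open(path, encoding="utf-8", newline="") as handle:
        for line in handle:
            # Any matching pattern opens a new chunk.
            if chunks[-1] and any(r.search(line) for r in regexes):
                chunks.append([])
            chunks[-1].append(line)
    written = []
    for number, chunk in enumerate(chunks, start=1):
        target = Path(out_dir) / f"fps{number:05d}_{Path(path).name}"
        with open(target, "w", encoding="utf-8", newline="") as out:
            out.write("".join(chunk))
        written.append(target)
    return written

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "demo.txt"
    source.write_text(
        "Preamble\n_One._\ntext\n_Two._\ntext\nEnd of the demo\nlicense\n",
        encoding="utf-8",
    )
    names = [p.name for p in split_file(source, [r"^_", r"^End of the"], tmp)]

print(names)
```

The zero-padded counter keeps the chunk files in their original order when they are listed alphabetically.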
 
To verify that the split lost nothing, we can merge the files again:

$ php FilePatternSplitter.php merge .
merged into max_und_moritz.txt.merged
 
and then check against the original:

$ diff -s max_und_moritz.txt max_und_moritz.txt.merged
Files max_und_moritz.txt and max_und_moritz.txt.merged are identical
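
The merge-and-compare round trip can be sketched the same way. This is a minimal illustration, not the merge command's actual implementation: merge_chunks is a hypothetical helper that concatenates the chunk files in sorted name order, and the chunk files are written by hand here to stand in for the splitter's output.

```python
import filecmp
import tempfile
from pathlib import Path

def merge_chunks(directory, merged_path, pattern="fps*"):
    """Concatenate chunk files in sorted name order into merged_path."""
    with open(merged_path, "wb") as out:
        for chunk in sorted(Path(directory).glob(pattern)):
            out.write(chunk.read_bytes())

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    original = tmp / "demo.txt"
    original.write_bytes(b"Preamble\n_One._\ntext\n_Two._\ntext\n")
    # Hand-made chunks standing in for the splitter's output.
    (tmp / "fps00001_demo.txt").write_bytes(b"Preamble\n")
    (tmp / "fps00002_demo.txt").write_bytes(b"_One._\ntext\n")
    (tmp / "fps00003_demo.txt").write_bytes(b"_Two._\ntext\n")
    merged = tmp / "demo.txt.merged"
    merge_chunks(tmp, merged)
    # shallow=False forces a byte-by-byte comparison, like diff.
    identical = filecmp.cmp(original, merged, shallow=False)

print("identical" if identical else "different")
```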