PHP Classes

Directory Documentation: Crawl a directory and document files and changes

Version: documentdirectories 1.0
License: GNU General Publi...
PHP version: 5
Categories: PHP 5, Files and Folders
Description 


This class can crawl a directory and document files and changes.

It can traverse a given directory to extract the list of files and create a document file that lists the files that were added, changed or removed since the last time the directory was documented.

The class generates or updates a document file with the results of the documentation process.

The generated file can be edited manually to include comments and file descriptions. These manual additions are preserved the next time the directory is documented.

Innovation Award: PHP Programming Innovation award nominee, February 2016, Number 7

Many applications generate and manage files that are stored in given directories.

As more files are added to the directories, it becomes hard for users to keep track of all the changes, especially when some of the files are added manually by those users.

This class can help solve that problem by producing automatic documentation of the files in a directory. It supports manual comments and descriptions, so users can complement the file documentation generated by the automated process.

Manuel Lemos
Name: Bob Wedwick <contact>

Example

#!/usr/bin/php -q
<?php
/*
    Author: Bob Wedwick
    USE: php DocumentingDad.php

    1/27/2016 - a demonstration of how the DocumentADirectory class works.

    A complete description is found in the script for the class
    and in the pseudocode design file DocumentADirectory.php.pseudo.

    For this demonstration the file types included are:
         .doc, .php, and .xls,
        and 'noextension', '*test*', 'Samp*' are the file name patterns.

    No file names are excluded.

    This produces and edits a "Read Me" text file with the 'DadWhatsUp.ttt' name.

*/

# attach the class
#require_once('DocumentADirectory.php');
require_once('/home/bob/bin/DocumentADirectory.php');

# create the object
$doc = new DocumentADirectory;

# do not save backup copies
$doc->SavePriorDocumentFlag(false);

# set the file types to be documented
# (the class trims blanks and removes dots, so '.xls ' and 'php' are both accepted)
$doc->IncludeFileExtensions(array('.doc','.xls ','php'));
$doc->IncludeNamePatterns(array('noextension', '*test*', 'Samp*'));

# document the directory. This runs very quickly.
$doc->DocumentDirectory('DadWhatsUp.ttt');

/*

Contents of the example directory (Dad) before running this script for the first time.
    DadSample1.doc
    DadSample3.ph
    DadSample4.php
    DadSample5.phP
    DadTesting.tests
    DadWhatsUp8.text
    DocumentingDad.php
    Hold <- directory
    LogTable2Xml.php.BAK
    noextension
    Populateee.pl
    populate.pl
    popu.sc
    popu.Sh
    Samples <- directory
    SampleXls.xls
    SampleX.ttt
    SampleX.xxx
    test <- directory

*/

/* ========= The first time the above script is run, the file 'DadWhatsUp.ttt' is created with these contents. =======

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Documenting: /home/bob/Dad
* DadWhatsUp.ttt NEW FILE created 20160127-090432
* File Types:
* File Extensions: doc xls php
* Include Name Patterns: 'noextension' '*test*' 'Samp*'
* Case Sensitive
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

DadSample1.doc -- no description.
DadSample4.php -- no description.
DocumentingDad.php -- no description.
noextension -- no description.
SampleXls.xls -- no description.
SampleX.ttt -- no description.
SampleX.xxx -- no description.

========
It shows which directory is being documented, the date and time the document was created, what types of files
are included, and whether or not the file names are case sensitive. The directory named Samples is not listed even
though its name matches a name pattern.

*/

/* ======== The second time the above script is run, the same 'DadWhatsUp.ttt' file is modified. ========

Prior to running the script, a text editor was used to manually make notes and comments to the "Read Me" document.

One file was removed from the directory and another was added to show changes to the "Read Me" document.

-------- Results are below. --------

20160127-103132
 ADDITIONS were made.
 Documented file names are MISSING from the directory.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Documenting: /home/bob/Dad
* DadWhatsUp.ttt NEW FILE created 20160127-090432
* File Types:
* File Extensions: doc xls php
* Include Name Patterns: 'noextension' '*test*' 'Samp*'
* Case Sensitive
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Any descriptive text such as this can be manually added.
Any file names anywhere in the descriptive text are also examined to see if they
match a file extension or name pattern. To include a file name from a different directory in the descriptive text without
it being examined, enclose the name in either single or double quotes. Thus the file "SomewhereElse.php" is not marked as
being missing from the directory.
Directory names, such as the 'Samples' directory, are ignored but may be inserted in the descriptive text.

    The next file names were found because they match one of the file extensions being examined.

DadSample1.doc -- a Word document explaining the universe.
    Found as one of the file extensions to include.

DadSample4.php -- a script that does nothing important.
    Found as one of the file extensions (php) to include.

DocumentingDad.php -- this is the script that operates on this "ReadMe" document named "DadWhatsUp.ttt."
    Found as one of the file extensions (php) to include.

SampleXls.xls -- no description.
    Found as one of the file extensions (xls) to include as well as a name pattern (Samp*).

-------- Note that the location of file descriptions has been re-arranged, which is fine to do -------

    The next file names were found because they match one of the name patterns being examined.

noextension -- just something hanging around with no extension.
    Found as one of the name patterns (noextension) to include.

SampleX.ttt -- no description.
    Found as one of the name patterns (Samp*) to include.

SampleX.xxx -- no description.
    Found as one of the name patterns (Samp*) to include.


---------- Comparing directory contents with names that were ignored, we see that:

These do not match a case sensitive file type nor case sensitive name pattern.
    DadSample3.ph
    DadSample5.phP
    DadTesting.tests
    DadWhatsUp8.text
    LogTable2Xml.php.BAK
    Populateee.pl
    populate.pl
    popu.sc
    popu.Sh

These are names of subordinate directories. Names that might otherwise look like file names
        are put in single quotes.
    Hold <- directory
    'Samples' <- directory
    'test' <- directory

**ADDITIONS** were made 20160127-103132.
DadSample2.doc -- Oh my goodness! Who put this file here? It belongs somewhere else.

Files **MISSING** from the directory -- 20160127-103132.
SampleX.ttt <-- This was removed from the directory. Once listed as missing, it will not be listed again.

*/


/* ======== The third time the above script is run, the 'DadWhatsUp.ttt' file has the same contents as the
second run but the following was added to the beginning of the document.

20160127-104250
 --NO-- additions were found.
 --NO-- documented file names were found missing from the directory.
...

*/
# end script
?>
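
The selection rules that the example relies on (extensions, name patterns, case sensitivity, directories skipped) can be approximated with a short sketch. This is not the class's own code: the helper names `wildcardMatch`, `splitName` and `qualifies` are hypothetical, the listing is typed in by hand rather than read from disk, and the wildcard test uses a regular expression instead of whatever the class does internally. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helpers for illustration only; not taken from DocumentADirectory.php.

// True when $name matches a pattern whose only wildcard is *.
function wildcardMatch(string $pattern, string $name, bool $caseSensitive = true): bool
{
    // Quote regex metacharacters, then turn each escaped * into ".*".
    $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/';
    return preg_match($caseSensitive ? $regex : $regex . 'i', $name) === 1;
}

// Split "a.b.c" into ['a.b', 'c']; a name without a dot has an empty extension.
function splitName(string $name): array
{
    $dot = strrpos($name, '.');
    if ($dot === false) {
        return [$name, ''];
    }
    return [substr($name, 0, $dot), substr($name, $dot + 1)];
}

// True when the name matches a wanted extension or a wanted name pattern.
function qualifies(string $name, array $extensions, array $patterns): bool
{
    [$base, $ext] = splitName($name);
    if ($base === '') {
        return false; // only an extension, no basic name
    }
    if ($ext !== '' && in_array($ext, $extensions, true)) {
        return true;
    }
    foreach ($patterns as $pattern) {
        if (wildcardMatch($pattern, $base)) {
            return true; // patterns are tested against the name without its extension
        }
    }
    return false;
}

// The example directory: name => is it a directory?
$listing = [
    'DadSample1.doc' => false, 'DadSample3.ph' => false, 'DadSample4.php' => false,
    'DadSample5.phP' => false, 'DadTesting.tests' => false, 'DadWhatsUp8.text' => false,
    'DocumentingDad.php' => false, 'Hold' => true, 'LogTable2Xml.php.BAK' => false,
    'noextension' => false, 'Populateee.pl' => false, 'populate.pl' => false,
    'popu.sc' => false, 'popu.Sh' => false, 'Samples' => true,
    'SampleXls.xls' => false, 'SampleX.ttt' => false, 'SampleX.xxx' => false,
    'test' => true,
];

$selected = [];
foreach ($listing as $name => $isDirectory) {
    if (!$isDirectory && qualifies($name, ['doc', 'xls', 'php'], ['noextension', '*test*', 'Samp*'])) {
        $selected[] = $name;
    }
}

echo implode("\n", $selected), "\n";
```

For the listing above, the sketch selects the same seven names that the first run wrote into 'DadWhatsUp.ttt'. 'DadTesting.tests' is skipped because the lower-case pattern '*test*' does not match "Testing" when matching is case sensitive, and 'Samples' and 'test' are skipped because they are directories.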


Details

Note:
    #.... represents the end of an 'if' statement
    # <--- represents the end of a loop

/*
Author: Bob Wedwick

DocumentADirectory.php - a class used as an aid in documenting the contents of directories.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Copyright (C) 2015 Software Installation Services, Inc.
* Author: Bob Wedwick, Phoenix, AZ 602-449-8552 bob at wedwick dot com.
*
* This program is free software: you may redistribute it or modify it under the terms
* of the GNU General Public License as published by the Free Software Foundation,
* either version 3 of the License or any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Definition: "document" is used here to mean a text file used for documenting the contents of a directory. That document is often some sort of a "Read Me" file, but the .txt extension is not required, nor is "ReadMe" required to be part of the name of the document.

You may optionally provide the name you want for the document when you run your calling script. If no name is given, a default name is created from the name of the directory with "ReadMe.txt" appended, "MyDirReadMe.txt" as an example.

The following command runs the job.

    $class->DocumentDirectory([optional document name]);

This class is used to examine a document in a given directory to see if chosen types of file names are included in the document. It also identifies file names that are in the document but are not in the directory. If the document does not exist, a new one is created.

File names in the directory that look like the name of the document are ignored.
For example, if you are documenting all '.txt' file types and the name of the document file is "MyDirReadMe.txt", any file names starting with "MyDirReadMe" are ignored. Those would most likely be backup copies of the document.

The following optional functions give you control over how the class examines a "Read Me" document. The defaults are shown after the // for each function. If you are happy with the defaults, no changes are needed. If you want to change the default conditions, do so before invoking $class->DocumentDirectory().

    $class->ExcludeNamePatterns(array of strings);   // array();
    $class->IncludeFileExtensions(array of strings); // array('.php')
    $class->IncludeNamePatterns(array of strings);   // array();
    $class->NewLineString(string);                   // '\n'
    $class->SavePriorDocumentFlag(boolean);          // true
    $class->SetCaseSensitive(boolean);               // true
    $class->WhichDirectory(string);                  // current directory

The names of files to be included in the search may be defined in either of two ways or both ways.

One way is to provide an array of extensions for files, such as ".php, .pl", remembering that for Linux the extensions are case sensitive. Therefore files with ".PHP" are different from those with the ".php" extension.

Example of overriding the default '.php' extension:

    $class->IncludeFileExtensions(array('php','pl','bat'));

If the extension type is just *, then all files having an extension are included. Files with no extension are not included.

Example:

    $class->IncludeFileExtensions(array('*'));

Another way to search for files is to provide an array of file names to include, using * for wild cards. Any combination of four modes may be used: *name*, name*, *name, and name. The * acts much like you would expect from listing files using the * wild card.

Example:

    $class->IncludeNamePatterns(array('ThisName*','*otherName','*this*', 'MyName'));

If the inclusion is just *, then all files in the directory are included, and no check is done for missing files, since all words in the document would appear to be missing file names from the directory.

Example:

    $class->IncludeNamePatterns(array('*'));

The names of files to be excluded may be entered into an array using the * wild card. Any combination of four modes may be used: *name*, name*, *name, and name. Exclusions take priority over inclusions.

Example:

    $class->ExcludeNamePatterns(array('ThatName*','*someName','*that*', 'YourName'));

No attempt is made to enter or examine descriptions or uses for the files being documented. In order to add descriptions or comments to the document, use a plain text editor like WordPad, vi, or any one of many other editors that saves the document contents as plain text and not in any other format.

A qualified file name is any word from the document, not in quotations, with or without a dot extension. So the following words from a document may be treated as file names: RunProg, Sam.xxx, Smpl.php, while the quoted word "Abcd.php" is ignored.

Each time a document is examined, the results of the run are prepended to the document with the date and time of the run. Any added and missing file names are appended to the end of the document. Those would be either of two types:

    1. files in the directory that are not found in the document = **ADDITIONS**
    2. file names found in the document that are not found in the directory = **MISSING**

Once a file is listed at the end of the document in either the **ADDITIONS** or the **MISSING** section, it will not be entered again as an addition or missing. You may freely move the additions to another part of the document and add descriptive text to them.

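
The two result types described above amount to set differences between the names found in the directory and the names found in the document. The following is a minimal sketch of that idea, not the class's implementation: `compareNames` is a hypothetical helper, and the name lists are written out by hand using the file names from the example runs.

```php
<?php
// Hypothetical helper for illustration; not part of DocumentADirectory.php.
// $alreadyMissing holds names previously listed under **MISSING** in the document.
function compareNames(array $inDirectory, array $inDocument, array $alreadyMissing = []): array
{
    return [
        // in the directory but not yet named in the document
        'additions' => array_values(array_diff($inDirectory, $inDocument)),
        // named in the document, absent from the directory, not reported before
        'missing'   => array_values(array_diff($inDocument, $inDirectory, $alreadyMissing)),
    ];
}

// Names the document held after the first run.
$documented = ['DadSample1.doc', 'DadSample4.php', 'DocumentingDad.php',
               'noextension', 'SampleXls.xls', 'SampleX.ttt', 'SampleX.xxx'];

// Second run: SampleX.ttt was removed and DadSample2.doc was added.
$directory = ['DadSample1.doc', 'DadSample2.doc', 'DadSample4.php',
              'DocumentingDad.php', 'noextension', 'SampleXls.xls', 'SampleX.xxx'];
$second = compareNames($directory, $documented);

// Third run: the document now names DadSample2.doc, and SampleX.ttt sits
// in its MISSING section, so neither is reported again.
$documented[] = 'DadSample2.doc';
$third = compareNames($directory, $documented, ['SampleX.ttt']);
$changes = count($third['additions']) + count($third['missing']);

print_r($second);
echo "changes on third run: $changes\n";
```

Because every reported name ends up in the document itself, a later comparison finds nothing new unless the directory has changed again, which is why the third run in the example reports no additions and no missing names.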
File names following the **MISSING** section may be deleted from the body of the document, left in the missing section, or edited to have quotation marks so they are ignored in the future.

A dated backup of the prior file is optional. When backups are saved, older documentation backup files accumulate in the directory but will not be treated as additions.

Example of not generating backup copies:

    $class->SavePriorDocumentFlag(false);

The calling script needs to have something similar to this.

    require_once('class_path/DocumentADirectory.php');
    $class = new DocumentADirectory;

The calling script does not need to be in the directory being documented.

Example:

    $class->WhichDirectory('Some/Other/Directory');

One calling script may be used to document more than one directory by creating a new object for each directory.

The object returns a total count of files added plus files found missing. Therefore a return value of zero means no changes were found.

A return message is also generated. You have the option of accessing it like this.

    echo "$class->returnMessage\n";

The following functions are included in this class. It is rarely necessary to call any other than those shown above under defaults, but these may be called as part of testing or debugging.

    AddThisName() - add this file name to the document
    Additions() - look for additions to the document
    BackupDocument() - backup the current document
    BasicName() - return a word with final dot extension removed
    CurrentNames() - make an array of qualifying file names found in the document
    DocumentDirectory() - perform documentation for a given directory
    DocumentFileName() - note that a file name is in the document
    EndPunctuation() - remove trailing punctuation marks from a string
    ExcludeNamePatterns() - exclude file name patterns regardless of the file extension
    Ext() - return the extension from a file name
    ExtensionMatch() - true when word passed has an extension and it matches something we want
    GetCurrentContents() - get the current contents of a document
    IncludeFileExtensions() - set file extension types to be included in the document
    IncludeNamePatterns() - set file name patterns to be included in the document regardless of the file extension
    MakeDocumentName() - make the document name
    Missing() - hunt for file names in the document that are missing from the directory
    NameMatches() - true when the name passed matches something to be included in the document
    NewLineString() - change the string used to represent a new line
    PatternHit() - return true when a name matches one pattern
    PatternMatches() - return true when a name matches a pattern in a pattern array
    QualifyingFileNames() - build the array of qualifying file names found in the directory
    RemoveMarks() - remove characters that would not be part of a file name, dot is allowed
    ReturnMessage() - build a return message
    SavePriorDocumentFlag() - change the option to save a copy of the existing document
    SetCase() - return a lower case string when names are not case sensitive
    SetCaseSensitive() - set or clear the case sensitive flag
    TextLine() - build a text line with new line ending characters
    ValueInArray() - check to see if a value is in an array and if not, optionally insert it
    WhichDirectory() - set which directory to document
*/

# class definition begins here

### define the class to document selected files in a directory

    # private variables may be made public for testing and debugging
    # flag for case sensing
    # addition count >0 when any new directory file names are found
    # change status string
    # name of the directory being documented
    # directory listing array of qualifying file names
    # document file name
    # contents of the existing document
    # array of file names currently in the document
    # array of file names currently found in the document but missing in the directory
    # an array of file name patterns to exclude
    # an array of file extension types to document
    # an array of file name patterns to include
    # flag for this being a new document
    # missing count > 0 when any missing file names are found
    # new line string - options include '\n' for Linux,'\r\n' for Windows, <br/> for html
    # default postfix for the output documentation file name
    # flag to save prior document.
    # today's date and time string
    # return message is public

### __construct() - construct the class
    # record the current directory as the default directory
    # set default file type array
    # get today's date and time
# end function

### AddThisName() - add this file name to the document
    # if first time for additions and not a new document
        # say at the end of the document that there were additions - TextLine()
    # ....
    # add the file name to the document in memory - TextLine()
    # count new file names found
# end function

### Additions() - look for additions to the document
    # get the pattern of the file name for the document being examined - BasicName()
    # get the extension for this document - Ext()
    # loop thru the qualified file list
        # if the name matches what we are looking for - NameMatches()
            # if the name is not found in the array of current document file names - ValueInArray()
                # add this key name to the document - AddThisName()
            # ....
        # ....
    # <---
    # if any additions were found
        # if not a new document
            # prepend a message saying additions were made - TextLine()
        #....
    # else
        # prepend a message saying no additions were found - TextLine()
    # ....
# end function

### BackupDocument() - backup the current document
    # if this is not a new document
        # get the document name without the extension - BasicName()
        # add the date time string
        # add the extension - Ext()
        # copy contents to the backup file
    # ....
# end function

### BasicName() - return a word with any final dot extension removed.
    # if a string is passed
        # explode on the dot
        # get part 1
        # count the parts
        # loop through 2nd to next to last part
            # add a dot and parts up to the last one
        # <---
        # return all but the last part minus any extraneous marks - RemoveMarks()
    # else
        # name is blank
    # ....
    # return result
# end function

### CurrentNames() - make an array of qualifying file names found in the document
    # assume we are not in an area of the document where there are missing file names
    # open the document for reading
    # loop through the document line by line
        # trim the line
        # if the line contains **MISSING**
            # note that we are in an area where files were previously noted as missing in the document
        # ....
        # if the line contains **ADDITIONS**
            # note that we are no longer in an area where files were previously noted as missing in the document
        # ....
        # explode the line into separate words
        # loop through words on one line
            # if quotes are found
                # ignore the word
            # ....
            # remove punctuation from the end of a word - EndPunctuation()
            # if there is a basic name and not just an extension - BasicName()
                # if the word does not match names to be excluded - PatternMatches()
                    # if the extension matches any that are wanted - ExtensionMatch()
                    # or matches a name pattern - PatternMatches()
                        # note that it is in the document - DocumentFileName()
                    # ....
                # ....
            # ....
        # <---
    # <---
    # close the document
# end function

### DocumentDirectory() - perform documentation for a given directory
    # if an empty string is passed
        # make it null
    # ....
    # make the document name - MakeDocumentName()
    # determine if this is a new document
    # retrieve the contents of the current or new document - GetCurrentContents()
    # if backup is wanted
        # make a dated backup as a text file - BackupDocument()
    # ....
    # exclude file names that look like copies of the document
    # fill qualifying file name array from the directory - QualifyingFileNames()
    # if not a new document
        # fill the array of qualifying file names found in the document - CurrentNames()
    # ....
    # look for additions - Additions()
    # if not a new document
        # look for names missing from the directory - Missing()
        # prepend examination date and time to the document - TextLine()
    # ....
    # save the edited document as a text file
    # count the total changes found
    # build a return message
    # return the total. Zero means no changes were found.
# end function

### DocumentFileName() - note that a file name is in the document
    # if we are in the "**MISSING**" area in the document
        # enter the file name in the array of those files already noted as missing - ValueInArray()
    # ....
    # add the file name to those in the document so we don't record it again - ValueInArray()
# end function

### EndPunctuation() - remove trailing punctuation marks from a string
    # initialize return string
    # if a string is passed that is not zero length
        # loop backwards on the string
            # get the last char from the string
            # if it is punctuation -- namely ?!,.`~
                # subtract 1 from the length
            # else
                # quit looking
            # ....
        # <---
        # return the string with new length
    # ....
    # return the result
# end function

### ExcludeNamePatterns() - exclude file name patterns regardless of the file extension
    # if an array is passed
        # assign it to the name Pattern array
    # ....
# end function

### Ext() - return the extension from a file name
    # assume there is no extension
    # if a string is passed
        # remove unwanted punctuation anywhere in the file name - RemoveMarks()
        # remove any punctuation at the end of the string - EndPunctuation()
        # explode the string name on dot
        # if more than 1 part
            # set last part to appropriate case - SetCase()
        # ....
    # ....
    # return result
# end function

### ExtensionMatch() - true when word passed has an extension and it matches something we want
    # assume no match
    # if there is an extension - Ext()
        # if there are file types being used
            # true if the pattern matches - PatternMatches()
        # ....
    # ....
    # return result
# end function

### GetCurrentContents() - get the current contents of a document
    # if a new document
        # create a multiline description of what is included and what is excluded
        # make a string of the file types being documented
        # if file extension types are wanted
            # build a string of extension types
        # ....
        # if any name patterns are included
            # add any name patterns to a string
        # ....
        # if any name patterns are excluded
            # add any name patterns to a string
        # ....
        # comment about case sensing
        # the contents will show this is a new document and its file types - TextLine()
    # else
        # copy the current documentation contents to memory
    # ....
# end function

### IncludeFileExtensions() - set file extension types to be included in the document
    # if argument passed is an array
        # clear the file type array
        # loop through the new array
            # trim leading and trailing blanks
            # remove any dots
            # enter it in the file type array with proper case setting - SetCase()
        # <---
    # ....
# end function

### IncludeNamePatterns() - set file name patterns to be included in the document regardless of the file extension
    # if an array is passed
        # initialize the name pattern array
        # loop through the strings
            # trim leading and trailing blanks
            # and assign it to the name pattern array
        # <---
    # ....
# end function

### MakeDocumentName() - make the document name
    # if a name string is not passed
        # create a default name from the current directory
        # get the basename for the directory
        # append the postfix
    # ....
    # save the name in document name variable
# end function

### Missing() - hunt for file names in the document that are missing from the directory
    # if the name pattern array has just *, we can not look for missing files because all words would be missing
        # note this in the return message
        # return now
    # ....
    # loop through the array of file names from the document
        # if the name from the document is not in directory - ValueInArray()
            # if not previously marked as missing - ValueInArray()
                # if first time for missing files
                    # say so at the end of the document - TextLine()
                # ....
                # add the name to the end of the document - TextLine()
                # count missing file names in the document
            # ....
        #....
    # <---
    # if there were any file names in the document that are missing from the directory
        # report that missing files were found - TextLine()
    # else
        # report that no files missing were found - TextLine()
    # ....
# end function

### NameMatches() - true when a name matches something to be included in the document
    # assume no match
    # if it does not match names to be excluded - PatternMatches()
        # if there is a basic name and not just an extension - BasicName()
            # if the extension does not match any that are wanted - ExtensionMatch()
                # return true if it makes a name pattern match
            # ....
        # ....
    # ....
    # return result
# end function

### NewLineString() - change the string used to represent a new line
    # remember the current new line string
    # if there is a new line string
        # save it
    # ....
    # return the prior new line string
# end function

### PatternHit() - return true when a name matches one pattern
    # assume it does not match
    # remove the extension from the name - BasicName()
    # set file and pattern to lowercase when not case sensitive - SetCase()
    # if the pattern is just a *
        # everything matches
        # no need to look further
    # ....
    # explode the pattern on * (explode treats * as zero length string)
    # if three parts (*name*)
        # pattern anywhere in the name
    # else if 2 parts
        # if the name part comes second (*part)
            # reverse the file string
            # copy reverse the pattern string to 1st element
        # ....
        # pattern must match from the beginning
    # else
        # just one part so pattern must match exactly
    # ....
    # return result
# end function

### PatternMatches() - return true when a name matches any pattern in a pattern array
    # assume not a match
    # if there are any entries in the pattern array
        # loop through the patterns
            # if this pattern hits - PatternHit()
                # result is true
                # look no further
            # ....
        # <---
    # ....
    # return the result
# end function

### QualifyingFileNames() - build the array of qualifying file names found in the directory
    # get a directory listing
    # loop through the directory listing
        # if not a directory
            # if the name matches anything we want - NameMatches()
                # enter the file name in the array for the directory - ValueInArray()
            # ....
        # ....
    # <----
# end function

### RemoveMarks() - remove characters that would not be part of a file name, dot is allowed
    # make a regular expression pattern of characters that can be part of a file name
    # remove non-filename characters
    # return the string
# end function

### ReturnMessage() - build a return message
    # begin the message with the name of the document
    # if no changes
        # message says so
    # else
        # if any additions,
            # message says how many
        #....
        # if any are missing,
            # message says how many
        # ....
    # ....
# end function

### SavePriorDocumentFlag() - change the option to save a copy of the existing document
    # if an integer is sent
        # convert it to boolean
    # ....
    # if the flag being passed is boolean
        # set save flag to true or false
    # ....
# end function

### SetCase() - return a lower case string when names are not case sensitive
    # if a string is passed and not case sensitive
        # set it to lower case
    # ....
    # return result
# end function

### SetCaseSensitive() - set or clear the case sensitive flag
    # if an integer is sent
        # convert it to boolean
    # ....
    # if the flag being passed is boolean
        # set case sensitive flag to true or false
    # ....
# end function

### TextLine() - build a text line with new line ending characters
    # get the new line string
    # replace \r with chr(13)
    # replace \n with chr(10)
    # return the string with new line chars appended
# end function

### ValueInArray() - check to see if a value is in an array and if not, optionally insert it
    # this avoids redundant entries in an array
    # note: when not case sensitive (Windows) values are lower case
    # set appropriate case sensing for the value - SetCase()
    # search for a value in the array
    # if not found and we want to insert it
        # insert value in the array
    # ....
    # return result
# end function

### WhichDirectory() - set which directory to document
    # if a directory string is passed
        # if the directory exists
            # set it
        # ....
    # ....
# end function

# end class
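
The word-scanning steps in the CurrentNames() and EndPunctuation() pseudocode (split a line into words, skip quoted words, strip trailing punctuation) can be sketched as follows. This is only an illustration under stated assumptions: `trimEndPunctuation` and `candidateWords` are hypothetical names, the sample sentence is made up, and the sketch uses `rtrim` and `preg_split` where the pseudocode describes a backwards loop and an explode. How the class treats the inner words of a multi-word quoted phrase is not stated in the pseudocode; here only words that themselves contain a quote character are skipped.

```php
<?php
// Hypothetical helpers for illustration; not taken from DocumentADirectory.php.

// Remove trailing punctuation marks, namely ? ! , . ` ~
function trimEndPunctuation(string $word): string
{
    return rtrim($word, '?!,.`~');
}

// Return the unquoted words of one document line with trailing punctuation removed.
function candidateWords(string $line): array
{
    $words = [];
    foreach (preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // A word carrying a single or double quote is ignored.
        if (strpbrk($word, "'\"") !== false) {
            continue;
        }
        $word = trimEndPunctuation($word);
        if ($word !== '') {
            $words[] = $word;
        }
    }
    return $words;
}

// Made-up sample line: the quoted name is skipped, the trailing dot is stripped.
$words = candidateWords('Thus the file "SomewhereElse.php" is not marked, unlike DadSample4.php.');
print_r($words);
```

Each surviving word would then still have to pass the exclusion, extension, and name pattern checks before it counts as a file name in the document.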

Files (4)

File                           Role     Description
DadWhatsUp.ttt                 Output   Class Source
DocumentADirectory.php         Class    Class Source
DocumentADirectory.php.pseudo  Doc.     Class Source
DocumentingDad.php             Example  Class Source
