PHP Classes

wordDocumentHandler: Convert and clean MSWord documents to HTML

Ratings: 43%
Unique user downloads: 6,990 total, 1 this week
Download rankings: 280 all time, 560 this week (down)
Version: logantools 1.0.0
License: GNU General Publi...
PHP version: 4
Categories: HTML, Text processing, Windows
Description 

This class can be used to convert a Microsoft Word document to HTML, RTF or plain text using COM objects.

The input document formats can be Microsoft Word DOC, RTF and plain text.
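As a rough illustration of what a COM-based conversion can look like, here is a minimal sketch. It is not the source of wordDocumentHandler: the function names wordSaveFormatFor() and convertWithWord() are hypothetical, and the code assumes PHP 5 or later on Windows with the COM extension enabled and Microsoft Word installed. The numeric values are those of Word's WdSaveFormat constants.

```php
<?php
// Sketch only: not the wordDocumentHandler source.

// Maps an output type to the numeric value of Word's WdSaveFormat constant.
// Returns null for an unsupported type.
function wordSaveFormatFor($extension)
{
    $formats = array(
        'htm' => 8, // wdFormatHTML
        'rtf' => 6, // wdFormatRTF
        'txt' => 2, // wdFormatText
    );
    $key = strtolower($extension);
    return isset($formats[$key]) ? $formats[$key] : null;
}

// Hypothetical helper: opens $inputPath in Word and saves it as $outputPath.
// Absolute paths are safer here, since Word does not share PHP's working directory.
function convertWithWord($inputPath, $outputPath, $extension)
{
    $format = wordSaveFormatFor($extension);
    if ($format === null || !class_exists('COM')) {
        return false; // unsupported format, or COM is not available
    }
    $word = new COM('Word.Application');
    $word->Visible = 0;                      // keep the Word window hidden
    $document = $word->Documents->Open($inputPath);
    $document->SaveAs($outputPath, $format);
    $document->Close(false);                 // close without saving changes to the source
    $word->Quit();
    $word = null;                            // release the COM object
    return true;
}
```

On a server without COM support, convertWithWord() simply returns false, which makes the Windows requirement explicit instead of failing with a fatal error.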

The class can also clean the generated HTML to remove unnecessary markup that Microsoft Word adds.
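To give an idea of the kind of markup involved, the sketch below strips a few typical Word additions with regular expressions. The function stripWordMarkup() is hypothetical and the rules are assumptions chosen for illustration; they are not the rules applied by the class's own cleanWordHTML() method.

```php
<?php
// Sketch only: a few regular-expression passes over Word-generated HTML.
function stripWordMarkup($html)
{
    $patterns = array(
        '/<!--\[if.*?<!\[endif\]-->/is',  // conditional comments and their content
        '/<\/?[a-z]+:[^>]*>/i',           // namespaced tags such as <o:p> and </o:p>
        '/\s+class="?Mso[a-z0-9]*"?/i',   // class attributes such as class=MsoNormal
        '/\s+style="[^"]*mso-[^"]*"/i',   // whole style attributes that contain mso- properties
    );
    // Each pattern is applied in turn; every match is replaced by nothing.
    return preg_replace($patterns, '', $html);
}
```

Note that the last rule drops the entire style attribute, including any non-Word properties it contains; a stricter cleaner would remove only the mso- declarations.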

This requires Microsoft Word to be installed on the server, and the server must run Windows.

It doesn't work? Check the following:

1- your server must be running Win32
2- Microsoft Word must be installed on the server (I tested with Word 2000)
3- readfile() is reported as unavailable before PHP 4.3. On PHP < 4.3 you can define it with the following code:
if (version_compare(phpversion(), "4.3.0", "<") && !function_exists("readfile"))
{
    // Fallback for PHP < 4.3: return the whole file as one string
    function readFile( $f ) {
        // file() keeps each line's trailing newline, so join without adding one
        return implode( "", file( $f ) );
    }
}
4- try not to open a file on the network (i.e. \\server\doc...) unless you fully understand the authentication process
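The checks in the list above can be partly automated. The following is a minimal sketch, assuming PHP 5 or later; isUncPath() and wordConversionProblems() are hypothetical names, not part of the class. It cannot tell whether Word itself is installed, only whether the platform and the COM extension are present.

```php
<?php
// Sketch only: hypothetical helpers for diagnosing a failing setup.

// True for UNC paths such as \\server\share\file.doc.
function isUncPath($path)
{
    return strncmp($path, '\\\\', 2) === 0; // '\\\\' is two literal backslashes
}

// Returns a list of human-readable problems; an empty array means the
// basic requirements look satisfied.
function wordConversionProblems($documentPath)
{
    $problems = array();
    if (strtoupper(substr(PHP_OS, 0, 3)) !== 'WIN') {
        $problems[] = 'The server is not running Windows.';
    }
    if (!class_exists('COM')) {
        $problems[] = 'The PHP COM extension is not available.';
    }
    if (isUncPath($documentPath)) {
        $problems[] = 'The document is on a network share; check the account the web server runs as.';
    }
    return $problems;
}
```

Calling wordConversionProblems() before the conversion and printing its result gives a clearer message than a failed COM call.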

Name: Logan Dugenoux
Classes: 6 packages
Country: France
Age: ???
All time rank: 785 in France
Week rank: 411 (down), 15 in France (down)

Example

<?php
//########################################################################################
// -------------- Summary
// Example of use of the wordDocumentHandler class
//
// -------------- Author
// Logan Dugenoux - 2003
// logan.dugenoux@netcourrier.com
// http://www.peous.com/logan/
//
// -------------- License
// GPL
//
//########################################################################################

   
@set_time_limit( 60 ); // cleaning is sometimes very long depending on options

require ("wordDocumentHandler.php");

// ############### Put here the name of a MsWord document ###################

$myWordFile = "my doc file.doc";

// The class

$w = new wordDocumentHandler();

$txt = $w->convertWordDocumentToString( $myWordFile , "htm" );
if (!$txt)
{
    die( $w->GetLastError() );
}
else
{
    echo "Conversion to string ok. Output len :".strlen($txt)."<br>";
}

$w->cleanWordHTML( $txt );
echo "Cleaned string len :".strlen($txt)."<br>";

$outFile = $myWordFile.".html";
if (!$w->convertWordDocumentToFile( $myWordFile ,$outFile , "htm" ))
{
    die( $w->GetLastError() );
}
else
{
    echo "Conversion to file ok.<br>";
}

?>
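
The example above appends ".html" to the full file name, so "my doc file.doc" becomes "my doc file.doc.html". If you prefer to replace the extension instead, a small helper along these lines may do; htmlOutputNameFor() is a hypothetical name and is not provided by the class.

```php
<?php
// Sketch only: hypothetical helper, not part of wordDocumentHandler.
function htmlOutputNameFor($wordFile)
{
    // Drop a trailing .doc, .rtf or .txt (any case), then append .html
    return preg_replace('/\.(doc|rtf|txt)$/i', '', $wordFile) . '.html';
}
```

With this helper, the example's $outFile line could be written as $outFile = htmlOutputNameFor( $myWordFile );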


Files
File                                 Role      Description
wordDocumentHandler.php              Class     Source of the wordDocumentHandler class
wordDocumentHandler_test_code.php    Example   Example of use of the wordDocumentHandler class (accessible without login)

User Ratings (all time)
Utility: 60%
Consistency: 61%
Documentation: -
Examples: 70%
Tests: -
Videos: -
Overall: 43%
Rank: 3531