PHP Classes

File: class.word2000html.php

Content type: text/plain
Description: class, docs & example
Class: Word2000Html
Author: Ryan Flynn
Date: 14 years ago
Size: 2,995 bytes



    Version 0.5
    Ryan Flynn (ryan@ryanflynn || DALnet->#php->pizza_milkshake)
    Thursday, June 28 2001

    This class was written to save ordinary humans from having to convert
    Word 2000 HTML into actual, clean HTML, a job I once had and one that
    nearly drove me insane. It extracts the three most important chunks of
    a Word HTML document (the title, style and body sections), which you
    can then manipulate however you see fit.
    Tested on:
        PHP 4.0.3 on Apache 1.3.14/Windows 98
        PHP 4.0.3 on MS IIS 4.?/Windows 2000
    Browser results so far:
        MSIE:       5-6 = flawless
        Netscape:   4.7 = good, 3.04 = ok
        Opera:      5.0 = good
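The extraction described above can be sketched with PHP's PCRE functions. This is a minimal standalone sketch, not the class's code: the sample markup and all three regular expressions are illustrative assumptions. It uses preg_match_all() for the style chunk because a Word 2000 document may contain more than one <style> block.

```php
<?php
// Minimal sketch: pull the title, style and body chunks out of a Word 2000
// HTML string. The sample markup and the patterns are assumptions for
// illustration, not the patterns used in class.word2000html.php.

$s = '<html><head><title>Quarterly News</title>'
   . '<style>p.MsoNormal {margin:0in;}</style>'
   . '<style>h1 {font-size:16pt;}</style>'
   . '</head><body lang=EN-US style=\'tab-interval:.5in\'>'
   . '<p class=MsoNormal>Hello</p></body></html>';

// Title: capture only the text between the <title> tags.
$title = preg_match('/<title>(.*?)<\/title>/is', $s, $m) ? $m[1] : '';

// Style: collect every <style> block, not just the first one.
$style = '';
if (preg_match_all('/<style[^>]*>.*?<\/style>/is', $s, $m)) {
    $style = implode("\n", $m[0]);
}

// Body: capture what sits between <body ...> and </body>; [^>]* lets the
// opening tag carry Word's attributes.
$body = preg_match('/<body[^>]*>(.*?)<\/body>/is', $s, $m) ? $m[1] : '';

echo $title, "\n";
echo $body, "\n";
```

The non-greedy `.*?` with the `s` modifier keeps each match from running past the first closing tag even when the chunk spans several lines.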

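//here's how to use this class:
$bob=new Word2000Html("news.htm"); //path to a Word 2000 HTML doc; this creates the object and parses the file
echo $bob->Title;                   //the text of the document's <title>
echo $bob->Style;                   //all of the document's <style> blocks
echo $bob->Body;                    //the contents of <body>, with Word's extra markup stripped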

You can drop these three values into a page template (Title inside <title>, Style inside <head>, Body inside <body>) and convert Word docs on the fly. Have fun.
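A page template along those lines might look like the sketch below. `render_page()` is a hypothetical helper, not part of Word2000Html, and the layout assumes (as the listing suggests) that Title and Body have their own tags stripped while Style still contains its `<style>` tags.

```php
<?php
// Sketch: assemble a complete page from the three chunks. render_page() is
// a hypothetical helper; with the class you would call
// render_page($bob->Title, $bob->Style, $bob->Body).

function render_page($title, $style, $body)
{
    // The chunks come verbatim from the Word document's markup, so they are
    // inserted as-is; escaping them would double-encode entities Word wrote.
    return "<html>\n<head>\n<title>" . $title . "</title>\n"
         . $style . "\n</head>\n<body>\n" . $body . "\n</body>\n</html>\n";
}

$page = render_page('News', '<style>p {margin:0}</style>', '<p>Hello</p>');
echo $page;
```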

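//trim $content_path (presumably set by the including script) to its
//directory, keeping the trailing backslash; only used in the debug message
$content_path=substr($content_path, 0, (strrpos($content_path, "\\")+1));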

class Word2000Html{
	var $Title;
	var $Style;
	var $Body;
	function Word2000Html($file){
		global $PHP_SELF, $content_path;
		//if the file is missing, print some debug info and bail out
		if(!file_exists($file)){
			echo "
If you see this message, please contact
\"$file\" does not exist
PHP_SELF.......... $PHP_SELF
file.............. $file
content_path...... $content_path
";
			return;
		}
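		$s=implode('', file($file));	//the whole document as one string
		//the original $title, $style, $body, $if_crap, $v_crap and $o_crap
		//patterns are missing from this listing; these are reconstructions
		$title='/<title>.*?<\/title>/is';
		//$style only gets the first, useless chunk of Word 2000
		//<style> code; it needs to get all
		$style='/<style[^>]*>.*?<\/style>/is';
		$body='/<body[^>]*>.*<\/body>/is';
		//removes <!--[if ...]> ... <![endif]--> blocks
		$if_crap='/<!--\[if[^\]]*\]>.*?<!\[endif\]-->/is';
		//removes <v: blah blah/> tags
		$v_crap='/<\/?v:[^>]*>/i';
		//removes <o:p> and other <o: tags
		$o_crap='/<\/?o:[^>]*>/i';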
		if(!preg_match($title, $s, $tmp)){ echo 'no title, '; $tmp=array(''); }
		$tmp[0]=preg_replace('/\<(\/|)title\>/i', '', $tmp[0]);	//strip the <title> tags
		$this->Title=$tmp[0];
		if(!preg_match_all($style, $s, $tmp)) echo 'no style, ';
		//concatenate every <style> block that was found
		foreach($tmp as $a)
			foreach($a as $b)
				$this->Style.=$b;
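		if(!preg_match($body, $s, $tmp)){ echo 'no body'; $tmp=array(''); }
		$tmp[0]=preg_replace($if_crap, '', $tmp[0]);	//drop conditional-comment blocks
		$tmp[0]=str_replace('./', '', $tmp[0]);	//strip "./" from relative paths (also turns "../" into ".")
		$tmp[0]=preg_replace('/\<(\/|)body[^>]*\>/i', '', $tmp[0]);	//strip <body ...> and </body>; [^>]* (not (^>)*) so attributes match
		$tmp[0]=preg_replace($v_crap, '', $tmp[0]);	//drop VML tags
		$tmp[0]=preg_replace($o_crap, '', $tmp[0]);	//drop Office tags
		$this->Body=$tmp[0];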
		//meant to strip <td width=...> attributes, but it doesn't match Word 2000's HTML, so it is disabled
		//$tmp[0]=preg_replace('/<td width=[\d]{0,5}\s/i', '<td ', $tmp[0]);
	}
}
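The Body clean-up can be sketched as a standalone function. Because the class's own `$if_crap`, `$v_crap` and `$o_crap` patterns are not preserved in this listing, the patterns here are assumptions about what they targeted. Word 2000 output typically hides VML graphics inside `<!--[if gte vml 1]> ... <![endif]-->` comments and puts a plain `<img>` fallback between `<![if !vml]>` and `<![endif]>` markers; this sketch deletes the hidden blocks but keeps the fallback content. `clean_word_body()` is a hypothetical name.

```php
<?php
// Sketch of a Word 2000 body clean-up pass. All patterns are assumptions,
// not the class's original ones; clean_word_body() is a hypothetical name.

function clean_word_body($html)
{
    // Drop downlevel-hidden blocks (VML and Office XML) entirely.
    $html = preg_replace('/<!--\[if[^\]]*\]>.*?<!\[endif\]-->/is', '', $html);
    // Drop downlevel-revealed markers but keep what they wrap (e.g. <img>).
    $html = preg_replace('/<!\[if[^\]]*\]>|<!\[endif\]>/i', '', $html);
    // Strip <body ...> and </body>; [^>]* is a character class, so
    // attributes such as lang=EN-US are matched too.
    $html = preg_replace('/<\/?body[^>]*>/i', '', $html);
    // Drop any remaining VML <v:...> and Office <o:...> tags.
    $html = preg_replace('/<\/?v:[^>]*>/i', '', $html);
    $html = preg_replace('/<\/?o:[^>]*>/i', '', $html);
    return $html;
}

$in = '<body lang=EN-US><p>A<!--[if gte vml 1]><v:shape id="s1">'
    . '<v:imagedata src="./x_files/i.gif"/></v:shape><![endif]-->'
    . '<![if !vml]><img src="./x_files/i.gif"><![endif]></p><o:p></o:p></body>';
$out = clean_word_body($in);
echo $out, "\n";
```

The hidden-block pattern runs first so that its `<![endif]-->` terminator is consumed before the marker pattern looks for a bare `<![endif]>`.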
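The `$content_path` line above keeps the directory part of a Windows path: strrpos() finds the last backslash and substr() keeps everything up to and including it. The sample path below is made up for illustration.

```php
<?php
// Worked example of the $content_path trimming line; the path is a
// made-up sample.

$content_path = "C:\\Inetpub\\wwwroot\\news\\index.php";
$dir = substr($content_path, 0, strrpos($content_path, "\\") + 1);
echo $dir, "\n";
```

If the path contains no backslash, strrpos() returns false, which PHP treats as 0 in the addition, so only the first character survives; a path using forward slashes would need "/" as the search character instead.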