PHP HTML to Text Conversion: Parse HTML and extract text contained in it

Last updated: 2017-11-01
Ratings: Not enough user ratings
Unique user downloads: Total: 272, This week: 4
Download rankings: All time: 7,366, This week: 161
Version: html2text 1.0.5
License: GNU General Publi...
PHP version: 5
Categories: HTML, PHP 5, Text processing

Description

This class can parse HTML and extract text contained in it.

It can take a given HTML string and parse it to extract the text in the HTML document.

The class can change the case of the text inside certain HTML elements, as well as prepend or append a given text.

Innovation Award: PHP Programming Innovation award nominee, December 2016, Number 9
Most PHP applications generate HTML, but sometimes we also need a text version of that HTML, for instance to send an email that includes both the HTML and a text version as an alternative.

This package lets you automatically create the text version of given HTML, which you can use in email messages or for other purposes.

Manuel Lemos
Author
Name: Lars Moelleken
Classes: 17 packages
Country: Germany
Age: 29
All time rank: 135786 in Germany
Week rank: 7 (up), 1 in Germany (up)
Innovation award: Nominee 8x

Details

Html2Text

WARNING: this is only a maintained fork of "https://github.com/mtibben/html2text/"

A PHP library for converting HTML to formatted plain text.

Installation

The recommended way to install the library is through Composer.

$ composer require voku/html2text

Basic Usage

$html = new \voku\Html2Text\Html2Text('Hello, &quot;<b>world</b>&quot;');

echo $html->getText();  // Hello, "WORLD"
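
For the email use case mentioned in the description, the following is a minimal sketch of how the text version could be paired with the original HTML in a multipart/alternative body. The helper names `htmlToPlainText()` and `buildAlternativeBody()` are hypothetical and not part of this package. When the library has not been loaded through Composer's autoloader, the sketch falls back to a crude `strip_tags()` stand-in so that it runs on its own; that fallback does not reproduce the library's formatting.

```php
<?php
// Sketch only: htmlToPlainText() and buildAlternativeBody() are hypothetical helpers.

function htmlToPlainText($html)
{
    // Use the library when Composer's autoloader has made it available.
    if (class_exists('voku\Html2Text\Html2Text')) {
        $converter = new \voku\Html2Text\Html2Text($html);
        return $converter->getText();
    }
    // Crude stand-in so the sketch runs without the library:
    // strip the tags, then decode HTML entities.
    return html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
}

function buildAlternativeBody($html, $boundary)
{
    $text = htmlToPlainText($html);
    $eol  = "\r\n";

    // The plain text part comes first and the HTML part last, because
    // multipart/alternative lists parts in increasing order of preference.
    return '--' . $boundary . $eol
        . 'Content-Type: text/plain; charset=UTF-8' . $eol . $eol
        . $text . $eol
        . '--' . $boundary . $eol
        . 'Content-Type: text/html; charset=UTF-8' . $eol . $eol
        . $html . $eol
        . '--' . $boundary . '--' . $eol;
}

// Example values; the boundary only has to be a string that does not occur in either part.
$boundary = 'alt-' . md5(uniqid('', true));
$body     = buildAlternativeBody('<p>Hello, &quot;<b>world</b>&quot;</p>', $boundary);
$headers  = 'MIME-Version: 1.0' . "\r\n"
    . 'Content-Type: multipart/alternative; boundary="' . $boundary . '"';

// Sending is left to the caller, for example: mail($to, $subject, $body, $headers);
echo $body;
```

Passing the boundary in as a parameter keeps the helper deterministic, which makes it easier to test than generating the boundary inside the function.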

Extended Usage

Each element (h1, li, div, etc) can have the following options:

  • 'case' => convert case (`Html2Text::OPTION_NONE, Html2Text::OPTION_UPPERCASE, Html2Text::OPTION_LOWERCASE, Html2Text::OPTION_UCFIRST, Html2Text::OPTION_TITLE`)
  • 'replace' => replace a string, given as a search/replacement pair
  • 'prepend' => prepend a string
  • 'append' => append a string

For example:

use voku\Html2Text\Html2Text; // import so the short class name below resolves

$html = '<h1>Should have "AAA" changed to BBB</h1><ul><li>• Custom bullet should be removed</li></ul><img alt="The Linux Tux" src="tux.png" />';
$expected = 'SHOULD HAVE "BBB" CHANGED TO BBB' . "\n\n" . '- Custom bullet should be removed |' . "\n\n" . '[IMAGE]: "The Linux Tux"';

$html2text = new Html2Text(
    $html,
    array(
        'width'    => 0, // 0 disables word wrapping
        'elements' => array(
            'h1' => array(
                'case'    => Html2Text::OPTION_UPPERCASE,
                'replace' => array('AAA', 'BBB'), // search, replacement
            ),
            'li' => array(
                'case'    => Html2Text::OPTION_NONE,
                'replace' => array('•', ''), // drop the custom bullet
                'prepend' => "- ",
                'append'  => " |",
            ),
        ),
    )
);

// Prefixes used when images and links are rendered as text.
$html2text->setPrefixForImages('[IMAGE]: ');
$html2text->setPrefixForLinks('[LINKS]: ');
$html2text->getText(); // === $expected

Live Demo

  • HTML | TEXT
  • https://moelleken.org/url_to_text.php?url=https://ADD_YOUR_URL_HERE

History

This library started life on the blog of Jon Abernathy: http://www.chuggnutt.com/html2text

A number of projects picked up the library and started using it - among those was RoundCube mail. They made a number of updates to it over time to suit their webmail client.

Now it has been extracted as a standalone library. Hopefully it can be of use to others.

Files
File               Role   Description
src                (1 file)
tests              (26 files, 1 directory)
.editorconfig      Data   Auxiliary data
.scrutinizer.yml   Data   Auxiliary data
.styleci.yml       Data   Auxiliary data
.travis.yml        Data   Auxiliary data
composer.json      Data   Auxiliary data
LICENSE.md         Lic.   License text
phpunit.xml        Data   Auxiliary data
README.md          Doc.   Documentation

Files / src
File               Role   Description
Html2Text.php      Class  Class source

Files / tests
File                       Role   Description
fixtures                   (32 files)
BasicTest.php              Class  Class source
BlankSpacesTest.php        Class  Class source
BlockquoteTest.php         Class  Class source
bootstrap.php              Aux.   Auxiliary script
ConstructorTest.php        Class  Class source
DefinitionListTest.php     Class  Class source
ElementsTest.php           Class  Class source
HeadingsTest.php           Class  Class source
HtmlCharsTest.php          Class  Class source
ImageTest.php              Class  Class source
LinkTest.php               Class  Class source
ListItemsTest.php          Class  Class source
ListTest.php               Class  Class source
MailTest.php               Class  Class source
NewlineSpaceTest.php       Class  Class source
NewlineTabBreakTest.php    Class  Class source
ParagraphBreakTest.php     Class  Class source
PreTest.php                Class  Class source
PrintTest.php              Class  Class source
SearchReplaceTest.php      Class  Class source
SpaceTest.php              Class  Class source
SpanTest.php               Class  Class source
StrToUpperTest.php         Class  Class source
TableTest.php              Class  Class source
UnderscoresTest.php        Class  Class source
UppercaseTest.php          Class  Class source

Files / tests / fixtures
File                       Role   Description
code.html                  Doc.   Documentation
code.txt                   Doc.   Documentation
dl_dt_dd.html              Doc.   Documentation
dl_dt_dd.txt               Doc.   Documentation
msoffice.html              Doc.   Documentation
msoffice.txt               Doc.   Documentation
nbsp.html                  Doc.   Documentation
nbsp.txt                   Doc.   Documentation
non-breaking-spaces.html   Doc.   Documentation
non-breaking-spaces.txt    Doc.   Documentation
table.html                 Doc.   Documentation
table.txt                  Doc.   Documentation
test10Html.html            Doc.   Documentation
test10Html.txt             Doc.   Documentation
test1Html.html             Doc.   Documentation
test1Html.txt              Doc.   Documentation
test2Html.html             Doc.   Documentation
test2Html.txt              Doc.   Documentation
test3Html.html             Doc.   Documentation
test3Html.txt              Doc.   Documentation
test4Html.html             Doc.   Documentation
test4Html.txt              Doc.   Documentation
test5Html.html             Doc.   Documentation
test5Html.txt              Doc.   Documentation
test6Html.html             Doc.   Documentation
test6Html.txt              Doc.   Documentation
test7Html.html             Doc.   Documentation
test7Html.txt              Doc.   Documentation
test8Html.html             Doc.   Documentation
test8Html.txt              Doc.   Documentation
test9Html.html             Doc.   Documentation
test9Html.txt              Doc.   Documentation
