File: tests/fixtures/test10Html.txt

Role: Documentation
Content type: text/plain
Description: Documentation
Class: PHP HTML to Text Conversion
Parse HTML and extract text contained in it
Author: Lars Moelleken
Date: 1 year ago
Size: 35,892 bytes
 

[](/#portable-utf-8--api) PORTABLE UTF-8 | API

The API of the "UTF8" class is written as small static methods that mirror the default PHP API.

[](/#methods) METHODS

[](/#accessstring-str-int-pos) access(string $str, int $pos)

Returns the character at the specified position: works like $str[1], but counts UTF-8 characters instead of bytes.

```php
UTF8::access('fòô', 1); // 'ò'
```

[](/#add_bom_to_stringstring-str) add_bom_to_string(string $str)

Prepends the UTF-8 BOM character to the string and returns the whole string.

If the BOM already exists, the input string is returned unchanged.

```php
UTF8::add_bom_to_string('fòô'); // "\xEF\xBB\xBF" . 'fòô'
```
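
The "prepend only if missing" behaviour described above can be sketched in plain PHP. This is a minimal illustration, not the library's implementation; `sketch_add_bom()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not part of the UTF8 class): prepend the UTF-8 BOM
// only when the string does not already start with it.
function sketch_add_bom(string $str): string
{
    $bom = "\xEF\xBB\xBF"; // UTF-8 encoding of U+FEFF

    if (strncmp($str, $bom, 3) === 0) {
        return $str; // BOM already present: return the input unchanged
    }

    return $bom . $str;
}
```

Calling the helper twice on the same string yields a single BOM, which is the property the description relies on.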

[](/#binary_to_strmixed-bin) binary_to_str(mixed $bin)

Converts a binary string (a sequence of "0" and "1" characters) into a string.

INFO: opposite to UTF8::str_to_binary()

```php
UTF8::binary_to_str('11110000100111111001100010000011'); // '😃'
```

[](/#bom) bom()

Returns the UTF-8 Byte Order Mark Character.

```php
UTF8::bom(); // "\xEF\xBB\xBF"
```

[](/#chrint-code_point--string) chr(int $code_point) : string

Generates a UTF-8 encoded character from the given code point.

INFO: opposite to UTF8::ord()

```php
UTF8::chr(666); // 'ʚ'
```
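
To make the code point to UTF-8 mapping concrete, here is a minimal sketch of the bit layout defined by the UTF-8 encoding. It is an illustration under assumptions, not the code used by `UTF8::chr()`; `sketch_chr()` is a hypothetical name.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function sketch_chr(int $cp): string
{
    // Reject values outside Unicode and the surrogate range.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return '';
    }
    if ($cp < 0x80) { // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) { // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) { // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}
```

For code point 666 (U+029A) the sketch produces the two bytes 0xCA 0x9A, which is the UTF-8 encoding of 'ʚ'.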

[](/#chr_mapstringarray-callback-string-str--array) chr_map(string|array $callback, string $str) : array

Applies callback to all characters of a string.

```php
UTF8::chr_map(['voku\helper\UTF8', 'strtolower'], 'Κόσμε'); // ['κ','ό', 'σ', 'μ', 'ε']
```

[](/#chr_size_liststring-str--array) chr_size_list(string $str) : array

Generates an array with the byte length of each character of a Unicode string.

1 byte => U+0000 - U+007F
2 byte => U+0080 - U+07FF
3 byte => U+0800 - U+FFFF
4 byte => U+10000 - U+10FFFF

```php
UTF8::chr_size_list('中文空白-test'); // [3, 3, 3, 3, 1, 1, 1, 1, 1]
```
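
A rough way to reproduce this result without the library is to split the string into characters with PCRE's `u` modifier and measure each piece in bytes. `sketch_chr_size_list()` is a hypothetical helper and only an approximation of what the class does.

```php
<?php
// Hypothetical helper: byte length of every UTF-8 character in a string.
function sketch_chr_size_list(string $str): array
{
    // With the /u modifier, an empty pattern splits between characters,
    // not between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // input was not valid UTF-8
    }

    // strlen() counts bytes, so it yields 1 to 4 for each character.
    return array_map('strlen', $chars);
}
```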

[](/#chr_to_decimalstring-chr--int) chr_to_decimal(string $chr) : int

Get a decimal code representation of a specific character.

```php
UTF8::chr_to_decimal('§'); // 167 (0xa7)
```

[](/#chr_to_hexstring-chr-string-pfix--u) chr_to_hex(string $chr, string $pfix = 'U+')

Get hexadecimal code point (U+xxxx) of a UTF-8 encoded character.

```php
UTF8::chr_to_hex('§'); // 'U+00a7'
```

[](/#chunk_splitstring-body-int-chunklen--76-string-end--rn--string) chunk_split(string $body, int $chunklen = 76, string $end = "\r\n") : string

Splits a string into smaller chunks and multiple lines, using the specified line ending character.

```php
UTF8::chunk_split('ABC-ÖÄÜ-中文空白-κόσμε', 3); // "ABC\r\n-ÖÄ\r\nÜ-中\r\n文空白\r\n-κό\r\nσμε"
```

[](/#cleanstring-str-bool-remove_bom--false-bool-normalize_whitespace--false-bool-normalize_msword--false-bool-keep_non_breaking_space--false--string) clean(string $str, bool $remove_bom = false, bool $normalize_whitespace = false, bool $normalize_msword = false, bool $keep_non_breaking_space = false) : string

Accepts a string and removes all non-UTF-8 characters from it, with optional extras: BOM removal, whitespace normalization and MS Word character normalization.

```php
UTF8::clean("\xEF\xBB\xBF„Abcdef\xc2\xa0\x20…” — 😃 - Düsseldorf", true, true); // '„Abcdef  …” — 😃 - Düsseldorf'
```

[](/#cleanupstring-str--string) cleanup(string $str) : string

Cleans up a string, keeping only printable UTF-8 characters and fixing the UTF-8 encoding.

```php
UTF8::cleanup("\xEF\xBB\xBF„Abcdef\xc2\xa0\x20…” — 😃 - Düsseldorf"); // '„Abcdef  …” — 😃 - Düsseldorf'
```

[](/#codepointsmixed-arg-bool-u_style--false--array) codepoints(mixed $arg, bool $u_style = false) : array

Accepts a string and returns an array of Unicode code points.

INFO: opposite to UTF8::string()

```php
UTF8::codepoints('κöñ'); // array(954, 246, 241)
// ... OR ...
UTF8::codepoints('κöñ', true); // array('U+03ba', 'U+00f6', 'U+00f1')
```

[](/#count_charsstring-str-bool-cleanutf8--false--array) count_chars(string $str, bool $cleanUtf8 = false) : array

Returns count of characters used in a string.

```php
UTF8::count_chars('κaκbκc'); // array('κ' => 3, 'a' => 1, 'b' => 1, 'c' => 1)
```
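
The counting itself can be sketched with two built-in functions: split on character boundaries, then tally. `sketch_count_chars()` is a hypothetical helper; it ignores the `$cleanUtf8` option and is not the library's implementation.

```php
<?php
// Hypothetical helper: count how often each UTF-8 character occurs.
function sketch_count_chars(string $str): array
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // input was not valid UTF-8
    }

    // Keys are the characters (in order of first appearance),
    // values are the number of occurrences.
    return array_count_values($chars);
}
```

One caveat of this shortcut: PHP converts numeric string keys such as '1' to integer keys, so digits in the input end up as integer keys.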

[](/#encodestring-encoding-string-str-bool-force--true--string) encode(string $encoding, string $str, bool $force = true) : string

Encode a string with a new charset-encoding.

INFO: The difference from "UTF8::utf8_encode()" is that this function also tries to fix broken / double encoding, so you can call it on a UTF-8 string without corrupting the string.

```php
UTF8::encode('ISO-8859-1', '-ABC-中文空白-'); // '-ABC-????-'
//
UTF8::encode('UTF-8', '-ABC-中文空白-'); // '-ABC-中文空白-'
```

[](/#file_get_contentsstring-filename-intnull-flags--null-resourcenull-context--null-intnull-offset--null-intnull-maxlen--null-int-timeout--10-bool-converttoutf8--true--string) file_get_contents(string $filename, int|null $flags = null, resource|null $context = null, int|null $offset = null, int|null $maxlen = null, int $timeout = 10, bool $convertToUtf8 = true) : string

Reads entire file into a string.

WARNING: Do not use the UTF-8 option ($convertToUtf8) for binary files (e.g. images)!

```php
UTF8::file_get_contents('utf16le.txt'); // ...
```

[](/#file_has_bomstring-file_path--bool) file_has_bom(string $file_path) : bool

Checks if a file starts with BOM (Byte Order Mark) character.

```php
UTF8::file_has_bom('utf8_with_bom.txt'); // true
```

[](/#filtermixed-var-int-normalization_form--4-string-leading_combining----mixed) filter(mixed $var, int $normalization_form = 4, string $leading_combining = '◌') : mixed

Normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

```php
UTF8::filter(array("\xE9", 'à', 'a')); // array('é', 'à', 'a')
```

[](/#filter_inputint-type-string-var-int-filter--filter_default-nullarray-option--null--string) filter_input(int $type, string $var, int $filter = FILTER_DEFAULT, null|array $option = null) : string

Wrapper for "filter_input()" that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

```php
// $_GET['foo'] = 'bar';
UTF8::filter_input(INPUT_GET, 'foo', FILTER_SANITIZE_STRING); // 'bar'
```

[](/#filter_input_arrayint-type-mixed-definition--null-bool-add_empty--true--mixed) filter_input_array(int $type, mixed $definition = null, bool $add_empty = true) : mixed

Wrapper for "filter_input_array()" that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

```php
// $_GET['foo'] = 'bar';
UTF8::filter_input_array(INPUT_GET, array('foo' => FILTER_SANITIZE_STRING)); // array('foo' => 'bar')
```

[](/#filter_varstring-var-int-filter--filter_default-array-option--null--string) filter_var(string $var, int $filter = FILTER_DEFAULT, array $option = null) : string

Wrapper for "filter_var()" that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

```php
UTF8::filter_var('-ABC-中文空白-', FILTER_VALIDATE_URL); // false
```

[](/#filter_var_arrayarray-data-mixed-definition--null-bool-add_empty--true--mixed) filter_var_array(array $data, mixed $definition = null, bool $add_empty = true) : mixed

Wrapper for "filter_var_array()" that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

```php
$filters = [ 
  'name'  => ['filter'  => FILTER_CALLBACK, 'options' => ['voku\helper\UTF8', 'ucwords']],
  'age'   => ['filter'  => FILTER_VALIDATE_INT, 'options' => ['min_range' => 1, 'max_range' => 120]],
  'email' => FILTER_VALIDATE_EMAIL,
];

$data = [
  'name' => 'κόσμε', 
  'age' => '18', 
  'email' => 'foo@bar.de'
];

UTF8::filter_var_array($data, $filters, true); // ['name' => 'Κόσμε', 'age' => 18, 'email' => 'foo@bar.de']
```

[](/#fits_insidestring-str-int-box_size--bool) fits_inside(string $str, int $box_size) : bool

Checks that the number of Unicode characters does not exceed the specified integer.

```php
UTF8::fits_inside('κόσμε', 6); // true
```

[](/#fix_simple_utf8string-str--string) fix_simple_utf8(string $str) : string

Try to fix simple broken UTF-8 strings.

INFO: Take a look at "UTF8::fix_utf8()" if you need a more advanced fix for broken UTF-8 strings.

```php
UTF8::fix_simple_utf8('DÃ¼sseldorf'); // 'Düsseldorf'
```

[](/#fix_utf8stringstring-str--mixed) fix_utf8(string|string[] $str) : mixed

Fix a double (or multiple) encoded UTF8 string.

```php
UTF8::fix_utf8('FÃ©dÃ©ration'); // 'Fédération'
```

[](/#getchardirectionstring-char--string-rtl-or-ltr) getCharDirection(string $char) : string ('RTL' or 'LTR')

Get the text direction of a specific character.

```php
UTF8::getCharDirection('ا'); // 'RTL'
```

[](/#hex_to_intstring-str--intfalse) hex_to_int(string $str) : int|false

Converts hexadecimal U+xxxx code point representation to integer.

INFO: opposite to UTF8::int_to_hex()

```php
UTF8::hex_to_int('U+00f1'); // 241
```
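
The U+xxxx notation is just a prefixed hexadecimal number, so the conversion in both directions can be sketched with `hexdec()` and `sprintf()`. The helper names below are hypothetical, and the accepted input format (a "U+" prefix followed by 4 to 6 hex digits) is an assumption made for this sketch rather than the library's exact rule.

```php
<?php
// Hypothetical helper: 'U+00f1' -> 241, or false when the format does not match.
function sketch_hex_to_int(string $str)
{
    // Assumption: only the "U+" prefix with 4 to 6 hex digits is accepted.
    if (preg_match('/^U\+([0-9a-f]{4,6})$/i', $str, $m) !== 1) {
        return false;
    }

    return (int) hexdec($m[1]);
}

// Hypothetical helper: 241 -> 'U+00f1' (zero-padded to at least 4 digits).
function sketch_int_to_hex(int $int, string $pfix = 'U+'): string
{
    return $pfix . sprintf('%04x', $int);
}
```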

[](/#html_encodestring-str-bool-keepasciichars--false-string-encoding--utf-8--string) html_encode(string $str, bool $keepAsciiChars = false, string $encoding = 'UTF-8') : string

Converts a UTF-8 string to a series of HTML numbered entities.

INFO: opposite to UTF8::html_decode()

```php
UTF8::html_encode('中文空白'); // '&#20013;&#25991;&#31354;&#30333;'
```

[](/#html_entity_decodestring-str-int-flags--null-string-encoding--utf-8--string) html_entity_decode(string $str, int $flags = null, string $encoding = 'UTF-8') : string

UTF-8 version of html_entity_decode()

The reason we are not using html_entity_decode() by itself is because while it is not technically correct to leave out the semicolon at the end of an entity most browsers will still interpret the entity correctly. html_entity_decode() does not convert entities without semicolons, so we are left with our own little solution here. Bummer.

Convert all HTML entities to their applicable characters

INFO: opposite to UTF8::html_encode()

```php
UTF8::html_entity_decode('&#20013;&#25991;&#31354;&#30333;'); // '中文空白'
```

[](/#htmlentitiesstring-str-int-flags--ent_compat-string-encoding--utf-8-bool-double_encode--true--string) htmlentities(string $str, int $flags = ENT_COMPAT, string $encoding = 'UTF-8', bool $double_encode = true) : string

Convert all applicable characters to HTML entities: UTF-8 version of htmlentities()

```php
UTF8::htmlentities('<白-öäü>'); // '&lt;&#30333;-&ouml;&auml;&uuml;&gt;'
```

[](/#htmlspecialcharsstring-str-int-flags--ent_compat-string-encoding--utf-8-bool-double_encode--true--string) htmlspecialchars(string $str, int $flags = ENT_COMPAT, string $encoding = 'UTF-8', bool $double_encode = true) : string

Convert only special characters to HTML entities: UTF-8 version of htmlspecialchars()

INFO: Take a look at "UTF8::htmlentities()"

```php
UTF8::htmlspecialchars('<白-öäü>'); // '&lt;白-öäü&gt;'
```

[](/#int_to_hexint-int-string-pfix--u--str) int_to_hex(int $int, string $pfix = 'U+') : string

Converts Integer to hexadecimal U+xxxx code point representation.

INFO: opposite to UTF8::hex_to_int()

```php
UTF8::int_to_hex(241); // 'U+00f1'
```

[](/#is_asciistring-str--bool) is_ascii(string $str) : bool

Checks if a string is 7 bit ASCII.

alias: UTF8::isAscii()

```php
UTF8::is_ascii('白'); // false
```

[](/#is_base64string-str--bool) is_base64(string $str) : bool

Returns true if the string is base64 encoded, false otherwise.

alias: UTF8::isBase64()

```php
UTF8::is_base64('4KSu4KWL4KSo4KS/4KSa'); // true
```

[](/#is_binarymixed-input--bool) is_binary(mixed $input) : bool

Checks if the input is binary (this looks like a hack).

alias: UTF8::isBinary()

```php
UTF8::is_binary(01); // true
```

[](/#is_binary_filestring-file--bool) is_binary_file(string $file) : bool

Check if the file is binary.

```php
UTF8::is_binary_file('./utf32.txt'); // true
```

[](/#is_bomstring-str--bool) is_bom(string $str) : bool

Checks if the given string is equal to any "Byte Order Mark".

WARNING: Use "UTF8::string_has_bom()" if you want to check for a BOM in a string.

alias: UTF8::isBom()

```php
UTF8::is_bom("\xef\xbb\xbf"); // true
```

[](/#is_jsonstring-str--bool) is_json(string $str) : bool

Tries to check if "$str" is a JSON string.

alias: UTF8::isJson()

```php
UTF8::is_json('{"array":[1,"¥","ä"]}'); // true
```

[](/#is_htmlstring-str--bool) is_html(string $str) : bool

Checks if the string contains any HTML tags.

alias: UTF8::isHtml()

```php
UTF8::is_html('<b>LALL</b>'); // true
```

[](/#is_utf16string-str--intfalse) is_utf16(string $str) : int|false

Checks if the string is UTF-16: returns false if it is not UTF-16, 1 for UTF-16LE, 2 for UTF-16BE.

alias: UTF8::isUtf16()

```php
UTF8::is_utf16(file_get_contents('utf-16-le.txt')); // 1
UTF8::is_utf16(file_get_contents('utf-16-be.txt')); // 2
UTF8::is_utf16(file_get_contents('utf-8.txt')); // false
```

[](/#is_utf32string-str--intfalse) is_utf32(string $str) : int|false

Checks if the string is UTF-32: returns false if it is not UTF-32, 1 for UTF-32LE, 2 for UTF-32BE.

alias: UTF8::isUtf32()

```php
UTF8::is_utf32(file_get_contents('utf-32-le.txt')); // 1
UTF8::is_utf32(file_get_contents('utf-32-be.txt')); // 2
UTF8::is_utf32(file_get_contents('utf-8.txt')); // false
```

[](/#is_utf8string-str-bool-strict--false--bool) is_utf8(string $str, bool $strict = false) : bool

Checks whether the passed string contains only byte sequences that are valid UTF-8 characters.

alias: UTF8::isUtf8()

```php
UTF8::is_utf8('Iñtërnâtiônàlizætiøn'); // true
UTF8::is_utf8("Iñtërnâtiônàlizætiøn\xA0\xA1"); // false
```
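
A quick validity check in the same spirit can be sketched with PCRE alone: matching an empty pattern in UTF-8 mode fails when the subject contains invalid byte sequences. `sketch_is_utf8()` is a hypothetical helper; it has no equivalent of the `$strict` flag and is not how the library implements the check.

```php
<?php
// Hypothetical helper: true when $str consists only of valid UTF-8 sequences.
function sketch_is_utf8(string $str): bool
{
    // In /u mode preg_match() returns false (a bad-UTF-8 error) for invalid
    // input and 1 for valid input, including the empty string.
    return preg_match('//u', $str) === 1;
}
```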

[](/#json_decodestring-json-bool-assoc--false-int-depth--512-int-options--0--mixed) json_decode(string $json, bool $assoc = false, int $depth = 512, int $options = 0) : mixed

Decodes a JSON string.

```php
UTF8::json_decode('[1,"\u00a5","\u00e4"]'); // array(1, '¥', 'ä')
```

[](/#json_encodemixed-value-int-options--0-int-depth--512--string) json_encode(mixed $value, int $options = 0, int $depth = 512) : string

Returns the JSON representation of a value.

```php
UTF8::json_encode(array(1, '¥', 'ä')); // '[1,"\u00a5","\u00e4"]'
```

[](/#lcfirststring-str--string) lcfirst(string $str) : string

Makes string's first char lowercase.

```php
UTF8::lcfirst('ÑTËRNÂTIÔNÀLIZÆTIØN'); // 'ñTËRNÂTIÔNÀLIZÆTIØN'
```

[](/#maxmixed-arg--string) max(mixed $arg) : string

Returns the UTF-8 character with the maximum code point in the given data.

```php
UTF8::max('abc-äöü-中文空白'); // '空'
```

[](/#max_chr_widthstring-str--int) max_chr_width(string $str) : int

Calculates and returns the maximum number of bytes taken by any UTF-8 encoded character in the given string.

```php
UTF8::max_chr_width('Intërnâtiônàlizætiøn'); // 2
```

[](/#minmixed-arg--string) min(mixed $arg) : string

Returns the UTF-8 character with the minimum code point in the given data.

```php
UTF8::min('abc-äöü-中文空白'); // '-'
```

[](/#normalize_encodingstring-encoding--string) normalize_encoding(string $encoding) : string

Normalize the encoding-"name" input.

```php
UTF8::normalize_encoding('UTF8'); // 'UTF-8'
```

[](/#normalize_mswordstring-str--string) normalize_msword(string $str) : string

Normalize some MS Word special characters.

```php
UTF8::normalize_msword('„Abcdef…”'); // '"Abcdef..."'
```

[](/#normalize_whitespacestring-str-bool-keepnonbreakingspace--false-bool-keepbidiunicodecontrols--false--string) normalize_whitespace(string $str, bool $keepNonBreakingSpace = false, bool $keepBidiUnicodeControls = false) : string

Normalize the whitespace.

```php
UTF8::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"
```

[](/#ordstring-chr--int) ord(string $chr) : int

Calculates Unicode code point of the given UTF-8 encoded character.

INFO: opposite to UTF8::chr()

```php
UTF8::ord('中'); // 20013
```
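
The reverse direction, from UTF-8 bytes back to a code point, can be sketched by masking off the marker bits of the lead byte and each continuation byte. `sketch_ord()` is a hypothetical helper that assumes well-formed input and only looks at the first character; it is not the library's implementation.

```php
<?php
// Hypothetical helper: code point of the first UTF-8 character in $chr.
// Assumes the input is well-formed UTF-8; returns 0 for an empty string.
function sketch_ord(string $chr): int
{
    if ($chr === '') {
        return 0;
    }

    // unpack() returns a 1-indexed array, so re-index it from 0.
    $b = array_values(unpack('C*', substr($chr, 0, 4)));
    $n = count($b);

    if ($b[0] >= 0xF0 && $n >= 4) { // 4-byte sequence
        return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
            | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
    if ($b[0] >= 0xE0 && $n >= 3) { // 3-byte sequence
        return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
    }
    if ($b[0] >= 0xC0 && $n >= 2) { // 2-byte sequence
        return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
    }

    return $b[0]; // single byte (ASCII) or a truncated sequence
}
```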

[](/#parse_strstring-str-result--bool) parse_str(string $str, &$result) : bool

Parses the string into an array (the second parameter).

WARNING: Unlike "parse_str()", this method does not (re-)place variables in the current scope if the second parameter is not set!

```php
UTF8::parse_str('Iñtërnâtiônéàlizætiøn=測試&arr[]=foo+測試&arr[]=ການທົດສອບ', $array);
echo $array['Iñtërnâtiônéàlizætiøn']; // '測試'
```

[](/#rangemixed-var1-mixed-var2--array) range(mixed $var1, mixed $var2) : array

Create an array containing a range of UTF-8 characters.

```php
UTF8::range('κ', 'ζ'); // array('κ', 'ι', 'θ', 'η', 'ζ',)
```

[](/#remove_bomstring-str--string) remove_bom(string $str) : string

Remove the BOM from UTF-8 / UTF-16 / UTF-32 strings.

```php
UTF8::remove_bom("\xEF\xBB\xBFΜπορώ να"); // 'Μπορώ να'
```

[](/#remove_duplicatesstring-str-stringarray-what-----string) remove_duplicates(string $str, string|array $what = ' ') : string

Removes duplicate occurrences of a string in another string.

```php
UTF8::remove_duplicates('öäü-κόσμεκόσμε-äöü', 'κόσμε'); // 'öäü-κόσμε-äöü'
```

[](/#remove_invisible_charactersstring-str-bool-url_encoded--true-string-replacement----string) remove_invisible_characters(string $str, bool $url_encoded = true, string $replacement = '') : string

Remove invisible characters from a string.

```php
UTF8::remove_invisible_characters("κόσ\0με"); // 'κόσμε'
```

[](/#replace_diamond_question_markstring-str-string-unknown----string) replace_diamond_question_mark(string $str, string $unknown = '?') : string

Replace the diamond question mark (�) with the replacement.

```php
UTF8::replace_diamond_question_mark('中文空白'); // '中文空白'
```

[](/#trimstring-str---string-chars--inf--string) trim(string $str = '', string $chars = INF) : string

Strip whitespace or other characters from beginning or end of a UTF-8 string.

```php
UTF8::trim('   -ABC-中文空白-  '); // '-ABC-中文空白-'
```

[](/#rtrimstring-str---string-chars--inf--string) rtrim(string $str = '', string $chars = INF) : string

Strip whitespace or other characters from end of a UTF-8 string.

```php
UTF8::rtrim('-ABC-中文空白-  '); // '-ABC-中文空白-'
```

[](/#ltrimstring-str-string-chars--inf--string) ltrim(string $str, string $chars = INF) : string

Strip whitespace or other characters from beginning of a UTF-8 string.

```php
UTF8::ltrim(' 中文空白  '); // '中文空白  '
```

[](/#single_chr_html_encodestring-char-bool-keepasciichars--false--string) single_chr_html_encode(string $char, bool $keepAsciiChars = false) : string

Converts a UTF-8 character to an HTML numbered entity like "&#123;".

```php
UTF8::single_chr_html_encode('κ'); // '&#954;'
```

[](/#splitstring-str-int-length--1-bool-cleanutf8--false--array) split(string $str, int $length = 1, bool $cleanUtf8 = false) : array

Convert a string to an array of Unicode characters.

```php
UTF8::split('中文空白'); // array('中', '文', '空', '白')
```

[](/#str_detect_encodingstring-str--string) str_detect_encoding(string $str) : string

Optimized "\mb_detect_encoding()"-function -> with support for UTF-16 and UTF-32.

```php
UTF8::str_detect_encoding('中文空白'); // 'UTF-8'
UTF8::str_detect_encoding('Abc'); // 'ASCII'
```

[](/#str_ireplacemixed-search-mixed-replace-mixed-subject-int-count--null--mixed) str_ireplace(mixed $search, mixed $replace, mixed $subject, int &$count = null) : mixed

Case-insensitive and UTF-8 safe version of str_replace.

```php
UTF8::str_ireplace('lIzÆ', 'lise', array('Iñtërnâtiônàlizætiøn')); // array('Iñtërnâtiônàlisetiøn')
```

[](/#str_limit_after_wordstring-str-int-length--100-stirng-straddon----string) str_limit_after_word(string $str, int $length = 100, string $strAddOn = '...') : string

Limit the number of characters in a string, but also after the next word.

```php
UTF8::str_limit_after_word('fòô bàř fòô', 8, ''); // 'fòô bàř'
```

[](/#str_padstring-str-int-pad_length-string-pad_string----int-pad_type--str_pad_right--string) str_pad(string $str, int $pad_length, string $pad_string = ' ', int $pad_type = STR_PAD_RIGHT) : string

Pad a UTF-8 string to given length with another string.

```php
UTF8::str_pad('中文空白', 10, '_', STR_PAD_BOTH); // '___中文空白___'
```

[](/#str_repeatstring-str-int-multiplier--string) str_repeat(string $str, int $multiplier) : string

Repeat a string.

```php
UTF8::str_repeat("°~\xf0\x90\x28\xbc", 2); // '°~ð(¼°~ð(¼'
```

[](/#str_shufflestring-str--string) str_shuffle(string $str) : string

Shuffles all the characters in the string.

```php
UTF8::str_shuffle('fòô bàř fòô'); // 'àòôřb ffòô '
```

[](/#str_sortstring-str-bool-unique--false-bool-desc--false--string) str_sort(string $str, bool $unique = false, bool $desc = false) : string

Sort all characters according to code points.

```php
UTF8::str_sort('  -ABC-中文空白-  '); // '    ---ABC中文白空'
```
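
Sorting by code point does not need any code point arithmetic: for valid UTF-8, plain byte-wise string comparison already orders characters by code point. The sketch below relies on that property; `sketch_str_sort()` is a hypothetical helper, not the library's implementation.

```php
<?php
// Hypothetical helper: sort the characters of a UTF-8 string by code point.
function sketch_str_sort(string $str, bool $unique = false, bool $desc = false): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // input was not valid UTF-8
    }

    if ($unique) {
        $chars = array_unique($chars);
    }

    // SORT_STRING compares byte-wise, which for valid UTF-8 matches
    // code point order.
    if ($desc) {
        rsort($chars, SORT_STRING);
    } else {
        sort($chars, SORT_STRING);
    }

    return implode('', $chars);
}
```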

[](/#str_splitstring-str-int-len--1--array) str_split(string $str, int $len = 1) : array

Split a string into an array.

```php
UTF8::str_split('déjà', 2); // array('dé', 'jà')
```

[](/#str_to_binarystring-str--string) str_to_binary(string $str) : string

Get a binary representation of a specific string.

INFO: opposite to UTF8::binary_to_str()

```php
UTF8::str_to_binary('😃'); // '11110000100111111001100010000011'
```
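
The binary representation is simply each byte of the UTF-8 string written as eight bits. A minimal sketch of both directions follows; the helper names are hypothetical, and the inverse assumes the input length is a multiple of eight and contains only "0" and "1".

```php
<?php
// Hypothetical helper: every byte of $str as an 8-digit binary number.
function sketch_str_to_binary(string $str): string
{
    if ($str === '') {
        return '';
    }

    $bits = '';
    foreach (unpack('C*', $str) as $byte) {
        $bits .= sprintf('%08b', $byte);
    }

    return $bits;
}

// Hypothetical helper: inverse of sketch_str_to_binary().
// Assumes strlen($bin) is a multiple of 8 and $bin holds only '0' and '1'.
function sketch_binary_to_str(string $bin): string
{
    if ($bin === '') {
        return '';
    }

    $out = '';
    foreach (str_split($bin, 8) as $octet) {
        $out .= chr(bindec($octet));
    }

    return $out;
}
```

For '😃' (bytes F0 9F 98 83) the first helper yields the 32-digit string shown in the example above.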

[](/#str_word_countstring-str-int-format--0-string-charlist----string) str_word_count(string $str, int $format = 0, string $charlist = '') : int|array

Counts the words in a string, or returns them as an array, depending on the format.

```php
// format: 0 -> return only word count (int)
//
UTF8::str_word_count('中文空白 öäü abc#c'); // 4
UTF8::str_word_count('中文空白 öäü abc#c', 0, '#'); // 3

// format: 1 -> return words (array) 
//
UTF8::str_word_count('中文空白 öäü abc#c', 1); // array('中文空白', 'öäü', 'abc', 'c')
UTF8::str_word_count('中文空白 öäü abc#c', 1, '#'); // array('中文空白', 'öäü', 'abc#c')

// format: 2 -> return words with offset (array) 
//
UTF8::str_word_count('中文空白 öäü abc#c', 2); // array(0 => '中文空白', 5 => 'öäü', 9 => 'abc', 13 => 'c')
UTF8::str_word_count('中文空白 öäü abc#c', 2, '#'); // array(0 => '中文空白', 5 => 'öäü', 9 => 'abc#c')
```

[](/#strcmpstring-str1-string-str2--int) strcmp(string $str1, string $str2) : int

Case-sensitive string comparison: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal.

```php
UTF8::strcmp("iñtërnâtiôn\nàlizætiøn", "iñtërnâtiôn\nàlizætiøn"); // 0
```

[](/#strnatcmpstring-str1-string-str2--int) strnatcmp(string $str1, string $str2) : int

Case sensitive string comparisons using a "natural order" algorithm: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal.

INFO: natural order version of UTF8::strcmp()

```php
UTF8::strnatcmp('2Hello world 中文空白!', '10Hello WORLD 中文空白!'); // -1
UTF8::strcmp('2Hello world 中文空白!', '10Hello WORLD 中文空白!'); // 1

UTF8::strnatcmp('10Hello world 中文空白!', '2Hello WORLD 中文空白!'); // 1
UTF8::strcmp('10Hello world 中文空白!', '2Hello WORLD 中文空白!'); // -1
```

[](/#strcasecmpstring-str1-string-str2--int) strcasecmp(string $str1, string $str2) : int

Case-insensitive string comparison: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal.

INFO: Case-insensitive version of UTF8::strcmp()

```php
UTF8::strcasecmp("iñtërnâtiôn\nàlizætiøn", "Iñtërnâtiôn\nàlizætiøn"); // 0
```

[](/#strnatcasecmpstring-str1-string-str2--int) strnatcasecmp(string $str1, string $str2) : int

Case insensitive string comparisons using a "natural order" algorithm: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal.

INFO: natural order version of UTF8::strcasecmp()

```php
UTF8::strnatcasecmp('2Hello world 中文空白!', '10Hello WORLD 中文空白!'); // -1
UTF8::strcasecmp('2Hello world 中文空白!', '10Hello WORLD 中文空白!'); // 1

UTF8::strnatcasecmp('10Hello world 中文空白!', '2Hello WORLD 中文空白!'); // 1
UTF8::strcasecmp('10Hello world 中文空白!', '2Hello WORLD 中文空白!'); // -1
```

[](/#strncasecmpstring-str1-string-str2-int-len--int) strncasecmp(string $str1, string $str2, int $len) : int

Case-insensitive string comparison of the first n characters: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal.

INFO: Case-insensitive version of UTF8::strncmp()

```php
UTF8::strncasecmp("iñtërnâtiôn\nàlizætiøn321", "Iñtërnâtiôn\nàlizætiøn123", 5); // 0
```

[](/#strncmpstring-str1-string-str2-int-len--int) strncmp(string $str1, string $str2, int $len) : int

Case-sensitive string comparison of the first n characters: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal.

```php
UTF8::strncmp("Iñtërnâtiôn\nàlizætiøn321", "Iñtërnâtiôn\nàlizætiøn123", 5); // 0
```

[](/#stringstring-str1-string-str2--int) string(array $array) : string

Create a UTF-8 string from code points.

INFO: opposite to UTF8::codepoints()

```php
UTF8::string(array(246, 228, 252)); // 'öäü'
```

[](/#string_has_bomstring-str--bool) string_has_bom(string $str) : bool

Checks if the string starts with a BOM (Byte Order Mark) character.

alias: UTF8::hasBom()

```php
UTF8::string_has_bom("\xef\xbb\xbf foobar"); // true
```

[](/#strip_tagsstring-str-stingnull-allowable_tags--null--string) strip_tags(string $str, string|null $allowable_tags = null) : string

Strip HTML and PHP tags from a string + clean invalid UTF-8.

```php
UTF8::strip_tags("<span>κόσμε\xa0\xa1</span>"); // 'κόσμε'
```

[](/#strlenstring-str-string-encoding--utf-8-bool-cleanutf8--false--int) strlen(string $str, string $encoding = 'UTF-8', bool $cleanUtf8 = false) : int

Get the string length, not the byte-length!

```php
UTF8::strlen("Iñtërnâtiôn\xE9àlizætiøn"); // 20
```

[](/#strwidthstring-str-string-encoding--utf-8-bool-cleanutf8--false--int) strwidth(string $str, string $encoding = 'UTF-8', bool $cleanUtf8 = false) : int

Return the width of a string.

```php
UTF8::strwidth("Iñtërnâtiôn\xE9àlizætiøn"); // 21
```

[](/#strpbrkstring-haystack-string-char_list--string) strpbrk(string $haystack, string $char_list) : string

Search a string for any of a set of characters.

```php
UTF8::strpbrk('-中文空白-', '白'); // '白-'
```

[](/#strposstring-haystack-string-char_list--intfalse) strpos(string $haystack, string $needle) : int|false

Find position of first occurrence of string in a string.

```php
UTF8::strpos('ABC-ÖÄÜ-中文空白-中文空白', '中'); // 8
```

[](/#striposstr-needle-before_needle--false--intfalse) stripos($str, $needle, $before_needle = false) : int|false

Finds position of first occurrence of a string within another, case insensitive.

```php
UTF8::stripos('ABC-ÖÄÜ-中文空白-中文空白', '中'); // 8
```

[](/#strrposstring-haystack-string-needle-int-offset--0-bool-cleanutf8--false--stringfalse) strrpos(string $haystack, string $needle, int $offset = 0, bool $cleanUtf8 = false) : int|false

Find position of last occurrence of a string in a string.

```php
UTF8::strrpos('ABC-ÖÄÜ-中文空白-中文空白', '中'); // 13
```

[](/#strriposstring-haystack-string-needle-int-offset--0-bool-cleanutf8--false--stringfalse) strripos(string $haystack, string $needle, int $offset = 0, bool $cleanUtf8 = false) : int|false

Find position of last occurrence of a case-insensitive string.

```php
UTF8::strripos('ABC-ÖÄÜ-中文空白-中文空白', '中'); // 13
```

[](/#strrchrstring-haystack-string-needle-bool-part--false-string-encoding--stringfalse) strrchr(string $haystack, string $needle, bool $part = false, string $encoding) : string|false

Finds the last occurrence of a character in a string within another.

```php
UTF8::strrchr('κόσμεκόσμε-äöü', 'κόσμε'); // 'κόσμε-äöü'
```

[](/#strrichrstring-haystack-string-needle-bool-part--false-string-encoding--stringfalse) strrichr(string $haystack, string $needle, bool $part = false, string $encoding) : string|false

Finds the last occurrence of a character in a string within another, case insensitive.

```php
UTF8::strrichr('Aκόσμεκόσμε-äöü', 'aκόσμε'); // 'Aκόσμεκόσμε-äöü'
```

[](/#strrevstring-str--string) strrev(string $str) : string

Reverses characters order in the string.

```php
UTF8::strrev('κ-öäü'); // 'üäö-κ'
```

[](/#strspnstring-str-string-mask-int-offset--0-int-length--2147483647--string) strspn(string $str, string $mask, int $offset = 0, int $length = 2147483647) : int

Finds the length of the initial segment of a string consisting entirely of characters contained within a given mask.

```php
UTF8::strspn('iñtërnâtiônàlizætiøn', 'itñ'); // 3
```

[](/#strstrstring-str-string-needle-bool-before_needle--false--string) strstr(string $str, string $needle, bool $before_needle = false) : string

Returns part of haystack string from the first occurrence of needle to the end of haystack.

```php
$str = 'iñtërnâtiônàlizætiøn';
$search = 'nât';

UTF8::strstr($str, $search); // 'nâtiônàlizætiøn'
UTF8::strstr($str, $search, true); // 'iñtër'
```

[](/#stristrstring-str-string-needle-bool-before_needle--false--string) stristr(string $str, string $needle, bool $before_needle = false) : string

Returns all of haystack starting from and including the first case-insensitive occurrence of needle to the end.

```php
$str = 'iñtërnâtiônàlizætiøn';
$search = 'NÂT';

UTF8::stristr($str, $search); // 'nâtiônàlizætiøn'
UTF8::stristr($str, $search, true); // 'iñtër'
```

[](/#strtocasefoldstring-str-bool-full--true--string) strtocasefold(string $str, bool $full = true) : string

Unicode transformation for case-less matching.

```php
UTF8::strtocasefold('ǰ◌̱'); // 'ǰ◌̱'
```

[](/#strtolowerstring-str-string-encoding--utf-8--string) strtolower(string $str, string $encoding = 'UTF-8') : string

Make a string lowercase.

```php
UTF8::strtolower('DÉJÀ Σσς Iıİi'); // 'déjà σσς iıii'
```

[](/#strtoupperstring-str-string-encoding--utf-8--string) strtoupper(string $str, string $encoding = 'UTF-8') : string

Make a string uppercase.

```php
UTF8::strtoupper('Déjà Σσς Iıİi'); // 'DÉJÀ ΣΣΣ IIİI'
```

[](/#strtrstring-str-stringarray-from-stringarray-to--inf--string) strtr(string $str, string|array $from, string|array $to = INF) : string

Translate characters or replace sub-strings.

```php
$arr = array(
    'Hello'   => '○●◎',
    '中文空白' => 'earth',
);
UTF8::strtr('Hello 中文空白', $arr); // '○●◎ earth'
```

[](/#substrstring-str-int-start--0-int-length--null-string-encoding--utf-8-bool-cleanutf8--false--string) substr(string $str, int $start = 0, int $length = null, string $encoding = 'UTF-8', bool $cleanUtf8 = false) : string

Get part of a string.

```php
UTF8::substr('中文空白', 1, 2); // '文空'
```

[](/#substr_comparestring-main_str-string-str-int-offset-int-length--2147483647-bool-case_insensitivity--false--int) substr_compare(string $main_str, string $str, int $offset, int $length = 2147483647, bool $case_insensitivity = false) : int

Binary safe comparison of two strings from an offset, up to length characters.

```php
UTF8::substr_compare("○●◎\r", '●◎', 0, 2); // -1
UTF8::substr_compare("○●◎\r", '◎●', 1, 2); // 1
UTF8::substr_compare("○●◎\r", '●◎', 1, 2); // 0
```

[](/#substr_countstring-haystack-string-needle-int-offset--0-int-length--null-string-encoding--utf-8--int) substr_count(string $haystack, string $needle, int $offset = 0, int $length = null, string $encoding = 'UTF-8') : int

Count the number of substring occurrences.

```php
UTF8::substr_count('中文空白', '文空', 1, 2); // 1
```

[](/#substr_replacestringstring-str-stringstring-replacement-intint-start-intint-length--null--stringarray) substr_replace(string|string[] $str, string|string[] $replacement, int|int[] $start, int|int[] $length = null) : string|array

Replace text within a portion of a string.

```php
UTF8::substr_replace(array('Iñtërnâtiônàlizætiøn', 'foo'), 'æ', 1); // array('Iæñtërnâtiônàlizætiøn', 'fæoo')
```

[](/#swapcasestring-str-string-string-encoding--utf-8--string) swapCase(string $str, string $encoding = 'UTF-8') : string

Returns a case swapped version of the string.

```php
UTF8::swapCase('déJÀ σσς iıII'); // 'DÉjà ΣΣΣ IIii'
```

[](/#to_asciistring-str-string-unknown----string) to_ascii(string $str, string $unknown = '?') : string

Convert a string into ASCII.

alias: UTF8::toAscii()

```php
UTF8::to_ascii('déjà σσς iıii'); // 'deja sss iiii'
```

[](/#to_utf8stringstring-str--stringstring) to_utf8(string|string[] $str) : string|string[]

This function leaves UTF8 characters alone, while converting almost all non-UTF8 to UTF8.

alias: UTF8::toUtf8()

```php
UTF8::to_utf8("\u0063\u0061\u0074"); // 'cat'
```

[](/#to_iso8859stringstring-str--stringstring) to_iso8859(string|string[] $str) : string|string[]

Convert a string into "ISO-8859"-encoding (Latin-1).

aliases: UTF8::toIso8859(), UTF8::to_latin1(), UTF8::toLatin1()

```php
UTF8::to_utf8(UTF8::to_latin1('  -ABC-中文空白-  ')); // '  -ABC-????-  ' 
```

[](/#ucfirststring-str--string) ucfirst(string $str) : string

Makes string's first char uppercase.

alias: UTF8::ucword()

```php
UTF8::ucfirst('ñtërnâtiônàlizætiøn'); // 'Ñtërnâtiônàlizætiøn'
```

[](/#ucwordsstring-str--string) ucwords(string $str) : string

Uppercase for all words in the string.

```php
UTF8::ucwords('iñt ërn âTi ônà liz æti øn'); // 'Iñt Ërn ÂTi Ônà Liz Æti Øn'
```

[](/#urldecodestring-str--string) urldecode(string $str) : string

Decodes a URL-encoded string, decoding HTML entities repeatedly and fixing URL-encoded Windows-1252 characters.

```php
UTF8::urldecode('tes%20öäü%20\u00edtest'); // 'tes öäü ítest'
```

[](/#utf8_decodestring-str--string) utf8_decode(string $str) : string

Decodes a UTF-8 string to ISO-8859-1.

```php
UTF8::encode('UTF-8', UTF8::utf8_decode('-ABC-中文空白-')); // '-ABC-????-'
```

[](/#utf8_encodestring-str--string) utf8_encode(string $str) : string

Encodes an ISO-8859-1 string to UTF-8.

```php
UTF8::utf8_decode(UTF8::utf8_encode('-ABC-中文空白-')); // '-ABC-中文空白-'
```

[](/#words_limitstring-str-int-words--100-string-straddon----string) words_limit(string $str, int $words = 100, string $strAddOn = '...') : string

Limit the number of words in a string.

```php
UTF8::words_limit('fòô bàř fòô', 2, ''); // 'fòô bàř'
```

[](/#wordwrapstring-str-int-width--75-string-break--n-bool-cut--false--string) wordwrap(string $str, int $width = 75, string $break = "\n", bool $cut = false) : string

Wraps a string to a given number of characters.

```php
UTF8::wordwrap('Iñtërnâtiônàlizætiøn', 2, "\n", true); // "Iñ\ntë\nrn\nât\niô\nnà\nli\nzæ\nti\nøn"
```