PHP Classes

Sweeper: Clean HTML to remove unwanted tags and attributes

Ratings: Not yet rated by the users
Unique User Downloads: Total: 93, This week: 1
Download Rankings: All time: 9,877, This week: 560 (up)
Version: sweeper 2.6
License: Freeware
PHP version: 5
Categories: HTML, PHP 5, Parsers
Description 

Author

This package can clean HTML to remove unwanted tags and attributes.

It is based on Mihai Sucan's ReTidy package and it uses regular expressions, DOM and XPath to find and remove the unwanted HTML code.

The package can also reformat HTML tables to improve accessibility, automatically generate a table of contents, and restructure content.
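
To make the DOM and XPath approach described above more concrete, here is a minimal sketch of removing unwanted tags and attributes. It is not taken from Sweeper or ReTidy: the function name `strip_tags_and_attributes`, the `sweep-root` wrapper id, and the choice to keep the children of a removed tag are all assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of Sweeper): removes the listed tags while
// keeping their children, and removes the listed attributes everywhere.
function strip_tags_and_attributes(string $html, array $tags, array $attributes): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);
    // Wrap the fragment in a known element so it can be extracted afterwards;
    // the XML declaration hints that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="utf-8"?><div id="sweep-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    foreach ($tags as $tag) {
        // Work on an array copy so that removing nodes cannot disturb iteration.
        foreach (iterator_to_array($xpath->query('//' . $tag)) as $node) {
            // Move the children up in front of the unwanted element...
            while ($node->firstChild) {
                $node->parentNode->insertBefore($node->firstChild, $node);
            }
            // ...then drop the now-empty element itself.
            $node->parentNode->removeChild($node);
        }
    }

    foreach ($attributes as $attribute) {
        foreach (iterator_to_array($xpath->query('//@' . $attribute)) as $attr) {
            $attr->ownerElement->removeAttribute($attr->nodeName);
        }
    }

    // Serialize only the contents of the wrapper element.
    $root = $xpath->query('//div[@id="sweep-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

For example, `strip_tags_and_attributes($html, ['font'], ['style'])` would unwrap every `font` element and delete every `style` attribute in `$html`. Tag and attribute names are interpolated into the XPath expression directly, so this sketch assumes they come from trusted configuration rather than user input.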

Name: Jill Lingoff <contact>
Classes: 3 packages
Country: France
Age: ???
All time rank: 3950101 in France
Week rank: 416 (up), 17 in France (up)
Innovation award
Nominee: 1x

Example

<meta charset="utf-8" />
Run sweeper.php Script
<form method="POST" action="sweeper.php" style="margin-top:0;">
<br>
Profile: <br>
<select style="WIDTH: 350px;" name="profile">

<?php

// Build one <option> per profile script found in the "profiles" folder.
$directory = "profiles";
$handle = opendir($directory);

$profiles_array = array();
// readdir() returns false once every entry has been read.
while (($file = readdir($handle)) !== false) {
    if ($file != "." && $file != ".." && !is_dir($directory . '/' . $file)) {
        //print("<!--$file-->\r\n");
        // Keep the part of the file name before the first dot as the profile name.
        $profiles_array[] = substr($file, 0, strpos($file, "."));
    }
}
closedir($handle);
sort($profiles_array, SORT_NATURAL | SORT_FLAG_CASE); // for linux
foreach ($profiles_array as $profile) {
    print("<option value=\"" . $profile . "\">" . $profile . "</option>\r\n");
}

?>

</select><br><br>
<div id="EngDepDiv">
Path: <input type="text" name="acronym_path" size="70"> (in the abbr folder)<br>
</div>
<br>
<div style="float: left;">
English Template: <br>
<select style="WIDTH: 350px;" name="EngTemplate">
<option value=""></option>
<option value="none">none</option>
<?php

$directory = "Templates";

print_template_options($directory);

// Recursively print one <option> for every template file found under $source.
function print_template_options($source) {
    if (is_dir($source)) {
        $d = dir($source);
        while (FALSE !== ($entry = $d->read())) {
            if ($entry == '.' || $entry == '..') {
                continue;
            }
            $Entry = $source . '/' . $entry;
            if (is_dir($Entry)) {
                // Descend into subfolders, except a nested "Templates" folder.
                if ($entry != 'Templates') {
                    print_template_options($Entry);
                }
                continue;
            }
            // Only list files whose path contains a template-like extension.
            if (strpos($Entry, ".html") || strpos($Entry, ".htm") || strpos($Entry, ".asp") || strpos($Entry, ".xml")) {
                print("<option value=\"" . $Entry . "\">" . $Entry . "</option>\r\n");
            }
        }
        $d->close();
    }
    else {
        // $source is a single file rather than a folder, so list it directly.
        print("<option value=\"" . $source . "\">" . $source . "</option>\r\n");
    }
}

?>
</select>
</div>

<div style="float: left;margin-left: 10px;">
French Template: <br>
<select style="WIDTH: 350px;" name="FraTemplate">
<option value=""></option>
<option value="none">none</option>
<?php

// Reuse the function defined for the English template list.
$directory = "Templates";

print_template_options($directory);

?>
</select>
</div><br><br><br>

Source: <br><input type="text" name="source" value="not-swept" size="70"><br>
Target: <br><input type="text" name="target" value="swept" size="70"><br>

<br>
<input type="submit">
</form>


Details

sweeper

Sweeper is an HTML code cleaner based on Mihai Șucan's ReTidy. It is written in PHP and mostly uses regular expressions, DOM and XPath.

It also does some handy things, such as improving table accessibility, marking up abbreviations, generating an automatic table of contents, and structuring content.

See documentation.html for fuller information.
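
As an illustration of what a table accessibility pass can look like, the sketch below adds `scope` attributes to header cells using DOM and XPath. It is an assumption about one plausible fix, not Sweeper's actual logic (see documentation.html for that); the function name `add_th_scope` and the `sweep-root` wrapper id are hypothetical.

```php
<?php
// Hypothetical helper (not Sweeper's code): marks <th> cells in the first row
// of each table as column headers and <th> cells in later rows as row headers.
function add_th_scope(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="utf-8"?><div id="sweep-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    foreach ($xpath->query('//table') as $table) {
        $rowIndex = 0;
        // Simplification: .//tr also matches rows of nested tables.
        foreach ($xpath->query('.//tr', $table) as $row) {
            // Leave header cells alone if they already declare a scope.
            foreach ($xpath->query('./th[not(@scope)]', $row) as $th) {
                $th->setAttribute('scope', $rowIndex === 0 ? 'col' : 'row');
            }
            $rowIndex++;
        }
    }

    // Serialize only the contents of the wrapper element.
    $root = $xpath->query('//div[@id="sweep-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `add_th_scope($html)` on a fragment containing a simple data table returns the same markup with `scope="col"` or `scope="row"` added to its header cells. Tables with multi-level headers would need `id` and `headers` attributes instead, which this sketch does not attempt.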


Files
File                                Role     Description
abbr/ (2 files, 2 directories)
basic/ (5 files)
DTD/ (10 files)
feed_generator/ (5 files)
mappings/ (4 files)
profiles/ (36 files)
src/ (1 file)
Templates/ (4 files, 1 directory)
character_generator.php             Aux.     Auxiliary script
charsets.php                        Aux.     Auxiliary script
clean_dreamweaver_files.php         Example  Example script
documentation.html                  Doc.     Documentation
DTD.php                             Class    Class source
even_qs.php                         Class    Class source
filter_url_list.php                 Aux.     Auxiliary script
find_empty_ths.php                  Example  Example script
find_paragraphs_to_list.php         Example  Example script
flip_acronyms.php                   Aux.     Auxiliary script
getLanguage.php                     Aux.     Auxiliary script
get_all_folder_names.php            Example  Example script
get_recently_modified.php           Example  Example script
index.html                          Doc.     Documentation
OM.php                              Class    Class source
page_id_counter.txt                 Doc.     Documentation
paste_sweep.php                     Example  Example script
purge_old_abbr_and_acronyms.php     Example  Example script
readme.md                           Doc.     Documentation
recursive_list.php                  Example  Example script
redistribute_acronyms_files.php     Aux.     Auxiliary script
retidy.php                          Class    Class source
run_sweeper.php                     Example  Example script
sweeper.php                         Example  Example script
upperclass_spans.php                Aux.     Auxiliary script
WAMP_to_LAMP.php                    Aux.     Auxiliary script
wordtonumber.class.php              Class    Class source

Files / abbr
File                                Role     Description
eng/ (8 files)
fra/ (6 files)
abbr_ignore.txt                     Doc.     Documentation
getOrgAcro.php                      Aux.     Auxiliary script

Files / abbr / eng
File                                Role     Description
abbr-parts-ignore.txt               Doc.     Documentation
abbr_ignore.txt                     Doc.     Documentation
new-sample-document-abbr.txt        Doc.     Documentation
sample-document-abbr.txt            Doc.     Documentation
stop_words.txt                      Doc.     Documentation
stop_words_small-2011-08-12.txt     Doc.     Documentation
stop_words_small.txt                Doc.     Documentation
words.txt                           Doc.     Documentation

Files / abbr / fra
File                                Role     Description
abbr-parts-ignore.txt               Doc.     Documentation
abbr_ignore.txt                     Doc.     Documentation
mots.txt                            Doc.     Documentation
new-sample-document-abbr.txt        Doc.     Documentation
sample-document-abbr.txt            Doc.     Documentation
stop_words.txt                      Doc.     Documentation

Files / basic
File                                Role     Description
language.php                        Aux.     Auxiliary script
physunits.php                       Aux.     Auxiliary script
typical-mod.php                     Aux.     Auxiliary script
typical-rxp-mod.php                 Aux.     Auxiliary script
wingding.php                        Aux.     Auxiliary script

Files / DTD
File                                Role     Description
html5.dtd                           Data     Auxiliary data
loose.dtd                           Data     Auxiliary data
many_entities.dtd                   Data     Auxiliary data
xhtml-lat1.ent                      Data     Auxiliary data
xhtml-special-for-acronyms.ent      Data     Auxiliary data
xhtml-special.ent                   Data     Auxiliary data
xhtml-symbol.ent                    Data     Auxiliary data
xhtml1-frameset.dtd                 Data     Auxiliary data
xhtml1-strict.dtd                   Data     Auxiliary data
xhtml1-transitional.dtd             Data     Auxiliary data

Files / feed_generator
File                                Role     Description
example_minimum.php                 Example  Example script
example_rss1.php                    Example  Example script
example_rss2.php                    Example  Example script
FeedItem.php                        Class    Class source
FeedWriter.php                      Class    Class source

Files / mappings
File                                Role     Description
CLF2.php                            Aux.     Auxiliary script
dekern.php                          Aux.     Auxiliary script
latin1.php                          Aux.     Auxiliary script
layouttable.php                     Aux.     Auxiliary script

Files / profiles
File                                Role     Description
abbr.php                            Aux.     Auxiliary script
add_BOM.php                         Aux.     Auxiliary script
arbitrary_sweep.php                 Aux.     Auxiliary script
basic.php                           Aux.     Auxiliary script
classes_to_styles.php               Aux.     Auxiliary script
clean_CSS.php                       Aux.     Auxiliary script
clean_excel.php                     Aux.     Auxiliary script
clean_feeds.php                     Aux.     Auxiliary script
clean_indesign.php                  Aux.     Auxiliary script
clean_openoffice.php                Aux.     Auxiliary script
clean_PDF.php                       Aux.     Auxiliary script
clean_word.php                      Aux.     Auxiliary script
convert_to_iso_8859_1.php           Aux.     Auxiliary script
convert_to_utf8.php                 Aux.     Auxiliary script
decode_character_entities.php       Aux.     Auxiliary script
definition_listify.php              Aux.     Auxiliary script
dekern.php                          Aux.     Auxiliary script
dom_table_accessibility.php         Aux.     Auxiliary script
encode_character_entities.php       Aux.     Auxiliary script
find_abbr.php                       Aux.     Auxiliary script
flip_lists.php                      Aux.     Auxiliary script
flip_tables.php                     Aux.     Auxiliary script
html_to_word.php                    Aux.     Auxiliary script
link_titles.php                     Aux.     Auxiliary script
ol_start.php                        Aux.     Auxiliary script
quality_assurance.php               Aux.     Auxiliary script
quotation.php                       Aux.     Auxiliary script
remove_embedded_stylesheets.php     Aux.     Auxiliary script
shiftFootnotesDown.php              Aux.     Auxiliary script
shiftFootnotesUp.php                Aux.     Auxiliary script
shiftHeadingsDown.php               Aux.     Auxiliary script
shiftHeadingsUp.php                 Aux.     Auxiliary script
structure.php                       Aux.     Auxiliary script
styles_to_classes_full.php          Aux.     Auxiliary script
templateCode.php                    Aux.     Auxiliary script
tidy.php                            Aux.     Auxiliary script

Files / src
File                                Role     Description
retidy.php                          Class    Class source

Files / Templates
File                                Role     Description
css/ (12 files)
test-templater.html                 Doc.     Documentation
XHTML-blank-css-error.html          Doc.     Documentation
XHTML-blank-css.html                Doc.     Documentation
XHTML-blank.html                    Doc.     Documentation

Files / Templates / css
File                                Role     Description
2col.css                            Data     Auxiliary data
base-institution.css                Data     Auxiliary data
base.css                            Data     Auxiliary data
base2.css                           Data     Auxiliary data
cust-inac.css                       Data     Auxiliary data
frm.css                             Data     Auxiliary data
institution.css                     Data     Auxiliary data
jstarget-eng.js                     Data     Auxiliary data
jstarget-fra.js                     Data     Auxiliary data
nwt-inac-20081111.css               Data     Auxiliary data
pf-if.css                           Data     Auxiliary data
rgn-rht_mnu.css                     Data     Auxiliary data
