Version | License | PHP version | Categories |
---|---|---|---|
sweeper 2.6 | Freeware | 5 | HTML, PHP 5, Parsers |
Description |
This package can clean HTML to remove unwanted tags and attributes. |
Sweeper is an HTML code cleaner based on Mihai Șucan's ReTidy. It is written in PHP and mostly uses regular expressions, DOM and XPath.
It handles tasks such as table accessibility, abbreviations, automatic table of contents generation and content structuring.
See documentation.html for fuller information.
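To illustrate the kind of DOM and XPath cleanup described above, here is a minimal sketch. It is not Sweeper's own code or API: the function name `stripTagsAndAttributes` and its parameters are hypothetical, and it only assumes PHP's bundled DOM extension.

```php
<?php
// Hypothetical helper for illustration only; not part of Sweeper or ReTidy.
// Unwraps the listed tags (keeping their contents) and removes the listed
// attributes from an HTML fragment.
function stripTagsAndAttributes(string $html, array $dropTags, array $dropAttrs): string
{
    $doc = new DOMDocument();

    // Real-world markup is often invalid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints UTF-8 to libxml; the wrapper <div> gives us
    // a single known root to read the cleaned fragment back from.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    foreach ($dropTags as $tag) {
        // Copy the node list to an array because the tree is modified below.
        foreach (iterator_to_array($xpath->query('.//' . $tag, $root)) as $node) {
            // Move each child in front of the unwanted element, then drop it.
            while ($node->firstChild) {
                $node->parentNode->insertBefore($node->firstChild, $node);
            }
            $node->parentNode->removeChild($node);
        }
    }

    foreach ($dropAttrs as $attr) {
        foreach (iterator_to_array($xpath->query('.//@' . $attr, $root)) as $attrNode) {
            $attrNode->ownerElement->removeAttribute($attr);
        }
    }

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Example call: drop <font> wrappers and presentational attributes.
echo stripTagsAndAttributes(
    '<p style="color:red" class="MsoNormal">Hello <font face="Arial">world</font></p>',
    ['font'],
    ['style', 'class']
);
```

Unwrapping rather than deleting the unwanted elements keeps their text in place, which is usually what a cleanup of word-processor output needs. The package's own profiles may well work differently.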
Files |
File | Role | Description |
---|---|---|
abbr (2 files, 2 directories) | | |
basic (5 files) | | |
DTD (10 files) | | |
feed_generator (5 files) | | |
mappings (4 files) | | |
profiles (36 files) | | |
src (1 file) | | |
Templates (4 files, 1 directory) | | |
character_generator.php | Aux. | Auxiliary script |
charsets.php | Aux. | Auxiliary script |
clean_dreamweaver_files.php | Example | Example script |
documentation.html | Doc. | Documentation |
DTD.php | Class | Class source |
even_qs.php | Class | Class source |
filter_url_list.php | Aux. | Auxiliary script |
find_empty_ths.php | Example | Example script |
find_paragraphs_to_list.php | Example | Example script |
flip_acronyms.php | Aux. | Auxiliary script |
getLanguage.php | Aux. | Auxiliary script |
get_all_folder_names.php | Example | Example script |
get_recently_modified.php | Example | Example script |
index.html | Doc. | Documentation |
OM.php | Class | Class source |
page_id_counter.txt | Doc. | Documentation |
paste_sweep.php | Example | Example script |
purge_old_abbr_and_acronyms.php | Example | Example script |
readme.md | Doc. | Documentation |
recursive_list.php | Example | Example script |
redistribute_acronyms_files.php | Aux. | Auxiliary script |
retidy.php | Class | Class source |
run_sweeper.php | Example | Example script |
sweeper.php | Example | Example script |
upperclass_spans.php | Aux. | Auxiliary script |
WAMP_to_LAMP.php | Aux. | Auxiliary script |
wordtonumber.class.php | Class | Class source |
Files / abbr |
File | Role | Description |
---|---|---|
eng (8 files) | | |
fra (6 files) | | |
abbr_ignore.txt | Doc. | Documentation |
getOrgAcro.php | Aux. | Auxiliary script |
Files / abbr / eng |
File | Role | Description |
---|---|---|
abbr-parts-ignore.txt | Doc. | Documentation |
abbr_ignore.txt | Doc. | Documentation |
new-sample-document-abbr.txt | Doc. | Documentation |
sample-document-abbr.txt | Doc. | Documentation |
stop_words.txt | Doc. | Documentation |
stop_words_small-2011-08-12.txt | Doc. | Documentation |
stop_words_small.txt | Doc. | Documentation |
words.txt | Doc. | Documentation |
Files / abbr / fra |
File | Role | Description |
---|---|---|
abbr-parts-ignore.txt | Doc. | Documentation |
abbr_ignore.txt | Doc. | Documentation |
mots.txt | Doc. | Documentation |
new-sample-document-abbr.txt | Doc. | Documentation |
sample-document-abbr.txt | Doc. | Documentation |
stop_words.txt | Doc. | Documentation |
Files / basic |
File | Role | Description |
---|---|---|
language.php | Aux. | Auxiliary script |
physunits.php | Aux. | Auxiliary script |
typical-mod.php | Aux. | Auxiliary script |
typical-rxp-mod.php | Aux. | Auxiliary script |
wingding.php | Aux. | Auxiliary script |
Files / DTD |
File | Role | Description |
---|---|---|
html5.dtd | Data | Auxiliary data |
loose.dtd | Data | Auxiliary data |
many_entities.dtd | Data | Auxiliary data |
xhtml-lat1.ent | Data | Auxiliary data |
xhtml-special-for-acronyms.ent | Data | Auxiliary data |
xhtml-special.ent | Data | Auxiliary data |
xhtml-symbol.ent | Data | Auxiliary data |
xhtml1-frameset.dtd | Data | Auxiliary data |
xhtml1-strict.dtd | Data | Auxiliary data |
xhtml1-transitional.dtd | Data | Auxiliary data |
Files / feed_generator |
File | Role | Description |
---|---|---|
example_minimum.php | Example | Example script |
example_rss1.php | Example | Example script |
example_rss2.php | Example | Example script |
FeedItem.php | Class | Class source |
FeedWriter.php | Class | Class source |
Files / mappings |
File | Role | Description |
---|---|---|
CLF2.php | Aux. | Auxiliary script |
dekern.php | Aux. | Auxiliary script |
latin1.php | Aux. | Auxiliary script |
layouttable.php | Aux. | Auxiliary script |
Files / profiles |
File | Role | Description |
---|---|---|
abbr.php | Aux. | Auxiliary script |
add_BOM.php | Aux. | Auxiliary script |
arbitrary_sweep.php | Aux. | Auxiliary script |
basic.php | Aux. | Auxiliary script |
classes_to_styles.php | Aux. | Auxiliary script |
clean_CSS.php | Aux. | Auxiliary script |
clean_excel.php | Aux. | Auxiliary script |
clean_feeds.php | Aux. | Auxiliary script |
clean_indesign.php | Aux. | Auxiliary script |
clean_openoffice.php | Aux. | Auxiliary script |
clean_PDF.php | Aux. | Auxiliary script |
clean_word.php | Aux. | Auxiliary script |
convert_to_iso_8859_1.php | Aux. | Auxiliary script |
convert_to_utf8.php | Aux. | Auxiliary script |
decode_character_entities.php | Aux. | Auxiliary script |
definition_listify.php | Aux. | Auxiliary script |
dekern.php | Aux. | Auxiliary script |
dom_table_accessibility.php | Aux. | Auxiliary script |
encode_character_entities.php | Aux. | Auxiliary script |
find_abbr.php | Aux. | Auxiliary script |
flip_lists.php | Aux. | Auxiliary script |
flip_tables.php | Aux. | Auxiliary script |
html_to_word.php | Aux. | Auxiliary script |
link_titles.php | Aux. | Auxiliary script |
ol_start.php | Aux. | Auxiliary script |
quality_assurance.php | Aux. | Auxiliary script |
quotation.php | Aux. | Auxiliary script |
remove_embedded_stylesheets.php | Aux. | Auxiliary script |
shiftFootnotesDown.php | Aux. | Auxiliary script |
shiftFootnotesUp.php | Aux. | Auxiliary script |
shiftHeadingsDown.php | Aux. | Auxiliary script |
shiftHeadingsUp.php | Aux. | Auxiliary script |
structure.php | Aux. | Auxiliary script |
styles_to_classes_full.php | Aux. | Auxiliary script |
templateCode.php | Aux. | Auxiliary script |
tidy.php | Aux. | Auxiliary script |
Files / Templates |
File | Role | Description |
---|---|---|
css (12 files) | | |
test-templater.html | Doc. | Documentation |
XHTML-blank-css-error.html | Doc. | Documentation |
XHTML-blank-css.html | Doc. | Documentation |
XHTML-blank.html | Doc. | Documentation |
Files / Templates / css |
File | Role | Description |
---|---|---|
2col.css | Data | Auxiliary data |
base-institution.css | Data | Auxiliary data |
base.css | Data | Auxiliary data |
base2.css | Data | Auxiliary data |
cust-inac.css | Data | Auxiliary data |
frm.css | Data | Auxiliary data |
institution.css | Data | Auxiliary data |
jstarget-eng.js | Data | Auxiliary data |
jstarget-fra.js | Data | Auxiliary data |
nwt-inac-20081111.css | Data | Auxiliary data |
pf-if.css | Data | Auxiliary data |
rgn-rht_mnu.css | Data | Auxiliary data |