
How do I extract the text content of a Word document with PHP?




Try creating your reader first:

$source = "word.doc";

// create your reader object
$phpWordReader = \PhpOffice\PhpWord\IOFactory::createReader('MsDoc');

// read source
if ($phpWordReader->canRead($source)) {
    $phpWord = $phpWordReader->load($source);
    ... // rest of your code
}

This answer is based on this example and on the API documentation.
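Once the document is loaded, the text still has to be collected from its elements. The following is a minimal sketch, not part of PHPWord: collectText() is a hypothetical helper that uses duck typing (method_exists) rather than exact class names, assuming that containers expose getElements() and text elements expose getText(). It only gathers what the reader actually loaded, so it does not by itself recover text that the reader dropped.

```php
<?php
// Hypothetical helper (not part of PHPWord): walks an element tree by duck typing.
function collectText(iterable $elements): string
{
    $text = '';
    foreach ($elements as $element) {
        if (substr(get_class($element), -9) === 'TextBreak') {
            // Any class whose name ends in "TextBreak" is treated as a line break.
            $text .= "\n";
        } elseif (method_exists($element, 'getElements')) {
            // Container (for example a text run): recurse into its children.
            $text .= collectText($element->getElements());
        } elseif (method_exists($element, 'getText')) {
            // Leaf element carrying text; non-string values are ignored.
            $value = $element->getText();
            if (is_string($value)) {
                $text .= $value;
            }
        }
        // Anything else is skipped instead of raising an exception.
    }
    return $text;
}
```

To use it, call collectText($section->getElements()) for each section returned by $phpWord->getSections() and concatenate the results.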

I want to extract the text content of a Word document with PHP.

I created a new Word document in Microsoft Word for Mac 2011. Edit: I have also tried creating the same document in Microsoft Word on Windows 7.

The content of the document is

The quick brown fox jumps over the lazy dog

I saved it to disk as a Word 97-2004 document (.doc).

I am using phpoffice/phpword and this code to extract the text:

<?php

$source = "word.doc";
$phpWord = \PhpOffice\PhpWord\IOFactory::load($source, 'MsDoc');

$text = '';

$sections = $phpWord->getSections();

foreach ($sections as $s) {
    $els = $s->getElements();
    foreach ($els as $e) {
        if (get_class($e) === 'PhpOffice\PhpWord\Element\Text') {
            $text .= $e->getText();
        } elseif (get_class($e) === 'PhpOffice\PhpWord\Section\TextBreak') {
            $text .= " \n";
        } else {
            throw new Exception('Unknown class type ' . get_class($e));
        }
    }
}

print $text;

The output of this code is only part of the text:

The quick brown fox j

Is there a problem with the code, or is this some kind of compatibility issue?

Edit:

If I add a var_dump($els); before foreach ($els as $e) {, the output is this:

array(1) { [0]=> object(PhpOffice/PhpWord/Element/Text)#1265 (14) { ["text":protected]=> string(21) "The quick brown fox j" ["fontStyle":protected]=> object(PhpOffice/PhpWord/Style/Font)#1267 (25) { ["aliases":protected]=> array(1) { ["line-height"]=> string(10) "lineHeight" } ["type":"PhpOffice/PhpWord/Style/Font":private]=> string(4) "text" ["name":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["hint":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["size":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["color":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["bold":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["italic":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["underline":"PhpOffice/PhpWord/Style/Font":private]=> string(4) "none" ["superScript":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["subScript":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["strikethrough":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["doubleStrikethrough":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["smallCaps":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["allCaps":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["fgColor":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["scale":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["spacing":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["kerning":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["paragraph":"PhpOffice/PhpWord/Style/Font":private]=> object(PhpOffice/PhpWord/Style/Paragraph)#1266 (26) { ["aliases":protected]=> array(1) { ["line-height"]=> string(10) "lineHeight" } ["basedOn":"PhpOffice/PhpWord/Style/Paragraph":private]=> string(6) "Normal" ["next":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["alignment":"PhpOffice/PhpWord/Style/Paragraph":private]=> string(0) "" ["indentation":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["spacing":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["lineHeight":"PhpOffice/PhpWord/Style/Paragraph":private]=> 
NULL ["widowControl":"PhpOffice/PhpWord/Style/Paragraph":private]=> bool(true) ["keepNext":"PhpOffice/PhpWord/Style/Paragraph":private]=> bool(false) ["keepLines":"PhpOffice/PhpWord/Style/Paragraph":private]=> bool(false) ["pageBreakBefore":"PhpOffice/PhpWord/Style/Paragraph":private]=> bool(false) ["numStyle":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["numLevel":"PhpOffice/PhpWord/Style/Paragraph":private]=> int(0) ["tabs":"PhpOffice/PhpWord/Style/Paragraph":private]=> array(0) { } ["shading":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["borderTopSize":protected]=> NULL ["borderTopColor":protected]=> NULL ["borderLeftSize":protected]=> NULL ["borderLeftColor":protected]=> NULL ["borderRightSize":protected]=> NULL ["borderRightColor":protected]=> NULL ["borderBottomSize":protected]=> NULL ["borderBottomColor":protected]=> NULL ["styleName":protected]=> NULL ["index":protected]=> NULL ["isAuto":"PhpOffice/PhpWord/Style/AbstractStyle":private]=> bool(false) } ["shading":"PhpOffice/PhpWord/Style/Font":private]=> NULL ["rtl":"PhpOffice/PhpWord/Style/Font":private]=> bool(false) ["styleName":protected]=> NULL ["index":protected]=> NULL ["isAuto":"PhpOffice/PhpWord/Style/AbstractStyle":private]=> bool(false) } ["paragraphStyle":protected]=> object(PhpOffice/PhpWord/Style/Paragraph)#1266 (26) { ["aliases":protected]=> array(1) { ["line-height"]=> string(10) "lineHeight" } ["basedOn":"PhpOffice/PhpWord/Style/Paragraph":private]=> string(6) "Normal" ["next":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["alignment":"PhpOffice/PhpWord/Style/Paragraph":private]=> string(0) "" ["indentation":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["spacing":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["lineHeight":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["widowControl":"PhpOffice/PhpWord/Style/Paragraph":private]=> bool(true) ["keepNext":"PhpOffice/PhpWord/Style/Paragraph":private]=> bool(false) 
["keepLines":"PhpOffice/PhpWord/Style/Paragraph":private]=> bool(false) ["pageBreakBefore":"PhpOffice/PhpWord/Style/Paragraph":private]=> bool(false) ["numStyle":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["numLevel":"PhpOffice/PhpWord/Style/Paragraph":private]=> int(0) ["tabs":"PhpOffice/PhpWord/Style/Paragraph":private]=> array(0) { } ["shading":"PhpOffice/PhpWord/Style/Paragraph":private]=> NULL ["borderTopSize":protected]=> NULL ["borderTopColor":protected]=> NULL ["borderLeftSize":protected]=> NULL ["borderLeftColor":protected]=> NULL ["borderRightSize":protected]=> NULL ["borderRightColor":protected]=> NULL ["borderBottomSize":protected]=> NULL ["borderBottomColor":protected]=> NULL ["styleName":protected]=> NULL ["index":protected]=> NULL ["isAuto":"PhpOffice/PhpWord/Style/AbstractStyle":private]=> bool(false) } ["phpWord":protected]=> object(PhpOffice/PhpWord/PhpWord)#1247 (3) { ["sections":"PhpOffice/PhpWord/PhpWord":private]=> array(1) { [0]=> object(PhpOffice/PhpWord/Element/Section)#1261 (16) { ["container":protected]=> string(7) "Section" ["style":"PhpOffice/PhpWord/Element/Section":private]=> object(PhpOffice/PhpWord/Style/Section)#1262 (28) { ["orientation":"PhpOffice/PhpWord/Style/Section":private]=> string(8) "portrait" ["paper":"PhpOffice/PhpWord/Style/Section":private]=> object(PhpOffice/PhpWord/Style/Paper)#1263 (8) { ["sizes":"PhpOffice/PhpWord/Style/Paper":private]=> array(6) { ["A3"]=> array(3) { [0]=> int(297) [1]=> int(420) [2]=> string(2) "mm" } ["A4"]=> array(3) { [0]=> int(210) [1]=> int(297) [2]=> string(2) "mm" } ["A5"]=> array(3) { [0]=> int(148) [1]=> int(210) [2]=> string(2) "mm" } ["Folio"]=> array(3) { [0]=> float(8.5) [1]=> int(13) [2]=> string(2) "in" } ["Legal"]=> array(3) { [0]=> float(8.5) [1]=> int(14) [2]=> string(2) "in" } ["Letter"]=> array(3) { [0]=> float(8.5) [1]=> int(11) [2]=> string(2) "in" } } ["size":"PhpOffice/PhpWord/Style/Paper":private]=> string(2) "A4" 
["width":"PhpOffice/PhpWord/Style/Paper":private]=> int(11870) ["height":"PhpOffice/PhpWord/Style/Paper":private]=> int(16787) ["styleName":protected]=> NULL ["index":protected]=> NULL ["aliases":protected]=> array(0) { } ["isAuto":"PhpOffice/PhpWord/Style/AbstractStyle":private]=> bool(false) } ["pageSizeW":"PhpOffice/PhpWord/Style/Section":private]=> int(11906) ["pageSizeH":"PhpOffice/PhpWord/Style/Section":private]=> int(16838) ["marginTop":"PhpOffice/PhpWord/Style/Section":private]=> int(1417) ["marginLeft":"PhpOffice/PhpWord/Style/Section":private]=> int(1417) ["marginRight":"PhpOffice/PhpWord/Style/Section":private]=> int(1417) ["marginBottom":"PhpOffice/PhpWord/Style/Section":private]=> int(1417) ["gutter":"PhpOffice/PhpWord/Style/Section":private]=> int(0) ["headerHeight":"PhpOffice/PhpWord/Style/Section":private]=> int(720) ["footerHeight":"PhpOffice/PhpWord/Style/Section":private]=> int(720) ["pageNumberingStart":"PhpOffice/PhpWord/Style/Section":private]=> NULL ["colsNum":"PhpOffice/PhpWord/Style/Section":private]=> int(1) ["colsSpace":"PhpOffice/PhpWord/Style/Section":private]=> int(720) ["breakType":"PhpOffice/PhpWord/Style/Section":private]=> NULL ["lineNumbering":"PhpOffice/PhpWord/Style/Section":private]=> NULL ["borderTopSize":protected]=> NULL ["borderTopColor":protected]=> NULL ["borderLeftSize":protected]=> NULL ["borderLeftColor":protected]=> NULL ["borderRightSize":protected]=> NULL ["borderRightColor":protected]=> NULL ["borderBottomSize":protected]=> NULL ["borderBottomColor":protected]=> NULL ["styleName":protected]=> NULL ["index":protected]=> NULL ["aliases":protected]=> array(0) { } ["isAuto":"PhpOffice/PhpWord/Style/AbstractStyle":private]=> bool(false) } ["headers":"PhpOffice/PhpWord/Element/Section":private]=> array(0) { } ["footers":"PhpOffice/PhpWord/Element/Section":private]=> array(0) { } ["elements":protected]=> array(1) { [0]=> *RECURSION* } ["phpWord":protected]=> *RECURSION* ["sectionId":protected]=> int(1) 
["docPart":protected]=> string(7) "Section" ["docPartId":protected]=> int(1) ["elementIndex":protected]=> int(1) ["elementId":protected]=> NULL ["relationId":protected]=> NULL ["nestedLevel":"PhpOffice/PhpWord/Element/AbstractElement":private]=> int(0) ["parentContainer":"PhpOffice/PhpWord/Element/AbstractElement":private]=> NULL ["mediaRelation":protected]=> bool(false) ["collectionRelation":protected]=> bool(false) } } ["collections":"PhpOffice/PhpWord/PhpWord":private]=> array(5) { ["Bookmarks"]=> object(PhpOffice/PhpWord/Collection/Bookmarks)#1248 (1) { ["items":"PhpOffice/PhpWord/Collection/AbstractCollection":private]=> array(0) { } } ["Titles"]=> object(PhpOffice/PhpWord/Collection/Titles)#1249 (1) { ["items":"PhpOffice/PhpWord/Collection/AbstractCollection":private]=> array(0) { } } ["Footnotes"]=> object(PhpOffice/PhpWord/Collection/Footnotes)#1250 (1) { ["items":"PhpOffice/PhpWord/Collection/AbstractCollection":private]=> array(0) { } } ["Endnotes"]=> object(PhpOffice/PhpWord/Collection/Endnotes)#1251 (1) { ["items":"PhpOffice/PhpWord/Collection/AbstractCollection":private]=> array(0) { } } ["Charts"]=> object(PhpOffice/PhpWord/Collection/Charts)#1252 (1) { ["items":"PhpOffice/PhpWord/Collection/AbstractCollection":private]=> array(0) { } } } ["metadata":"PhpOffice/PhpWord/PhpWord":private]=> array(3) { ["DocInfo"]=> object(PhpOffice/PhpWord/Metadata/DocInfo)#1253 (12) { ["creator":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" ["lastModifiedBy":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" ["created":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> int(1483515248) ["modified":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> int(1483515248) ["title":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" ["description":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" ["subject":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" ["keywords":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" 
["category":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" ["company":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" ["manager":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> string(0) "" ["customProperties":"PhpOffice/PhpWord/Metadata/DocInfo":private]=> array(0) { } } ["Protection"]=> object(PhpOffice/PhpWord/Metadata/Protection)#1254 (1) { ["editing":"PhpOffice/PhpWord/Metadata/Protection":private]=> NULL } ["Compatibility"]=> object(PhpOffice/PhpWord/Metadata/Compatibility)#1255 (1) { ["ooxmlVersion":"PhpOffice/PhpWord/Metadata/Compatibility":private]=> int(12) } } } ["sectionId":protected]=> NULL ["docPart":protected]=> string(7) "Section" ["docPartId":protected]=> int(1) ["elementIndex":protected]=> int(1) ["elementId":protected]=> string(6) "5d531b" ["relationId":protected]=> NULL ["nestedLevel":"PhpOffice/PhpWord/Element/AbstractElement":private]=> int(0) ["parentContainer":"PhpOffice/PhpWord/Element/AbstractElement":private]=> string(7) "Section" ["mediaRelation":protected]=> bool(false) ["collectionRelation":protected]=> bool(false) } }


You can extract the text from a Word document using catdoc: http://www.wagner.pp.ru/~vitus/software/catdoc/

It can be installed on Ubuntu using

sudo apt-get install catdoc

Once catdoc is working on your system, you can call it from PHP using shell_exec():

<?php
$text = shell_exec('/(fullpath)/catdoc /(fullpath)/word.doc');
print $text;
?>

Be sure to replace (fullpath) with the actual path to catdoc and to your Word document.
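If the paths can contain spaces or come from user input, it may be safer to quote them before handing them to the shell. The following is a minimal sketch assuming a Unix-like shell; buildCatdocCommand() and extractDocText() are hypothetical helper names, not part of catdoc or PHP.

```php
<?php
// Hypothetical helper: builds the shell command with both paths safely quoted.
function buildCatdocCommand(string $catdocPath, string $docPath): string
{
    return escapeshellarg($catdocPath) . ' ' . escapeshellarg($docPath);
}

// Hypothetical helper: runs catdoc and returns its output, or null on failure.
function extractDocText(string $catdocPath, string $docPath): ?string
{
    if (!is_readable($docPath)) {
        // Do not start a shell at all when the document cannot be read.
        return null;
    }
    $output = shell_exec(buildCatdocCommand($catdocPath, $docPath));
    // shell_exec() does not return a string when the command fails or prints nothing.
    return is_string($output) ? $output : null;
}
```

A call would then look like extractDocText('/(fullpath)/catdoc', '/(fullpath)/word.doc'), with (fullpath) replaced as described above.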

EDIT ---- addition

If you can save your files as .docx instead of .doc, it is a bit easier. You can use unzip instead of catdoc.

Simply replace:

$text = shell_exec('/(fullpath)/catdoc /(fullpath)/word.doc');

with

$text = shell_exec("/(fullpath)/unzip -p /(fullpath)/word.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'");
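The same idea can also be sketched without a shell, assuming PHP's zip extension (ZipArchive) is available. docxXmlToText() and extractDocxText() are hypothetical helper names; the tag-stripping regular expression mirrors the first sed expression, and the newline after each paragraph is an added assumption so that paragraphs stay separated.

```php
<?php
// Hypothetical helper: turns the raw XML of word/document.xml into plain text.
function docxXmlToText(string $xml): string
{
    // Assumption: follow each closing paragraph tag with a newline.
    $xml = str_replace('</w:p>', "</w:p>\n", $xml);
    // Same idea as the sed expression: drop everything between < and >.
    $text = preg_replace('/<[^>]+>/', '', $xml);
    // Decode XML entities such as &amp; back into characters.
    return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}

// Hypothetical helper: reads word/document.xml from a .docx archive.
function extractDocxText(string $docxPath): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return null; // not a readable zip archive
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? null : docxXmlToText($xml);
}
```

Like the unzip and sed pipeline, this only reads the main document body, not headers, footers, or footnotes.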

You can use the same technique with most other command-line document-to-text converters. Simply replace the command in shell_exec() with one that works on your system. See "How to extract just plain text from .doc and .docx files? (Unix)" for other Unix/Linux alternatives.

For other PHP alternatives, take a look at "How to extract text from .doc, .docx, .xlsx, and .pptx files with PHP".