Sentence compression using NLP

Using machine translation, can I obtain a heavily compressed version of a sentence? For example, "I would really like a delicious cup of coffee" would become "I want coffee". Do any of the NLP engines provide such functionality?

I found some research papers that do paraphrase generation and sentence compression. But is there any library that has already implemented this?


This is what I found:

A modified implementation of the model described in Clarke and Lapata, 2008, "Global Inference for Sentence Compression: An Integer Linear Programming Approach".

Paper: https://www.jair.org/media/2433/live-2433-3731-jair.pdf

Source: https://github.com/cnap/sentence-compression (written in Java)

Input: At the camp, the rebel troops were welcomed with a banner that read "Welcome home".

Output: At the camp, the troops were welcomed.
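To give a feel for the idea behind that paper, here is a toy sketch of compression by word deletion under constraints. It is not the code from the repository above and it does not use an ILP solver: it simply tries every keep/drop combination by brute force. The class name `ToyCompressor`, the word scores, and the head indices are all made up for illustration; in the paper the objective comes from a language model and a significance score.

```java
final class ToyCompressor {
    /**
     * Picks the subset of words with the highest total score, subject to:
     *  - at most maxWords words are kept, and
     *  - a word may be kept only if its head word is also kept (head[i] == -1 means no head).
     * Brute force over all subsets, so only suitable for short sentences (fewer than 31 words).
     */
    static String compress(String[] words, double[] scores, int[] head, int maxWords) {
        int n = words.length;
        int bestMask = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int mask = 1; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) > maxWords) {
                continue; // length constraint violated
            }
            double total = 0.0;
            boolean feasible = true;
            for (int i = 0; i < n && feasible; i++) {
                if ((mask & (1 << i)) == 0) {
                    continue; // word i is dropped
                }
                total += scores[i];
                // Modifier-style constraint: a kept word needs its head to be kept too.
                if (head[i] >= 0 && (mask & (1 << head[i])) == 0) {
                    feasible = false;
                }
            }
            if (feasible && total > bestScore) {
                bestScore = total;
                bestMask = mask;
            }
        }
        // Emit the kept words in their original order.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if ((bestMask & (1 << i)) != 0) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(words[i]);
            }
        }
        return out.toString();
    }
}
```

With the hand-picked inputs `{"the", "rebel", "troops", "were", "welcomed", "warmly"}`, scores `{0.1, 0.2, 1.0, 0.5, 1.0, 0.4}`, heads `{2, 2, 4, 4, -1, 4}` and a limit of 3 words, this keeps "troops were welcomed". A real system would replace the brute-force loop with an ILP solver and learn or compute the scores instead of hard-coding them.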

Update: Sequence-to-sequence model with attention for text summarization.

https://github.com/tensorflow/models/tree/master/textsum

https://arxiv.org/abs/1509.00685


To get started, try the Watson NaturalLanguageUnderstanding / Alchemy libraries. Using them, I was able to extract the important keywords from my statements, for example:

Input: Hey! I am having problems with my laptop screen

Output: laptop screen issue hardware.

Beyond rephrasing, NLU can return the following details for an input statement such as the one above:

Language, such as "en"; Entities; Concepts; Keywords, such as "laptop screen" and "problems", with details such as relevance, text, keyword emotion, and sentiment; Categories, with details such as labels and relevance score; Semantic roles, with details such as the sentence and its subject, action, and object.

Along with this, you can use the Tone Analyzer to get the prominent tone of the statement, such as fear, anger, happiness, disgust, etc.

Below is sample code for the Watson libraries. Note: the Watson libraries are not free, but they come with a one-month trial, so you can start with this and then, once you have a grasp of the concepts, switch to open-source libraries and look for similar functions there.

    NaturalLanguageUnderstanding service = new NaturalLanguageUnderstanding(
            NaturalLanguageUnderstanding.VERSION_DATE_2017_02_27,
            WatsonConfiguration.getAlchemyUserName(),
            WatsonConfiguration.getAlchemyPassword());

    // ConceptsOptions
    ConceptsOptions conceptOptions = new ConceptsOptions.Builder()
            .limit(10)
            .build();

    // CategoriesOptions
    CategoriesOptions categoriesOptions = new CategoriesOptions();

    // SemanticRolesOptions
    SemanticRolesOptions semanticRoleOptions = new SemanticRolesOptions.Builder()
            .entities(true)
            .keywords(true)
            .limit(10)
            .build();

    EntitiesOptions entitiesOptions = new EntitiesOptions.Builder()
            .emotion(true)
            .sentiment(true)
            .limit(10)
            .build();

    KeywordsOptions keywordsOptions = new KeywordsOptions.Builder()
            .emotion(true)
            .sentiment(true)
            .limit(10)
            .build();

    Features features = new Features.Builder()
            .entities(entitiesOptions)
            .keywords(keywordsOptions)
            .concepts(conceptOptions)
            .categories(categoriesOptions)
            .semanticRoles(semanticRoleOptions)
            .build();

    AnalyzeOptions parameters = new AnalyzeOptions.Builder()
            .text(inputText)
            .features(features)
            .build();

    AnalysisResults response = service
            .analyze(parameters)
            .execute();

    System.out.println(response);


You can use a combination of "stop word removal" and "stemming and lemmatization". Stemming and lemmatization is a process that reduces every word in the text to its base root; you can find the full explanation here. I am using the Porter stemmer, look it up on Google. After stemming and lemmatization, removing the stop words is very easy. Here is my stop word removal method:

public static String[] stopwords ={"a", "about", "above", "across", "after", "afterwards", "again", "against", "all", "almost", "alone", "along", "already", "also","although","always","am","among", "amongst", "amoungst", "amount", "an", "and", "another", "any","anyhow","anyone","anything","anyway", "anywhere", "are", "around", "as", "at", "back","be","became", "because","become","becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", "between", "beyond", "bill", "both", "bottom","but", "by", "call", "can", "cannot", "cant", "co", "con", "could", "couldnt", "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven","else", "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", "fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", "front", "full", "further", "get", "give", "go", "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his", "how", "however", "hundred", "ie", "if", "in", "inc", "indeed", "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must", "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own","part", "per", "perhaps", "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she", "should", "show", "side", "since", "sincere", "six", "sixty", "so", 
"some", "somehow", "someone", "something", "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", "these", "they", "thickv", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever", "where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet", "you", "your", "yours", "yourself", "yourselves","1","2","3","4","5","6","7","8","9","10","1.","2.","3.","4.","5.","6.","11", "7.","8.","9.","12","13","14","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z", "terms","CONDITIONS","conditions","values","interested.","care","sure","!","@","#","$","%","^","&","*","(",")","{","}","[","]",":",";",",","<",">","/","?","_","-","+","=", "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z", "contact","grounds","buyers","tried","said,","plan","value","principle.","forces","sent:","is,","was","like", "discussion","tmus","diffrent.","layout","area.","thanks","thankyou","hello","bye","rise","fell","fall","psqft.","http://","km","miles"};

In my project I used a paragraph as the text input:

    // Requires: java.io.IOException, java.util.Map, java.util.Scanner, java.util.TreeMap
    public static String removeStopWords(String paragraph) throws IOException {
        Scanner scanner = new Scanner(paragraph);
        StringBuilder newText = new StringBuilder();
        // Word frequencies are counted here but are not returned by this method.
        Map<String, Integer> frequencies = new TreeMap<>();
        while (scanner.hasNext()) {
            String word = scanner.next().toLowerCase();
            // Check the lowercased token against the stopwords array.
            boolean isStopWord = false;
            for (String stopword : stopwords) {
                if (word.equals(stopword)) {
                    isStopWord = true;
                    break;
                }
            }
            if (!isStopWord) {
                newText.append(word).append(" ");
            }
            if (word.length() > 0) {
                frequencies.merge(word, 1, Integer::sum);
            }
        }
        return newText.toString();
    }

I used the Stanford NLP library, which you can download from here if you wish. I hope this helps in some way.
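For comparison, the same filtering idea can be sketched with a set lookup instead of a loop over an array. This is only a minimal illustration: the class name `StopWordFilter` and the short stop word list are assumptions made for the example, and punctuation handling is deliberately simple.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

final class StopWordFilter {
    // Tiny illustrative list (an assumption); a real list would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "i", "a", "an", "the", "to", "of", "would", "really", "like", "have", "am", "with", "my"));

    /** Lowercases the text, strips punctuation at token edges, and drops stop words. */
    static String removeStopWords(String text) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Strip leading/trailing punctuation so "coffee." is treated as "coffee".
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept.toString();
    }
}
```

Calling `StopWordFilter.removeStopWords("I would really like a delicious cup of coffee.")` returns `delicious cup coffee`. Note that stop word removal only deletes words; it cannot rewrite "would really like" into "want", so it gets you keywords rather than a grammatical compressed sentence.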


If your intention is to make your sentences brief without losing their main idea, you can do that by simply extracting the subject-predicate-object triplet.

As for tools/engines, I recommend Stanford NLP. Its dependency parser output already provides the subject and the object (if there is one). But you still need to do some tweaking to get the desired result.
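As a rough sketch of the triplet idea, the snippet below works on a plain list of typed dependencies rather than calling the parser itself. The `Dep` class and `TripleExtractor` are hypothetical helpers for this example, not Stanford NLP classes, and the edges in the usage note are written by hand; `nsubj` and `dobj` are the Stanford typed dependency labels for nominal subject and direct object.

```java
import java.util.List;
import java.util.Optional;

final class TripleExtractor {
    /** One typed dependency, relation(governor, dependent). Hypothetical type for illustration. */
    static final class Dep {
        final String relation;
        final String governor;
        final String dependent;

        Dep(String relation, String governor, String dependent) {
            this.relation = relation;
            this.governor = governor;
            this.dependent = dependent;
        }
    }

    /** Returns "subject verb object" for the first verb that has both an nsubj and a dobj. */
    static Optional<String> extract(List<Dep> deps) {
        for (Dep subj : deps) {
            if (!subj.relation.equals("nsubj")) {
                continue;
            }
            String verb = subj.governor;
            // Look for a direct object attached to the same verb.
            for (Dep obj : deps) {
                if (obj.relation.equals("dobj") && obj.governor.equals(verb)) {
                    return Optional.of(subj.dependent + " " + verb + " " + obj.dependent);
                }
            }
        }
        return Optional.empty();
    }
}
```

For "The rebel troops carried a banner", hand-written edges such as `nsubj(carried, troops)` and `dobj(carried, banner)` give `troops carried banner`. Sentences where the object hangs off a different verb than the subject (for example through an `xcomp` chain, as in "I would really like to have coffee") return empty here, which is the kind of tweaking mentioned above.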

You can download Stanford NLP and learn from the sample usage here

I found a paper related to your question. Take a look at "Text Simplification using Typed Dependencies: A Comparison of the Robustness of Different Generation Strategies".