nlp - How to extract relations from text in NLTK

Hi, I'm trying to extract relations from a string of text, based on the second-to-last example here: https://web.archive.org/web/20120907184244/http://nltk.googlecode.com/svn/trunk/doc/howto/relextract.html

From a string like "Michael James editor of Publishers Weekly", my desired result is output like:

    [PER: 'Michael James'] ', editor of' [ORG: 'Publishers Weekly']

What is the best way to do this? What format does extract_rels expect, and how do I format my input to meet that requirement?

I tried to do it myself, but it didn't work. Here is the code I adapted from the book. Nothing gets printed. What am I doing wrong?

    class doc():
        pass

    doc.headline = ['this is expected by nltk.sem.extract_rels but not used in this script']

    def findrelations(text):
        roles = """
        (.*(
        analyst|
        editor|
        librarian).*)|
        researcher|
        spokes(wo)?man|
        writer|
        ,\sof\sthe?\s*  # "X, of (the) Y"
        """
        ROLES = re.compile(roles, re.VERBOSE)
        tokenizedsentences = nltk.sent_tokenize(text)
        for sentence in tokenizedsentences:
            taggedwords = nltk.pos_tag(nltk.word_tokenize(sentence))
            doc.text = nltk.batch_ne_chunk(taggedwords)
            print doc.text
            for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ieer', pattern=ROLES):
                print relextract.show_raw_rtuple(rel)  # doctest: +ELLIPSIS

    text = "Michael James editor of Publishers Weekly"
    findrelations(text)

Here is some code based on yours (with just a few adjustments) that works fine ;)

    import re

    import nltk
    from nltk.chunk import ne_chunk_sents
    from nltk.sem import relextract


    def findrelations(text):
        # Role pattern; re.VERBOSE lets it span several lines and carry a comment.
        # A raw string keeps the \s escapes intact.
        roles = r"""
        (.*(
        analyst|
        editor|
        librarian).*)|
        researcher|
        spokes(wo)?man|
        writer|
        ,\sof\sthe?\s*  # "X, of (the) Y"
        """
        ROLES = re.compile(roles, re.VERBOSE)

        # Split into sentences, tokenize, POS-tag, then chunk named entities
        sentences = nltk.sent_tokenize(text)
        tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
        tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
        chunked_sentences = ne_chunk_sents(tagged_sentences)

        for doc in chunked_sentences:
            print(doc)
            for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ace', pattern=ROLES):
                # it is a tree, so you need to work on it to output what you want
                # (the original post called show_raw_rtuple; NLTK 3 names this helper rtuple)
                print(relextract.rtuple(rel))


    findrelations('Michael James editor of Publishers Weekly')

Output:

    (S (PERSON Michael/NNP) (PERSON James/NNP) editor/NN of/IN (ORGANIZATION Publishers/NNS Weekly/NNP))
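
As a rough illustration of the idea behind this kind of relation extraction (not NLTK's actual implementation), the sketch below pairs each named entity with the next one and keeps the pair when the words in between match the role pattern. The flat `CHUNKS` list and the `find_relations` and `show` helpers are hypothetical names made up for this example; only the regular expression comes from the post. It uses the standard library alone, so it runs without any NLTK models.

```python
import re

# Same role pattern as in the post; re.VERBOSE ignores layout whitespace and the comment.
ROLES = re.compile(r"""
    (.*(analyst|editor|librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*   # "X, of (the) Y"
""", re.VERBOSE)

# Hypothetical flat stand-in for a chunked sentence: (label, text) pairs,
# where label is None for words that are not part of a named entity.
CHUNKS = [
    ("PER", "Michael James"),
    (None, "editor"),
    (None, "of"),
    ("ORG", "Publishers Weekly"),
]


def find_relations(chunks, subj_label, obj_label, pattern):
    """Pair each entity with the next entity; keep pairs whose filler matches."""
    relations = []
    subject = None  # most recent entity seen so far
    filler = []     # plain words seen since that entity
    for label, text in chunks:
        if label is None:
            filler.append(text)
            continue
        if subject is not None:
            filler_text = " ".join(filler)
            if (subject[0] == subj_label and label == obj_label
                    and pattern.match(filler_text)):
                relations.append((subject, filler_text, (label, text)))
        # The current entity becomes the subject of the next candidate pair
        subject, filler = (label, text), []
    return relations


def show(relation):
    """Format a relation roughly like the output asked for in the question."""
    (s_label, s_text), filler, (o_label, o_text) = relation
    return "[%s: %r] %r [%s: %r]" % (s_label, s_text, filler, o_label, o_text)


results = [show(r) for r in find_relations(CHUNKS, "PER", "ORG", ROLES)]
for line in results:
    print(line)  # [PER: 'Michael James'] 'editor of' [ORG: 'Publishers Weekly']
```

Note that in the tree printed above, Michael and James are two separate PERSON chunks. Fed input shaped like that, this sketch would pair only James with the organization, so adjacent PERSON chunks would need to be merged first to get the full name.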