What is NLTK's POS tagger asking me to download?

I have just started using a part-of-speech tagger and I am running into a lot of problems.

I started POS tagging with the following:

import nltk
text = nltk.word_tokenize("We are going out.Just you and me.")

When I try to print the tags for text, the following happens:

print nltk.pos_tag(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:/Python26/lib/site-packages/nltk/tag/__init__.py", line 63, in pos_tag
    tagger = nltk.data.load(_POS_TAGGER)
  File "F:/Python26/lib/site-packages/nltk/data.py", line 594, in load
    resource_val = pickle.load(_open(resource_url))
  File "F:/Python26/lib/site-packages/nltk/data.py", line 673, in _open
    return find(path).open()
  File "F:/Python26/lib/site-packages/nltk/data.py", line 455, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not found.
  Please use the NLTK Downloader to obtain the resource: >>> nltk.download().
  Searched in:
    - 'C://Documents and Settings//Administrator/nltk_data'
    - 'C://nltk_data'
    - 'D://nltk_data'
    - 'E://nltk_data'
    - 'F://Python26//nltk_data'
    - 'F://Python26//lib//nltk_data'
    - 'C://Documents and Settings//Administrator//Application Data//nltk_data'

I used nltk.download() but it did not work.

For NLTK v3.2 and later, use:

>>> import nltk
>>> nltk.__version__
'3.2.1'
>>> nltk.download('averaged_perceptron_tagger')
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/alvas/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-date!
True

For NLTK versions that use the old MaxEnt model, i.e. v3.1 and earlier, use:

>>> import nltk
>>> nltk.download('maxent_treebank_pos_tagger')
[nltk_data] Downloading package maxent_treebank_pos_tagger to
[nltk_data]     /home/alvas/nltk_data...
[nltk_data]   Package maxent_treebank_pos_tagger is already up-to-date!
True

For more details on the change to the default pos_tag tagger, see https://github.com/nltk/nltk/pull/1143
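The version rule in this answer can be written as a small helper. This is only a sketch: the function name `tagger_package_for` is hypothetical, and the v3.2 cutoff is taken from this answer rather than checked against the NLTK changelog. It only covers the two cases described here; other releases may use different package names.

```python
import re


def tagger_package_for(version):
    """Return the tagger package this answer recommends for an NLTK version string."""
    # Read the leading "major.minor" digits; anything after them is ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption from this answer: v3.2 and later use the averaged perceptron
    # tagger, v3.1 and earlier use the old MaxEnt treebank tagger.
    if (major, minor) >= (3, 2):
        return "averaged_perceptron_tagger"
    return "maxent_treebank_pos_tagger"
```

With NLTK installed, the result could be passed straight to the downloader, e.g. nltk.download(tagger_package_for(nltk.__version__)).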

When you type nltk.download() in Python, the NLTK Downloader interface opens automatically.
Click Models and choose maxent_treebank_pos_. It is installed automatically.

import nltk
text = nltk.word_tokenize("We are going out.Just you and me.")
print(nltk.pos_tag(text))

Output:

[('We', 'PRP'), ('are', 'VBP'), ('going', 'VBG'), ('out.Just', 'JJ'), ('you', 'PRP'), ('and', 'CC'), ('me', 'PRP'), ('.', '.')]
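Where clicking through the interactive downloader is not practical (for example in a script), the same check-then-download step might look roughly like this. It is a sketch, not NLTK's own mechanism: `ensure_resource` and `ensure_maxent_tagger` are hypothetical names, and the resource path is the one shown in the LookupError from the question. The lookup and download functions are passed in as arguments so the logic can be tried without NLTK installed.

```python
def ensure_resource(resource_path, package, find, download):
    """Download `package` only if `resource_path` cannot be found.

    Returns True if a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
    except LookupError:
        # The resource is missing, which is the same error pos_tag raised.
        download(package)
        return True
    return False


def ensure_maxent_tagger():
    # Imported lazily so ensure_resource can be used without NLTK installed.
    import nltk

    return ensure_resource(
        "taggers/maxent_treebank_pos_tagger/english.pickle",
        "maxent_treebank_pos_tagger",
        nltk.data.find,
        nltk.download,
    )
```

Calling ensure_maxent_tagger() once before nltk.pos_tag(...) would then fetch the model only when it is not already on disk.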

From the shell / terminal, you can use:

python -m nltk.downloader maxent_treebank_pos_tagger

(you may need sudo on Linux)

This installs maxent_treebank_pos_tagger (i.e. the standard treebank POS tagger in NLTK) and fixes your problem.
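If sudo is not available, one possible alternative is to download into a directory you can write to and add it to NLTK's search path. The sketch below assumes `nltk.download` accepts a `download_dir` keyword and that `nltk.data.path` is the list of searched directories; `download_to_dir` is a hypothetical helper name, and the download function and search path are passed in explicitly.

```python
import os


def download_to_dir(package, target_dir, download, search_path):
    """Download `package` into `target_dir` and make sure that directory is searched."""
    target_dir = os.path.abspath(os.path.expanduser(target_dir))
    ok = download(package, download_dir=target_dir)
    # Register the directory once so later lookups can find the resource.
    if target_dir not in search_path:
        search_path.append(target_dir)
    return ok
```

With NLTK installed this could be called as download_to_dir("maxent_treebank_pos_tagger", "~/nltk_data", nltk.download, nltk.data.path).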

import nltk

text = "Obama delivers his first speech."
# Split the text into sentences, then tokenize and tag each sentence.
sent = nltk.sent_tokenize(text)
loftags = []
for s in sent:
    d = nltk.word_tokenize(s)
    print(nltk.pos_tag(d))

Output:

akshayy@ubuntu:~/summ$ python nn1.py
[('Obama', 'NNP'), ('delivers', 'NNS'), ('his', 'PRP$'), ('first', 'JJ'), ('speech', 'NN'), ('.', '.')]

(I just asked another question where I used this code)

nltk.download()

Click Models and choose maxent_treebank_pos_. It is installed automatically.