How to remove stop words using nltk or python


So I have a dataset from which I would like to remove stop words using

stopwords.words('english')

I'm struggling with how to use this within my code to simply take out these words. I already have a list of the words from this dataset; the part I'm struggling with is comparing it against this list and removing the stop words. Any help is appreciated.

To exclude all kinds of stop words, including the nltk stop words, you could do something like this:

from stop_words import get_stop_words
from nltk.corpus import stopwords
stop_words = list(get_stop_words('en'))        # About 900 stopwords
nltk_words = list(stopwords.words('english'))  # About 150 stopwords
stop_words.extend(nltk_words)
output = [w for w in word_list if w not in stop_words]
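A minimal sketch of the same merging idea that runs without either package, assuming two small hand-written lists stand in for get_stop_words('en') and stopwords.words('english'). Merging them into a set stores words that appear in both lists only once and makes each membership test fast.

```python
# Hypothetical stand-ins for get_stop_words('en') and stopwords.words('english')
package_a_words = ["a", "an", "the", "and", "of"]
package_b_words = ["the", "is", "in", "and", "it"]

# Union of both lists; words present in both are stored only once
stop_words = set(package_a_words) | set(package_b_words)

def remove_stop_words(word_list, stop_words):
    """Return the words of word_list that are not in stop_words, keeping order."""
    return [w for w in word_list if w not in stop_words]

word_list = ["the", "cat", "is", "in", "the", "hat"]
print(remove_stop_words(word_list, stop_words))  # ['cat', 'hat']
```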

I assume you have a list of words (word_list) from which you want to remove the stop words. You could do something like this:

from nltk.corpus import stopwords
filtered_word_list = word_list[:]  # make a copy of the word_list
for word in word_list:  # iterate over word_list
    if word in stopwords.words('english'):
        filtered_word_list.remove(word)  # remove word from filtered_word_list if it is a stopword
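The copy matters: removing items from the same list you are looping over makes Python skip the element that follows each removal. This sketch compares both versions, assuming a hypothetical three-word stop set in place of the NLTK list.

```python
stop_words = {"the", "a", "in"}  # hypothetical stand-in for stopwords.words('english')

def remove_in_place(words):
    # Buggy: mutates the list while iterating over it, so elements get skipped
    for word in words:
        if word in stop_words:
            words.remove(word)
    return words

def remove_from_copy(words):
    # Iterate over the original list, remove from a copy (as in the answer above)
    filtered = words[:]
    for word in words:
        if word in stop_words:
            filtered.remove(word)
    return filtered

print(remove_in_place(["the", "a", "cat"]))   # ['a', 'cat']  ("a" was skipped)
print(remove_from_copy(["the", "a", "cat"]))  # ['cat']
```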

You could also take a set difference, for example (note that this discards duplicate words and does not preserve their order):

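import nltk
# sentence is your text and pattern is the regex that separates tokens; both are defined by you
list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk.corpus.stopwords.words('english')))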
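To see what a set difference gives up, this sketch compares it with a list comprehension on the same tokens, assuming a small hypothetical stop set instead of the NLTK list. The set result collapses duplicates and has no guaranteed order (it is sorted here only to print it predictably), while the comprehension keeps both.

```python
stop_words = {"the", "is", "on"}  # hypothetical stand-in for the NLTK list
tokens = ["the", "dog", "is", "on", "the", "dog", "bed"]

# Set difference: duplicates are collapsed and the order is arbitrary
unique_content = set(tokens) - stop_words

# List comprehension: keeps duplicates and the original order
ordered_content = [t for t in tokens if t not in stop_words]

print(sorted(unique_content))  # ['bed', 'dog']
print(ordered_content)         # ['dog', 'dog', 'bed']
```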

You can use this function. Note that the words need to be lowercased, which the function does before comparing them:

from nltk.corpus import stopwords
def remove_stopwords(word_list):
    processed_word_list = []
    for word in word_list:
        word = word.lower()  # in case they aren't all lowercased
        if word not in stopwords.words("english"):
            processed_word_list.append(word)
    return processed_word_list
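If you want the comparison to ignore case but the output to keep the original casing, you can lowercase only inside the membership test. A sketch, assuming a small hypothetical stop set that is itself lowercase (the comparison only works if it is):

```python
stop_words = {"the", "and", "of"}  # hypothetical, lowercase stand-in for the NLTK list

def remove_stopwords_keep_case(word_list):
    # Lowercase only for the comparison; each kept word is returned as written
    return [word for word in word_list if word.lower() not in stop_words]

print(remove_stopwords_keep_case(["The", "Lord", "of", "the", "Rings"]))
# ['Lord', 'Rings']
```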

Using filter:

from nltk.corpus import stopwords
# ...
filtered_words = list(filter(lambda word: word not in stopwords.words('english'), word_list))
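The filter snippet above, like several others in this thread, calls stopwords.words('english') inside the lambda, so the list is fetched again and searched linearly for every word. Building a set once, outside the filter, avoids that. A sketch, assuming a short hypothetical list in place of the NLTK one:

```python
# Hypothetical stand-in for stopwords.words('english')
english_stop_list = ["i", "me", "my", "the", "a", "is"]

# Build the set once, outside the filter; set lookups are O(1) on average
stop_set = set(english_stop_list)

word_list = ["my", "dog", "is", "a", "good", "dog"]
filtered_words = list(filter(lambda word: word not in stop_set, word_list))
print(filtered_words)  # ['dog', 'good', 'dog']
```

With the real corpus this would be `stop_set = set(stopwords.words('english'))`, computed once before filtering.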

# 1) build a new list that keeps only the words not in the stop list
print("enter the string from which you want to remove list of stop words")
userstring = input().split(" ")
stop_list = ["a", "an", "the", "in"]
another_list = []
for x in userstring:
    if x not in stop_list:  # compare against the list and keep only non-stop words
        another_list.append(x)
for x in another_list:
    print(x, end=' ')
# 2) if you prefer .remove, iterate over a copy (userstring[:]);
# removing from the list you are looping over would skip words
print("enter the string from which you want to remove list of stop words")
userstring = input().split(" ")
stop_list = ["a", "an", "the", "in"]
for x in userstring[:]:
    if x in stop_list:
        userstring.remove(x)
for x in userstring:
    print(x, end=' ')
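Splitting user input with split(" ") leaves punctuation attached to words and yields empty strings for repeated spaces, so "hat." or "The" will not match the stop list. This sketch splits on any whitespace, strips surrounding punctuation with string.punctuation, and lowercases before comparing; the stop list is the same four words as the answer above, and clean_tokens is a hypothetical helper name.

```python
import string

stop_list = {"a", "an", "the", "in"}  # same four words as the answer above

def clean_tokens(text):
    # split() with no argument splits on any run of whitespace
    tokens = text.split()
    # Strip leading/trailing punctuation and lowercase each token
    tokens = [t.strip(string.punctuation).lower() for t in tokens]
    # Drop tokens that became empty (e.g. a lone "-") and stop words
    return [t for t in tokens if t and t not in stop_list]

print("The  cat, in the hat.".split(" "))   # ['The', '', 'cat,', 'in', 'the', 'hat.']
print(clean_tokens("The  cat, in the hat."))  # ['cat', 'hat']
```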

from nltk.corpus import stopwords
# ...
filtered_words = [word for word in word_list if word not in stopwords.words('english')]
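Since the question mentions a dataset, the same comprehension often has to run over many documents. A sketch, assuming the dataset is a list of already-tokenized sentences (a hypothetical structure) and a small stand-in stop set:

```python
stop_words = {"the", "is", "a", "of"}  # hypothetical stand-in for the NLTK list

dataset = [
    ["the", "sky", "is", "blue"],
    ["a", "cup", "of", "tea"],
]

# The inner comprehension filters one sentence; the outer one walks the dataset
cleaned = [[word for word in sentence if word not in stop_words]
           for sentence in dataset]
print(cleaned)  # [['sky', 'blue'], ['cup', 'tea']]
```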