python 3.x - sublime - How can I expand the output below in Python so that I can use it as input elsewhere?
I think you need str.split to get a list of all the words, split on whitespace. You also need ho['tweet'] to select the tweet column:
# output is a string per row
ho1 = ho['tweet'].str.split().apply(
    lambda x: ' '.join([word for word in x if word not in eng_stopwords]))
Or:
# output is a list per row
ho1 = ho['tweet'].str.split().apply(
    lambda x: [word for word in x if word not in eng_stopwords])
instead of:
ho = ho.to_frame(name=None)
a = ho.to_string(buf=None, columns=None, col_space=None, header=True,
                 index=True, na_rep='NaN', formatters=None, float_format=None,
                 sparsify=False, index_names=True, justify=None, line_width=None,
                 max_rows=None, max_cols=None, show_dimensions=False)
wordList = word_tokenize(fg)
wordList = [word for word in wordList if word not in eng_stopwords]
print(wordList)
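For reference, the per-row approach can be sketched end to end as below. This is only a minimal example under stated assumptions: the two sample tweets are abbreviated from the question's data, and eng_stopwords here is a small hand-written stand-in for stopwords.words('english'), so that the snippet runs without downloading the NLTK corpus.

```python
import pandas as pd

# Hypothetical sample data, abbreviated from the tweets shown in the question.
ho = pd.DataFrame({
    "tweet": [
        "new free stock photo of city cars road",
        "cars fly off a hidden speed bump",
    ]
})

# Stand-in for stopwords.words('english'); an assumption made so the
# sketch runs without NLTK's corpus download.
eng_stopwords = {"a", "of", "off", "the", "is", "in"}

# Split each tweet on whitespace, then filter stopwords row by row.
tokens = ho["tweet"].str.split().apply(
    lambda words: [w for w in words if w not in eng_stopwords]
)

# Same filter, joined back into one string per row.
cleaned = tokens.str.join(" ")

print(tokens.tolist())
print(cleaned.tolist())
```

Because the lambda receives each row's own word list, every tweet keeps its own result, which is probably what you want if the output feeds another step.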
This is the code I am using:
ho = ho.replace(r'((www\.[\s]+)|(https?://[^\s]+))', 'URL', regex=True)
ho = ho.replace(r'#([^\s]+)', r'\1', regex=True)
# presumably meant to strip quote characters; the replacement value was missing here
ho = ho.replace('[\'"]', '', regex=True)
lem = WordNetLemmatizer()
stem = PorterStemmer()
fg = stem.stem(a)
eng_stopwords = stopwords.words('english')
ho = ho.to_frame(name=None)
a = ho.to_string(buf=None, columns=None, col_space=None, header=True,
                 index=True, na_rep='NaN', formatters=None, float_format=None,
                 sparsify=False, index_names=True, justify=None, line_width=None,
                 max_rows=None, max_cols=None, show_dimensions=False)
wordList = word_tokenize(fg)
wordList = [word for word in wordList if word not in eng_stopwords]
print(wordList)
When I print(a) I get the output below. I cannot word-tokenize it correctly.
tweet
0 1495596971.6034188automotive auto ebc greenstu...
1 1495596972.330948new free stock photo of city ...
2 1495596972.775966ebay 1974 volkswagen beetle -...
3 1495596975.6460807cars fly off a hidden speed ...
4 1495596978.12868rt @jiikae guys i think mario ...
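One likely reason the tokens come out wrong: to_string() renders the whole table as a single string, so the column header and the row index labels end up mixed in with the tweet words. A small sketch of that effect, using a hypothetical two-row Series in place of the real ho and plain str.split in place of word_tokenize:

```python
import pandas as pd

# Hypothetical two-row Series standing in for the question's `ho`.
ho = pd.Series(["cars fly off", "new free stock photo"], name="tweet")

# to_string() renders the whole table as one string, including
# the column header and the row index labels.
a = ho.to_frame().to_string()
flat_tokens = a.split()

# Working on the column directly keeps one token list per tweet.
per_row_tokens = ho.str.split().tolist()

print(flat_tokens)
print(per_row_tokens)
```

The first list contains "tweet" and the index digits alongside the actual words, while the second holds only the words of each tweet.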
These are the first 5 lines of the csv file:
"1495596971.6034188::automotive auto ebc greenstuff 6000 series supreme
truck and suv brake pads dp61603 https:\/\/t.co\/jpylzjyd5o cars\u2026
https:\/\/t.co\/gfsbz6pkj7""display_text_range:[0140]source:""\u003ca
href=\""https:\/\/dlvrit.com\/\""
rel=\""nofollow\""\u003edlvr.it\u003c\/a\u003e"""
"1495596972.330948::new free stock photo of city cars road
https:\/\/t.co\/qbkgvkfgpp""display_text_range:[0"
"1495596972.775966::ebay: 1974 volkswagen beetle - classic 1952 custom
conversion extremely rare 1974 vw beetle\u2026\u2026
https:\/\/t.co\/wdsnf2pmo7""display_text_range:[0140]source:""\u003ca
href=\""https:\/\/dlvrit.com\/\""
rel=\""nofollow\""\u003edlvr.it\u003c\/a\u003e"""
"1495596975.6460807::cars fly off a hidden speed bump
https:\/\/t.co\/fliiqwt1rk https:\/\/t.co\/klx7kfooro""display_text_range:
[056]source:""\u003ca href=\""https:\/\/dlvrit.com\/\""
rel=\""nofollow\""\u003edlvr.it\u003c\/a\u003e"""
1495596978.12868::rt @jiikae: guys i think mario is going through a mid-life
crisis. buying expensive cars using guns hanging out with proport\u2026
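Since each raw line starts with a timestamp followed by "::", it may help to separate the timestamp from the text before tokenizing, so the number does not get fused to the first word. A minimal sketch, assuming the lines are loaded as one string column and that the URLs are already unescaped (the sample lines below are abbreviated, and the column names are my own choice):

```python
import pandas as pd

# Hypothetical raw lines, abbreviated from the CSV sample above.
raw = pd.Series([
    "1495596972.330948::new free stock photo of city cars road https://t.co/qbkgvkfgpp",
    "1495596975.6460807::cars fly off a hidden speed bump https://t.co/fliiqwt1rk",
])

# Split once on the first '::' into a timestamp column and a text column.
parts = raw.str.split("::", n=1, expand=True)
parts.columns = ["timestamp", "tweet"]

# Replace URLs with a placeholder, mirroring the question's regex step.
parts["tweet"] = parts["tweet"].str.replace(
    r"(www\.\S+)|(https?://\S+)", "URL", regex=True
)

print(parts)
```

After this step parts['tweet'] holds only the tweet text, so str.split or word_tokenize can be applied to it row by row.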