Error converting text to lowercase with tm_map(..., tolower)

I tried using tm_map and got the following error. How can I avoid it?

require(tm)
byword <- tm_map(byword, tolower)
Error in UseMethod("tm_map", x) :
  no applicable method for 'tm_map' applied to an object of class "character"
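The error message says that tm_map received a plain character vector rather than a corpus. A quick way to confirm this is sketched below; the contents of byword are made up for illustration, since the question does not show them.

```r
# Hypothetical stand-in for the question's `byword`
byword <- c("THE quick BROWN fox", "Jumps OVER the lazy dog")

# tm_map() dispatches on the class of its first argument,
# so the class of `byword` decides whether a method is found
class(byword)            # "character"
is_plain_text <- is.character(byword)
is_plain_text            # TRUE
```

If the check returns TRUE, the object has to be turned into a corpus before tm_map can be applied to it.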


Expanding my comment into a more detailed answer: you need to wrap tolower in content_transformer so that it does not break the VCorpus object, something like this:

> library(tm)
> data("crude")
> crude[[1]]$content
[1] "Diamond Shamrock Corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n The reduction brings its posted price for West Texas\nIntermediate to 16.00 dlrs a barrel, the copany said.\n \"The price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n Diamond is the latest in a line of U.S. oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n Reuter"
> tm_map(crude, content_transformer(tolower))[[1]]$content
[1] "diamond shamrock corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n the reduction brings its posted price for west texas\nintermediate to 16.00 dlrs a barrel, the copany said.\n \"the price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n diamond is the latest in a line of u.s. oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n reuter"


Passing tolower directly to tm_map has an undesirable side effect: if you later try to create a term-document matrix from the corpus, it will fail. This is due to a recent change in tm, which cannot handle the return type of tolower. Instead, use:

myCorpus <- tm_map(myCorpus, PlainTextDocument)
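One way to check that a lowercased corpus still works downstream is to build the matrix right after the transformation. The following is a minimal sketch, not the answerer's code: it assumes a version of tm that provides content_transformer, uses a made-up character vector for byword, and takes the content_transformer(tolower) route rather than PlainTextDocument.

```r
library(tm)

# Hypothetical input; the question does not show what `byword` contains
byword <- c("THE quick BROWN fox", "Jumps OVER the lazy dog")

# Wrap the character vector in a corpus so tm_map() has a method to dispatch on
corpus <- VCorpus(VectorSource(byword))

# content_transformer() applies tolower to the text of each document
# and leaves each element a text document
corpus <- tm_map(corpus, content_transformer(tolower))
corpus[[1]]$content

# Building the matrix is the step reported to fail after a bare tolower
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```

If DocumentTermMatrix returns without an error, the documents kept a class that tm can work with after the transformation.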


Use the base R function tolower():

tolower(c("THE quick BROWN fox"))
# [1] "the quick brown fox"


myCorpus <- Corpus(VectorSource(byword))
myCorpus <- tm_map(myCorpus, tolower)
print(myCorpus[[1]])