language-agnostic code-golf

language agnostic - Build an ASCII chart of the most commonly used words in a given text




The challenge:

Build an ASCII chart of the most commonly used words in a given text.

The rules:

  • Only accept a-z and A-Z (alphabetic characters) as part of a word.
  • Ignore casing (She == she for our purpose).
  • Ignore the following words (quite arbitrary, I know): the, and, of, to, a, i, it, in, or, is
  • Clarification: considering don't: this would be taken as 2 different "words" in the ranges a-z and A-Z: (don and t).

  • Optionally (it's too late to formally change the specification now), you may choose to drop all single-letter "words" (this could potentially shorten the ignore list too).

Parse a given text (read a file specified via command-line arguments or piped in; assume us-ascii) and build a word frequency chart with the following characteristics:

  • Display the chart (also see the example below) for the 22 most common words (ordered by descending frequency).
  • The bar width represents the number of occurrences (frequency) of the word (proportionally). Append one space and print the word.
  • Make sure these bars (plus space-word-space) always fit: bar + [space] + word + [space] must always be <= 80 characters (make sure you account for possible differing bar and word lengths: e.g. the second most common word could be a lot longer than the first while not differing that much in frequency). Maximize the bar width within these constraints and scale the bars appropriately (according to the frequencies they represent).
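To make the rules concrete, here is a minimal, ungolfed sketch in Python, written under my own assumptions rather than taken from any answer; the function name `word_chart` and its `top`/`width` parameters are hypothetical. It lowercases the text, extracts runs of a-z letters, drops the stop words, keeps the 22 most frequent words, and picks a single scale factor so that every line, trailing space included, fits in 80 columns. Exact fractions are used so that floating-point rounding cannot push a bar one character over the limit.

```python
import math
import re
from collections import Counter
from fractions import Fraction

# Stop words from the rules above.
STOP_WORDS = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}


def word_chart(text, top=22, width=80):
    """Return the chart as a list of lines (hypothetical helper, not golfed)."""
    # Only a-z counts as a word character after lowercasing,
    # so "don't" yields the two words "don" and "t".
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    # most_common() sorts by descending count; ties keep first-seen order.
    counts = Counter(words).most_common(top)
    if not counts:
        return []
    # "|" + bar + "| " + word + " " must fit in `width` columns,
    # so a bar can hold at most width - 4 - len(word) underscores.
    scale = min(Fraction(width - 4 - len(w), c) for w, c in counts)
    bars = [(w, math.floor(c * scale)) for w, c in counts]
    lines = [" " + "_" * bars[0][1]]  # closes the top of the first bar
    # The trailing space is budgeted for above but not printed.
    lines += ["|" + "_" * n + "| " + w for w, n in bars]
    return lines
```

For example, `print("\n".join(word_chart(sys.stdin.read())))` (after `import sys`) would print the chart for piped-in text.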

An example:

The text for the example can be found here (Alice's Adventures in Wonderland, by Lewis Carroll).

This specific text would yield the following chart:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|_________________________| on
|_________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so

For reference, these are the frequencies the above chart is based on:

[('she', 553), ('you', 481), ('said', 462), ('alice', 403), ('was', 358), ('that', 330), ('as', 274), ('her', 248), ('with', 227), ('at', 227), ('s', 219), ('t', 218), ('on', 204), ('all', 200), ('this', 181), ('for', 179), ('had', 178), ('but', 175), ('be', 167), ('not', 166), ('they', 155), ('so', 152)]

A second example (to check whether you implemented the complete spec): replace every occurrence of you in the linked Alice in Wonderland file with superlongstringstring:

 ________________________________________________________________
|________________________________________________________________| she
|_______________________________________________________| superlongstringstring
|_____________________________________________________| said
|______________________________________________| alice
|________________________________________| was
|_____________________________________| that
|______________________________| as
|___________________________| her
|_________________________| with
|_________________________| at
|________________________| s
|________________________| t
|______________________| on
|_____________________| all
|___________________| this
|___________________| for
|___________________| had
|__________________| but
|_________________| be
|_________________| not
|________________| they
|________________| so
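Both charts follow from one rule: the scale factor is the smallest ratio of available room (76 - len(word) underscores) to count over all listed words, so a long and frequent second word can shrink every bar. A Python sketch using the frequencies quoted above (the helper name `bar_lengths` is hypothetical, and bars are truncated, so individual lengths may differ by one from the charts depending on the rounding an implementation uses):

```python
import math
from fractions import Fraction

# Word frequencies quoted above for the unmodified Alice text.
ALICE = [("she", 553), ("you", 481), ("said", 462), ("alice", 403), ("was", 358),
         ("that", 330), ("as", 274), ("her", 248), ("with", 227), ("at", 227),
         ("s", 219), ("t", 218), ("on", 204), ("all", 200), ("this", 181),
         ("for", 179), ("had", 178), ("but", 175), ("be", 167), ("not", 166),
         ("they", 155), ("so", 152)]


def bar_lengths(freqs, room=76):
    """Map each word to its bar length in underscores (hypothetical helper)."""
    # room - len(word) underscores are available on each line; the smallest
    # available/count ratio is the one scale factor that fits every line.
    scale = min(Fraction(room - len(w), c) for w, c in freqs)
    return {w: math.floor(c * scale) for w, c in freqs}
```

With the original list, `she` gets the full 73 underscores. Replacing `you` with `superlongstringstring` makes that word's ratio (55/481) the minimum, so `she` drops to 63 underscores and the long word gets 55.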

The winner:

Shortest solution (by character count, per language). Have fun!

Edit: Table summarizing the results so far (2012-02-15) (originally added by user Nas Banov):

Language            Relaxed  Strict
==================  =======  ======
GolfScript            130     143
Perl                          185
Windows PowerShell    148     199
Mathematica                   199
Ruby                  185     205
Unix Toolchain        194     228
Python                183     243
Clojure                       282
Scala                         311
Haskell                       333
Awk                           336
R                     298
Javascript            304     354
Groovy                321
Matlab                        404
C#                            422
Smalltalk             386
PHP                   450
F#                            452
TSQL                  483     507

The numbers represent the length of the shortest solution in a specific language. "Strict" refers to a solution that implements the spec completely (draws |____| bars, closes the first bar on top with a ____ line, accounts for the possibility of long words with high frequency, etc.). "Relaxed" means some liberties were taken to shorten the solution.

Only solutions shorter than 500 characters are included. The list of languages is sorted by the length of the "strict" solution. "Unix Toolchain" is used to signify various solutions that use a traditional *nix shell plus a mix of tools (like grep, tr, sort, uniq, head, perl, awk).


206

shell, grep, tr, grep, sort, uniq, sort, head, perl

~ % wc -c wfg
209 wfg
~ % cat wfg
egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|of|to|a|i|it|in|or|is'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
~ % # usage:
~ % sh wfg < 11.txt

hm, just saw the above: sort -nr -> sort -n and then head -> tail => 208 :)
update2: erm, of course the above is silly, since the order would then be reversed. So, 209.
update3: optimized the exclusion regexp -> 206

egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|o[fr]|to|a|i[tns]?'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
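The shortened exclusion pattern can be checked mechanically: with `egrep -w` applied to a line that holds a single lowercase word, a pattern excludes exactly the words it matches in full. The Python sketch below assumes `re.fullmatch` is a fair stand-in for that whole-line match. It compares both alternations against every lowercase string of up to three letters, which covers all ten stop words; `matched_words` is a hypothetical name.

```python
import re
from itertools import product
from string import ascii_lowercase

ORIGINAL = "the|and|of|to|a|i|it|in|or|is"
OPTIMIZED = "the|and|o[fr]|to|a|i[tns]?"


def matched_words(pattern, max_len=3):
    """All lowercase strings of up to max_len letters that the pattern matches in full."""
    regex = re.compile(pattern)
    found = set()
    for n in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=n):
            word = "".join(letters)
            if regex.fullmatch(word):
                found.add(word)
    return found
```

Both calls should return the same ten stop words, so the rewrite drops nothing and excludes nothing extra.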



For fun, here is a perl-only version (much faster):

~ % wc -c pgolf
204 pgolf
~ % cat pgolf
perl -lne'$1=~/^(the|and|o[fr]|to|.|i[tns])$/i||$f{lc$1}++while/\b([a-z]+)/gi}{@w=(sort{$f{$b}<=>$f{$a}}keys%f)[0..21];$Q=$f{$_=$w[0]};$B=76-y///c;print" "."_"x$B;print"|"."_"x($B*$f{$_}/$Q)."| $_"for@w'
~ % # usage:
~ % sh pgolf < 11.txt


GolfScript, 177 175 173 167 164 163 144 131 130 characters

Slow: 3 minutes for the sample text (130)

{32|.123%97<n@if}%]''*n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*' '\@{"
|"\~1*2/0*'| '@}/

Explanation:

{                   #loop through all characters
 32|.               #convert to lowercase and duplicate
 123%97<            #determine if it is a letter
 n@if               #return either the letter or a newline
}%                  #return an array (of ints)
]''*                #convert array to a string with magic
n%                  #split on newline, removing blanks (stack is an array of words now)
"oftoitinorisa"     #push this string
2/                  #split into groups of two, i.e. ["of" "to" "it" "in" "or" "is" "a"]
-                   #remove any occurrences from the text
"theandi"3/-        #remove "the", "and", and "i"
$                   #sort the array of words
(1@                 #take the first word in the array, push a 1, reorder the stack
                    #the 1 is the current number of occurrences of the first word
{                   #loop through the array
 .3$>1{;)}if        #increment the count or push the next word and a 1
}/
]2/                 #gather the stack into an array and split into groups of 2
{~~\;}$             #sort by the latter element, the count of occurrences of each word
22<                 #take the first 22 elements
.0=~:2;             #store the highest count
,76\-:1             #store the length of the first line
'_':0*' '\@         #make the first line
{                   #loop through each word
"
|"\~                #start drawing the bar
1*2/0               #divide by zero
*'| '@              #finish drawing the bar
}/
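The counting step above (sort the words, then walk the sorted list and count runs of equal neighbours) can be illustrated in Python with `itertools.groupby`. This is a sketch of the idea, not a translation of the GolfScript, and `count_sorted_runs` is a hypothetical name:

```python
from itertools import groupby


def count_sorted_runs(words):
    """Return (word, count) pairs, most frequent first (hypothetical helper)."""
    # After sorting, equal words sit next to each other, so each run's
    # length is that word's count.
    pairs = [(word, sum(1 for _ in run)) for word, run in groupby(sorted(words))]
    # Then order by count; sorted() is stable, so ties stay alphabetical.
    return sorted(pairs, key=lambda pair: -pair[1])
```

For example, `count_sorted_runs(["b", "a", "b", "c", "b", "a"])` gives `[("b", 3), ("a", 2), ("c", 1)]`.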

"Correct" (hopefully). (143)

{32|.123%97<n@if}%]''*n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<..0=1=:^;{~76@,-^*\/}%$0=:1'_':0*' '\@{"
|"\~1*^/0*'| '@}/

Less slow: half a minute. (162)

'"'/' ':S*n/S*'"#{%q '/+" .downcase.tr('^a-z',' ')}/""+~n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~/;}$22<.0=~:2;,76/-:1'_':0*S/@{" |"/~1*2/0*'| '@}/

Output visible in the revision history.


Mathematica (297 284 248 244 242 199 chars) Pure Functional

and Zipf's Law testing

Look Mamma ... no vars, no hands, .. no head

Edit 1> some shorthands defined (284 chars)

f[x_, y_] := Flatten[Take[x, All, y]]; BarChart[f[{##}, -1], BarOrigin -> Left, ChartLabels -> Placed[f[{##}, 1], After], Axes -> None ] & @@ Take[ SortBy[ Tally[ Select[ StringSplit[ToLowerCase[Import[i]], RegularExpression["\\W+"]], !MemberQ[{"the", "and", "of", "to", "a", "i", "it", "in", "or","is"}, #]&] ], Last], -22]

Some explanations

Import[]                              # Get The File
ToLowerCase []                        # To Lower Case :)
StringSplit[ STRING , RegularExpression["\\W+"]]   # Split By Words, getting a LIST
Select[ LIST, !MemberQ[{LIST_TO_AVOID}, #]&]       # Select from LIST except those words in LIST_TO_AVOID
                                                   # Note that !MemberQ[{LIST_TO_AVOID}, #]& is a FUNCTION for the test
Tally[LIST]                           # Get the LIST {word,word,..} and produce another {{word,counter},{word,counter}...}
SortBy[ LIST ,Last]                   # Get the list produced by Tally and sort by counters
                                      # Note that counters are the LAST element of {word,counter}
Take[ LIST ,-22]                      # Once sorted, get the biggest 22 counters
BarChart[f[{##}, -1], ChartLabels -> Placed[f[{##}, 1], After]] &@@ LIST   # Get the list produced by Take as input and produce a bar chart
f[x_, y_] := Flatten[Take[x, All, y]] # Auxiliary to get the list of the first or second element of lists of lists x_ depending upon y
                                      # So f[{##}, -1] is the list of counters
                                      # and f[{##}, 1] is the list of words (labels for the chart)

Output

[Image: bar chart output, http://i49.tinypic.com/2n8mrer.jpg]

Mathematica is not well suited for golfing, and that is only because of its long, descriptive function names. Functions like "RegularExpression[]" or "StringSplit[]" just make me sob :(.

Zipf's Law Testing

Zipf's law predicts that, for a natural-language text, the Log (Rank) vs Log (occurrences) plot follows a linear relationship.

The law is used in developing algorithms for cryptography and data compression. (But it is NOT the "Z" in the LZW algorithm.)

On our text, we can test it with the following:

f[x_, y_] := Flatten[Take[x, All, y]]; ListLogLogPlot[ Reverse[f[{##}, -1]], AxesLabel -> {"Log (Rank)", "Log Counter"}, PlotLabel -> "Testing Zipf's Law"] & @@ Take[ SortBy[ Tally[ StringSplit[ToLowerCase[b], RegularExpression["\\W+"]] ], Last], -1000]

The result is (pretty well linear):

[Image: log-log plot of rank vs count, http://i46.tinypic.com/33fcmdk.jpg]
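How straight the log-log plot is can also be quantified by fitting its slope. Below is a minimal Python sketch using ordinary least squares; that method is my own choice, not part of the Mathematica answer, and `zipf_slope` is a hypothetical helper. An ideal Zipf distribution, with counts proportional to 1/rank, gives a slope of exactly -1.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(count) against log(rank) (hypothetical helper)."""
    ranked = sorted(counts, reverse=True)  # rank 1 = most frequent
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

Feeding it the word counts of a real text should give a slope in the neighbourhood of -1 if the law holds well for that text.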

Edit 6> (242 Chars)

Regex refactoring (no longer using the Select function)
Dropping 1-char words
More efficient definition for function "f"

f = Flatten[Take[#1, All, #2]]&; BarChart[ f[{##}, -1], BarOrigin -> Left, ChartLabels -> Placed[f[{##}, 1], After], Axes -> None] & @@ Take[ SortBy[ Tally[ StringSplit[ToLowerCase[Import[i]], RegularExpression["(\\W|\\b(.|the|and|of|to|i[tns]|or)\\b)+"]] ], Last], -22]
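The same split-on-stop-words idea can be sketched in Python. Here the groups are non-capturing, because `re.split` would otherwise return the captured text as part of its result; `words_without_stopwords` is a hypothetical name. Like the Mathematica pattern, `\W` treats digits and `_` as word characters, which differs slightly from the letters-only rule.

```python
import re

# Non-word characters, or a whole stop word (including any single
# character), repeated. The groups are non-capturing so that re.split
# does not also return the matched separators.
SPLIT = re.compile(r"(?:\W|\b(?:.|the|and|of|to|i[tns]|or)\b)+")


def words_without_stopwords(text):
    """Lowercase, split on the pattern, and drop empty strings (hypothetical helper)."""
    return [w for w in SPLIT.split(text.lower()) if w]
```

Because single letters are swallowed by the `.` alternative, "don't" leaves only "don", which matches the optional "drop one-letter words" rule.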

Edit 7 → 199 characters

BarChart[#2, BarOrigin->Left, ChartLabels->Placed[#1, After], Axes->None]&@@ Transpose@Take[SortBy[Tally@StringSplit[ToLowerCase@Import@i, RegularExpression@"(\\W|\\b(.|the|and|of|to|i[tns]|or)\\b)+"],Last], -22]

  • Replaced f with Transpose and Slot (#1 / #2) arguments.
  • No nasty brackets needed (use f@x instead of f[x] where possible).

Perl, 237 229 209 characters

(Updated again to beat the Ruby version with more dirty golf tricks, replacing split/[^a-z]/,lc with lc=~/[a-z]+/g, and eliminating a check for an empty string in another place. These were inspired by the Ruby version, so credit where credit is due.)

Update: now with Perl 5.10! Replace print with say, and use ~~ to avoid a map. This has to be invoked on the command line as perl -E '<one-liner>' alice.txt. Since the entire script is on one line, writing it as a one-liner shouldn't present any difficulty :).

@s=qw/the and of to a i it in or is/;$c{$_}++foreach grep{!($_~~@s)}map{lc=~/[a-z]+/g}<>;@s=sort{$c{$b}<=>$c{$a}}keys%c;$f=76-length$s[0];say" "."_"x$f;say"|"."_"x($c{$_}/$c{$s[0]}*$f)."| $_ "foreach@s[0..21];

Note that this version normalizes for case. This doesn't shorten the solution, since removing ,lc (for lowercasing) requires you to add A-Z to the split regex, so it's a wash.

If you're on a system where a newline is one character and not two, you can shorten this by another two characters by using a literal newline in place of \n. However, I haven't written the above sample that way, since it's "clearer" (ha!) as it is.

Here is a perl solution that is mostly correct, but not remotely short enough:

use strict;
use warnings;

my %short = map { $_ => 1 } qw/the and of to a i it in or is/;
my %count = ();

$count{$_}++ foreach grep { $_ && !$short{$_} } map { split /[^a-zA-Z]/ } (<>);
my @sorted = (sort { $count{$b} <=> $count{$a} } keys %count)[0..21];
my $widest = 76 - (length $sorted[0]);

print " " . ("_" x $widest) . "\n";
foreach (@sorted)
{
    my $width = int(($count{$_} / $count{$sorted[0]}) * $widest);
    print "|" . ("_" x $width) . "| $_ \n";
}

The following is about as short as it can get while remaining relatively readable (392 characters).

%short = map { $_ => 1 } qw/the and of to a i it in or is/;
%count;

$count{$_}++ foreach grep { $_ && !$short{$_} } map { split /[^a-z]/, lc } (<>);
@sorted = (sort { $count{$b} <=> $count{$a} } keys %count)[0..21];
$widest = 76 - (length $sorted[0]);

print " " . "_" x $widest . "\n";
print"|" . "_" x int(($count{$_} / $count{$sorted[0]}) * $widest) . "| $_ \n" foreach @sorted;


Ruby 1.9, 185 characters

(heavily based on the other Ruby solutions)

w=($<.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort[0,22]
k,l=w[0]
puts [?\s+?_*m=76-l.size,w.map{|f,x|?|+?_*(f*m/k)+"| "+x}]

Instead of using any command-line switches like the other solutions, you can simply pass the filename as an argument (i.e. ruby1.9 wordfrequency.rb Alice.txt).

Since I'm using character literals here, this solution only works in Ruby 1.9.

Edit: Replaced semicolons with line breaks for "readability". :P

Edit 2: Shtééf pointed out that I forgot the trailing space; fixed it.

Edit 3: Removed the trailing space again ;)


Ruby 207 213 211 210 207 203 201 200 characters

An improvement on Anurag's version, incorporating a suggestion from rfusca. It also removes the argument to sort and a few other minor golfings.

w=(STDIN.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort.take 22;k,l=w[0];m=76.0-l.size;puts' '+'_'*m;w.map{|f,x|puts"|#{'_'*(m*f/k)}| #{x} "}

Run as:

ruby GolfedWordFrequencies.rb < Alice.txt

Edit: put "puts" back in; it needs to be there to avoid having quotes in the output.
Edit2: Changed File -> IO
Edit3: removed /i
Edit4: Removed the parentheses around (f*1.0), recounted
Edit5: Use string addition for the first line; expand s in place.
Edit6: Made m a float, removed 1.0. EDIT: Doesn't work, changes lengths. EDIT: No worse than before
Edit7: Use STDIN.read.


Ruby, 215, 216, 218, 221, 224, 236, 237 characters

update 1: Hurrah! It's a tie with JS Bangs' solution. Can't think of a way to cut down any more :)

update 2: Played a dirty golf trick. Changed each to map to save 1 character :)

update 3: Changed File.read to IO.read, +2. Array.group_by wasn't very fruitful, changed to reduce, +6. A case-insensitive check is not needed after lowercasing with downcase in the regex, +1. Sorting in descending order is easily done by negating the value, +6. Total savings +15

update 4: [0] rather than .first, +3. (@Shtééf)

update 5: Expand variable l in place, +1. Expand variable s in place, +2. (@Shtééf)

update 6: Use string addition rather than interpolation for the first line, +2. (@Shtééf)

w=(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take 22;m=76-w[0][0].size;puts' '+'_'*m;w.map{|x,f|puts"|#{'_'*(f*1.0/w[0][1]*m)}| #{x} "}

update 7: I went through a whole lot of hoopla to detect the first iteration inside the loop, using instance variables. All I got is +1, though perhaps there is potential. I'm preserving the previous version, because I believe this one is black magic. (@Shtééf)

(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take(22).map{|x,f|@f||(@f=f;puts' '+'_'*(@m=76-x.size));puts"|#{'_'*(f*1.0/@f*@m)}| #{x} "}

Readable version

string = File.read($_).downcase

words = string.scan(/[a-z]+/i)
allowed_words = words - %w{the and of to a i it in or is}
sorted_words = allowed_words.group_by{ |x| x }.map{ |x,y| [x, y.size] }.sort{ |a,b| b[1] <=> a[1] }.take(22)
highest_frequency = sorted_words.first
highest_frequency_count = highest_frequency[1]
highest_frequency_word = highest_frequency[0]

word_length = highest_frequency_word.size
widest = 76 - word_length

puts " #{'_' * widest}"
sorted_words.each do |word, freq|
  width = (freq * 1.0 / highest_frequency_count) * widest
  puts "|#{'_' * width}| #{word} "
end

To use:

echo "Alice.txt" | ruby -ln GolfedWordFrequencies.rb

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|____________________________| s
|____________________________| t
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so


Windows PowerShell, 199 characters

$x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *
filter f($w){' '+'_'*$w
$x[-1..-22]|%{"|$('_'*($w*$_.Count/$x[-1].Count))| "+$_.Name}}
f(76..1|?{!((f $_)-match'.'*80)})[0]

(The last line break isn't necessary, but is included here for readability.)

(The current code and my test files are available in my SVN repository. I hope my test cases catch most common errors (bar length, problems with regex matching, and a few others).)

Assumptions

  • US ASCII as input. It probably gets weird with Unicode.
  • At least two non-stop words in the text

History

Relaxed version (137), since that is apparently counted separately by now:

($x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *)[-1..-22]|%{"|$('_'*(76*$_.Count/$x[-1].Count))| "+$_.Name}

  • does not close the first bar
  • does not account for the word length of the first word

Bar lengths that differ by one character from other solutions are due to PowerShell using rounding instead of truncation when converting floating-point numbers to integers. Since the task only required proportional bar lengths, this should be fine, though.

Compared to other solutions, I took a slightly different approach to determining the longest bar length: I simply try candidate lengths and take the highest one at which no line is longer than 80 characters.
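The try-and-check search can be sketched in Python as follows; `widest_fitting_bar` is a hypothetical name. Unlike PowerShell, this sketch truncates with integer division, so its widths may differ by one from the PowerShell output. It tries top-bar widths from 76 down and returns the first one at which every line, trailing space included, fits in 80 characters.

```python
# Frequencies quoted in the question for the unmodified Alice text.
ALICE = [("she", 553), ("you", 481), ("said", 462), ("alice", 403), ("was", 358),
         ("that", 330), ("as", 274), ("her", 248), ("with", 227), ("at", 227),
         ("s", 219), ("t", 218), ("on", 204), ("all", 200), ("this", 181),
         ("for", 179), ("had", 178), ("but", 175), ("be", 167), ("not", 166),
         ("they", 155), ("so", 152)]


def widest_fitting_bar(freqs, limit=80):
    """Largest top-bar width such that no line exceeds `limit` (hypothetical helper)."""
    top = freqs[0][1]  # count of the most frequent word
    for width in range(76, 0, -1):
        # Scale every bar against the top bar; // truncates.
        lines = ["|" + "_" * (width * c // top) + "| " + w + " " for w, c in freqs]
        if all(len(line) <= limit for line in lines):
            return width
    return 0
```

For the unmodified list this settles on 73. With `you` replaced by `superlongstringstring` it settles on 64, one more than a closed-form "smallest ratio" scale would give, because the search only requires the truncated bar to fit.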

An older, explained version can be found here.


C (828)

It looks a lot like obfuscated code, and uses glib for strings, lists and hashes. The character count with wc -m is 828. It does not consider single-char words. To calculate the maximum bar length, it considers the longest word among all words, not only the first 22. Is this a deviation from the spec?

It does not handle failures and it does not release used memory.

#include <glib.h>
#define S(X)g_string_##X
#define H(X)g_hash_table_##X
GHashTable*h;int m,w=0,z=0;y(const void*a,const void*b){int*A,*B;A=H(lookup)(h,a);B=H(lookup)(h,b);return*B-*A;}void p(void*d,void*u){int *v=H(lookup)(h,d);if(w<22){g_printf("|");*v=*v*(77-z)/m;while(--*v>=0)g_printf("=");g_printf("| %s\n",d);w++;}}main(c){int*v;GList*l;GString*s=S(new)(NULL);h=H(new)(g_str_hash,g_str_equal);char*n[]={"the","and","of","to","it","in","or","is"};while((c=getchar())!=-1){if(isalpha(c))S(append_c)(s,tolower(c));else{if(s->len>1){for(c=0;c<8;c++)if(!strcmp(s->str,n[c]))goto x;if((v=H(lookup)(h,s->str))!=NULL)++*v;else{z=MAX(z,s->len);v=g_malloc(sizeof(int));*v=1;H(insert)(h,g_strdup(s->str),v);}}x:S(truncate)(s,0);}}l=g_list_sort(H(get_keys)(h),y);m=*(int*)H(lookup)(h,g_list_first(l)->data);g_list_foreach(l,p,NULL);}


F#, 452 chars

Straightforward: get a sequence a of word-count pairs, find the best word-count-per-column multiplier k, then print the results.

let a=
 stdin.ReadToEnd().Split(" .?!,\":;'\r\n".ToCharArray(),enum 1)
 |>Seq.map(fun s->s.ToLower())|>Seq.countBy id
 |>Seq.filter(fun(w,n)->not(set["the";"and";"of";"to";"a";"i";"it";"in";"or";"is"].Contains w))
 |>Seq.sortBy(fun(w,n)-> -n)|>Seq.take 22
let k=a|>Seq.map(fun(w,n)->float(78-w.Length)/float n)|>Seq.min
let u n=String.replicate(int(float(n)*k)-2)"_"
printfn" %s "(u(snd(Seq.nth 0 a)))
for(w,n)in a do printfn"|%s| %s "(u n)w

Example (I have different freq counts than you, unsure why):

% app.exe < Alice.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|____________________________| t
|____________________________| s
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| had
|______________________| for
|_____________________| but
|_____________________| be
|____________________| not
|___________________| they
|__________________| so


Haskell - 366 351 344 337 333 characters

(One line break in main added for readability, and no line break needed at end of last line.)

import Data.List
import Data.Char
l=length
t=filter
m=map
f c|isAlpha c=toLower c|0<1=' '
h w=(-l w,head w)
x!(q,w)='|':replicate(minimum$m(q?)x)'_'++"| "++w
q?(g,w)=q*(77-l w)`div`g
b x=m(x!)x
a(l:r)=(' ':t(=='_')l):l:r
main=interact$unlines.a.b.take 22.sort.m h.group.sort
  .t(`notElem`words"the and of to a i it in or is").words.m f

How it works is best seen by reading the argument to interact backwards:

  • map f lowercases alphabetics, replaces everything else with spaces.
  • words produces a list of words, dropping the separating whitespace.
  • filter (`notElem` words "the and of to a i it in or is") discards all entries with forbidden words.
  • group . sort sorts the words, and groups identical ones into lists.
  • map h maps each list of identical words to a tuple of the form (-frequency, word) .
  • take 22 . sort sorts the tuples by descending frequency (the first tuple entry), and keeps only the first 22 tuples.
  • b maps tuples to bars (see below).
  • a prepends the first line of underscores, to complete the topmost bar.
  • unlines joins all these lines together with newlines.

The tricky bit is getting the bar length right. I assumed that only underscores count towards the length of the bar, so || would be a bar of zero length. The function b maps (x!) over x, where x is the list of histograms. The entire list is passed to (!), so that each invocation of (!) can compute the scale factor for itself by calling (?). In this way, I avoid using floating-point math or rationals, whose conversion functions and imports would eat many characters.
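The integer-only scaling can be sketched in Python as a hypothetical `integer_bars`, which is not a translation of the Haskell. Because flooring is monotonic, taking the minimum of the per-word integer quotients gives the same result as flooring the product with the smallest ratio, so no fractions are needed. The sketch uses 76, which counts the trailing space required by the spec; the Haskell entry uses 77 and prints no trailing space.

```python
def integer_bars(freqs, room=76):
    """Bar length for each (word, count) using integer arithmetic only.

    For a word with count c the bar is the minimum over all entries (w, g)
    of c * (room - len(w)) // g, which equals floor(c * min((room - len(w)) / g)).
    """
    return [
        (word, min(c * (room - len(w)) // g for w, g in freqs))
        for word, c in freqs
    ]
```

With `[("she", 553), ("superlongstringstring", 481), ("said", 462)]` the long word sets the scale, and the bars come out as 63, 55 and 52 underscores.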

Note the trick of using -frequency. This removes the need to reverse the sort, since sorting -frequency in ascending order places the words with the largest frequency first. Later, in the function (?), one -frequency value is divided by another, which cancels the negation out.
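The negated-key trick carries over directly to other languages. Here is a Python sketch (`top_words` is a hypothetical name) that sorts `(-count, word)` pairs in ascending order, so the most frequent words come first and ties fall back to alphabetical order:

```python
def top_words(counts, n=22):
    """Return the n most frequent (word, count) pairs (hypothetical helper)."""
    # Ascending order of (-count, word) is descending by count,
    # with ties broken alphabetically; no reverse step is needed.
    ranked = sorted((-c, w) for w, c in counts.items())
    return [(w, -neg) for neg, w in ranked[:n]]
```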


C# - 510 451 436 446 434 426 422 characters (minified)

Not that short, but now probably correct! Note that the previous version did not show the first line of the bars, did not scale the bars correctly, downloaded the file instead of reading it from stdin, and did not include all the verbosity C# requires. You could shave a lot of strokes if C# didn't need so much extra crap. Maybe PowerShell could do better.

using C = System.Console;   // alias for Console
using System.Linq;  // for Split, GroupBy, Select, OrderBy, etc.

class Class // must define a class
{
    static void Main()  // must define a Main
    {
        // split into words
        var allwords = System.Text.RegularExpressions.Regex.Split(
                // convert stdin to lowercase
                C.In.ReadToEnd().ToLower(),
                // eliminate stopwords and non-letters
                @"(?:\b(?:the|and|of|to|a|i[tns]?|or)\b|\W)+")
            .GroupBy(x => x)    // group by words
            .OrderBy(x => -x.Count()) // sort descending by count
            .Take(22);   // take first 22 words

        // compute length of longest bar + word
        var lendivisor = allwords.Max(y => y.Count() / (76.0 - y.Key.Length));

        // prepare text to print
        var toPrint = allwords.Select(x =>
            new {
                // remember bar pseudographics (will be used in two places)
                Bar = new string('_', (int)(x.Count() / lendivisor)),
                Word = x.Key
            })
            .ToList();  // convert to list so we can index into it

        // print top of first bar
        C.WriteLine(" " + toPrint[0].Bar);
        toPrint.ForEach(x =>  // for each word, print its bar and the word
            C.WriteLine("|" + x.Bar + "| " + x.Word));
    }
}

422 characters with lendivisor inlined (which makes it 22 times slower), in the following form (newlines used for selected spaces):

using System.Linq;using C=System.Console;class M{static void Main(){var a=System.Text.RegularExpressions.Regex.Split(C.In.ReadToEnd().ToLower(),@"(?:\b(?:the|and|of|to|a|i[tns]?|or)\b|\W)+").GroupBy(x=>x).OrderBy(x=>-x.Count()).Take(22);var b=a.Select(x=>new{p=new string('_',(int)(x.Count()/a.Max(y=>y.Count()/(76d-y.Key.Length)))),t=x.Key}).ToList();C.WriteLine(" "+b[0].p);b.ForEach(x=>C.WriteLine("|"+x.p+"| "+x.t));}}


LabVIEW 51 nodes, 5 structures, 10 diagrams

Teaching the elephant to tap-dance is never pretty. I'll, ah, skip the character count.

The program flows from left to right:


Transact SQL set-based solution (SQL Server 2005) 1063 892 873 853 827 820 783 683 647 644 630 characters

Thanks to Gabe for some useful suggestions to reduce the character count.

NB: Line breaks were added to avoid scrollbars; only the last line break is required.

DECLARE @ VARCHAR(MAX),@F REAL SELECT @=BulkColumn FROM OPENROWSET(BULK'A', SINGLE_BLOB)x;WITH N AS(SELECT 1 i,LEFT(@,1)L UNION ALL SELECT i+1,SUBSTRING (@,i+1,1)FROM N WHERE i<LEN(@))SELECT i,L,i-RANK()OVER(ORDER BY i)R INTO #D FROM N WHERE L LIKE'[A-Z]'OPTION(MAXRECURSION 0)SELECT TOP 22 W,-COUNT(*)C INTO # FROM(SELECT DISTINCT R,(SELECT''+L FROM #D WHERE R=b.R FOR XML PATH (''))W FROM #D b)t WHERE LEN(W)>1 AND W NOT IN('the','and','of','to','it', 'in','or','is')GROUP BY W ORDER BY C SELECT @F=MIN(($76-LEN(W))/-C),@=' '+ REPLICATE('_',-MIN(C)*@F)+' 'FROM # SELECT @=@+'
|'+REPLICATE('_',-C*@F)+'| '+W FROM # ORDER BY C PRINT @

Readable version

DECLARE @ VARCHAR(MAX), @F REAL
SELECT @=BulkColumn FROM OPENROWSET(BULK'A',SINGLE_BLOB)x; /* Loads text file from path
                                                              C:\WINDOWS\system32\A */

/*Recursive common table expression to
generate a table of numbers from 1 to string length
(and associated characters)*/
WITH N AS
     (SELECT 1 i,
             LEFT(@,1)L

     UNION ALL

     SELECT i+1,
            SUBSTRING(@,i+1,1)
     FROM   N
     WHERE  i<LEN(@)
     )
SELECT   i,
         L,
         i-RANK()OVER(ORDER BY i)R
         /*Will group characters
         from the same word together*/
INTO     #D
FROM     N
WHERE    L LIKE'[A-Z]'OPTION(MAXRECURSION 0)
         /*Assuming case insensitive accent sensitive collation*/

SELECT   TOP 22 W,
         -COUNT(*)C
INTO     #
FROM     (SELECT DISTINCT R,
                  (SELECT ''+L
                   FROM   #D
                   WHERE  R=b.R FOR XML PATH('')
                  )W
                  /*Reconstitute the word from the characters*/
          FROM    #D b
         )
         T
WHERE    LEN(W)>1
AND      W NOT IN('the', 'and', 'of', 'to', 'it', 'in', 'or', 'is')
GROUP BY W
ORDER BY C

/*Just noticed this looks risky as it relies on the order of evaluation of the
variables. I'm not sure that's guaranteed but it works on my machine :-) */
SELECT @F=MIN(($76-LEN(W))/-C),
       @ =' ' +REPLICATE('_',-MIN(C)*@F)+' '
FROM   #

SELECT   @=@+'
|'+REPLICATE('_',-C*@F)+'| '+W
FROM     #
ORDER BY C

PRINT @
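The `i-RANK()OVER(ORDER BY i)` expression is a gaps-and-islands trick: after filtering to letter positions, a position minus its running rank stays constant within a run of consecutive letters and jumps at every gap, so it can serve as a word id. Below is a Python sketch of the same idea; `words_from_positions` is a hypothetical name, and letters are tested with `str.isalpha` on ASCII characters rather than with a collation-dependent `LIKE`.

```python
def words_from_positions(text):
    """Rebuild words from letter positions, as the T-SQL answer does (hypothetical helper)."""
    # Keep only (1-based) positions that hold ASCII letters.
    letters = [(i, ch) for i, ch in enumerate(text, start=1) if ch.isascii() and ch.isalpha()]
    groups = {}
    for rank, (i, ch) in enumerate(letters, start=1):
        # Within a run of consecutive letters, i and rank grow together, so
        # i - rank is constant; every gap makes it jump (column R in the SQL).
        groups.setdefault(i - rank, []).append(ch)
    # Dicts keep insertion order, so words come out in text order.
    return ["".join(chs) for chs in groups.values()]
```

For example, `words_from_positions("Hi, there don't")` gives `["Hi", "there", "don", "t"]`.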

Output

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| You
|____________________________________________________________| said
|_____________________________________________________| Alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|__________________________| on
|__________________________| all
|_______________________| This
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| So
|___________________| very
|__________________| what

And with the long string

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| Alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| at
|_________________________| with
|_______________________| on
|______________________| all
|____________________| This
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| So
|________________| very
|________________| what


*sh (+curl), partial solution

This is incomplete, but for the hell of it, here's the word-frequency counting half of the problem in 192 bytes:

curl -s http://www.gutenberg.org/files/11/11.txt|sed -e 's@[^a-z]@\n@gi'|tr '[:upper:]' '[:lower:]'|egrep -v '(^[^a-z]*$|\b(the|and|of|to|a|i|it|in|or|is)\b)' |sort|uniq -c|sort -n|tail -n 22
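The same counting half can be sketched in Python with `collections.Counter`. This is an illustrative translation of the pipeline and not part of the original answer. As the `sed`/`egrep` stages do, it splits on anything that is not a to z, lowercases the text, and drops ignored words and empty tokens. One difference: `most_common(22)` returns the list in descending order, whereas `sort -n|tail -n 22` prints it ascending. The names `IGNORE` and `top_words` are hypothetical.

```python
import re
from collections import Counter

IGNORE = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}

def top_words(text, n=22):
    # Split on runs of non-letters, like sed 's@[^a-z]@\n@gi' followed by tr.
    words = re.split(r"[^a-z]+", text.lower())
    # Drop empty tokens and ignored words, like the egrep -v stage.
    counts = Counter(w for w in words if w and w not in IGNORE)
    return counts.most_common(n)
```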


Perl: 203 202 201 198 195 208 203 / 231 chars

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;map{$z=$x{$_};$y||{$y=(76-y///c)/$z}&&warn" "."_"x($z*$y)."\n";printf"|%.78s\n","_"x($z*$y)."| $_"}(sort{$x{$b}<=>$x{$a}}keys%x)[0..21]

Alternate full implementation, including the indicated behaviour (global bar-squishing) for the pathological case in which the secondary word is both popular and long enough that the two combine to over 80 chars (this implementation is 231 chars):

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;@e=(sort{$x{$b}<=>$x{$a}}keys%x)[0..21];for(@e){$p=(76-y///c)/$x{$_};($y&&$p>$y)||($y=$p)}warn" "."_"x($x{$e[0]}*$y)."\n";for(@e){warn"|"."_"x($x{$_}*$y)."| $_\n"}

The specification didn't state anywhere that this had to go to STDOUT, so I used perl's warn() instead of print - four characters saved there. Used map instead of foreach, but I feel like there could still be some more savings in the split(join()). Still, got it down to 203 - might sleep on it. At least Perl's now under the "shell, grep, tr, grep, sort, uniq, sort, head, perl" char count for now ;)

PS: Reddit says "Hi" ;)

Update: Removed join() in favour of assignment and implicit scalar conversion join. Down to 202. Also please note I have taken advantage of the optional "ignore 1-letter words" rule to shave 2 characters off, so bear in mind the frequency count will reflect this.

Update 2: Swapped out assignment and implicit join for killing $/ to get the file in one gulp using <> in the first place. Same size, but nastier. Swapped out if(!$y){} for $y||{}&&, saved 1 more char => 201.

Update 3: Took control of lowercasing early (lc<>) by moving lc out of the map block, and swapped out both regexes so they no longer use the /i option, as it is no longer needed. Swapped the explicit conditional x?y:z construct for the traditional perlgolf || implicit conditional construct: /^...$/i?1:$x{$_}++ became /^...$/||$x{$_}++. Saved three characters! => 198, broke the 200 barrier. Might sleep soon... perhaps.

Update 4: Sleep deprivation has made me insane. Well. More insane. Figuring that this only has to parse normal happy text files, I made it give up if it hits a null. Saved two characters. Replaced "length" with the 1-char shorter (and much more golfish) y///c - you hear me, GolfScript?? I'm coming for you!!! sob

Update 5: Sleep dep made me forget about the 22-row limit and subsequent-line limiting. Back up to 208 with those handled. Not too bad; 13 characters to handle it isn't the end of the world. Played around with perl's regex inline eval, but I'm having trouble getting it to both work and save chars... lol. Updated the example to match current output.

Update 6: Removed unneeded braces protecting (...)for, since the syntactic candy ++ allows shoving it up against the for happily. Thanks to input from Chas. Owens (reminding my tired brain), got the character class i[tns] solution in there. Back down to 203.
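In the stop-word pattern, the character class `i[tns]` collapses the three alternatives it, in and is. The lone `.` covers both a and i, and it also implements the optional rule that drops every one-letter word, because the pattern is anchored to a whole token. The Python check below shows what the pattern rejects. Python's `re` is used only for illustration, since the answer itself is Perl, and the names `STOP` and `is_ignored` are hypothetical.

```python
import re

# Same alternation as the Perl answer, anchored to the whole token, case-insensitive.
STOP = re.compile(r"^(the|and|of|to|.|i[tns]|or)$", re.IGNORECASE)

def is_ignored(word):
    return STOP.match(word) is not None
```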

Update 7: Added a second piece of work: a full implementation of the spec, including the full bar-squishing behaviour for long secondary words instead of the truncation most people are doing (based on the original spec, which lacked the pathological example case).

Examples:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

Alternative implementation in pathological case example:

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| with
|_________________________| at
|_______________________| on
|______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| so
|________________| very
|________________| what
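The "global bar-squishing" in the full implementation picks one scale factor that satisfies every line at once. A word w with count c needs c * scale <= 76 - len(w), where 76 leaves room for the two bars and two spaces. The largest usable scale is therefore the minimum of (76 - len(w)) / c over the 22 words. The Python sketch below assumes the input is already sorted by descending count, and the `chart` helper is hypothetical.

```python
def chart(top):
    """Render (word, count) pairs, sorted by descending count, within 80 columns."""
    # The tightest line decides the scale for every bar.
    scale = min((76 - len(w)) / c for w, c in top)
    lines = [" " + "_" * int(top[0][1] * scale)]
    for w, c in top:
        lines.append("|" + "_" * int(c * scale) + "| " + w)
    return "\n".join(lines)
```

With `[("she", 10), ("superlongword", 9)]`, the second word is the limiting one (63/9 < 73/10), so its line uses the full 76 columns of bar plus word and every other bar shrinks in proportion.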


Clojure 282 strict

(let[[[_ m]:as s](->>(slurp *in*).toLowerCase(re-seq #"\w+\b(?<!\bthe|and|of|to|a|i[tns]?|or)")frequencies(sort-by val >)(take 22))[b](sort(map #(/(- 76(count(key %)))(val %))s))p #(do(print %1)(dotimes[_(* b %2)](print \_))(apply println %&))](p " " m)(doseq[[k v]s](p \| v \| k)))

Somewhat more legibly:

(let[[[_ m]:as s](->> (slurp *in*)
                      .toLowerCase
                      (re-seq #"\w+\b(?<!\bthe|and|of|to|a|i[tns]?|or)")
                      frequencies
                      (sort-by val >)
                      (take 22))
     [b] (sort (map #(/ (- 76 (count (key %)))(val %)) s))
     p #(do
          (print %1)
          (dotimes[_(* b %2)] (print \_))
          (apply println %&))]
  (p " " m)
  (doseq[[k v] s] (p \| v \| k)))
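The Clojure answer filters stop words with a negative lookbehind placed after `\w+\b`, so a word is kept only if it does not end in a stop word. Alternation binds loosest, however, which means the `\b` in `(?<!\bthe|and|...)` anchors only `the`. Every other alternative matches any word that merely ends with it, so "band" (ending in "and") and "this" (ending in "is") would be rejected as well. The Python sketch below anchors every alternative. It is an illustration, not the answer's code. Python's `re` requires fixed-width lookbehinds, so each stop word gets its own. One-letter words are excluded with `[a-z]{2,}`, following the optional rule. `\b` still uses Python's notion of word characters. The names `WORD` and `kept_words` are hypothetical.

```python
import re

# Each lookbehind is fixed-width and anchored with \b, so only whole stop words are rejected.
WORD = re.compile(
    r"\b[a-z]{2,}\b"
    r"(?<!\bthe)(?<!\band)(?<!\bof)(?<!\bto)(?<!\bi[tns])(?<!\bor)"
)

def kept_words(text):
    return WORD.findall(text.lower())
```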


Common LISP, 670 characters

I'm a LISP newbie, and this is an attempt using a hash table for counting (so probably not the most compact method).

(flet((r()(let((x(read-char t nil)))(and x(char-downcase x)))))(do((c( make-hash-table :test 'equal))(w NIL)(x(r)(r))y)((not x)(maphash(lambda (k v)(if(not(find k '("""the""and""of""to""a""i""it""in""or""is"):test 'equal))(push(cons k v)y)))c)(setf y(sort y #'> :key #'cdr))(setf y (subseq y 0(min(length y)22)))(let((f(apply #'min(mapcar(lambda(x)(/(- 76.0(length(car x)))(cdr x)))y))))(flet((o(n)(dotimes(i(floor(* n f))) (write-char #\_))))(write-char #\Space)(o(cdar y))(write-char #\Newline) (dolist(x y)(write-char #\|)(o(cdr x))(format t "| ~a~%"(car x)))))) (cond((char<= #\a x #\z)(push x w))(t(incf(gethash(concatenate 'string( reverse w))c 0))(setf w nil)))))

It can be run, for example, with: cat alice.txt | clisp -C golf.lisp

In readable form:

(flet ((r () (let ((x (read-char t nil)))
               (and x (char-downcase x)))))
  (do ((c (make-hash-table :test 'equal))  ; the word count map
       w y                                 ; current word and final word list
       (x (r) (r)))                        ; iteration over all chars
      ((not x)

       ; make a list with (word . count) pairs removing stopwords
       (maphash (lambda (k v)
                  (if (not (find k '("" "the" "and" "of" "to"
                                     "a" "i" "it" "in" "or" "is")
                                 :test 'equal))
                      (push (cons k v) y)))
                c)

       ; sort and truncate the list
       (setf y (sort y #'> :key #'cdr))
       (setf y (subseq y 0 (min (length y) 22)))

       ; find the scaling factor
       (let ((f (apply #'min
                       (mapcar (lambda (x) (/ (- 76.0 (length (car x))) (cdr x)))
                               y))))
         ; output
         (flet ((outx (n) (dotimes (i (floor (* n f))) (write-char #\_))))
           (write-char #\Space)
           (outx (cdar y))
           (write-char #\Newline)
           (dolist (x y)
             (write-char #\|)
             (outx (cdr x))
             (format t "| ~a~%" (car x))))))

    ; add alphabetic to current word, and bump word counter
    ; on non-alphabetic
    (cond
     ((char<= #\a x #\z)
      (push x w))
     (t
      (incf (gethash (concatenate 'string (reverse w)) c 0))
      (setf w nil)))))
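The Lisp version tokenizes one character at a time. Letters are pushed onto the current word, and any other character flushes that word into the hash table, even when the word is empty. This explains why "" appears in its ignore list: two non-letters in a row, such as a period followed by a space, record an empty word. The Python sketch below is a rough equivalent of that loop, and the name `count_chars_loop` is hypothetical. Unlike the Lisp loop, it also flushes the final word at end of input, which the Lisp code skips when the text does not end with a non-letter.

```python
from collections import Counter

def count_chars_loop(text):
    counts = Counter()
    word = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            word.append(ch)              # letter: extend the current word
        else:
            counts["".join(word)] += 1   # non-letter: flush, possibly an empty word
            word = []
    if word:
        counts["".join(word)] += 1       # flush the last word at end of input
    return counts
```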


Gawk -- 336 (originally 507) characters

(after fixing the output formatting; fixing the contractions thing; tweaking; tweaking again; removing a wholly unnecessary sorting step; tweaking yet again; and again (oops, this one broke the formatting); tweaking some more; taking up Matt's challenge, I desperately tweaked some more; found another place to save a few, but gave two back to fix the bar length bug)

Heh heh! I am momentarily ahead of Matt's JavaScript solution (counter-challenge!) ;) and AKX's Python.

The problem seems to call out for a language that implements native associative arrays, so of course I've chosen one with a horribly deficient set of operators on them. In particular, you cannot control the order in which awk offers up the elements of a hash map, so I repeatedly scan the whole map to find the currently most numerous item, print it and delete it from the array.

It is all terribly inefficient, and with all the golfifications I've made it has gotten pretty awful as well.
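The repeated-scan strategy described above is a selection by repeated maximum extraction: scan everything for the largest count, emit that entry, delete it, and repeat. The Python sketch below shows the idea, and the name `take_top` is hypothetical. Its cost is O(n * k) for n distinct words and k rows, which is the inefficiency the author mentions.

```python
def take_top(counts, k=22):
    remaining = dict(counts)
    result = []
    while remaining and len(result) < k:
        # Full scan for the currently most frequent word, like the awk for-in loop.
        best = max(remaining, key=remaining.get)
        # Emit it and delete it so the next scan finds the runner-up.
        result.append((best, remaining.pop(best)))
    return result
```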

Minified:

{gsub("[^a-zA-Z]"," ");for(;NF;NF--)a[tolower($NF)]++}
END{split("the and of to a i it in or is",b," ");
for(w in b)delete a[b[w]];d=1;for(w in a){e=a[w]/(78-length(w));if(e>d)d=e}
for(i=22;i;--i){e=0;for(w in a)if(a[w]>e)e=a[x=w];l=a[x]/d-2;
t=sprintf(sprintf("%%%dc",l)," ");gsub(" ","_",t);if(i==22)print" "t;
print"|"t"| "x;delete a[x]}}

line breaks for clarity only: they are not necessary and should not be counted.

Output:

$ gawk -f wordfreq.awk.min < 11.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|_________________________| on
|_________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so
$ sed 's/you/superlongstring/gI' 11.txt | gawk -f wordfreq.awk.min
 ______________________________________________________________________
|______________________________________________________________________| she
|_____________________________________________________________| superlongstring
|__________________________________________________________| said
|__________________________________________________| alice
|____________________________________________| was
|_________________________________________| that
|_________________________________| as
|______________________________| her
|___________________________| with
|___________________________| at
|__________________________| s
|__________________________| t
|________________________| on
|________________________| all
|_____________________| this
|_____________________| for
|_____________________| had
|____________________| but
|___________________| be
|___________________| not
|__________________| they
|_________________| so

Readable; 633 characters (originally 949):

{
    gsub("[^a-zA-Z]"," ");
    for(;NF;NF--) a[tolower($NF)]++
}
END{
    # remove "short" words
    split("the and of to a i it in or is",b," ");
    for (w in b) delete a[b[w]];
    # Find the bar ratio
    d=1;
    for (w in a) {
        e=a[w]/(78-length(w));
        if (e>d) d=e
    }
    # Print the entries highest count first
    for (i=22; i; --i){
        # find the highest count
        e=0;
        for (w in a)
            if (a[w]>e)
                e=a[x=w];
        # Print the bar
        l=a[x]/d-2;
        # make a string of "_" the right length
        t=sprintf(sprintf("%%%dc",l)," ");
        gsub(" ","_",t);
        if (i==22) print" "t;
        print"|"t"| "x;
        delete a[x]
    }
}


Java - 896 chars

931 chars

1233 chars made unreadable

1977 chars "uncompressed"

Update: I have aggressively reduced the character count. It omits single-letter words, per the updated spec.

I so envy C# and LINQ.

import java.util.*;import java.io.*;import static java.util.regex.Pattern.*;class g{public static void main(String[] a)throws Exception{PrintStream o=System.out;Map<String,Integer> w=new HashMap();Scanner s=new Scanner(new File(a[0])).useDelimiter(compile("[^a-z]+|\\b(the|and|of|to|.|it|in|or|is)\\b",2));while(s.hasNext()){String z=s.next().trim().toLowerCase();if(z.equals(""))continue;w.put(z,(w.get(z)==null?0:w.get(z))+1);}List<Integer> v=new Vector(w.values());Collections.sort(v);List<String> q=new Vector();int i,m;i=m=v.size()-1;while(q.size()<22){for(String t:w.keySet())if(!q.contains(t)&&w.get(t).equals(v.get(i)))q.add(t);i--;}int r=80-q.get(0).length()-4;String l=String.format("%1$0"+r+"d",0).replace("0","_");o.println(" "+l);o.println("|"+l+"| "+q.get(0)+" ");for(i=m-1;i>m-22;i--){o.println("|"+l.substring(0,(int)Math.round(r*(v.get(i)*1.0)/v.get(m)))+"| "+q.get(m-i)+" ");}}}

"Readable":

import java.util.*;
import java.io.*;
import static java.util.regex.Pattern.*;
class g {
    public static void main(String[] a)throws Exception {
        PrintStream o = System.out;
        Map<String,Integer> w = new HashMap();
        Scanner s = new Scanner(new File(a[0]))
            .useDelimiter(compile("[^a-z]+|\\b(the|and|of|to|.|it|in|or|is)\\b",2));
        while(s.hasNext()) {
            String z = s.next().trim().toLowerCase();
            if(z.equals(""))
                continue;
            w.put(z,(w.get(z) == null?0:w.get(z))+1);
        }
        List<Integer> v = new Vector(w.values());
        Collections.sort(v);
        List<String> q = new Vector();
        int i,m;
        i = m = v.size()-1;
        while(q.size()<22) {
            for(String t:w.keySet())
                if(!q.contains(t)&&w.get(t).equals(v.get(i)))
                    q.add(t);
            i--;
        }
        int r = 80-q.get(0).length()-4;
        String l = String.format("%1$0"+r+"d",0).replace("0","_");
        o.println(" "+l);
        o.println("|"+l+"| "+q.get(0)+" ");
        for(i = m-1; i > m-22; i--) {
            o.println("|"+l.substring(0,(int)Math.round(r*(v.get(i)*1.0)/v.get(m)))+"| "+q.get(m-i)+" ");
        }
    }
}

Output for Alice:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| with
|______________________________| at
|___________________________| on
|__________________________| all
|________________________| this
|________________________| for
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

Output for Don Quixote (also from Gutenberg):

 ________________________________________________________________________
|________________________________________________________________________| that
|________________________________________________________| he
|______________________________________________| for
|__________________________________________| his
|________________________________________| as
|__________________________________| with
|_________________________________| not
|_________________________________| was
|________________________________| him
|______________________________| be
|___________________________| don
|_________________________| my
|_________________________| this
|_________________________| all
|_________________________| they
|________________________| said
|_______________________| have
|_______________________| me
|______________________| on
|______________________| so
|_____________________| you
|_____________________| quixote


Java - 886 865 756 744 742 744 752 742 714 680 chars

  • Updates before first 742 : improved regex, removed superfluous parameterized types, removed superfluous whitespace.

  • Update 742 > 744 chars: fixed the fixed-length hack. It's only dependent on the 1st word, not other words (yet). Found several places to shorten the code (\\s in regex replaced by and ArrayList replaced by Vector). I'm now looking for a short way to remove the Commons IO dependency and read from stdin.

  • Update 744 > 752 chars : I removed the commons dependency. It now reads from stdin. Paste the text in stdin and hit Ctrl+Z to get result.

  • Update 752 > 742 chars: I removed public and a space, made the class name 1 char instead of 2, and it's now ignoring one-letter words.

  • Update 742 > 714 chars : Updated as per comments of Carl: removed redundant assignment (742 > 730), replaced m.containsKey(k) by m.get(k)!=null (730 > 728), introduced substringing of line (728 > 714).

  • Update 714 > 680 chars: Updated as per comments of Rotsor: improved bar size calculation to remove unnecessary casting and improved split() to remove unnecessary replaceAll().

import java.util.*;class F{public static void main(String[]a)throws Exception{StringBuffer b=new StringBuffer();for(int c;(c=System.in.read())>0;b.append((char)c));final Map<String,Integer>m=new HashMap();for(String w:b.toString().toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(w,m.get(w)!=null?m.get(w)+1:1);List<String>l=new Vector(m.keySet());Collections.sort(l,new Comparator(){public int compare(Object l,Object r){return m.get(r)-m.get(l);}});int c=76-l.get(0).length();String s=new String(new char[c]).replace('\0','_');System.out.println(" "+s);for(String w:l.subList(0,22))System.out.println("|"+s.substring(0,m.get(w)*c/m.get(l.get(0)))+"| "+w);}}

More readable version:

import java.util.*;
class F{
    public static void main(String[]a)throws Exception{
        StringBuffer b=new StringBuffer();for(int c;(c=System.in.read())>0;b.append((char)c));
        final Map<String,Integer>m=new HashMap();for(String w:b.toString().toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(w,m.get(w)!=null?m.get(w)+1:1);
        List<String>l=new Vector(m.keySet());Collections.sort(l,new Comparator(){public int compare(Object l,Object r){return m.get(r)-m.get(l);}});
        int c=76-l.get(0).length();String s=new String(new char[c]).replace('\0','_');System.out.println(" "+s);
        for(String w:l.subList(0,22))System.out.println("|"+s.substring(0,m.get(w)*c/m.get(l.get(0)))+"| "+w);
    }
}

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

It kind of sucks that Java doesn't have String#join() and closures (yet).
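This version does without a separate filtering step. The regex in `split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+")` treats runs of non-word characters and whole ignored words as one delimiter, so only the remaining words come back. The Python sketch below reproduces that split for illustration. The groups are made non-capturing because Python's `re.split` would otherwise return the captured delimiters as well. Empty strings appear when the text starts or ends with a delimiter, and the sketch skips them. The names `DELIM` and `count_by_split` are hypothetical.

```python
import re
from collections import Counter

# Runs of non-word characters or whole ignored words act as a single delimiter.
DELIM = re.compile(r"(?:\b(?:.|the|and|of|to|i[tns]|or)\b|\W)+")

def count_by_split(text):
    # Skip the empty strings produced by a leading or trailing delimiter.
    return Counter(w for w in DELIM.split(text.lower()) if w)
```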

Edit by Rotsor:

I have made several changes to your solution:

  • Replaced List with a String[]
  • Reused the 'args' argument instead of declaring my own String array. Also used it as an argument to .toArray()
  • Replaced StringBuffer with a String (yes, yes, terrible performance)
  • Replaced Java sorting with a selection-sort with early halting (only first 22 elements have to be found)
  • Aggregated some int declaration into a single statement
  • Implemented the non-cheating algorithm finding the most limiting line of output. Implemented it without FP.
  • Fixed the problem of the program crashing when there were fewer than 22 distinct words in the text
  • Implemented a new algorithm of reading input, which is fast and only 9 characters longer than the slow one.
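The FP-free search for the most limiting line compares candidate ratios (76 - len(w)) / count by cross-multiplication. When the denominators are positive, x/y < i/j holds exactly when x*j < i*y, which is what the Java test `y*i>x*j` checks. The tightest line can therefore be found with integers only, and each bar is then count*i/j under integer division. The Python sketch below assumes every word is shorter than 76 characters. The names `limiting_ratio` and `bar_lengths` are hypothetical, and the Java code uses the variables i, j, x and y for the same roles.

```python
def limiting_ratio(top):
    """Return (num, den) of min((76 - len(w)) / c) using only integer arithmetic."""
    num, den = None, None
    for w, c in top:
        x, y = 76 - len(w), c
        # x/y < num/den  <=>  x*den < num*y  (all denominators positive)
        if num is None or x * den < num * y:
            num, den = x, y
    return num, den

def bar_lengths(top):
    num, den = limiting_ratio(top)
    # Integer division, like m.get(w)*i/j in the Java code.
    return [c * num // den for _, c in top]
```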

The condensed code is 688 711 684 characters long:

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,x,y,g=22;for(;(j=System.in.read())>0;w+=(char)j);for(String W:w.toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(W,m.get(W)!=null?m.get(W)+1:1);l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}}

The fast version (720 693 characters):

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,x,y,g=22;for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";}}l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}}

More readable version:

import java.util.*;class F{public static void main(String[]l)throws Exception{
    Map<String,Integer>m=new HashMap();String w="";
    int i=0,k=0,j=8,x,y,g=22;
    for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{
        if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";
    }}
    l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;
    for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}
    for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}
    String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');
    System.out.println(" "+s);
    for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}
}

The version without behaviour improvements is 615 characters:

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,g=22;for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";}}l=m.keySet().toArray(l);for(;i<g;++i)for(j=i;++j<l.length;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}i=76-l[0].length();String s=new String(new char[i]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/m.get(l[0]))+"| "+w);}}}


JavaScript 1.8 (SpiderMonkey) - 354

x={};p='|';e=' ';z=[];c=77
while(l=readline())l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y)x[y]?x[y].c++:z.push(x[y]={w:y,c:1}))
z=z.sort(function(a,b)b.c-a.c).slice(0,22)
for each(v in z){v.r=v.c/z[0].c
c=c>(l=(77-v.w.length)/v.r)?l:c}for(k in z){v=z[k]
s=Array(v.r*c|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

Sadly, the for([k,v]in z) from the Rhino version doesn't seem to want to work in SpiderMonkey, and readFile() is a little easier than using readline(), but moving up to 1.8 allows us to use expression closures (function(a,b)b.c-a.c) to cut a few more lines....

Adding whitespace for readability:

x={};p='|';e=' ';z=[];c=77
while(l=readline())
    l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,
        function(y) x[y] ? x[y].c++ : z.push( x[y] = {w: y, c: 1} )
    )
z=z.sort(function(a,b) b.c - a.c).slice(0,22)
for each(v in z){
    v.r=v.c/z[0].c
    c=c>(l=(77-v.w.length)/v.r)?l:c
}
for(k in z){
    v=z[k]
    s=Array(v.r*c|0).join('_')
    if(!+k)print(e+s+e)
    print(p+s+p+e+v.w)
}

Usage: js golf.js < input.txt

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|____________________________| s
|____________________________| t
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| for
|______________________| had
|______________________| but
|_____________________| be
|_____________________| not
|___________________| they
|___________________| so

(base version - doesn't handle bar widths correctly)

JavaScript (Rhino) - 405 395 387 377 368 343 304 chars

I think my sorting logic is off, but... I dunno. Brainfart fixed.

Minified (abusing newlines being interpreted as ; in some places):

x={};p='|';e=' ';z=[]
readFile(arguments[0]).toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y){x[y]?x[y].c++:z.push(x[y]={w:y,c:1})})
z=z.sort(function(a,b){return b.c-a.c}).slice(0,22)
for([k,v]in z){s=Array((v.c/z[0].c)*70|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}


PHP CLI version (450 chars)

This solution takes into account the last requirement, which most purists have conveniently chosen to ignore. That cost 170 characters!

Usage: php.exe <this.php> <file.txt>

Minified:

<?php $a=array_count_values(array_filter(preg_split('/[^a-z]/',strtolower(file_get_contents($argv[1])),-1,1),function($x){return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);}));arsort($a);$a=array_slice($a,0,22);function R($a,$F,$B){$r=array();foreach($a as$x=>$f){$l=strlen($x);$r[$x]=$b=$f*$B/$F;if($l+$b>76)return R($a,$f,76-$l);}return$r;}$c=R($a,max($a),76-strlen(key($a)));foreach($a as$x=>$f)echo '|',str_repeat('-',$c[$x]),"| $x\n";?>

Human readable:

<?php
// Read:
$s = strtolower(file_get_contents($argv[1]));

// Split:
$a = preg_split('/[^a-z]/', $s, -1, PREG_SPLIT_NO_EMPTY);

// Remove unwanted words:
$a = array_filter($a, function($x){
    return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);
});

// Count:
$a = array_count_values($a);

// Sort:
arsort($a);

// Pick top 22:
$a=array_slice($a,0,22);

// Recursive function to adjust bar widths
// according to the last requirement:
function R($a,$F,$B){
    $r = array();
    foreach($a as $x=>$f){
        $l = strlen($x);
        $r[$x] = $b = $f * $B / $F;
        if ( $l + $b > 76 )
            return R($a,$f,76-$l);
    }
    return $r;
}

// Apply the function:
$c = R($a,max($a),76-strlen(key($a)));

// Output:
foreach ($a as $x => $f)
    echo '|',str_repeat('-',$c[$x]),"| $x\n";

?>

Output:

|-------------------------------------------------------------------------| she
|---------------------------------------------------------------| you
|------------------------------------------------------------| said
|-----------------------------------------------------| alice
|-----------------------------------------------| was
|-------------------------------------------| that
|------------------------------------| as
|--------------------------------| her
|-----------------------------| at
|-----------------------------| with
|--------------------------| on
|--------------------------| all
|-----------------------| this
|-----------------------| for
|-----------------------| had
|-----------------------| but
|----------------------| be
|---------------------| not
|--------------------| they
|--------------------| so
|-------------------| very
|------------------| what

When there is a long word, the bars are adjusted properly:

|--------------------------------------------------------| she
|---------------------------------------------------| thisisareallylongwordhere
|-------------------------------------------------| you
|-----------------------------------------------| said
|-----------------------------------------| alice
|------------------------------------| was
|---------------------------------| that
|---------------------------| as
|-------------------------| her
|-----------------------| with
|-----------------------| at
|--------------------| on
|--------------------| all
|------------------| this
|------------------| for
|------------------| had
|-----------------| but
|-----------------| be
|----------------| not
|---------------| they
|---------------| so
|--------------| very
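The recursive `R` in the PHP answer first scales every bar against the most frequent word. Whenever some line would overflow (word length plus bar width over 76), it starts over, using that word's count and remaining room as the new reference. Below is a Python transcription of that function, kept recursive as in the original. The name `fit_bars` is hypothetical, and the function returns fractional widths, which PHP's `str_repeat` later truncates.

```python
def fit_bars(counts, ref_count, room):
    """counts: list of (word, count) pairs sorted by descending count."""
    widths = {}
    for word, count in counts:
        width = count * room / ref_count
        widths[word] = width
        if len(word) + width > 76:
            # This line overflows: make it the new reference and start over.
            return fit_bars(counts, count, 76 - len(word))
    return widths
```

The initial call mirrors `R($a,max($a),76-strlen(key($a)))`, which is `fit_bars(counts, counts[0][1], 76 - len(counts[0][0]))`.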


Perl, 185 chars

200 (slightly broken) 199 197 195 193 187 185 characters. Last two newlines are significant. Complies with the spec.

map$X{+lc}+=!/^(.|the|and|to|i[nst]|o[rf])$/i,/[a-z]+/gfor<>;
$n=$n>($:=$X{$_}/(76-y+++c))?$n:$:for@w=(sort{$X{$b}-$X{$a}}%X)[0..21];
die map{$U='_'x($X{$_}/$n);" $U
"x!$z++,"|$U| $_
"}@w

The first line loads counts of valid words into %X.

The second line computes minimum scaling factor so that all output lines will be <= 80 characters.

The third line (contains two newline characters) produces the output.


Python 2.6, 347 chars

import re
W,x={},"a and i in is it of or the to".split()
[W.__setitem__(w,W.get(w,0)-1)for w in re.findall("[a-z]+",file("11.txt").read().lower())if w not in x]
W=sorted(W.items(),key=lambda p:p[1])[:22]
bm=(76.-len(W[0][0]))/W[0][1]
U=lambda n:"_"*int(n*bm)
print "".join(("%s\n|%s| %s "%((""if i else" "+U(n)),U(n),w))for i,(w,n)in enumerate(W))

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|____________________________| s
|____________________________| t
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so


Python 2.x, latitudinarian approach = 227 183 chars

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)if w not in'andithetoforinis')[:22]
for l,w in r:print(78-len(r[0][1]))*l/r[0][0]*'=',w

Allowing for freedom in the implementation, I constructed a string concatenation that contains all the words requested for exclusion (the, and, of, to, a, i, it, in, or, is). It also excludes the two infamous "words" s and t from the example, and I threw in the exclusion of an, for and he for free. I tried all concatenations of those words against a corpus of the words from Alice, the King James Bible and the Jargon File to see if any words would be mis-excluded by the string. That is how I ended up with two exclusion strings: itheandtoforinis and andithetoforinis.

P.S. I borrowed some ideas from other solutions to shorten the code.

=========================================================================== she
================================================================= you
============================================================== said
====================================================== alice
================================================ was
============================================ that
===================================== as
================================= her
============================== at
============================== with
=========================== on
=========================== all
======================== this
======================== had
======================= but
====================== be
====================== not
===================== they
==================== so
=================== very
=================== what
================= little
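Because `w not in'andithetoforinis'` is a substring test rather than a list lookup, any word that happens to occur inside the concatenated string is dropped, not only the ten listed ones. A minimal Python 3 sketch for checking which extra words a candidate string would exclude from a given vocabulary (the helper names `dropped` and `collateral` are made up for illustration):

```python
# Concatenated ignore string from the answer, and the official ignore list.
EXCLUDE = "andithetoforinis"
IGNORE = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}


def dropped(words, exclude=EXCLUDE):
    """Words removed by the golfed substring test `w in exclude`."""
    return [w for w in words if w in exclude]


def collateral(words, exclude=EXCLUDE, ignore=IGNORE):
    """Words removed by the substring test that are NOT on the official list."""
    return sorted(set(dropped(words, exclude)) - ignore)


if __name__ == "__main__":
    vocabulary = ["she", "said", "alice", "an", "for", "he", "s", "t"]
    print(collateral(vocabulary))
    print(collateral(vocabulary, exclude="itheandtoforinis"))
```

A side effect worth knowing: `'' in s` is true for every string, so the empty strings that `re.split` produces when the text starts or ends with a non-word character are filtered out as well.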

Rant

Regarding the words to ignore, one would think they would be taken from a list of the most commonly used words in English. That list depends on the text corpus used. Per one of the most popular lists (http://en.wikipedia.org/wiki/Most_common_words_in_English, http://www.english-for-students.com/Frequently-Used-Words.html, http://www.sporcle.com/games/common_english_words.php), the top 10 words are: the, be (am/are/is/was/were), to, of, and, a, in, that, have, I.

The top 10 words in the Alice in Wonderland text are: the, and, to, a, of, it, she, i, you, said.
The top 10 words in the Jargon File (v4.4.7) are: the, a, of, to, and, in, is, that, or, for.

So the question is why "or" was included in the problem's ignore list, when it is only about 30th in popularity, while "that" (the 8th most used) is not. And so on. Hence I believe the ignore list should be provided dynamically (or could be omitted).

An alternative would be simply to skip the top 10 words of the result, which would actually shorten the solution (trivially: only the 11th to 32nd entries need to be shown).
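With the counts in a `collections.Counter`, skipping the top entries is just a slice of `most_common`. A minimal Python 3 sketch of that idea (the function name and its `skip`/`show` parameters are hypothetical):

```python
from collections import Counter


def entries_after_top(counts, skip=10, show=22):
    """Drop the `skip` most common entries and return the next `show` ones.

    With the defaults this yields the 11th to 32nd most common words.
    """
    return counts.most_common(skip + show)[skip:]


if __name__ == "__main__":
    counts = Counter("a a a a b b b c c d".split())
    print(entries_after_top(counts, skip=2, show=2))
```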

Python 2.x, punctilious approach = 243 chars (down from 277)

The chart drawn by the code above is simplified (it uses only one character for the bars). If one wants to reproduce the chart from the problem description exactly (which was not required), this code will do it:

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)-set(sys.argv))[:22]
h=min(9*l/(77-len(w))for l,w in r)
print'',9*r[0][0]/h*'_'
for l,w in r:print'|'+9*l/h*'_'+'|',w

I take issue with the somewhat arbitrary choice of the 10 words to exclude (the, and, of, to, a, i, it, in, or, is), so those are passed as command-line parameters, like so:
python WordFrequencyChart.py the and of to a i it in or is <"Alice's Adventures in Wonderland.txt"

This is 213 chars, plus 30 if we count the "original" ignore list passed on the command line, for 243 in total.

P.S. The second version also adjusts for the lengths of all the top words, so none of them will overflow in the degenerate case.

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|_____________________________________________________| said
|______________________________________________| alice
|_________________________________________| was
|______________________________________| that
|_______________________________| as
|____________________________| her
|__________________________| at
|__________________________| with
|_________________________| s
|_________________________| t
|_______________________| on
|_______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|___________________| not
|_________________| they
|_________________| so
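The long-word adjustment amounts to choosing the scale factor from the most constraining row rather than from the top word alone. A minimal Python 3 sketch, assuming the `|bar| word` layout shown above and an 80-column limit on the whole line (`fit_bars` and `render` are illustrative names; `fractions.Fraction` is used only to avoid floating-point rounding when truncating bar widths):

```python
from fractions import Fraction


def fit_bars(rows, width=80):
    """rows: (word, count) pairs, most frequent first.

    A line is "|" + bar + "| " + word, so a bar may use at most
    width - 3 - len(word) characters. Taking the minimum ratio over all
    rows means no row overflows, even a long word further down the list.
    """
    scale = min(Fraction(width - 3 - len(w), c) for w, c in rows)
    return [(w, int(c * scale)) for w, c in rows]


def render(rows, width=80):
    bars = fit_bars(rows, width)
    lines = [" " + "_" * bars[0][1]]  # top edge matches the longest bar
    lines += ["|" + "_" * n + "| " + w for w, n in bars]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render([("she", 10), ("superlongstringstring", 9), ("said", 8)]))
```

Here the second row, not the first, limits the scale, which is exactly the degenerate case the P.S. describes.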


Python 3.1 - 229 characters (down from 245)

I guess using Counter is kind of cheating :) I read about it only about a week ago, so this was the perfect chance to see how it works.

import re,collections
o=collections.Counter([w for w in re.findall("[a-z]+",open("!").read().lower())if w not in"a and i in is it of or the to".split()]).most_common(22)
print('\n'.join('|'+76*v//o[0][1]*'_'+'| '+k for k,v in o))

Prints out:

|____________________________________________________________________________| she
|__________________________________________________________________| you
|_______________________________________________________________| said
|_______________________________________________________| alice
|_________________________________________________| was
|_____________________________________________| that
|_____________________________________| as
|__________________________________| her
|_______________________________| with
|_______________________________| at
|______________________________| s
|_____________________________| t
|____________________________| on
|___________________________| all
|________________________| this
|________________________| for
|________________________| had
|________________________| but
|______________________| be
|______________________| not
|_____________________| they
|____________________| so

Some of the code was "borrowed" from AKX's solution.
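For readers new to `collections.Counter`: it counts the items of any iterable, and `most_common(n)` returns the `n` highest `(item, count)` pairs, most frequent first. A minimal, ungolfed Python 3 sketch of the counting step used above (the names `IGNORE` and `top_words` are illustrative):

```python
import re
from collections import Counter

IGNORE = set("a and i in is it of or the to".split())


def top_words(text, n=22):
    """Count runs of a-z in lowercased text, skip ignored words, return the n most common."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in IGNORE).most_common(n)


if __name__ == "__main__":
    print(top_words("She said she was sure; she said so.", 2))
```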


Scala 2.8, 311 characters (earlier versions: 314, 320, 330, 332, 336, 341, 375)

Includes long-word adjustment. Ideas borrowed from the other solutions.

Now as a script (a.scala):

val t="\\w+\\b(?<!\\bthe|and|of|to|a|i[tns]?|or)".r.findAllIn(io.Source.fromFile(argv(0)).mkString.toLowerCase).toSeq.groupBy(w=>w).mapValues(_.size).toSeq.sortBy(-_._2)take 22
def b(p:Int)="_"*(p*(for((w,c)<-t)yield(76.0-w.size)/c).min).toInt
println(" "+b(t(0)._2))
for(p<-t)printf("|%s| %s \n",b(p._2),p._1)

Run with

scala -howtorun:script a.scala alice.txt

BTW, the edit from 314 to 311 characters actually removes only 1 character. Someone got the counting wrong before (Windows CRs?).
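If a version was measured from a file saved with Windows line endings, every line break contributes an extra `\r` to a naive character count. A minimal Python 3 sketch of a counter that normalises CRLF to LF first, assuming the convention that each line break counts as one character (`golf_length` is a made-up name):

```python
def golf_length(source: str) -> int:
    """Character count with each CRLF line break counted as a single newline."""
    return len(source.replace("\r\n", "\n"))


if __name__ == "__main__":
    windows_saved = "val a=1\r\nval b=2\r\nprintln(a+b)"
    # The naive length charges one extra character per Windows line break.
    print(len(windows_saved), golf_length(windows_saved))
```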


Scala, 368 chars

First, a legible version in 592 characters:

object Alice {
  def main(args:Array[String]) {
    val s = io.Source.fromFile(args(0))
    val words = s.getLines.flatMap("(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(_)).map(_.toLowerCase)

    val freqs = words.foldLeft(Map[String, Int]())((countmap, word) => countmap + (word -> (countmap.getOrElse(word, 0)+1)))

    val sortedFreqs = freqs.toList.sort((a, b) => a._2 > b._2)

    val top22 = sortedFreqs.take(22)
    val highestWord = top22.head._1
    val highestCount = top22.head._2
    val widest = 76 - highestWord.length

    println(" " + "_" * widest)
    top22.foreach(t => {
      val width = Math.round((t._2 * 1.0 / highestCount) * widest).toInt
      println("|" + "_" * width + "| " + t._1)
    })
  }
}

The console output looks like this:

$ scalac alice.scala
$ scala Alice aliceinwonderland.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| at
|______________________________| with
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

We can do some aggressive minifying and get it down to 415 characters:

object A{def main(args:Array[String]){val l=io.Source.fromFile(args(0)).getLines.flatMap("(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(_)).map(_.toLowerCase).foldLeft(Map[String, Int]())((c,w)=>c+(w->(c.getOrElse(w,0)+1))).toList.sort((a,b)=>a._2>b._2).take(22);println(" "+"_"*(76-l.head._1.length));l.foreach(t=>println("|"+"_"*Math.round((t._2*1.0/l.head._2)*(76-l.head._1.length)).toInt+"| "+t._1))}}

The console session looks like this:

$ scalac a.scala
$ scala A aliceinwonderland.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| at
|______________________________| with
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what
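In the two outputs above, "this" and "for" are missing although they appear in the other charts. A plausible explanation: the negative look-behind is checked against the text just before the end of each match, and only its first alternative is anchored with `\b`, so any word that merely ends in "is", "or", "it", "in", "a", and so on is rejected too. A minimal Python 3 sketch that reproduces this behaviour and compares it with a variant anchoring every ignored word (the function names are made up; Python's `re` only accepts fixed-width look-behinds, so the alternation is split into one look-behind per word, since `(?<!X|Y)` rejects exactly what `(?<!X)(?<!Y)` rejects):

```python
import re

IGNORED = ["the", "and", "of", "to", "a", "i", "it", "in", "or", "is"]


def lookbehind_regex(anchor_every_word):
    parts = []
    for w in IGNORED:
        # The Scala pattern anchors only "the" with \b; optionally anchor them all.
        anchor = r"\b" if anchor_every_word or w == "the" else ""
        parts.append(r"(?<!" + anchor + w + ")")
    return re.compile(r"\w+\b" + "".join(parts), re.IGNORECASE)


def words(text, anchor_every_word=False):
    """Lowercased matches of the look-behind word regex."""
    return [m.lower() for m in lookbehind_regex(anchor_every_word).findall(text)]


if __name__ == "__main__":
    print(words("This is for the Queen"))                          # Scala-style alternation
    print(words("This is for the Queen", anchor_every_word=True))  # every word anchored
```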

I'm sure a Scala expert could do even better.

Update: In the comments, Thomas gave an even shorter version, at 368 characters:

object A{def main(a:Array[String]){val t=(Map[String, Int]()/:(for(x<-io.Source.fromFile(a(0)).getLines;y<-"(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r findAllIn x) yield y.toLowerCase).toList)((c,x)=>c+(x->(c.getOrElse(x,0)+1))).toList.sortBy(_._2).reverse.take(22);val w=76-t.head._1.length;print(" "+"_"*w);t map (s=>"\n|"+"_"*(s._2*w/t.head._2)+"| "+s._1) foreach print}}

Legibly, at 375 characters:

object Alice {
  def main(a:Array[String]) {
    val t = (Map[String, Int]() /: (
      for (
        x <- io.Source.fromFile(a(0)).getLines;
        y <- "(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(x)
      ) yield y.toLowerCase
    ).toList)((c, x) => c + (x -> (c.getOrElse(x, 0) + 1))).toList.sortBy(_._2).reverse.take(22)
    val w = 76 - t.head._1.length
    print (" "+"_"*w)
    t.map(s => "\n|" + "_" * (s._2 * w / t.head._2) + "| " + s._1).foreach(print)
  }
}


Perl, 189 characters (down from 205 and 191) / 205 characters (fully implemented)

Some parts were inspired by the earlier Perl/Ruby submissions, a couple of similar ideas were arrived at independently, and the others are original. The shorter version also incorporates some things I saw or learned from other submissions.

Original:

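$k{$_}++for grep{$_!~/^(the|and|of|to|a|i|it|in|or|is)$/}map{lc=~/[a-z]+/g}<>;@t=sort{$k{$b}<=>$k{$a}}keys%k;$l=76-length$t[0];printf" %s
",'_'x$l;printf"|%s| $_
",'_'x int$k{$_}/$k{$t[0]}*$l for@t[0..21];

(The line breaks inside the quoted strings are literal newline characters; each counts as one character, versus two for \n.)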

Latest version down to 191 characters:

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;@e=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";$r=(76-y///c)/$k{$_=$e[0]};map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
"}@e[0,0..21]

Latest version down to 189 characters:

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;@_=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";$r=(76-m//)/$k{$_=$_[0]};map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
"}@_[0,0..21]

This version (205 chars) also accounts for words further down the list that are longer than the top word, so their lines do not overflow.

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;($r)=sort{$a<=>$b}map{(76-y///c)/$k{$_}}@e=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
";}@e[0,0..21]