
How do I make a histogram from a list of strings in Python?

I have a list of strings:

    a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

I want to make a histogram to display the frequency distribution of the letters. I can build a list that contains the count of each letter using the following code:

    from itertools import groupby

    b = [len(list(group)) for key, group in groupby(a)]

How do I make the histogram? The list a may contain a million such elements.

Here is a concise all-pandas approach:

    import pandas as pd

    a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
    pd.Series(a).value_counts().plot(kind='bar')

As @notconfusing pointed out above, this can be solved with Pandas and Counter. If for some reason you need to avoid Pandas, you can get by with matplotlib alone, using the function in the following code:

    from collections import Counter

    import matplotlib.pyplot as plt
    import numpy as np

    a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
    letter_counts = Counter(a)


    def plot_bar_from_counter(counter, ax=None):
        """
        Create a bar plot from a counter.

        :param counter: a Counter object, a dictionary with the item as the key
            and the frequency as the value
        :param ax: a matplotlib axis
        :return: the axis with the bar plot drawn on it
        """
        if ax is None:
            fig = plt.figure()
            ax = fig.add_subplot(111)

        # Materialize the dict views as lists so they also work on Python 3.
        frequencies = list(counter.values())
        names = list(counter.keys())

        x_coordinates = np.arange(len(counter))
        ax.bar(x_coordinates, frequencies, align='center')

        # Put one tick under each bar and label it with the item name.
        ax.xaxis.set_major_locator(plt.FixedLocator(x_coordinates))
        ax.xaxis.set_major_formatter(plt.FixedFormatter(names))

        return ax


    plot_bar_from_counter(letter_counts)
    plt.show()

This produces a bar chart with one bar per letter.

Take a look at matplotlib.pyplot.bar. There is also numpy.histogram, which is more flexible if you want wider bins.

Instead of using groupby() (which requires your input to be sorted), use collections.Counter(); it does not have to create intermediate lists just to count entries:

    from collections import Counter

    counts = Counter(a)
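To make the sorting requirement concrete, here is a small sketch comparing the two approaches on unsorted input. The shuffled list `letters` is a made-up example holding the same letters as the question's list; it is not from the original post.

```python
from collections import Counter
from itertools import groupby

# Hypothetical unsorted input: the question's letters, shuffled by hand.
letters = ['e', 'a', 'c', 'a', 'e', 'b', 'a', 'd', 'c', 'e', 'a', 'b', 'e', 'c', 'e']

# groupby() only merges *adjacent* equal items, so unsorted input yields
# one group per run instead of one group per letter.
run_lengths = [(key, len(list(group))) for key, group in groupby(letters)]

# Sorting first gives one group per letter, at the cost of an O(n log n) sort.
sorted_counts = {key: len(list(group)) for key, group in groupby(sorted(letters))}

# Counter tallies in a single pass and does not depend on the order.
counts = Counter(letters)

print(len(run_lengths))        # number of runs, not number of distinct letters
print(sorted_counts)
print(dict(counts))
```

With a million elements, skipping the sort and the intermediate lists is the main reason to prefer Counter here.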

You haven't really specified what you consider a "histogram". Let's assume you wanted to do this in the terminal:

    width = 120  # Adjust to desired width
    longest_key = max(len(key) for key in counts)
    graph_width = width - longest_key - 2
    widest = counts.most_common(1)[0][1]
    scale = graph_width / float(widest)

    for key, size in sorted(counts.items()):
        print('{}: {}'.format(key, int(size * scale) * '*'))

Demo:

    >>> from collections import Counter
    >>> a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
    >>> counts = Counter(a)
    >>> width = 120  # Adjust to desired width
    >>> longest_key = max(len(key) for key in counts)
    >>> graph_width = width - longest_key - 2
    >>> widest = counts.most_common(1)[0][1]
    >>> scale = graph_width / float(widest)
    >>> for key, size in sorted(counts.items()):
    ...     print('{}: {}'.format(key, int(size * scale) * '*'))
    ...
    a: *********************************************************************************************
    b: **********************************************
    c: **********************************************************************
    d: ***********************
    e: *********************************************************************************************************************
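The same scaling logic can be wrapped in a reusable function. This is only a sketch: `render_bars` and its `width` parameter are illustrative names, not part of any library, and it assumes Python 3 division and a non-empty counter. It also pads the keys so that bars line up when keys have different lengths, which the snippet above does not do.

```python
from collections import Counter


def render_bars(counts, width=40):
    """Return one text line per key, scaled so the largest count fills the width.

    Hypothetical helper for illustration; assumes `counts` is non-empty.
    """
    longest_key = max(len(str(key)) for key in counts)
    graph_width = width - longest_key - 2  # leave room for "key: "
    widest = max(counts.values())
    scale = graph_width / widest

    lines = []
    for key, size in sorted(counts.items()):
        # Pad the key so every bar starts in the same column.
        label = str(key).ljust(longest_key)
        lines.append('{}: {}'.format(label, '*' * int(size * scale)))
    return lines


a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
lines = render_bars(Counter(a), width=23)
for line in lines:
    print(line)
```

Returning the lines instead of printing them directly makes the output easy to check or redirect to a file.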

More sophisticated tools are available in numpy.histogram() and matplotlib.pyplot.hist(). Both do the tallying for you, and matplotlib.pyplot.hist() also gives you graphical output.

A simple and effective way to make a character histogram in Python:

    from collections import Counter

    import matplotlib.pyplot as plt
    import numpy as np

    filename = input("Enter file name: ")

    # Count every character in the file, one line at a time.
    d = dict()
    with open(filename, 'r') as f:
        for line in f:
            for letter in line:
                if letter not in d:
                    d[letter] = 1
                else:
                    d[letter] += 1

    num = Counter(d)
    x = list(num.values())  # bar heights
    y = list(num.keys())    # bar labels

    x_coordinates = np.arange(len(num.keys()))
    plt.bar(x_coordinates, x)
    plt.xticks(x_coordinates, y)
    plt.show()
    print(x, y)
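The manual dictionary loop can be shortened by letting Counter do the tallying. The sketch below makes two assumptions that differ from the code above: `count_characters` is a hypothetical helper name, and whitespace (including newlines) is skipped rather than counted. An `io.StringIO` object stands in for an open text file so the example runs without any file on disk.

```python
import io
from collections import Counter


def count_characters(lines):
    """Tally every non-whitespace character in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Counter.update() accepts any iterable and adds one per item.
        counts.update(ch for ch in line if not ch.isspace())
    return counts


# io.StringIO behaves like an open text file for iteration purposes.
sample = io.StringIO("abba\ncab\n")
char_counts = count_characters(sample)
print(sorted(char_counts.items()))
```

The resulting Counter can be passed to the plotting code unchanged, since it exposes the same keys() and values().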

Very easy with Pandas.

    import pandas
    from collections import Counter

    a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
    letter_counts = Counter(a)
    df = pandas.DataFrame.from_dict(letter_counts, orient='index')
    df.plot(kind='bar')

Note that Counter is already doing the frequency count, so our plot type is 'bar', not 'hist'.
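One way to see why a bar chart fits: after counting, the data is already one height per label, so there is nothing left for a histogram function to bin. A minimal sketch using only the standard library (the variable names `labels`, `heights`, and `by_frequency` are illustrative):

```python
from collections import Counter

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)

# Alphabetical order: split the (label, count) pairs into two parallel tuples,
# which is the shape a bar-plotting function expects.
labels, heights = zip(*sorted(letter_counts.items()))

# Alternatively, order the bars from most to least frequent.
by_frequency = letter_counts.most_common()

print(labels)
print(heights)
print(by_frequency)
```

Either ordering can be handed to a bar-plotting call as x labels and bar heights.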