
A weighted version of random.choice

Arrange the weights into a cumulative distribution.
Use random.random() to pick a random float 0.0 <= x < total.
Search the distribution using bisect.bisect, as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples .

    from random import random
    from bisect import bisect

    def weighted_choice(choices):
        values, weights = zip(*choices)
        total = 0
        cum_weights = []
        for w in weights:
            total += w
            cum_weights.append(total)
        x = random() * total
        i = bisect(cum_weights, x)
        return values[i]

    >>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
    'WHITE'

If you need to make more than one choice, split this into two functions: one to build the cumulative weights and another to bisect to a random point.
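A minimal sketch of that two-function split might look like this. The names build_cum_weights and pick are hypothetical (they do not come from the answer above), and itertools.accumulate is used here in place of the explicit running total.

```python
import random
from bisect import bisect
from itertools import accumulate


def build_cum_weights(choices):
    """Return (values, cumulative_weights) for an iterable of (value, weight) pairs."""
    values, weights = zip(*choices)
    return values, list(accumulate(weights))


def pick(values, cum_weights):
    """Draw one value by bisecting the cumulative weights at a random point."""
    x = random.random() * cum_weights[-1]  # 0.0 <= x < total
    return values[bisect(cum_weights, x)]


# Build the cumulative weights once, then draw as often as needed.
values, cum_weights = build_cum_weights([("WHITE", 90), ("RED", 8), ("GREEN", 2)])
draws = [pick(values, cum_weights) for _ in range(1000)]
```

The O(n) accumulation then happens once, and each later draw costs only one O(log n) bisection.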

I needed to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with:

    import random

    def weightedChoice(choices):
        """Like random.choice, but each element can have a different chance of
        being selected.

        choices can be any iterable containing iterables with two items each.
        Technically, they can have more than two items, the rest will just be
        ignored. The first item is the thing being chosen, the second item is
        its weight. The weights can be any numeric values, what matters is the
        relative differences between them.
        """
        space = {}
        current = 0
        for choice, weight in choices:
            if weight > 0:
                space[current] = choice
                current += weight
        rand = random.uniform(0, current)
        # list(...) is needed on Python 3, where dict.keys() returns a view
        for key in sorted(list(space.keys()) + [current]):
            if rand < key:
                return choice
            choice = space[key]
        return None

This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions for improving it, or alternative ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.

Since Python v3.6, random.choices can be used to return a list of elements of a specified size from the given population, with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)

population : list containing unique observations. (If empty, raises IndexError)
weights : more precisely, the relative weights required to make selections.
cum_weights : cumulative weights required to make selections.
k : size (len) of the list to be returned. (Default len()=1)

A few caveats:

1) It uses weighted sampling with replacement, so drawn items are put back and can be drawn again. The values in the weights sequence do not matter in themselves; only their relative ratios do.

Unlike np.random.choice, which only accepts probabilities as weights and requires them to sum to 1, there are no such restrictions here. As long as the weights belong to numeric types (int/float/fraction, but not Decimal), they will work.

    >>> import random
    # weights being integers
    >>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
    ['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
    # weights being floats
    >>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
    ['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
    # weights being fractions
    >>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
    ['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

2) If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.

    >>> random.choices(["white", "green", "red"], k=10)
    ['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']
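The two rules in caveat 2 can be checked directly. The snippet below is only an illustrative sketch (the variable names both_error and length_error are made up here); it relies on random.choices raising TypeError when both kinds of weights are given and ValueError when the number of weights does not match the population.

```python
import random

population = ["white", "green", "red"]

# Supplying both weights and cum_weights is rejected.
try:
    random.choices(population, weights=[12, 12, 4], cum_weights=[12, 24, 28])
except TypeError as exc:
    both_error = exc

# A weights sequence of the wrong length is rejected as well.
try:
    random.choices(population, weights=[12, 12])
except ValueError as exc:
    length_error = exc

print(type(both_error).__name__, type(length_error).__name__)
```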

3) cum_weights are typically the result of the itertools.accumulate function, which is really handy in such situations.

From the linked documentation:
Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome, and the latter seems to be faster / more efficient.
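As a small sketch of that equivalence: the cumulative weights can be derived with itertools.accumulate, and reseeding the generator before each call lets the two variants be compared. The seed value 0 is arbitrary, and the exact match relies on CPython converting weights with accumulate internally, as the quoted documentation describes.

```python
import random
from itertools import accumulate

population = ["white", "green", "red"]
weights = [12, 12, 4]
cum_weights = list(accumulate(weights))  # [12, 24, 28]

random.seed(0)  # arbitrary seed, only to make the two runs comparable
by_weights = random.choices(population, weights=weights, k=10)

random.seed(0)
by_cum_weights = random.choices(population, cum_weights=cum_weights, k=10)

# Both calls consume the same random numbers, so the draws should match.
print(by_weights == by_cum_weights)
```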

Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it will return an array of 0s containing a 1 that indicates which bin was chosen. By default the code makes a single draw, but you can pass in the number of draws to make, and the counts per bin drawn will be returned.

If the weights vector does not sum to 1, it will be normalized so that it does.

    import numpy as np

    def weighted_choice(weights, n=1):
        weights = np.asarray(weights, dtype=float)  # also accept plain lists
        if np.sum(weights) != 1:
            weights = weights / np.sum(weights)
        draws = np.random.random_sample(size=n)
        weights = np.cumsum(weights)
        weights = np.insert(weights, 0, 0.0)
        counts = np.histogram(draws, bins=weights)
        return counts[0]

Crude, but may be sufficient:

    import random
    weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

    # define choices and relative weights
    choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

    # initialize tally dict, keyed by the values (not by the (value, weight) pairs)
    tally = dict.fromkeys((v for v, _ in choices), 0)

    # tally up 1000 weighted choices
    for i in range(1000):
        tally[weighted_choice(choices)] += 1

    print(list(tally.items()))

Prints (the order of the items may vary):

    [('WHITE', 904), ('GREEN', 22), ('RED', 74)]

This assumes that all weights are integers. They don't have to add up to 100; I just did that to make the test results easier to interpret. (If the weights are floating point numbers, multiply them all by 10 repeatedly until all weights >= 1.)

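    weights = [.6, .2, .001, .199]
    while any(w < 1.0 for w in weights):
        weights = [w*10 for w in weights]
    # round before converting: repeated float multiplication can land just below an integer
    weights = [int(round(w)) for w in weights]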

It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then the time complexity of calling np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.

In my case, I needed to sample the same distribution multiple times, on the order of 10^3, where n is on the order of 10^6. I used the code below, which precomputes the cumulative distribution and samples it in O(log(n)). The overall time complexity is O(n+K*log(n)).

    import numpy as np

    n,k = 10**6,10**3

    # Create dummy distribution
    a = np.array([i+1 for i in range(n)])
    p = np.array([1.0/n]*n)

    cfd = p.cumsum()
    for _ in range(k):
        x = np.random.uniform()
        idx = cfd.searchsorted(x, side='right')
        sampled_element = a[idx]
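The per-draw loop can also be collapsed into a single searchsorted call, since searchsorted accepts an array of query points. The following is a sketch under the same dummy distribution; the clamp with np.minimum is an added safeguard (not part of the answer above) for the case where round-off leaves the last cumulative value slightly below 1.

```python
import numpy as np

n, k = 10**6, 10**3

a = np.arange(1, n + 1)        # dummy elements 1..n
p = np.full(n, 1.0 / n)        # uniform dummy distribution
cdf = p.cumsum()               # O(n) preprocessing, done once

# Draw all k uniform numbers at once and locate them in one call.
x = np.random.uniform(size=k)
idx = cdf.searchsorted(x, side="right")
idx = np.minimum(idx, n - 1)   # guard against an index of n caused by round-off
samples = a[idx]
```

The asymptotic cost is still O(n + K*log(n)); the difference is only that the K binary searches run inside NumPy rather than in a Python loop.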

Since Python 3.6 there is a choices method in the random module.

    Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: import random

    In [2]: population = [['a','b'], ['b','a'], ['c','b']]

    In [3]: list_of_prob = [0.2, 0.2, 0.6]

    In [4]: population = random.choices(population, weights=list_of_prob, k=10)

    In [5]: population
    Out[5]:
    [['c', 'b'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['c', 'b']]

People have also mentioned that there is numpy.random.choice, which supports weights, BUT it doesn't support 2-D arrays, and so on.

So, basically, you can get whatever you like with the built-in random.choices if you have Python 3.6.x.

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

    from numpy.random import choice
    draw = choice(list_of_candidates, number_of_items_to_pick,
                  p=probability_distribution)

Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.
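A short sketch of the difference between the two modes follows. The candidate list and probabilities are made-up example data, not values from the answer above.

```python
import numpy as np

# Hypothetical example data; the probabilities must sum to 1.
list_of_candidates = ["a", "b", "c", "d"]
probability_distribution = [0.1, 0.2, 0.3, 0.4]

# With replacement (the default): the same candidate may appear more than once.
with_repl = np.random.choice(list_of_candidates, 3, p=probability_distribution)

# Without replacement: every drawn candidate is distinct.
without_repl = np.random.choice(
    list_of_candidates, 3, replace=False, p=probability_distribution
)
```

With replace=False, the number of items requested cannot exceed the number of candidates that have a non-zero probability.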

This is the version that is being included in the standard library for Python 3.6:

    import random
    import itertools as _itertools
    import bisect as _bisect

    class Random36(random.Random):
        "Show the code included in the Python 3.6 version of the Random class"

        def choices(self, population, weights=None, *, cum_weights=None, k=1):
            """Return a k sized list of population elements chosen with replacement.

            If the relative weights or cumulative weights are not specified,
            the selections are made with equal probability.

            """
            random = self.random
            if cum_weights is None:
                if weights is None:
                    _int = int
                    total = len(population)
                    return [population[_int(random() * total)] for i in range(k)]
                cum_weights = list(_itertools.accumulate(weights))
            elif weights is not None:
                raise TypeError('Cannot specify both weights and cumulative weights')
            if len(cum_weights) != len(population):
                raise ValueError('The number of weights does not match the population')
            bisect = _bisect.bisect
            total = cum_weights[-1]
            return [population[bisect(cum_weights, random() * total)] for i in range(k)]

Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l340

I would require the weights to sum to 1, but this works either way:

    import random

    def weightedChoice(choices):
        # Safety check, you can remove it
        for c, w in choices:
            assert w >= 0

        # Sum the weights (the second item of each pair), not the choices
        tmp = random.uniform(0, sum(w for c, w in choices))
        for choice, weight in choices:
            if tmp < weight:
                return choice
            else:
                tmp -= weight
        raise ValueError('Negative values in input')

I looked at the other thread and came up with this variation in my own coding style. It returns the index of the choice for the purpose of tallying, but it is simple to return the string instead (see the commented-out alternative return):

    import random
    import bisect

    try:
        range = xrange  # Python 2
    except NameError:
        pass  # Python 3: range is already lazy

    def weighted_choice(choices):
        total, cumulative = 0, []
        for c,w in choices:
            total += w
            cumulative.append((total, c))
        r = random.uniform(0, total)
        # return index
        return bisect.bisect(cumulative, (r,))
        # return item string
        #return choices[bisect.bisect(cumulative, (r,))][0]

    # define choices and relative weights
    choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

    tally = [0 for item in choices]

    n = 100000
    # tally up n weighted choices
    for i in range(n):
        tally[weighted_choice(choices)] += 1

    print([t/sum(tally)*100 for t in tally])

It is probably too late to contribute anything useful, but here is a simple, short, and very efficient snippet:

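    import random

    def choose_index(probabilities):
        cmf = 0.0
        choice = random.random()
        for k, p in enumerate(probabilities):
            cmf += p  # running cumulative mass
            if choice <= cmf:
                return k
        # only reached if round-off leaves the total slightly below 1
        return len(probabilities) - 1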

There is no need to sort your probabilities or to build a vector with your cmf, and it terminates as soon as it finds its choice. Memory: O(1), time: O(N), with an average running time of ~ N/2.

If you have weights, simply add one line:

    import random

    def choose_index(weights):
        total = sum(weights)
        probabilities = [w / total for w in weights]  # normalize the weights
        cmf = 0.0
        choice = random.random()
        for k, p in enumerate(probabilities):
            cmf += p
            if choice <= cmf:
                return k
        return len(probabilities) - 1

If you don't mind using numpy, you can use numpy.random.choice.

For example:

    import numpy

    items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
    elems = [i[0] for i in items]
    probs = [i[1] for i in items]

    trials = 1000
    results = [0] * len(items)
    for i in range(trials):
        res = numpy.random.choice(elems, p=probs)  # This is where the item is selected!
        results[elems.index(res)] += 1
    results = [r / float(trials) for r in results]

    print("item\texpected\tactual")
    for i in range(len(probs)):
        print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))

If you know in advance how many selections you need to make, you can do it without a loop, like this:

    numpy.random.choice(elems, trials, p=probs)

If your list of weighted choices is relatively static and you want frequent sampling, you can do one O(N) preprocessing step and then do each selection in O(1), using the functions in this related answer.

    # run only when `choices` changes.
    preprocessed_data = prep(weight for _,weight in choices)

    # O(1) selection
    value = choices[sample(preprocessed_data)][0]
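The related answer's code is not reproduced here, so the following is only a sketch of one well-known technique with these costs, the alias method (Vose's variant). The function names prep and sample mirror the call sites above, but their bodies are an assumption and may differ from the linked implementation.

```python
import random


def prep(weights):
    """Build alias tables from an iterable of non-negative weights in O(N)."""
    weights = list(weights)
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # average entry becomes 1
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        big = large.pop()
        prob[s] = scaled[s]
        alias[s] = big
        # The large entry donates the mass needed to fill slot s.
        scaled[big] -= 1.0 - scaled[s]
        (small if scaled[big] < 1.0 else large).append(big)
    # Any leftovers keep prob 1.0; only round-off separates them from 1.
    return prob, alias


def sample(tables):
    """Return a random index in O(1) using the tables built by prep()."""
    prob, alias = tables
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]


choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
tables = prep(weight for _, weight in choices)  # run only when `choices` changes
value = choices[sample(tables)][0]              # O(1) selection
```

Each slot holds at most two outcomes, so a draw needs one random index and one comparison regardless of N.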

If you have a weighted dictionary instead of a list, you can write this:

    import random

    items = { "a": 10, "b": 5, "c": 1 }
    random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces this list (shown in insertion order, as on Python 3.7+; older versions may order the keys differently): ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'c']
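On Python 3.6 or later, the same weighted dictionary can be fed to random.choices without building the expanded list. This is a sketch only; the names keys and picks are illustrative.

```python
import random

items = {"a": 10, "b": 5, "c": 1}

# Keys become the population and the values their relative weights.
keys = list(items)
picks = random.choices(keys, weights=[items[key] for key in keys], k=5)
```

Unlike the expanded-list approach, this also works when the weights are not integers.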

One way is to pick a random number over the total of all the weights and then use the cumulative values as the limit points for each key. Here is a crude implementation as a generator.

    import random

    def rand_weighted(weights):
        """
        Generator which uses the weights to generate
        weighted random values
        """
        sum_weights = sum(weights.values())
        cum_weights = {}
        current_weight = 0
        for key, value in sorted(weights.items()):
            current_weight += value
            cum_weights[key] = current_weight
        while True:
            sel = int(random.uniform(0, 1) * sum_weights)
            for key, value in sorted(cum_weights.items()):
                if sel < value:
                    break
            yield key

A general solution:

    import random

    def weighted_choice(choices, weights):
        total = sum(weights)
        threshold = random.uniform(0, total)
        for k, weight in enumerate(weights):
            total -= weight
            if total < threshold:
                return choices[k]
        # fallback for threshold == 0 or float round-off in the subtraction
        return choices[-1]

    import random

    def weighted_choice(choices):
        total = sum(w for c, w in choices)
        r = random.uniform(0, total)
        upto = 0
        for c, w in choices:
            if upto + w >= r:
                return c
            upto += w
        assert False, "Shouldn't get here"

    import numpy as np

    w = np.array([0.4, 0.8, 1.6, 0.8, 0.4])
    np.random.choice(w, p=w/sum(w))