python 3.x - ¿Acelerar las operaciones bits/bits en Python?

python-3.x optimization (11)

Escribí un generador de números primos usando Sieve of Eratosthenes y Python 3.1. El código se ejecuta correctamente y con gracia en 0.32 segundos en ideone.com para generar números primos de hasta 1,000,000.

# from bitstring import BitString def prime_numbers(limit=1000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' yield 2 sub_limit = int(limit**0.5) flags = [False, False] + [True] * (limit - 2) # flags = BitString(limit) # Step through all the odd numbers for i in range(3, limit, 2): if flags[i] is False: # if flags[i] is True: continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: for j in range(i*3, limit, i<<1): flags[j] = False # flags[j] = True

El problema es que me quedo sin memoria cuando trato de generar números de hasta 1,000,000,000.

flags = [False, False] + [True] * (limit - 2) MemoryError

Como se puede imaginar, la asignación de mil millones de valores booleanos ( ~~1 byte~~ 4 u 8 bytes (ver comentario) cada uno en Python) realmente no es factible, así que busqué bitstring . Pensé que usar 1 bit para cada bandera sería mucho más eficiente con la memoria. Sin embargo, el rendimiento del programa disminuyó drásticamente: 24 segundos de tiempo de ejecución, para el número primo hasta 1,000,000. Esto probablemente se deba a la implementación interna de la cadena de bits.

Puede comentar / descomentar las tres líneas para ver lo que cambié para usar BitString, como el fragmento de código anterior.

Mi pregunta es, ¿hay alguna forma de acelerar mi programa, con o sin cadena de bits?

Editar: compruebe el código usted mismo antes de publicarlo. No puedo aceptar respuestas que corran más lento que mi código existente, naturalmente.

Editar de nuevo:

He compilado una lista de puntos de referencia en mi máquina.

Aquí hay una versión que escribí hace un tiempo; podría ser interesante compararlo con el tuyo por la velocidad. Sin embargo, no hace nada sobre los problemas de espacio (de hecho, probablemente sean peores que con su versión).

from math import sqrt def basicSieve(n): """Given a positive integer n, generate the primes < n.""" s = [1]*n for p in xrange(2, 1+int(sqrt(n-1))): if s[p]: a = p*p s[a::p] = [0] * -((a-n)//p) for p in xrange(2, n): if s[p]: yield p

Tengo versiones más rápidas, usando una rueda, pero son mucho más complicadas. La fuente original está here .

Bien, aquí está la versión usando una rueda. wheelSieve es el punto de entrada principal.

from math import sqrt from bisect import bisect_left def basicSieve(n): """Given a positive integer n, generate the primes < n.""" s = [1]*n for p in xrange(2, 1+int(sqrt(n-1))): if s[p]: a = p*p s[a::p] = [0] * -((a-n)//p) for p in xrange(2, n): if s[p]: yield p class Wheel(object): """Class representing a wheel. Attributes: primelimit -> wheel covers primes < primelimit. For example, given a primelimit of 6 the wheel primes are 2, 3, and 5. primes -> list of primes less than primelimit size -> product of the primes in primes; the modulus of the wheel units -> list of units modulo size phi -> number of units """ def __init__(self, primelimit): self.primelimit = primelimit self.primes = list(basicSieve(primelimit)) # compute the size of the wheel size = 1 for p in self.primes: size *= p self.size = size # find the units by sieving units = [1] * self.size for p in self.primes: units[::p] = [0]*(self.size//p) self.units = [i for i in xrange(self.size) if units[i]] # number of units self.phi = len(self.units) def to_index(self, n): """Compute alpha(n), where alpha is an order preserving map from the set of units modulo self.size to the nonnegative integers. If n is not a unit, the index of the first unit greater than n is given.""" return bisect_left(self.units, n % self.size) + n // self.size * self.phi def from_index(self, i): """Inverse of to_index.""" return self.units[i % self.phi] + i // self.phi * self.size def wheelSieveInner(n, wheel): """Given a positive integer n and a wheel, find the wheel indices of all primes that are less than n, and that are units modulo the wheel modulus. """ # renaming to avoid the overhead of attribute lookups U = wheel.units wS = wheel.size # inverse of unit map UI = dict((u, i) for i, u in enumerate(U)) nU = len(wheel.units) sqroot = 1+int(sqrt(n-1)) # ceiling of square root of n # corresponding index (index of next unit up) sqrti = bisect_left(U, sqroot % wS) + sqroot//wS*nU upper = bisect_left(U, n % wS) + n//wS*nU ind2 = bisect_left(U, 2 % wS) + 2//wS*nU s = [1]*upper for i in xrange(ind2, sqrti): if s[i]: q = i//nU u = U[i%nU] p = q*wS+u u2 = u*u aq, au = (p+u)*q+u2//wS, u2%wS wp = p * nU for v in U: # eliminate entries corresponding to integers # congruent to p*v modulo p*wS uvr = u*v%wS m = aq + (au > uvr) bot = (m + (q*v + u*v//wS - m) % p) * nU + UI[uvr] s[bot::wp] = [0]*-((bot-upper)//wp) return s def wheelSieve(n, wheel=Wheel(10)): """Given a positive integer n, generate the list of primes <= n.""" n += 1 wS = wheel.size U = wheel.units nU = len(wheel.units) s = wheelSieveInner(n, wheel) return (wheel.primes[:bisect_left(wheel.primes, n)] + [p//nU*wS + U[p%nU] for p in xrange(bisect_left(U, 2 % wS) + 2//wS*nU, len(s)) if s[p]])

En cuanto a lo que es una rueda: bueno, sabes que (aparte de 2), todos los primos son impares, por lo que la mayoría de los tamices pierden todos los números pares. De manera similar, puede ir un poco más allá y observar que todos los números primos (excepto 2 y 3) son congruentes con 1 o 5 módulo 6 (== 2 * 3), así que puede salirse con la única tarea de almacenar esos números en su tamiz . El siguiente paso es observar que todos los números primos (excepto 2, 3 y 5) son congruentes con uno de 1, 7, 11, 13, 17, 19, 23, 29 (módulo 30) (aquí 30 == 2 * 3 * 5), y así sucesivamente.

De acuerdo, aquí hay una evaluación comparativa completa (casi completa) que he hecho esta noche para ver qué código se ejecuta más rápido. Con suerte, alguien encontrará esta lista útil. Omití todo lo que toma más de 30 segundos para completar en mi máquina.

Me gustaría agradecer a todos los que hicieron una aportación. Obtuve mucha información de tus esfuerzos, y espero que tú también.

Mi máquina: AMD ZM-86, Dual-Core de 2,40 Ghz, con 4 GB de RAM. Esta es una computadora portátil HP Touchsmart Tx2. Tenga en cuenta que, si bien pude haber vinculado a un pastebin, hice una evaluación comparativa de todo lo siguiente en mi propia máquina.

Agregaré el punto de referencia gmpy2 una vez que pueda construirlo.

Todos los puntos de referencia se prueban en Python 2.6 x86

Devolver una lista de números primos n hasta 1,000,000: ( usando generadores Python)
La versión de generador numpy de Sebastian (actualizada) - 121 ms @
Mark Sieve + Rueda - 154 ms
La versión de Robert con rebanado - 159 ms
Mi versión mejorada con slicing - 205 ms
Generador Numpy con enumerate - 249 ms @
Tamiz básico de Mark - 317 ms
La mejora de Casevh en mi solución original - 343 ms
Mi solución de generador numpy modificado - 407 ms
Mi método original en la pregunta - 409 ms
Solución Bitarray - 414 ms @
Pure Python con bytearray - 1394 ms @
La solución BitString de Scott - 6659 ms @
''@'' significa que este método es capaz de generar hasta n <1,000,000,000 en la configuración de mi máquina.
Además, si no necesita el generador y solo quiere toda la lista a la vez:
rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_numpy - 32 ms @
(La solución numpy también es capaz de generar hasta mil millones, lo que demoró 61,6259 segundos. Sospecho que la memoria se cambió una vez, de ahí el doble de tiempo).

Hay un par de pequeñas optimizaciones para su versión. Al invertir los roles de True y False, puede cambiar " if flags[i] is False: " to " if flags[i]: ". Y el valor inicial para la declaración del segundo range puede ser i*i lugar de i*3 . Su versión original demora 0.166 segundos en mi sistema. Con esos cambios, la versión siguiente toma 0.156 segundos en mi sistema.

def prime_numbers(limit=1000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' yield 2 sub_limit = int(limit**0.5) flags = [True, True] + [False] * (limit - 2) # Step through all the odd numbers for i in range(3, limit, 2): if flags[i]: continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: for j in range(i*i, limit, i<<1): flags[j] = True

Sin embargo, esto no ayuda a tu problema de memoria.

Pasando al mundo de las extensiones C, utilicé la versión de desarrollo de gmpy . (Descargo de responsabilidad: soy uno de los mantenedores). La versión de desarrollo se llama gmpy2 y admite enteros mutables llamados xmpz. Usando gmpy2 y el siguiente código, tengo un tiempo de ejecución de 0.140 segundos. El tiempo de funcionamiento para un límite de 1,000,000,000 es 158 segundos.

import gmpy2 def prime_numbers(limit=1000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' yield 2 sub_limit = int(limit**0.5) # Actual number is 2*bit_position + 1. oddnums = gmpy2.xmpz(1) current = 0 while True: current += 1 current = oddnums.bit_scan0(current) prime = 2 * current + 1 if prime > limit: break yield prime # Exclude further multiples of the current prime number if prime <= sub_limit: for j in range(2*current*(current+1), limit>>1, prime): oddnums.bit_set(j)

Al impulsar optimizaciones y sacrificar la claridad, obtengo tiempos de ejecución de 0.107 y 123 segundos con el siguiente código:

import gmpy2 def prime_numbers(limit=1000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' yield 2 sub_limit = int(limit**0.5) # Actual number is 2*bit_position + 1. oddnums = gmpy2.xmpz(1) f_set = oddnums.bit_set f_scan0 = oddnums.bit_scan0 current = 0 while True: current += 1 current = f_scan0(current) prime = 2 * current + 1 if prime > limit: break yield prime # Exclude further multiples of the current prime number if prime <= sub_limit: list(map(f_set,range(2*current*(current+1), limit>>1, prime)))

Editar: Basado en este ejercicio, modifiqué gmpy2 para aceptar xmpz.bit_set(iterator) . Usando el siguiente código, el tiempo de ejecución para todos los números primos menos de 1,000,000,000 es 56 segundos para Python 2.7 y 74 segundos para Python 3.2. (Como se señaló en los comentarios, xrange es más rápido que el range ).

import gmpy2 try: range = xrange except NameError: pass def prime_numbers(limit=1000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' yield 2 sub_limit = int(limit**0.5) oddnums = gmpy2.xmpz(1) f_scan0 = oddnums.bit_scan0 current = 0 while True: current += 1 current = f_scan0(current) prime = 2 * current + 1 if prime > limit: break yield prime if prime <= sub_limit: oddnums.bit_set(iter(range(2*current*(current+1), limit>>1, prime)))

Editar # 2: ¡Un intento más! Modifiqué gmpy2 para aceptar xmpz.bit_set(slice) . Usando el siguiente código, el tiempo de ejecución para todos los números primos menos de 1,000,000,000 es de aproximadamente 40 segundos tanto para Python 2.7 como para Python 3.2.

from __future__ import print_function import time import gmpy2 def prime_numbers(limit=1000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' yield 2 sub_limit = int(limit**0.5) flags = gmpy2.xmpz(1) # pre-allocate the total length flags.bit_set((limit>>1)+1) f_scan0 = flags.bit_scan0 current = 0 while True: current += 1 current = f_scan0(current) prime = 2 * current + 1 if prime > limit: break yield prime if prime <= sub_limit: flags.bit_set(slice(2*current*(current+1), limit>>1, prime)) start = time.time() result = list(prime_numbers(1000000000)) print(time.time() - start)

Editar # 3: He actualizado gmpy2 para admitir correctamente el corte en el nivel de bits de un xmpz. Sin cambios en el rendimiento, pero una API muy agradable. He hecho un pequeño ajuste y he reducido el tiempo a unos 37 segundos. (Ver Edición # 4 a cambios en gmpy2 2.0.0b1.)

from __future__ import print_function import time import gmpy2 def prime_numbers(limit=1000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' sub_limit = int(limit**0.5) flags = gmpy2.xmpz(1) flags[(limit>>1)+1] = True f_scan0 = flags.bit_scan0 current = 0 prime = 2 while prime <= sub_limit: yield prime current += 1 current = f_scan0(current) prime = 2 * current + 1 flags[2*current*(current+1):limit>>1:prime] = True while prime <= limit: yield prime current += 1 current = f_scan0(current) prime = 2 * current + 1 start = time.time() result = list(prime_numbers(1000000000)) print(time.time() - start)

Editar # 4: hice algunos cambios en gmpy2 2.0.0b1 que rompen el ejemplo anterior. gmpy2 ya no trata a True como un valor especial que proporciona una fuente infinita de 1 bit. -1 debe usarse en su lugar.

from __future__ import print_function import time import gmpy2 def prime_numbers(limit=1000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' sub_limit = int(limit**0.5) flags = gmpy2.xmpz(1) flags[(limit>>1)+1] = 1 f_scan0 = flags.bit_scan0 current = 0 prime = 2 while prime <= sub_limit: yield prime current += 1 current = f_scan0(current) prime = 2 * current + 1 flags[2*current*(current+1):limit>>1:prime] = -1 while prime <= limit: yield prime current += 1 current = f_scan0(current) prime = 2 * current + 1 start = time.time() result = list(prime_numbers(1000000000)) print(time.time() - start)

Edición n. ° 5: realicé algunas mejoras en gmpy2 2.0.0b2. Ahora puede iterar sobre todos los bits configurados o claros. El tiempo de ejecución ha mejorado en ~ 30%.

from __future__ import print_function import time import gmpy2 def sieve(limit=1000000): ''''''Returns a generator that yields the prime numbers up to limit.'''''' # Increment by 1 to account for the fact that slices do not include # the last index value but we do want to include the last value for # calculating a list of primes. sieve_limit = gmpy2.isqrt(limit) + 1 limit += 1 # Mark bit positions 0 and 1 as not prime. bitmap = gmpy2.xmpz(3) # Process 2 separately. This allows us to use p+p for the step size # when sieving the remaining primes. bitmap[4 : limit : 2] = -1 # Sieve the remaining primes. for p in bitmap.iter_clear(3, sieve_limit): bitmap[p*p : limit : p+p] = -1 return bitmap.iter_clear(2, limit) if __name__ == "__main__": start = time.time() result = list(sieve(1000000000)) print(time.time() - start) print(len(result))

Los tipos enteros de Python pueden ser de un tamaño arbitrario, por lo que no debería necesitar una biblioteca inteligente de bits bits para representar un conjunto de bits, solo un número.

Aquí está el código. Maneja un límite de 1,000,000 con facilidad, e incluso puede manejar 10,000,000 sin quejarse demasiado:

def multiples_of(n, step, limit): bits = 1 << n old_bits = bits max = 1 << limit while old_bits < max: old_bits = bits bits += bits << step step *= 2 return old_bits def prime_numbers(limit=10000000): ''''''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. '''''' yield 2 sub_limit = int(limit**0.5) flags = ((1 << (limit - 2)) - 1) << 2 # Step through all the odd numbers for i in xrange(3, limit, 2): if not (flags & (1 << i)): continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: flags &= ~multiples_of(i * 3, i << 1, limit)

La función multiples_of evita el costo de configurar cada múltiplo individual individualmente. Establece el bit inicial, lo desplaza por el paso inicial ( i << 1 ) y agrega el resultado a sí mismo. Luego dobla el paso, de modo que el siguiente desplazamiento copie ambos bits, y así sucesivamente hasta que llegue al límite. Esto establece todos los múltiplos de un número hasta el límite en el tiempo O (log (límite)).

OK, esta es mi segunda respuesta, pero como la velocidad es esencial, pensé que tenía que mencionar el módulo bitarray , a pesar de que es el bitstring de bitstring :). Es ideal para este caso, ya que no solo es una extensión C (y más rápida de lo que Python tiene la esperanza de ser), sino que también admite asignaciones de sectores. Todavía no está disponible para Python 3.

Ni siquiera he tratado de optimizar esto, simplemente reescribí la versión de cadena de bits. En mi máquina obtengo 0.16 segundos para números primos por debajo de un millón.

Por mil millones, funciona perfectamente bien y se completa en 2 minutos y 31 segundos.

import bitarray def prime_bitarray(limit=1000000): yield 2 flags = bitarray.bitarray(limit) flags.setall(False) sub_limit = int(limit**0.5) for i in xrange(3, limit, 2): if not flags[i]: yield i if i <= sub_limit: flags[3*i:limit:i*2] = True

Pregunta relacionada
La forma más rápida de listar todos los números primos debajo de N en python

Hola, también estoy buscando un código en python para generar números primos hasta 10 9 lo más rápido que pueda, lo que es difícil debido al problema de memoria. hasta ahora se me ocurrió esto para generar números primos de hasta 10 6 y 10 * 7 (0,171 y 1,764 s respectivamente en mi máquina anterior), que parece ser un poco más rápido (al menos en mi computadora) que "Mi versión mejorada" con cortar "de la publicación anterior, probablemente porque uso n // ii +1 (ver a continuación) en lugar de la len análoga (indicadores [i2 :: i << 1]) en el otro código. Por favor, corríjame si estoy equivocado. Cualquier sugerencia de mejora es muy bienvenida.

def primes(n): """ Returns a list of primes < n """ sieve = [True] * n for i in xrange(3,int(n**0.5)+1,2): if sieve[i]: sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1) return [2] + [i for i in xrange(3,n,2) if sieve[i]]

En uno de sus códigos Xavier usa banderas [i2 :: i << 1] y len (flags [i2 :: i << 1]), calculé el tamaño explícito usando el hecho de que entre i i..n tenemos ( ni i) // 2 * i múltiplos de 2 * i porque queremos contar i también sumamos 1 por lo len (tamiz [i i :: 2 * i]) igual (ni * i) // (2 * i ) +1 Esto hace que el código sea más rápido. Un generador básico basado en el código anterior sería:

def primesgen(n): """ Generates all primes <= n """ sieve = [True] * n yield 2 for i in xrange(3,int(n**0.5)+1,2): if sieve[i]: yield i sieve[i*i::2*i] = [False]*((n-i*i-1)/(2*i)+1) for i in xrange(i+2,n,2): if sieve[i]: yield i

con un poco de modificación, se puede escribir una versión un poco más lenta del código que comienza con una criba de la mitad del tamiz de tamaño = [Verdadero] * (n // 2) y funciona para el mismo n. no estoy seguro de cómo se refleja en el problema de la memoria. Como ejemplo de implementación aquí está la versión modificada del código de numeración rosetta (quizás más rápido) comenzando con un tamiz de la mitad del tamaño.

import numpy def primesfrom3to(n): """ Returns a array of primes, 3 <= p < n """ sieve = numpy.ones(n/2, dtype=numpy.bool) for i in xrange(3,int(n**0.5)+1,2): if sieve[i/2]: sieve[i*i/2::i] = False return 2*numpy.nonzero(sieve)[0][1::]+1

Un generador más rápido y más sabio para la memoria sería:

import numpy def primesgen1(n): """ Input n>=6, Generates all primes < n """ sieve1 = numpy.ones(n/6+1, dtype=numpy.bool) sieve5 = numpy.ones(n/6 , dtype=numpy.bool) sieve1[0] = False yield 2; yield 3; for i in xrange(int(n**0.5)/6+1): if sieve1[i]: k=6*i+1; yield k; sieve1[ ((k*k)/6) ::k] = False sieve5[(k*k+4*k)/6::k] = False if sieve5[i]: k=6*i+5; yield k; sieve1[ ((k*k)/6) ::k] = False sieve5[(k*k+2*k)/6::k] = False for i in xrange(i+1,n/6): if sieve1[i]: yield 6*i+1 if sieve5[i]: yield 6*i+5 if n%6 > 1: if sieve1[i+1]: yield 6*(i+1)+1

o con un poco más de código:

import numpy def primesgen(n): """ Input n>=30, Generates all primes < n """ size = n/30 + 1 sieve01 = numpy.ones(size, dtype=numpy.bool) sieve07 = numpy.ones(size, dtype=numpy.bool) sieve11 = numpy.ones(size, dtype=numpy.bool) sieve13 = numpy.ones(size, dtype=numpy.bool) sieve17 = numpy.ones(size, dtype=numpy.bool) sieve19 = numpy.ones(size, dtype=numpy.bool) sieve23 = numpy.ones(size, dtype=numpy.bool) sieve29 = numpy.ones(size, dtype=numpy.bool) sieve01[0] = False yield 2; yield 3; yield 5; for i in xrange(int(n**0.5)/30+1): if sieve01[i]: k=30*i+1; yield k; sieve01[ (k*k)/30::k] = False sieve07[(k*k+ 6*k)/30::k] = False sieve11[(k*k+10*k)/30::k] = False sieve13[(k*k+12*k)/30::k] = False sieve17[(k*k+16*k)/30::k] = False sieve19[(k*k+18*k)/30::k] = False sieve23[(k*k+22*k)/30::k] = False sieve29[(k*k+28*k)/30::k] = False if sieve07[i]: k=30*i+7; yield k; sieve01[(k*k+ 6*k)/30::k] = False sieve07[(k*k+24*k)/30::k] = False sieve11[(k*k+16*k)/30::k] = False sieve13[(k*k+12*k)/30::k] = False sieve17[(k*k+ 4*k)/30::k] = False sieve19[ (k*k)/30::k] = False sieve23[(k*k+22*k)/30::k] = False sieve29[(k*k+10*k)/30::k] = False if sieve11[i]: k=30*i+11; yield k; sieve01[ (k*k)/30::k] = False sieve07[(k*k+ 6*k)/30::k] = False sieve11[(k*k+20*k)/30::k] = False sieve13[(k*k+12*k)/30::k] = False sieve17[(k*k+26*k)/30::k] = False sieve19[(k*k+18*k)/30::k] = False sieve23[(k*k+ 2*k)/30::k] = False sieve29[(k*k+ 8*k)/30::k] = False if sieve13[i]: k=30*i+13; yield k; sieve01[(k*k+24*k)/30::k] = False sieve07[(k*k+ 6*k)/30::k] = False sieve11[(k*k+ 4*k)/30::k] = False sieve13[(k*k+18*k)/30::k] = False sieve17[(k*k+16*k)/30::k] = False sieve19[ (k*k)/30::k] = False sieve23[(k*k+28*k)/30::k] = False sieve29[(k*k+10*k)/30::k] = False if sieve17[i]: k=30*i+17; yield k; sieve01[(k*k+ 6*k)/30::k] = False sieve07[(k*k+24*k)/30::k] = False sieve11[(k*k+26*k)/30::k] = False sieve13[(k*k+12*k)/30::k] = False sieve17[(k*k+14*k)/30::k] = False sieve19[ (k*k)/30::k] = False sieve23[(k*k+ 2*k)/30::k] = False sieve29[(k*k+20*k)/30::k] = False if sieve19[i]: k=30*i+19; yield k; sieve01[ (k*k)/30::k] = False sieve07[(k*k+24*k)/30::k] = False sieve11[(k*k+10*k)/30::k] = False sieve13[(k*k+18*k)/30::k] = False sieve17[(k*k+ 4*k)/30::k] = False sieve19[(k*k+12*k)/30::k] = False sieve23[(k*k+28*k)/30::k] = False sieve29[(k*k+22*k)/30::k] = False if sieve23[i]: k=30*i+23; yield k; sieve01[(k*k+24*k)/30::k] = False sieve07[(k*k+ 6*k)/30::k] = False sieve11[(k*k+14*k)/30::k] = False sieve13[(k*k+18*k)/30::k] = False sieve17[(k*k+26*k)/30::k] = False sieve19[ (k*k)/30::k] = False sieve23[(k*k+ 8*k)/30::k] = False sieve29[(k*k+20*k)/30::k] = False if sieve29[i]: k=30*i+29; yield k; sieve01[ (k*k)/30::k] = False sieve07[(k*k+24*k)/30::k] = False sieve11[(k*k+20*k)/30::k] = False sieve13[(k*k+18*k)/30::k] = False sieve17[(k*k+14*k)/30::k] = False sieve19[(k*k+12*k)/30::k] = False sieve23[(k*k+ 8*k)/30::k] = False sieve29[(k*k+ 2*k)/30::k] = False for i in xrange(i+1,n/30): if sieve01[i]: yield 30*i+1 if sieve07[i]: yield 30*i+7 if sieve11[i]: yield 30*i+11 if sieve13[i]: yield 30*i+13 if sieve17[i]: yield 30*i+17 if sieve19[i]: yield 30*i+19 if sieve23[i]: yield 30*i+23 if sieve29[i]: yield 30*i+29 i += 1 mod30 = n%30 if mod30 > 1: if sieve01[i]: yield 30*i+1 if mod30 > 7: if sieve07[i]: yield 30*i+7 if mod30 > 11: if sieve11[i]: yield 30*i+11 if mod30 > 13: if sieve13[i]: yield 30*i+13 if mod30 > 17: if sieve17[i]: yield 30*i+17 if mod30 > 19: if sieve19[i]: yield 30*i+19 if mod30 > 23: if sieve23[i]: yield 30*i+23

Ps: Si tiene alguna sugerencia sobre cómo acelerar este código, cualquier cosa, desde cambiar el orden de las operaciones hasta precomputar cualquier cosa, por favor coméntelo.

Una mejora de velocidad que puede hacer usando bitstring es usar el método ''set'' cuando excluye múltiplos del número actual.

Entonces la sección vital se vuelve

for i in range(3, limit, 2): if flags[i]: yield i if i <= sub_limit: flags.set(1, range(i*3, limit, i*2))

En mi máquina, esto es aproximadamente 3 veces más rápido que el original.

Mi otro pensamiento fue usar la cadena de bits para representar solo los números impares. A continuación, puede encontrar los bits no configurados utilizando flags.findall([0]) que devuelve un generador. No estoy seguro si eso sería mucho más rápido (encontrar patrones de bits no es terriblemente fácil de hacer de manera eficiente).

[Divulgación completa: ¡escribí el módulo de cadena de bits, así que tengo algo de orgullo en juego aquí!]

Como una comparación, también he extraído las tripas del método de cadena de bits para que lo haga de la misma manera, pero sin verificación de rango, etc. Creo que esto da un límite inferior razonable para Python puro que funciona para mil millones de elementos (sin cambiando el algoritmo, lo que creo que es hacer trampa!)

def prime_pure(limit=1000000): yield 2 flags = bytearray((limit + 7) // 8) sub_limit = int(limit**0.5) for i in xrange(3, limit, 2): byte, bit = divmod(i, 8) if not flags[byte] & (128 >> bit): yield i if i <= sub_limit: for j in xrange(i*3, limit, i*2): byte, bit = divmod(j, 8) flags[byte] |= (128 >> bit)

En mi máquina esto se ejecuta en aproximadamente 0,62 segundos para un millón de elementos, lo que significa que es aproximadamente una cuarta parte de la velocidad de la respuesta bitarray. Creo que es bastante razonable para Python puro.

Another really stupid option, but that can be of help if you really need a large list of primes number available very fast. Say, if you need them as a tool to answer project Euler''s problems (if the problem is just optimizing your code as a mind game it''s irrelevant).

Use any slow solution to generate list and save it to a python source file, says primes.py that would just contain:

primes = [ a list of a million primes numbers here ]

Now to use them you just have to do import primes and you get them with minimal memory footprint at the speed of disk IO. Complexity is number of primes :-)

Even if you used a poorly optimized solution to generate this list, it will only be done once and it does not matter much.

You could probably make it even faster using pickle/unpickle, but you get the idea...

Here is some Python3 code that uses less memory than the bitarray/bitstring solutions posted to date and about 1/8 the memory of Robert William Hanks''s primesgen(), while running faster than primesgen() (marginally faster at 1,000,000, using 37KB of memory, 3x faster than primesgen() at 1,000,000,000 using under 34MB). Increasing the chunk size (variable chunk in the code) uses more memory but speeds up the program, up to a limit -- I picked a value so that its contribution to memory is under about 10% of sieve''s n//30 bytes. It''s not as memory-efficient as Will Ness''s infinite generator ( https://.com/a/19391111/5439078 ) because it records (and returns at the end, in compressed form) all calculated primes.

Esto debería funcionar correctamente siempre que el cálculo de la raíz cuadrada sea preciso (aproximadamente 2 ** 51 si Python usa dobles de 64 bits). Sin embargo, no deberías usar este programa para encontrar números primos tan grandes.

(No recalculé las compensaciones, simplemente las saqué del código de Robert William Hanks. ¡Gracias, Robert!)

import numpy as np def prime_30_gen(n): """ Input n, int-like. Yields all primes < n as Python ints""" wheel = np.array([2,3,5]) yield from wheel[wheel < n].tolist() powers = 1 << np.arange(8, dtype=''u1'') odds = np.array([1, 7, 11, 13, 17, 19, 23, 29], dtype=''i8'') offsets=np.array([[0,6,10,12,16,18,22,28],[6,24,16,12,4,0,22,10], [0,6,20,12,26,18,2,8], [24,6,4,18,16,0,28,10], [6,24,26,12,14,0,2,20], [0,24,10,18,4,12,28,22], [24,6,14,18,26,0,8,20], [0,24,20,18,14,12,8,2]], dtype=''i8'') offsets = {d:f for d,f in zip(odds, offsets)} sieve = 255 * np.ones((n + 28) // 30, dtype=''u1'') if n % 30 > 1: sieve[-1] >>= 8 - odds.searchsorted(n % 30) sieve[0] &= ~1 for i in range((int((n - 1)**0.5) + 29) // 30): for odd in odds[(sieve[i] & powers).nonzero()[0]]: k = i * 30 + odd yield int(k) for clear_bit, off in zip(~powers, offsets[odd]): sieve[(k * (k + off)) // 30 :: k] &= clear_bit chunk = min(1 + (n >> 13), 1<<15) for i in range(i + 1, len(sieve), chunk): a = (sieve[i : i + chunk, np.newaxis] & powers) a = np.flatnonzero(a.astype(''bool'')) + i * 8 a = (a // 8 * 30 + odds[a & 7]).tolist() yield from a return sieve

Nota al margen: si quiere velocidad real y tiene los 2 GB de RAM necesarios para la lista de primos a 10 ** 9, entonces debe usar pyprimesieve (en https://pypi.python.org/ , usando primesieve http://primesieve.org/ ).

One option you may want to look at is just compiling a C/C++ module so you have direct access to the bit-twiddling features. AFAIK you could write something of that nature and only return the results on completion of the calculations performed in C/C++. Now that I''m typing this you may also look at numpy which does store arrays as native C for speed. I don''t know if that will be any faster than the bitstring module, though!

You could use a segmented Sieve of Eratosthenes. The memory used for each segment is reused for the next segment.