sumar suma signo restar resta multiplicar multiplicaciones multiplicacion ingles escritas dividir como performance haskell

performance - signo - ¿Por qué es más rápido sumar una secuencia de datos por dividir y conquistar, incluso sin paralelismo?



multiplicaciones en ingles escritas (2)

Trate de usar foldr'' lugar de foldr . Apuesto a que es por el comportamiento perezoso de foldr que lleva a asignar thunk para cada dato en secuencia y evaluar al final.

Editar:

Así que usar foldr'' mitades tomó tiempo en el caso mío, pero aún más lento, incluso luego foldl'' . Lo que significa que hay un problema de complejidad en la implementación de Data.Sequence.fold* .

benchmarking seqFoldr collecting 100 samples, 1 iterations each, in estimated 2.516484 s bootstrapping with 100000 resamples mean: 24.93222 ms, lb 24.72772 ms, ub 25.15255 ms, ci 0.950 std dev: 1.081204 ms, lb 938.4503 us, ub 1.332666 ms, ci 0.950 found 1 outliers among 100 samples (1.0%) variance introduced by outliers: 0.999% variance is unaffected by outliers benchmarking seqFoldr'' collecting 100 samples, 1 iterations each, in estimated 902.7004 ms bootstrapping with 100000 resamples mean: 11.05375 ms, lb 10.68481 ms, ub 11.42519 ms, ci 0.950 std dev: 1.895777 ms, lb 1.685334 ms, ub 2.410870 ms, ci 0.950 found 1 outliers among 100 samples (1.0%) variance introduced by outliers: 1.000% variance is unaffected by outliers benchmarking seqFoldl'' collecting 100 samples, 1 iterations each, in estimated 862.4077 ms bootstrapping with 100000 resamples mean: 10.35651 ms, lb 9.947395 ms, ub 10.73637 ms, ci 0.950 std dev: 2.011693 ms, lb 1.875869 ms, ub 2.131425 ms, ci 0.950 variance introduced by outliers: 1.000% variance is unaffected by outliers

Estaba jugando con una reducción paralela en un Data.Sequence.Seq , y noté que dividir y conquistar da una ventaja de velocidad incluso sin paralelismo. ¿Alguien sabe por qué?

Aquí está mi código:

import qualified Data.Sequence as S import qualified Data.Foldable as F import System.Random import Control.DeepSeq import Criterion.Main import Test.QuickCheck import Control.Exception ( evaluate ) instance (Arbitrary a) => Arbitrary (S.Seq a) where arbitrary = fmap S.fromList arbitrary instance NFData a => NFData (S.Seq a) where rnf = F.foldr seq () funs :: [(String, S.Seq Int -> Int)] funs = [ ("seqDirect" , seqDirect) , ("seqFoldr" , seqFoldr) , ("seqFoldl''" , seqFoldl'') , ("seqSplit 1" , (seqSplit 1)) , ("seqSplit 2" , (seqSplit 2)) , ("seqSplit 4" , (seqSplit 4)) , ("seqSplit 8" , (seqSplit 8)) , ("seqSplit 16" , (seqSplit 16)) , ("seqSplit 32" , (seqSplit 32)) ] main :: IO () main = do mapM_ (/(_,f) -> quickCheck (/xs -> seqDirect xs == f xs)) funs gen <- newStdGen let inpt = S.fromList . take 100000 $ randoms gen evaluate (rnf inpt) defaultMain [ bench n (nf f inpt) | (n,f) <- funs ] seqDirect :: S.Seq Int -> Int seqDirect v = case S.viewl v of S.EmptyL -> 0 x S.:< xs -> x + seqDirect xs seqFoldr :: S.Seq Int -> Int seqFoldr = F.foldr (+) 0 seqFoldl'' :: S.Seq Int -> Int seqFoldl'' = F.foldl'' (+) 0 seqSplit :: Int -> S.Seq Int -> Int seqSplit 1 xs = seqFoldr xs seqSplit _ xs | S.null xs = 0 seqSplit n xs = let (a, b) = S.splitAt (S.length xs `div` n) xs sa = seqFoldr a sb = seqSplit (n-1) b in sa + sb

Y los resultados:

$ ghc -V The Glorious Glasgow Haskell Compilation System, version 7.0.4 $ ghc --make -O2 -fforce-recomp -rtsopts seq.hs [1 of 1] Compiling Main ( seq.hs, seq.o ) Linking seq ... $ ./seq +RTS -s ./seq +RTS -s +++ OK, passed 100 tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. warming up estimating clock resolution... mean is 5.882556 us (160001 iterations) found 2368 outliers among 159999 samples (1.5%) 2185 (1.4%) high severe estimating cost of a clock call... mean is 85.26448 ns (44 iterations) found 4 outliers among 44 samples (9.1%) 3 (6.8%) high mild 1 (2.3%) high severe benchmarking seqDirect mean: 23.37511 ms, lb 23.01101 ms, ub 23.77594 ms, ci 0.950 std dev: 1.953348 ms, lb 1.781578 ms, ub 2.100916 ms, ci 0.950 benchmarking seqFoldr mean: 25.60206 ms, lb 25.39648 ms, ub 25.80034 ms, ci 0.950 std dev: 1.030794 ms, lb 926.7246 us, ub 1.156656 ms, ci 0.950 benchmarking seqFoldl'' mean: 10.65757 ms, lb 10.29087 ms, ub 10.99869 ms, ci 0.950 std dev: 1.819595 ms, lb 1.703732 ms, ub 1.922018 ms, ci 0.950 benchmarking seqSplit 1 mean: 25.50376 ms, lb 25.29045 ms, ub 25.71225 ms, ci 0.950 std dev: 1.075497 ms, lb 961.5707 us, ub 1.229739 ms, ci 0.950 benchmarking seqSplit 2 mean: 18.15032 ms, lb 17.62943 ms, ub 18.66413 ms, ci 0.950 std dev: 2.652232 ms, lb 2.288088 ms, ub 3.044585 ms, ci 0.950 benchmarking seqSplit 4 mean: 10.48334 ms, lb 10.14152 ms, ub 10.87061 ms, ci 0.950 std dev: 1.869274 ms, lb 1.690063 ms, ub 1.997915 ms, ci 0.950 benchmarking seqSplit 8 mean: 5.737956 ms, lb 5.616747 ms, ub 5.965689 ms, ci 0.950 std dev: 825.2361 us, lb 442.1652 us, ub 1.232003 ms, ci 0.950 benchmarking seqSplit 16 mean: 3.677038 ms, lb 3.669035 ms, ub 3.685547 ms, ci 0.950 std dev: 42.18741 us, lb 36.57112 us, ub 49.93574 us, ci 0.950 benchmarking seqSplit 32 mean: 2.855626 ms, lb 2.849962 ms, ub 2.862226 ms, ci 0.950 std dev: 31.25475 us, lb 26.49104 us, ub 37.18611 us, ci 0.950 25,154,069,064 bytes allocated in the heap 4,120,506,464 bytes copied during GC 32,344,120 bytes maximum residency (446 sample(s)) 4,042,704 bytes maximum slop 78 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 42092 collections, 0 parallel, 6.57s, 6.57s elapsed Generation 1: 446 collections, 0 parallel, 2.62s, 2.62s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 18.57s ( 18.58s elapsed) GC time 9.19s ( 9.19s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 27.76s ( 27.77s elapsed) %GC time 33.1% (33.1% elapsed) Alloc rate 1,354,367,579 bytes per MUT second Productivity 66.9% of total user, 66.9% of total elapsed


Nota: esta respuesta en realidad no responde a la pregunta. Sólo reafirma la pregunta de una manera diferente. La razón precisa por la que Data.Sequence.foldr ralentiza a medida que la secuencia se hace más grande aún se desconoce.

Tu codigo

seqFoldr :: S.Seq Int -> Int seqFoldr = F.foldr (+) 0

Tiene un rendimiento no lineal en función de la longitud de la secuencia. Echa un vistazo a este punto de referencia:

./seq-customized +RTS -s -A128M [Length] [Performance of function seqFoldr] 25000: mean: 1.096352 ms, lb 1.083301 ms, ub 1.121152 ms, ci 0.950 50000: mean: 2.542133 ms, lb 2.514076 ms, ub 2.583209 ms, ci 0.950 100000: mean: 6.068437 ms, lb 5.951889 ms, ub 6.237442 ms, ci 0.950 200000: mean: 14.41332 ms, lb 13.95552 ms, ub 15.21217 ms, ci 0.950

Usar la línea con 25000 como base nos da la siguiente tabla:

[Length] [Performance of function seqFoldr] 1x: mean: 1.00 = 1*1.00 2x: mean: 2.32 = 2*1.16 4x: mean: 5.54 = 4*1.39 8x: mean: 13.15 = 8*1.64

En la tabla anterior, la no linealidad se demuestra en las series 1.00, 1.16, 1.39, 1.64.

Ver también http://haskell.org/haskellwiki/Performance#Data.Sequence_vs._lists

Suponiendo que la longitud inicial de Seq xs es 100000 y n es 32, su código

seqSplit n xs = let (a, b) = S.splitAt (S.length xs `div` n) xs sa = seqFoldr a sb = seqSplit (n-1) b in sa + sb

Pasará secuencias algo más cortas a la función seqFoldr . Las longitudes sucesivas de los Seqs pasados ​​del código anterior a la función seqFoldr ven así:

(length xs)/n = (length a) -------------------------- 100000/32 = 3125 (100000-3125)/31 = 3125 (100000-2*3125)/30 = 3125 ... (100000-30*3125)/2 = 3125

Basado en la primera parte de mi respuesta (donde vimos que el rendimiento no era lineal), [32 llamadas a seqFoldr con Seq de longitud 3125] se ejecutarán más rápido que [1 call a seqFoldr con una sola Seq de longitud 32 * 3125 = 100000].

Por lo tanto, la respuesta a su pregunta es: porque foldr en Data.Sequence La secuencia es más lenta a medida que la secuencia se hace más grande.