c multithreading valgrind lapack atlas

Are valgrind's "uninitialised value" warnings false positives in ATLAS multithreaded BLAS routines?




I am using ATLAS for LAPACK and multithreaded BLAS routines, and I have noticed that when my matrices are large enough for ATLAS to use the multithreaded versions of BLAS, I get uninitialised-value errors from Valgrind. Here is a minimal example of my code:

#include <stdio.h>
#include <stdlib.h>

extern void dgetrf_(int *, int *, double *, int *, int *, int *);
extern void dgetri_(int *, double *, int *, int *, double *, int *, int *);
extern void dgemm_(char *, char *, int *, int *, int *, double *, double *, int *, double *, int *, double *, double *, int *);

int main(void)
{
    double *m1,*m2,*work,*temp;
    int dim = 576;
    int i,j,info;
    int lwork = dim * dim;
    int *ipiv;
    char transA = 'N';
    char transB = 'N';
    double alpha = 1.0;
    double beta = 0.0;

    m1 = malloc(dim*dim*sizeof(double));
    m2 = malloc(dim*dim*sizeof(double));
    temp = malloc(dim*dim*sizeof(double));
    ipiv = malloc(dim*sizeof(int));
    work = malloc(lwork*sizeof(double));

    for(i=0; i<dim; i++)
    {
        for(j=0; j<dim; j++)
        {
            if(i==j)
            {
                m1[i+dim*j] = .25;
                m2[i+dim*j] = .5;
            }
            else
            {
                m1[i+dim*j] = 0.0;
                m2[i+dim*j] = 0.0;
            }
        }
    }

    dgetrf_(&dim, &dim, m1, &dim, ipiv, &info);
    dgetri_(&dim, m1, &dim, ipiv, work, &lwork, &info);

    dgemm_(&transA, &transB, &dim, &dim, &dim, &alpha, m1, &dim, m2, &dim, &beta, temp, &dim);

    for(i=0; i<dim*dim; i++)
        m1[i] = temp[i];

    dgetrf_(&dim, &dim, m1, &dim, ipiv, &info);
    dgetri_(&dim, m1, &dim, ipiv, work, &lwork, &info);

    free(m1);
    free(m2);
    free(ipiv);
    free(work);
    free(temp);

    return 0;
}

(Note: I have checked that the matrices are not singular.)

I compile the program:

gcc -Wall -DATLAS -m64 -g -c fermi.c
gcc -o fermi fermi.o -L/usr/lib64/atlas/ -lm -ltatlas

And run valgrind:

valgrind --leak-check=yes ./fermi

When I do this, I get 193 errors from 11 contexts of "Conditional jump or move depends on uninitialised value(s)", all raised at the second calls to dgetrf_ and dgetri_.

==24999== Memcheck, a memory error detector
==24999== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==24999== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==24999== Command: ./fermi
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C62B: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C29E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C66A: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C29E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C6BE: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C29E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51C2A0B: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51C2A0D: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51C2A4E: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51C2A61: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C2D7: ATL_daxpy (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x53426BB: ATL_dgerk_axpy (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C2AC7: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C751: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C29E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51CD8E5: ATL_dtrtri (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C2EC3: ATL_dgetriC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520EFA5: atl_f77wrap_dgetri_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F684: dgetri_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400AC0: main (fermi.c:53)
==24999==
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51CD8E7: ATL_dtrtri (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C2EC3: ATL_dgetriC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520EFA5: atl_f77wrap_dgetri_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F684: dgetri_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400AC0: main (fermi.c:53)
==24999==
==24999==
==24999== HEAP SUMMARY:
==24999==     in use at exit: 0 bytes in 0 blocks
==24999==   total heap usage: 2,024 allocs, 2,024 frees, 54,831,424 bytes allocated
==24999==
==24999== All heap blocks were freed -- no leaks are possible
==24999==
==24999== For counts of detected and suppressed errors, rerun with: -v
==24999== Use --track-origins=yes to see where uninitialised values come from
==24999== ERROR SUMMARY: 193 errors from 11 contexts (suppressed: 0 from 0)

I have found some links suggesting that this could be a false positive caused by the way the library does things, although they are not closely related to my context.

memory leak in dgemm_

https://www.open-mpi.org/community/lists/users/2007/05/3192.php

So my question: is Valgrind giving me false positives?


Is Valgrind giving me false positives?

It seems not.

Instead of running valgrind with --leak-check=yes, you should run it with --track-origins=yes to see where the uninitialised values come from, as valgrind itself suggests at the end of its output. This is what I get with --track-origins=yes:

$ valgrind --track-origins=yes ./a.out
==17533== Memcheck, a memory error detector
==17533== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17533== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==17533== Command: ./a.out
==17533==
==17533== Conditional jump or move depends on uninitialised value(s)
==17533==    at 0x4F4362B: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EB99E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EC42BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EC42BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EC42BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EC42BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4F06538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4F07416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x400A29: main (fermi.c:50)
==17533==  Uninitialised value was created by a heap allocation
==17533==    at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
==17533==    by 0x40080B: main (fermi.c:22)

So the source of the uninitialised values is this line of code:

temp = malloc(dim*dim*sizeof(double));

temp is the output buffer passed to dgemm_(); it is then copied into m1, which is passed to dgetrf_() at line 50.

I am not familiar with the ATLAS library, but I guess you should initialise temp. For example, zero-initialising temp with calloc resolves all of these valgrind errors:

temp = calloc(dim*dim,sizeof(double));
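
As a minimal sketch of the same idea, the allocation can be wrapped in a small helper that returns a zeroed dim x dim buffer and rejects sizes whose product would overflow. The names alloc_zeroed_square and count_nonzero are hypothetical and are not part of ATLAS, BLAS or LAPACK. The check that every element reads back as 0.0 assumes IEEE 754 doubles, where all-bits-zero is 0.0.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not an ATLAS/LAPACK function): allocate a
 * dim x dim matrix of doubles with every byte set to zero.
 * Returns NULL if dim * dim overflows size_t or if calloc fails. */
double *alloc_zeroed_square(size_t dim)
{
    if (dim != 0 && dim > SIZE_MAX / dim)
        return NULL;                      /* dim * dim would overflow */
    return calloc(dim * dim, sizeof(double));
}

/* Hypothetical helper: count the elements of a[0..n-1] that differ
 * from 0.0. Reading the buffer is safe here only because it came from
 * calloc, so every element already has a defined value. */
size_t count_nonzero(const double *a, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != 0.0)
            count++;
    }
    return count;
}
```

In the question's program, temp = alloc_zeroed_square(dim); would stand in for the malloc call, followed by a NULL check before temp is handed to dgemm_().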