preprocessing - CUDA and nvcc: using the preprocessor to choose between float or double


The problem:

In a .h file, I want to define real as double when compiling for C/C++ or for CUDA with compute capability >= 1.3. When compiling for CUDA with compute capability < 1.3, real should be defined as float.

After many hours I came up with this (which does not work):

    #   if defined(__CUDACC__)

    #       warning * making definitions for cuda

    #       if defined(__CUDA_ARCH__)
    #           warning __CUDA_ARCH__ is defined
    #       else
    #           warning __CUDA_ARCH__ is NOT defined
    #       endif

    #       if (__CUDA_ARCH__ >= 130)
    #           define real double
    #           warning using double in cuda
    #       elif (__CUDA_ARCH__ >= 0)
    #           define real float
    #           warning using float in cuda
    #           warning how the hell is this printed when __CUDA_ARCH__ is not defined?
    #       else
    #           define real
    #           error what the hell is the value of __CUDA_ARCH__ and how can I print it
    #       endif

    #   else
    #       warning * making definitions for c/c++
    #       define real double
    #       warning using double for c/c++
    #   endif

When I compile (note the -arch flag)

    nvcc -arch compute_13 -Ilibcutil testFloatDouble.cu

I get

    * making definitions for cuda
    __CUDA_ARCH__ is defined
    using double in cuda

    * making definitions for cuda
    warning __CUDA_ARCH__ is NOT defined
    warning using float in cuda
    how the hell is this printed if __CUDA_ARCH__ is not defined now?

    Undefined symbols for architecture i386:
      "myKernel(float*, int)", referenced from: ....

I know that nvcc compiles the files twice. The first pass is fine (__CUDACC__ defined and __CUDA_ARCH__ >= 130), but what happens the second time? Is __CUDACC__ defined but __CUDA_ARCH__ undefined, or defined with a value < 130? Why?

Thanks for your time.

For the moment, the only practical solution I see is to use a custom define:

    #   if (!defined(__CUDACC__) || defined(USE_DOUBLE_IN_CUDA))
    #       define real double
    #       warning defining double for cuda or c/c++
    #   else
    #       define real float
    #       warning defining float for cuda
    #   endif

and then

    nvcc -DUSE_DOUBLE_IN_CUDA -arch compute_13 -Ilibcutil testFloatDouble.cu

produces, for the two compilation passes:

    #warning defining double for cuda or c/c++
    #warning defining double for cuda or c/c++

whereas

    nvcc -Ilibcutil testFloatDouble.cu

gives

    #warning defining float for cuda
    #warning defining float for cuda
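The reason this approach gives consistent results is that the choice depends only on __CUDACC__ and a command-line macro, and both have the same state in every nvcc pass. A minimal sketch of the same header with one extra safety net follows. The `#error` guard and the `real_size` helper are illustrative additions, not part of the original code, and the guard assumes that requesting double on a device below compute capability 1.3 should be treated as a build error.

```cpp
#include <cassert>
#include <cstddef>

// Same selection logic as above: double unless nvcc is driving the
// compilation and -DUSE_DOUBLE_IN_CUDA was not given.
#if (!defined(__CUDACC__) || defined(USE_DOUBLE_IN_CUDA))
#  define real double
#else
#  define real float
#endif

// Hypothetical guard (an assumption, not in the original): in the device
// pass, stop the build if double was requested for an architecture < 1.3.
#if defined(USE_DOUBLE_IN_CUDA) && defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 130)
#  error USE_DOUBLE_IN_CUDA requires compute capability 1.3 or higher
#endif

// Illustrative helper: size of the selected type. The selection never
// reads __CUDA_ARCH__, so host and device passes agree on it.
inline std::size_t real_size() { return sizeof(real); }

// Small self-check for a plain host compiler, where real must be double.
inline void check_real_on_host()
{
#if !defined(__CUDACC__)
    assert(real_size() == sizeof(double));
#endif
}
```

With a plain host compiler such as g++, `real_size()` should equal `sizeof(double)`. Under nvcc it depends only on whether -DUSE_DOUBLE_IN_CUDA was passed.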

It seems you may be conflating two things: how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two. __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.
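One detail explains the surprising "using float in cuda" warning in the host pass: inside a `#if` expression, any identifier that is not a defined macro is replaced by 0. So when __CUDA_ARCH__ is undefined, `__CUDA_ARCH__ >= 0` is still true. Here is a minimal sketch in plain C++, where `HYPOTHETICAL_ARCH` is a made-up stand-in for __CUDA_ARCH__ that is deliberately never defined, and the branch macros and function names are illustrative only:

```cpp
#include <cassert>

// HYPOTHETICAL_ARCH stands in for __CUDA_ARCH__ and is intentionally
// never defined, mimicking what the nvcc host pass sees.

// Unguarded test: the undefined identifier is treated as 0, so
// "0 >= 130" is false and "0 >= 0" is true.
#if (HYPOTHETICAL_ARCH >= 130)
#  define UNGUARDED_BRANCH 2   // "double" branch
#elif (HYPOTHETICAL_ARCH >= 0)
#  define UNGUARDED_BRANCH 1   // "float" branch, taken although the macro is undefined
#else
#  define UNGUARDED_BRANCH 0
#endif

// Guarded test: check defined() first so the undefined case is explicit.
#if defined(HYPOTHETICAL_ARCH) && (HYPOTHETICAL_ARCH >= 130)
#  define GUARDED_BRANCH 2
#elif defined(HYPOTHETICAL_ARCH)
#  define GUARDED_BRANCH 1
#else
#  define GUARDED_BRANCH 0     // macro not defined at all
#endif

int unguarded_branch() { return UNGUARDED_BRANCH; }
int guarded_branch()   { return GUARDED_BRANCH; }

// Self-check: the unguarded chain lands in the "float" branch,
// the guarded chain correctly reports "not defined".
void check_branches()
{
    assert(unguarded_branch() == 1);
    assert(guarded_branch() == 0);
}
```

The guarded form is the safer pattern whenever a macro such as __CUDA_ARCH__ may legitimately be absent.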

Consider the following code snippet:

    // cudaarch.cu
    #ifdef __CUDACC__
    #warning using nvcc

    template <typename T>
    __global__ void add(T *x, T *y, T *z)
    {
        int idx = threadIdx.x + blockDim.x * blockIdx.x;

        z[idx] = x[idx] + y[idx];
    }

    #ifdef __CUDA_ARCH__
    #warning device code trajectory
    #if __CUDA_ARCH__ > 120
    #warning compiling with double precision
    template void add<double>(double *, double *, double *);
    #else
    #warning compiling with single precision
    template void add<float>(float *, float *, float *);
    #endif
    #else
    #warning nvcc host code trajectory
    #endif
    #else
    #warning non-nvcc code trajectory
    #endif

Here we have a templated CUDA kernel with CUDA-architecture-dependent instantiation, a separate stanza for host code steered by nvcc, and a stanza for host code compiled without nvcc. This behaves as follows:

    $ ln -s cudaarch.cu cudaarch.cc
    $ gcc -c cudaarch.cc -o cudaarch.o
    cudaarch.cc:26:2: warning: #warning non-nvcc code trajectory

    $ nvcc -arch=sm_11 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
    cudaarch.cu:3:2: warning: #warning using nvcc
    cudaarch.cu:14:2: warning: #warning device code trajectory
    cudaarch.cu:19:2: warning: #warning compiling with single precision
    cudaarch.cu:3:2: warning: #warning using nvcc
    cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
    ptxas info : Compiling entry function '_Z3addIfEvPT_S1_S1_' for 'sm_11'
    ptxas info : Used 4 registers, 12+16 bytes smem

    $ nvcc -arch=sm_20 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
    cudaarch.cu:3:2: warning: #warning using nvcc
    cudaarch.cu:14:2: warning: #warning device code trajectory
    cudaarch.cu:16:2: warning: #warning compiling with double precision
    cudaarch.cu:3:2: warning: #warning using nvcc
    cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
    ptxas info : Compiling entry function '_Z3addIdEvPT_S1_S1_' for 'sm_20'
    ptxas info : Used 8 registers, 44 bytes cmem[0]

The takeaway points from this are:

__CUDACC__ tells you whether nvcc is steering the compilation or not
__CUDA_ARCH__ is always undefined when compiling host code, whether steered by nvcc or not
__CUDA_ARCH__ is only defined for the device code trajectory of a compilation steered by nvcc

Those three pieces of information are always enough to set up conditional compilation for device code targeting different CUDA architectures, host-side CUDA code, and code not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in its discussion of compilation trajectories.
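As a rough illustration of how the two macros separate the three cases, here is a minimal sketch. `HOST_DEVICE`, `Trajectory` and `current_trajectory` are hypothetical names chosen for this example and are not part of CUDA or of the code above.

```cpp
#include <cassert>

// HOST_DEVICE is an illustrative helper macro: it expands to the CUDA
// function qualifiers only when nvcc is steering the compilation.
#ifdef __CUDACC__
#  define HOST_DEVICE __host__ __device__
#else
#  define HOST_DEVICE
#endif

enum Trajectory { NON_NVCC_HOST = 0, NVCC_HOST = 1, NVCC_DEVICE = 2 };

// Reports which of the three trajectories compiled this function body.
HOST_DEVICE inline Trajectory current_trajectory()
{
#if !defined(__CUDACC__)
    return NON_NVCC_HOST;   // ordinary C++ compiler, nvcc not involved
#elif !defined(__CUDA_ARCH__)
    return NVCC_HOST;       // nvcc is steering, host code pass
#else
    return NVCC_DEVICE;     // nvcc is steering, device code pass
#endif
}

// Self-check for a plain host compiler, where only the first case applies.
inline void check_trajectory_on_host()
{
#if !defined(__CUDACC__)
    assert(current_trajectory() == NON_NVCC_HOST);
#endif
}
```

Built with a plain host compiler such as g++, the function returns NON_NVCC_HOST. Under nvcc, a call from host code would be expected to see NVCC_HOST and a call from a kernel NVCC_DEVICE, since each pass compiles its own copy of the body.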