
A great place to get an overview of modern GPU pipeline performance is the "Graphics Pipeline Performance" chapter of the book GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics.
NVIDIA CUDA Code Samples
This example demonstrates how to pass in a GPU device function (from the GPU device static library) as a function pointer to be called. This sample requires devices with compute capability 2.0 or higher.
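The idea can be illustrated with a minimal, self-contained sketch (not the SDK sample itself, and without the separate device static library): the address of a `__device__` function is only meaningful on the device, so it is captured in a `__device__` variable, copied back to the host with `cudaMemcpyFromSymbol`, and then handed to a kernel as an ordinary argument. The names `add_op`, `mul_op`, and `apply` are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// All device functions share this signature so they can be swapped freely.
typedef float (*binary_op_t)(float, float);

__device__ float add_op(float a, float b) { return a + b; }
__device__ float mul_op(float a, float b) { return a * b; }

// Device-side copies of the function pointers; taking the address of a
// __device__ function is only valid in device code.
__device__ binary_op_t d_add_op = add_op;
__device__ binary_op_t d_mul_op = mul_op;

// Kernel that calls whichever device function it was handed.
__global__ void apply(binary_op_t op, const float *x, const float *y,
                      float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = op(x[i], y[i]);
}

int main() {
    const int n = 256;
    float hx[n], hy[n], hout[n];
    for (int i = 0; i < n; ++i) { hx[i] = float(i); hy[i] = 2.0f; }

    float *dx, *dy, *dout;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMalloc((void **)&dout, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    // Fetch the device-side function pointer so it can be passed as a
    // plain kernel argument.
    binary_op_t op;
    cudaMemcpyFromSymbol(&op, d_mul_op, sizeof(binary_op_t));

    apply<<<(n + 127) / 128, 128>>>(op, dx, dy, dout, n);
    cudaMemcpy(hout, dout, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[10] = %f\n", hout[10]);   // 10 * 2 = 20

    cudaFree(dx); cudaFree(dy); cudaFree(dout);
    return 0;
}
```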
NVIDIA GPU Computing Documentation. The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing and porting GPU computing applications.
The CPU and GPU are treated as separate devices that have their own memory spaces. This configuration also allows simultaneous computation on the CPU and GPU without contention for memory resources.
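A short sketch of what this overlap can look like in practice, assuming a single stream, pinned host memory, and a placeholder `cpu_work` routine standing in for independent host-side computation: because `cudaMemcpyAsync` and the kernel launch return control immediately, the host thread keeps working until it explicitly synchronizes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *d_data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= factor;
}

// Placeholder for independent host-side work that can proceed while the
// GPU is busy (illustrative only).
static double cpu_work(int iters) {
    double acc = 0.0;
    for (int i = 0; i < iters; ++i) acc += 1.0 / (i + 1.0);
    return acc;
}

int main() {
    const int n = 1 << 20;
    float *h_data, *d_data;
    cudaMallocHost((void **)&h_data, n * sizeof(float));  // pinned host memory
    cudaMalloc((void **)&d_data, n * sizeof(float));      // separate device memory
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Copies and kernel are issued asynchronously; control returns to the
    // CPU immediately, so host and device compute at the same time.
    cudaMemcpyAsync(d_data, h_data, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_data, 3.0f, n);
    cudaMemcpyAsync(h_data, d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);

    double cpu_result = cpu_work(1000000);   // runs while the GPU works

    cudaStreamSynchronize(stream);           // wait for the GPU side
    printf("GPU: h_data[0] = %f, CPU: %f\n", h_data[0], cpu_result);

    cudaStreamDestroy(stream);
    cudaFreeHost(h_data);
    cudaFree(d_data);
    return 0;
}
```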
Measurements from: H. Wang, S. Potluri, M. Luo, A. Singh, S. Sur and D. K. Panda, "MVAPICH2-GPU: Optimized GPU to GPU Communication for InfiniBand Clusters", Int'l Supercomputing Conference (ISC), 2011.
Summary: optimization needs an understanding of GPU architecture. Memory optimization: coalescing and shared memory. Execution configuration: latency hiding. Instruction throughput: use high-throughput instructions.
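As a concrete illustration of the memory-optimization points, here is a hedged sketch of the classic tiled matrix transpose: global loads and stores are coalesced because each warp touches a contiguous row, the tile is staged in shared memory, and the grid/block shape is the execution configuration. Kernel and variable names are illustrative, and the matrix width is assumed to be a multiple of the tile size.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define TILE 32

// Matrix transpose staged through shared memory.  Threads in a warp read a
// contiguous row of the input (coalesced) and, after the tile is flipped in
// shared memory, write a contiguous row of the output (also coalesced).
// The +1 padding avoids shared-memory bank conflicts on the column access.
__global__ void transpose_tiled(float *out, const float *in, int width) {
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];     // coalesced read

    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;                    // transposed block
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];    // coalesced write
}

int main() {
    const int width = 1024;                 // assumed multiple of TILE
    const size_t bytes = size_t(width) * width * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < width * width; ++i) h[i] = float(i);

    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid(width / TILE, width / TILE);  // one block per tile
    transpose_tiled<<<grid, block>>>(d_out, d_in, width);

    cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("out[0][1] = %f (expect %f)\n", h[1], float(width));  // in[1][0]

    cudaFree(d_in); cudaFree(d_out); free(h);
    return 0;
}
```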
NVIDIA CUDA Library: cuCtxCreate
The three LSBs of the flags parameter can be used to control how the OS thread, which owns the CUDA context at the time of an API call, interacts with the OS scheduler when waiting for results from the GPU.
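For illustration, a minimal driver-API sketch that passes one of the documented CU_CTX_SCHED_* scheduling flags (the values that occupy those low-order bits) when creating a context; error handling is abbreviated.

```cuda
#include <cstdio>
#include <cuda.h>

int main() {
    cuInit(0);

    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // The scheduling flag occupies the low bits of `flags`:
    //   CU_CTX_SCHED_AUTO           let the driver pick a heuristic (default)
    //   CU_CTX_SCHED_SPIN           spin-wait: lowest latency, burns a CPU core
    //   CU_CTX_SCHED_YIELD          yield the CPU thread while waiting
    //   CU_CTX_SCHED_BLOCKING_SYNC  block the thread on a synchronization primitive
    CUcontext ctx;
    CUresult rc = cuCtxCreate(&ctx, CU_CTX_SCHED_BLOCKING_SYNC, dev);
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuCtxCreate failed: %d\n", (int)rc);
        return 1;
    }

    // ... launch work through the driver or runtime API here ...

    cuCtxDestroy(ctx);
    return 0;
}
```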