A hands-on introduction to parallel programming and optimizations for 1000+ core GPU processors, their architecture, the CUDA programming model, and performance analysis. Students implement various ...
[Editor's note: Part 2 of this series shows how to optimize DSP “kernels,” i.e., inner loops. For more programming tips, see the DSP programmer’s guide.] DSP applications typically have tough ...
In this special guest feature, James Reinders describes why roofline estimation is a great tool for code optimization in HPC. Roofline Analysis is a technique that projects a view of realism into ...
A technical paper titled “Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation” was published by researchers at MIT (CSAIL), Argonne National Lab, and TU ...
In this slidecast, Torsten Hoefler from ETH Zurich presents: Data-Centric Parallel Programming. The ubiquity of accelerators in high-performance computing has driven programming complexity beyond the ...
A monthly overview of things you need to know as an architect or aspiring architect.
[Editor's note: Part 2 shows how to optimize DSP kernels (i.e., inner loops), and how to write fast floating-point and fractional code. Part 4 explains why it is important to optimize “control code,” ...
This course focuses on developing and optimizing applications software on massively parallel graphics processing units (GPUs). Such processing units routinely come with hundreds to thousands of cores ...