Wednesday, May 20, 2009

Wednesday: the afternoon

The Wednesday sessions continued in the afternoon with an overview of three top projects: Blue Waters, Sequoia, and Roadrunner.

Blue Waters is a project that aims to provide scientists with at least 1 PetaFlop of sustained compute power. Bill Kramer from NCSA stressed that, besides peak performance, effectiveness, reliability, and consistency are equally important when offering supercomputing services. But the Blue Waters project is about much more than reaching a PetaFlop and beyond. An important part of it is educating users in developing petascale applications. That also includes providing tools such as sophisticated compilers and performance analysis tools.
"Watch the word sustained", Bill Kramer points out. He compares peak performance values to the speedometer of a car that shows 180 mph, a speed you can never reach on a normal street: "Linpack is like NASCAR".
Blue Waters will be an open platform. The first users have not been selected yet, but that is expected to happen soon. Candidates will have to demonstrate that their applications are able to run on such a facility.

In place of Tom Spelce, who unexpectedly could not make it to the conference, John Westlund from LLNL gave an update on the Sequoia project. This project focuses on creating a supercomputer with 20 PetaFlops peak performance. Six target applications have been chosen for the initial phase of the project. All of them are materials science codes, ranging from quantum molecular dynamics to a dislocation dynamics code for the strength of materials under high pressure. That choice is not surprising, since Sequoia is planned to contain 1.6 million cores, and the MD method is known to scale to these numbers. Nevertheless, the extremely high number of compute cores puts considerable pressure on the development of the MPI communication library in particular.

Ben Bergen from Los Alamos National Laboratory gave an update on the status of the Roadrunner system. He presented VPIC, a plasma physics code that implements a particle-in-cell algorithm. It reaches an extraordinary 11% of the peak performance of the Roadrunner system. Part of the secret behind it is a triple-buffer approach when using the Cell PCIe boards. Nevertheless, due to a missing instruction cache it is necessary to use overlays for the program text. That means mimicking a cache in software, which costs performance. The results were submitted for the Gordon Bell Prize. Unfortunately, the prize was narrowly missed.
Ben points out that during the development and porting phase his team was not satisfied with the performance of the PPE component of the Cell processor. Compared to the AMD processors on the main board, the PPE is simply not powerful enough, he argues. Although he is aware that it is difficult for IBM, he would like to see more support for PCIe-based Cell boards such as those used in Roadrunner.
Cerrillos is a Roadrunner-like system with 162 TF that was recently installed at LANL to allow unclassified research outside the fence. A call for compute time has been issued; the evaluation of the proposals has just been finished, and allocations will be granted soon.

