C++ and the Heterogeneous Challenge

As HPC shifts its long-range focus from peta- to exascale, it has become paramount for programmers to be able to efficiently utilize the entirety of a machine’s compute resources. This has grown increasingly difficult, as most of the Top500 machines utilize, in some capacity, hardware accelerators like GPUs and coprocessors, which often require special languages and APIs to take advantage of them. In C++, the concept of executors, as currently discussed by the C++ standardization committee, has created the possibility of a flexible and dynamic choice of the execution platform for various types of parallelism, including the execution of user code on heterogeneous resources like accelerators and GPUs in a portable way. This will also make it possible to develop a solution that seamlessly integrates iterative execution (parallel algorithms) with other types of parallelism, such as task-based parallelism, asynchronous execution flows, continuation-style computation, and explicit fork-join control flow of independent and non-homogeneous code paths.

HPX V0.9.11 Available!

The STE||AR Group is proud to announce the release of HPX v0.9.11! In this release our team has focused on developing higher level C++ programming interfaces which simplify the use of HPX in applications and ensure their portability in terms of code and performance. We paid particular attention to aligning all of these changes with the existing C++ Standards or with the ongoing standardization work. Other major features include the introduction of executors and various policies which make it possible to customize the ‘where’ and ‘when’ of task and data placement.

HPX and C++ Futures

Futures in C++ have received a lot of attention lately. One of the main related events (even if it was not widely mentioned anywhere) was the final call for positions and comments on the preliminary draft technical specification for C++ Extensions for Concurrency (PDTS), see N4538. This call closed on July 7th, 2015. At this point, the document is out for the national bodies to vote on whether it should be accepted as a final TS (the balloting period ends on July 22nd, 2015). Personally, I expect this document to be accepted unanimously, which means that we will soon have a second TS related to parallelism and concurrency ready. Compiler vendors will have a field day implementing all of this functionality over the next months (and years).

HPX and PVS-Studio

We have used a trial version of PVS-Studio for HPX previously, but I vaguely remembered it as being very verbose in its diagnostics. I have read a lot about the tool lately, and since it had been a long time since we last used it, we contacted the developers at Viva64 to ask whether they would be willing to support our open source project. We were pleasantly surprised that they agreed to provide us with a free license for one year in exchange for a blog post about our experience with the tool.

HPX and C++ Distributed Computing

For us, HPX is ‘A general purpose C++ runtime system for parallel and distributed applications of any scale’. While this is quite a mouthful, we mean every word of it. All of the posts published on this site so far have focused on the APIs HPX exposes for purely local operation on a single machine. In this installment I would like to start talking about how HPX exposes distributed functionality, i.e. how to use HPX to write truly distributed applications. As we will see, by introducing just minor extensions to the C++ standard, the user is able to write homogeneous code without having to pay attention to any differences between invoking functionality locally (on the current node) or remotely (on any other node in a cluster).

HPX and C++ Parallel Algorithms

In Lenexa (May 2015), the C++ standardization committee finalized the work related to the Technical Specification for C++ Extensions for Parallelism (the latest document at the time of this writing is N4507). This document describes parallel algorithms which will extend and complement the (sequential) standard library algorithms we have all loved and used for over a decade now. This is an important, albeit only first, step towards standardizing higher level abstractions for parallelism and concurrency in C++.

HPX and C++ Dataflow

The C++11 standard library introduced std::async as a convenient means of asynchronously executing a function on a new thread and returning an instance of a std::future representing the eventual result produced by that function. In HPX, the hpx::async API is one of the most basic parallelization facilities exposed to the user. Here is a simple example of how it can be used:

HPX and C++ Task Blocks

The quest for finding efficient, convenient, and widely usable higher level parallelization constructs in C++ is continuing. With the standardization document N4411, a group of authors from Intel and Microsoft propose the task_block facility which formalizes fork-join parallelism. From the paper (slightly edited):

HPX and the C++ Standard

While developing HPX, it has always been a goal to create an API which is as easy to learn and use as possible. We quickly realized that almost all of our functionality can be exposed through the interfaces which are already standardized as part of the C++11 standard library or which are being proposed for standardization over the next years. So we made it our goal to conform to the C++ standard documents and proposals as closely as possible. This decision has a fundamental impact on almost all aspects of our work on HPX.

HPX V0.9.9 Available!

The STE||AR Group is proud to announce the availability of HPX V0.9.9! You can download the release version or check out the latest version from GitHub. With 200 bug fixes and 1,500 commits, V0.9.9 introduces several improvements including:

  • Completing the refactoring of hpx::future to be properly C++11 standards conforming
  • Overhauling our build system to support newer CMake features to make it more robust and more portable
  • Implementing a large part of the parallel algorithms proposed by C++ Technical Specifications N4104, N4088, and N4107
  • Adding examples such as the 1D Stencil and the Matrix Transpose series
  • Remodeling our testing infrastructure which will allow us to quickly discover, diagnose, and fix bugs that arise during development

For more details about these changes please see the release notes here.

This is an exciting time of growth for the STE||AR Group. As HPX has become more robust, we have begun to build higher level abstractions both in HPX and on top of it. These abstractions, such as our work in parallel algorithms and libraries like LibGeoDecomp, make the strong scaling benefits of techniques like futurization even more user friendly and accessible. In addition to new ways of expressing parallelism, our group has also made impressive improvements in integrating different architectures into a single simulation. Libraries like HPXCL are exploring new ways of distributing work to GPUs and other accelerators. You can view our technology live at the LSU booth at the Supercomputing Conference 2014. See you there!

More and more people are beginning to recognize the potential of managing concurrency with C++. If exploring new ways to exploit parallelism interests you, check us out! There has never been a better time to download HPX and become a pioneer of scalability. If you have any questions, comments, or exploits to report, you can comment below, reach us on IRC (#stellar on Freenode), or email us at hpx-users@stellar.cct.lsu.edu.