HPX V1.0 Released!

The STE||AR Group is proud to announce the release of HPX V1.0. While we call it version one, it is in fact the fifteenth official release of our library. This release became possible because we have now implemented all the features we set out to put in place for this version.

HPX is the C++ Standard library for parallelism and concurrency. It implements all of the related facilities as defined by the C++ Standard. As of this writing, HPX provides the only widely available open-source implementation of the new C++17 parallel algorithms. Additionally, HPX implements functionality proposed as part of the ongoing C++ standardization process, such as large parts of the C++ Concurrency TS, task blocks, data-parallel algorithms, executors, index-based parallel for loops, and many more. We also extend the existing C++ Standard APIs to the distributed case (e.g. compute clusters) and to heterogeneous systems (e.g. GPUs).

At its heart, HPX is an asynchronous many-task runtime system for the distributed world. It is portable in code and performance across a wide variety of architectures and operating systems. We have shown that it is usable on almost any machine, from a Raspberry Pi to the biggest computers available to us. Applications relying on HPX will scale from small handheld devices up to machines with thousands of compute nodes and millions of processors. For example, we have just successfully run an HPX application on the full NERSC Cori machine, a cluster with 9640 Intel Knights Landing compute nodes (655520 cores).

The new C++ Standard facilities listed above fit perfectly with some of our extensions targeting asynchronous operations, such as asynchronous parallel algorithms, asynchronous task blocks, or dataflow constructs. As a result, HPX changes the way we write programs in modern C++. It seamlessly enables a new asynchronous C++ Standard Programming Model, which tends to improve the parallel efficiency of our applications and helps reduce the complexities usually associated with concurrency. At the same time, HPX’s API is strictly aligned with the C++ standardization process, which lowers the barriers to adoption.
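The idea of composing asynchronous operations can be sketched with nothing but the C++ standard library. The block below is a minimal illustration, not HPX code: it uses `std::async` and `std::future` only, and the helper names `async_sum` and `combined_sum` are hypothetical, introduced purely for this example. It starts two independent pieces of work and combines their results once both are available.

```cpp
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not part of HPX): sums a vector on another thread
// and hands back a future for the result.
inline std::future<long> async_sum(std::vector<int> values)
{
    return std::async(std::launch::async, [v = std::move(values)] {
        return std::accumulate(v.begin(), v.end(), 0L);
    });
}

// Hypothetical helper (not part of HPX): launches two independent sums
// and combines them once both results are ready.
inline long combined_sum(std::vector<int> a, std::vector<int> b)
{
    std::future<long> fa = async_sum(std::move(a));
    std::future<long> fb = async_sum(std::move(b));
    // Both tasks may already be running here; get() blocks until each is done.
    return fa.get() + fb.get();
}
```

In this sketch the caller still blocks in `get()`. The asynchronous constructs named above aim to express such dependencies without blocking, which plain `std::future` does not offer.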

The code base of HPX is very mature and of very high code quality. Our extensive testing has definitely paid off. Many people have contributed to this release — we would like to thank all of them for their efforts. This release incorporates nearly 1500 commits and has closed almost 300 tickets and pull requests submitted by collaborators from all over the world. We have introduced several important changes:

  • We added various new higher-level parallelization facilities, such as more parallel algorithms, range based parallel algorithms, and channels — all well aligned with various C++ standardization documents.
  • We now support transparently migrating objects across compute-node boundaries, which is a major feature supporting dynamic load balancing in large distributed applications.
  • We have refactored our thread-scheduling subsystem for improved performance and lower overhead.
  • We have added a new network transport module enabling direct support for InfiniBand networks.
  • We have added a long list of new performance counters exposing different runtime parameters.
  • We have improved the integration with external diagnostic tools, such as APEX, Intel Amplifier, and Intel Inspector.

How to Download:

For a complete list of new features and breaking changes, please see our release notes. If you have any questions, comments, or exploits to report, you can comment below, reach us on IRC (#stellar on Freenode), or email us at hpx-users@stellar.cct.lsu.edu. We value your input!

HPX V1.0-rc1 Available!

After nine years of development, HPX version 1.0.0 is being prepared to ship! The STE||AR Group is excited to share this news and has therefore prepared a release candidate so that users can discover the improvements made since the last release. We encourage established users and newcomers to test it out. You can download a tarball of the release candidate here:

Vectorized C++ Parallel Algorithms with HPX

In preparation for my talk at CppCon 2016 last week, I decided to have a closer look at the possibility of adding vectorization to HPX’s parallel abstractions (see note 1). The goal was to avoid using compiler-specific extensions while enabling vectorization support. At the same time, I wanted to be able to integrate this with the already existing parallel algorithms in HPX, proving again that higher-level APIs and best possible performance go hand in hand in HPX.

Notes

1 The slides for this talk can be downloaded here.

HPX V0.9.99 Released!

Version 1.0 approaches! The STE||AR Group is proud to announce the release of HPX v0.9.99. This release is significant as HPX is nearly feature complete. Over the next several months the STE||AR Group will continue to test and polish HPX’s API and documentation. The feedback and experience gained from the community’s use of this release will provide guidance on where to add the finishing touches for v1.0.

HPX V0.9.99-rc1 Available

The STE||AR Group is proud to announce the first release candidate of HPX V0.9.99!

In preparation for the next major release of HPX we have put together a first release candidate. Please download one of the tarballs

  • HPX V0.9.99-rc1:
    File MD5 Hash
    zip (5.5M) 3f70d33f0ca737dc55e15e106abef259
    gz (3.7M) 10a0ad210ac33b5a91e25a9a375ce585
    bz2 (3.2M) d78657bce8435c0998648f09b280acee
    7z (2.8M) b109408218d5aa1320efd9fc7a0671d6

or check out the tag ‘0.9.99-rc1’ from the repository here and verify whether your code builds and runs fine when using this version.

Please see the list of new features and fixed problems in this release here.

Please report any problems you might encounter through our ticket system.
We plan to make the final release of HPX V0.9.99 on July 15th, 2016.

HPX and Index-based C++ Parallel Loops

At the winter 2016 C++ standardization meeting in Jacksonville, Intel presented the second revision of their standardization proposal targeting index-based parallel for-loops (the latest document at the time of this writing is P0075R1). This document describes a couple of basic parallel for-loop constructs that complement the existing parallel algorithms, which have been implemented in HPX for quite some time. The following is taken from this document:

HPX and C++17

During the Jacksonville meeting of the C++ standards committee last week, the so-called Parallelism TS was accepted into the next official International Standard, also known as C++17. We can now expect all major vendors of C++ compilers to implement the very same parallelism facilities that HPX has exposed for almost 2 years already!

A while back, we wrote about how HPX implements parallel algorithms. At the time of that writing, these parallel algorithms were freshly published as a WG21 Technical Specification with the goal of moving them into the main C++ International Standard at some point in the future. Since then, we have continued to work hard on implementing HPX versions of the proposed algorithms. In addition, we have added extensions in our implementation such as the “task” policy, which enables asynchronous execution of the algorithm. We have also spent a lot of time tuning our implementation to make it as performant as possible. For instance, we have shown that it is possible for the relatively higher-level parallelization abstractions in HPX to match or even outperform equivalent applications based on well-known and well-honed technologies like OpenMP.
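To make the notion of an asynchronously executed algorithm concrete, here is a rough sketch in standard C++ only. It is not HPX's “task” policy and does not use any HPX API: it simply wraps `std::transform` in `std::async`, so that the call returns a `std::future` immediately instead of blocking until the algorithm finishes. The function name `async_square` is hypothetical and exists only for this illustration.

```cpp
#include <algorithm>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper (not an HPX API): squares every element on another
// thread and returns a future that becomes ready when the algorithm is done.
inline std::future<std::vector<int>> async_square(std::vector<int> input)
{
    return std::async(std::launch::async,
        [in = std::move(input)]() mutable {
            // The algorithm itself is an ordinary standard algorithm call.
            std::transform(in.begin(), in.end(), in.begin(),
                [](int x) { return x * x; });
            return std::move(in);
        });
}
```

A caller can start `async_square`, do unrelated work, and call `get()` on the returned future only when the squared values are actually needed.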

The STE||AR Group is proud of the fact that our implementation in HPX has provided implementation, usage experience, and early feedback to the C++ committee. These contributions have helped to get this specification into the new standard. We hope to continue to be a flexible and valuable testbed for all future standardization proposals relating to parallelism and concurrency.