GSoC’21: Come and code a STE||AR Summer with us!

The STE||AR Group is honored to be selected as one of the 2021 Google Summer of Code (GSoC) mentor organizations! This program, which pays students over the summer to work on open source projects, has been a wonderful experience for students and mentors alike. This is our 7th summer being accepted by the program!

Interested students can find out more about the details of the program on GSoC’s official website. As a mentor organization we have come up with a list of suggested topics for students to work on, however, any student can write a proposal about any topic they are interested. We find that students who engage with us on IRC (#ste||ar on freenode) or via our mailing list hpx-users@stellar-group.org have a better chance of having their proposals accepted and a better understanding of their project scope. Students may also read through our hints for successful proposals.

If you are interested in working with an international team of developers on the cutting edge of C++ parallel, task-based runtime systems please check us out!

HPX 1.6.0 Released!

The STE||AR Group is proud to announce the release of HPX 1.6.0! HPX is an implementation of the C++ standard library for parallelism and concurrency on an efficient user-level threading runtime, with extensions for distributed computing.

This release continues the focus on C++20 conformance with multiple new algorithms adapted to be C++20 conformant and becoming customization point objects (CPOs). We have added experimental support for HIP, allowing existing CUDA features to now be compiled with hipcc and run on AMD GPUs as well. We have also continued improving the performance of the parallel executors, and added an experimental fork-join executor. The full list of improvements, fixes, and breaking changes can be found in the release notes.

Thank you to everyone in the STE||AR Group and all the volunteers who have provided fixes, opened issues, and improved the documentation.

Download the release from our releases page.

If you have any questions, comments, or exploits to report you can reach us on IRC (#ste||ar on freenode), on the matrix hpx channel, or email us at hpx-users. We depend on your input!

HPX 1.5.1 Released!

We have just released HPX 1.5.1. This patch release fixes problems found in the 1.5.0 release. It contains the following changes:

  • Removes restriction on suspending runtime with multiple localities, users are now responsible for synchronizing work between localities before suspending.
  • Fixes several compilation problems and warnings.
  • Adds notes in the documentation explaining how to cite HPX.


Download the release from our releases page or by checking our the 1.5.1 tag. Please see our release notes for a full list of fixed issues and merged pull requests.


If you have any questions, comments, or exploits to report you can reach us on IRC (#ste||ar on freenode), on the matrix hpx channel, or email us at hpx-users. We depend on your input!

HPX 1.5.0 Released!

Make sure to read our release notes for a full list of changes as they contain important information on deprecations and breaking changes.

Thank you to everyone in the STE||AR Group and all the volunteers who have provided fixes, opened issues, and improved the documentation.

Download the release from our releases page.

The highlights of this release are:

  • We continued with the modularization. This results in the deprecation of several headers which have been moved to a different location while better matching the standard ones, we encourage you to update your includes to the new header location as soon as possible as it will be removed in a couple of releases. Here is the link to the headers you should refer to the public API.
  • We adapted several algorithms to be C++20 conformant and added hpx::stable_sort.
  • We have refactored the CMake target, you can now link directly to HPX::hpx (link additionally to HPX::wrap_main to have your main() to be used as the first HPX thread).
  • HPX now requires a C++14 compatible compiler and the standard is set by default to C++14.

If you have any questions, comments, or exploits to report you can reach us on IRC (#ste||ar on freenode), on the matrix hpx channel, or email us at hpx-users. We depend on your input!

GSoC 2020: Final Reports

Please find attached the final reports of our successful GSoC students::

Giannis Gonidelis, Adapt parallel algorithms to C++20: Final Report, Blog

Sayef Sakin, Time Series Updates for Traveler: Final Report/Blog

Weile Wei, Concurrent Data structure Support: Final Report/Blog

Mahesh Kale, Pip Package for Phylanx: Final Report/Blog

Pranav Gadikar, Domain decomposition, load balancing, and massively parallel solvers for the class of nonlocal models: Final Report/Blog

GSoC 2020 – Adapt parallel algorithms to C++20

Giannis Gonidelis – gonidelis@ece.auth.gr


“I often compare open source to science. To where science took this whole notion of developing ideas in the open and improving on other people’s ideas and making it into what science is today and the incredible advances that we have had. And I compare that to witchcraft and alchemy, where openness was something you didn’t do.”

– Linus Torvalds, Creator of the Linux kernel

DISCLAIMER

This is my final report aimed to be used in the final evaluation of GSoC 2020. It’s purpose is to not only encapsule my work at the third period of GSoC but give a detailed description of my work product in the whole program through the summer. 

Except from the standard program procedure which requires that I shall deliver a ‘Work Product Submission’, this document will be utilized as a blog-like post to inform future readers on my work at GSoC – HPX. 

What’s more important though, is that it should help future contributors to get involved in HPX. Thus, apart from being a report, it has the form of a mini-guide on “How could a student programmer start working and contributing in a demanding and large open-source project like HPX”. That is why it contains far more detail beyond my Pull Request descriptions.

~Giannis

0. The General Picture 

C++ Background

Iterators, ranges, Standard Template Library (STL), algorithms, containers, Standard C++. If you already know what these are, then skip this paragraph. These are everything a student needs, in order to get involved with a project like mine on HPX (before proceeding I suggest you read some intro on what HPX is for at the given link).

First things first, STL as you can guess is a C++ library. It’s a powerful C++ library that contains templated data-structures and algorithms to apply on. It enables users to create lists, arrays etc. of whatever data-type they want to. The data-structures are called containers, simply because they contain and relate elements in different ways. A simple example of an STL container is the standard vector

std::vector<float> v = {7.2, 5.5, 16.1, 8.6};

While a simple example of an STL algorithm is std::for_each(), which does exactly what it says: it iterates over the elements of a container and applies a given function to each of them. As you can see the vector is an (instantiated) template (you need the <float>  thing to instantiate it) which means that there is a generic underlying vector structure written behind the curtains, but it is specified according to the given data type.

The third member of the STL, are the iterators. Let me give you a mini introduction. Say you have a vector:

    vector<int> v = { 1, 2, 3, 4, 5 };

Quickly, think off the top of your head a way to print the elements of this vector. Let me guess. Did you just create a classic C-style indexed for-loop? Something like:

for (int i = 0; i < v.size(); i++)
  cout << v[i] << endl;

Well, C++ has introduced a more generic way to treat – access your data structures in seek of some “Holy Grail of Total Uniformity”: iterators. Iterators are pointers. I couldn’t emphasize more on that. They are a high level interface which facilitates reference to data-structures containers. The trivial code above, using iterators would be:

vector<int>::iterator iter;
for (iter = v.begin(); iter < v.end(); iter++)
        cout << *iter << ” “;

The example above utilizes the notions of containers and iterators. Now put the “algorithms” part into the game and the code ends up being a piece of (generic) art:

std::for_each( v.begin(), v.end(), [] (int c) {std::cout << c << std::endl; } );

Now that’s what I call an “one-liner”! Don’t get confused with the third argument of for_each(). It’s a lambda expression but if you don’t want to get dirty let me just tell you that it’s “the thing” I want to be done for each element of my container. It represents the functionality behind the algorithm. One of course could provide whatever “functionality” they want on the third argument. Mine, is just a simple print.

But why constrain ourselves with the use of this .begin() – .end() thingy over and over again? We may as well want to apply – do something to just a part of our collection. Or we may as well, want to pass the container as any sane coder would imagine: “Just print the damn v!”. That’s where C++20 kicks in by introducing the ranges library. 

Ranges hide the iterator mechanism by representing our collection, or better, our container, or even better, our range (!!!) in a standalone way without requiring the user to mess with iterators. Here is our example, using ranges:

std::ranges::for_each( v, [] (int c) {std::cout << c << std::endl ;} );

Yeah that’s right! Only two arguments: 1. the range v and 2. the function we want to apply (the lambda). Simple as that. One could imply we get a little bit pythonish here… but that’s another story to tell.  The thing is we ended up making our code more expressive, more readable and by far more abstracted. No index-missing mess, no complex loops, not even redundant pointers. There is a whole package of different containers and algorithms which could be combined under the ranges mechanism, so one could only imagine the possibilities.

 As I mentioned above, ranges were introduced in C++20. That means that they exist in the version of the language that was published in 2020. That is to say, the last one. A new C++ standard is published every three years. HPX being an independent library, needs to adapt to the C++ Standardization changes and follow the coding standards that are dictated by the community. 

What the Project is for

The title of my project should make more sense by now:  adapt the HPX parallel algorithms according to the latest C++ standard. This includes providing range-based overloads along with the pre-existing ones. 

The goal was actually twofold though. The pre-existing algorithms were taking a .begin() and .end() iterator as their first two arguments. These iterators needed to be of the same type as they were referring to a container of a specific type. But as containers got more abstracted and turned into ranges there was the need of possibly accessing part of the container in a “from the begin till that element” way. “…till that element” is not an iterator rather than some value which should satisfy just a simple condition: be comparable to an iterator. Thus, the first part of my goal was to adapt algorithms in a way that they would allow for their second argument to be of a different type compared to the first (iterator) argument. Let’s call that second argument, a sentinel. By allowing a begin iterator – sentinel value pair we have introduced ourselves the notion of range. The only thing now is to map (or maybe convert) a single range variable into this iterator – sentinel pair and the rest would be history.

By achieving the first goal mentioned above, our team would be able to proceed on providing the ranges mechanism that was introduced recently to the language. So the second goal was to allow algorithms to accept just one range argument. That would be achieved by adding one overload per algo, where the single range argument would be splitted into two arguments, begin_iteratorsentinel and then pass them to the underlying pre-existing “parent” algorithm which would do the rest of the job effortlessly.

All the above sums up my goals in GSoC 2020 on HPX.

1. The Chronicle

 On May the 4th, as soon as the accepted GSoC students were announced I got into touch with my mentors and the whole HPX team. Having already talked to them several times during my proposal preparation we didn’t lose any time into getting to know each other. We scheduled a meeting on a weekly basis and from that point on the journey began.   

Traits, other traits and utilities…

 I started familiarizing with the codebase and asking like a ton of questions on the IRC (that’s where STE||AR Group’s chat lies). The one thing led to the other and after lots (seriously like lots!) of studying of template (meta)-programming techniques and principles, I was assigned my first task: 

—  #4667 : Add `stopValue` in `Sentinel` struct instead of `Iterator`    The story behind this first Pull Request (PR) above is that there was this custom iterator – sentinel pair declaration in a header file, which we used every time we wanted to test whether an algorithm should work when provided this kind of pair. It needed some fixing. Relocate some variables, add some getters, adapt the files which were using it etc. And voila! My first HPX PR was a thing!At this point I would like to note that until I reached the point where I actually started adapting algorithms, there was this quite serious time period of familiarization with the code and it wasn’t until mid-late June when I actually got into the algorithm adaptation game.

—  #4745 : Add `is_sentinel_for` trait implementation and test`   
My second PR was a much needed, defined by C++20 standard trait: is_sentinel_for<>. As you can guess from its name we couldn’t move further if we didn’t have a tool to check whether a given sentinel is actually valid for a given begin iterator. Of course my implementation was accompanied by some test that was proving my trait actually worked (testing is quite a serious thing in HPX).

  #4768 : Add sized_sentinel_for concept and make distance dispatch according   to that
Next, a very very similar trait implementation took place which covered the exact same case (whether a given sentinel could be used along with a begin iterator) plus one important restriction: could we calculate the iterator’s distance from the sentinel in constant time? The trait was aimed to be used in our distance() utility which is being used for this very same reason in the algorithms codebase.

—  #4792 : Replace enable_if with HPX_CONCEPT_REQUIRES_ and add is_sentinel_for constraint
My next PR was a very simple one and it was the last “training”-like one before jumping into the algorithms. On a semi-adapted algorithm, that’s hpx::reduce(), I fixed minor constraints and added a custom HPX macro, namely HPX_CONCEPT_REQUIRES() which replaced the classic SFINAE enable_if() utility (SFINAE = Substitution Failure Is Not An Error). At this point I would strongly suggest that whoever wants to contribute to HPX or learn about template meta-programming to take a serious look at SFINAE. Anyways, where was I? Oh yeah! That was the last “preparation” commit so let’s dig into the actual job.

Adapt these algos already!

The first algorithm I took on was hpx::for_each(). Quite trivial, quite simple:

—  #4809 : Adapt foreach to C++20 
The workload involved changing every single iterator pair in a way that it should allow from the begin iterator to be different than the end iterator. This is the main step that allows the ranges infrastructure to be deployed as mentioned above. Minor housekeeping took place along the adaptation and previously developed traits were used. Tests that prove the functionality were of course delivered along with the adapted algorithm.    Next, I took on hpx::transform(). Same goal. Things changed though when we discussed with my mentor, Hartmut Kaiser, that we should broaden our goal. Instead of just changing the iterator pairs we decided that I should provide the whole interface for every algorithm, which should include all the algorithm overloads that were declared in the C++20 standard. This required extra programming techniques and a whole lot of code relocation within the namespaces. I studied Customization Point Objects(CPO), which ended up being the base of our adapted algorithms. Thus, the transform() adaptation as you can see in the following PR was by far more complicated and different than the for_each() one.

—  #4888 : Adapt transform to C++20   
For this last one we needed to create whole new result types, called in_out_result and  in_in_out_result which denote the concepts of a pair and a tuple accordingly according to the Standard. Then I added CPOs and tests for the non-ranges overloads  of the algorithm (this was part of the goal broadening) along with the range-based ones. Helper utility code was developed in parallel.

What’s next?

GSoC-2020 may have come to an end, but my cooperation with HPX has only just begun. It’s not only that there are lots of parallel algorithms left to adapt. It’s that in that three month period I only got a small taste of what it’s like to work in a large scale open source project. My involvement should stretch far beyond this algorithm adaptation project. GSoC was a perfect opportunity to spark the whole process.

Here is the link to the final report document: My work at GSoC 2020

GSoC 2020 – Massively Parallel Solvers

Final Report for project: Domain decomposition, load balancing, and massively parallel solvers for the class of nonlocal models

By: Pranav Gadikar

Mentors: Patrick Diehl, Prashant K. Jha

Project Proposal

Description of the problem

Recently various nonlocal models have been proposed and applied for understanding of complex spatially multiscale phenomena in diverse fields such as solid mechanics, fluid mechanics, particulate media, directed self-assembly of Block-Copolymer, tumor modeling, etc. Unlike local models which are governed by partial differential equations such as heat equation, wave equation, nonlocal models have dependence over a larger length than the size of discretization. This makes the partitioning of mesh and information sharing among the processors more difficult. Also, the amount of data communicated among the processors is higher compared to what is expected in the local models. The challenge is also to not have idle processors during the communication step. It is our understanding that an efficient computational method to solve the nonlocal model is very important and will have impact in many fields. The algorithm developed for one field can be easily applied and extended to models in the other fields. Our main goal in this project is to highlight and address the key difficulties in designing massively parallel schemes for nonlocal models while keeping the model related complexity to minimum. To meet the previous goal and to fix the ideas, we use the nonlocal heat equation in 2D setting. We utilize a modern parallelization library, HPX, to develop the parallel solver.