GSoC 2024 Contributors Announced!

We are very proud to announce the names of the 5 contributors this year who will be funded by Google to work on projects for our group through Google Summer of Code 2024!

These contributors represent the very best of the many excellent proposals that we had to choose from. For those unfamiliar with the program, the Google Summer of Code brings together ambitious students from around the world with open source developers by giving each mentoring organization funds to hire a set number of participants. Students then write proposals, which they submit to a mentoring organization, in hopes of having their work funded.

Below are the contributors who will be working with the STE||AR Group this summer listed with their mentors and their proposal abstracts.


Project Title: Conflict (Range-Based) Locks

Contributor: Hari Hara Naveen S, Indian Institute of Technology, Madras

Mentors: Panos Syskakis, Mikael Simberg, John Biddiscombe

In some multi-threaded algorithms, resources need to be protected using locks, but the locking mechanism may need to operate on ranges rather than individual items. For instance, imagine a scenario with a large array of N items where one task requires a small contiguous subset of items to be locked while another task requires a different contiguous subset. In such cases, a range-based locking mechanism is required. We need a templated range-based lock that can be applied to arrays of various types and dimensions. A successful implementation should support locking and unlocking operations on specified ranges of items and should be extensible to multi-dimensional arrays (2D/3D, etc.) with templates that allow flexibility over dimensions and data types.
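The idea can be illustrated with a minimal one-dimensional sketch. This is not the project's design: the class name range_lock, its member functions, and the use of std::mutex and std::condition_variable (rather than HPX synchronization primitives) are all assumptions made for illustration. Ranges are half-open index intervals [begin, end), and two ranges conflict when they overlap.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical 1D range lock over half-open index ranges [begin, end).
class range_lock
{
public:
    // Blocks until [begin, end) overlaps no currently held range.
    void lock(std::size_t begin, std::size_t end)
    {
        std::unique_lock<std::mutex> guard(mtx_);
        cv_.wait(guard, [&] { return !overlaps(begin, end); });
        held_.push_back({begin, end});
    }

    // Non-blocking variant: returns false if the range conflicts.
    bool try_lock(std::size_t begin, std::size_t end)
    {
        std::lock_guard<std::mutex> guard(mtx_);
        if (overlaps(begin, end))
            return false;
        held_.push_back({begin, end});
        return true;
    }

    // Releases a range previously acquired with the same bounds.
    void unlock(std::size_t begin, std::size_t end)
    {
        {
            std::lock_guard<std::mutex> guard(mtx_);
            for (auto it = held_.begin(); it != held_.end(); ++it)
            {
                if (it->begin == begin && it->end == end)
                {
                    held_.erase(it);
                    break;
                }
            }
        }
        // Wake waiters so they can re-check for conflicts.
        cv_.notify_all();
    }

private:
    struct range
    {
        std::size_t begin;
        std::size_t end;
    };

    // Two half-open ranges overlap if each starts before the other ends.
    bool overlaps(std::size_t begin, std::size_t end) const
    {
        for (range const& r : held_)
            if (begin < r.end && r.begin < end)
                return true;
        return false;
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    std::vector<range> held_;
};
```

Two tasks locking [0, 10) and [10, 20) can proceed concurrently, while a task asking for [5, 15) has to wait. The linear scan over held ranges keeps the sketch short; an interval tree or a multi-dimensional variant would need a different data structure.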


Project Title: Rustize HPX

Contributor: Dikshant Gurudutt, International Institute of Information Technology, Hyderabad

Mentors: Hartmut Kaiser, Shreyas Atre

This project provides performant HPX functionality, written in C++, through Rust APIs to improve safety and make HPX easier to learn. It involves designing and implementing Rust bindings for HPX that expose all or parts of the HPX functionality through a Rust API, using Rust’s FFI together with libraries such as cxx.rs and bindgen to make HPX functions interoperable with Rust.


Project Title: Adapting the Parallel Algorithms in HPX for Usage with Senders and Receivers

Contributor: Tobias Wukovitsch, University of Vienna

Mentors: Hartmut Kaiser, Isidoros Tsaousis-Seiras

HPX supports the C++ senders and receivers facilities, which were, inter alia, specified in the standard proposal P2300. However, not all of HPX’s parallel algorithms can currently be used as sender adapters. This project aims to address this issue by completing the remaining S/R implementations of HPX’s algorithms.


Project Title: Standardize and Visualize HPX Benchmarks

Contributor: Vedant Ramesh Nimje, Veermata Jijabai Technological Institute, Mumbai

Mentors: Giannis Gonidelis, Shreyas Atre

HPX, as a framework designed for high performance computing, has various benchmarks for measuring the performance of its components, including the parallel algorithms, the runtime system, etc. However, these benchmarks (performance tests) lack a standardized format and a visualization tool that can help in analyzing performance trends over time and across different operating environments. Hence, the goal of this project is to standardize the benchmarks’ output formats within HPX and to add integration with an external benchmarking framework, i.e., nanobench. Additionally, a visualization tool will be developed, which will leverage the standardized formats to display the results of the benchmarks in an intuitive manner. Expected results: (1) A unified format for HPX benchmarking using the chosen benchmarking framework. (2) Automated installation of the chosen benchmarking framework in the HPX build system. (3) A visualization tool, developed using Python and matplotlib, to display the results. (4) Integration of this plotting tool with CI/CD pipelines, to track and display performance reductions or improvements.
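As a rough illustration of what a standardized output record could look like, here is a minimal sketch in plain C++. It does not use nanobench, and the struct bench_result, the functions run_benchmark and to_json, and the JSON field names are assumptions made for this example, not the format the project settled on.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical record that every benchmark would emit.
struct bench_result
{
    std::string name;
    std::size_t iterations;
    double mean_ns;
};

// Runs f() `iterations` times and reports the mean time per call.
template <typename F>
bench_result run_benchmark(std::string name, std::size_t iterations, F&& f)
{
    auto const start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i != iterations; ++i)
        f();
    auto const stop = std::chrono::steady_clock::now();

    double const total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    double const mean_ns =
        iterations != 0 ? total_ns / static_cast<double>(iterations) : 0.0;
    return {std::move(name), iterations, mean_ns};
}

// Serializes one record as a single-line JSON object, so that a plotting
// script can parse the output of every benchmark in the same way.
std::string to_json(bench_result const& r)
{
    std::ostringstream os;
    os << "{\"name\": \"" << r.name << "\", \"iterations\": " << r.iterations
       << ", \"mean_ns\": " << r.mean_ns << "}";
    return os.str();
}
```

A plotting tool would then only need to understand this one record shape, regardless of which benchmark produced it.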


Project Title: Implement hpx::contains and hpx::contains_subrange (std::contains and std::contains_subrange).

Contributor: Zakaria Abdi, University of Bath

Mentors: Panos Syskakis, Isidoros Tsaousis-Seiras

A parallel and sequential implementation of std::contains and std::contains_subrange for HPX. To implement this, I will create a customisation point object (CPO) that takes the parameters of the function and, depending on whether or not an execution policy is passed, dispatches the call to either the parallel or the sequential implementation of hpx::contains or hpx::contains_subrange. These will be implemented within structs called contains and contains_subrange. Both structs will inherit from a base class called algorithm, which takes contains or contains_subrange and a bool as template parameters. Inheriting from algorithm gives access to the call member function, which calls either the parallel or the sequential function via CRTP and type-based dispatching.
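The dispatch described above can be sketched in standard C++ as follows. This is only an illustration of the CRTP and tag-dispatch pattern, not HPX code: the namespace sketch, the policy tags seq_policy and par_policy, and the names contains_impl and contains_subrange_impl are hypothetical, and the "parallel" branches simply reuse the sequential code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

namespace sketch {

// Stand-ins for execution policies (hypothetical, not HPX types).
struct seq_policy {};
struct par_policy {};

// CRTP base: call() forwards to Derived::sequential or Derived::parallel,
// selected by the type of the policy argument.
template <typename Derived, typename Result>
struct algorithm
{
    template <typename... Args>
    Result call(seq_policy, Args&&... args) const
    {
        return Derived::sequential(std::forward<Args>(args)...);
    }

    template <typename... Args>
    Result call(par_policy, Args&&... args) const
    {
        return Derived::parallel(std::forward<Args>(args)...);
    }
};

struct contains_impl : algorithm<contains_impl, bool>
{
    template <typename It, typename T>
    static bool sequential(It first, It last, T const& value)
    {
        return std::find(first, last, value) != last;
    }

    template <typename It, typename T>
    static bool parallel(It first, It last, T const& value)
    {
        // A real implementation would split the range across tasks;
        // this sketch falls back to the sequential version.
        return sequential(first, last, value);
    }
};

struct contains_subrange_impl : algorithm<contains_subrange_impl, bool>
{
    template <typename It1, typename It2>
    static bool sequential(It1 first1, It1 last1, It2 first2, It2 last2)
    {
        // An empty subrange is contained in every range.
        return first2 == last2 ||
            std::search(first1, last1, first2, last2) != last1;
    }

    template <typename It1, typename It2>
    static bool parallel(It1 first1, It1 last1, It2 first2, It2 last2)
    {
        return sequential(first1, last1, first2, last2);
    }
};

// Entry points: without a policy the sequential path is used.
template <typename It, typename T>
bool contains(It first, It last, T const& value)
{
    return contains_impl{}.call(seq_policy{}, first, last, value);
}

template <typename Policy, typename It, typename T>
bool contains(Policy policy, It first, It last, T const& value)
{
    return contains_impl{}.call(policy, first, last, value);
}

template <typename It1, typename It2>
bool contains_subrange(It1 first1, It1 last1, It2 first2, It2 last2)
{
    return contains_subrange_impl{}.call(
        seq_policy{}, first1, last1, first2, last2);
}

template <typename Policy, typename It1, typename It2>
bool contains_subrange(
    Policy policy, It1 first1, It1 last1, It2 first2, It2 last2)
{
    return contains_subrange_impl{}.call(policy, first1, last1, first2, last2);
}

}    // namespace sketch
```

For a vector holding 1 to 5, sketch::contains(v.begin(), v.end(), 3) takes the sequential path, while passing sketch::par_policy{} as the first argument selects the parallel one.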

STE||AR Group, 10 years of GSoC Mentorship – Summer 2024

The STE||AR Group is honored to be selected as one of the 2024 Google Summer of Code (GSoC) mentor organizations! This program, which pays students over the summer to work on open source projects, has been a wonderful experience for students and mentors alike. This is our 10th summer being accepted by the program!

Interested students can find out more about the details of the program on GSoC’s official website. As a mentor organization, we have come up with a list of suggested topics for students to work on; however, any student can write a proposal about any topic they are interested in. We find that students who engage with us on Discord or via our mailing list hpx-users@stellar-group.org have a better chance of having their proposals accepted and a better understanding of their project scope. Students may also read through our hints for successful proposals.

If you are interested in working with an international team of developers on the cutting edge of C++ parallel, task-based runtime systems please check us out!

STEM Careers at the NSA and Quantum Computing

Talk title: STEM Careers at the NSA and Quantum Computing

Speaker: Sean Nemetz-MA, National Security Agency

Location: Digital Media Center Theatre

Date: February 08, 2024 – 02:00 pm

The talk was hosted by CCT and sponsored by the Women in Math Society of the National Security Agency. The event explored potential STEM careers at the NSA, quantum computing, and cryptography. During the talk, the speaker discussed opportunities for a STEM career at the agency, followed by a more technical talk about quantum computing, its immediate application in public key cryptography, and the potential impact of quantum computing on the NSA’s mission.

Here you can find more information about Wims, and the speaker.

We also asked the students to register so that we could collect some statistics. 28 students registered, but there were more in the room. The majority of students were LSU Computer Science students, and there were also some from the Math Department.

The student attendees were engaged, and the event was well-received. Q&A continued among the speaker and attendees even after the event ended.

Below are some graphs detailing attendee demographics by field and level of study, race/ethnicity, and gender identity.

Below is a list of minority or underrepresented groups to which some of the student attendees belong:
Women

Racial minority, lgbtq+, and disability

Hispanic / Latino

Black American

African American

Nigerian

Hispanic female

LGBTQ+ community

Vietnamese, LGBTQ+

African American, LGBTQ+

Veteran

HPX 1.9.0 Released

We have released HPX 1.9.0, a major update to our C++ Standard Library for Concurrency and Parallelism. The HPX parallel algorithms have now been fully adapted to C++23, and all existing facilities have been adjusted to conform to this version of the Standard as well. We can now proudly announce full conformance to the C++23 concurrency and parallelism facilities: HPX supports all of the parallel algorithms as specified by C++23. We have been able to significantly improve the performance of some of our algorithms. On top of that, we support parallel versions of all range-based algorithms and have added more support for explicit vectorization to our algorithms (using std::experimental::simd). Even more work has been done towards implementing P2300 (std::execution) and keeping the underlying senders/receivers facilities in line with the evolving standardization efforts. We have done a lot of refactoring to improve the consistency of our exposed APIs. Last but not least, we have continued to improve our documentation; please have a look here.

You can download the release from our releases page or check out the v1.9.0 tag using git. A full list of changes can be found in the release notes.

GSoC 2023 Participants Announced!

It is time to announce the participants in the STE||AR Group’s 2023 Google Summer of Code! We are very proud to announce the names of the 5 contributors this year who will be funded by Google to work on projects for our group.

These recipients represent the very best of the many excellent proposals that we had to choose from. For those unfamiliar with the program, the Google Summer of Code brings together ambitious students from around the world with open source developers by giving each mentoring organization funds to hire a set number of participants. Students then write proposals, which they submit to a mentoring organization, in hopes of having their work funded.

Below are the contributors who will be working with the STE||AR Group this summer listed with their mentors and their proposal abstracts.


Participant:

Aarya Chaum, College of Engineering, Pune

Mentors: Rod Tohid, Shreyas Atre

Project: hpxMP: HPX threading system for LLVM OpenMP

One of the challenges in adopting HPX is the performance degradation observed in applications that use OpenMP. This occurs because of the contention between HPX threads and OpenMP’s native threading system (i.e., pthread) over available resources. hpxMP aims at resolving this issue by adding support for HPX threads as an alternative to pthreads in LLVM OpenMP. This work relies on the HPXC, which replicates pthread’s API.


Participant:

Arnav Negi, International Institute of Information Technology, Hyderabad

Mentors: Shreyas Atre, Alireza Kheirkhahan

Project:  Async I/O using Coroutines and S/R – Traversing large scale graphs

If graphs are really large, their adjacency lists become harder and slower to read and process. This can be a real concern in graph algorithms, as the I/O operations will slow them down considerably. The goal is to maximize the speedup for this use case using asynchronous I/O and parallel algorithms. The implementation will use io_uring along with coroutines for asynchronously reading the graph files, senders and receivers to traverse the graph using the parallel execution policy par_unseq, and multiple NUMA domains to further accelerate memory access.


Participant:

Hari Hara Naveen, Indian Institute of Technology

Mentors: Srinivas Singanaboina

Project: Add Vectorization to par_unseq Implementations of Parallel Algorithms

HPX parallel algorithms currently don’t support the par_unseq execution policy. This project is centered around the idea to implement this execution policy for at least some of the existing algorithms (such as for_each and similar).


Participant:

Isidoros Tsaousis, Aristotle University of Thessaloniki

Mentors: Giannis Gonidelis

Project:  Implement hpx::relocate (P1144)

Modern C++ specifications and the HPX library offer a rich set of algorithms to ensure efficient resource utilization. Nevertheless, there is still room for improvement in data movement operations. Proposal P1144 introduces std::relocate, a feature designed to optimize data relocation by making it safer, faster, and much simpler. Essentially, std::relocate utilizes a single memcpy operation to move objects while avoiding unnecessary move-constructor and destructor calls. This improvement impacts key primitives like swap and vector.reserve, subsequently leading to speedup in higher-level algorithms such as rotate and sort. The goal of this proposal is to implement relocation in HPX.
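The difference between the two strategies can be sketched in standard C++. This is not the HPX or P1144 implementation: the trait is_trivially_relocatable_sketch and the function uninitialized_relocate_n_sketch are hypothetical names, and the sketch conservatively treats only trivially copyable types as trivially relocatable, whereas P1144 aims to cover a wider set of types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical opt-in trait. Here only trivially copyable types qualify.
template <typename T>
struct is_trivially_relocatable_sketch : std::is_trivially_copyable<T>
{
};

// Relocates n objects from src into uninitialized storage at dst.
// Afterwards the objects at src must be treated as no longer alive.
template <typename T>
T* uninitialized_relocate_n_sketch(T* src, std::size_t n, T* dst)
{
    if constexpr (is_trivially_relocatable_sketch<T>::value)
    {
        // One bulk copy, no per-element constructor or destructor calls.
        std::memcpy(dst, src, n * sizeof(T));
    }
    else
    {
        // Fallback: move-construct each element, then destroy the source.
        for (std::size_t i = 0; i != n; ++i)
        {
            ::new (static_cast<void*>(dst + i)) T(std::move(src[i]));
            src[i].~T();
        }
    }
    return dst + n;
}
```

With an int array the memcpy branch is taken; with std::string the sketch falls back to the move-and-destroy loop, which is exactly the per-element work that relocation tries to avoid for types that can opt in.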


Participant:

Shubham Kumar, Indian Institute of Information Technology, Kalyan

Mentors: Steve Brandt, Rod Tohid

Project: Pythonize HPX!

The project aims to create a Python wrapper for the HPX task-based runtime system to make it more accessible to non-expert users who may not be proficient in C++. The HPX library provides parallel and distributed algorithms and data structures for C++, which can be challenging to use for beginners. The Python wrapper will address this challenge by providing a user-friendly interface for the HPX library, enabling users to leverage its power without requiring knowledge of C++. The project will help increase the accessibility of the HPX library and allow more people to benefit from its performance advantages. However, there are challenges associated with creating a Python binding for parallel computing, such as thread locking due to the Global Interpreter Lock (GIL), templates, reference counting, and handling. The deliverables of this project will include a Python wrapper for the HPX library, documentation, and examples to help users get started with the library.

WAMTA 2023 Best Poster Award

We are pleased to announce the 1st place Best Poster Award winner, Maxwell Cole, for his poster: Computational feasibility of simulating radiation induced changes to vasculature and blood flow rates in the entire human body.

2nd place was awarded to Ioannis Gonidelis for the poster titled Evaluating and Improving Shared Memory Performance of HPX and OpenMP using Task Bench.

Each year, the Best Poster Award recognizes outstanding presentations in the conference’s Poster Session. Posters are judged by external workshop attendees.

We would like to thank HPE Enterprise for sponsoring the poster prizes.

Maxwell’s poster can be viewed at the following link: https://zenodo.org/record/7647521#.Y_fLWS-B1KM

GSoC 2022 – Adapting std Algorithms for the unseq and par_unseq Execution Policies

Kishore Kumar, International Institute of Information Technology, Hyderabad

Adapting std Algorithms for the unseq and par_unseq Execution Policies

I began my work by first analyzing and testing compiler support and codegen for different user provided hints. This was used to create the original version of #6016. Later, I added support for the omp backend which is supported by later versions of Clang and ICC out of the box. As of the latest PR the unseq backend will first attempt to use the omp backend, and if it is not available, default to compiler specific hints. 

After this, the next task assigned to me was to implement a basic version of the transform_loop and loop CPOs. This was initially completed with only the original non-omp backend in mind and was later ported to support the omp backend as well. In particular, GCC will throw errors if the loops it is asked to vectorize do not conform to the standard syntax:

for(int counter=0; counter < limit; counter++) { … }

So the implementation was then changed to vectorize loops only when passed random access iterators (std::random_access_iterator). This is #6017.
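A minimal sketch of this kind of restriction is shown below. It is not the code from #6017: the function name unseq_loop_sketch is hypothetical, and the check uses the classic iterator category tag instead of the C++20 concept. The OpenMP pragma is only honored when OpenMP SIMD support is enabled (for example with -fopenmp-simd on GCC and Clang); otherwise the compiler ignores it and the loop stays scalar.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <vector>

// Applies f to `count` elements starting at `first`. The vectorization hint
// is only used for random access iterators, where the loop can be written
// with an integer counter in the canonical form OpenMP expects.
template <typename It, typename F>
void unseq_loop_sketch(It first, std::ptrdiff_t count, F&& f)
{
    using category = typename std::iterator_traits<It>::iterator_category;

    if constexpr (std::is_base_of_v<std::random_access_iterator_tag, category>)
    {
#pragma omp simd
        for (std::ptrdiff_t i = 0; i < count; ++i)
            f(first[i]);
    }
    else
    {
        // Other iterator categories: plain loop without any hint.
        for (; count != 0; --count, ++first)
            f(*first);
    }
}
```

The callable passed in must be safe to interleave across iterations, which is the guarantee the unseq policies ask the caller to provide.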

Following this, I wrote a mini-benchmark environment for testing the performance of my adaptation of the std algorithms here. This exists as a separate repo and was used to report all the benchmark numbers shown here. 

A strong case for switching to the omp backend was its support for declaring reductions on supported clauses. The next task I worked on was implementing an efficient version of the reduce CPOs (#6018). Reductions that are supported by default were overloaded to their respective methods, and a generalized implementation is provided as well. This mostly gets the job done; however, for a specialized overload to be selected, the type of the reduction operation must exactly match the type of the init value. For example, if the reduction is over unsigned int and init is signed, the overload will not be selected. This is a TODO that I believe can be resolved with more template meta-programming. I will be working on this post GSoC.
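The type-matching limitation can be reproduced with a small standalone sketch. It is not the code from #6018: reduce_sketch, result, and path are hypothetical names. The specialized overload uses one template parameter T for the elements, the init value, and std::plus<T>, so template argument deduction fails as soon as these types differ, and the generic overload is chosen instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Records which overload handled the call (for illustration only).
enum class path
{
    simd_reduction,
    generic
};

template <typename T>
struct result
{
    T value;
    path taken;
};

// Specialized overload: selected only when the element type, the init type
// and the operation all agree on the same T.
template <typename T>
result<T> reduce_sketch(T const* data, std::size_t n, T init, std::plus<T>)
{
    T acc = init;
    // Honored only when OpenMP SIMD support is enabled; ignored otherwise.
#pragma omp simd reduction(+ : acc)
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i];
    return {acc, path::simd_reduction};
}

// Generic fallback for any other combination of types and operations.
template <typename Elem, typename Init, typename Op>
result<Init> reduce_sketch(Elem const* data, std::size_t n, Init init, Op op)
{
    for (std::size_t i = 0; i < n; ++i)
        init = op(init, data[i]);
    return {init, path::generic};
}
```

Calling reduce_sketch on unsigned data with an init of 0u and std::plus<unsigned> takes the specialized path, while passing a signed 0 as init silently falls back to the generic loop, which mirrors the behavior described above.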

Note: GCC unseq can probably be made a decent amount faster by switching to the omp backend (which it does not support by default). Also, the Clang no-vec benchmarks were removed from the chart as they were very slow and skewed the visualization.

GSoC 22: First Eval Update

In the second week of July, we completed the first evaluation of our Google Summer of Code program. The students have provided summaries of their work and details of the pull requests they’ve created. Check them out below:

Monalisha Ojha:

https://medium.com/@monalisha-ojha/multiple-datasets-performance-visualization-traveler-a352c13f7c25

Multiple Datasets Performance Visualization — Traveler

Phase-1 of Google Summer of Code 2022 at Stellar Group

This summer, I am working as a Google Summer of Code mentee in STE||AR Group on “Upgrading Multiple Datasets Performance Visualization feature in Traveler” under the mentorship of Kate Isaacs. This blog summarizes my work on the Traveler Platform during phase 1 of Google Summer of Code 2022 program.

About Traveler

Traveler-Integrated is a web-based visualization system for parallel performance data, such as OTF2 traces and HPX execution trees. HPX traces are collected with APEX and written as OTF2 files with extensions. It is developed by the HDC Lab (Humans, Data and Computers Lab) at the University of Arizona. The major goal of this platform is to provide meaningful insights into parallel performance data in the form of Gantt charts (trace data timelines with dependencies), source code, expression tree, aggregated time series line charts for counter data, utilization chart and task level histograms.

Web Interface of Traveler

Abstract

The aim of this project, “Multiple Datasets Performance Visualization,” is to add specific features in the platform that will help in managing multiple data files and organizing traveler interface windows to handle the comparison of data. Organizing multiple datasets in the platform, comparison of datasets side by side, implementing a highlighted linking system for multiple datasets and organizing datasets efficiently for visualization are some of the major sub-goals.

Phase — 1

Updated the Tagging system of Traveler Interface to accommodate multiple datasets

Issue: Organizing the datasets according to their assigned tags.

Made changes in the interface main menu to display the datasets according to their tag names. Tested the tagging system back-end to accommodate multiple datasets. The screenshot displays the fixes made when tested with 2 datasets.

Traveler Interface

Issue Link: https://github.com/hdc-arizona/traveler-integrated/issues/90

Pull Request: https://github.com/hdc-arizona/traveler-integrated/pull/91

Fixed glitches related to the Traveler front-end

Issue: Displaying a clear relationship between a folder and its datasets.

Made changes in the front-end to make visible the lines that show the connection between a folder and its datasets. Adjusted the tag header to solve the tag overlapping issue for multiple datasets. A screenshot of the changes is shown below.

Traveler Interface

Issue link: https://github.com/hdc-arizona/traveler-integrated/issues/92

Pull request link: https://github.com/hdc-arizona/traveler-integrated/pull/93

Adding dynamic color highlighting system

Issue: Adding a color picker system to distinguish between multiple datasets.

A “Change Datasets color” option was added to the datasets context menu. With this feature, a user can change a dataset’s selection color and main menu color to make it distinguishable from other datasets. The screenshots of the changes done so far are displayed below:

Traveler Interface

Pull request link: https://github.com/hdc-arizona/traveler-integrated/pull/94

Shreyas Atre

https://satacker.github.io/docs/c++/GSoC-HPX/

Mentors (STE||AR Group @ LSU)

  1. Dr. Hartmut Kaiser, Adjunct Professor @ LSU
  2. Giannis Gonidelis, RA @ LSU

Abstract

HPX keeps up to date with C++ standard proposals, and Senders/Receivers were implemented as per P2300. However, they have been missing coroutine (co_await) integration and some minor functionalities described in P2300, which is likely to be accepted. Hence I plan to implement these functionalities within the Core HPX Library.

  • Benefits:
    • Coroutines introduce better async code. For example, it is more readable, local variables have the same lifespan as the coroutine which means we don’t need to worry about allocation/release.
    • S/R algorithms can work with coroutines, which is currently not possible without relying on futures, which, as mentioned, are single-use.
    • Adding co_await support makes the code more structured with respect to concurrency which can also be done by library abstractions of callbacks but using co_await may make it more optimized.

Brief Summary

  • Senders, and Receivers
    • Because it makes a more consistent programming model considering async programming types i.e. Parallelism and Concurrency. It standardizes the terminologies and execution policies which are more generic and reduce redundancy.
    • Coroutines have a direct connection between Senders and Coroutine Awaitables.
  • Futures
    • One of the points of S/R is to avoid the allocations associated with futures, also, futures are single-use, whereas S/R, in general, can be used (started) multiple times. – Dr. H. Kaiser

Goal is to enable all Sender CPOs to do the following:

  • If we write a sender and pass it to a function that is a coroutine, the coroutine can co_await that sender and get its result.
  • If they are not generally awaitable then we can await transform them (i.e. make them awaitable).

Work

My PRs can be found using this link as it’ll always be updated.

Following are the Merged PRs until now:

With coroutine traits completed, my remaining work is the following:

  1. Adapt get_completion_signatures when Sender is an awaitable
  2. Utility as_awaitable_t
    • receiver_base, sender_awaitable_base
    • to transform an object into one that is awaitable within a particular coroutine.
  3. promise base for 5.
  4. operation base for 5.
  5. Utility connect_awaitable to adapt connect mentioned in spec 2.2
  6. Utility with_awaitable_senders
    • Used as the base class of a coroutine promise type, makes senders awaitable in that coroutine type


Panagiotis Syskakis:

I’m Panos, currently studying Electrical and Computer Engineering in Aristotle University of Thessaloniki, in Greece. This summer, I joined the HPX team as a contributor through Google Summer of Code (GSoC).

My GSoC project involves performance analysis and optimization on C++ standard parallel algorithms.

To explain further:
The C++ standard defines many functions for algorithms that are commonly used by developers (e.g. sorting, searching).
HPX provides sequential and parallel implementations for all these algorithms.
I’m working on improving the performance of these implementations.

So far, I have explored different methodologies for visualizing and assessing an algorithm’s performance. This has involved a lot of scripting for automating tasks, as well as data collection and analysis.

With help from my mentor, I have produced plots that show how an algorithm’s performance changes when tweaking different parameters (such as workload size and number of computer cores). We also produced visualizations of how different tasks are distributed and where/how they are executed in a parallel environment.

Most importantly though:
The HPX community has been immensely welcoming. It can often be awkward being “the new junior guy”, but my mentor quickly made me feel like a part of the team.
People here are talented, but also fun and humble, and always eager to help.

This summarizes my experience for the first two months of GSoC. I have learned tons so far. My work here is far from done; however, we have laid a great foundation for the work that will follow.

GSoC 2022 Participants Announced!

It is time to announce the participants in the STE||AR Group’s 2022 Google Summer of Code! We are very proud to announce the names of the 5 contributors this year who will be funded by Google to work on projects for our group.

These recipients represent the very best of the many excellent proposals that we had to choose from. For those unfamiliar with the program, the Google Summer of Code brings together ambitious students from around the world with open source developers by giving each mentoring organization funds to hire a set number of participants. Students then write proposals, which they submit to a mentoring organization, in hopes of having their work funded.

Below are the contributors who will be working with the STE||AR Group this summer listed with their mentors and their proposal abstracts.


Participant:

Shreyas Swanand Atre, Veermata Jijabai Technological Institute

Mentors:

Giannis Gonidelis

Hartmut Kaiser

Project: Coroutine-like interface

HPX keeps up to date with C++ standard proposals, and Senders/Receivers were implemented as per P2300. However, they have been missing coroutine (co_await) integration and some minor functionalities described in P2300, which is likely to be accepted. Hence I propose to implement these functionalities within the Core HPX Library.

Benefits:

  • Coroutines introduce better async code. For example, it is more readable, and local variables have the same lifespan as the coroutine, which means we don’t need to worry about allocation/release.
  • S/R algorithms can work with coroutines, which is currently not possible without relying on futures, which, as mentioned, are single-use.
  • Adding co_await support makes the code more structured with respect to concurrency. This can also be done by library abstractions of callbacks, but using co_await may make it more optimized.


Participant:

Panos Syskakis, Aristotle University of Thessaloniki

Mentors:

Giannis Gonidelis

Hartmut Kaiser

Project:  HPX Algorithm Performance Analysis & Optimization

The latest C++ specifications and the HPX library introduce a variety of ready-to-use algorithms that may use parallelization and concurrency, in order to more efficiently utilize system resources. However, current implementations of parallel algorithms don’t always perform ideally (low thread utilization, large overhead, in some cases slower than sequential). The goal of this project is to investigate this under-performance and improve current implementations, using scaling analysis, profiling tools and visualizations.


Participant:

Bo Chen, University of Science and Technology Beijing

Mentors:

Patrick Diehl

Project: Implement your favorite Computational Algorithm in HPX (Molecular Dynamics Simulation of Metal)

My implementation will be based on MISA-MD. Various potential functions are used in MD simulations across fields, such as the Tersoff potential and the Lennard-Jones (L-J) potential, for calculating the interaction among atoms. To improve the simulation accuracy, MISA-MD adopted the Embedded Atom Method (EAM) potential, a complex but quite accurate potential function, which can provide an effective interatomic description for metallic systems. To improve the runtime performance, MISA-MD designed and realized a new hash-based data structure for efficient atom storage and quick neighbor atom indexing.


Participant:

Kishore Kumar, International Institute of Information Technology, Hyderabad

Mentors:

Nikunj Gupta

Srinivas Yadav

Project:  Implementing auto-vectorization hints for par_unseq and unseq versions of HPX parallel algorithms

C++17 and C++20 introduced the par_unseq and unseq execution policies, which guarantee to the functions specialized on them that data access functions can be interleaved, even between iterations on a single thread. This means that these functions are vectorization safe and can thus gain massive boosts in performance from compiler auto-vectorization. Compilers, however, are conservative and auto-vectorize loops only when they are sure that the vectorized versions give the same result as their scalar counterparts and that vectorization will actually end up being profitable. GCC, Clang, MSVC, and ICC all rely on different optimization passes in their backends and are all capable of auto-vectorizing certain loop patterns, but not all. The goal of this project is to analyze compiler codegen response to different hints and implement a version of the par_unseq and unseq execution policies in HPX that makes use of these guarantees to provide compilers with as many hints as possible to encourage auto-vectorization.


Participant:

Monalisha Ojha, Birla Institute of Technology, Mesra

Mentors:

Kate Isaacs

Project: Multiple Dataset Performance Visualization

Traveler-Integrated is a web-based visualization system for parallel performance data, such as OTF2 traces and HPX execution trees. HPX traces are collected with APEX and written as OTF2 files with extensions. The major goal of this platform is to provide meaningful insights into parallel performance data in the form of Gantt charts (trace data timelines with dependencies), source code, expression tree, aggregated time series line charts for counter data, utilization chart and task level histograms. The aim of this project, “Multiple Dataset Performance Visualization,” is to add specific features in the platform that will help in managing multiple data files and organising traveler interface windows to handle the comparison of data. Organising multiple datasets in the platform, comparison of datasets side by side, implementing a highlighted linking system for multiple datasets and organising datasets efficiently for visualisation are some of the major sub-goals.