GSoC 2022 – Adapting std Algorithms for the unseq and par_unseq Execution Policies

Kishore Kumar, International Institute of Information Technology, Hyderabad

Adapting std Algorithms for the unseq and par_unseq Execution Policies

I began by analyzing and testing compiler support and code generation for different user-provided vectorization hints. The results were used to create the original version of #6016. Later, I added support for the omp backend, which recent versions of Clang and ICC support out of the box. As of the latest PR, the unseq backend first attempts to use the omp backend and, if it is not available, falls back to compiler-specific hints.
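The fallback order described above can be sketched with a small preprocessor switch. This is only an illustration under stated assumptions: the macro name SKETCH_VECTORIZE and the function scale are hypothetical and are not taken from #6016, and the check on _OPENMP assumes the compiler defines that macro when OpenMP SIMD support is enabled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical macro (not the name used in the PR). Prefer the OpenMP SIMD
// directive when available, otherwise fall back to a compiler-specific hint.
#if defined(_OPENMP) && _OPENMP >= 201307    // OpenMP 4.0 introduced `omp simd`
#  define SKETCH_VECTORIZE _Pragma("omp simd")
#elif defined(__clang__)
#  define SKETCH_VECTORIZE _Pragma("clang loop vectorize(enable)")
#elif defined(__GNUC__)
   // Tells GCC to assume there are no loop-carried dependencies.
#  define SKETCH_VECTORIZE _Pragma("GCC ivdep")
#else
#  define SKETCH_VECTORIZE                   // no hint available
#endif

// Multiplies every element by `factor`; the iterations are independent,
// so the hint is safe to apply here.
void scale(std::vector<float>& v, float factor)
{
    float* data = v.data();
    std::size_t const n = v.size();
    SKETCH_VECTORIZE
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= factor;
}
```

The hint only changes how the compiler is allowed to generate code; the result of the loop is the same on every branch of the switch.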

After this, my next task was to implement a basic version of the transform_loop and loop CPOs. The first version was written with only the original non-omp backend in mind. Later, it was ported to support the omp backend as well. In particular, GCC emits errors if a loop marked for vectorization does not follow the canonical loop form:

for(int counter=0; counter < limit; counter++) { … }

So the implementation was changed to vectorize loops only when it is passed random-access iterators (std::random_access_iterator). This is #6017.
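One way to picture this restriction is a tag dispatch on the iterator category: random-access ranges are rewritten as a counted loop that can carry the hint, and everything else runs as a plain loop. The sketch below is an assumption-laden illustration, not the code from #6017; sketch_loop and sketch_loop_impl are hypothetical names, and iterator_traits is used instead of the C++20 concept so the example also compiles in older standards.

```cpp
#include <iterator>
#include <list>      // only needed for the usage note below
#include <type_traits>
#include <vector>    // only needed for the usage note below

// Random-access path: a counted loop in canonical form, which is what the
// omp backend expects.
template <typename Iter, typename F>
void sketch_loop_impl(Iter first, Iter last, F f, std::true_type)
{
    using diff_t = typename std::iterator_traits<Iter>::difference_type;
    diff_t const n = last - first;
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd
#endif
    for (diff_t i = 0; i < n; ++i)
        f(first[i]);
}

// Every other iterator category: ordinary sequential loop, no hint.
template <typename Iter, typename F>
void sketch_loop_impl(Iter first, Iter last, F f, std::false_type)
{
    for (; first != last; ++first)
        f(*first);
}

// Hypothetical entry point: picks the path from the iterator category.
template <typename Iter, typename F>
void sketch_loop(Iter first, Iter last, F f)
{
    using category = typename std::iterator_traits<Iter>::iterator_category;
    sketch_loop_impl(first, last, f,
        std::is_base_of<std::random_access_iterator_tag, category>{});
}
```

Calling sketch_loop(v.begin(), v.end(), f) on a std::vector takes the counted-loop path, while the same call on a std::list takes the plain loop, so both produce the same result and only the first is a candidate for vectorization.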

Following this, I wrote a mini-benchmark environment for testing the performance of my adaptations of the std algorithms here. It lives in a separate repo and was used to produce all the benchmark numbers shown here.

A strong case for switching to the omp backend was its support for declaring reductions through the reduction clause. The next task I worked on was implementing an efficient version of the reduce CPOs here: #6018. Reduction operations that OpenMP supports by default are dispatched to specialized overloads, and a generalized implementation is provided for everything else. This mostly gets the job done; however, a specialized overload is selected only when the type used by the reduction operation exactly matches the type of the init value. For example, if the reduction is over unsigned int and init is a signed int, the specialized overload is not selected. This is a TODO that I believe can be resolved with more template metaprogramming. I will be working on this post GSoC.
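The overload scheme and its exact-type limitation can be illustrated with a minimal sketch. It assumes arithmetic element types and uses a hypothetical function name, sketch_reduce; it is not the implementation in #6018, only a small model of the same idea with one specialized operation (std::plus).

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>

// Generalized fallback: accepts any binary operation, plain loop.
template <typename T, typename Op>
T sketch_reduce(T const* data, std::size_t n, T init, Op op)
{
    for (std::size_t i = 0; i < n; ++i)
        init = op(init, data[i]);
    return init;
}

// Specialized overload: chosen only when the operation is std::plus<T> with
// exactly the same T as the data and init value.
template <typename T>
T sketch_reduce(T const* data, std::size_t n, T init, std::plus<T>)
{
    static_assert(std::is_arithmetic<T>::value,
        "sketch assumes an arithmetic type for the built-in + reduction");
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(+ : init)
#endif
    for (std::size_t i = 0; i < n; ++i)
        init += data[i];
    return init;
}
```

With int data and an int init, passing std::plus<int>{} selects the specialized overload. Passing std::plus<unsigned>{} with the same arguments makes template deduction for the specialized overload fail, so the call silently falls back to the generalized loop. The result is the same, but the reduction clause is no longer used, which is the limitation described above.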

Note: GCC unseq can probably be made noticeably faster by switching to the omp backend, which GCC does not support by default. Also, the Clang no-vec benchmarks were removed from the chart because they were very slow and skewed the visualization.