GSoC 2020 – Pip package for Phylanx

By: Mahesh Kale

This is a summary of my Google Summer of Code work on creating a pip package for Phylanx, an asynchronous distributed C++ array processing toolkit developed by the STE||AR Group.

Project Links:

Repository : https://github.com/git-kale/phylanx_wheel

Issue #1218 : https://github.com/STEllAR-GROUP/phylanx/issues/1218

Issue #1234 : https://github.com/STEllAR-GROUP/phylanx/issues/1234

Pull #1204 : https://github.com/STEllAR-GROUP/phylanx/pull/1204

Pull #1238 : https://github.com/STEllAR-GROUP/phylanx/pull/1238

Summary of First & Second Evaluation
The first evaluation period consisted mostly of researching the best way to package the Phylanx project. I built the wheel package from the existing build system and packaged the HPX and Boost dependencies. I used patchelf to patch the binaries. patchelf was unable to work with large binaries, so I created a patch to make it work. At the end of the first evaluation, I had a wheel, but it failed the tests. During the second evaluation period, with Nikunj and Parsa’s help, I dug into Phylanx’s failing tests. The initial suspicion was a problem with the binaries, as we were getting a segmentation fault. I did some digging into how the wheel packaging system packs different files. Packaging the wheel breaks symlinks, as Python packaging doesn’t allow symlinks in the package. Therefore, I renamed the libraries to the names of the symlinks that pointed to them. I patched the libraries and packed them inside the wheel. After this, I came across a different error that seemed unrelated to Python or to creating the wheel package. Since I had no idea how Phylanx works internally, I asked Prof. Hartmut about the issue. He suggested setting an environment variable that tells the system the location of the Phylanx plugin libraries.
Setting the environment variable solved the issue, and the wheel worked as expected. Since we had a wheel package working on a Linux system, I tried creating a manylinux wheel package for Phylanx. Instead of patching the libraries manually, we would use auditwheel, which automates the patching and packaging. I looked up how auditwheel works and tried running it on the wheel that I had. It was unable to find the Phylanx plugins, since the plugins are not linked to by any other library. Because of this limitation in auditwheel, it would be of no use here. I decided to test the wheel we had built on various systems, and it worked well on different Linux distributions. I created a wheel package for python3.8 as well, but it was failing tests, so I opened an issue for it.
At the end of the second evaluation, I had a system that created working wheels for Phylanx. The tests were passing with python3.7 and failing with python3.8.
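The symlink problem described above can be illustrated with a small sketch. It walks a library directory and replaces each symlink with a real file under the link's name, so that nothing in the directory depends on symlinks surviving the packaging step. Note that this sketch copies the target file, whereas the approach described in this post renamed the libraries; the function name and directory layout are hypothetical and are not taken from the actual project scripts.

```python
import os
import shutil


def materialize_symlinks(lib_dir):
    """Replace every file symlink in lib_dir with a copy of its target.

    Returns the sorted names of the entries that were replaced.
    Hypothetical helper for illustration; not part of the Phylanx tooling.
    """
    replaced = []
    for name in sorted(os.listdir(lib_dir)):
        path = os.path.join(lib_dir, name)
        if not os.path.islink(path):
            continue
        target = os.path.realpath(path)
        if not os.path.isfile(target):
            # Dangling link or link to a directory: leave it untouched.
            continue
        os.unlink(path)             # remove the link itself
        shutil.copy2(target, path)  # place a real file under the link's name
        replaced.append(name)
    return replaced
```

Copying increases the size of the wheel when several links point at the same library, which is one reason renaming can be preferable.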
Unresolved issues during first and second evaluations:
• The python3.8 wheel was unable to pass the tests.
• The size of the wheel package was 358MB, while PyPI supports packages up to size 80MB.
• A manual setup of the PHYLANX_PLUGINS_PATH environment variable was necessary.
• The wheel was built in a separate project, so we needed to set up the CI so that a wheel build is triggered after every commit to Phylanx.

Week 9
My immediate goal was to upload the python3.7 wheel we had to PyPI. The Phylanx plugins were huge. Even if we removed the Boost dependencies by building Phylanx with the latest toolchain, we would only reduce the wheel’s size by an MB or two. Parsa suggested packaging the wheel from a Release build of Phylanx instead of a Debug build.
I rewrote the CI to create a release wheel of Phylanx. Much to my surprise, the size of the wheel dropped from 358MB to 38MB! The wheel package was now small enough to upload to PyPI, and it was working as intended and passing all the tests.

Week 10
Since I had a working wheel whose size was below the PyPI limit, I tried uploading it to PyPI. PyPI threw an error during the upload: it wasn’t accepting the homepage of Phylanx as a valid homepage. I changed the homepage to the GitHub project page of Phylanx and set the long description to the Phylanx README. I rebuilt the wheel with these changes, but I was still getting an error.
The error said that the wheel version was incorrect, and hence the wheel could not be uploaded. I found out that the package couldn’t be uploaded because PyPI rejected its platform tag.
Platform compatibility tags allow build tools to mark distributions as compatible with specific platforms and enable installers to understand which distributions are compatible with the system on which they are running. The standards defined in PEP 425 were insufficient for the public distribution of wheels on *nix systems. PEP 513 describes the manylinux standard, which represents a common subset of Linux platforms, and for Linux, PyPI accepts *only* wheels tagged with a manylinux platform tag.
The only solution is to convert this platform-specific wheel into a manylinux wheel via the manylinux project and the auditwheel tool.
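To make the role of the platform tag concrete, here is a minimal sketch that splits a wheel filename into its tags and checks whether the platform tag is a manylinux one. It follows the `name-version[-build]-python-abi-platform.whl` filename layout of the wheel format; the helper names and the example filenames (including the version number) are hypothetical.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: 5 parts, or 6 with a build tag
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


def is_manylinux(filename):
    """True if any platform in the (dot-separated) platform tag is manylinux."""
    tags = parse_wheel_filename(filename)
    return any(p.startswith("manylinux") for p in tags.platform_tag.split("."))


# Hypothetical filenames, for illustration only.
local_wheel = "phylanx-0.0.1-cp37-cp37m-linux_x86_64.whl"
portable_wheel = "phylanx-0.0.1-cp37-cp37m-manylinux1_x86_64.whl"
```

A wheel built directly on a Linux machine typically carries a plain `linux_x86_64` platform tag like the first filename, which is the kind of tag the upload rejected.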
The wheel I built was not uploadable to PyPI. So the only way I could achieve the goal, i.e., installing Phylanx with the single command pip install phylanx, was to have a manylinux wheel. I started working on building the various dependencies in the manylinux Docker image, and I stumbled across a problem I could not resolve. HPX and Phylanx expect to be built with recent toolchains, and there is a lack of documentation on workarounds for building such projects with manylinux. In this project’s case, dependencies like HPX and tcmalloc could not be built due to conflicting requirements: tcmalloc requires C++11, and HPX further requires C++14, but the manylinux Docker image binds you to older ABIs. So in this case, a manylinux wheel can’t be produced.

Week 11
Although it was not built on CentOS 5, the wheel that I created worked as intended on all the current systems. The only problem was getting it uploaded to PyPI. I researched how PyPI checks whether an uploaded wheel is really a manylinux wheel, so that even without a true manylinux wheel we could use hacks to make it pass the PyPI checks. To my surprise, no such checks are implemented in PyPI. Simply changing the platform tag in the wheel’s metadata is enough for PyPI to accept it. I made the changes to the wheel, and I was able to upload it successfully to PyPI!
Now the wheel was being fetched as intended after running pip install phylanx. I informed the mentors about the progress. Parsa reported that he was unable to install the wheel. I was using the manylinux2014 tag, which is the latest one, but even systems that were less than a year old didn’t support this tag. So I changed the platform tag of the wheel again, from manylinux2014 to manylinux1, and the issue was resolved. After installing the wheel package with this command in fresh Debian and Fedora Docker containers, I ran the tests, and they passed as expected.
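The relabeling described above can be sketched as two small text transformations: one on the `Tag:` lines of the `WHEEL` metadata file inside the archive, and one on the wheel filename. This is only an illustration under assumptions, not the script actually used for Phylanx; the function names and the example version number are hypothetical. It also only relabels the wheel: it does not make the binaries comply with the manylinux policy, and a complete retag would additionally have to keep the archive's `RECORD` file consistent with the modified `WHEEL` file.

```python
def retag_wheel_metadata(wheel_text, new_platform):
    """Rewrite the platform part of every 'Tag:' line in WHEEL file text."""
    out = []
    for line in wheel_text.splitlines():
        if line.startswith("Tag:"):
            # A tag looks like: <python tag>-<abi tag>-<platform tag>
            python_tag, abi_tag, _old = line[len("Tag:"):].strip().split("-", 2)
            line = f"Tag: {python_tag}-{abi_tag}-{new_platform}"
        out.append(line)
    return "\n".join(out) + "\n"


def retag_filename(filename, new_platform):
    """Replace the trailing platform tag in a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    stem = filename[: -len(".whl")]
    prefix, _old_platform = stem.rsplit("-", 1)
    return f"{prefix}-{new_platform}.whl"


# Hypothetical WHEEL file content and filename, for illustration only.
example_wheel_text = (
    "Wheel-Version: 1.0\n"
    "Root-Is-Purelib: false\n"
    "Tag: cp37-cp37m-linux_x86_64\n"
)
new_text = retag_wheel_metadata(example_wheel_text, "manylinux1_x86_64")
new_name = retag_filename(
    "phylanx-0.0.1-cp37-cp37m-linux_x86_64.whl", "manylinux1_x86_64"
)
```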
After this, I moved on to creating a wheel for python3.8. The issue I had opened on GitHub was resolved and closed, so I built the wheel package for python3.8 again. But the tests were still failing. I tried to find out whether I had made a mistake while building Phylanx, but I was using a CI similar to the one that produced the working python3.7 wheel. Only 5/102 tests were passing with the wheel, whereas the build itself passes 99/102 tests. I opened an issue in Phylanx for this, with instructions to reproduce the error.

Week 12
Until now, I had created a CI system that produced a working wheel of Phylanx. But all of that work lived in a separate project repository. The next step was to integrate it with the main Phylanx repository and update the current CI.
The CI that I had been working with until now was rigid, and it was likely to break with new updates. For example, the current versions of the Phylanx dependencies were hardcoded in the CI. After some time, the versions of those dependencies will change, and the CI will stop working. So I changed the CI entirely so that it keeps working even if the package versions change. After this, I integrated the CI into my fork of Phylanx.
I created a PR for the incorrect test cases. NumPy also had to be installed separately after building the Phylanx wheel, and I have made a pull request that removes this step.


Conclusion of Project
I have created a system that is able to produce wheels. Running pip install phylanx now installs the package. After this, you have to set the environment variable PHYLANX_PLUGINS_PATH to the plugin directory.
Steps to install Phylanx:
• pip install phylanx
• export PHYLANX_PLUGINS_PATH=/phylanx-libs/phylanx
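For scripts that launch Phylanx programs, the second step can also be done from Python by preparing the environment for a child process. The sketch below is only an illustration: the `prefix` argument is a hypothetical parameter standing for wherever the `phylanx-libs/phylanx` directory ends up on your system, and the helper name is not part of Phylanx.

```python
import os


def phylanx_env(prefix, env=None):
    """Return a copy of the environment with PHYLANX_PLUGINS_PATH set.

    prefix is an assumption: the directory that contains
    phylanx-libs/phylanx on the machine where the wheel was installed.
    """
    new_env = dict(os.environ if env is None else env)
    new_env["PHYLANX_PLUGINS_PATH"] = os.path.join(
        prefix, "phylanx-libs", "phylanx"
    )
    return new_env


# Possible usage (not executed here; my_script.py is a placeholder):
#   import subprocess, sys
#   subprocess.run([sys.executable, "my_script.py"], env=phylanx_env(prefix))
```

Building the environment for a child process, rather than mutating `os.environ` in place, keeps the setting local to the program that needs it.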

Future Work
Since Phylanx was not working with python3.8, I could not publish a wheel for python3.8. I will do this once Phylanx starts working with python3.8. I would also like to create wheel packages for both macOS and Windows. In the current approach, I am moving the dependencies into the wheel and patching them. A build system that outputs the libraries (dependencies) with correct linking in the intended places would remove the need to move and patch binaries. I will also be working on ways to remove the need to set the Phylanx plugin path manually after installing Phylanx with pip3 install phylanx.
