Building scientific software on POWER, what help is available?

Uploaded On 2024-01-29

Presentation Transcript

1. Building scientific software on POWER, what help is available?
Andrew Edmondson

2. University of Birmingham
Royal charter in 1900 (history back to 1828)
Member of Russell Group
34,835 students (2017/18) – 4th largest in UK
11 staff and alumni are Nobel prize winners
£134.2M research income (2017/18)
£3.5bn economic impact
Campuses in Birmingham and Dubai

3. IBM® POWER9™

4. RSEConUK 2019 Programme Chair
@RSEConUK

5. https://www.ibm.com/it-infrastructure/power

6. BEAR AI
“In November 2018 we announced the imminent arrival at the University of the largest IBM POWER9 AI cluster in the UK. We are now inviting pilot users on to the system... We are particularly looking for people who use TensorFlow, PyTorch, or other GPU-accelerated software to contact us. We will help you use the service.”
[https://intranet.birmingham.ac.uk/bear-ai]

7. Scientific Software
Installed version: 1.10.1
Python-based open source machine learning framework from Google.
Installed version: 1.0.1
An open source deep learning platform from Facebook.
Installed version: 2018.4
Very popular HPC molecular dynamics package with GPU acceleration.

8. Scientific Software
Plus: Amber Autoconf Automake Autotools Bazel BEDTools binutils Biopython Bison BLAST+ Blosc BOLT-LMM Boost Boost.Python bzip2 cairo CALMET CALPOST CALPUFF chiron CMake CUDA cuDNN Cufflinks CUnit cupy cURL DBus Deepbinner DendroPy dnaMD do_x3dna Doxygen Eigen EIGENSOFT ETE expat fast5_fetcher FFmpeg FFTW FLANN flappie flex fontconfig foss fosscuda FreeBayes freeglut FreeSurfer freetype FriBidi FSL future GATK GBOOST gc GCC GCCcore gcccuda gettext Ghostscript GLib GLPK GMP GObject-Introspection gompi gompic gperf Graphviz GROMACS GSL GTS Guile h5py HarfBuzz HDF5 hdf5storage help2man HPL HTSeq hwloc hypothesis icc intltool IPython JasPer Java Keras LAME LAMMPS libdrm libffi libgd libGLU libgpuarray libiconv LiBiNorm libjpeg-turbo libmatheval libpciaccess libpng libreadline libsodium libStatGen LibTIFF libtool libunistring libxml2 libxslt libyaml LLVM LMDB LWP-Protocol-https lxml LZO M4 magma Mako mappy matplotlib mayavi Mesa METIS mne MPFR MUMmer NAG NASM NCCL ncurses netCDF netCDF-Fortran nettle NiBabel Ninja NLopt NSPR NSS numactl numexpr ont-fast5-api Open3D OpenBLAS OpenMPI OpenPGM OptiType ORCA Pango PCRE Perl PGI picopore Pillow Pillow-SIMD pixman pkgconfig pkg-config PLUMED powerveclib protobuf protobuf-python psutil pylmdb Pyomo PyQt5 Pysam PyTables Python PyTorch PyYAML Qhull Qt5 SAMtools ScaLAPACK scikit-learn scikit-multiflow SCOTCH SeqAn snp-sites Sphinx SQLite SWIG Szip tb_nightly Tcl TensorFlow tensorflow-probability Theano Tk Tkinter TopHat torchvision transIndel uMatIC util-linux v8 veclib VTK wheel X11 x264 x265 x3dna XML-Parser xorg-macros xprop XZ Yasm ZeroMQ zlib
So far…

9. Opt 1: PowerAI Conda Channel
Easy to install, IBM provided (=> supported)
    conda install powerai=1.6.0
    conda install tensorflow-gpu powerai-release=1.6.0
https://developer.ibm.com/linuxonpower/2019/03/20/powerai-1-6-0-introduction-a-full-transition-to-conda/

10. Opt 2: EasyBuild
“EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.”
https://easybuilders.github.io/easybuild/

11. EasyBuild
EasyBuild allows us to easily and reproducibly build lots of scientific software for various different platforms.
We have:
  EL7 sandybridge, haswell, broadwell, skylake, cascadelake
  Ubuntu 16.04 haswell
  And now EL7 POWER9

12. Aside: EasyBuild and Conda
We’ve also made an EasyBuild recipe to install the PowerAI packages from the Conda channel…

13. Anon. IBM bigwig quote
“You change a couple of flags for the compiler and it just builds on POWER” [2019]
Um. No.
But if it goes wrong, there is help available…

14. EasyBuild
Talked before about EasyBuild expecting Intel…
Working with maintainers to support POWER
E.g. CUDA, Java, NCCL, PyTorch, PGI, LAMMPS, Mesa, TensorFlow, Amber…

15. EasyBuild
CUDA:
  Distributed as binary, source file named differently on Intel vs. POWER
  Previous version RPM only, but latest has .run file (hooray!)
Java:
  Oracle Java not available for POWER
  Had to modify EasyBuild to use OpenJDK
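The CUDA point on this slide (one binary installer per architecture, each with a different file name) can be sketched in a few lines of Python. This is an illustration only: the suffix table and the file-name pattern are assumptions made for the example, not EasyBuild's actual logic, and the version and driver strings in the usage note are sample values.

```python
import platform

# Hypothetical mapping from platform.machine() values to the suffix that the
# installer file name carries. The suffixes are assumptions for illustration.
ARCH_SUFFIX = {
    "x86_64": "",
    "ppc64le": "_ppc64le",
}


def installer_filename(version, driver, machine=None):
    """Build an installer file name for the given (or current) architecture."""
    if machine is None:
        machine = platform.machine()
    if machine not in ARCH_SUFFIX:
        # Fail loudly rather than silently picking the Intel file.
        raise ValueError(f"no known installer for architecture {machine!r}")
    return f"cuda_{version}_{driver}_linux{ARCH_SUFFIX[machine]}.run"
```

Calling `installer_filename("10.1.105", "418.39", machine="ppc64le")` yields a name ending in `_linux_ppc64le.run`, while the `x86_64` name has no architecture suffix. A build recipe that hard-codes the Intel name fails on POWER for exactly this reason.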

16. EasyBuild
Latest EasyBuild has fixes for some of those things: some from Birmingham, some from other people.
E.g. TensorFlow now uses pip, and “just works” on POWER

17. Challenges/Problems
1. Simon talked about porting code to POWER
   Not going to say any more here
2. When things go wrong…

18. Problem: NumPy/SciPy
“The fundamental package for scientific computing with Python.”
https://github.com/numpy/numpy
“[NumPy] is a very important library on which almost every data science or machine learning Python packages such as SciPy (Scientific Python), Matplotlib (plotting library), Scikit-learn, etc depends...”
https://towardsdatascience.com/lets-talk-about-numpy-for-datascience-beginners-b8088722309f
https://www.numpy.org
https://www.scipy.org

19. Problem: NumPy/SciPy
Building SciPy 1.2.1 on POWER9 using EasyBuild… (core dumped)
Investigate…
EasyBuild didn’t parse the test results… and there were failures. We/it just hadn’t noticed them.
No-one else (it seems) had run the SciPy tests on POWER
This affected every NumPy/SciPy version that we tested… including the PowerAI Conda packages.
Shout “Help!”

20. Help: NumPy/SciPy
Send message on EasyBuild Slack… [4 June 2019]
The EasyBuild community were alarmed that NumPy and SciPy tests hadn’t been checked… so they started working on a fix.
Created bug report on SciPy’s GitHub: https://github.com/scipy/scipy/issues/10256
Send message on PowerAIUG Slack…
Our IBM contact read it and started talking to IBM dev teams in North America…
“To let everyone know - there is a lot of discussion about the bug you mention above within our Development team's Slack channel. They're currently investigating the problem with a variety of packages” [June 2019]

21. Problem: OpenBLAS, not NumPy/SciPy
After several emails with updates, questions etc…
IBM identified that the problem was ppc64le bugs in OpenBLAS
IBM did lots of dev work on 0.3.5 and made several patches
These have been incorporated into 0.3.6
We changed to OpenBLAS 0.3.6 and NumPy/SciPy seem to be fixed. Hooray!
DO NOT USE un-patched OpenBLAS < 0.3.6 on POWER
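The rule on this slide can be encoded as a simple guard for a build or deployment script. The sketch below is an assumption-laden illustration: the function names are invented, the version string has to be supplied by the caller, and the check cannot tell whether an older OpenBLAS carries IBM's patches.

```python
import re

# Threshold taken from the slide: un-patched OpenBLAS older than 0.3.6
# should not be used on POWER.
MIN_SAFE_ON_POWER = (0, 3, 6)


def parse_version(text):
    """Extract the leading 'major.minor.patch' numbers from a version string."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"cannot parse version {text!r}")
    return tuple(int(part) for part in match.groups())


def openblas_allowed(version, machine):
    """Apply the slide's rule; other architectures are not judged here."""
    if machine != "ppc64le":
        return True
    # Tuple comparison is numeric, so 0.3.10 correctly counts as newer.
    return parse_version(version) >= MIN_SAFE_ON_POWER
```

For example, `openblas_allowed("0.3.5", "ppc64le")` is `False` and `openblas_allowed("0.3.6", "ppc64le")` is `True`. Returning `True` for other machines only means this particular rule does not apply to them.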

22. Fixed...
We now have TensorFlow 1.13.1 “foss” and “fosscuda” working on POWER9 via EasyBuild using OpenBLAS 0.3.6. This is our new “2019a” stack.

23. A moment of balance…
Alternative title: “The pain OpenBLAS have caused me”
Alternative title 2: “It’s not just POWER”
OpenBLAS 0.3.1 bug…
https://github.com/eylenth/Openblas_matrix_issue
AssertionError: Arrays are not equal
Had to rebuild our entire 2018b stack on 5 platforms with OpenBLAS 0.3.5.
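An "Arrays are not equal" failure comes from comparing a computed array against an expected one. A generic smoke test in that spirit, comparing any matrix-multiply routine against a slow pure-Python reference, might look like the sketch below. It does not reproduce the linked OpenBLAS 0.3.1 issue, and every name in it is invented for illustration.

```python
import random


def reference_matmul(a, b):
    """Naive triple-loop matrix product on lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def random_int_matrix(n, rng):
    # Small integer values stored as floats: products and sums stay exact in
    # double precision, so any mismatch is a real error, not rounding noise.
    return [[float(rng.randint(-9, 9)) for _ in range(n)] for _ in range(n)]


def check_matmul(matmul, sizes=(1, 2, 7, 33), seed=0):
    """Return the matrix sizes for which `matmul` disagrees with the reference."""
    rng = random.Random(seed)
    mismatches = []
    for n in sizes:
        a = random_int_matrix(n, rng)
        b = random_int_matrix(n, rng)
        expected = reference_matmul(a, b)
        actual = [[float(x) for x in row] for row in matmul(a, b)]
        if actual != expected:
            mismatches.append(n)
    return mismatches
```

With NumPy installed, the BLAS-backed product could be checked with `check_matmul(lambda a, b: (numpy.array(a) @ numpy.array(b)).tolist())`; an empty list means no mismatch was seen for the sizes tried. The sizes are arbitrary, and a real bug may only appear at other shapes or thread counts.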

24. Building scientific software on POWER, what help is available?

26. Questions