From Cluster Documentation Project
 Higher Level
- Sage is a free open-source mathematics software system licensed under the GPL. It combines the power of many existing open-source packages into a common Python-based interface. The Sage Mission is to create a viable free open source alternative to Magma, Maple, Mathematica and Matlab.
- NumPy is the fundamental package for scientific computing with Python. It contains among other things:
- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. Numpy is licensed under the BSD license, enabling reuse with few restrictions.
- R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
- Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, mostly written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing. More libraries continue to be added over time. Julia programs are organized around defining functions, and overloading them for different combinations of argument types (which can also be user-defined).
- Erlang Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability. Some of its uses are in telecoms, banking, e-commerce, computer telephony and instant messaging. Erlang's runtime system has built-in support for concurrency, distribution and fault tolerance.
- Haskell is an advanced purely-functional programming language. An open-source product of more than twenty years of cutting-edge research, it allows rapid development of robust, concise, correct software. With strong support for integration with other languages, built-in concurrency and parallelism, debuggers, profilers, rich libraries and an active community, Haskell makes it easier to produce flexible, maintainable, high-quality software.
 Compiler Enhancements
These enhancements are used with Fortran and C/C++ compilers.
- OpenMP is a standard for parallel programming on shared memory systems, continues to extend its reach beyond pure HPC to include embedded systems, multicore and real time systems. A new version is being developed that will include support for accelerators, error handling, thread affinity, tasking extensions and Fortran 2003. Note: OpenMP is not a cluster programming tool. It works for multi-core cluster nodes and is supported by virtually all compilers.
- OpenACC is an Application Program Interface (API) that describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator (e.g. GPUs), providing portability across operating systems, host CPUs and accelerators.
 Lower Level Parallel Programming Libraries
These are programming libraries that can be used with Fortran, C/C++, and Java.
- MPICH2 is a freely available, portable implementation of MPI, the Standard for message-passing libraries.
- MVAPICH2 enhanced MPICH2 version that delivers best performance, scalability and fault tolerance for high-end computing systems and servers using InfiniBand, 10GigE/iWARP and RoCE networking technologies.
- Open MPI is a project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build the best MPI library available. Support runtime selection of interconnect.
- The Java Parallel Processing Framework is a suite of software libraries and tools providing convenient ways to parallelize CPU-intensive processing. It is written in the Java programming language and is platform independent.
- PVM (Parallel Virtual Machine) is a software package that permits a heterogeneous collection of Unix and/or Windows computers hooked together by a network to be used as a single large parallel computer.
 Profilers Analyzers
- Jumpshot is a Java-based visualization tool for doing postmortem performance MPICH2 analysis.
- mpiPis a lightweight profiling library for MPI applications. Because it only collects statistical information about MPI functions, mpiP generates considerably less overhead and much less data than tracing tools. All the information captured by mpiP is task-local. It only uses communication during report generation, typically at the end of the experiment, to merge results from all of the tasks into one output file.
- Open|SpeedShop is explicitly designed with usability in mind and is for application developers and computer scientists. The base functionality include, Sampling Experiments, Support for Callstack Analysis, Hardware Performance Counters, MPI Profiling and Tracing,I/O Profiling and Tracing, and Floating Point Exception Analysis. In addition, Open|SpeedShop is designed to be modular and extensible. It supports several levels of plug-ins which allow users to add their own performance experiments.
- AMD CodeAnalyst Performance Analyzer helps software developers to improve the performance of applications, drivers and system software. Well-tuned software delivers a better end-user experience through shorter response time, increased throughput and better resource utilization.
- IPM is a portable profiling infrastructure for parallel codes. It provides a low-overhead performance profile of the performance aspects and resource utilization in a parallel program. Communication, computation, and IO are the primary focus. While the design scope targets production computing in HPC centers, IPM has found use in application development, performance debugging and parallel computing education.
- HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to the nation's largest supercomputers. By using statistical sampling of timers and hardware performance counters, HPCToolkit collects accurate measurements of a program's work, resource consumption, and inefficiency and attributes them to the full calling context in which they occur. HPCToolkit works with multilingual, fully optimized applications that are statically or dynamically linked. Since HPCToolkit uses sampling, measurement has low overhead (1-5%) and scales to large parallel systems. HPCToolkit's presentation tools enable rapid analysis of a program's execution costs, inefficiency, and scaling characteristics both within and across nodes of a parallel system. HPCToolkit supports measurement and analysis of serial codes, threaded codes (e.g. pthreads, OpenMP), MPI, and hybrid (MPI+threads) parallel codes.
 Parallel Debuggers
- Padb is a Job Inspection Tool for examining and debugging parallel programs, primarily it simplifies the process of gathering stack traces on compute clusters however it also supports a wide range of other functions. Padb supports a number of parallel environments and it works out-of-the-box on the majority of clusters. It's an open source, non-interactive, command line, script-able tool intended for use by programmers and system administrators alike.