The September 20th press release for PerfMiner is available here.
Minimal Metrics has just launched PerfMiner and is booking pre-orders. Inquiries are welcome!
Minimal Metrics has partnered with Sandia National Laboratories to build a performance tool for tomorrow's Exascale-class supercomputers, and to help inform their design.
The system has its roots in a prototype first developed by Philip and colleagues at the Parallelldatorcentrum (PDC) at the Royal Institute of Technology in Stockholm, Sweden. It is a software suite designed to optimize the entire HPC software and hardware ecosystem of an institution. It can analyze individual HPC applications and their threads of execution, as well as entire workloads, groups, users, and multiple disjoint systems.
It accomplishes this by integrating best-of-breed dashboarding and visualization methodologies with state-of-the-art performance data collection. Detailed performance metrics from the underlying architecture are provided, including memory bandwidth, memory hierarchy behavior and latencies, vectorization, hardware resource utilization, computational intensity, and instruction mix. The system can identify issues of on- and off-node scaling, including message-passing performance, load imbalance, false sharing, and coherency operations. Through its architecture-specific metrics, it is able to identify applications and code amenable to acceleration, be it on GPUs, FPGAs, or many-core systems such as Intel's Knights Landing.
The system treats performance as a system-health issue, providing complete drill-down accountability from the entire site-wide workload down to individual application threads. It is completely transparent: it works on applications written in any language and does not require any modifications to those applications by the user. The system is also highly efficient and daemon-free; it does not require any additional processes or threads to perform measurement. Advanced monitoring methods allow for near-zero-overhead performance data collection, even for applications with hundreds of millions of threads.
We’re not the only people working on this problem, and many of the others are our friends. But a few are our competitors, and as such, we can’t say much more at this time. Plus, we don’t want to spoil the surprise. Watch this space for more!
If your organization is interested in using this system or collaborating on its development, please drop us an email.
Phil, Rahul, and Tony will be attending the Supercomputing 2015 show in Austin, TX from Nov 16 to the 19th. Please get in touch if you’d like to meet. We’ll be presenting at the StartupHPC event, and in the booths of Scalable Informatics and the Innovative Computing Laboratory at the University of Tennessee/NICS. A formal schedule will follow.
See you there!
At Minimal Metrics, we had a fabulous time writing code with our interns this summer. We had 17 applicants from 6 schools. The final two hires were:
- Leigh Stauffer: Bachelor’s in CS and Fine Art at Washington and Lee University
- Keerthi Nallani: MS in CS at University of New Mexico
Leigh and Keerthi did a number of things for us this summer, in addition to giving us two nice people to blame when our code didn’t work:
- Writing Python scripts to process output from the performance tools libpfm and pmu-tools.
- Building a simple web application for Juana’s Pagodas Volleyball using MongoDB, Node.js, Express, and Bootstrap.
- Designing real-time performance data displays using Python, JSON, and the amazing D3 toolkit.
While we didn’t get through everything we had hoped to, we all learned a ton in the process. We wish them both the best of luck in their future endeavors!
Here at Minimal Metrics, our customers are often our friends, and we have lots of them around the world. Among our favorite people to work with are the brilliant folks over at Reservoir Labs, makers of R-Scope among other neat bits of technology. Reservoir also has extensive compiler development expertise. When tasked with optimizing the SPEC CPU benchmarks for a brand-new 64-bit multicore (non-Intel) processor, they reached out to the Minimal Metrics crew for guidance. The SPEC benchmarks are tricky animals, chiefly because one cannot modify the source code to improve its performance; all improvements have to come from the compiler! On top of that, the source code is ugly, grossly inefficient, and poorly documented. In many ways, those three facts alone make SPEC the most representative benchmarks in the industry. Nevertheless, in order to improve the generated code, one still has to understand why the processor is performing the way it is and whether a faster sequence of instructions is possible. After that, one needs to follow the transformations performed by the compiler and understand exactly why the code was generated the way it was. It’s a difficult and expensive process, and just one of the major hurdles new processor manufacturers must clear in order to be successful.
Over the course of three months, Minimal Metrics worked with the engineers at Reservoir as well as the vendor, and identified numerous large opportunities for improvement, most of which were implemented in the compiler. In one particular instance, we found that the addition of a single (prefetch) instruction improved the performance of one of the SPEC codes by 50%. An uncommon result, to be sure, but you never know what you’ll find until you look. Another successful engagement and another happy customer.
Performance analysis and understanding is what we do at Minimal Metrics. Contact us to find out how we can help.
In the previous engagement, Minimal Metrics studied and successfully accounted for the performance differences between compilers on multi-dimensional stencil computations on Intel’s Jaketown and Ivytown architectures. In that particular case, the Cray and Intel compilers were used, and the work was performed primarily on Volta, a Cray XC30m. This machine is just one of the Advanced System Technology Test Beds in the National Nuclear Security Administration’s (NNSA) Advanced Simulation and Computing (ASC) Program. These machines represent small sections of the design space on the path to an exascale computer, meaning a machine capable of a billion billion (or 10 to the 18th power) floating-point operations per second.
For this new engagement, Minimal Metrics will be working closely with the test bed team to perform performance studies of codes being developed to run on these (and tomorrow’s exascale) machines. The data gathered is intended not only to help guide performant software design at the lab, but also to provide qualitative and quantitative feedback to the vendors as the architectures mature. As part of this effort, Minimal Metrics will be prototyping a hardware performance instrumentation infrastructure for one of the new run-time systems being developed for Exascale. These systems, like HPX and Qthreads, provide a lightweight abstraction of parallelism better suited to systems of this size. By integrating formal hardware performance measurement at the API level, performance metrics can be more naturally explained and understood by the code developer. The end result is that developers write better code without the burden of understanding every level of the software stack and how thread-level parallelism is implemented. If all this sounds like magic, you might want to read up on coroutines or watch this video on Google’s Go language for some background.
We are super excited to continue our involvement with Sandia and the ASC team.
Philip Mucci from Minimal Metrics will be attending the SC2014 show in New Orleans, LA from Nov 17 to the 20th. Get in touch with us if you’d like to schedule a meeting. If not, look for Phil in the booths of Texas Instruments, Scalable Informatics or the University of Tennessee.