Ignacio Laguna Peralta


Email: ilaguna@llnl.gov
Phone: 925-422-7308


I am a Computer Scientist at the Center for Applied Scientific Computing (CASC) at the Lawrence Livermore National Laboratory. My main area of research is high-performance computing (HPC).

Specific research interests:

  • Fault tolerance, resilience, analysis of error and failure data
  • Software reliability, parallel debugging
  • Compiler hacking (LLVM), static analysis
  • Scalable machine learning, statistical analysis

Personal Web Site

This personal Web site is not sanctioned or supported by DOE, UC, LLNL, or any other organization.

http://sites.google.com/site/researchlaguna/

Publications

Ignacio Laguna, Martin Schulz, David F. Richards, Jon Calhoun, Luke Olson, "IPAS: Intelligent Protection Against Silent Output Corruption in Scientific Applications", accepted in the 14th IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Barcelona, March 12-18, 2016. LLNL-CONF-669696.

Ignacio Laguna, David F. Richards, Todd Gamblin, Martin Schulz, Bronis R. de Supinski, Kathryn Mohror, and Howard Pritchard, "Evaluating and Extending User-Level Fault Tolerance in MPI", accepted to appear in the International Journal of High Performance Computing Applications (IJHPCA). LLNL-JRNL-663434.

Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Todd Gamblin, Gregory L. Lee, Martin Schulz, Saurabh Bagchi, Milind Kulkarni, Bowen Zhou, Zhezhe Chen, Feng Qin, "Debugging High-Performance Computing Applications at Massive Scales", Communications of the ACM (CACM), September, 2015. LLNL-JRNL-652400.

Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Martin Schulz, "Clock Delta Compression for Scalable Order-Replay of Non-Deterministic Parallel Applications", ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC '15), Austin, Texas, Nov, 2015. LLNL-CONF-669878.

Joachim Protze, Ignacio Laguna, Dong H. Ahn, John DelSignore, Ariel Burton, Martin Schulz, and Matthias S. Muller, "Lessons Learned from Implementing OMPD: a Debugging Interface for OpenMP", 11th International Workshop on OpenMP (IWOMP), Aachen, Germany, October 1-2, 2015. LLNL-CONF-671193.

Ignacio Laguna, David F. Richards, Todd Gamblin, Martin Schulz, Bronis R. de Supinski, "Evaluating User-Level Fault Tolerance for MPI Applications", EuroMPI/ASIA, Kyoto, Japan, Sep 9-12, 2014. LLNL-CONF-656877. Selected as one of the best EuroMPI papers and invited to a special issue of the International Journal of High Performance Computing Applications (IJHPCA).

Subrata Mitra, Ignacio Laguna, Dong H. Ahn, Saurabh Bagchi, Martin Schulz, and Todd Gamblin, "Accurate Application Progress Analysis for Large-Scale Parallel Debugging", ACM International Symposium on Programming Language Design and Implementation (PLDI), Edinburgh, UK, June 9-11, 2014. LLNL-CONF-646258.

Ignacio Laguna, Edgar A León, Martin Schulz, Mark Stephenson, "A study of application-level recovery methods for transient network faults", Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '13), held in conjunction with SC13, Denver, Colorado, Nov, 2013. LLNL-CONF-643269.

Dong H Ahn, Gregory L Lee, Ganesh Gopalakrishnan, Zvonimir Rakamarić, Martin Schulz, Ignacio Laguna, "Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset", 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering (SEHPCCSE 2013), held in conjunction with SC13, Denver, Colorado, Nov, 2013. LLNL-CONF-642354.

Ignacio Laguna, Subrata Mitra, Fahad A Arshad, Nawanol Theera-Ampornpunt, Zongyang Zhu, Saurabh Bagchi, Samuel P Midkiff, Mike Kistler, Ahmed Gheith, "Automatic Problem Localization via Multi-dimensional Metric Profiling", 2013 IEEE 32nd International Symposium on Reliable Distributed Systems (SRDS), Braga, Portugal, Sep-Oct, 2013. LLNL-PROC-632265.

Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin, "Probabilistic Diagnosis of Performance Faults in Large-Scale Parallel Applications," International Conference on Parallel Architectures and Compilation Techniques (PACT 2012), Minneapolis, MN, Sep, 2012. LLNL-PROC-548642.

Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi and Bronis R. de Supinski, "Automatic Fault Characterization via Abnormality-Enhanced Classification," IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), Boston, Massachusetts, Jun, 2012. LLNL-CONF-545571.

Ignacio Laguna, Todd Gamblin, Bronis R. de Supinski, Saurabh Bagchi, Greg Bronevetsky, Dong H. Ahn, Martin Schulz, Barry Rountree, "Large Scale Debugging of Parallel Tasks with AutomaDeD," ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC 2011), Seattle, WA, Nov 2011. LLNL-CONF-486911.

Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, and Martin Schulz, "Statistical Fault Detection for Parallel Applications with AutomaDeD," 6th IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE 2010), Stanford, CA, Mar 23-24, 2010.

Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, Martin Schulz, "AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks," IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2010), Chicago, Illinois, Jun-Jul, 2010. LLNL-CONF-426270.

Ignacio Laguna, Fahad A. Arshad, David M. Grothe, Saurabh Bagchi, "How To Keep Your Head Above Water While Detecting Errors," ACM/IFIP/USENIX 10th International Middleware Conference (Middleware 2009), UIUC, Illinois, Nov-Dec 2009.

Dong H. Ahn, Bronis R. de Supinski, Ignacio Laguna, Greg L. Lee, Ben Liblit, Barton P. Miller, and Martin Schulz, "Scalable Temporal Order Analysis for Large Scale Debugging," ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC 2009), Portland, OR, Nov 2009. LLNL-PROC-412227.

Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad and Saurabh Bagchi, "Distributed Diagnosis of Failures in a Three Tier E-Commerce System," 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), Beijing, China, Oct 2007.

Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad and Saurabh Bagchi, "Stateful Detection in High Throughput Distributed Systems," 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), Beijing, China, Oct 2007.