Ignacio Laguna Peralta


Email: ilaguna@llnl.gov
Phone: 925-422-7308


I am a postdoctoral researcher at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory. My research areas are fault tolerance, large-scale debugging, and failure diagnosis.

I received my Ph.D. from Purdue University in December 2012. In my dissertation, I proposed techniques to detect and diagnose anomalies in distributed applications, such as HPC and cloud-computing applications, and explored several machine-learning approaches to isolate the origin of failures in these applications.

Personal Web Site

This personal Web site is not sanctioned or supported by DoE, UC, LLNL, or any other organization.

http://sites.google.com/site/researchlaguna/

Publications

Ignacio Laguna, David F. Richards, Todd Gamblin, Martin Schulz, Bronis R. de Supinski, "Evaluating User-Level Fault Tolerance for MPI Applications", EuroMPI/ASIA, Kyoto, Japan, Sep 9-12, 2014. LLNL-CONF-656877.

Subrata Mitra, Ignacio Laguna, Dong H. Ahn, Saurabh Bagchi, Martin Schulz, and Todd Gamblin, "Accurate Application Progress Analysis for Large-Scale Parallel Debugging", ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Edinburgh, UK, June 9-11, 2014. LLNL-CONF-646258.

Ignacio Laguna, Edgar A León, Martin Schulz, Mark Stephenson, "A study of application-level recovery methods for transient network faults", Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '13), held in conjunction with SC13, Denver, Colorado, Nov, 2013. LLNL-CONF-643269.

Dong H Ahn, Gregory L Lee, Ganesh Gopalakrishnan, Zvonimir Rakamarić, Martin Schulz, Ignacio Laguna, "Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset", 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering (SEHPCCSE 2013), held in conjunction with SC13, Denver, Colorado, Nov, 2013. LLNL-CONF-642354.

Ignacio Laguna, Martin Schulz, Jeff Keasler, David Richards, Jim Belak, "Optimal Placement of Retry-Based Fault Recovery Annotations in HPC Applications", short abstract and poster, SC13, Denver, Colorado, Nov, 2013. LLNL-ABS-641414.

Ignacio Laguna, Subrata Mitra, Fahad A Arshad, Nawanol Theera-Ampornpunt, Zongyang Zhu, Saurabh Bagchi, Samuel P Midkiff, Mike Kistler, Ahmed Gheith, "Automatic Problem Localization via Multi-dimensional Metric Profiling", 2013 IEEE 32nd International Symposium on Reliable Distributed Systems (SRDS), Braga, Portugal, Sep-Oct, 2013. LLNL-PROC-632265.

Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin, "Probabilistic Diagnosis of Performance Faults in Large-Scale Parallel Applications," International Conference on Parallel Architectures and Compilation Techniques (PACT 2012), Minneapolis, MN, Sep, 2012. LLNL-PROC-548642.

Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi and Bronis R. de Supinski, "Automatic Fault Characterization via Abnormality-Enhanced Classification," IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), Boston, Massachusetts, Jun, 2012. LLNL-CONF-545571.

Ignacio Laguna, Todd Gamblin, Bronis R. de Supinski, Saurabh Bagchi, Greg Bronevetsky, Dong H. Ahn, Martin Schulz, Barry Rountree, "Large Scale Debugging of Parallel Tasks with AutomaDeD," ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC 2011), Seattle, WA, Nov 2011. LLNL-CONF-486911.

Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, and Martin Schulz, "Statistical Fault Detection for Parallel Applications with AutomaDeD," 6th IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE 2010), Stanford, CA, Mar 23-24, 2010.

Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, Martin Schulz, "AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks," IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2010), Chicago, Illinois, Jun-Jul, 2010. LLNL-CONF-426270.

Ignacio Laguna, Fahad A. Arshad, David M. Grothe, Saurabh Bagchi, "How To Keep Your Head Above Water While Detecting Errors," ACM/IFIP/USENIX 10th International Middleware Conference (Middleware 2009), UIUC, Illinois, Nov-Dec 2009.

Dong H. Ahn, Bronis R. de Supinski, Ignacio Laguna, Greg L. Lee, Ben Liblit, Barton P. Miller, and Martin Schulz, "Scalable Temporal Order Analysis for Large Scale Debugging," ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC 2009), Portland, OR, Nov 2009. LLNL-PROC-412227.

Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad and Saurabh Bagchi, "Distributed Diagnosis of Failures in a Three Tier E-Commerce System," 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), Beijing, China, Oct 2007.

Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad and Saurabh Bagchi, "Stateful Detection in High Throughput Distributed Systems," 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), Beijing, China, Oct 2007.

Technical Reports

Ignacio Laguna, Fahad A. Arshad, David M. Grothe, Saurabh Bagchi, How To Keep Your Head Above Water While Detecting Errors, ECE Technical Reports, Purdue University, Apr, 2009.

Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad and Saurabh Bagchi, Distributed Diagnosis of Failures in a Three Tier E-Commerce System, ECE Technical Reports, Purdue University, May, 2007.

Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad and Saurabh Bagchi, Stateful Detection in High Throughput Distributed Systems, ECE Technical Reports, Purdue University, May, 2007.

Posters / Short Abstracts

Scalable Detection of Anomalous Parallel Tasks with AutomaDeD, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Boston, Jun, 2012.

Fault Detection and Diagnosis at Large Scale with AutomaDeD, Salishan Conference on High-Speed Computing, Gleneden Beach, Oregon, Apr 23-27, 2012.

Scalable Error Detection and Failure Prediction in Large-Scale Applications, Postdoc Research Symposium, Argonne National Laboratory, Chicago, Oct 27, 2011.

Stateful Error Detection in High Throughput Applications, Poster abstract at the ACM/IFIP/USENIX 10th International Middleware Conference, UIUC, Illinois, Dec 2, 2009.

Theses

Ignacio Laguna, Probabilistic Error Detection and Diagnosis in Large-Scale Distributed Applications, Ph.D. Dissertation, Purdue University, Dec, 2012.

Ignacio Laguna, Online Error Detection for High Data Rate Distributed Applications, ECE Master's Thesis, Purdue University, Aug, 2008.

Professional Activities

Reviewer service: DSN 2009-2012, SRDS 2011, PACT 2012, PRDC 2010, PROPER '13, ISSRE 2014.