Lawrence Livermore National Laboratory


Kathryn Mohror


Email: kathryn@llnl.gov
Phone: 925-423-2997


Kathryn Mohror is a computer scientist on the Scalability Team at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL). Kathryn’s research on high-end computing systems currently focuses on scalable fault-tolerant computing and I/O for extreme-scale systems. Her other research interests include scalable performance analysis and tuning, and parallel programming paradigms. Kathryn has worked at LLNL since 2010.

Kathryn works closely with the Development Environment Group (DEG) in Livermore Computing (LC). Her current research focuses primarily on the Scalable Checkpoint/Restart Library (SCR), a multilevel checkpointing library that has been shown to significantly reduce checkpointing overhead. She also leads the Tools Working Group for the MPI Forum.

Kathryn received her Ph.D. in Computer Science in 2010, an M.S. in Computer Science in 2004, and a B.S. in Chemistry in 1999 from Portland State University (PSU) in Portland, OR.

Publications

Journals

  1. Tanzima Zerin Islam, Kathryn Mohror, Saurabh Bagchi, Adam Moody, Bronis R. de Supinski, Rudolf Eigenmann, "McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression," LLNL-CONF-554251, Scientific Programming, 21(3):149-163, 2013.
  2. Kathryn Mohror, Adam Moody, Greg Bronevetsky, Bronis R. de Supinski, "Detailed Modeling and Evaluation of a Scalable Multilevel Checkpointing System," in Transactions on Parallel and Distributed Systems, LLNL-JRNL-564721, to appear.
  3. Kathryn Mohror and Karen L. Karavanic, "Trace Profiling: Scalable Event Tracing on High-End Parallel Systems," Parallel Computing, 38(4-5):194-225, April-May 2012.

Conferences

  1. Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka, “A User-level Infiniband-based File System and Checkpoint Strategy for Burst Buffers,” LLNL-CONF-645876, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014), Chicago, IL, May 2014 (Acceptance rate: 19%).
  2. Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka, "FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery," LLNL-CONF-645209, 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2014), Phoenix, AZ, May 2014 (Acceptance rate: 21%).
  3. Abhinav Bhatele, Kathryn Mohror, Steven H. Langer, and Katherine E. Isaacs, “There Goes the Neighborhood: Performance Degradation Due to Nearby Jobs,” LLNL-CONF-635776, Proceedings of ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13), November 2013 (Acceptance rate: 20%).
  4. Matthias Weber, Kathryn Mohror, Martin Schulz, Bronis R. de Supinski, Holger Brunst, and Wolfgang E. Nagel, "Alignment-Based Metrics for Trace Comparison," LLNL-CONF-586852, Euro-Par 2013, Aachen, Germany, Aug. 26-30, 2013.
  5. Raghunath Raja Chandrasekar, Adam Moody, Kathryn Mohror, Dhabaleswar K. Panda, “A 1 PB/s File System to Checkpoint Three Million MPI Tasks,” LLNL-CONF-592884, International Symposium on High Performance Distributed Computing 2013, New York City, NY, June 2013 (Acceptance rate: 15%).
  6. Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, and Satoshi Matsuoka, "Design and Modeling of a Non-blocking Checkpointing System," LLNL-CONF-554431, Supercomputing 2012, Salt Lake City, UT, November 2012 (Acceptance rate: 21%).
  7. Tanzima Islam, Kathryn Mohror, Saurabh Bagchi, Adam Moody, Bronis R. de Supinski, and Rudolf Eigenmann, "mcrEngine: A Scalable Checkpointing System using Data-Aware Aggregation and Compression," LLNL-CONF-554251, Supercomputing 2012, Salt Lake City, UT, November 2012 (Best Student Paper Finalist, Acceptance rate: 21%).
  8. Adam Moody, Greg Bronevetsky, Kathryn Mohror, Bronis R. de Supinski, "Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System," LLNL-CONF-427742, Supercomputing 2010, New Orleans, LA, November 2010 (Acceptance rate: 20%).
  9. Kathryn Mohror and Karen L. Karavanic, “Evaluating Similarity-based Trace Reduction Techniques for Scalable Performance Analysis,” Supercomputing 2009, Portland OR, November 2009 (Acceptance rate: 23%).
  10. Rashawn Knapp, Kathryn Mohror, Thomas Conerly, Abraham Neben, Aaron Amauba, Karen L. Karavanic, and John May, "PerfTrack: Scalable Application Performance Diagnosis for Linux Clusters," LCI Conference 2007, South Lake Tahoe, CA, May 2007.
  11. Kathryn Mohror and Karen L. Karavanic, “Towards Scalable Event Tracing on High End Systems,” High Performance Computing Conference 2007, Houston, TX, September, 2007.
  12. Karen Karavanic, John May, Kathryn Mohror, Brian Miller, Kevin Huck, Rashawn Knapp, and Brian Pugh, “Integrating Database Technology with Comparison-based Parallel Performance Diagnosis: The PerfTrack Performance Experiment Management Tool,” Supercomputing 2005, Seattle WA, November 2005 (Acceptance rate: 24%).
  13. Kathryn Mohror and Karen L. Karavanic, "Performance Tool Support for MPI-2 on Linux," Supercomputing 2004, Pittsburgh PA, November 2004 (Acceptance rate: 31%).

Workshops

  1. Kathryn Mohror, Adam Moody, and Bronis R. de Supinski, “Asynchronous Checkpoint Migration with MRNet in the Scalable Checkpoint / Restart Library,” LLNL-PROC-540391, FTXS’12, Boston MA, June 25, 2012.
  2. Dries Kimpe, Kathryn Mohror, Adam Moody, Brian Van Essen, Maya Gokhale, Kamil Iskra, Rob Ross, Bronis R. de Supinski, "Integrated In-System Storage Architecture for High Performance Computing," LLNL-CONF-557032, ROSS'12, Venice, Italy, June 29, 2012.
  3. Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis de Supinski, Naoya Maruyama, and Satoshi Matsuoka, “Towards an Asynchronous Checkpointing System,” LLNL-CONF-509152, IPSJ SIG Technical Reports 2011-ARC-197 2011-HPC-132 (HOKKE-19), November 2011.
  4. Kathryn Mohror, Karen L. Karavanic, and Allan Snavely, “Scalable Event Trace Visualization,” Workshop on Productivity and Performance (PROPER 2009) at EuroPar 2009, Delft, The Netherlands, August 2009.
  5. John J. Hoffman, Andrew Byrd, Kathryn Mohror, and Karen L. Karavanic, “PPerfGrid: A Grid Services-based Tool for the Exchange of Heterogeneous Parallel Performance Data,” HIPS-HPGC 2005 Joint Workshop on High-Performance Grid Computing and High-Level Parallel Programming Models, in conjunction with IPDPS 2005, Denver CO, April 2005.

Posters

  1. Sagar Thapaliya, Purushotham Bangalore, Kathryn Mohror, Adam Moody, "Capturing I/O Dynamics in HPC Applications," Research Poster, PDSW 2014, Denver, CO, November 2013.
  2. Sagar Thapaliya, Adam Moody, Kathryn Mohror, and Purushotham Bangalore, "Inter-application Coordination for Reducing I/O Interference", LLNL-POST-641538, Supercomputing 2013, Denver, CO, November 2013.
  3. Matthias Weber, Kathryn Mohror, Martin Schulz, Holger Brunst, Bronis R. de Supinski, Wolfgang E. Nagel, "Structural Comparison of Parallel Applications", LLNL-POST-569232, Supercomputing 2013, Denver, CO, November 2013. (Best Poster Finalist)
  4. Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, and Satoshi Matsuoka, "Design and Modeling of a Non-Blocking Checkpoint System," LLNL-POST-552657, ATIP - A*CRC Workshop on Accelerator Technologies in High Performance Computing, May 2012.
  5. Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, and Satoshi Matsuoka, "Towards a Light-weight Non-blocking Checkpointing System," LLNL-POST-561176, HPC in Asia Workshop in conjunction with the 2012 International Supercomputing Conference (ISC'12), June 2012.
  6. Tanzima Z. Islam, Kathryn Mohror, Adam Moody, Bronis de Supinski, Saurabh Bagchi, Rudolf Eigenmann, "Data-Aware Inter-Process Checkpoint Compression," Research Poster, LLNL-POST-461998, Supercomputing 2010, New Orleans, LA, November 2010.
  7. Kathryn Mohror and Karen L. Karavanic, “A Study of Tracing Overhead on a High-Performance Linux Cluster,” In Symposium on Principles and Practice of Parallel Programming (PPoPP’07), pages 158–159, 2007.
  8. Kathryn Mohror and Karen L. Karavanic, "Scalable Event-based Performance Measurement in High-End Environments", Research Poster, SIGMETRICS Student Workshop, SIGMETRICS'07, San Diego, CA, June 13, 2007.
  9. Kathryn Mohror and Karen L. Karavanic, "A Study of Tracing Overhead on a High-Performance Linux Cluster," Research Poster, PPoPP'07, March 15, 2007.
  10. Kathryn Mohror and Karen L. Karavanic, "Infrastructure for Performance Tuning LAM/MPI Applications," Research Poster, Richard Tapia Celebration of Diversity in Computing Conference, Atlanta, GA, October 2003.

Theses

  1. Kathryn Mohror, “Scalable Event Tracing on High-End Parallel Systems,” PhD thesis, Computer Science Department, Portland State University, 2010.
  2. Kathryn Mohror, "Infrastructure for Performance Tuning MPI Applications," Master’s thesis, Computer Science Department, Portland State University, 2003.

Technical Reports

  1. Adam Moody, Greg Bronevetsky, Kathryn Mohror, Bronis R. de Supinski, "Detailed Modeling, Design, and Evaluation of a Scalable Multi-level Checkpointing System," Lawrence Livermore National Laboratory Technical Report, LLNL-TR-440491, July 2010.
  2. Kathryn Mohror and Karen L. Karavanic, “Evaluating Similarity-based Trace Reduction Techniques for Scalable Performance Analysis,” Portland State University, Computer Science Department Technical Report, TR-09-03, June 2009.
  3. D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang, “Performance Database Technology for SciDAC Applications,” SciDAC 2007.
  4. Kathryn Mohror and Karen L. Karavanic, “An Investigation of Tracing Overheads on High End Systems,” Portland State University, Computer Science Department Technical Report, TR-06-06, December, 2006.
  5. Kathryn Mohror, Kevin Huck, Karen Karavanic, John May, and Brian Miller, “PerfTrack: A Performance Database and Analysis Tool,” Lawrence Livermore National Laboratory Student Research Symposium, August 2004.
  6. Kathryn Mohror and Karen L. Karavanic, "Performance Tool Support for MPI-2 on Linux," Portland State University, Computer Science Department Technical Report, TR-04-03, April 2004.

Invited Talks

  1. “SCR: The Scalable Checkpoint / Restart Library,” UNM Computer Science Department Colloquium, LLNL-PRES-523694, Albuquerque, NM, January 26, 2012.
  2. “The Scalable Checkpoint / Restart Library (SCR): Updates and Future Directions,” NMC Ultrascale Systems Research Center Seminar, LLNL-PRES-523720, Los Alamos, NM, January 25, 2012.
  3. “The Scalable Checkpoint/Restart Library (SCR): Overview and Future Directions,” Paradyn Week, LLNL-PRES-482473, Madison, WI, May 2, 2011.
  4. “SCR: The Scalable Checkpoint/Restart Library,” Portland State University, Portland, OR, Computer Science Department Seminar, LLNL-PRES-471228, February 21, 2011.
  5. “Scalable Event Tracing on High-End Parallel Systems,” Schloss Dagstuhl, Germany, Dagstuhl Seminar on Program Development for Extreme-Scale Computing, May 3, 2010.
  6. “Scalable Event Tracing on High-End Parallel Systems,” Lawrence Livermore National Laboratory, Livermore, CA, October 9, 2009.
  7. “Scalable Event Tracing on High-End Parallel Systems,” Oak Ridge National Laboratory, Oak Ridge, TN, Computer Science and Mathematics Division Seminar, August 7, 2009.
  8. “Evaluating Similarity-based Trace Reduction Techniques for Scalable Performance Analysis,” San Diego Supercomputing Center, San Diego, CA, Large Scale Systems Seminar, May 11, 2009.
  9. “The PerfTrack Tool for Performance Data Management,” Schloss Dagstuhl, Germany, Dagstuhl Seminar on Automatic Performance Analysis, December 13, 2005.
  10. “Enabling MPI-2 Support in Paradyn,” University of Wisconsin, Madison, WI, Paradyn Week, March 17, 2005.
  11. “Performance Tool Support for MPI-2 on Linux,” Lawrence Livermore National Laboratory, August 2004.