Title: Computer Scientist
Email: mohror1@llnl.gov
Phone: (925) 423-2997
Kathryn Mohror is a computer scientist in the Parallel Systems Group in the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL). Kathryn serves as the Deputy Director for the Laboratory Directed Research & Development (LDRD) program at LLNL and as the ASCR Point of Contact for Computer Science at LLNL. Kathryn’s research on high-end computing systems is currently focused on I/O for extreme scale systems. Her other research interests include scalable performance analysis and tuning, fault tolerance, and parallel programming paradigms. Kathryn has been working at LLNL since 2010 and is a 2022 fellow in the Oppenheimer Science and Energy Leadership Program and a 2019 recipient of the DOE Early Career Award.
Kathryn's current research focuses primarily on I/O performance and portability for HPC. She leads both the Unify project, which develops scalable file system support for in-system storage on HPC systems through UnifyFS, and the Scalable Checkpoint/Restart Library (SCR) project, an R&D100 Award-winning multilevel checkpointing library that has been shown to significantly reduce checkpointing overhead. Additionally, she leads the IOPP project, funded by ASCR ECRP, which is working towards a fundamental understanding of the needs of current and emerging HPC I/O workloads and developing support based on that understanding.
Kathryn holds several leadership roles in the HPC community. She is a Co-Chair of the Administrative Steering Committee for PMIx, a portable interface for tools and applications to interact with system management software. She was the General Chair of the IEEE Cluster 2021 conference, serves as the IEEE Cluster Steering Committee Secretary, and serves on numerous other steering, organizing, and review committees for HPC conferences and workshops. She was the Lead for the NNSA Software Technologies Portfolio for the U.S. Exascale Computing Project (ECP) (2019-2023), Co-Chair for the PMIx Administrative Steering Committee (2018-2022), Lead for the Tools Working Group for the MPI Forum (2013-2019), and served as the Scientific Editor for LLNL's Science & Technology Review in 2018.
Kathryn received her Ph.D. in Computer Science in 2010, an M.S. in Computer Science in 2004, and a B.S. in Chemistry in 1999 from Portland State University (PSU) in Portland, OR.
Journals
- Zhimin Li, Harshitha Menon, Dan Maljovec, Yarden Livnat, Shusen Liu, Kathryn Mohror, Peer-Timo Bremer, and Valerio Pascucci, “SpotSDC: Revealing the Silent Data Corruption Propagation in High-performance Computing Systems,” LLNL-CONF-764021, IEEE Transactions on Visualization and Computer Graphics, 27(10):3938-3952, Oct. 2021.
- Nawrin Sultana, Martin Ruefenacht, Anthony Skjellum, Purushotham Bangalore, Ignacio Laguna, Kathryn Mohror, “Understanding the Use of MPI in Exascale Proxy Applications,” LLNL-JRNL-766480, Concurrency and Computation: Practice and Experience, 33(14), July 2021.
- Lee Savoie, David K. Lowenthal, Bronis de Supinski, Kathryn Mohror, and Nikhil Jain, “Mitigating Inter-Job Interference via Process-Level Quality-of-Service,” LLNL-JRNL-813440, ACM Transactions on Parallel Computing, 8(1), April 2021.
- Bengisu Elis, Dai Yang, Olga Pearce, Kathryn Mohror, and Martin Schulz, “QMPI: A Next Generation MPI Profiling Interface for Modern HPC Platforms,” LLNL-JRNL-79789, Journal of Parallel Computing, vol.96, 2020.
- André Brinkmann, Kathryn Mohror, Weikuan Yu, Philip Carns, Toni Cortes, Scott A. Klasky, Alberto Miranda, Franz-Josef Pfreundt, Robert B. Ross, and Marc-André Vef, “Ad Hoc File Systems for HPC,” LLNL-JRNL-779789, Journal of Computer Science and Technology, 35(1), 4-26, Jan. 2020.
- Marc-André Hermanns, Nathan T. Hjelm, Michael Knobloch, Kathryn Mohror, Martin Schulz, "The MPI_T Events Interface: An Early Evaluation and Overview of the Interface," LLNL-JRNL-765281, Journal of Parallel Computing, 85:119-130, 2019.
- Nawrin Sultana, Anthony Skjellum, Ignacio Laguna, Matthew Farmer, Kathryn Mohror, and Murali Emani, “Providing Failure Recovery for Bulk Synchronous Applications with MPI Stages,” LLNL-JRNL-759751, Parallel Computing, 84:1-14, May 2019.
- Sourav Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, Dhabaleswar Panda, Martin Schulz, Hari Subramoni, "EReinit: Scalable and Efficient Fault-Tolerance for Bulk-Synchronous MPI Applications," Concurrency and Computation: Practice and Experience, online, 32:e4863, 2020.
- Ignacio Laguna, David F. Richards, Todd Gamblin, Martin Schulz, Bronis R. de Supinski, Kathryn Mohror, and Howard Pritchard, "Evaluating and Extending User-Level Fault Tolerance in MPI", LLNL-JRNL-663434, in International Journal of High Performance Computing Applications, 30(3):305-319, 2016.
- Tanzima Zerin Islam, Kathryn Mohror, and Martin Schulz, “Exploring the MPI Tool Information Interface: Features and Capabilities,” LLNL-CONF-654091, in International Journal of High Performance Computing Applications, 30(2):212-222, 2016.
- Kathryn Mohror, Adam Moody, Greg Bronevetsky, Bronis R. de Supinski, "Detailed Modeling and Evaluation of a Scalable Multilevel Checkpointing System," in Transactions on Parallel and Distributed Systems, LLNL-JRNL-564721, 25(9):2255-2263, Sept. 2014.
- Tanzima Zerin Islam, Kathryn Mohror, Saurabh Bagchi, Adam Moody, Bronis R. de Supinski, Rudolf Eigenmann, "McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression," LLNL-CONF-554251, Scientific Programming, 21(3):149-163, 2013.
- Kathryn Mohror and Karen L. Karavanic, "Trace Profiling: Scalable Event Tracing on High-End Parallel Systems," Parallel Computing, 38(4-5):194-225, April-May 2012.
Conferences
- Hariharan Devarajan, Kathryn Mohror, “Extracting and characterizing I/O behavior of HPC workloads,” LLNL-CONF-831041, IEEE Cluster Conference 2022 (Cluster), Heidelberg, Germany, September 2022.
- Fahim Chowdhury, Francesco Di Natale, Adam Moody, Kathryn Mohror, Weikuan Yu, “DFMan: A Graph-based Optimization of Dataflow Scheduling on High-Performance Computing Systems,” LLNL-CONF-827797, International Parallel & Distributed Processing Symposium (IPDPS) 2022, Lyon, France, May 2022.
- Chen Wang, Kathryn Mohror, Marc Snir, “File System Semantics Requirements of HPC Applications,” LLNL-CONF-814852, ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC’21), Stockholm, Sweden, June 2021.
- Zhimin Li, Harshitha Menon, Kathryn Mohror, Peer-Timo Bremer, Yarden Livnat, Valerio Pascucci, “Understanding a Program’s Resiliency Through Error Propagation,” LLNL-CONF-764021, ACM SIGPLAN Annual Symposium Principles and Practice of Parallel Programming (PPoPP’21), Seoul, South Korea, February 2021.
- Arnab K. Paul, Olaf Faaland, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, Ali R. Butt, “Understanding HPC Application Behavior Using System Level Statistics,” LLNL-CONF-812199, IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC’20), December 2020.
- Ignacio Laguna, Ryan Marshall, Kathryn Mohror, Martin Ruefenacht, Anthony Skjellum, and Nawrin Sultana, “A Large-Scale Study of MPI Usage in Open-Source HPC Applications,” LLNL-CONF-771757, Supercomputing 2019, Denver, CO, November 2019.
- Yue Zhu, Weikuan Yu, Bing Jiao, Kathryn Mohror, Adam Moody, Fahim Chowdhury, “Efficient User-Level Storage Disaggregation for Deep Learning,” LLNL-CONF-771643, IEEE Cluster Conference (Cluster) 2019, September 2019.
- Lee Savoie, David K. Lowenthal, Bronis de Supinski, Kathryn Mohror, and Nikhil Jain, “Mitigating Inter-Job Interference via Process-Level Quality-of-Service,” LLNL-CONF-787578, IEEE Cluster Conference (Cluster) 2019, September 2019.
- Fahim Chowdhury, Yue Zhu, Todd Heer, Saul Paredes, Adam Moody, Robin Goldstone, Kathryn Mohror, and Weikuan Yu, “I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning,” LLNL-CONF-764078, International Conference on Parallel Processing (ICPP) 2019, August 2019.
- Bogdan Nicolae, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, Franck Cappello, “VeloC: Towards High Performance Adaptive Asynchronous Checkpointing at Large Scale,” LLNL-PROC-767059, IPDPS 2019, September 2019.
- Harshitha Menon, Michael O. Lam, Daniel Osei-Kuffuor, Markus Schordan, Scott Lloyd, Kathryn Mohror, and Jeff Hittinger, “ADAPT: Algorithmic Differentiation for Floating-Point Precision Tuning,” LLNL-CONF-748742, Supercomputing 2018, Dallas, TX, November 2018.
- Yue Zhu, Fahim Chowdhury, Huansong Fu, Adam Moody, Kathryn Mohror, Kento Sato and Weikuan Yu, “Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems,” LLNL-CONF-750269, IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2018), Milwaukee, WI, September 2018.
- Nawrin Sultana, Anthony Skjellum, Ignacio Laguna, Matthew Farmer, Kathryn Mohror, and Murali Emani, “MPI Stages: Checkpointing MPI State for Bulk Synchronous Applications,” LLNL-CONF-748617, EuroMPI 2018, Barcelona, Spain, September 2018.
- Marc-Andre Hermanns, Nathan Thomas Hjelm, Michael Knobloch, Kathryn Mohror, and Martin Schulz, “Enabling callback-driven runtime introspection via MPI_T,” LLNL-CONF-751714, EuroMPI 2018, Barcelona, Spain, September 2018.
- Harshitha Menon and Kathryn Mohror, “DisCVar: Discovering Critical Variables Using Algorithmic Differentiation for Transient Faults,” LLNL-CONF-737739, Principles and Practice of Parallel Programming (PPoPP 2018), Vienna, Austria, February 2018.
- Teng Wang, Adam Moody, Yue Zhu, Kathryn Mohror, Kento Sato, Tanzima Islam and Weikuan Yu, “MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers,” LLNL-CONF-705913, International Parallel & Distributed Processing Symposium (IPDPS), Orlando, FL, June 2017.
- Teng Wang, Kathryn Mohror, Adam Moody, Kento Sato, Weikuan Yu, “An Ephemeral Burst-Buffer File System for Scientific Applications,” LLNL-CONF-681480, Supercomputing 2016, Salt Lake City, UT, November 2016.
- Daniel Holmes, Kathryn Mohror, Ryan E. Grant, Anthony Skjellum, Martin Schulz, Wesley Bland, Jeffrey M. Squyres, “MPI Sessions: Leveraging Runtime Infrastructure to Increase Scalability of Applications at Exascale,” LLNL-CONF-692709, EuroMPI 2016, Edinburgh, UK, September 2016.
- Soren Rasmussen, Martin Schulz, Kathryn Mohror, “Allowing MPI tools builders to forget about Fortran,” LLNL-CONF-692718, Short Paper EuroMPI 2016, Edinburgh, UK, September 2016.
- Sagar Thapaliya, Purushotham Bangalore, Jay Lofstead, Kathryn Mohror, and Adam Moody, "Managing I/O Interference in a Shared Burst Buffer System," LLNL-CONF-685595, 2016 45th International Conference on Parallel Processing (ICPP), Philadelphia, PA, 2016.
- Matthias Weber, Ronny Brendel, Tobias Hilbrich, Kathryn Mohror, Martin Schulz, and Holger Brunst, “Structural Clustering: A New Approach to Support Performance Analysis at Scale,” LLNL-CONF-669728, IPDPS 2016, Chicago, IL, May 2016.
- Lee Savoie, David K. Lowenthal, Bronis R. de Supinski, Tanzima Islam, Kathryn Mohror, Barry Rountree, and Martin Schulz, “I/O Aware Power Shifting,” LLNL-CONF-669729, IPDPS 2016, Chicago, IL, May 2016.
- Tanzima Islam, Kathryn Mohror and Martin Schulz, “Exploring the Capabilities of the New MPI_T Interface,” LLNL-CONF-654091, EuroMPI/Asia 2014, Kyoto, Japan, September 2014.
- Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka, “A User-level Infiniband-based File System and Checkpoint Strategy for Burst Buffers,” LLNL-CONF-645876, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014), Chicago, IL, May 2014 (Acceptance rate: 19%).
- Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka, "FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery," LLNL-CONF-645209, 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2014), Phoenix, AZ, May 2014 (Acceptance rate: 21%).
- Abhinav Bhatele, Kathryn Mohror, Steven H. Langer, and Katherine E. Isaacs, “There Goes the Neighborhood: Performance Degradation Due to Nearby Jobs,” LLNL-CONF-635776, Proceedings of ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13), November 2013 (Acceptance rate: 20%).
- Matthias Weber, Kathryn Mohror, Martin Schulz, Bronis R. de Supinski, Holger Brunst, and Wolfgang E. Nagel, "Alignment-Based Metrics for Trace Comparison," LLNL-CONF-586852, Euro-Par 2013, Aachen, Germany, Aug. 26-30, 2013.
- Raghunath Raja Chandrasekar, Adam Moody, Kathryn Mohror, Dhabaleswar K. Panda, “A 1 PB/s File System to Checkpoint Three Million MPI Tasks,” LLNL-CONF-592884, International Symposium on High Performance Distributed Computing 2013, New York City, NY, June 2013 (Acceptance rate: 15%).
- Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, and Satoshi Matsuoka, "Design and Modeling of a Non-blocking Checkpointing System," LLNL-CONF-554431, Supercomputing 2012, Salt Lake City, UT, November 2012 (Acceptance rate: 21%).
- Tanzima Islam, Kathryn Mohror, Saurabh Bagchi, Adam Moody, Bronis R. de Supinski, and Rudolf Eigenmann, "mcrEngine: A Scalable Checkpointing System using Data-Aware Aggregation and Compression," LLNL-CONF-554251, Supercomputing 2012, Salt Lake City, UT, November 2012 (Best Student Paper Finalist, Acceptance rate: 21%).
- Adam Moody, Greg Bronevetsky, Kathryn Mohror, Bronis R. de Supinski, "Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System," LLNL-CONF-427742, Supercomputing 2010, New Orleans, LA, November 2010 (Acceptance rate: 20%).
- Kathryn Mohror and Karen L. Karavanic, “Evaluating Similarity-based Trace Reduction Techniques for Scalable Performance Analysis,” Supercomputing 2009, Portland OR, November 2009 (Acceptance rate: 23%).
- Rashawn Knapp, Kathryn Mohror, Thomas Conerly, Abraham Neben, Aaron Amauba, Karen L. Karavanic, and John May, "PerfTrack: Scalable Application Performance Diagnosis for Linux Clusters," LCI Conference 2007, South Lake Tahoe, CA, May 2007.
- Kathryn Mohror and Karen L. Karavanic, “Towards Scalable Event Tracing on High End Systems,” High Performance Computing Conference 2007, Houston, TX, September, 2007.
- Karen Karavanic, John May, Kathryn Mohror, Brian Miller, Kevin Huck, Rashawn Knapp, and Brian Pugh, “Integrating Database Technology with Comparison-based Parallel Performance Diagnosis: The PerfTrack Performance Experiment Management Tool,” Supercomputing 2005, Seattle WA, November 2005 (Acceptance rate: 24%).
- Kathryn Mohror and Karen L. Karavanic, "Performance Tool Support for MPI-2 on Linux," Supercomputing 2004, Pittsburgh PA, November 2004 (Acceptance rate: 31%).
Workshops
- Derek Schafer, Ignacio Laguna, Anthony Skjellum, Nawrin Sultana, Kathryn Mohror, “Extending the MPI Stages Model of Fault Tolerance,” LLNL-CONF-813590, ExaMPI Workshop 2020, Atlanta, GA, November 2020.
- Fahim Tahmid Chowdhury, Yue Zhu, Francesco Di Natale, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, Weikuan Yu, "Emulating I/O Behavior in Scientific Workflows on High Performance Computing Systems,” LLNL-CONF-813999, Parallel Data Systems Workshop (PDSW’20), Atlanta, GA, November 2020.
- Chen Wang, Jinghan Sun, Marc Snir, Kathryn Mohror, Elsa Gonsiorowski, “Recorder 2.0: Efficient Parallel I/O Tracing and Analysis,” LLNL-CONF-802986, The IEEE International Workshop on High-Performance Storage (HPS), May 2020.
- Tonmoy Dey, Kento Sato, Bogdan Nicolae, Jian Guo, Jens Domke, Weikuan Yu, Franck Cappello, Kathryn Mohror, “Optimizing Asynchronous Multi-Level Checkpoint/Restart Configurations with Machine Learning,” LLNL-CONF-802941, The IEEE International Workshop on High-Performance Storage (HPS), May 2020.
- Nawrin Sultana, Anthony Skjellum, Puri Bangalore, Ignacio Laguna, Kathryn Mohror, “Understanding the Usage of MPI in Exascale Proxy Applications,” LLNL-CONF-761321, Workshop on Exascale MPI (ExaMPI), November 2018.
- Yue Zhu, Teng Wang, Kathryn Mohror, Adam Moody, Muhib Khan, and Weikuan Yu, “Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support,” LLNL-CONF-744728, Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2018), June 2018.
- Harshitha Menon, Kathryn Mohror, Chun-Kai Chang, and Mattan Erez, “Towards Understanding the SDC Impact of Variables in HPC Applications,” LLNL-CONF-744963, The 14th IEEE Workshop on Silicon Errors in Logic – System Effects (SELSE 2018), April 2018.
- Lee Savoie, David Lowenthal, Bronis de Supinski, and Kathryn Mohror, “A Study of Network Quality of Service in Many-Core MPI Applications,” LLNL-CONF-745381, 6th Workshop on Runtime and Operating Systems for the Many-core Era (ROME 2018), May 25, 2018.
- Ivo Jimenez, Carlos Maltzahn, Jay Lofstead, Kathryn Mohror, Adam Moody, Remzi Arpaci-Dusseau, Andrea Arpaci-Dusseau, “PopperCI: Automated Reproducibility Validation,” 2017 IEEE INFOCOM International Workshop on Computer and Networking Experimental Research Using Testbeds, Atlanta, GA, May 2017.
- Ivo Jimenez, Carlos Maltzahn, Jay Lofstead, Kathryn Mohror, Adam Moody, Remzi Arpaci-Dusseau, Andrea Arpaci-Dusseau, “Characterizing and Reducing Cross-Platform Performance Variability Using OS-level Virtualization,” LLNL-TR-670295, Workshop on Variability in Parallel and Distributed Systems (VarSys’16), Chicago, IL, May 2016.
- Ivo Jimenez, Carlos Maltzahn, Jay Lofstead, Kathryn Mohror, Adam Moody, Remzi Arpaci-Dusseau, Andrea Arpaci-Dusseau, “Tackling the Reproducibility Problem in Storage Systems Research with Declarative Experiment Specifications,” LLNL-CONF-669866, Parallel Data Storage Workshop (PDSW15), Austin, TX, November 2015.
- Aiman Fang, Ignacio Laguna, Kento Sato, Tanzima Islam, Kathryn Mohror, “Fault Tolerance Assistant (FTA): An Exception Handling Approach for MPI Programs,” LLNL-ABS-676900, Workshop on Exascale MPI Hot Topics Submission (ExaMPI15), Austin, TX, November 2015.
- Ivo Jimenez, Carlos Maltzahn, Jay Lofstead, Adam Moody, Kathryn Mohror, Remzi Arpaci-Dusseau, and Andrea Arpaci-Dusseau, “The Role of Container Technology in Reproducible Computer Systems Research,” LLNL-CONF-669847, IEEE International Workshop on Container Technologies and Container Clouds (WoC’15), Tempe, AZ, March 2015.
- Laust Brock-Nannestad, John DelSignore, Jeffrey M. Squyres, Sven Karlsson, Kathryn Mohror, “MPI Debugging with Handle Introspection,” LLNL-CONF-660001, Workshop on Exascale MPI 2014 (ExaMPI14), New Orleans, LA, November 2014.
- Sagar Thapaliya, Purushotham Bangalore, Jay Lofstead, Kathryn Mohror, and Adam Moody, “IO-Cop: Managing Concurrent Accesses to Shared Parallel File System,” LLNL-CONF-653703, IASDS 2014, Minneapolis, MN, September 2014.
- Kathryn Mohror, Adam Moody, and Bronis R. de Supinski, “Asynchronous Checkpoint Migration with MRNet in the Scalable Checkpoint / Restart Library,” LLNL-PROC-540391, FTXS’12, Boston MA, June 25, 2012.
- Dries Kimpe, Kathryn Mohror, Adam Moody, Brian Van Essen, Maya Gokhale, Kamil Iskra, Rob Ross, Bronis R. de Supinski, "Integrated In-System Storage Architecture for High Performance Computing," LLNL-CONF-557032, ROSS'12, Venice, Italy, June 29, 2012.
- Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis de Supinski, Naoya Maruyama, and Satoshi Matsuoka, “Towards an Asynchronous Checkpointing System,” LLNL-CONF-509152, IPSJ SIG Technical Reports 2011-ARC-197 2011-HPC-132 (HOKKE-19), November 2011.
- Kathryn Mohror, Karen L. Karavanic, and Allan Snavely, “Scalable Event Trace Visualization,” Workshop on Productivity and Performance (PROPER 2009) at EuroPar 2009, Delft, The Netherlands, August 2009.
- John J. Hoffman, Andrew Byrd, Kathryn Mohror, and Karen L. Karavanic, “PPerfGrid: A Grid Services-based Tool for the Exchange of Heterogeneous Parallel Performance Data,” HIPS-HPGC 2005 Joint Workshop on High-Performance Grid Computing and High-Level Parallel Programming Models, in conjunction with IPDPS 2005, Denver CO, April 2005.
Posters
- Izzet Yildirim, Hariharan Devarajan, Anthony Kougkas, Xian-He Sun, Kathryn Mohror, “A Multifaceted Approach to Automated I/O Bottleneck Detection for HPC Workloads,” LLNL-POST-838770, SC’22, Dallas, TX, November, 2022.
- Bengisu Elis, Martin Schulz, Martin Ruefenacht, Anthony Skjellum, Olga Pearce, Kathryn Mohror, “MPI Tools the Easy Way,” LLNL-POST-807339, Research Poster, International Supercomputing Conference (ISC), Frankfurt, Germany (virtual meeting), June 2020.
- Tonmoy Dey, Kento Sato, Jian Guo, Bogdan Nicolae, Jens Domke, Weikuan Yu, Franck Cappello, and Kathryn Mohror, “Optimizing Asynchronous Multi-Level Checkpoint/Restart Configurations with Machine Learning,” SC’19, Denver, CO, November 2019.
- Fahim Tahmid Chowdhury, Francesco Di Natale, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, and Weikuan Yu, “Understanding I/O Behavior in Scientific Workflows on High Performance Computing Systems,” SC’19, Denver, CO, November 2019.
- Arnab K. Paul, Olaf Faaland, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, and Ali R. Butt, “Understanding HPC Application I/O Behavior Using System Level Statistics,” SC’19, Denver, CO, November 2019.
- Zhimin Li, Harshitha Menon, Yarden Livnat, Kathryn Mohror, Valerio Pascucci, “SpotSDC: an Information Visualization System to Analyze Silent Data Corruption,” LLNL-POST-755344, SC’18, Dallas, TX, November 2018.
- Yue Zhu, Fahim Chowdhury, Huansong Fu, Adam Moody, Kathryn Mohror, Kento Sato, Weikuan Yu, “Multi-Client DeepIO for Large-Scale Deep Learning on HPC Systems,” LLNL-ABS-755590, SC’18, Dallas, TX, November 2018.
- Bogdan Nicolae, Franck Cappello, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, “VeloC: Very Low Overhead Checkpointing System,” LLNL-POST-761255, SC’18, Dallas, TX, November 2018.
- Nawrin Sultana, Shane Farmer, Anthony Skjellum, Ignacio Laguna, Kathryn Mohror, Murali Emani, “Designing a Reinitializable and Fault Tolerant MPI Library,” LLNL-POST-734163, EuroMPI/USA 2017, Chicago, IL, September 2017.
- Teng Wang, Kathryn Mohror, Adam Moody, Weikuan Yu, “BurstFS: A Distributed Burst Buffer File System for Scientific Applications,” LLNL-POST-675584, Supercomputing 2015, Austin, TX, November 2015.
- Laust Brock-Nannestad, John DelSignore, Jeffrey M. Squyres, Sven Karlsson, and Kathryn Mohror, “Exposing MPI Objects for Debugging,” LLNL-POST-658417, Supercomputing 2014, New Orleans, LA, November 2014.
- Xiang Ni, Tanzima Islam, Kathryn Mohror, Adam Moody, and Laxmikant Kale, “Lossy Compression for Checkpointing: Fallible or Feasible?” LLNL-POST-658374, Supercomputing 2014, New Orleans, LA, November 2014.
- Sagar Thapaliya, Purushotham Bangalore, Kathryn Mohror, Adam Moody, "Capturing I/O Dynamics in HPC Applications," Research Poster, PDSW 2014, Denver, CO, November 2013.
- Sagar Thapaliya, Adam Moody, Kathryn Mohror, and Purushotham Bangalore, "Inter-application Coordination for Reducing I/O Interference", LLNL-POST-641538, Supercomputing 2013, Denver, CO, November 2013.
- Matthias Weber, Kathryn Mohror, Martin Schulz, Holger Brunst, Bronis R. de Supinski, Wolfgang E. Nagel, "Structural Comparison of Parallel Applications", LLNL-POST-569232, Supercomputing 2013, Denver, CO, November 2013. (Best Poster Finalist)
- Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, and Satoshi Matsuoka, "Design and Modeling of a Non-Blocking Checkpoint System," LLNL-POST-552657, ATIP - A*CRC Workshop on Accelerator Technologies in High Performance Computing, May 2012.
- Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, and Satoshi Matsuoka, "Towards a Light-weight Non-blocking Checkpointing System," LLNL-POST-561176, HPC in Asia Workshop in conjunction with the 2012 International Supercomputing Conference (ISC'12), June 2012.
- Tanzima Z Islam, Kathryn Mohror, Adam Moody, Bronis de Supinski, Saurabh Bagchi, Rudolf Eigenmann, "Data-Aware Inter-Process Checkpoint Compression," Research Poster, LLNL-POST-461998, Supercomputing 2010, New Orleans, LA, November 2010.
- Kathryn Mohror and Karen L. Karavanic, “A Study of Tracing Overhead on a High-Performance Linux Cluster,” In Symposium on Principles and Practice of Parallel Programming (PPoPP’07), pages 158–159, 2007.
- Kathryn Mohror and Karen L. Karavanic, "Scalable Event-based Performance Measurement in High-End Environments", Research Poster, SIGMETRICS Student Workshop, SIGMETRICS'07, San Diego, CA, June 13, 2007.
- Kathryn Mohror and Karen L. Karavanic, "A Study of Tracing Overhead on a High-Performance Linux Cluster," Research Poster, PPoPP'07, March 15, 2007.
- Kathryn Mohror and Karen L. Karavanic, "Infrastructure for Performance Tuning LAM/MPI Applications," Research Poster, Richard Tapia Celebration of Diversity in Computing Conference, Atlanta, GA, October 2003.
Theses
- Kathryn Mohror, “Scalable Event Tracing on High-End Parallel Systems,” PhD thesis, Computer Science Department, Portland State University, 2010.
- Kathryn Mohror, "Infrastructure for Performance Tuning MPI Applications," Master’s thesis, Computer Science Department, Portland State University, 2003.
Technical Reports and Other Documents
- Wang, Chen; Snir, Marc; Mohror, Kathryn (2020). High Performance Computing Application I/O Traces. In Lawrence Livermore National Laboratory (LLNL) Open Data Initiative. LLNL-MI-811381. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0Z899X4
- Ivo Jimenez, Carlos Maltzahn, Adam Moody, Kathryn Mohror, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau, “I Aver: Providing Declarative Experiment Specifications Facilitates the Evaluation of Computer Systems Research,” LLNL-ABS-684863, Tiny Transactions on Computer Science (TinyToCS), Vol 4, March 2016.
- Ivo Jimenez, Carlos Maltzahn, Adam Moody, Kathryn Mohror, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau, “Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications,” UC Santa Cruz Technical Report UCSC-SOE-15-07, LLNL-TR-670295, May 2015.
- Adam Moody, Greg Bronevetsky, Kathryn Mohror, Bronis R. de Supinski, "Detailed Modeling, Design, and Evaluation of a Scalable Multi-level Checkpointing System," Lawrence Livermore National Laboratory Technical Report, LLNL-TR-440491, July 2010.
- Kathryn Mohror and Karen L. Karavanic, “Evaluating Similarity-based Trace Reduction Techniques for Scalable Performance Analysis,” Portland State University, Computer Science Department Technical Report, TR-09-03, June 2009.
- D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang, “Performance Database Technology for SciDAC Applications,” SciDAC 2007.
- "MPI PERUSE: An MPI Extension for Revealing Unexposed Implementation Information," Version 2.0, Not ratified - draft version, 2006.
- Kathryn Mohror and Karen L. Karavanic, “An Investigation of Tracing Overheads on High End Systems,” Portland State University, Computer Science Department Technical Report, TR-06-06, December, 2006.
- Kathryn Mohror, Kevin Huck, Karen Karavanic, John May, and Brian Miller, “PerfTrack: A Performance Database and Analysis Tool,” Lawrence Livermore National Laboratory Student Research Symposium, August 2004.
- Kathryn Mohror and Karen L. Karavanic, "Performance Tool Support for MPI-2 on Linux," Portland State University, Computer Science Department Technical Report, TR-04-03, April 2004.
Invited Talks
- “A storage system fit for Exascale,” Salishan Conference on High Speed Computing, April 2022.
- “SCR to the rescue!” SuperCheck 2021 Workshop Plenary, Virtual Meeting, February 2021.
- “Accelerating your I/O with UnifyFS,” IODC’20 Workshop, Virtual Meeting, June 25, 2020.
- “UnifyFS: A file system for burst buffers,” HPC Knowledge Meeting’20 (HPCKP’20), Virtual Meeting, June 18, 2020.
- “I/O on Hierarchical Storage Systems: The Past, Present, and Future”, University of Tsukuba, Tsukuba, Japan, Feb 12, 2020.
- “The evolution of tool support in MPI,” ExaMPI 2017 Keynote Talk, Denver, CO, November 12, 2017.
- “Performance Portable Checkpoint/Restart with VeloC & UnifyCR,” ISC 2017, Frankfurt, Germany, June 21, 2017.
- “Getting Insider Information via the New MPI Tools Information Interface,” EuroMPI 2016 Keynote Talk, Edinburgh, Scotland, September 26, 2016.
- “Fault Tolerance for High Performance Computing: Can burst buffers save the day?” Florida State University Computer Science Department Colloquium, Tallahassee, FL, March 31, 2016.
- “Fault Tolerance for High Performance Computing: Is the Sky Falling?” UO Computer Science Department Colloquium, Eugene, OR, December 5, 2015.
- “SCR: The Scalable Checkpoint / Restart Library,” UNM Computer Science Department Colloquium, LLNL-PRES-523694, Albuquerque, NM, January 26, 2012.
- “The Scalable Checkpoint / Restart Library (SCR): Updates and Future Directions,” NMC Ultrascale Systems Research Center Seminar, LLNL-PRES-523720, Los Alamos, NM, January 25, 2012.
- “The Scalable Checkpoint/Restart Library (SCR): Overview and Future Directions,” Paradyn Week, LLNL-PRES-482473, Madison, WI, May 2, 2011.
- “SCR: The Scalable Checkpoint/Restart Library,” Portland State University, Portland, OR, Computer Science Department Seminar, LLNL-PRES-471228, February 21, 2011.
- “Scalable Event Tracing on High-End Parallel Systems,” Schloss Dagstuhl, Germany, Dagstuhl Seminar on Program Development for Extreme-Scale Computing, May 3, 2010.
- “Scalable Event Tracing on High-End Parallel Systems,” Lawrence Livermore National Laboratory, Livermore, CA, October 9, 2009.
- “Scalable Event Tracing on High-End Parallel Systems,” Oak Ridge National Laboratory, Oak Ridge, TN, Computer Science and Mathematics Division Seminar, August 7, 2009.
- “Evaluating Similarity-based Trace Reduction Techniques for Scalable Performance Analysis,” San Diego Supercomputing Center, San Diego, CA, Large Scale Systems Seminar, May 11, 2009.
- “The PerfTrack Tool for Performance Data Management,” Schloss Dagstuhl, Germany, Dagstuhl Seminar on Automatic Performance Analysis, December 13, 2005.
- “Enabling MPI-2 Support in Paradyn,” University of Wisconsin, Madison, WI, Paradyn Week, March 17, 2005.
- “Performance Tool Support for MPI-2 on Linux,” Lawrence Livermore National Laboratory, August 2004.