CCTAILS: CCT Artificial Intelligence Lecture Series – Spring 23

Every first Wednesday of the month, at 3:00 pm Central Time

Some of today’s most visible and, indeed, remarkable achievements in artificial intelligence (AI) have come from advances in deep learning (DL). The formula for the success of DL has been compute power – artificial neural networks are a decades-old idea, but it was the use of powerful accelerators, mainly GPUs, that truly enabled DL to blossom into its current form.

As significant as the impacts of DL have been, there is a realization that current approaches are merely scratching the surface of what might be possible and that researchers could more rapidly conduct exploratory research on ever larger and more complex systems – if only more compute power could be effectively applied.

There are three emerging trends that, if properly harnessed, could enable such a boost in compute power applied to AI, thereby paving the way for major advances in AI capabilities. 

  • Optimization algorithms based on higher-order derivatives are well-established numerical methods. They offer superior convergence characteristics and inherently expose more opportunities for scalable parallel execution than the first-order methods commonly applied today. Despite these advantages, higher-order algorithms have not yet found their way into mainstream AI applications, as they demand substantially more computational power and must manage much larger amounts of data (see the sketch after this list).
  • High-performance computing (HPC) brings more compute power to bear via parallel programming techniques and large-scale hardware clusters and will be required to satisfy the resource requirements of higher-order methods. That DL is not currently taking advantage of HPC resources is not due to lack of imagination or lack of initiative in the community.  Rather, matching the needs of DL systems with the capabilities of HPC platforms presents significant challenges that can only be met by coordinated advances across multiple disciplines.
  • Hardware architecture advances continue apace, with diversification and specialization increasingly being seen as a critical mechanism for increased performance. Cyberinfrastructure (CI) and runtime systems that insulate users from hardware changes, coupled with tools that support performance evaluation and adaptive optimization of AI applications, are increasingly important to achieving high user productivity, code portability, and application performance.
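
To make the contrast in the first bullet concrete, here is a minimal sketch, not taken from the colloquium material, comparing a plain first-order gradient step with a second-order Newton step on a toy convex objective. The objective, step size, and dimension are illustrative assumptions; the Hessian solve in the Newton update is what drives the extra compute and memory cost mentioned above.

    import numpy as np

    # Toy convex objective: 0.5 * w^T A w + 0.25 * sum(w^4).
    A = np.diag([1.0, 10.0, 100.0])        # badly conditioned curvature

    def loss(w):
        return 0.5 * w @ A @ w + 0.25 * np.sum(w ** 4)

    def grad(w):
        return A @ w + w ** 3

    def hessian(w):
        return A + np.diag(3.0 * w ** 2)   # dense d x d matrix: O(d^2) memory

    rng = np.random.default_rng(0)
    w0 = rng.normal(size=3)
    w_gd, w_newton = w0.copy(), w0.copy()

    for _ in range(50):
        # First-order step: cheap per iteration, limited by the worst-conditioned direction.
        w_gd = w_gd - 1e-2 * grad(w_gd)
        # Second-order step: a linear solve per iteration, but far fewer iterations needed.
        w_newton = w_newton - np.linalg.solve(hessian(w_newton), grad(w_newton))

    print("gradient descent loss:", loss(w_gd))
    print("Newton loss:          ", loss(w_newton))

In higher dimensions the Hessian solve dominates the cost, which is precisely where the parallelism and HPC resources discussed in the second bullet become relevant.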

The colloquium, hosted by the Center for Computation and Technology (CCT) at LSU, brings together experts in algorithmic theory, artificial intelligence (AI), and high-performance computing (HPC), and aims to transform research in the broader field of AI and optimization. The first aspect of the colloquium is distributed AI frameworks, e.g. TensorFlow, PyTorch, Horovod, and Phylanx. One challenge here is the integration of accelerator devices and the support of a wide variety of target architectures, since recent supercomputers are increasingly heterogeneous, some equipped with accelerator cards and others with CPUs only. A framework should be easy to deploy and maintain and should provide good portability and productivity; abstractions and a unified API that hide the zoo of accelerator devices from the users are therefore important.
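
As a small illustration of the kind of device abstraction meant here (a sketch assuming PyTorch is installed; the model and data are toy placeholders, not taken from any of the frameworks named above), the same training step can run unchanged on a CPU-only node or on a GPU-equipped one:

    import torch

    # A single switch hides the backend from the rest of the code.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(16, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    x = torch.randn(32, 16, device=device)   # toy batch
    y = torch.randn(32, 1, device=device)

    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    print("loss on", device, ":", loss.item())

Distributed frameworks such as Horovod or PyTorch's DistributedDataParallel layer data-parallel training on top of this same pattern.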

The second aspect is higher-order algorithms, e.g. second-order methods or Bayesian optimization. These methods can yield higher accuracy but are more computationally intensive. We will look into both the theoretical and the computational aspects of these methods.
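
As a concrete (and deliberately tiny) example of this second aspect, the following sketch runs Bayesian optimization with a Gaussian-process surrogate and an expected-improvement acquisition function on a one-dimensional toy objective. The kernel, noise level, and objective are illustrative assumptions, not material from the talks.

    import numpy as np
    from scipy.stats import norm

    def rbf_kernel(a, b, length_scale=0.3):
        """Squared-exponential kernel between two sets of 1-D points."""
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / length_scale) ** 2)

    def gp_posterior(x_train, y_train, x_test, noise=1e-4):
        """Posterior mean and standard deviation of a GP conditioned on the data."""
        K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
        K_s = rbf_kernel(x_train, x_test)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        mu = K_s.T @ alpha
        v = np.linalg.solve(L, K_s)
        var = 1.0 - np.sum(v ** 2, axis=0)   # k(x, x) = 1 for this kernel
        return mu, np.sqrt(np.maximum(var, 1e-12))

    def expected_improvement(mu, sigma, best_y):
        """Expected improvement acquisition function for minimization."""
        z = (best_y - mu) / sigma
        return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    def objective(x):
        """Toy objective standing in for an expensive training or simulation run."""
        return np.sin(3.0 * x) + 0.5 * x ** 2

    rng = np.random.default_rng(0)
    x_obs = rng.uniform(-2.0, 2.0, size=3)   # initial design
    y_obs = objective(x_obs)
    grid = np.linspace(-2.0, 2.0, 400)       # candidate points

    for _ in range(10):
        mu, sigma = gp_posterior(x_obs, y_obs, grid)
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))

    print("best x found:", x_obs[np.argmin(y_obs)], "objective:", y_obs.min())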

______________________________________________________________________________

Confirmed Speakers

02/08/2023    Dr. Yue Yu                Lehigh University
03/01/2023    Michael Shvartsman        Meta
05/03/2023    Daniel M. Tartakovsky     Stanford University

Registration

Registration for the colloquium is free. Please complete your registration here: registration form

Logistics

This semester, we will have both Zoom and in-person presentations. In-person presentations will also be available through Zoom. We will keep this page up-to-date with information regarding which presentations will take place at LSU/CCT.

Local organizers

  • Patrick Diehl
  • Katie Bailey
  • Hartmut Kaiser
  • Mayank Tyagi

For questions or comments regarding the colloquium, please contact Katie Bailey.

Talks

Speaker:   Dr. Yue Yu, Associate Professor of Applied Mathematics, Lehigh University

Date:    Wed, Feb 8 @ 3:00 pm        

Title:     Continuous Optimization for Learning Bayesian Networks

Abstract:     

Bayesian networks are directed probabilistic graphical models used to compactly model joint probability distributions of data. Automatic discovery of their directed acyclic graph (DAG) structure is important to causal inference tasks. However, learning a DAG from observed samples of an unknown joint distribution is generally a challenging combinatorial problem, owing to a search space that grows superexponentially in the number of graph nodes. A recent breakthrough formulates the problem as a continuous optimization problem with a structural constraint that ensures acyclicity (NOTEARS, Zheng et al., 2018); this enables a suite of continuous optimization techniques to be used, and an augmented Lagrangian method is employed to enforce the constraint.
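
For readers unfamiliar with NOTEARS, its acyclicity constraint can be sketched in a few lines. The formula h(W) = tr(exp(W ∘ W)) − d is from Zheng et al. (2018); the adjacency matrices below are made-up examples, not from the talk.

    import numpy as np
    from scipy.linalg import expm

    def notears_acyclicity(W):
        """h(W) = tr(exp(W ∘ W)) - d, which is zero exactly when W encodes a DAG."""
        d = W.shape[0]
        return np.trace(expm(W * W)) - d

    # Acyclic example: edges 0 -> 1 -> 2 only.
    W_dag = np.array([[0.0, 0.8, 0.0],
                      [0.0, 0.0, 0.5],
                      [0.0, 0.0, 0.0]])

    # Cyclic example: an added edge 2 -> 0 closes a directed cycle.
    W_cyclic = W_dag.copy()
    W_cyclic[2, 0] = 0.3

    print(notears_acyclicity(W_dag))     # ~0: acyclic, constraint satisfied
    print(notears_acyclicity(W_cyclic))  # > 0: the cycle is penalized

NOTEARS drives h(W) to zero with an augmented Lagrangian while minimizing a data-fitting loss over W.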

In this talk, we take a step further to propose new continuous optimization algorithms and models aiming to improve NOTEARS in both efficiency and accuracy. We first show that the Karush-Kuhn-Tucker (KKT) optimality conditions for the NOTEARS formulation cannot be satisfied except in a trivial case, which explains the observed slow convergence. We then derive the KKT conditions for an equivalent reformulation, show that they are indeed necessary, and relate them to explicit constraints that certain edges are absent from the graph.

Informed by the KKT conditions, a local search post-processing algorithm is proposed and shown to substantially and universally improve the learning accuracy, typically by a factor of 2 or more. Second, we consider a reformulation of the DAG space, and propose a new framework for DAG structure learning by searching in this equivalent set of DAGs.

A fast projection method is developed based on this constraint-free continuous optimization approach. Experimental studies on benchmark datasets demonstrate that our method provides comparable accuracy but better efficiency, often by more than one order of magnitude. Last, we develop a variational autoencoder parameterized by a graph neural network architecture, which we coin DAG-GNN, to capture complex nonlinear mappings and data types. We demonstrate that the proposed method is capable of handling datasets with either continuous or discrete variables, and that it learns more accurate graphs for nonlinearly generated samples.

 Bio:       Dr. Yu is currently an Associate Professor in the Department of Mathematics at Lehigh University. She is also affiliated with the College of Health and the Institute for Data, Intelligent Systems, and Computation (I-DISC) at Lehigh.

Dr. Yu’s research concerns topics in numerical analysis, scientific computing, and machine learning, where she works on the development of novel numerical tools for models arising in science, engineering, and biomedicine. She is particularly interested in applying mathematical analysis to the design and analysis of mathematical models and numerical schemes.

______________________________________________________________________________

Speaker:       Dr. Mike Shvartsman, Meta 

Date:         Wednesday, March 1 @ 3:00 pm CST

Title:      AEPsych: a platform for live human-in-the-loop experimentation

Abstract:   Human binary-choice data (e.g. yes/no, better/worse) is widely used in the study of human perception and preferences. Notable examples include psychophysics (studying how the brain maps external stimuli to internal representations, in psychology), value-based decision making (studying how humans assign utility to items, in economics), and preference learning or optimization (uncovering human preferences or optimizing stimuli based on them, in machine learning). AEPsych (aepsych.org/) is a platform for model-based experimentation and active learning in such domains, built both for ML researchers to benchmark and test their models and for experimentalists to integrate those models with real studies. It allows a predictive model, typically based on Gaussian processes, to be used during an experiment to adaptively select informative stimuli to sample via an acquisition function (similar to Bayesian optimization). I will describe the overall design of AEPsych, as well as a number of modeling and algorithmic advances we made as part of its development to enable orders-of-magnitude sample-efficiency gains on real experiments performed with human participants.
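
The adaptive loop described above can be illustrated generically. The sketch below is not AEPsych's actual API: it is a stand-in that fits a simple two-parameter logistic psychometric model to binary responses and chooses each next stimulus by uncertainty sampling rather than a GP-based acquisition function; the simulated observer is a made-up substitute for a human participant.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit

    def neg_log_likelihood(params, x, y):
        """Bernoulli log-likelihood of a 2-parameter logistic psychometric curve."""
        a, b = params
        p = np.clip(expit(a * (x - b)), 1e-9, 1 - 1e-9)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    def simulated_observer(x, rng, slope=4.0, threshold=0.5):
        """Stand-in for a participant giving yes/no answers to a stimulus x."""
        return rng.random() < expit(slope * (x - threshold))

    rng = np.random.default_rng(1)
    grid = np.linspace(0.0, 1.0, 101)       # candidate stimulus intensities
    stimuli = [0.2, 0.8]                    # two seed trials
    responses = [float(simulated_observer(x, rng)) for x in stimuli]

    for trial in range(20):
        fit = minimize(neg_log_likelihood, x0=[1.0, 0.5],
                       args=(np.array(stimuli), np.array(responses)))
        a, b = fit.x
        p = expit(a * (grid - b))
        # Uncertainty sampling: probe where the model is least certain (p ~ 0.5).
        x_next = grid[np.argmin(np.abs(p - 0.5))]
        stimuli.append(x_next)
        responses.append(float(simulated_observer(x_next, rng)))

    print("estimated threshold:", b)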

Bio:    Mike Shvartsman is a research scientist and manager at Meta Reality Labs Research, where his team works on novel applications of statistics and machine learning that enable the future of augmented and virtual reality. His current primary research interest is in sample-efficient modeling and optimization, with primary application focus on understanding the human mind and brain. Previously, Mike worked on models of decision making and high-dimensional neural data as a postdoc at Princeton, and computational psycholinguistics during his PhD at the University of Michigan.

______________________________________________________________________________

Speaker:       Dr. Daniel M. Tartakovsky, Stanford University

Date:        Wednesday, May 3 @ 3:00 pm

Title:      

Abstract:         

Bio:    

______________________________________________________________________________


To access pages and speaker information from previous semesters, use the links below: