UCSD CSE AI Seminar, Fall 2023

Seminar & Graduate Course, CSE, UCSD, 2023

Time: Mondays, 12:30 PM to 1:30 PM. Room: CSE 1202. Zoom Streaming: https://ucsd.zoom.us/j/92652773524.

Overview

The UCSD CSE AI Seminar focuses on discussing state-of-the-art methods in AI-related fields. We invite distinguished researchers to talk about their most recent work. Graduate students who want to earn course credit by attending this seminar should enroll in CSE 259.

Talk Schedule

Recording Note: Please download the recording for the full-length video; the Dropbox website will only show the first hour.

| Week | Date | Talk Info | Speaker | Affiliation |
|------|-------|-----------|---------|-------------|
| 1 | 10/02 | Build an Ecosystem, Not a Monolith [slides] [recording] | Colin Raffel | University of Toronto and Hugging Face |
| 2 | 10/09 | Language agents: a critical evolutionary step of AI [slides] [recording] | Yu Su | Ohio State University |
| 3 | 10/16 | Learning from Weak Supervision: Label Noise or Bias? [slides] [recording] | Jingbo Shang | UC San Diego |
| 4 | 10/23 | Why Does Sharpness-Aware Minimization Generalize Better Than SGD? [slides] [recording] | Quanquan Gu | UCLA |
| 5 | 10/30 | Reinforcement Learning with Large Datasets: a Path to Resourceful Autonomous Agents [slides] [recording] | Sergey Levine | UC Berkeley |
| 6 | 11/06 | Scaling and Aligning Open Language Models [slides] [recording] | Alexander Rush | Cornell |
| 7 | 11/13 | Reflex or Reflect: When Do Language Tasks Need Slow Reasoning? [slides] [recording] | Xiang Ren | USC |
| 8 | 11/20 | Possible Impossibilities and Impossible Possibilities [slides] [recording] | Yejin Choi | University of Washington |
| 9 | 11/27 | Something Old, Something New: Machine Learning at the End of History [slides] [recording] | Taylor Berg-Kirkpatrick | UC San Diego |
| 10 | 12/04 | Large Language Models for Complex Query Retrieval [slides] [recording] | Leon Bergen | UC San Diego |

Week 1: Build an Ecosystem, Not a Monolith

Abstract

Currently, the preeminent paradigm for building artificial intelligence is the development of large, general-purpose models that aim to be able to perform all tasks at (super)human level. In this talk, I will argue that an ecosystem of specialist models would likely be dramatically more efficient and could be significantly more effective. Such an ecosystem could be built collaboratively by a distributed community and be continually expanded and improved. I will outline some of the technical challenges involved in creating model ecosystems, including automatically selecting which models to use for a particular task, merging models to combine their capabilities, and efficiently communicating changes to a model.
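To make the merging step concrete, here is a minimal parameter-space merging sketch in Python. It simply takes a weighted average of checkpoints that share an architecture; published merging methods (e.g., task arithmetic or TIES-Merging) handle interference between models more carefully, and the `model_a`/`model_b` names in the usage note are illustrative.

```python
import torch

def merge_models(state_dicts, weights=None):
    """Naive parameter-space merge: a weighted average of checkpoints
    that share an architecture. This is only the basic operation that
    more sophisticated merging methods build on."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {key: sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
            for key in state_dicts[0]}

# Illustrative usage (model_a and model_b are hypothetical torch modules):
# merged = merge_models([model_a.state_dict(), model_b.state_dict()])
```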

Speaker Bio

Colin Raffel is an associate professor at the University of Toronto and a faculty researcher at Hugging Face. His work aims to make it easy to get computers to do new things.

Week 2: Language agents: a critical evolutionary step of AI

Abstract

A heated ongoing discussion in AI concerns language agents: systems that, powered by large language models (LLMs), can follow instructions to autonomously carry out complex tasks in real-world environments. However, the concept of an agent has been around in AI for decades, so what is different this time? I argue that a fundamental change is the capability of using language as a vehicle for both thought and communication, a trait previously unique to humans. Language was immensely important for the evolution of biological intelligence, and artificial intelligence now seems to be following a similar evolutionary path.

In this talk, I will first describe a possible conceptual framework for language agents and touch on important topics such as memory, tool use, grounding, and reasoning. I will then briefly introduce several of our recent works on language agents: 1) Mind2Web, which aims to develop generalist language agents for real-world websites; 2) LLM-Planner, which leverages LLMs for robot planning to interact with physical environments; and 3) Pangu, a generic neurosymbolic framework for developing language agents for different environments, which features a symbolic agent and a neural LM working in concert.
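As a rough illustration of this division of labor (a sketch under assumptions, not the actual Pangu or Mind2Web code; `env` and `lm` are hypothetical interfaces): the symbolic side enumerates only the actions that are valid in the current environment, and the neural LM scores them.

```python
def agent_loop(env, lm, max_steps=20):
    """Neurosymbolic agent loop: the environment (symbolic) proposes only
    valid candidate actions; the language model (neural) picks among them."""
    state = env.reset()
    for _ in range(max_steps):
        candidates = env.valid_actions(state)
        action = max(candidates, key=lambda a: lm.score(state, a))
        state, done = env.step(action)
        if done:
            break
    return state
```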

Speaker Bio

Yu Su is a Distinguished Assistant Professor of Engineering at the Ohio State University. He obtained his Ph.D. from the University of California, Santa Barbara and his bachelor's degree from Tsinghua University. He co-directs the OSU NLP group and serves in leadership roles in multiple national AI institutes. He has broad interests in developing artificial intelligence, with a primary interest in the role of language as a vehicle of thought and communication. His research has received Outstanding Paper Awards from ACL and COLING, and his work at Microsoft has led to a new conversational interface for Microsoft Outlook.

Week 3: Learning from Weak Supervision: Label Noise or Bias?

Abstract

Recent advances in weakly supervised text classification mostly focus on designing sophisticated methods to turn high-level human heuristics into quality pseudo-labels. In this talk, we will first review a few popular methods for generating and selecting high-quality pseudo-labels, and then show that the simplest method, based on seed matching, has been greatly underestimated. We will show that the limited performance of seed matching is largely due to the label bias injected by the simple seed-match rule, which prevents the classifier from learning reliable confidence for selecting high-quality pseudo-labels. Interestingly, simply deleting the seed words present in the matched input texts can mitigate this label bias and help the classifier learn better confidence. With this change, the performance achieved by seed matching improves significantly, making it on par with or even better than the state of the art. Furthermore, to handle the case when the seed words are not known, we propose to simply delete word tokens from the input text at random, with a high deletion ratio. Remarkably, seed matching equipped with this random deletion can often achieve even better performance than with seed deletion.
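A minimal sketch of the three ingredients described above; the function names, tokenization, and the 0.9 deletion ratio are illustrative assumptions, not the paper's exact implementation:

```python
import random

def seed_match(text, seeds_per_class):
    """Pseudo-label a document by seed matching: assign the class whose
    seed words occur most often, or None if no seed word matches."""
    tokens = text.lower().split()
    counts = {c: sum(tokens.count(w) for w in seeds)
              for c, seeds in seeds_per_class.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

def delete_seeds(text, seed_words):
    """Remove seed words from a matched document so the classifier cannot
    shortcut on them, mitigating the injected label bias."""
    return " ".join(t for t in text.split() if t.lower() not in seed_words)

def random_delete(text, ratio=0.9):
    """When the seed words are unknown, randomly drop tokens at a high ratio."""
    return " ".join(t for t in text.split() if random.random() > ratio)
```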

Speaker Bio

Jingbo Shang is an Assistant Professor in the Computer Science and Engineering Department and the Halicioglu Data Science Institute at the University of California, San Diego. His research focuses on data mining, natural language processing, and machine learning methods that require minimal human effort, along with their applications. He obtained his Ph.D. from the University of Illinois at Urbana-Champaign in 2019 and his B.E. from Shanghai Jiao Tong University in 2014. His research has been recognized by many prestigious awards, including the SIGKDD Dissertation Award Runner-up in 2020, a Google Research Scholar award in 2021, and an NSF CAREER award in 2023.

Week 4: Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

Abstract

The challenge of overfitting, in which a model memorizes the training data and fails to generalize to test data, has become increasingly significant in the training of large neural networks. To tackle this challenge, Sharpness-Aware Minimization (SAM) has emerged as a promising training method that can improve the generalization of neural networks even in the presence of label noise. However, a deep understanding of how SAM works, especially in the setting of nonlinear neural networks and classification tasks, remains largely missing. In this talk, I will show why SAM generalizes better than Stochastic Gradient Descent (SGD) for a certain data model and two-layer convolutional ReLU networks. Our theoretical analysis explains the benefits of SAM, particularly its ability to prevent noise learning in the early stages, which enables more effective learning of weak features. Experiments on both synthetic and real datasets corroborate our theory.

This talk is based on joint work with Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen and Cho-Jui Hsieh.
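For readers unfamiliar with SAM, its standard formulation (from the original SAM paper, not a result specific to this talk) minimizes the worst-case loss in a small ball around the weights:

```latex
% SAM objective: minimize the worst-case loss within an l2-ball of radius rho.
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon)
% In practice, the inner maximization is approximated by a single gradient
% step, and the weights are updated with the gradient at the perturbed point:
\hat{\epsilon}(w) = \rho \, \frac{\nabla L(w)}{\|\nabla L(w)\|_2},
\qquad
w \leftarrow w - \eta \, \nabla L(w + \hat{\epsilon}(w))
```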

Speaker Bio

Quanquan Gu is an Associate Professor of Computer Science at UCLA. His research is in the area of artificial intelligence and machine learning, with a focus on developing and analyzing nonconvex optimization algorithms for machine learning to understand large-scale, dynamic, complex, and heterogeneous data, and on building the theoretical foundations of deep learning and reinforcement learning. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2014. He is a recipient of the Sloan Research Fellowship, the NSF CAREER Award, and the Simons-Berkeley Research Fellowship, among other research awards.

Week 5: Reinforcement Learning with Large Datasets: a Path to Resourceful Autonomous Agents

Abstract

How can we build autonomous agents that benefit from large-scale (pre-)training while still exhibiting emergent and flexible behavior? In this talk, I will discuss recent innovations in terms of algorithms, models, and large scale applications that can bring us closer to uniting the power of large datasets with the flexibility of reinforcement learning, to make it possible to train systems that can generalize as broadly as large-scale supervised systems while solving new problems in new ways. I will discuss the algorithmic foundations of offline reinforcement learning, discuss applications in robotics, language modeling, and image generation, and present some recent large-scale robotic learning efforts that provide us with a preview of what such future autonomous systems might look like.
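For context, offline RL learns a policy from a fixed dataset of transitions, with no further environment interaction. As one representative objective from the speaker's group (shown here for reference, not as the specific method of the talk), conservative Q-learning (CQL) fits the Bellman target on the data while penalizing Q-values on out-of-distribution actions:

```latex
% CQL (Kumar et al., 2020): D is the fixed offline dataset, mu is the
% distribution used to query out-of-distribution actions, and
% \hat{\mathcal{B}}^{\pi} is the empirical Bellman operator.
\min_{Q} \;
\alpha \Big(
  \mathbb{E}_{s \sim \mathcal{D},\, a \sim \mu(\cdot \mid s)}\big[ Q(s,a) \big]
  - \mathbb{E}_{(s,a) \sim \mathcal{D}}\big[ Q(s,a) \big]
\Big)
+ \tfrac{1}{2}\, \mathbb{E}_{(s,a,s') \sim \mathcal{D}}
  \Big[ \big( Q(s,a) - \hat{\mathcal{B}}^{\pi} \hat{Q}(s,a) \big)^{2} \Big]
```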

Speaker Bio

Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as applications in other decision-making domains. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more.

Week 6: Scaling and Aligning Open Language Models

Abstract

I will discuss two recent works related to the development of practical open-source language models. 1) Scaling: extrapolating scaling trends suggests that the training dataset size for LLMs may soon be limited by the amount of text data available on the internet. We run a set of empirical experiments varying the extent of data repetition and compute budget, and we propose an empirical scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. 2) Aligning: model distillation has proven to be a promising technique for improving small open-source language models, but the resulting models are unwieldy to use in conversational settings. We show a simple method for “aligning” language models to user intents through distilled direct preference alignment.
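As a reference for the two parts, here are the Chinchilla-style loss law that the data-constrained scaling law generalizes, and the direct preference optimization (DPO) objective that distilled preference alignment builds on; both are standard published formulations rather than results specific to this talk:

```latex
% (1) Scaling: the Chinchilla-style law, with N parameters and D training
% tokens; E, A, B, alpha, beta are fitted constants. The data-constrained
% law replaces N and D with "effective" counts that discount repeated
% tokens and excess parameters.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

% (2) Aligning: the DPO loss, where y_w / y_l are the preferred and
% dispreferred responses and \pi_{ref} is the reference (SFT) model.
\mathcal{L}_{\mathrm{DPO}}
= -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
\left[ \log \sigma\!\left(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```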

Speaker Bio

Alexander “Sasha” Rush is an Associate Professor at Cornell Tech and a researcher at Hugging Face. His interests are in language modeling, specifically controllable text generation, efficient inference, and robustness. He has written several popular open-source projects, including tools for NLP, puzzles for learning deep learning, and virtual conference software. His projects have received awards at NLP, visualization, and hardware conferences, as well as an NSF CAREER Award and a Sloan Fellowship. Recently he helped launch COLM, a new conference on language modeling.

Week 7: Reflex or Reflect: When Do Language Tasks Need Slow Reasoning?

Abstract

Large language models, such as GPT-3, excel at generating reflexive responses that mimic human-like language, but they fall short when it comes to complex reasoning that requires slower thinking, deeper reflection, and a nuanced interpretation of language. This talk will share two lines of effort in approaching this problem. In the first part, I will introduce RICA and RobustRL, two benchmarks that expose language models to logical robustness challenges in language inference. The second part presents our exploration of transferring chain-of-thought ability to smaller language models while enhancing the models' logical consistency. We show that a smaller, distilled LM can yield dramatically better task accuracy and rationale-prediction consistency.
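A minimal sketch of the chain-of-thought distillation recipe described above; `teacher_generate` is a hypothetical wrapper around a large LM, and the prompt format is an assumption, not the talk's exact pipeline:

```python
def build_distillation_example(question, teacher_generate):
    """Ask a large teacher LM for a step-by-step rationale plus answer,
    and turn it into a fine-tuning example for a small student LM. The
    student learns to emit the rationale *and* the answer, so accuracy
    and rationale-prediction consistency are trained jointly."""
    rationale_and_answer = teacher_generate(
        f"Q: {question}\nLet's think step by step."
    )
    return {"input": f"Q: {question}", "target": rationale_and_answer}
```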

Speaker Bio

Xiang Ren is an associate professor and Viterbi Early Career Chair in the USC Computer Science Department, a Research Team Leader at USC ISI, and the director of the Intelligence and Knowledge Discovery (INK) Lab at USC. Previously, he was a research scholar at Stanford University and received his Ph.D. in Computer Science from the University of Illinois Urbana-Champaign. Ren's research seeks to build generalizable natural language processing (NLP) systems that can handle a wide variety of language tasks and situations. He works on new algorithms and datasets to make NLP systems cheaper to develop and maintain, to arm machine models with common sense, and to improve models' transparency and reliability to build user trust. His work has received several best paper awards at top NLP and AI conferences. Ren has been awarded an NSF CAREER Award, multiple faculty research awards from Google, Facebook, Amazon, JP Morgan, and Sony, and the 2018 ACM SIGKDD Doctoral Dissertation Award.

Week 8: Possible Impossibilities and Impossible Possibilities

Abstract

In this talk, I will question if there can be possible impossibilities of large language models (i.e., the fundamental limits of transformers, if any) and the impossible possibilities of language models (i.e., seemingly impossible alternative paths beyond scale, if at all).

Speaker Bio

Yejin Choi is the Wissner-Slivka Professor and a MacArthur Fellow at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. She is also a senior director at AI2 overseeing the Mosaic project and a Distinguished Research Fellow at the Institute for Ethics in AI at the University of Oxford. Her research investigates whether (and how) AI systems can learn commonsense knowledge and reasoning, whether machines can (and should) learn moral reasoning, and various other problems in NLP, AI, and vision, including neuro-symbolic integration, language grounding with vision and interaction, and AI for social good. She is a co-recipient of two Test of Time Awards (at ACL 2021 and ICCV 2021), seven Best/Outstanding Paper Awards (at ACL 2023, NAACL 2022, ICML 2022, NeurIPS 2021, AAAI 2019, and ICCV 2013), the Borg Early Career Award (BECA) in 2018, the inaugural Alexa Prize Challenge in 2017, and IEEE AI's 10 to Watch in 2016.

Week 9: Something Old, Something New: Machine Learning at the End of History

Abstract

In this talk I’ll present two very disparate pieces of work. First, a set of comprehensive empirical results exploring the in-context learning capabilities of non-transformer architectures, including both classical architectures like RNNs and CNNs and recently proposed attention-free alternatives to the transformer. I’ll argue that these results point the way towards “infinite context” in-context learning, allowing for learning from massive data in the forward pass alone. Second, I’ll describe an application of deep learning to a long-standing historical problem: revealing the printers of secretly printed revolutionary pamphlets from the English Civil War by visually matching uniquely damaged pieces of movable type. Together, these studies let me venture a response to two questions: Are transformers the final model? Are chatbots the final challenge?
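As a rough sketch of the type-matching idea (my illustration, not the authors' pipeline): embed images of printed characters with a learned encoder, then match damaged type pieces across documents by nearest-neighbor search in embedding space.

```python
import numpy as np

def match_type(query_embs, corpus_embs, top_k=5):
    """query_embs: (q, d) and corpus_embs: (n, d) arrays of L2-normalized
    character-image embeddings, so the dot product is cosine similarity.
    Returns the indices of the top_k best-matching corpus images."""
    sims = query_embs @ corpus_embs.T
    return np.argsort(-sims, axis=1)[:, :top_k]
```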

Speaker Bio

Taylor Berg-Kirkpatrick is an Associate Professor in the Department of Computer Science and Engineering at UC San Diego. Previously, Taylor was an Assistant Professor in the Language Technologies Institute at Carnegie Mellon University. Before that, he was a Research Scientist at Semantic Machines after completing his Ph.D. in Computer Science at the University of California, Berkeley in Fall 2015. He is the recipient of an NSF CAREER award, a Google Faculty Research Award, and an Amazon Research Award. Taylor's lab focuses on using machine learning to understand structured human data, including language but also sources like music, document images, and other complex artifacts.

Week 10: Large Language Models for Complex Query Retrieval

Abstract

This talk discusses recent work on information retrieval for complex queries, addressing the limitations of keyword- or BERT-based search methods. We begin by identifying the challenges of retrieving relevant information in scenarios such as clinical trial selection or scientific research identification. The focus is on how to effectively integrate large language models into information retrieval systems in a scalable way. We examine several stages of the information retrieval pipeline: data annotation, data generation, and document reranking. The talk will discuss two strategies, information regularization and task decomposition, that improve the performance of language models for complex queries. I will discuss the implications of large language models for information retrieval, emphasizing their potential to transform how we query vast datasets.
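A hypothetical sketch of the task-decomposition strategy; the `llm` and `search_index` interfaces and the prompts are illustrative assumptions, not the talk's actual system:

```python
def retrieve_complex(query, llm, search_index, k=20):
    """Decompose a complex query into simpler sub-queries with an LLM,
    retrieve candidates for each sub-query, then rerank the union."""
    sub_queries = llm(
        f"Rewrite as simple sub-queries, one per line:\n{query}"
    ).splitlines()
    candidates = {doc for sq in sub_queries for doc in search_index.search(sq, k=k)}
    scored = [
        (float(llm(f"Rate 0-10 how well this answers '{query}':\n{doc}")), doc)
        for doc in candidates
    ]
    return [doc for _, doc in sorted(scored, reverse=True)]
```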

Speaker Bio

Leon Bergen is an Associate Professor in the Department of Linguistics at the University of California, San Diego, and Associate Director of the Laboratory for Emerging Intelligence. He was previously a research scientist at X and Google, and a postdoc at Stanford. He received his Ph.D. in Brain and Cognitive Sciences from MIT in 2015.