
HCII Ph.D. Thesis Defense: Alex Cabrera


Description

Behavior-Driven AI Development

Thursday, March 21, 2:00 - 5:00PM ET


Room: GHC 6501

Zoom Link: https://cmu.zoom.us/j/95515298320?pwd=eC9IcWtXNTlNS0E1aE9QWlZKQlY5UT09

Meeting ID: 95515298320

Passcode: FzPJ5D

Committee:

Adam Perer (HCII, CMU, Co-chair)

Jason I. Hong (HCII, CMU, Co-chair)

Kenneth Holstein (HCII, CMU)

Ameet Talwalkar (MLD, CMU)

Aditya Parameswaran (EECS, UC Berkeley)


Abstract:

Massive AI systems that can be guided with text prompts or fine-tuned with small datasets have enabled millions of people to prototype complex AI products. But going from an initial prototype to a robust, deployable model that is equitable, safe, and works for most users and edge cases remains a significant challenge. This challenge is compounded by the increasingly complex tasks for which AI is used, such as text and image generation, which lack clearly defined metrics and evaluation methods. As it becomes easier and faster to create candidate AI systems, the brunt of development work shifts from getting a system working to the design problem of deciding which AI system should be built and how it should behave.

This thesis proposes an AI development philosophy called behavior-driven AI development (BDAI) that centers the AI development lifecycle on the desired behaviors of complex AI systems. By focusing development on a model's desired behaviors rather than its training data and architecture, developers can concentrate on creating responsible AI systems that best fulfill end-user needs.

In the first half of the thesis, I present a series of interviews, a theoretical framework, and a user study that define the core principles of BDAI. First, I describe qualitative interview studies with 27 practitioners investigating how they understand and improve the behaviors of complex AI systems. Next, I introduce a theoretical framework that describes this process as a form of sensemaking and show how the framework can be used to describe and create AI development tools. Finally, I show how insights into model behavior can be surfaced to end users to improve human-AI collaboration by calibrating their reliance on model outputs.

In the second half of the thesis, I implement two systems that, combined, fulfill the requirements of the full sensemaking process and BDAI workflow. I first introduce Zeno, an interactive platform that lets practitioners discover and validate behaviors across any AI system. I then describe Zeno Reports, a no-code tool built on Zeno for authoring interactive evaluation reports. Through case studies and real-world deployment, I show how AI analysis tools covering the sensemaking process can empower practitioners to develop more performant and equitable AI systems.

Document Link: https://drive.google.com/file/d/1xQNMvOKERZzYfXg0FfSi1KLEJIx6Vso-/view?usp=drive_link