Thesis Defense - Sarah Cen - Paths to AI Accountability

#talk #ai #society
Sarah H. Cen
Full title: Paths to AI Accountability: Making AI Fit for Humans Through Design, Incentives, and Evidence

Abstract: Artificial Intelligence (AI) has gained increasing sociopolitical significance over the past decade. In response, there are efforts devoted to understanding the implications of AI's progress and developing AI in a way that is "responsible," "ethical," and "safe." Within this broader context, this thesis studies how we can better integrate AI into a fundamentally human society. We focus on three particular avenues for making AI "fit" for humans. First, we examine ways to design responsible AI from the ground-up through a work on algorithmic fairness. Second, we explore the role of incentives in better aligning humans and algorithms through a game-theoretic model of trustworthy AI. Finally, we discuss the power of evidence in AI accountability through the lens of algorithmic auditing.

My notes

Just the talk

AI has become part of many decision-making processes, both big and small:

The actual "explicit decisions" seem scarier, but the individually small "micro-decisions" such as social media feeds and recommendation systems are so prevalent as to perhaps have an even bigger impact.

Cen outlines an approach to designing, measuring, and regulating these micro-decisions.

Design

  1. What are the effects of user strategization on recommendation systems?
  2. What makes a recommendation system trustworthy?

Cen considers a model in which the platform learns about users from their behavior and makes recommendations via a public algorithm, which takes a model of the user and determines what to recommend. Users, knowing this algorithm, choose the behavior that maximizes their long-run value (not just what they like best in the current recommendation, but also taking into account how their actions affect future recommendations).
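To make the setup concrete for myself, here is a minimal sketch, entirely my own toy construction rather than the model from the thesis. The platform's (public) rule, show each item once and then exploit the highest observed click rate, and the utility numbers are made up; the "strategic" user best-responds by brute-force search over all click sequences.

```python
# Toy sketch (my own construction, not the thesis's model): a public
# recommendation rule plus a user who picks a whole click sequence to
# maximize long-run value rather than clicking myopically.
from itertools import product

VALUES = {"clickbait": 0.3, "longform": 0.9}   # hypothetical true utilities
ITEMS = list(VALUES)
HORIZON = 6

def simulate(clicks):
    """Run the platform's (public) rule against a fixed click sequence.

    Warm-up: each item is shown once. After that the platform always shows
    the item with the higher observed click rate (ties go to the first item).
    Returns the user's total true value over the horizon.
    """
    shown, clicked = {i: 0 for i in ITEMS}, {i: 0 for i in ITEMS}
    value = 0.0
    for t in range(HORIZON):
        unseen = [i for i in ITEMS if shown[i] == 0]
        if unseen:
            rec = unseen[0]
        else:
            rec = max(ITEMS, key=lambda i: (clicked[i] / shown[i], i == ITEMS[0]))
        shown[rec] += 1
        if clicks[t]:
            clicked[rec] += 1
            value += VALUES[rec]
    return value

# Myopic user: click everything with positive value (i.e. always).
myopic_value = simulate([1] * HORIZON)

# Strategic user: best-respond to the public rule by searching over all
# click sequences and keeping the one with the highest long-run value.
strategic_value = max(simulate(c) for c in product([0, 1], repeat=HORIZON))

print(f"myopic long-run value:    {myopic_value:.1f}")
print(f"strategic long-run value: {strategic_value:.1f}")
```

In this toy, the strategic user skips clicking the mildly liked item during warm-up and ends up with a long-run value of 4.5 versus 2.4 for the myopic clicker. Note also that the platform's learned click rate for that item ends up at zero even though the user actually values it at 0.3, which is the flavor of the first finding below.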

She finds that:

  1. Strategization can make it so that platforms learn very little about users' true utility functions; e.g., they cannot determine how valuable other algorithms would be, even ones similar to the deployed algorithm.
  2. Having a richer model class can hurt performance. This makes sense, as it gives users more power to steer what the platform thinks of them.

She defines a notion of trustworthiness based on best response: an algorithm is trustworthy if it incentivizes users to naively/locally best respond, and if the value to the user is high. (This is similar to [2310.17651] High-Dimensional Prediction for Sequential Decision Making)
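Read literally, that definition is easy to turn into a check. The following is my own rough reading of it, not notation from the thesis; the `slack` and `floor` tolerances are placeholders I made up.

```python
def is_trustworthy(value_of, naive_behavior, all_behaviors, slack=0.05, floor=2.0):
    """Rough reading of the definition above (my own, not the thesis's):
    the naive / locally best-responding behavior should be (near-)optimal
    for the user, and the value it delivers should be high."""
    best = max(value_of(b) for b in all_behaviors)   # best strategic value
    naive = value_of(naive_behavior)                 # value of behaving naively
    no_gain_from_strategizing = naive >= best - slack
    value_is_high = naive >= floor
    return no_gain_from_strategizing and value_is_high
```

Plugging in the toy platform from the earlier sketch (`value_of=simulate`, `naive_behavior=[1] * HORIZON`, `all_behaviors=product([0, 1], repeat=HORIZON)`) this comes out False: that platform is not trustworthy in this sense, since the user gains 2.1 in long-run value by strategizing.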

related:: machine learning with strategic interactions

Measurement

  1. How do you audit an algorithmic filtering system to see if it lives up to standards set by regulation?

Cen identifies one particular class of regulations: individuals determined to be "similar" must receive similar recommendations. Here, the similarity of recommendations is measured by how much they cause someone to update their beliefs about the world.
This bias (the gap in induced belief updates between similar individuals) can be estimated, or at least upper-bounded, by sampling similar individuals (these can be synthetically generated), sampling recommendations for each of them, and seeing how those recommendations affect the "most gullible user" (i.e., assuming this user has the most gullible type of belief-update function).
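Here is a Monte Carlo sketch of that procedure as I understood it. Everything concrete (users as a single feature in [0, 1], the toy recommender, the "fully gullible" update that adopts the recommendation outright, the similarity threshold) is a stand-in I made up to show the sampling structure, not the estimator from the thesis.

```python
import random

def toy_platform(user):
    """Stand-in recommender: a noisy signal centered on the user's feature."""
    return min(max(random.gauss(user, 0.05), 0.0), 1.0)

def gullible_update(prior, rec):
    """Worst-case ("most gullible") belief update: adopt the recommendation outright."""
    return rec

def sample_similar_pair():
    """Two synthetic users deemed similar (features differ by at most 0.01)."""
    x = random.random()
    return x, min(x + 0.01, 1.0)

def mean_belief_shift(platform, user, n_recs, prior=0.5):
    """Average belief movement the platform's recommendations induce in the gullible user."""
    recs = [platform(user) for _ in range(n_recs)]
    return sum(abs(gullible_update(prior, r) - prior) for r in recs) / n_recs

def audit(platform, n_pairs=500, n_recs=50):
    """Estimate, by sampling, the worst gap in induced belief shifts across similar individuals."""
    worst_gap = 0.0
    for _ in range(n_pairs):
        a, b = sample_similar_pair()
        gap = abs(mean_belief_shift(platform, a, n_recs)
                  - mean_belief_shift(platform, b, n_recs))
        worst_gap = max(worst_gap, gap)
    return worst_gap

print(f"estimated worst-case gap between similar users: {audit(toy_platform):.3f}")
```

Because the belief-update function used here is the most gullible one, the reported gap should upper-bound the gap any less gullible user would experience.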

Regulation

Didn't have time to go deep on this. But she talked about how data-driven rules are always only probabilistic: there will be exceptions, so there's a need to provide people with a way to "be an exception," whatever that means.

My thoughts

I think this question of how platform algorithms affect society at large is huge. Optimizing purely for short-term engagement is clearly, and has proven to be, dangerous. Just as it's probably not the best idea for me to always cave to the "instant gratification monkey," it's probably not the best idea to hand society over to the "instant profit monkey." Instead, a healthy society requires planning ahead, which means somehow propagating the long-term incentives of a "healthy society" down to the constituents of said society.

Note

Of course, even with the incentives correctly propagated, there's still the question of how to solve the problem.

But I think at that point, it just becomes a learning problem. A harder learning problem than just maximizing clicks, but still a well-defined one, akin to the kind of problem tech companies have shown themselves very capable of solving. That is, how can a platform learn the long-term reward function of its users?

But there's still a definite possibility that this local reward optimization isn't sufficient. Maybe optimizing for individual reward can still leave us with an unstable/non-resilient society.

Example