Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

07/27/2022
by Tilman Räuker, et al.

The last decade of machine learning has seen drastic increases in scale and capabilities, and deep neural networks (DNNs) are increasingly being deployed across a wide range of domains. However, the inner workings of DNNs are generally difficult to understand, raising concerns about the safety of using these systems without a rigorous understanding of how they function. In this survey, we review literature on techniques for interpreting the inner components of DNNs, which we call "inner" interpretability methods. Specifically, we review methods for interpreting weights, neurons, subnetworks, and latent representations with a focus on how these techniques relate to the goal of designing safer, more trustworthy AI systems. We also highlight connections between interpretability and work in modularity, adversarial robustness, continual learning, network compression, and studying the human visual system. Finally, we discuss key challenges and argue for future work in interpretability for AI safety that focuses on diagnostics, benchmarking, and robustness.
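One of the latent-representation techniques the survey covers can be illustrated with a linear probe: train a simple linear classifier on a frozen network's hidden activations to test whether a concept is linearly decodable from them. The sketch below is a minimal, self-contained illustration; the activations, the planted concept, and the layer are all hypothetical stand-ins, not data or code from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for hidden activations from a frozen DNN layer:
# 200 samples, 16 units. We plant a concept along unit 3 so
# the probe has a signal to find.
labels = rng.integers(0, 2, size=200)      # binary concept labels
acts = rng.normal(size=(200, 16))
acts[:, 3] += 2.0 * labels                 # concept direction at unit 3

def train_linear_probe(x, y, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid predictions
        w -= lr * (x.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

w, b = train_linear_probe(acts, labels)
preds = (acts @ w + b) > 0
accuracy = np.mean(preds == labels)
# High probe accuracy suggests the concept is linearly represented;
# the largest-magnitude weight points at the most informative unit.
top_unit = int(np.argmax(np.abs(w)))
```

A caveat the interpretability literature itself raises: a successful probe shows the information is present and linearly accessible, not that the network actually uses it.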

Related research

- 09/12/2019 · New Perspective of Interpretability of Deep Neural Networks
  Deep neural networks (DNNs) are known as black-box models. In other word...
- 01/27/2023 · Neural Additive Models for Location Scale and Shape: A Framework for Interpretable Neural Regression Beyond the Mean
  Deep neural networks (DNNs) have proven to be highly effective in a vari...
- 07/11/2023 · Scale Alone Does not Improve Mechanistic Interpretability in Vision Models
  In light of the recent widespread adoption of AI systems, understanding ...
- 01/22/2021 · i-Algebra: Towards Interactive Interpretability of Deep Neural Networks
  Providing explanations for deep neural networks (DNNs) is essential for ...
- 10/09/2022 · A Detailed Study of Interpretability of Deep Neural Network based Top Taggers
  Recent developments in explainable AI (xAI) methods allow...
- 06/18/2021 · Towards interpreting computer vision based on transformation invariant optimization
  Interpreting how deep neural networks (DNNs) make predictions is a ...
- 08/30/2022 · Correct-by-Construction Runtime Enforcement in AI – A Survey
  Runtime enforcement refers to the theories, techniques, and tools for en...
