Gold Doesn't Always Glitter: Spectral Removal of Linear and Nonlinear Guarded Attribute Information
We describe a simple and effective method (Spectral Attribute removaL; SAL) to remove guarded information from neural representations. Our method uses singular value decomposition and eigenvalue decomposition to project the input representations onto directions that have reduced covariance with the guarded information, rather than onto the maximal-covariance directions for which these factorizations are normally used. We begin with linear information removal and then generalize our algorithm to nonlinear information removal through the use of kernels. Our experiments show that, after removing the guarded information, our algorithm retains better main-task performance than previous methods. In addition, our experiments show that only a relatively small amount of guarded attribute data is needed to remove information about these attributes, which reduces exposure to such potentially sensitive data and better suits low-resource scenarios.
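The linear case described above can be illustrated with a short NumPy sketch. This is one reading of the abstract, not the authors' implementation. It computes the cross-covariance between centered representations `X` and a guarded attribute `Z`, takes its SVD, and projects `X` onto the left singular directions that remain after dropping the `k` directions with the largest singular values. The function name `sal_linear`, the choice of `k`, and the synthetic data are hypothetical. The kernelized (nonlinear) variant is not shown.

```python
import numpy as np


def sal_linear(X, Z, k):
    """Hypothetical sketch of linear spectral attribute removal.

    X: (n, d_x) representations; Z: (n, d_z) guarded attribute (e.g. one-hot).
    Drops the k left singular directions of the cross-covariance that carry
    the most covariance with Z and returns the projection onto the rest.
    """
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    omega = Xc.T @ Zc / X.shape[0]            # (d_x, d_z) cross-covariance
    U, S, Vt = np.linalg.svd(omega, full_matrices=True)
    # Columns of U are ordered by descending singular value; the trailing
    # columns (beyond rank(omega) <= d_z) have zero covariance with Z.
    U_keep = U[:, k:]
    return Xc @ U_keep                         # (n, d_x - k)


# Usage on synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
n, d = 500, 10
labels = rng.integers(0, 2, size=n)
Z = labels[:, None].astype(float)              # binary guarded attribute, (n, 1)
X = rng.normal(size=(n, d)) + 2.0 * Z          # every feature leaks the attribute
X_clean = sal_linear(X, Z, k=1)

# Largest absolute cross-covariance left between X_clean and Z.
residual = np.abs(X_clean.T @ (Z - Z.mean(axis=0)) / n).max()
print(X_clean.shape, residual)
```

Choosing `k` at least equal to the rank of the cross-covariance (at most `d_z`) removes all *linear* covariance with the guarded attribute. As the abstract notes, nonlinearly encoded information calls for the kernel generalization.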