Options for encoding names for data linking at the Australian Bureau of Statistics

02/22/2018
by   Chris Culnane, et al.
0

Publicly, ABS has said it would use a cryptographic hash function to convert names collected in the 2016 Census of Population and Housing into an unrecognisable value in a way that is not reversible. In 2016, the ABS engaged the University of Melbourne to provide expert advice on cryptographic hash functions to meet this objective. For complex unit-record level data, including Census data, auxiliary data can be often be used to link individual records, even without names. This is the basis of ABS's existing bronze linking. This means that records can probably be re-identified without the encoded name anyway. Protection against re-identification depends on good processes within ABS. The undertaking on the encoding of names should therefore be considered in the full context of auxiliary data and ABS processes. There are several reasonable interpretations: 1. That the encoding cannot be reversed except with a secret key held by ABS. This is the property achieved by encryption (Option 1), if properly implemented; 2. That the encoding, taken alone without auxiliary data, cannot be reversed to a single value. This is the property achieved by lossy encoding (Option 2), if properly implemented; 3. That the encoding doesn't make re-identification easier, or increase the number of records that can be re-identified, except with a secret key held by ABS. This is the property achieved by HMAC-based linkage key derivation using subsets of attributes (Option 3), if properly implemented. We explain and compare the privacy and accuracy guarantees of five possible approaches. Options 4 and 5 investigate more sophisticated options for future data linking. We also explain how some commonly-advocated techniques can be reversed, and hence should not be used.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset