Options for encoding names for data linking at the Australian Bureau of Statistics

02/22/2018
by Chris Culnane, et al.

The ABS has publicly stated that it would use a cryptographic hash function to convert names collected in the 2016 Census of Population and Housing into an unrecognisable value in a way that is not reversible. In 2016, the ABS engaged the University of Melbourne to provide expert advice on cryptographic hash functions to meet this objective.

For complex unit-record level data, including Census data, auxiliary data can often be used to link individual records, even without names. This is the basis of the ABS's existing bronze linking. It also means that records can probably be re-identified without the encoded name anyway, so protection against re-identification depends on good processes within the ABS. The undertaking on the encoding of names should therefore be considered in the full context of auxiliary data and ABS processes.

There are several reasonable interpretations of that undertaking:

1. That the encoding cannot be reversed except with a secret key held by the ABS. This is the property achieved by encryption (Option 1), if properly implemented.
2. That the encoding, taken alone without auxiliary data, cannot be reversed to a single value. This is the property achieved by lossy encoding (Option 2), if properly implemented.
3. That the encoding doesn't make re-identification easier, or increase the number of records that can be re-identified, except with a secret key held by the ABS. This is the property achieved by HMAC-based linkage key derivation using subsets of attributes (Option 3), if properly implemented.

We explain and compare the privacy and accuracy guarantees of five possible approaches. Options 4 and 5 investigate more sophisticated options for future data linking. We also explain how some commonly advocated techniques can be reversed, and hence should not be used.
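To make the idea behind Option 3 concrete, the following is a minimal Python sketch of HMAC-based linkage key derivation over subsets of attributes. It is an illustration only, not the scheme specified in the report: the field names, the attribute subsets in `SUBSETS`, the normalisation rules, and the choice of HMAC-SHA256 are all assumptions made for this example.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical attribute subsets; the abstract does not list the actual ones.
SUBSETS = (
    ("first_name", "surname", "date_of_birth"),
    ("surname", "date_of_birth", "postcode"),
)


def normalise(value: str) -> str:
    """Canonicalise a field so trivial formatting differences do not change the key."""
    text = unicodedata.normalize("NFKD", value)
    # Case-fold and collapse runs of whitespace (assumed normalisation rules).
    return " ".join(text.casefold().split())


def linkage_key(secret_key: bytes, record: dict, attributes: tuple) -> str:
    """Return the HMAC-SHA256 hex digest over the named attributes of one record."""
    parts = []
    for name in attributes:
        field = normalise(record[name]).encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") give different messages.
        parts.append(name.encode("utf-8") + b"=" + len(field).to_bytes(4, "big") + field)
    message = b"|".join(parts)
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def linkage_keys(secret_key: bytes, record: dict, subsets=SUBSETS) -> dict:
    """Derive one linkage key per attribute subset for a single record."""
    return {subset: linkage_key(secret_key, record, subset) for subset in subsets}
```

Two records that agree on every attribute in a subset (after normalisation) receive the same key for that subset and can be linked on it. Without the secret key, the digests cannot be recomputed from guessed names, which is the sense in which the encoding is reversible only by the key holder. As the abstract stresses, this holds only if the key is managed properly.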
