Privacy-Preserving Boosting with Random Linear Classifiers for Learning from User-Generated Data
User-generated data is crucial to predictive modeling in many applications. Through web, mobile, or wearable interfaces, an online service provider (SP) can continuously record user-generated data and rely on predictive models learned from that data to improve its services and revenue. The fact that SPs hold such large collections of user-generated data has raised privacy concerns. We present a privacy-preserving framework, SecureBoost, which allows users to submit encrypted or randomly masked data to the SP, which learns only the prediction models and nothing else. Our framework uses random linear classifiers (RLCs) as the base classifiers in the boosting framework to simplify the design of the privacy-preserving protocols. A Cryptographic Service Provider (CSP) assists the SP's processing, which reduces the complexity of the protocol constructions while limiting the information leaked to the CSP. We present two constructions of SecureBoost, HE+GC and SecSh+GC, which use combinations of homomorphic encryption, garbled circuits, and random masking to achieve both security and efficiency. We have conducted extensive experiments to understand the quality of RLC-based boosting and the cost distribution of the constructions. The results show that SecureBoost efficiently learns high-quality boosting models from protected user-generated data.
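To make the idea of boosting over random linear classifiers concrete, here is a minimal plaintext sketch in Python. It is not the SecureBoost protocol: it involves no encryption, masking, or CSP, and it assumes a standard AdaBoost-style weight update over a fixed pool of randomly drawn hyperplanes. The function names (`make_rlc_pool`, `boost_rlc`, `predict`) and the parameters (`pool_size`, `n_rounds`, the sampling distributions) are hypothetical choices made for illustration only.

```python
import numpy as np

def make_rlc_pool(rng, n_features, pool_size):
    """Draw random hyperplanes (w, b); none of them is fitted to the data."""
    W = rng.normal(size=(pool_size, n_features))   # assumed: Gaussian directions
    b = rng.uniform(-1.0, 1.0, size=pool_size)     # assumed: uniform offsets
    return W, b

def rlc_outputs(W, b, X):
    """Return a (pool_size, n_samples) matrix of +1/-1 predictions."""
    return np.where(X @ W.T + b >= 0, 1, -1).T

def boost_rlc(X, y, n_rounds=30, pool_size=200, seed=0):
    """AdaBoost-style boosting that picks base classifiers from a random pool.

    y must contain labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    W, b = make_rlc_pool(rng, X.shape[1], pool_size)
    preds = rlc_outputs(W, b, X)
    sample_w = np.full(len(y), 1.0 / len(y))
    coefs = np.zeros(pool_size)                    # signed weight per pool member
    for _ in range(n_rounds):
        errs = (preds != y) @ sample_w             # weighted error of each candidate
        flip = np.where(errs > 0.5, -1, 1)         # a bad classifier is useful when negated
        errs = np.minimum(errs, 1.0 - errs)
        k = int(np.argmin(errs))
        err = np.clip(errs[k], 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        h = flip[k] * preds[k]
        sample_w *= np.exp(-alpha * y * h)         # up-weight misclassified samples
        sample_w /= sample_w.sum()
        coefs[k] += alpha * flip[k]
    return W, b, coefs

def predict(model, X):
    """Sign of the weighted vote of the selected random linear classifiers."""
    W, b, coefs = model
    return np.where(coefs @ rlc_outputs(W, b, X) >= 0, 1, -1)
```

Because the base classifiers are drawn at random rather than trained, each boosting round in this sketch reduces to evaluating linear functions and comparing weighted error counts, which is the kind of simplification the abstract attributes to using RLCs.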