Josep Domingo-Ferrer, Antoni Martínez-Ballesté, Josep Maria Mateo-Sanz, Francesc Sebé
The VLDB Journal 15 (4), 355-369
- Description: Microaggregation is a family of methods for statistical disclosure control (SDC) of microdata (records on individuals and/or companies), that is, for masking microdata so that they can be released while preserving the privacy of the underlying individuals. The principle of microaggregation is to aggregate original database records into small groups prior to publication. Each group should contain at least k records to prevent disclosure of individual information, where k is a constant value preset by the data protector. Recently, microaggregation has been shown to be useful for achieving k-anonymity, in addition to being a good masking method. Optimal microaggregation (with minimum within-groups variability loss) can be computed in polynomial time for univariate data. Unfortunately, for multivariate data it is an NP-hard problem. Several heuristic approaches to microaggregation have been proposed in the literature. Heuristics yielding groups with fixed size k tend to be more efficient, whereas data-oriented heuristics yielding variable group size tend to result in lower information loss. This paper presents new data-oriented heuristics which improve on the trade-off between computational complexity and information loss and are thus usable for large datasets.
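
The grouping principle described above can be illustrated with a minimal sketch. It is not the data-oriented heuristic the paper proposes; it is a simple fixed-size scheme for univariate data, assuming that records are sorted, cut into consecutive groups of k, and that any leftover records join the last group. The function names `microaggregate` and `sse` are hypothetical and chosen only for this illustration.

```python
from statistics import mean


def microaggregate(values, k):
    """Fixed-size univariate microaggregation (illustrative sketch only).

    Sorts the records, forms consecutive groups of k records (the last
    group absorbs the remainder, so it holds between k and 2k-1 records),
    and replaces every value by the mean of its group.
    Returns (masked_values, groups), where groups holds record indices.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")

    # Indices of the records in ascending order of their value.
    order = sorted(range(n), key=lambda i: values[i])

    n_groups = n // k
    masked = [0.0] * n
    groups = []
    for g in range(n_groups):
        start = g * k
        # The last group takes all remaining records.
        end = (g + 1) * k if g < n_groups - 1 else n
        idx = order[start:end]
        centroid = mean(values[i] for i in idx)
        for i in idx:
            masked[i] = centroid
        groups.append(idx)
    return masked, groups


def sse(values, masked):
    """Within-groups sum of squared errors, used here as information loss."""
    return sum((v - m) ** 2 for v, m in zip(values, masked))


if __name__ == "__main__":
    original = [1, 2, 3, 10, 11, 12, 13]
    masked, groups = microaggregate(original, k=3)
    print(masked)
    print(sse(original, masked))
```

With k = 3 and the seven values above, the sketch forms one group of three records and one of four, so every released value is shared by at least k records. A data-oriented heuristic would instead let group sizes vary with the data in order to lower the within-groups error, at a higher computational cost.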