Langsrud, Ø. (2025)
Model Formulas for Table Specification: Adapting a Proven Regression Syntax to SDC Software,
Poster at the Expert Meeting on Statistical Data Confidentiality, 15-17 October 2025, Barcelona, Spain.


View the poster: https://langsrud.com/stat/A0_poster_Barcelona_Oyvind_Langsrud_2025.pdf


See the R code in the poster.

See also the R packages on CRAN: GaussSuppression, SmallCountRounding, SSBtools.


ABSTRACT

Within official statistics, multiple tables are often produced from the same microdata, and a hierarchical structure is commonly involved. When performing tabular cell suppression, all relevant tables should be handled simultaneously. The same holds true for certain perturbation methods, where simultaneous handling helps ensure consistency and additivity.

Through the R package SSBtools, a table specification interface has been introduced that uses the formula syntax widely applied in regression modeling. This approach supports the simultaneous handling of tables. Rather than requiring explicit hierarchy specifications, this interface assumes that all categorical codes appear directly in the microdata variables. For example, one variable might contain municipality codes, and another might contain county codes. From traditional hierarchy specifications, it is possible to derive the necessary additional variables that can be appended to the microdata. Special cases, such as a municipality that does not belong to any county, are handled with missing values in the relevant variable(s).
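The idea in the preceding paragraph can be illustrated with a minimal sketch in plain Python. This is not the SSBtools interface. The data, the mapping, and the helper names `append_county` and `aggregate` are hypothetical and exist only for illustration. The sketch derives a county variable from a traditional municipality-to-county mapping, appends it to the microdata, and then sums counts for each term of a specification loosely analogous to a formula such as ~ county + municipality. It assumes that a row with a missing value in a term's variable is simply left out of that term's cells.

```python
from collections import defaultdict

# Hypothetical microdata: a municipality code and a frequency per row.
microdata = [
    {"municipality": "M1", "freq": 4},
    {"municipality": "M1", "freq": 1},
    {"municipality": "M2", "freq": 2},
    {"municipality": "M3", "freq": 5},
    {"municipality": "M9", "freq": 3},  # belongs to no county
]

# Traditional hierarchy specification (hypothetical): municipality -> county.
municipality_to_county = {"M1": "C1", "M2": "C1", "M3": "C2"}


def append_county(rows, mapping):
    """Return copies of the rows with a derived 'county' variable.

    Municipalities without a county get None (a missing value).
    """
    return [dict(row, county=mapping.get(row["municipality"])) for row in rows]


def aggregate(rows, terms, freq_var="freq"):
    """Sum freq_var over every cell of every term.

    Each term is a tuple of variable names; the empty tuple is the grand total.
    Assumption: a row with a missing value in any variable of a term
    does not contribute to that term.
    """
    cells = defaultdict(int)
    for term in terms:
        for row in rows:
            values = tuple(row[name] for name in term)
            if any(value is None for value in values):
                continue
            cells[(term, values)] += row[freq_var]
    return dict(cells)


data = append_county(microdata, municipality_to_county)

# Loosely analogous to the formula ~ county + municipality, including a total.
terms = [(), ("county",), ("municipality",)]
cells = aggregate(data, terms)

for (term, values), total in sorted(cells.items()):
    print(term, values, total)
```

In this sketch, municipality M9 contributes to the grand total and to its own municipality cell but to no county cell, which mirrors the special case described above. All cells come from one pass over the same microdata, so the tables defined by the terms are produced together rather than one at a time.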

This formula-based interface has proved very useful in modern statistical production workflows, whether relying solely on R or combining R with Python. In many cases, statistics producers can handle SDC by directly using the formula-based interface in the R packages GaussSuppression and SmallCountRounding. Both packages depend on SSBtools. In other cases, SDC experts can develop specialized functions for statistics producers, incorporating functionality from those packages while still providing a formula-based interface. Both GaussSuppression and SmallCountRounding also include a traditional hierarchy-based interface, and it is even possible to combine hierarchy specifications with the formula syntax. However, the purely formula-based approach has demonstrated significant practical benefits. Accordingly, this poster focuses on the use of the formula interface in GaussSuppression and SmallCountRounding.


© 2025 Øyvind Langsrud / Statistics Norway. This work is licensed under a Creative Commons Attribution (CC BY) license. You are free to share and adapt this material, provided appropriate credit is given.

