95-763 Privacy and Confidentiality: Models and Implementations
- 6 units
- Skills: Some enthusiasm for math and statistics is welcome.
Information organizations, such as national statistical agencies, research data repositories and medical records clearinghouses, collect data on individuals and firms. Often, at least a portion of this information is considered private or confidential. It is collected with promises that it will be used only for statistical purposes and that, if it is disseminated, the privacy of the respondents will be protected. This course looks at practical methods for disclosure limitation for microdata (raw data about individuals or firms) and tables (aggregations of microdata). Methods considered include data swapping, data shuffling, noise addition and synthetic data, among others. We also look at some analogues of these methods for geographical data. Finally, privacy-preserving data mining is briefly surveyed.
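To give a flavor of two of the methods named above, here is a minimal sketch in Python of noise addition and a simple form of data swapping applied to a tiny microdata set. The records, the field names (`id`, `region`, `income`), the helper names `add_noise` and `swap_attribute`, and the noise scale are all hypothetical and chosen only for illustration. Methods used in practice are typically more careful, for example swapping only between similar records.

```python
import random


def add_noise(values, scale, rng):
    """Noise addition: perturb each value with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, scale) for v in values]


def swap_attribute(records, key, rng):
    """Simplified data swapping: randomly permute one attribute across records.

    The set of values for `key` is unchanged, so its total and its
    distribution are preserved, but the link between a value and the
    record it came from is broken.
    """
    values = [r[key] for r in records]
    rng.shuffle(values)
    return [{**r, key: v} for r, v in zip(records, values)]


# Hypothetical microdata: one record per respondent.
microdata = [
    {"id": 1, "region": "A", "income": 41000},
    {"id": 2, "region": "A", "income": 58000},
    {"id": 3, "region": "B", "income": 36500},
    {"id": 4, "region": "B", "income": 72000},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed noise scale of 1000; a real release would choose this deliberately.
noisy_incomes = add_noise([r["income"] for r in microdata], 1000.0, rng)
swapped = swap_attribute(microdata, "income", rng)

# A table is an aggregation of microdata; the overall income total is
# unaffected by swapping, while per-region totals may change.
total_before = sum(r["income"] for r in microdata)
total_after = sum(r["income"] for r in swapped)
```

The sketch shows the basic trade-off: swapping keeps the overall total exact while distorting relationships between attributes, and noise addition changes individual values while leaving them approximately right on average.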