Staged Processing #1

@adewes

Motivation

Many privacy-enhancing transformations require multiple stages. Generalizing attributes, for example, requires us to first define a generalization hierarchy and then, in a second step, apply that hierarchy to the data items. Kodex therefore needs to be able to process items in stages.

Examples:

  • Generalization hierarchy:

    • Stage 1:
      • Analyze value distribution in items.
    • Stage 2:
      • Generalize items using the hierarchy derived from that distribution.
  • k-Anonymity:

    • Stage 1:
      • Analyze attribute frequencies.
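
To make the two-stage pattern concrete, here is a minimal sketch of the generalization example. It assumes a numeric attribute and equal-width intervals as a one-level hierarchy; the issue does not specify how the hierarchy is derived, and the function names `analyze` and `generalize` are hypothetical. Python is used only for brevity.

```python
from typing import List, Tuple


def analyze(values: List[int], bins: int) -> Tuple[int, int]:
    """Stage 1: look at all values and derive the interval layout.

    Returns the lowest value and the interval width (assumed equal-width).
    """
    lo, hi = min(values), max(values)
    # Ceiling division so that `bins` intervals cover the whole range.
    width = max(1, -(-(hi - lo + 1) // bins))
    return lo, width


def generalize(value: int, lo: int, width: int) -> str:
    """Stage 2: replace a single value with the interval that contains it."""
    start = lo + ((value - lo) // width) * width
    return f"{start}-{start + width - 1}"


ages = [21, 25, 34, 38, 47]
lo, width = analyze(ages, bins=3)                    # needs every item first
generalized = [generalize(a, lo, width) for a in ages]
print(generalized)
```

Stage 2 cannot start until stage 1 has seen every item, which is why a purely item-by-item stream is not sufficient here.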

Implementation Proposal

To enable such staged processing, we plan to make the following additions to the Kodex stream processing mechanisms:

  • Add a numerical stage attribute to the Config model.
  • Add a Batch model that stores information about the processing of a given stage for a number of items.
  • Add an internal buffering mechanism (using internal channels) that enables us to buffer items for multi-stage processing.
  • Make the group store functionality currently implemented in the anonymization/aggregation action available to all actions as a means to perform distributed, parallel computation on data items.
  • Change the scheduler to enable staged processing of data items.
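
The additions above could fit together roughly as in the following single-process sketch. It is an illustration under stated assumptions, not the Kodex implementation: the field names on `Config` and `Batch`, the `action` signature, the `run_staged` function, and the plain `dict` standing in for the group store are all hypothetical, and a `queue.Queue` takes the place of the internal channels.

```python
from collections import Counter
from dataclasses import dataclass
from queue import Queue
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]
Action = Callable[[List[Item], dict], List[Item]]


@dataclass
class Config:
    name: str
    stage: int        # numerical stage attribute
    action: Action    # hypothetical: receives items and shared state


@dataclass
class Batch:
    stage: int        # stage that was processed
    config: str       # config that processed it
    item_count: int   # number of items handled in this batch


def drain(buffer: Queue) -> List[Item]:
    """Remove and return everything currently held in the buffer."""
    items = []
    while not buffer.empty():
        items.append(buffer.get())
    return items


def run_staged(configs: List[Config], items: List[Item]) -> Tuple[List[Item], List[Batch]]:
    buffer: Queue = Queue()          # stand-in for the internal channels
    for item in items:
        buffer.put(item)
    state: dict = {}                 # stand-in for the shared group store
    batches: List[Batch] = []
    # The scheduler runs configs in ascending stage order.
    for config in sorted(configs, key=lambda c: c.stage):
        pending = drain(buffer)      # a stage sees all buffered items
        processed = config.action(pending, state)
        batches.append(Batch(config.stage, config.name, len(processed)))
        for item in processed:       # re-buffer the items for the next stage
            buffer.put(item)
    return drain(buffer), batches


def count_cities(items: List[Item], state: dict) -> List[Item]:
    """Stage 1: record value frequencies, pass items through unchanged."""
    state["counts"] = Counter(item["city"] for item in items)
    return items


def suppress_rare(items: List[Item], state: dict) -> List[Item]:
    """Stage 2: replace values that occur fewer than two times."""
    counts = state["counts"]
    return [
        {**item, "city": item["city"] if counts[item["city"]] >= 2 else "*"}
        for item in items
    ]


configs = [Config("suppress", 2, suppress_rare), Config("count", 1, count_cities)]
data = [{"city": "Berlin"}, {"city": "Bonn"}, {"city": "Berlin"}]
result, batches = run_staged(configs, data)
print(result)
print(batches)
```

The configs are deliberately passed out of order to show that the `stage` value, not the list position, decides when each one runs. A distributed, parallel version would need the shared state and the buffer to be safe for concurrent access, which this sketch does not attempt.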

Metadata

Labels: enhancement (New feature or request)
