Six ways CIOs can stop process mining becoming a privacy issue

Process mining technologies help businesses uncover how their processes actually behave, unblock bottlenecks, and identify areas for process optimisation.

On the surface, for a business embarking on its digital transformation journey, there doesn’t seem to be any particular worry for CIOs – only benefits. Dig deeper, however, and it’s clear that to truly reap the rewards of your investment, you must address privacy concerns and protect people’s data privacy rights.

One example is event logs, the first key step in process mining. Event logs store information about each activity: in a logistics setting, for example, this could be the person who initiates the activity (such as logging an order), the timestamp when the order was logged, or data like the size of an order or its destination. Businesses use event logs to improve their processes based on tangible data insights, rather than guesses and assumptions. However, the nature of the data means that event logs inevitably contain information that can be used to identify individuals.

It’s the deidentification of this personally identifiable information that can help CIOs safeguard privacy rights. The two common methods of deidentification are anonymisation and pseudonymisation. Anonymisation provides the most stringent mechanism, permanently removing any direct identifiers of personal information – but the drawback is that it can limit the later use of process mining results. Pseudonymisation means that personal data is processed in such a way that it can no longer be attributed to data subjects without the use of additional information. However, it may still be possible to reidentify personal information through security attacks, or by an adversary familiar with the data set.
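To make the difference between the two methods concrete, here is a minimal Python sketch. It assumes a toy event log of dictionaries with a hypothetical `resource` field holding the direct identifier, and it uses a keyed hash (HMAC-SHA256) as one possible way to pseudonymise. The field names, the key, and the sample events are illustrative assumptions, not part of any particular process mining product.

```python
import hashlib
import hmac

# Hypothetical secret: this is the "additional information" that must be
# stored separately from the event log for pseudonymisation to hold.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-log"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a direct identifier using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


def anonymise(event: dict) -> dict:
    """Drop the direct identifier entirely; events can no longer be tied to a person."""
    return {field: value for field, value in event.items() if field != "resource"}


# Illustrative event log (all values are made up).
event_log = [
    {"case_id": "order-1", "activity": "Log order", "resource": "alice@example.com"},
    {"case_id": "order-1", "activity": "Ship order", "resource": "bob@example.com"},
    {"case_id": "order-2", "activity": "Log order", "resource": "alice@example.com"},
]

# Pseudonymised: the same person keeps the same pseudonym across events.
pseudonymised_log = [
    {**event, "resource": pseudonymise(event["resource"])} for event in event_log
]

# Anonymised: the identifier is removed, so per-person analysis is lost.
anonymised_log = [anonymise(event) for event in event_log]
```

In this sketch the pseudonymised log still supports analysis per person (who handles which activities), which is exactly why it carries more residual risk than the anonymised log, where that analysis is no longer possible.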

One thing’s for sure: CIOs can combat the threat. There are six key ways to ensure deidentification methods are successful while maintaining the integrity of process mining – and reaping its rewards:

Risk of re-identification

First, evaluate the risk of re-identification associated with analysis of the event data. After all, it’s better to be safe than sorry.

Clearly, if the event data contains personally identifiable or sensitive personal information, it must be deidentified, for example by substituting direct identifiers with replacement values. However, there may still be a possibility of reidentification based on combining event log attributes with other available data sources.
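The kind of linkage this warns about can be sketched in a few lines of Python. Everything here is hypothetical: the event log, the external staff directory, the field names and the `link_records` helper exist only to illustrate how indirect attributes left in a log might be matched against another data source.

```python
def link_records(event_log, external_data, shared_attributes):
    """Map each pseudonym to the external names that match on the shared attributes."""
    matches = {}
    for event in event_log:
        key = tuple(event[attr] for attr in shared_attributes)
        candidates = [
            person["name"]
            for person in external_data
            if tuple(person[attr] for attr in shared_attributes) == key
        ]
        matches[event["resource"]] = candidates
    return matches


# Event log with direct identifiers already replaced (made-up data).
event_log = [
    {"resource": "user_a1", "department": "Dispatch", "site": "Leeds"},
    {"resource": "user_b2", "department": "Billing", "site": "York"},
]

# A second, separately available data source (also made up).
staff_directory = [
    {"name": "A. Example", "department": "Dispatch", "site": "Leeds"},
    {"name": "B. Sample", "department": "Billing", "site": "York"},
    {"name": "C. Sample", "department": "Billing", "site": "York"},
]

matches = link_records(event_log, staff_directory, ["department", "site"])

# A pseudonym with exactly one candidate has effectively been reidentified.
reidentified = {
    pseudonym: names[0] for pseudonym, names in matches.items() if len(names) == 1
}
```

Here `user_a1` maps to a single person because only one employee works in Dispatch at Leeds, while `user_b2` remains ambiguous between two candidates.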

Mitigate the possibility of reidentification

Once the risk of reidentification has been assessed, it’s important to minimise any possibility that personal information can be recovered. This can be achieved with a data governance structure and policy.

This involves initially evaluating the intended uses and users of event logs collected for analysis and then determining the variables included before measuring their reidentification risk. Finally, it’s important to document results consistent with data privacy and security requirements.

Control the conditions when data is used

A key way to ensure the de-identification methods are successful is by specifying the way data can be used.

There are different “release” models associated with the secondary use of personal information: public, quasi-public, and non-public release models. Public release models should apply the most stringent deidentification protocols, while quasi-public and non-public release models ought to include specific contractual provisions on confidentiality and terms of use.

Nature of variables

Evaluate the nature of the variables in event logs by asking a few questions: Do they contain sensitive data? Do they include indirect identifiers that may create risk of reidentification? Are there additional sources of publicly available data that may be linked to indirect identifiers in event logs? What is the likelihood that an adversary who may be familiar with the event logs would be able to reidentify data subjects?  

By asking more questions, CIOs are able to tick more boxes in their path to ensuring privacy is safeguarded.

Measure and identify reidentification risk

This step will depend on the context of event logs, the number of attributes that comprise event logs, and how many records share the same combination of attribute values; each such group of records is referred to as an equivalence class. The smaller the equivalence classes, the higher the probability of reidentification. In these instances, more rigorous deidentification measures need to be considered.
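As a rough illustration of this measurement, the Python sketch below takes an equivalence class to mean a group of records that share the same values for a chosen set of indirect identifiers (the usual k-anonymity reading, which is an assumption here). The records, the attribute names and the simple 1/k risk figure are all hypothetical and intended only to show the idea.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(record[q] for q in quasi_identifiers) for record in records)


def reidentification_risk(records, quasi_identifiers):
    """Return (k, worst-case risk), where k is the smallest equivalence class size."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    k = min(sizes.values())
    return k, 1 / k


# Made-up records describing who performed activities in an event log.
records = [
    {"role": "clerk", "site": "Leeds", "shift": "day"},
    {"role": "clerk", "site": "Leeds", "shift": "day"},
    {"role": "clerk", "site": "Leeds", "shift": "day"},
    {"role": "driver", "site": "Leeds", "shift": "night"},
    {"role": "driver", "site": "York", "shift": "night"},
]

# More attributes produce smaller classes and therefore higher risk.
detailed_k, detailed_risk = reidentification_risk(records, ["role", "site", "shift"])

# Fewer attributes produce larger classes and therefore lower risk.
coarse_k, coarse_risk = reidentification_risk(records, ["role"])
```

With all three attributes, each driver sits alone in a class of one, so the worst-case risk is 1.0. Keeping only `role` raises the smallest class to two records and halves that figure, at the cost of less detailed analysis.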

Record internal controls

It’s important to stay on top of privacy, and that involves documenting the internal controls that protect these privacy rights.

For example, the General Data Protection Regulation imposes rigorous obligations on data controllers and processors to maintain a record of processing activities under their responsibility. Furthermore, organisations are subject to audit provisions and, upon request, must co-operate with the supervisory authority and make those records available.

Process mining and data privacy can co-exist

Process mining can provide your organisation with comprehensive insights about your processes and fuel your improvement initiatives. The ambition of responsible process mining is to achieve a balance between its utility and safeguarding privacy rights. Digital security matters now more than ever, which means that businesses must do all they can to improve their cybersecurity efforts. Integrating privacy-enhancing technologies and best practices will create trust and confidence in their continued growth. It is in the best interest of the industry to adopt privacy-enhancing practices. The goal is to gain new insights into processes, safely.


About the Author

Andrew Pery is Ethics Evangelist at ABBYY. ABBYY is a Digital Intelligence company. We provide a Digital Intelligence platform that enables organizations to gain a complete understanding of their business. The platform is designed to allow organizations to deploy solutions in standalone configurations or as a tightly integrated extension of industry-leading RPA, BPM and packaged application solutions.
