Overview
Whereas data processing refers to administrative or technical data-management practices, it is in the analysis phase that data becomes information relevant to political decision-making. Different automated data mining methods serve different purposes and are governed by their own specific rules. Large datasets are used both to identify links between already known individuals or organizations and to “search for traces of activity by individuals who may not yet be known but who surface in the course of an investigation, or to identify patterns of activity that might indicate a threat.” For example, contact chaining is one common method used for target discovery: “Starting from a seed selector (perhaps obtained from HUMINT), by looking at the people whom the seed communicates with, and the people they in turn communicate with (the 2-out neighbourhood from the seed), the analyst begins a painstaking process of assembling information about a terrorist cell or network.”
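The sketch below illustrates this kind of contact chaining on a toy communication graph: a plain breadth-first traversal collects every selector within two hops of a seed, corresponding to the “2-out neighbourhood” quoted above. The call records, selector names, and implementation are illustrative assumptions only, not any agency's actual tooling.

```python
from collections import deque

def contact_chain(call_records, seed, hops=2):
    """Return all selectors within `hops` communication links of `seed`.

    call_records: iterable of (caller, callee) pairs, e.g. call detail records.
    hops=2 corresponds to the '2-out neighbourhood' from the seed.
    """
    # Build an undirected adjacency list from the pairwise records.
    graph = {}
    for a, b in call_records:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    reached = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if reached[node] == hops:
            continue  # do not expand beyond the permitted number of hops
        for neighbour in graph.get(node, ()):
            if neighbour not in reached:
                reached[neighbour] = reached[node] + 1
                queue.append(neighbour)
    reached.pop(seed)
    return reached  # selector -> distance (1 or 2 hops) from the seed

# Hypothetical records: the seed's direct contacts plus their contacts.
records = [("seed", "A"), ("seed", "B"), ("A", "C"), ("B", "D"), ("D", "E")]
print(contact_chain(records, "seed"))  # {'A': 1, 'B': 1, 'C': 2, 'D': 2}
```

Even this toy example shows why hop limits matter for proportionality: each additional hop multiplies the number of people swept into the analyst's view.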
Many intelligence agencies embrace new analytical tools to cope with the information overload challenge in our digitally connected societies. For example, pattern analysis and anomaly detection increasingly rely on self-learning algorithms, commonly referred to as artificial intelligence (AI). AI is expected to be particularly useful for signals intelligence (SIGINT) agencies due to the vast and rapidly expanding datasets at their disposal. However, the risks and benefits generally associated with AI also challenge existing oversight methods and legal safeguards, and they push legislators and oversight practitioners alike to engage creatively with AI as a dual-use technology. At the same time, the malicious use of AI creates new security threats that must be mitigated.
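To make the notion of anomaly detection concrete, the following sketch applies an off-the-shelf unsupervised model (scikit-learn's IsolationForest) to synthetic communications-metadata features. The feature set, data, and parameters are invented for illustration and do not reflect any agency's actual systems.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-account features derived from communications metadata:
# [messages per day, distinct contacts, share of night-time activity].
rng = np.random.default_rng(0)
baseline = rng.normal(loc=[20, 15, 0.1], scale=[5, 4, 0.05], size=(500, 3))
outliers = np.array([[300.0, 200.0, 0.9], [1.0, 1.0, 0.95]])  # unusual profiles
X = np.vstack([baseline, outliers])

# An unsupervised model flags records that deviate from the learned baseline;
# 'contamination' encodes the analyst's prior on how rare anomalies are.
model = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = model.predict(X)           # -1 = anomalous, 1 = typical
print(np.where(flags == -1)[0])    # indices of flagged records for human review
```

The oversight-relevant point is that the model's output depends heavily on choices such as the feature set and the assumed rate of anomalies, which is precisely why such parameters need to be documented and reviewable.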
Relevant Aspects
What types of data use are permissible in a given legal framework, and are there specific rules for different forms of data use? Are there, for example, clear procedures for each type of use, specifying the circumstances under which that use is permitted?
There should also be independent oversight (internal and external) of bulk data analysis techniques, including rules and safeguards concerning the use of AI. How is the level of privacy intrusion of specific data-analysis tools measured? And what kind of material is fed into query-focused databases?
How is the convergence of different databases/data sources regulated? For example, may bulk communications data be matched with other stored data (such as location data, data gathered via sensors or in hacking operations) or publicly available data? If so, does such enrichment of material happen automatically? What safeguards apply if data is repurposed? A hypothetical sketch of what such matching involves technically follows below.
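The sketch joins intercepted communications metadata with separately held location records on a shared selector; all identifiers and data are invented, and the point is only to show how trivially such enrichment can be performed once the datasets are co-located.

```python
# Hypothetical example of 'enrichment': matching intercepted communications
# metadata with separately held location data on a shared selector.
comms = [
    {"selector": "+4915100000001", "contacted": "+4915100000002", "time": "2021-03-01T22:14"},
    {"selector": "+4915100000003", "contacted": "+4915100000004", "time": "2021-03-01T23:02"},
]
locations = {
    "+4915100000001": "cell tower 117 (city centre)",
    "+4915100000003": "cell tower 452 (harbour district)",
}

# The join itself is technically trivial; the oversight question is whether,
# when, and under which authorisation such repurposing may happen at all.
enriched = [
    {**record, "last_known_location": locations.get(record["selector"], "unknown")}
    for record in comms
]
for row in enriched:
    print(row)
```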
Intelligence oversight has struggled to review effectively the data analysis methods that intelligence agencies use. The increasing importance of AI is only the most recent development in this regard, with potentially far-reaching implications for how intelligence analysis is conducted. Even at this early, experimental stage, the risk of abuse and error in the use of AI may already be having real-life impacts. Oversight bodies must keep abreast of developments in data analysis, and they need additional resources and control instruments to ensure the accountability of AI-driven surveillance operations.