Reveal(x) Detects Data Leaks from Employee Use of ChatGPT
ExtraHop
May 18, 2023
Ever since OpenAI released ChatGPT on November 30, 2022, use of AI as a service (AIaaS), including generative AI tools, has skyrocketed. Within five days of launch, ChatGPT passed 1 million users. Two months later, it had an estimated 100 million monthly active users, a milestone TikTok took roughly nine months to reach. By March 2023, the ChatGPT website was drawing roughly 1 billion visits a month.
It’s no wonder these tools have been such a runaway success: they promise, and often deliver, real productivity gains for professionals across a range of occupations, from software development to pharmaceutical research.
And even though these tools are not yet well understood, many large enterprises that see the enormous potential of generative AI and AIaaS are committing significant resources to them. According to Gartner® Research, the Gartner 2023 CIO and Technology Executive Survey showed that “nine out of 10 respondents (92%) ranked AI as the technology their organizations were most likely to implement by 2025.”1
The downside? Beyond the ethical concerns, organizations using AIaaS tools have learned some hard lessons about data leaks and intellectual property (IP) risk after employees shared proprietary information with these public tools. What employees using AI to accelerate code reviews or new drug discovery may not realize is that they’re effectively putting confidential data into the public domain: once they share proprietary information with an AIaaS, the service can continue using it to process other people’s requests, in some cases indefinitely, depending on the provider’s terms.
The immediate IP risk centers on users logging into the websites and APIs of these generative AI services and sharing proprietary data. The risk grows as local deployments of these systems flourish and people begin connecting them to one another. Once an AI service, rather than a person, decides what data to share with other AIs, the human oversight that currently makes that determination is lost. The AI service is unlikely to understand the impact and potential consequences of sharing the data it has access to, and it may never notify its human handlers that data has left the organization.
Thus, the risk of IP loss and leakage of customer data has made it imperative for organizations to understand the scope of generative AI use across their businesses. Until now, organizations haven’t had an easy way to audit employee use and potential misuse of these tools.
Data Protection From Rogue AI Use and Accidental Misuse
Earlier this week, ExtraHop released a new capability that helps organizations understand their potential risk exposure from employee use of OpenAI ChatGPT.
ExtraHop Reveal(x) provides customers with visibility into the devices and users on their networks that are connecting to OpenAI domains. This capability is essential as organizations move quickly to adopt policies governing the use of large language models and generative AI tools, because it gives them a mechanism to audit compliance with those policies. We are taking this step as part of a larger security platform approach, integrating AIaaS monitoring with our existing, industry-leading network detection and response (NDR) capabilities.
By tracking which devices are connecting to OpenAI domains, identifying the users associated with those devices, and measuring how much data those devices send to those domains, Reveal(x) enables organizations to assess the risk posed by their users' ongoing use of AI services. (Watch a demo.)
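To make that idea concrete, here is a minimal Python sketch of one way to surface clients that are resolving OpenAI domains, working from DNS query logs. The tab-separated log format, the domain list, and the file name are all assumptions made for illustration; Reveal(x) performs this kind of correlation natively from wire data rather than from logs.

# Illustrative sketch: flag clients that look up OpenAI-related domains
# in a DNS query log. Log format is assumed to be tab-separated:
# timestamp, client_ip, queried_domain.
from collections import Counter

# Representative, not exhaustive; a real deployment would maintain
# a fuller list of AIaaS domains.
OPENAI_DOMAINS = ("openai.com",)

def is_openai_domain(domain: str) -> bool:
    """True if the queried name is an OpenAI domain or a subdomain of one."""
    domain = domain.rstrip(".").lower()
    return any(domain == d or domain.endswith("." + d) for d in OPENAI_DOMAINS)

def clients_using_openai(log_path: str) -> Counter:
    """Count DNS lookups of OpenAI domains per client IP."""
    lookups = Counter()
    with open(log_path) as log:
        for line in log:
            try:
                _ts, client_ip, domain = line.rstrip("\n").split("\t")
            except ValueError:
                continue  # skip malformed lines
            if is_openai_domain(domain):
                lookups[client_ip] += 1
    return lookups

if __name__ == "__main__":
    for ip, count in clients_using_openai("dns_queries.log").most_common():
        print(f"{ip}\t{count} lookups of OpenAI domains")

From there, mapping client IPs back to users is a matter of joining against your identity or DHCP data, a step Reveal(x) automates.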
In addition, because Reveal(x) shows the amount of data being sent to and received from OpenAI domains, security leaders can evaluate what falls within an acceptable range and what indicates potential IP loss. Simple user queries to a chatbot should measure in bytes to kilobytes; if security teams see megabytes of data flowing to these domains, that volume may indicate employees are sending proprietary data along with their queries. And if the traffic in question is unencrypted, organizations can identify the types of data, and even the individual files, that employees are sending to OpenAI domains, with Reveal(x) surfacing related data exfiltration and data staging detections.
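That volume heuristic can be sketched just as simply. In this illustrative example, the Flow record shape and the 1 MB threshold are assumptions; any real deployment would tune the threshold to its own traffic baseline.

# Illustrative sketch of the volume heuristic described above: ordinary
# chatbot queries measure in bytes to kilobytes, so multi-megabyte
# uploads to OpenAI domains deserve a closer look.
from dataclasses import dataclass

MB = 1024 * 1024
UPLOAD_THRESHOLD = 1 * MB  # assumed cutoff; tune to your environment

@dataclass
class Flow:
    client_ip: str
    server_name: str   # e.g., the TLS SNI observed on the wire
    bytes_out: int     # client -> server payload bytes

def flag_suspect_uploads(flows):
    """Yield flows to OpenAI domains whose outbound volume is anomalous."""
    for flow in flows:
        if flow.server_name.endswith("openai.com") and flow.bytes_out > UPLOAD_THRESHOLD:
            yield flow

flows = [
    Flow("10.0.4.23", "api.openai.com", 2_800),    # typical chat query
    Flow("10.0.4.57", "api.openai.com", 48 * MB),  # possible bulk upload
]
for f in flag_suspect_uploads(flows):
    print(f"{f.client_ip} sent {f.bytes_out / MB:.1f} MB to {f.server_name}")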
Reveal(x) can provide this deep visibility and real-time detection because it uses network packets as the primary data source for monitoring and analysis. Using a real-time stream processor, Reveal(x) transforms unstructured packets into structured wire data and analyzes payloads and content across OSI Layers 2 through 7 for complete network visibility. From device discovery to behavioral analysis, network telemetry is the immutable source of truth for understanding an organization’s hybrid environment. Logs can tell you that two devices talked to each other; Reveal(x) provides rich context about the communication itself.
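As a toy illustration of that packets-to-wire-data transformation, the sketch below uses the open source scapy library to lift raw packets from a capture file into structured records carrying Layer 2 through 4 fields. The capture file name is an assumption, and a production stream processor like the one in Reveal(x) goes much further, reassembling flows and decoding full Layer 7 transactions in real time.

# Toy version of the packets-to-wire-data idea: turn raw packets into
# structured records. Requires scapy (pip install scapy) and a capture
# file named sample.pcap (an assumption for the example).
from scapy.all import rdpcap, Ether, IP, TCP

def to_wire_record(pkt):
    """Turn one raw packet into a structured record, or None if not TCP/IP."""
    if not (Ether in pkt and IP in pkt and TCP in pkt):
        return None
    return {
        "src_mac": pkt[Ether].src,              # Layer 2
        "src_ip": pkt[IP].src,                  # Layer 3
        "dst_ip": pkt[IP].dst,
        "dst_port": pkt[TCP].dport,             # Layer 4
        "payload_bytes": len(pkt[TCP].payload), # size of carried content
    }

records = [r for r in map(to_wire_record, rdpcap("sample.pcap")) if r]
for rec in records[:10]:
    print(rec)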
At ExtraHop, we can’t overstate the importance of this capability as organizations grapple with the popularity and proliferation of AIaaS and the data leakage risk that comes with it. ExtraHop believes the productivity benefits of these tools outweigh the data exposure risks, provided organizations understand how these services will use their data (and how long they’ll retain it), implement policies governing use of these services, and have a control like Reveal(x) in place that allows them to assess policy compliance and spot risks in real time.
Sign up for a demo to learn more.
1 - Gartner®, Applying AI - Key Trends and Futures, Bern Elliott, Jim Hare, Frances Karamouzis, 25 April 2023. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.