Forget about AI taking our jobs; let's worry about the attackers aiming to weaponize it.
Most IT security specialists readily admit that the future almost certainly contains the union of artificial intelligence and automated cyber attack systems. While that conjures up potential Terminator-style "Skynet" disasters, the reality is at once more mundane and more surprising.
Despite what the marketplace suggests, few defensive security solutions actually use any effective form of AI. Many vendors use the term loosely: because they can automate or orchestrate their detection, rules, or scans, they claim to have achieved some near-version of AI. But this isn't true; it is simplistic pattern matching masquerading as, and marketed as, AI. True AI uses data to make decisions, then branches out from those decisions to conclusions that weren't immediately obvious or predictable.
So we have a situation where many vendors talk about AI without actually using the technology, or without demonstrating its true potential. Furthermore, there have been few efforts where vendors have collaborated on defending against potential AI-based adversaries; the best examples are the 24 Information Sharing and Analysis Centers that have been created around specific vertical markets.
With the ethos of open source in mind, I wanted to share a serendipitous discovery my team at Pivotal made not too long ago that illustrates how AI can be used in a novel way to solve a very real problem. Our hypothesis borrowed from an education theory called transfer of learning, in which knowledge gained in one context is applied to a completely different one.
Rohit Khera on my team hypothesized that we could teach Google's TensorFlow to detect when data is confidential or private, like a password, and when it isn't. The foundational idea was to interpret lines of code as images of grayscale pixels. He wrote a blog post on the topic in 2016. There seems to be some mysterious contextual link between a neural network's ability to recognize a picture of a flower and its ability to recognize a password rendered as pixels of text.
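Khera's post doesn't spell out the encoding, but the core trick of treating text as grayscale pixels can be sketched in a few lines. Everything below (the function names, the fixed width of 80) is illustrative, not the team's actual pipeline:

```python
def line_to_pixels(line, width=80):
    """Render one line of text as a fixed-width row of grayscale intensities.

    Each character's byte value (0-255) becomes one pixel, scaled to [0, 1].
    Lines are truncated or zero-padded to `width` so every sample has the
    same shape, just like the images fed to a vision model.
    """
    data = line.encode("utf-8", errors="replace")[:width]
    data = data.ljust(width, b"\x00")  # pad short lines with "black" pixels
    return [b / 255.0 for b in data]

def snippet_to_image(lines, width=80):
    """Stack several lines into a 2-D grayscale 'image' of a text snippet."""
    return [line_to_pixels(line, width) for line in lines]

image = snippet_to_image(["password = 'hunter2'", "print('hello')"])
print(len(image), len(image[0]))  # 2 rows of 80 pixels each
```

Once text is in this form, the resulting matrices can be fed to the same convolutional architectures that classify photographs, which is where the flower-to-password analogy comes from.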
Google's DeepMind found a similar link between Go and chess. Its AlphaGo program used AI to teach a machine how to play Go, and in 2016 it beat one of the world's top Go players. A year later, its successor AlphaZero won a match against Stockfish, the top-rated computer chess program. What's remarkable about the latter feat is that AlphaZero taught itself in just a few hours to play at this grandmaster level. No one programmed the machine with the typical opening moves of chess, or famous games of the past. The same self-play approach that mastered one game, Go, mastered another, chess. Khera presumed that a similar notion could apply to the very real problem of leaked credentials in log files and source repositories.
To understand his approach, which I'll describe in a moment, let's first consider a typical IT security engineer's day. Alerts arrive constantly about all sorts of anomalies: many of them are false positives, while others could be signs of a system breach. Screening these alerts is a complex and tedious human task, and it is what many defensive AI-type tools are trying to help with, so that a person doesn't have to evaluate so many alerts and can focus on the ones that really matter. Given that the average IT operation runs hundreds or thousands of different applications, this triage is very difficult, and it is one of the reasons why so many breaches occur.
But many vulnerabilities happen because of human errors too, and these situations rarely trigger an alert. An engineer copies a sensitive file to a cloud storage bucket and sets access rights to “anyone.” Or a set of personal data is accidentally copied to a log file and stored as plain text on some external server. These situations aren’t always obvious but are very problematic, and could threaten the entire enterprise. If an attacker finds this information, they can harm our business.
We already use a variety of automated tools to facilitate patching our systems and to manage common infrastructure tasks such as spam handling and system configuration, so why not take things to the next level with more advanced AI tools? AI isn't yet the answer to everything, and it requires joining machine and human learning to be effective, but with the right applications it can be far more useful in the security space, particularly in defending against adversarial AI-based attacks.
For our research, we began feeding TensorFlow a series of matrices representing textual data, such as private encryption keys or lists of passwords, and used the software to learn to recognize when data was confidential and when it wasn't. We connected this to a live stream of logs containing both private and public kinds of data and trained it accordingly. We found that the program detected confidential data much better than the regular expressions we had been using.
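For comparison, the regular-expression approach being replaced looks something like the sketch below. The patterns are illustrative, not the ones our team actually used, and the weakness is built in: a rule-based scanner only catches what a pattern author anticipated.

```python
import re

# Illustrative patterns only -- the article does not list the team's actual
# regular expressions. Anything a pattern author did not anticipate slips
# through, which is the gap the trained model closed.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[:=]\s*\S+"),
    re.compile(r"(?i)\bapi[_-]?key\s*[:=]\s*[A-Za-z0-9]{16,}"),
]

def looks_confidential(line):
    """Return True if any hand-written pattern flags the line."""
    return any(p.search(line) for p in SECRET_PATTERNS)

print(looks_confidential("password=hunter2"))     # True
print(looks_confidential("GET /index.html 200"))  # False
```

A trained classifier, by contrast, generalizes from examples rather than enumerated rules, which is why it could flag confidential data the regex authors never thought to write a pattern for.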
What's compelling about the findings: just as AlphaZero was able to beat the best computer chess program in less than a day, our AI showed promise of outperforming our regular-expression authors.
Granted, transfer of learning as it relates to AI is still mysterious. We are still building a basic vocabulary even to discuss what's possible. But as we learn how to apply AI in the defensive context, you had better believe attackers are also trying to figure out how to weaponize it.
My hope is that our industry can carve out the necessary time to experiment with AI and learn. Engineers need the freedom to occasionally experiment and fail; attackers do it all the time. If we experiment and innovate, I'm hopeful that transfer of learning will help us better understand data relationships and successfully defend our enterprises.
Combatting Adversarial AI — Which Side Are You On? was originally published in Built to Adapt.