Security teams are overusing general-purpose AI models for decisions they were never designed to make. This conversation explains why predictive security requires purpose-built models, continuous retraining, and disciplined data science.
In this Brand Highlight, we talk with Michael Roytman, CTO of Empirical Security, about a problem many security teams quietly struggle with: using general-purpose AI tools for decisions that demand precision, forecasting, and accountability.
Michael explains why large language models are often misapplied in security programs. LLMs excel at summarization, classification, and pattern extraction, but they are not designed to predict future outcomes like exploitation likelihood or operational risk. Treating them as universal problem solvers creates confidence gaps, not clarity.
At Empirical, the focus is on preventative security through purpose-built modeling. That means probabilistic forecasting, enterprise-specific risk models, and continuous retraining using real telemetry from security operations. Instead of relying on a single model or generic scoring system, Empirical applies ensembles of models tuned to specific tasks, from vulnerability exploitation probability to identifying malicious code patterns.
Michael also highlights why retraining matters as much as training. Threat conditions, environments, and attacker behavior change constantly. Models that are not continuously updated lose relevance quickly. Building that feedback loop across hundreds of customers is as much an engineering and operations challenge as it is a data science one.
The conversation reinforces a simple but often ignored idea: better security outcomes come from using the right tools for the right questions, not from chasing whatever AI technique happens to be popular. This episode offers a grounded perspective for leaders trying to separate signal from noise in AI-driven security decision making.
Note: This story contains promotional content.
GUEST
Michael Roytman, CTO of Empirical Security | On LinkedIn: https://www.linkedin.com/in/michael-roytman/
RESOURCES
Learn more about Empirical Security: https://www.empiricalsecurity.com/
LinkedIn Post: https://www.linkedin.com/posts/bellis_a-lot-of-people-are-talking-about-generative-activity-7394418706388402178-uZjB/
Are you interested in telling your story?
▶︎ Full Length Brand Story: https://www.studioc60.com/content-creation#full
▶︎ Brand Spotlight Story: https://www.studioc60.com/content-creation#spotlight
▶︎ Brand Highlight Story: https://www.studioc60.com/content-creation#highlight
Keywords: sean martin, michael roytman, ed bellis, empirical security, cybersecurity, ai, machine learning, vulnerability, risk, forecasting, brand story, brand marketing, marketing podcast, brand story podcast, brand spotlight
[00:00:00]
[00:00:24] Sean Martin: And hello everybody, this is Sean Martin, and you're very welcome to a quick Brand Highlight here on ITSPmagazine, brought to you in partnership with Studio C60. And I'm thrilled to have Michael Roytman on from Empirical Security. Michael, how are you?
[00:00:42] Michael Roytman: Good. Good. How are you?
[00:00:45] Sean Martin: Doing great, thanks. So we have a mutual friend, obviously: your coworker, Ed Bellis, who I've known for a long time. I stay tuned to a lot of things on LinkedIn, good or bad, and [00:01:00] saw a post from Ed that piqued my interest. So this is about that post, talking about the broader world of AI and LLMs and security weaknesses within all that space.
And I wanted to learn more about what you guys are doing at Empirical, and maybe touch on that post a little bit as well. So first, a few words from you: who you are, your role at the company, and maybe the elevator pitch for Empirical. And then we'll get into the post and what you guys are up to.
[00:01:32] Michael Roytman: Yeah, yeah. My name is Michael Roytman. I'm the CTO at Empirical Security. Funny you mention Ed, he's my co-founder and CEO here, but I worked for him at Kenna Security for ten years and then at Cisco for three years. We essentially created the risk-based vulnerability management [00:02:00] market and product. Empirical is us going both broader and deeper with that data-driven, Moneyball approach to vulnerability management and expanding it to all preventative security.
So we build and maintain models ranging from open models like EPSS, which we provide to FIRST.org and about a hundred vendors use today, to a global model for vulnerability management that issues probability forecasts for vulnerability exploitation. And then we build local models for our customers that help deal, in a data-driven manner, with application security, cloud security, and vulnerability management, building probability ratings, risk ratings, and security models that are specific to an enterprise.
That's kind of our new approach to making security a little more data driven. We deal entirely with the preventative side, but unlike the CE vendors out there, we actually grab data from telemetry sources like SIEMs, CrowdStrike, and Palo Alto, and use that to retrain the models as well. So we're really excited for where this takes security.
We think this is a very broad preventative approach, and we use [00:03:00] ML and AI as tools in a huge bucket of tools. When I say a model, I mean actually an ensemble of models, ranging from classifying GitHub repositories as malicious code, to finding probabilities of exploitation, to looking at code and doing summarization to create features for ML models. We kind of span the gamut of data science.
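To make that "ensemble of models" idea concrete, here is a minimal Python sketch of how several narrow, task-specific scorers might feed one combined risk rating. This is an illustration under assumptions, not Empirical's actual pipeline; every function and field name here is hypothetical.

# Hypothetical sketch: an ensemble of task-specific models, each answering
# one narrow question, combined into a single enterprise risk rating.
# None of these names come from Empirical's real system.

def exploitation_probability(vuln: dict) -> float:
    """Stand-in for a trained predictive model (e.g., gradient boosting)."""
    base = 0.02
    if vuln.get("public_exploit"):
        base += 0.50
    if vuln.get("remote"):
        base += 0.20
    return min(base, 1.0)

def malicious_repo_score(repo: dict) -> float:
    """Stand-in for a classifier over code repositories."""
    return 0.90 if repo.get("obfuscated_payloads") else 0.05

def risk_rating(p_exploit: float, p_malicious: float, asset_weight: float) -> float:
    """Naive combiner: chance that at least one signal fires, scaled by
    how much the affected asset matters to this enterprise."""
    p_any = 1 - (1 - p_exploit) * (1 - p_malicious)
    return p_any * asset_weight

vuln = {"public_exploit": True, "remote": True}
repo = {"obfuscated_payloads": False}
print(f"risk: {risk_rating(exploitation_probability(vuln), malicious_repo_score(repo), 0.8):.2f}")

The point of the combiner is the same one Michael makes in prose: each model is scoped to a question it is actually good at, and the ensemble, not any single model, produces the decision-ready number.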
[00:03:18] Sean Martin: So in Ed's post, he mentioned that it's the right tool for the right job, which basically says to me purpose-built: specific things doing specific activities and analysis that you may then pull together into a bigger picture to help drive some better decisions.
Talk to me about what you see a lot of companies doing with respect to, I dunno, grabbing a model, using the public model, and getting good enough, but maybe with hallucinations or, I don't know, some other challenges there. So what are you seeing, and how does what you offer help close the gap?
[00:03:57] Michael Roytman: Yeah, hallucinations is a very [00:04:00] 2023, 2024 concern. It's gotten better. So we ran a study, which I actually published in Forbes about six months ago, comparing just our free model, EPSS, which is a predictive, deterministic model for predicting the likelihood of exploitation in the next 30 days, to three LLMs at the same task: a Google LLM, an OpenAI LLM, and an Anthropic LLM.
So when you pick tasks for LLMs that are specific to supporting security decisions, issuing predictions about the future, they're terrible at it. They can't really predict the future. They can predict the next character in a sequence. They can autocomplete a text, or they can do a task in that sense, if it's a sequence of tasks. But to predict the weather tomorrow? They're just not built for that. There are a suite of models that are really good at predictions. XGBoost, for predicting that probability when you have a sequence of exploitation events, is excellent, and [00:05:00] that's what we see continuously outperform all LLMs. So using the right tool for the job is not just important; it actually guarantees efficiency and security. If you're using an LLM to predict the probability of a vulnerability being exploited, that's not gonna have a great result. If you're using an LLM to summarize code and extract a feature, or to tell me, is this one of these four categories of potential exploitation types? It's great at that. It's designed to summarize something into a smaller distillation by using the transformer architecture.
So, at Empirical: I used to be the Chief Data Scientist at Kenna Security. Our third co-founder, Jay Jacobs, was the chief data scientist at the Cyentia Institute and one of the founders there, and he was a data scientist on the Verizon DBIR, I think the first there.
And so, you know, two thirds of the company has a data science background. We're very particular about using the right model and the right tool for the task, rather than picking something up off the shelf. And I think with that territory [00:06:00] comes the challenge of having to train models from scratch, which I think today a lot of security companies just aren't willing to do. That's not their forte, that's not their wheelhouse. We think of ourselves as a data science company that's operating on security data rather than a pure security company.
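For readers who want to see the shape of the gradient-boosting approach Michael contrasts with LLMs, here is a minimal sketch using XGBoost on synthetic data. The features, labels, and numbers are invented for illustration; this is not EPSS or Empirical's model.

# Illustrative only: a gradient-boosted classifier (XGBoost) estimating
# 30-day exploitation likelihood. Features and labels are synthetic
# stand-ins, not the real EPSS feature set.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: exploit code published, CVSS base score,
# days since disclosure, mentions in threat intel feeds.
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.uniform(1, 10, n),
    rng.integers(0, 365, n),
    rng.poisson(2, n),
])
# Synthetic label: exploited within 30 days, driven mostly by whether
# exploit code exists (a simplification for the demo).
y = (rng.random(n) < 0.05 + 0.40 * X[:, 0]).astype(int)

model = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# Score a new, hypothetical vulnerability.
new_vuln = np.array([[1, 9.8, 14, 6]])
print(f"P(exploited in 30 days): {model.predict_proba(new_vuln)[0, 1]:.2f}")

Unlike an LLM, a model like this outputs an explicit probability for a defined future event, which is exactly the kind of forecast a prioritization decision needs.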
[00:06:14] Sean Martin: Yeah, and as anybody who's done this knows, training it once is not enough.
[00:06:20] Michael Roytman: No, the retraining is actually the technical challenge, for sure, especially if you think about hundreds of models and hundreds of customers, and then you're retraining each of those when the environment changes or the tooling changes. But that is actually more of a Terraform, DevOps, ML operations challenge than a data science challenge, ironically.
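To give a feel for the operational shape of that retraining loop, here is a hedged Python sketch: a fleet of per-customer models, each refreshed when its training snapshot falls behind the latest telemetry. All names are hypothetical; as Michael says, the hard part in practice is the Terraform/DevOps plumbing around a loop like this, not the loop itself.

# Hypothetical sketch of continuous retraining across a fleet of
# per-customer, per-task models. Not Empirical's real system.
from dataclasses import dataclass

@dataclass
class LocalModel:
    customer: str
    task: str            # e.g. "exploitation", "malicious-repo"
    data_version: str    # telemetry snapshot the model was trained on

def latest_telemetry_version(customer: str) -> str:
    """Stand-in for querying SIEM/EDR telemetry pipelines."""
    return "2025-w01"

def retrain(model: LocalModel, data_version: str) -> LocalModel:
    """Stand-in for a training job: fit, validate, publish."""
    print(f"retraining {model.customer}/{model.task} on {data_version}")
    return LocalModel(model.customer, model.task, data_version)

fleet = [
    LocalModel("acme", "exploitation", "2024-w48"),
    LocalModel("acme", "malicious-repo", "2025-w01"),
    LocalModel("globex", "exploitation", "2024-w50"),
]

# Refresh any model whose training snapshot lags current telemetry.
fleet = [
    m if m.data_version == latest_telemetry_version(m.customer)
    else retrain(m, latest_telemetry_version(m.customer))
    for m in fleet
]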
[00:06:42] Sean Martin: Yeah. Well, no lack of challenges. It's a good post, and I would encourage everybody to read it. Certainly connect with Michael if you're listening to or watching this Brand Highlight. I'll include links to the post, [00:07:00] links to connect with Michael and Ed, and, why not, Jay as well. What the heck, throw 'em in there.
Michael, thanks for taking the time to share this little highlight with us. And everybody listening and watching, stay tuned to itspmagazine.com for more stories, and if you have a story you wanna share, Studio C60 is the place to connect with us for that.
Thanks again, Michael.
[00:07:24] Michael Roytman: Thanks, Sean. Appreciate you.