Austin

Posted on Dec 8, 2020

Asking Your Big Security Data Questions

#bigdata #cybersecurity #nlp

Humans are now producing more data on a day-by-day basis than ever before. By a commonly cited estimate, internet users generate 2.5 quintillion bytes of data every day. The data varies from selfies and pictures of food to the network data created every time we visit a website or are hit by a DoS attack.

As humans become more ‘datafied’, the cyber security industry needs to ingest and make decisions on this vast amount of information every day. With the pace of data creation showing no signs of slowing down, the industry needs to adapt so it can identify threats faster and respond more quickly to a growing number of events that could make or break a business.

Interpreting Data with Technology

As the amount of data grows, we need to be able to better understand this information and make decisions from it. Promising technologies like machine learning and neural networks dominate every sector of business today, but the problem is the very fact that they are still promising: they are not yet delivering in a practical sense.

It still takes a lot of work to create good insights from the information, and that work often requires a data scientist. And while the data science field is growing quickly every year, building and perfecting these algorithms takes time and money that many organizations just can’t afford.

Even after organizations invest in all this technology, it doesn’t solve the root problem of how we as humans interact with our data.

Today’s Interfaces

Today many interfaces are built around the technical challenges that developers encounter. To be frank, they look like they were created for computers, not humans. These interfaces are either a circus of input elements like checkboxes and text boxes, where you don’t even know what to type into them…

…or so complex you have to be a database developer to write a simple query.

SELECT
   extract(dow FROM sale_timestamp) AS day,
   count(sale_id) AS num_sales
FROM SALES
WHERE sale_timestamp >= now() - INTERVAL '7 DAY'
GROUP BY extract(dow FROM sale_timestamp)

As humans, we don’t communicate this way. The above query is actually a simple question you might walk up and ask a colleague: How many sales did we close each day last week? Asking this question of a computer is clearly more complex.

Although software engineering is becoming a more common skill, that’s not a valid reason for us to avoid the root problem: Why can’t we ask simple questions of our data like we do of each other?

Turning to NLP

Another old but new technology emerging to deal with this issue is called Natural Language Processing (NLP). This technology literally dates back to the birth of computers in the 1950s, but until recently we have not been able to harness it well. Even with all the technology at our hands today, much like machine learning, it’s still emerging and relatively buggy.

Just try asking Siri, “What time does Best Buy open today?” That’s a relatively easy question to process and understand, but our technology is still not quite there.

Now imagine using the same approach to ask an open-ended question of a petabyte of network traffic data as you work to understand whether there was a network attack today. The level of complexity cyber security teams require of NLP goes above and beyond what our technologies are able to deliver today.

So it’s easy to see how human and technological limitations are currently holding security teams and analysts back from asking direct questions of our data. But how do we as an industry move forward to overcome these challenges?

Approaches to Solving This

The cybersecurity industry needs to ingest and make decisions on this information every day, but as more data is generated it gets harder and harder to find meaningful answers from it.

This issue is compounded by today’s interfaces, which feel like they were built for computers. Even the most ambitious UIs fall victim to limitations in the way we as humans operate. As data grows, how do analysts communicate with machines? How do we ask the data better questions to make critical business decisions?

The State of the Cybersecurity Industry

Splunk is one of the most popular vendors in the space, and security practitioners use it to analyze data and make decisions. Splunk and similar companies tend to use a SQL-like experience to find and filter data.

This method, while powerful, takes years of experience to master and is still error prone. In query interfaces like these, it’s also difficult to adjust a query without rewriting it every time you pivot on the data.

Auditing the Available Technologies

Exciting new technologies like NLP are promising, but it can be cumbersome to type out simple questions in the ‘right’ way to achieve the desired response. NLP is also difficult to use when analysts don’t know exactly what they are looking for. Imagine trying to ask Siri, “What stores have the new Call of Duty game in stock, are within a 15-mile range, and are open after 5 PM?” Even after you have a broad set of answers, you need to analyze and pivot on the results, which would require re-typing the question multiple times with slightly different conditions.

Point-and-click interfaces are easy to use, but very quickly become complex and difficult to understand. They are unpopular amongst senior analysts who just want to jump into a terminal and type out a grep query.

Technologies that use SQL are very powerful, but require years of training to be effective. It is also still difficult to pivot on the information without re-writing the query.

I wanted to combine the strengths of all of these technologies, making a tool that is easy for entry-level analysts to use and powerful enough for the most sophisticated analysts. I also wanted to make it feel more natural, as if you were asking your data a question like you would a colleague.

The Ideal State

I believe that the ideal state is one that builds on the strengths of some of the oldest and newest technologies to create an intuitive interface. As analysts explore their data sets, they can quickly type or point and click to build queries, with the interface providing context such as suggestions and visual elements such as calendars.
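To make the idea of suggestions concrete, here is a minimal sketch in Python of how an interface might propose the next word of a query. It is only an illustration under assumptions: the `FIELDS` vocabulary, the field and operator names, and the `suggest` function are all hypothetical, and this is not how any particular product implements it.

```python
# Hypothetical vocabulary: each field maps to the operators it supports.
FIELDS = {
    "status": ["is", "is not"],
    "assignee": ["is", "is not"],
    "created": ["before", "after", "in the past"],
}


def suggest(tokens, prefix=""):
    """Return candidate next words for a partially built query.

    A query is assumed to be a flat list of (field, operator, value)
    triples, so the position within the current triple decides what
    kind of word comes next.
    """
    position = len(tokens) % 3
    if position == 0:
        # Start of a new clause: offer field names.
        candidates = sorted(FIELDS)
    elif position == 1:
        # A field was just chosen: offer the operators it supports.
        candidates = FIELDS.get(tokens[-1], [])
    else:
        # A value is expected; a real UI might show a calendar
        # or a list of known values here instead of words.
        candidates = []
    return [c for c in candidates if c.startswith(prefix)]


print(suggest([]))                 # field names to start with
print(suggest(["created"], "in"))  # operators for "created" starting with "in"
```

Because every step offers only valid choices, an analyst never has to guess what to type into an empty box.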

In this experience, analysts can filter down to find the insights that are important to them. By providing an experience that allows a user to visually ask the question one word at a time, the analyst can create better queries. Once the user creates a query, they can easily go back and edit it without having to re-create the full query, allowing for easy data pivoting. The experience takes complex queries (like showing Insights that were created in the past seven days, are unassigned, and are still open) and makes them easy enough for an entry-level analyst to build and modify.
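One way to picture a query that can be edited without being rewritten is to store it as a list of small clauses rather than one string. The Python sketch below assumes that representation; the `Clause` class, the `OPS` table, the `run` function, and the sample records are all hypothetical and exist only to illustrate the example query above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any


@dataclass(frozen=True)
class Clause:
    """One piece of a query: a field, an operator, and a value."""
    field: str
    op: str
    value: Any


# Hypothetical operators; a real tool would support many more.
OPS = {
    "is": lambda actual, expected: actual == expected,
    "after": lambda actual, expected: actual > expected,
}


def run(query, records):
    """Keep the records that satisfy every clause in the query."""
    return [r for r in records
            if all(OPS[c.op](r[c.field], c.value) for c in query)]


# Made-up sample data with a fixed "now" so results are repeatable.
now = datetime(2020, 12, 8)
insights = [
    {"id": 1, "created": now - timedelta(days=2), "assignee": None, "status": "open"},
    {"id": 2, "created": now - timedelta(days=30), "assignee": None, "status": "open"},
    {"id": 3, "created": now - timedelta(days=1), "assignee": "sam", "status": "open"},
    {"id": 4, "created": now - timedelta(days=3), "assignee": None, "status": "closed"},
]

# Insights created in the past seven days, unassigned, and still open.
query = [
    Clause("created", "after", now - timedelta(days=7)),
    Clause("assignee", "is", None),
    Clause("status", "is", "open"),
]
print([r["id"] for r in run(query, insights)])    # [1]

# Pivot: swap only the last clause and leave the rest untouched.
pivoted = query[:2] + [Clause("status", "is", "closed")]
print([r["id"] for r in run(pivoted, insights)])  # [4]
```

Changing one clause is enough to pivot from open to closed Insights, which is the kind of edit that a free-text or SQL-like query makes harder than it needs to be.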

This is just one of the ways we are making it easy for security operations centers to operate more effectively while addressing the security job gap in the market today.
