Using Big Data to Catch Big Cyber Criminals

Using Big Data to Defend Cloud Accounts Against Cyber Criminals

May 02, 2023 Guy Sela

We all know that cloud platforms are essential for modern businesses. And as we’re reminded with each new cyber attack, they also pose significant security risks. Cyber criminals continually try to compromise cloud accounts using brute-force techniques, such as guessing passwords or exploiting weak authentication mechanisms, until they gain access. From there, compromised cloud accounts open the door to internal phishing, data theft, and email fraud.

Fortunately, with a simple data pipeline, we can gain invaluable insights into brute-force tactics all over the world and stop new cyber attacks in real time. Here is how Proofpoint Cloud App Security Broker (CASB) uses Big Data analytics to detect account takeover attempts in real time.

The data pipeline

Our CASB solution protects cloud platforms for thousands of businesses across the globe. We monitor their cloud activity and accumulate data on all logins to their cloud services—Microsoft 365, Google Workspace, Salesforce and others. We process more than 200 million login events daily and detect account takeover attempts using multiple detection systems.

The architecture for our Brute-Force Attacks Detector looks like this:

Figure: The Brute-Force Attacks Detector architecture.

Here’s an example from Microsoft 365. The fetcher service is constantly gathering the latest login activity that occurred in the client’s Microsoft 365 environment. These events flow downstream to the Apache Kafka Logins topic. From this topic, events are sent to two destinations:

Persistence: AWS Firehose persists all login events to S3 in Parquet format. The objects are partitioned in S3 by ingest timestamp, using the key pattern “year=<year>/month=<month>/day=<day>/hour=<hour>”. The columnar Parquet format, combined with partitioning by ingest timestamp, lets us run efficient and cost-effective analytics queries on the data.
Detection: The real-time detector service tries to match each login event against a list of known brute-force IPs, produced by the “brute-force glue job” described below. If the login event’s IP matches the malicious IP list, the detector service can send a reset password/reset session request to Microsoft 365 for the compromised user.
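
To make the partitioning scheme concrete, here is a minimal Python sketch of how an ingest timestamp could map to a partition prefix, and how a query over a time window only needs to touch the matching prefixes. The function names, the use of UTC, and the zero-padded fields are assumptions made for illustration; the article only specifies the year/month/day/hour layout.

```python
from datetime import datetime, timedelta, timezone


def partition_prefix(ingest_time: datetime) -> str:
    """Return the S3 key prefix for an event ingested at the given time.

    Assumes a timezone-aware datetime, UTC partitions, and zero-padded
    fields; none of these details are stated in the article.
    """
    t = ingest_time.astimezone(timezone.utc)
    return f"year={t:%Y}/month={t:%m}/day={t:%d}/hour={t:%H}"


def hourly_prefixes(start: datetime, end: datetime) -> list:
    """List the partition prefixes covering the window [start, end)."""
    current = start.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    end_utc = end.astimezone(timezone.utc)
    prefixes = []
    while current < end_utc:
        prefixes.append(partition_prefix(current))
        current += timedelta(hours=1)
    return prefixes


if __name__ == "__main__":
    window_start = datetime(2023, 5, 2, 22, 30, tzinfo=timezone.utc)
    window_end = datetime(2023, 5, 3, 1, 0, tzinfo=timezone.utc)
    for prefix in hourly_prefixes(window_start, window_end):
        print(prefix)
```

Because each hour lives under its own prefix, an analytics job can skip every object outside its time window instead of scanning the whole bucket.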

Our brute-force glue job runs daily. It analyzes all the login events over a given timeframe, both failed attempts and successful logins. Using our proprietary algorithms, it creates an up-to-date list of the IPs used worldwide to launch brute-force attacks.
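
The actual detection algorithms are proprietary and are not described here. Purely as an illustration of the kind of aggregation such a job might perform, the following Python sketch flags IPs that produce many failed logins across many distinct accounts with very few successes. The LoginEvent fields, the function name, and every threshold are hypothetical and are not taken from the product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    """Hypothetical minimal shape of a login event."""
    ip: str
    user: str
    success: bool


def suspected_brute_force_ips(
    events,
    min_failures=20,          # illustrative threshold, not a product value
    min_distinct_users=5,     # illustrative threshold, not a product value
    max_success_ratio=0.1,    # illustrative threshold, not a product value
):
    """Return the set of IPs whose login pattern looks like brute forcing."""
    total = defaultdict(int)
    failures = defaultdict(int)
    users = defaultdict(set)

    # Single pass over the events: aggregate counts per source IP.
    for event in events:
        total[event.ip] += 1
        users[event.ip].add(event.user)
        if not event.success:
            failures[event.ip] += 1

    flagged = set()
    for ip, count in total.items():
        failed = failures[ip]
        success_ratio = (count - failed) / count
        if (
            failed >= min_failures
            and len(users[ip]) >= min_distinct_users
            and success_ratio <= max_success_ratio
        ):
            flagged.add(ip)
    return flagged
```

A simple heuristic like this would be applied to both failed and successful logins, as the job described above does, so that an office IP with a handful of mistyped passwords is not confused with an attacker spraying many accounts.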

As you can imagine, analyzing more than 40 billion login events can be a slow and expensive process. But because we applied the big data persistence best practices described above, this job runs in just minutes and at low cost.

The output from the job is saved in S3 and loaded to the memory of the real-time detector service, which uses it for each newly ingested login.
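
The in-memory lookup can be sketched in a few lines of Python. This is a simplified illustration, not the service's implementation: it assumes the job output is a text file with one IP address per line, that only successful logins from a listed IP trigger remediation, and that the class and method names are free to choose.

```python
import ipaddress


def parse_ip_list(lines):
    """Parse one IP per line into a normalized, immutable set.

    The one-IP-per-line file format is an assumption for this sketch.
    """
    ips = set()
    for line in lines:
        text = line.strip()
        if text:
            # Normalizing makes equivalent IPv6 spellings compare equal.
            ips.add(str(ipaddress.ip_address(text)))
    return frozenset(ips)


class BruteForceDetector:
    """Holds the latest malicious IP list in memory for fast lookups."""

    def __init__(self, malicious_ips=()):
        self._ips = frozenset(malicious_ips)

    def reload(self, malicious_ips):
        """Swap in a freshly produced list, e.g. after the daily job."""
        self._ips = frozenset(malicious_ips)

    def should_remediate(self, ip, success):
        """Return True when a successful login came from a listed IP."""
        normalized = str(ipaddress.ip_address(ip))
        return success and normalized in self._ips


if __name__ == "__main__":
    detector = BruteForceDetector(parse_ip_list(["203.0.113.7\n"]))
    print(detector.should_remediate("203.0.113.7", success=True))
```

A set lookup takes constant time on average, so checking every newly ingested login against the list adds negligible latency to the real-time path.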

The data value chain

Our brute-force detection architecture is a perfect example of the data value chain in action. The value of an individual data item—in our case, a login event—is at its peak when analyzed in near-real time. When a malicious actor breaches a client’s account, our product detects the activity within minutes. The attacker is kicked out of the session instantaneously, which minimizes the damage they can cause.

Without such protection, bad actors would quickly expand their grasp on the client’s system, increasing financial losses. Sophisticated attackers create hidden entry points for themselves; they can infiltrate the victim to such an extent that resetting the session no longer suffices. Every second lost reduces the value of protection and increases the potential harm. That’s why we designed this mechanism to respond as close to real time as possible.

Figure: The Big Data value continuum.

But how did we create the brute-force IPs list in the first place?

This is where the value of data in aggregate shines. Our security analysts determined the optimal span of data needed for this type of detector. On Day One, when we started accumulating data, we had less than 1% of the ideal amount. Every day we ingested more and more data. As the aggregate grew, our daily brute-force glue job produced better and better results, covering ever more malicious IPs and improving the efficacy of the list.

Eventually, we reached the sweet spot—the pinnacle of the aggregated data value.

Proofpoint CASB contains multiple systems for detecting malicious activity and cyber crime in the cloud. We strive to run them as close to real time as possible to provide the best protection for our customers.

To learn more about how Proofpoint CASB safeguards Microsoft 365 and other cloud platforms, visit our Proofpoint CASB solution page.

About the author

Guy Sela is a senior staff engineer and architect on the CASB team at Proofpoint. He has worked in the software industry for more than 18 years in various domains. He even founded his own company for poker-training software. Guy lives in Austin, Texas, with his wife, Alejandra, and their child, Oz.
