Gravwell Blog

Grouping Related Entries with the Transaction Module

Written by Fritz | Apr 1, 2021 3:36:00 PM

In today’s post, we’ll give a short overview of the transaction module introduced in our most recent update, Gravwell 4.1.5. The transaction module is a powerful tool that can rewrite individual entries into grouped entries based on any number of keys. In other words, it lets you collate entries based on given criteria.

A transaction is a series of entries that are related somehow. For example, a user browsing a website will generate several entries in the webserver’s log, but the entries as a whole represent the user’s browsing session to that website. Similarly, the file I/O, syscalls, network operations, etc. that a process makes on a Windows machine make up the runtime activity for that process. Even binary data, something Gravwell can easily work with, can be represented as a series of individual entries. 

The transaction module simplifies the organization of transactions by grouping individual entries into single entries. In a query, you simply provide one or more “keys”, enumerated values (EVs) that represent the grouping criteria, and the transaction module does the rest. For example, to group all actions in Windows event logs by user, we can run this query:

 

tag=sysmon winlog Provider=="Microsoft-Windows-Sysmon" EventID==1 User OriginalFileName
| transaction -e OriginalFileName User
| table User Computer transaction

 

Most of the query is concerned with extracting Sysmon data, but note how we’ve inserted the transaction module on the second line to group the “OriginalFileName” EV for each unique “User” EV.
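To make the grouping idea concrete, here is a minimal Python sketch of what keying on “User” and collecting “OriginalFileName” amounts to. It is a conceptual model only, not how Gravwell implements the module: the sample entries, the `group_by_key` helper, and the newline used to join records are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample entries. The field names mirror the EVs in the query
# above, but the values are made up for illustration.
entries = [
    {"User": "CORP\\alice", "OriginalFileName": "cmd.exe"},
    {"User": "CORP\\bob", "OriginalFileName": "powershell.exe"},
    {"User": "CORP\\alice", "OriginalFileName": "whoami.exe"},
]


def group_by_key(entries, key, field, record_sep="\n"):
    """Collect `field` from every entry that shares the same `key` value."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[key]].append(entry[field])
    # One output row per unique key, with the collected values joined together.
    # The record separator is an assumption of this sketch.
    return {k: record_sep.join(values) for k, values in groups.items()}


transactions = group_by_key(entries, key="User", field="OriginalFileName")
for user, transaction in transactions.items():
    print(user, "->", transaction.replace("\n", ", "))
```

Three input entries become two output rows, one per user, which is the same shape the table in the query above displays.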

 

 

Let’s take a deeper look at two examples: Apache webserver logs and SMTP sessions from binary data.

Example: Viewing user sessions in Apache logs

The core use case for the transaction module is collating textual log entries. Apache webserver logs, like many other log formats, have a discrete entry for every single event that occurs. That means a user who visits a website and follows several links will generate at least as many log entries. Meanwhile, other users may also be navigating and generating log entries. The result is a jumble of log entries that seemingly aren’t related:

 

x.x.x.x - - [17/Mar/2021:14:30:53 +0000] "GET /search/namedfields/namedfields.md HTTP/1.1" 200 7762 "https://docs.gravwell.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36"

 

x.x.x.x - - [17/Mar/2021:14:14:31 +0000] "HEAD /api/search HTTP/1.1" 200 290 "https://docs.gravwell.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"

 

x.x.x.x - - [17/Mar/2021:12:35:26 +0000] "GET / HTTP/1.0" 200 344495 "https://docs.gravwell.io/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"

 

The log entries above are real Apache logs for docs.gravwell.io, with IP addresses removed for anonymity. We can see that it’s not so easy to make sense of what individual users are doing. We could narrow a search down to a single IP address by inserting a filter in our query:

 

tag=apache ax IP==1.2.3.4 | table

 

 

This works just fine for drilling down into a single IP address, but what if we want to look at the access patterns for more users? Enter the transaction module. 

Let’s take a look at our Apache logs again:

 

tag=apache ax | table

 

This query shows us that our autoextractor produces IP, Host, Message, and Timestamp fields. Let’s use the transaction module to group by IP address and display the Timestamp and Message fields. We’ll even get fancy and tell the transaction module to separate fields with “ -- ”. Presentation counts! 

 

tag=apache ax Host~docs.gravwell.io | transaction -e Timestamp -e Message -fsep " -- " IP | table IP transaction

 

By default, the transaction module creates a new EV named “transaction” (this can be changed with the -o flag), so we simply table the output of “IP” and “transaction”:

 

 

Immediately we can see the typical browsing patterns of readers of docs.gravwell.io. Some stick around and read everything; most just come for something specific. This same simple approach can be used for all sorts of logs. 
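For readers who want to see the mechanics outside of Gravwell, the following Python sketch mimics the query above: it pulls an IP, timestamp, and request line out of each log line, joins the two fields with “ -- ”, and groups the results by IP. Everything here is an assumption for illustration: the regular expression is a simplified stand-in for the autoextractor, the request line stands in for the Message field, the IP addresses come from documentation ranges, and records are joined with a newline.

```python
import re
from collections import defaultdict

# Simplified pattern for the Apache combined log format. It is a stand-in for
# the autoextractor used in the query, not a copy of its definition.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] "(?P<message>[^"]*)"'
)

# Hypothetical log lines using documentation-range IP addresses.
log_lines = [
    '192.0.2.10 - - [17/Mar/2021:14:30:53 +0000] "GET /search/namedfields/namedfields.md HTTP/1.1" 200 7762 "https://docs.gravwell.io/" "Mozilla/5.0"',
    '198.51.100.7 - - [17/Mar/2021:14:30:55 +0000] "HEAD /api/search HTTP/1.1" 200 290 "https://docs.gravwell.io/" "Mozilla/5.0"',
    '192.0.2.10 - - [17/Mar/2021:14:31:20 +0000] "GET / HTTP/1.0" 200 344495 "https://docs.gravwell.io/" "Mozilla/5.0"',
]


def sessions_by_ip(lines, field_sep=" -- ", record_sep="\n"):
    """Group timestamp and request line by client IP, one string per IP."""
    sessions = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # ignore lines that do not look like access-log entries
        record = field_sep.join([match["timestamp"], match["message"]])
        sessions[match["ip"]].append(record)
    return {ip: record_sep.join(records) for ip, records in sessions.items()}


sessions = sessions_by_ip(log_lines)
for ip, transaction in sessions.items():
    print(ip)
    print(transaction)
```

The field separator sits between values within one record, while the record separator sits between records in the same group, which is why the two are kept as separate parameters.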

Example: Reconstructing SMTP sessions directly from PCAP

In this example, we’ll extract the plaintext payloads of email (SMTP) packets from PCAP into transactions. “Reading” a transaction from PCAP is at best a form of torture, so the transaction module is especially useful here.

Reconstructing plaintext in PCAP is easy with transaction. First, we’ll extract from PCAP all IPs and payloads from sessions on port 25:

 

tag=pcap packet tcp.Port==25 tcp.Payload ipv4.SrcIP ipv4.DstIP

 

Next we’ll invoke the transaction module and key on “SrcIP” and “DstIP” pairs:

 

tag=pcap packet tcp.Port==25 tcp.Payload ipv4.SrcIP ipv4.DstIP
| transaction -e Payload SrcIP DstIP | table SrcIP DstIP transaction



 

Voila! A human-readable list of email sessions! 
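As a rough model of what keying on two EVs means, the Python sketch below groups decoded payloads by the (source, destination) pair. The packets, addresses, and the `payloads_by_pair` helper are hypothetical, and the newline join is an assumption. In this sketch the key is the ordered pair, so each direction of a conversation lands in its own group; it is worth checking against your own data how that compares with what Gravwell shows.

```python
from collections import defaultdict

# Hypothetical packets: (source IP, destination IP, TCP payload bytes).
packets = [
    ("192.0.2.10", "198.51.100.25", b"EHLO client.example\r\n"),
    ("198.51.100.25", "192.0.2.10", b"250 mail.example\r\n"),
    ("192.0.2.10", "198.51.100.25", b"MAIL FROM:<alice@example.com>\r\n"),
    ("198.51.100.25", "192.0.2.10", b"250 OK\r\n"),
]


def payloads_by_pair(packets, record_sep="\n"):
    """Group decoded payloads by the ordered (source, destination) pair."""
    groups = defaultdict(list)
    for src, dst, payload in packets:
        # SMTP commands and replies are plain text; undecodable bytes are
        # replaced rather than raising an error.
        text = payload.decode("ascii", errors="replace").strip()
        groups[(src, dst)].append(text)
    return {pair: record_sep.join(lines) for pair, lines in groups.items()}


conversations = payloads_by_pair(packets)
for (src, dst), transaction in conversations.items():
    print(f"{src} -> {dst}")
    print(transaction)
```

Using a tuple as the dictionary key is the Python counterpart of listing several keys after the transaction module: entries are grouped only when every key value matches.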

The Takeaway

The transaction module is a powerful module that enables you to rewrite individual entries into grouped entries based on any number of keys. For more information about the transaction module and the other updates in version 4.1.5, one of our Gravwell Guides will be happy to show you our data fusion platform in action.