One of Gravwell's great strengths is binary ingest: you can store things like raw packets, then parse them later when you know what you want to extract. This came in handy recently when I set up IPv6 on my home network and wanted to keep an eye on who's issuing Router Advertisement (RA) messages. An RA message by itself isn't very helpful, since you just get a MAC address and an IPv6 link-local address, but with a little bit of Gravwell query magic, I was able to parse ARP packets and link the IPv6 address to an IPv4 address by way of the shared MAC address, which helps identify the machine.
Finding Router Advertisements
As a bit of background, I run Gravwell on my home router. I capture packet traffic from my network in the pcap tag. I started out by asking, "who's sending router advertisements on my network?" I'd expect to see advertisements from only one system: my router. Any other advertisements would indicate either a misconfiguration or a malicious device.
It's very easy to answer this question in Gravwell using the packet module. We tell it to show us only ICMPv6 packets of type 134 (Router Advertisement), extracting the source IPv6 address and source MAC address of each advertisement:
tag=pcap packet icmpv6.Type==134 ipv6.SrcIP eth.SrcMAC
| unique SrcIP SrcMAC
| table SrcIP SrcMAC
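To make the filter a little more concrete, here's a minimal Python sketch of the same check done by hand on a raw Ethernet frame. This is not how Gravwell's packet module is implemented; the function name parse_router_advertisement is made up for illustration, and the sketch assumes an untagged frame (no VLAN header) and an IPv6 header with no extension headers.

```python
import ipaddress
import struct

ETHERTYPE_IPV6 = 0x86DD
NEXT_HEADER_ICMPV6 = 58
ICMPV6_ROUTER_ADVERTISEMENT = 134


def parse_router_advertisement(frame: bytes):
    """Return (src_mac, src_ipv6) if the frame is an RA, else None.

    Assumes an untagged Ethernet frame and no IPv6 extension headers.
    """
    # 14-byte Ethernet header + 40-byte IPv6 header + at least the ICMPv6 type
    if len(frame) < 14 + 40 + 1:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV6:
        return None
    ipv6_header = frame[14:54]
    if ipv6_header[6] != NEXT_HEADER_ICMPV6:  # byte 6 is Next Header
        return None
    if frame[54] != ICMPV6_ROUTER_ADVERTISEMENT:  # first ICMPv6 byte is Type
        return None
    src_mac = ":".join(f"{b:02x}" for b in frame[6:12])
    src_ip = ipaddress.IPv6Address(ipv6_header[8:24])
    return src_mac, str(src_ip)
```

The returned pair corresponds to the SrcMAC and SrcIP columns in the query above.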
As I expected, the results showed advertisements from only one system.
Now the question arises: how can I be sure that's my router? If I saw multiple machines, how would I know what they were? I can of course log on to the router and verify that it has an interface with the matching MAC address. I can also search other data types in Gravwell for information about the MAC address, perhaps finding DHCP logs or other pertinent information, but it would be nice to get additional information purely from the packet captures.
Parsing ARP
Because I run a dual-stack home network, I expect most devices to have both IPv4 and IPv6 addresses. In IPv4 space, systems use ARP messages to inform each other of mappings between MAC and IPv4 addresses. Since we already have a MAC address from the IPv6 router advertisement, we can attempt to match that up against ARP messages to find an IPv4 address.
ARP is a link-level protocol with an EtherType of 0x0806. We can extract the raw ARP messages for the machine in question pretty easily with some simple filters, using the hexlify module to make the Ethernet payload a bit more readable:
tag=pcap packet eth.Type==0x0806 eth.Payload eth.SrcMAC==00:1b:78:5d:cd:8e
| hexlify Payload
| table
The Payload field contains the ARP message. We need to extract two portions of that message: the operation code, and the sender protocol address. The operation code indicates if the message is a request (opcode 1) or a reply (opcode 2); we'll want to filter down to replies only. The "sender protocol address" is, in an ARP reply, the IP address of the sender.
We can extract these two fields using the slice module. Because we're dealing with packets, we know we'll need to extract big-endian numbers. Looking at the ARP packet definition for Ethernet and IPv4, we can see that we need bytes 6 and 7 (counting from zero) to get the operation code, and bytes 14 through 17 to get the sender protocol address. We'll extract those as unsigned 16-bit and 32-bit integers, respectively, and include an eval expression to keep only replies:
tag=pcap packet eth.Type==0x0806 eth.Payload eth.SrcMAC==00:1b:78:5d:cd:8e
| slice uint16be(Payload[6:8]) as opcode
| eval opcode==2
| slice uint32be(Payload[14:18]) as ip
| table
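If the byte offsets feel abstract, the same extraction can be sketched in Python with the struct module. This is only an illustration of the slicing logic, assuming an ARP payload for Ethernet and IPv4 (28 bytes); parse_arp_payload is a hypothetical helper, not part of Gravwell.

```python
import struct

ARP_REQUEST = 1
ARP_REPLY = 2


def parse_arp_payload(payload: bytes):
    """Return (opcode, sender_ip_as_uint32) from an Ethernet/IPv4 ARP payload.

    Layout: htype[0:2] ptype[2:4] hlen[4] plen[5] oper[6:8]
            sha[8:14] spa[14:18] tha[18:24] tpa[24:28]
    """
    if len(payload) < 28:
        raise ValueError("an Ethernet/IPv4 ARP payload is at least 28 bytes")
    # ">H" is a big-endian unsigned 16-bit integer, like uint16be(Payload[6:8])
    (opcode,) = struct.unpack(">H", payload[6:8])
    # ">I" is a big-endian unsigned 32-bit integer, like uint32be(Payload[14:18])
    (sender_ip,) = struct.unpack(">I", payload[14:18])
    return opcode, sender_ip


# Hand-built ARP reply: sender 00:1b:78:5d:cd:8e at 192.168.0.1
sample = bytes.fromhex(
    "0001" "0800" "06" "04" "0002"
    "001b785dcd8e" "c0a80001"
    "000000000000" "c0a80002"
)
opcode, sender_ip = parse_arp_payload(sample)
is_reply = opcode == ARP_REPLY
```

Note that Python's slice bounds work the same way as in the query: [6:8] covers bytes 6 and 7, and [14:18] covers bytes 14 through 17.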
Representing an IP address as an integer isn't particularly useful for humans, but luckily the eval module's toIP function can convert a 32-bit integer into an IP address. In the query below, we apply that conversion, then use the unique module so we don't get repeated results, and we get a nice table out:
tag=pcap packet eth.Type==0x0806 eth.Payload eth.SrcMAC==00:1b:78:5d:cd:8e
| slice uint16be(Payload[6:8]) as opcode
| eval opcode==2
| slice uint32be(Payload[14:18]) as ip
| eval setEnum("IP", toIP(ip))
| unique IP SrcMAC
| sort by IP asc
| table SrcMAC IP
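For a feel of what the integer-to-address conversion involves, here's a small Python sketch using the standard ipaddress module. It only illustrates the idea behind the conversion, deduplication, and sort; the helper names uint32_to_ipv4 and unique_mappings are made up for this example and say nothing about how Gravwell does it internally.

```python
import ipaddress


def uint32_to_ipv4(value: int) -> str:
    """Convert a 32-bit integer into dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))


def unique_mappings(records):
    """Deduplicate (mac, ip_uint32) pairs and sort them by numeric IP."""
    ordered = sorted(set(records), key=lambda pair: pair[1])
    return [(mac, uint32_to_ipv4(ip)) for mac, ip in ordered]


rows = unique_mappings([
    ("aa:bb:cc:dd:ee:ff", 3232235527),
    ("00:1b:78:5d:cd:8e", 3232235521),
    ("00:1b:78:5d:cd:8e", 3232235521),  # duplicate, dropped
])
```

Sorting on the integer rather than the string keeps addresses in numeric order, so 192.168.0.7 lands before 192.168.0.10.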
Joining It Up: Compound Queries
The examples above show a manual process: run the first query to find router advertisements, then use the resulting MAC address in the second query to find a corresponding IPv4 address from ARP packets. That works, but it's slow. Luckily, we can use compound queries to run both steps in one query!
If we eliminate the MAC filter in the ARP parsing query, it builds a table mapping all known MAC->IP associations. We can put that into a sub-query, then use the output of the sub-query with the lookup module to fetch an IPv4 address to match a given router advertisement message:
@arp{tag=pcap packet eth.Type==0x0806 eth.Payload eth.SrcMAC
| slice uint16be(Payload[6:8]) as opcode
| eval opcode==2
| slice uint32be(Payload[14:18]) as ip
| eval setEnum("IP", toIP(ip))
| unique IP SrcMAC
| sort by IP asc
| table SrcMAC IP};
tag=pcap packet icmpv6.Type==134 ipv6.SrcIP eth.SrcMAC
| lookup -r @arp SrcMAC SrcMAC IP as IPv4
| unique SrcIP IPv4
| table SrcIP SrcMAC IPv4
Observe that we define a sub-query named "arp" which does the ARP packet processing and outputs the results with the table module. Because it runs as a sub-query, the results from table are available as an ephemeral resource named @arp for the second query, which extracts a matching IPv4 address for each router advertisement packet.
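Conceptually, the compound query is a table join keyed on the MAC address. The Python sketch below shows that idea with plain dictionaries; it is only an analogy, and the names build_arp_table and enrich_advertisements are hypothetical. The sketch keeps advertisements that have no matching ARP entry and gives them an IPv4 value of None, which is a choice made for this example.

```python
def build_arp_table(arp_rows):
    """Build a MAC -> IPv4 mapping from (mac, ipv4) rows.

    If a MAC appears more than once, the last row wins.
    """
    return {mac: ipv4 for mac, ipv4 in arp_rows}


def enrich_advertisements(ra_rows, arp_table):
    """Attach an IPv4 address to each (src_ipv6, src_mac) advertisement.

    Results are deduplicated on (src_ipv6, ipv4), keeping first-seen order.
    """
    seen = set()
    results = []
    for src_ipv6, src_mac in ra_rows:
        ipv4 = arp_table.get(src_mac)  # None when no ARP reply was seen
        key = (src_ipv6, ipv4)
        if key in seen:
            continue
        seen.add(key)
        results.append((src_ipv6, src_mac, ipv4))
    return results


arp_table = build_arp_table([
    ("00:1b:78:5d:cd:8e", "192.168.0.1"),
    ("aa:bb:cc:dd:ee:ff", "192.168.0.7"),
])
enriched = enrich_advertisements(
    [
        ("fe80::1", "00:1b:78:5d:cd:8e"),
        ("fe80::1", "00:1b:78:5d:cd:8e"),
        ("fe80::9", "11:22:33:44:55:66"),
    ],
    arp_table,
)
```

Here build_arp_table plays the role of the @arp sub-query, and the dictionary lookup stands in for the lookup module.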
Conclusion
Gravwell provides a lot of ways to parse data, but we can't cover every use-case. That's where the slice module and other low-level tools come in: they can help you extract useful information from binary formats that Gravwell doesn't explicitly support, the same way you can use the regex module to extract from arbitrary text formats. By combining the packet parser, the slice module, and compound queries, I was able to fuse IPv4 and IPv6 network traffic to gain a better understanding of what's on my network.
John's been writing Go since before it was cool and developing distributed systems for almost as long.