DNS auditing is an integral part of any IT security program. Name resolutions can act as a great tip-off for discovering malware, command and control channels, or misbehaving employees. Acquiring DNS audit data can be difficult with some DNS servers (*cough* Windows *cough*); for this post we are going to show an extremely easy method of getting DNS audit data directly into Gravwell.
For this post we deploy CoreDNS with integrated Gravwell auditing. Using the DNS audit data, we will demonstrate acquiring and updating an open source DNS threat feed and continuously monitoring for any hosts hitting known bad domains. We will then demonstrate catching and decoding a piece of malware that scans an internal network and uses DNS requests to relay the recon data out.
If you are new to Gravwell or do not have an active Gravwell installation, request a Free Trial and then view the Quick Start Guide to get the ball rolling once you have received your license.
CoreDNS is an open source, high performance, and plugin friendly DNS server implementation written in Go. The implementation is so good that it is now the default DNS server for Kubernetes. The plugin architecture is easy to understand and easy to implement, so we wrote one that integrates a Gravwell ingester directly into CoreDNS and published it. The result is a stand-alone DNS server with no external dependencies that can do a ton of cool stuff, not the least of which is sending DNS requests and responses directly to Gravwell. To learn more about CoreDNS visit their webpage at coredns.io.
For the purpose of this article we will be deploying CoreDNS as a caching DNS proxy which forwards all requests on to the Cloudflare public DNS infrastructure over an encrypted TLS connection. This deployment will allow us to audit requests made by internal machines while also securely forwarding on to infrastructure with very robust privacy promises. We also get to take advantage of encrypted DNS requests without all of our internal infrastructure having to support DNS over TLS. I use this setup at home to convert all my internal machines (including some old crappy embedded equipment) to DNS infrastructure that my ISP can’t spy on.
We are going to be building CoreDNS on an AMD64 Linux server, but the Gravwell plugin and CoreDNS support any platform that Go supports. Before attempting to build CoreDNS, make sure you have the Go toolchain installed, version 1.9 or newer.
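If you are not sure which toolchain you have, a quick check (assuming go is already on your PATH) is:

go version

Anything reporting go1.9 or newer is fine for this build.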
The basic steps for building CoreDNS with the Gravwell plugin are: fetch the CoreDNS and Gravwell plugin source, register the plugin in plugin.cfg, run go generate, and build the binary. Run the following commands to get a statically compiled CoreDNS binary.
go get github.com/coredns/coredns
go get github.com/gravwell/coredns
pushd $GOPATH/src/github.com/coredns/coredns/
sed -i 's/metadata:metadata/metadata:metadata\ngravwell:github.com\/gravwell\/coredns/g' plugin.cfg
go generate
CGO_ENABLED=0 go build -o /tmp/coredns
file /tmp/coredns
/tmp/coredns: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, with debug_info, not stripped
Let’s also validate that our plugin was built in by asking our newly built application to list all plugins:
/tmp/coredns -plugins | grep gravwell
You should see “dns.gravwell” printed on the command line.
CoreDNS is configured via a single Corefile which can be used to configure multiple listeners, plugins, forwarders, caches, and so on. We will be configuring a single listener bound to port 53. The listener will be configured to send audit data to a remote Gravwell instance and will employ the local ingester cache to ensure that we never lose DNS audit data (even if our DNS server temporarily can't talk to our Gravwell indexers). As a bit of a safety net we are limiting the local Gravwell cache to 128 MB. For more information on configuring CoreDNS visit the documentation at coredns.io.
Let’s start by just showing our Corefile:
.:53 {
    forward . tls://1.1.1.1
    errors stdout
    bind 172.17.0.3
    cache 600s
    gravwell {
        Ingest-Secret _your_ingest_secret_goes_here
        Cleartext-Target 172.17.0.1:4023
        Tag dns
        Encoding json
        Max-Cache-Size-MB 128
        Ingest-Cache-Path /opt/gravwell/cache/coredns.cache
    }
}
We have specified a single listener which accepts all requests on UDP port 53. Within that listener we have declared a forwarder which forwards all requests to 1.1.1.1 over TLS, binds to address 172.17.0.3, caches responses for 600 seconds, and relays everything to a Gravwell indexer at 172.17.0.1 with the tag dns. There are a few different encodings available, but for most of this post we will be using the JSON encoding. The full list of configuration options available for the Gravwell plugin, along with many examples, is available in the plugin repository at github.com/gravwell/coredns.
CoreDNS expects the Corefile to be in the current working directory by default, but you can specify any file using the -conf configuration flag. For example, we could use the Corefile located in /tmp/ by invoking CoreDNS as follows:
./coredns -conf /tmp/Corefile
It is important to note that port 53 is a privileged port, meaning that Linux will prevent non-root users from binding to it. If you would like to run CoreDNS as a non-root user, grant the binary the appropriate capability using the setcap command; in this case that is the cap_net_bind_service capability.
setcap cap_net_bind_service=+ep ./coredns
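You can confirm that the capability stuck using getcap (part of the same libcap tool set as setcap):

getcap ./coredns

The output should list cap_net_bind_service on the binary.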
Once we have fired up our CoreDNS server we need to check that the ingester connected. Check the list of ingesters on your Gravwell deployment by clicking System Status and then Remote Ingesters.
You should see an ingester connected and providing the dns tag. If you do not see the ingester, verify a few things: that the Ingest-Secret in the Corefile matches your indexer's ingest secret, that the Cleartext-Target address and port point at a reachable indexer, and that no firewall between the two hosts is blocking the connection.
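A couple of quick checks from the CoreDNS host can also save some head scratching; these assume standard Linux networking tools and the addresses from our example Corefile:

# confirm CoreDNS is up and bound to UDP port 53
ss -lun | grep :53
# confirm the indexer's cleartext ingest port is reachable
nc -vz 172.17.0.1 4023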
We can also check on the status of the DNS ingester by querying the gravwell tag and filtering for the coredns ingester. By default the Gravwell plugin is configured with a Log-Level of INFO which means it will send an entry to the gravwell tag every time it connects.
tag=gravwell grep coredns
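If those internal log entries are too chatty (or not chatty enough), the plugin accepts a Log-Level option in its Corefile block; a minimal sketch, assuming the option name matches the default described above:

gravwell {
    Ingest-Secret _your_ingest_secret_goes_here
    Cleartext-Target 172.17.0.1:4023
    Tag dns
    Log-Level INFO
}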
If all goes well we should now be able to query our CoreDNS application and see the requests under the dns tag. To test, we will use the dig command to query for specific domain names.
# dig google.com @172.17.0.3
; <<>> DiG 9.11.3-1ubuntu1.1-Ubuntu <<>> google.com @172.17.0.3
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27313
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 147 IN A 216.58.193.78
;; Query time: 31 msec
;; SERVER: 172.17.0.3#53(172.17.0.3)
;; WHEN: Fri Jul 13 17:13:44 MDT 2018
;; MSG SIZE rcvd: 65
We can then check our Gravwell instance to ensure that the query was logged:
tag=dns
We should see a single log event encoded in JSON that represents the request AND response:
{"TS":"2018-07-13T23:13:44.659795524Z","Proto":"udp","Local":"172.17.0.3:53","Remote":"172.17.0.1:35452","Question":{"Hdr":{"Name":"google.com.","Rrtype":1,"Class":1,"Ttl":147,"Rdlength":4},"A":"216.58.193.78"}}
Using the Gravwell json module we can extract a few fields to get a slightly cleaner view:
tag=dns json Proto Local Remote Answer.Hdr.Name Answer.A |
table Local Remote Proto Name A
You may notice the trailing dot on the end of domain names; that is part of the Domain Name System standard and is correct. That dot represents the root of the DNS hierarchy, the START HERE if you will. Most tools and applications hide that dot because it is implied, but CoreDNS, being a proper DNS implementation, leaves it there. If that dot bothers you, or you need to remove it for comparison with threat feeds, you can use the slice module to chop off the last character of the Name enumerated value.
tag=dns json Proto Local Remote Answer.Hdr.Name Answer.A |
slice Name[:-1] | table Local Remote Proto Name A
If we use the dig command to query an invalid domain we can see what a query looks like when the name cannot be resolved. We will be asking for the domain name totally.not.good.domain.me.
{"TS":"2018-07-16T16:23:13.013297133Z","Proto":"udp","Local":"172.17.0.3:53","Remote":"172.17.0.1:39790","Question":{"Hdr":{"Name":"totally.not.good.domain.me.","Qtype":1,"Qclass":1}}}
The structure is nearly identical, except that we don’t get a Ttl or Rdlength in the Hdr structure and the A field is missing. Using the require module we can query for domains that failed to resolve.
tag=dns json Question.Hdr.Name Question.A | require -v A
CoreDNS is one of those services that should probably be registered with systemd (or another service manager) so that it comes up at boot and is automatically restarted should it crash or fail. Here is a simple systemd unit file that runs CoreDNS as the user nobody under the group nogroup. We run the service under these extremely limited credentials so that any failures or compromises (the CoreDNS code is pretty clean, so we aren’t very worried) don’t result in additional compromises. Remember, however, that you must set the bind capability on the binary.
[Unit]
Description=CoreDNS server
After=network.target
[Service]
Type=simple
WorkingDirectory=/opt/coredns
ExecStart=/opt/coredns/coredns
User=nobody
Group=nogroup
Restart=always
PIDFile=/var/run/coredns.pid
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target
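Installing the binary, Corefile, and unit file and then enabling the service looks something like the following (the paths match the unit file and Corefile above, and the coredns.service file name is just our choice; run as root):

mkdir -p /opt/coredns /opt/gravwell/cache
cp /tmp/coredns Corefile /opt/coredns/
chown -R nobody:nogroup /opt/gravwell/cache
setcap cap_net_bind_service=+ep /opt/coredns/coredns
cp coredns.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable coredns
systemctl start coredns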
Auditing DNS can be accomplished a few different ways. There are threat lists, domain scoring, and basic anomaly detection. Threat lists provide a concrete set of domains that have been identified as bad. Domain scoring takes a softer approach, attempting to assign each domain a score that indicates its “trustworthiness” through a variety of methods. Using DNS scores you can then build up a score on the behavior of a machine (e.g. weight requesters by the cumulative weight of the domains they are requesting). Anomaly detection for DNS traffic is similar to anomaly detection on any other traffic or logs: establish a baseline and find anything that falls sufficiently outside of it. Basic mathematical analysis of query rates, domain name lengths, domain name entropy, etc. can all be effective with some rigorous tuning.
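To make the entropy idea concrete, here is a minimal Go sketch (ours, not part of CoreDNS or Gravwell) that computes the Shannon entropy of a few hypothetical query names; a real detection would aggregate these per requesting host and compare against a baseline:

package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	runes := []rune(s)
	counts := make(map[rune]float64)
	for _, r := range runes {
		counts[r]++
	}
	n := float64(len(runes))
	var h float64
	for _, c := range counts {
		p := c / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	// Hypothetical query names; random-looking labels score noticeably higher.
	names := []string{
		"google.com",
		"mail.example.com",
		"qv3jz0a8kq0pxr7t.example.com",
	}
	for _, name := range names {
		fmt.Printf("%-30s len=%2d entropy=%.2f\n", name, len(name), shannonEntropy(name))
	}
}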
However, be forewarned that some applications can do some very wonky things in order to detect misbehaving DNS servers. Chrome, for example, attempts to detect DNS servers that redirect non-existent domain names by requesting gibberish domain names and checking for a response (the Chromium project documents this as its intranet redirect detector). We have seen more than a few system administrators scream about world-ending malware due to this behavior, when in reality Google was just attempting to prevent users from landing on ad pages due to mistyped names. To see some of this first hand we can invoke our entropy module (new for version 2.2.0) to look at the entropy of DNS name resolution requests. We will see some Google Chrome browsers throwing a whole slew of DNS requests at <randomstring>.<roothostdomain>. The entropy on these requests will be higher than normal because the browser is quite literally generating random strings.
tag=dns json Local Remote Question.Hdr.Name | slice Name[:-1] |
regex -e Name "^(?P<head>[^\.]+)" |
regex -e Name "(?P<root>[^\.]+\.[^\.]+)$" |
entropy head by root | table root entropy
You might be saying “hey, the entropy wasn’t THAT much higher” and you would be correct. That is because many CDN-backed services and ad networks use DNS trickery to ensure they have control over how requests are routed to their infrastructure. When it comes to DNS, the tactics and techniques used by attackers and ad networks are not all that different from those employed by defenders. All of the root domains with the highest entropy values are ad networks and CDN-like services.
DNS blacklists provide a rather simple way to quickly audit a DNS feed for potential malware, command and control, or misbehaving employees. Many companies have been built around DNS threat lists and there are a variety of great commercial domain threat feeds. The downside to blacklists is that they are only as good as the intelligence and processes that built them, and they aren’t much use against highly transient domain names.
We will be using the domain threat feed from malwaredomains.com for this post; malwaredomains.com prohibits the usage of their threat feeds for commercial purposes, so if you are using Gravwell in a commercial setting you may need to contact malwaredomains for permission or use an alternate feed. We will pull the blacklist, massage it a bit, and push it into Gravwell as a lookup table resource. Once we have the resource in place we can use it to identify any requests that hit the blacklist, then take a few steps to figure out what else the offending machine did around the time of the offending request.
We will demonstrate a script that automatically updates a DNS threat feed resource; then we will build a script that looks for hosts resolving bad domain names and performs a few investigative steps before calling in the troops. We are going to use the orchestration functionality in Gravwell to maintain an up-to-date resource. Just because the orchestration engine can run queries doesn’t mean that is all it can do!
A word of warning: malwaredomains is a non-profit, free-to-use list, but infrastructure isn’t free. Please do not schedule your script to update too often; once every 5 days seems just fine. If you need a more up-to-date list, consider donating to malwaredomains or sponsoring their page. If you are subscribed to a commercial threat feed, or are a threat feed provider and want to integrate with Gravwell, send us a note at info@gravwell.io.
The first step in preparing our threat feed resource is getting the list: we will directly download a tab-delimited text list via an HTTP request, convert it to a CSV, and upload it as a resource. Using the scripting support in Gravwell’s orchestration engine we will schedule it to run every 5 days, so Gravwell will automatically maintain the DNS threat feed. Here is the script:
var strings = import("strings")
var csv = import("encoding/csv")

dlurl = `http://mirror2.malwaredomains.com/files/domains.txt`
data, err = httpGet(dlurl)
if err != nil {
	return err
}
bldr = csv.NewBuilder()
err = bldr.WriteHeaders(["domain", "category", "provider"])
if err != nil {
	return err
}
lines = strings.Split(data, "\n")
for line in lines {
	line = strings.TrimSpace(line)
	if strings.HasPrefix(line, "#") {
		continue # skip comments
	}
	flds = strings.Split(line, "\t")
	if len(flds) < 3 {
		continue # skip incomplete lines
	}
	bldr.Write(flds[0:3])
}
bts, err = bldr.Flush()
if err != nil {
	return err
}
err = setResource("dnsblacklist", bts)
if err != nil {
	return err
}
Use the following schedule string which conforms to the cron specification to schedule the orchestration script to run every five days at midnight:
0 0 */5 * *
NOTE: PLEASE do not hammer the malwaredomains mirrors, they are good people providing a good service for free. Use responsibly.
Now that we have a working blacklist that automatically updates, let’s fire off a few searches that will identify any machines that attempt to resolve domains from our known bad list:
tag=dns json Local Remote Question.Hdr.Name | slice Name[:-1] |
lookup -s -r dnsblacklist Name domain category as cat |
table Name cat Local Remote
Looks like a few of the domains from our threat feed have been hit, but only one resulted in a successful resolution. The host at 10.10.10.59 attempted to resolve a known bad domain a few times, eventually succeeding. We can grab the resolved address and pivot into netflow to see whether the machine actually communicated with that host:
tag=netflow netflow IP==10.10.10.59 Src Dst SrcPort DstPort Bytes |
sum Bytes by Src Dst | fdg -b -v sum Src Dst
We can see that our candidate machine spoke to a few other addresses around the time it attempted to resolve the known bad domain. Given that we have a time range and some additional hosts, it might be time to take a look at our Windows event logging post and see if we can narrow down exactly which application resolved the address. If it was a binary executing in a non-standard place, the SwiftOnSecurity rules might just catch it.
Now that we have a basic understanding of deploying a Gravwell and CoreDNS configuration, let’s go hunt a piece of malware that uses DNS resolution as a form of data exfiltration. Many organizations deploy firewalls, proxies, and security tools designed to control access to the outside world, but DNS almost always makes it out. Malware authors looking to get small bits of information out of a well-controlled network can usually rely on DNS.
Using the DNS list we are going to create a script that looks for DNS hits on our blacklist, then gathers some additional context from other sources before alerting an operator. The goal here is to provide as much context as possible before consuming a human’s time. The scripting system allows us to do pretty much whatever we want, including taking corrective action; Gravwell is essentially an orchestration platform as well as an analytics platform. If you are confident in the indicators and data, it is entirely possible to kick BobFromAccounting off the network immediately rather than waiting for IT to verify that he was downloading malicious apps again.
Our orchestration script is going to begin with a hit on our DNS blacklist; anomaly detection in DNS is extremely hard and requires a great deal of resources. There are DNS threat providers out there and some of them are pretty good. If you have access to a more elaborate DNS threat feed, we are happy to help integrate it. For this article we are just going to look for known bad name hits to generate a list of machines that require follow-up investigation. The tip-off query looks like:
tag=dns json Local Remote Question.Hdr.Name Question.A |
slice Name[:-1] |
lookup -s -r dnsblacklist Name domain category as cat |
regex -e Remote "(?P<ip>[\d\.]+):\d+$" |
unique Name, cat, ip
Similar to the previous query we are pulling out DNS resolution requests and removing the trailing dot. We then check the domain name against our threat feed and if there are any matches we clean up the Remote field (this is the machine that made the request). The final module is unique which ensures we only see requests with a unique intersection of DNS name, threat category, and requesting IP; basically a clean list without duplicate requests.
Notice that we are not passing the results into a renderer; we are going to be using the output of this query in a script so we want the raw entry with the attached enumerated values. The raw/text renderers only show you the underlying raw record via the GUI, but when interacting with the renderer via the orchestration engine we get all the enumerated values as well.
Once we have a list of IPs and bad domain names we are going to take a few more steps so that when we finally involve a human operator, they can quickly move on to more complex investigations and/or remediation.
The first query we are going to perform once we get a hit on a malicious lookup looks for network flows associated with the resolved address. For this example we are going to craft a new query for each resolved address and look for any other machine that hit it. From a security context, you can’t assume that malware will always perform DNS lookups. If the malware is designed to move laterally and pass information, it might perform a lookup on a sacrificial host and pass the resolved addresses on to other hosts it has infected. If you are only looking at hosts that performed a name resolution, you may miss this lateral movement. We are using netflow to look at network flows, but ipfix, bro, or sflow work too. The base query looks like so:
tag=netflow netflow IP Port Src Dst SrcPort DstPort Bytes |
unique Src Dst Port | table Src Dst SrcPort DstPort Bytes
For our script we craft the query using sprintf so that we can insert the address as a filter:
query = sprintf("tag=netflow netflow IP==%s Port Src Dst SrcPort
DstPort Bytes | unique Src Dst Port |
table Src Dst SrcPort DstPort Bytes", addr)
The second follow-on query will look for any Windows logs that show unexpected network connections to the suspect IP (the address we resolved). Because we are paranoid administrators and Gravwell licenses are unlimited, we can collect everything we want, so we are running the Sysmon tool with the SwiftOnSecurity rule set as well. You can read more about the tools and setting them up with Gravwell in our Windows post.
We are going to be looking for any applications that may have executed in abnormal locations and reached out to the resolved IP. The query looks for EventID 3 from the Microsoft-Windows-Sysmon provider, which is the Network Connection event type. We also extract the Computer, DestinationIp, Image, and User data fields so that we can match them against the target IP and hopefully blame someone. The query looks like so:
tag=windows xml
The constructed query in the script appends an eval destip==<TARGET IP> filter so that only network connections to the resolved IP are returned.
Now that we have a few steps and follow-on queries we can tie the whole thing together in an orchestration script and schedule it. For this example we are going to run the script every hour and query the last 90 minutes. We add a little overlap so that if any of our sensors have a bit of clock drift we will still include them; the overlap also helps mitigate the situation where some series of events perfectly spans our search window.
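An hourly run in the same cron format we used for the threat feed script looks like:

0 * * * *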
Our orchestration script basically performs the following steps: run the blacklist tip-off query over the last 90 minutes, then for each unique hit run the netflow follow-on query and the Windows Sysmon follow-on query against the resolved address, and finally assemble the results into a report and email it to an operator.
If you want to see the entire script, check out the addendum, or just download it.
Once we installed our orchestration script and let it run for a bit, we got an email indicating that someone hit our DNS threat list. The email contained the following report:
SUBJECT Gravwell Orchestrated Search DNS: 1 Hits
Gravwell Orchestration Engine DNS hits
We saw 1 unique hosts hit the DNS blacklist.
Bad Domain: 10086hyl.com
Timestamp: 2018-07-20T16:59:13.945839875Z
Requesting Host 10.10.10.23:
Reason: malicious
Resolved Address: 104.149.79.79
2 flows to 104.149.79.79 totaling 8930 bytes
10.10.10.23:58743 -> 104.149.79.79:80 1655
104.149.79.79:80 -> 10.10.10.23:58743 7275
Windows Log Hits on 104.149.79.79
Image: C:\Windows\Temp\SuperFuntimeScreensaver.exe Host: AccntPC User: Bob
The report makes it painfully obvious that BobFromAccounting has been downloading screensavers on the accounting PC again. The report tells us that an application called SuperFuntimeScreensaver.exe located in C:\Windows\Temp reached out to the address that was resolved from the malicious domain. The application appears to have made an HTTP request and transferred about 7.2KB. Guess it’s time to dig a little deeper.
Our orchestration script kicked off an email alert letting us know that BobFromAccounting is at it again. The orchestration engine took the first couple of steps for us, which means we get to start at step 3 instead of step 1. The first thing you might ask is “what did SuperFuntimeScreensaver do on that HTTP connection?” Well, we are in luck: because Gravwell handles binary data, we have that network session. Let’s kick off another query to see what HTTP requests were made.
tag=pcap packet ipv4.SrcIP==10.10.10.23
ipv4.DstIP==104.149.79.79 tcp.Port==80 tcp.Payload |
regex -e Payload "(?P<method>[A-Z]+)\s(?P<url>\S+)\sHTTP\/" |
table method url
We are going to look at pcap data and use the packet module to filter for packets that match the flows we saw earlier. This allows us to perform some deep packet inspection to pull data out of the body of the packets. For this query we are looking to see what URLs and methods were used. If we had a proxy with logs, we could just as easily pull the method and URL from there.
The results indicate that the application is pulling back a file named config.json from a very suspect directory. And strangely, it requests the same URL multiple times in rapid succession. Let’s swap the query around a little and see if we can see what was actually retrieved from the server:
tag=pcap packet ipv4.DstIP==10.10.10.23
ipv4.SrcIP==104.149.79.79 tcp.Port==80 tcp.Payload |
strings Payload | table Payload
Looks like the server didn’t have the configuration data the malware was looking for and the application just kept asking. At least the IIS server is wildly out of date, so they have that going for them, which is nice…
So the malware couldn’t pull back its config file, did it do anything else? Let’s go look at those domain queries again, but this time we will widen up our scope a bit. Let’s look at any request with the malicious domain “10086hyl.com” in it:
tag=dns json Local Remote Question.Hdr.Name ~ "10086hyl.com"
Question.A |
table Remote Name A
The results are… um… interesting?
It looks like it got a successful resolution on the root domain, then just kept hammering sub-domains with really long, seemingly random strings made up of letters and numbers. It appears that the sub-domains are encoded data, and the encoding looks suspiciously like base64. As luck would have it, Gravwell has a base64 module; let’s extract the subdomains and decode them to see if anything looks interesting.
tag=dns json Local Remote Question.Hdr.Name ~ "10086hyl.com"
Question.A | unique Name A |
regex -e Name "^(?P<sub>[A-Za-z0-9]+)\." |
base64 -d -t decode sub | table Remote sub decode
WHAT THE HELL!
That malware was scanning my internal network and sending back scan results via DNS queries. It was even pulling back banners! I am guessing that when the application couldn’t get the response it was looking for from the web request, it fell back to using DNS for exfiltration. BobFromAccounting is going to be getting a very, very stern talking-to!
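To make the technique concrete, here is a small Go sketch (purely illustrative, not the actual malware code) of how a line of recon output can be base64 encoded into a subdomain label and then decoded back out on the analysis side, which is essentially what the base64 module did for us above; real samples vary in their encoding alphabet, padding, and label chunking:

package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

func main() {
	// Pretend this is a line of recon output the malware wants to exfiltrate.
	recon := "10.10.10.5:22 open OpenSSH_7.6"

	// Encode it as a single subdomain label; DNS labels are capped at 63 bytes,
	// so longer payloads would have to be split across multiple labels or queries.
	label := base64.RawURLEncoding.EncodeToString([]byte(recon))
	query := label + ".10086hyl.com"
	fmt.Println("exfil query:", query)

	// Analysis side: strip everything after the first dot and decode the label.
	sub := strings.SplitN(query, ".", 2)[0]
	decoded, err := base64.RawURLEncoding.DecodeString(sub)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("decoded:", string(decoded))
}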
For this post we showed how to integrate Gravwell with CoreDNS to get a performant, high quality DNS server that knows how to talk directly to Gravwell. We then used Gravwell’s orchestration engine to automatically manage a DNS threat feed. Using our DNS data and the self-managing threat feed we found hosts on our network that were resolving known bad domains, performed some follow-on investigation, and alerted operators, all automatically. We then leveraged the extremely flexible query system to decode data exfiltration carried out over DNS requests. Unfortunately, Gravwell couldn’t be used to yell at BobFromAccounting, or could it....
Gravwell provides extensive flexibility in how data is ingested, queried, and responded to. Our high performance ingest, query, and storage system allows you to reduce SIEM and analytics costs while the orchestration engine enables rapid response and reduced load on staff. Gravwell is here to help you see more without surprise licensing costs or time intensive setup and data normalization.
If you are interested in seeing the power that Gravwell can provide for your security teams, contact us at info@gravwell.io for more information or visit the Gravwell trial page to request a free trial. Home users can also check out Gravwell Community Edition to get a free Community Edition license.