This year, Gravwell once again participated in the SCinet effort at the ACM/IEEE Supercomputing conference. As in 2018 and 2021, we deployed a Gravwell cluster to aggregate logs and sensor data from the world's fastest network. Gravwell employee John Floren (that's me!) joined the Network Security team to hunt for malicious activity and monitor performance for the thousands of exhibitors and attendees.
In previous years, the Gravwell cluster ran remotely on the Texas Advanced Computing Center's infrastructure, but this year we were on-premises with nodes provided by Cornelis Networks! Although TACC worked great in the past, it was really cool to see Gravwell actually in the rack on the NOC stage!
We ingested several terabytes of data over the course of the event. The majority was sensor data from Corelight, Juniper, and Palo Alto devices, which were listening to network taps and generating logs about traffic. Much smaller in volume but equally important were the syslog messages aggregated from the various systems that make up the SCinet infrastructure (one of our first actions was notifying another team that one of their servers had been complaining about a full disk since day one).
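For anyone curious what that kind of triage looks like, a search along these lines will surface the noisy hosts; the tag name, syslog fields, and message text here are illustrative (the exact wording depends on what the server actually logs):

    tag=syslog syslog Hostname Appname Message
    | grep -e Message "No space left on device"
    | count by Hostname
    | table Hostname count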
The Corelight data was routed into Gravwell via the Simple Relay ingester. We configured the Corelight devices to export their logs via JSON over direct TCP connections (the most efficient way, in our experience) and then used our Corelight preprocessor to convert the JSON to the more traditional TSV Zeek format. Once we had it in TSV, our existing Zeek kit worked great and gave us super efficient overviews of the network traffic. Similarly, we were able to deploy our Palo Alto kit and get immediate value from the logs the Palo Alto device was sending us!
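For reference, the Simple Relay side of that pipeline is just a listener with the corelight preprocessor attached. The sketch below is abbreviated and from memory rather than our exact config, so check the port, tag, and option names against the Simple Relay and preprocessor documentation:

    [Listener "corelight"]
        Bind-String = "0.0.0.0:7777"    # TCP port the Corelight sensors export JSON to (illustrative)
        Tag-Name = corelight            # fallback tag; the preprocessor re-tags entries by log type
        Preprocessor = "corelight"

    [Preprocessor "corelight"]
        Type = corelight
        Prefix = "zeek"                 # e.g. conn logs land in tag zeekconn, dns logs in zeekdns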
Although there were thousands of attendees, it seems that people in HPC are relatively well-behaved. We set up automated alerts (using Gravwell Flows) to warn us about potentially malicious behavior, mainly focused on people doing bad things from within SCinet. The most common finding? Badly configured automated SSH jobs, reaching out to one or two servers every few minutes and failing to authenticate; combine that with the occasional manual SSH session and the pattern looks an awful lot like a brute-forcer who eventually succeeded!
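The searches behind those flows were not complicated. A flow that flags repeated SSH authentication failures might schedule something like the query below; the zeekssh tag and field names are assumptions based on Zeek's ssh.log columns, so adjust them to match your own extractors:

    tag=zeekssh ax id_orig_h id_resp_h auth_success
    | grep -e auth_success F
    | count by id_orig_h
    | table id_orig_h count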
We also found a few devices with unauthenticated Telnet or SSH ports open, placed on the exhibitor network by people who weren't used to having their systems exposed to the entire Internet! In particular, we kept an eye out for successful Telnet connections from the outside; while it doesn't seem that anyone's systems were actually compromised, we did walk over to one booth and unplug their default-configuration network switch before somebody outside could start messing with its passwordless Telnet configuration interface.
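Spotting those came down to watching the Zeek conn logs for inbound connections to port 23 that actually completed; again, the tag and field names below are assumptions (SF is Zeek's conn_state for a connection that was established and torn down normally), and you'd add whatever source-address filtering marks traffic as coming from outside:

    tag=zeekconn ax id_orig_h id_resp_h id_resp_p conn_state
    | grep -e id_resp_p 23
    | grep -e conn_state SF
    | table id_orig_h id_resp_h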
SCinet always has IPv6 enabled on the wired networks (exhibitors, etc.), but this year we also turned it on for the wireless clients. It got real use, too: DNS was spread solidly across both protocols, with about 200GB over IPv4 and 160GB over IPv6 during the roughly four-day period. Regular web traffic (HTTP and HTTPS) was still pretty old-fashioned, with about 38TB of traffic over IPv4 and only 8TB over IPv6. We also saw a lot of QUIC traffic (UDP port 443) this year, though, and that was split almost evenly: 3.18TB IPv4 versus 3.55TB IPv6.
This was our third year supporting SCinet. Every year we get the opportunity to put Gravwell through its paces, standing up a cluster in a short period of time (and tuning it fast, because pretty soon we're getting real data!), ingesting a variety of logs at high volume, and then running a ton of queries from lots of simultaneous users. After SC18, we came back and implemented query acceleration based on lessons learned. After SC21, we knew we needed better ways to automate things, which led to the direct query API, API tokens, and the Flows system. We're still absorbing what we've learned from SC22; we definitely found some bugs, and we have some ideas for smoothing rough edges on certain use cases.
SCinet is a fun, frustrating, educational, unique experience. We stand up the world's fastest network, and then by the time the actual conference starts, we're already planning how to tear it down. You get tossed into the deep end with a ton of disparate data flowing in, from unfamiliar devices on an unfamiliar network, and you have to try to stitch it all together to get some insight into just what the heck is happening on the network. The whole team is poking and prodding Gravwell in ways we, the developers, never expected, both good and bad. It's an experience you'll never find anywhere else, and although it can sometimes be overwhelming, we're looking forward to next year!
Whether you want to monitor a side-hustle project or avoid expensive tools at your place of work, you can try Gravwell for free.
Once you've installed Gravwell, have a question, a suggestion, or a discovery you want to share? Come join us in our community Discord.