Cowrie: Designing SSH and Telnet Proxies (and Dabbling with Qemu)

By Guilherme Borges

August 28, 2019

These past months I’ve been working in the Google Summer of Code program with The Honeynet Project, on a project called Cowrie, which I’ve talked about in a previous post. Cowrie, in turn, is maintained by Michel Oosterhof, with whom I’ve had the real pleasure of working.

Whew, that was a mouthful of links, but I’m done with references for now… I’ve talked about the experience in the official report, so this post focuses more on the technical challenges I faced and my main takeaways, while also showcasing the new features that have been added.

Prelude

I found Cowrie by chance while reading more about honeypots on the Internet. Security has always been one of my favourite subjects, and it was the main topic of my MSc thesis, so directly dealing with attackers always seemed attractive.

Cowrie is (was?) a medium-interaction honeypot written in Python and Twisted. Medium-interaction because it provides some commands and realistic interaction with an attacker, but not the full functionality you’d expect from a real system: it emulates only some Linux commands (it would be impossible to manually reimplement every Linux command!). I’d say it can now be considered high-interaction: a real system sits in the backend, and Cowrie acts as a proxy that logs everything while the commands execute on that real system, producing realistic output.¹

Since Cowrie supports both SSH and Telnet in its medium-interaction version (known as “shell”), we set out to build two proxies as well, one for each protocol. Then there was the question of what to use as our backend… As I discussed previously, one of the things attackers (or at least their bots) try to do is create TCP tunnels through your machine: this lets them download any kind of content while remaining untraceable. The records would then show that YOU downloaded something from an IP known to host illegal content, while the attackers just borrowed your unknown and innocent IP… So our main worry was securing the backend machine, which we did by including a backend pool in Cowrie: it provides virtual machines for your attackers to use, while also ensuring network lockdown and isolation from your real machines.

Designing the SSH Proxy

I cannot start this section without thanking Thomas Nicholson, the developer of HonSSH, an older SSH proxy written with the same framework as Cowrie. It’s a dead project by now, written in almost-deprecated Python, and lacks Cowrie’s logging and other features… But oh, was it important for getting all the background I needed! Twisted is an event-driven framework for Python, specialised in asynchronous network protocols. Once you get the hang of it, its design mostly makes sense, but it has a fairly steep learning curve, and its (lack of) clear documentation doesn’t help. I digress. HonSSH gave me the perfect chance to learn by example. After adapting it, getting it working with Cowrie and adding some logging, the proxy was ready to work with any backend while following the SSH protocol.

It wasn’t as simple as proxying everything back and forth and logging in the meantime, though. If we forward everything, authentication, for example, becomes quite static: attackers can only use the backend’s credentials to get in, and that’s just not very interesting in a honeypot. Cowrie’s “shell” provides a simple-but-effective mechanism: a config file listing allowed usernames and passwords (supporting wildcards). Cowrie already implemented the SSH userauth service for its shell version, so we reused that service as-is: the attacker authenticates with the proxy through the emulated service, while the proxy performs its own simple userauth with the backend. Once both are done, we plug front and back together, and only then do we forward traffic back and forth. After that, work focused mainly on logging correctly (backwards-compatible with the “shell” version) and fixing bugs here and there.
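To illustrate the kind of policy that config file expresses, here is a minimal sketch of a credentials check, modelled loosely on Cowrie’s `userdb.txt` format (`user:x:password`, with `*` as a wildcard and a leading `!` to deny a specific password). It is not Cowrie’s actual code: the `Rule`, `parse_userdb` and `check_login` names and the first-match-wins ordering are assumptions for illustration, and the real format may support more (such as regex passwords).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    user: str
    password: str  # "*" matches any password
    allow: bool    # False for "!password" entries


def _matches(pattern: str, value: str) -> bool:
    # Only exact matches and the "*" wildcard are handled in this sketch.
    return pattern == "*" or pattern == value


def parse_userdb(text: str) -> list[Rule]:
    """Parse 'user:x:password' lines; a leading '!' marks a denied password."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        user, _, password = line.split(":", 2)
        if password.startswith("!"):
            rules.append(Rule(user, password[1:], allow=False))
        else:
            rules.append(Rule(user, password, allow=True))
    return rules


def check_login(rules: list[Rule], user: str, password: str) -> bool:
    """First matching rule wins (an assumption); anything unmatched is rejected."""
    for rule in rules:
        if _matches(rule.user, user) and _matches(rule.password, password):
            return rule.allow
    return False
```

With `root:x:!root`, `root:x:!123456` and `root:x:*`, for instance, an attacker can log in as root with any password except the two listed ones, regardless of what the backend’s real credentials are.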

Second Proxy, Not Really the Same as the First

With the SSH proxy becoming more stable, it was time to look at Telnet. Unfortunately, there wasn’t any Twisted-based Telnet proxy lying around in an abandoned git repo, so this had to be done mostly from scratch. Fortunately, I already had some experience from the previous proxy, so I could at least apply the same structure to the new module: a “frontend” transport to handle communication with the attacker, a “backend” transport to connect to the backend, and a “handler” in between (the SSH proxy also has a handler, mostly used to keep references to open channels and whatnot).

Getting it to work without any processing was not too hard (I’ll use some Twisted terminology here): provide a server transport to receive connections, wait for connectionMade from an attacker, then create a client transport to the backend via a TCP client endpoint. Data arriving at one transport via dataReceived triggers a write to the other transport, and vice versa. Simply put, receive from one end, send to the other: a dumb proxy. With that part out of the way, it was time to actually interpret the protocol, and two challenges arose: authentication and “the strange hex”.
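The dumb proxy pattern can be sketched in a few lines. This version deliberately swaps Twisted for Python’s standard asyncio streams so it runs without extra dependencies: each accepted attacker connection opens a second connection to the backend (the connectionMade step) and bytes are copied in both directions (the dataReceived-then-write step). The `pipe` and `start_dumb_proxy` names are hypothetical; Cowrie’s real proxy is built on Twisted protocols and transports.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from one side to the other until EOF (dataReceived -> write)."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the other side went away; nothing left to forward
    finally:
        writer.close()


async def start_dumb_proxy(backend_host: str, backend_port: int,
                           listen_host: str = "127.0.0.1",
                           listen_port: int = 0) -> asyncio.AbstractServer:
    """Accept attacker connections and relay them, unmodified, to the backend."""

    async def handle_attacker(front_reader: asyncio.StreamReader,
                              front_writer: asyncio.StreamWriter) -> None:
        # Equivalent of connectionMade: open a client connection to the backend.
        back_reader, back_writer = await asyncio.open_connection(backend_host, backend_port)
        await asyncio.gather(
            pipe(front_reader, back_writer),  # attacker -> backend
            pipe(back_reader, front_writer),  # backend -> attacker
        )

    return await asyncio.start_server(handle_attacker, listen_host, listen_port)
```

Everything interesting (authentication, logging, line handling) then happens by inspecting `data` inside the copy loop instead of forwarding it blindly.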

Telnet Auth

Authentication was not as simple as with SSH, where messages carry codes indicating where you are in the protocol. Telnet is just a way to send lines back and forth, and there’s no metadata once the initial handshake is performed, which is precisely when authentication happens. On top of that, some systems show you a login prompt followed by a password prompt, while others show only the password prompt, assuming you want to log in with the username your client sent during the handshake (looking at you, Ubuntu!)… It’s a bit of a mess. We ended up using regex-based prompt detection to build a sort of stateful Telnet interpreter (the “handler” component). It starts by looking for login and password prompts, capturing the client’s details and spoofing them towards the backend; once authentication is complete, it stops looking, lest some later prompt from the backend be captured again as a password prompt… It’s not as pretty as I’d like, but it seems to do the job, and I think it’s the best we can achieve with such an old-fashioned protocol.
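The prompt-watching idea can be sketched as a tiny state machine. This is a simplified illustration, not Cowrie’s handler: the class name, the prompt regexes and the choice to buffer keystrokes until Enter are assumptions, and it ignores things a real handler must deal with, such as the backend echoing the substituted username and failed login attempts.

```python
import re

# Hypothetical prompt patterns; real backends word their prompts differently.
LOGIN_PROMPT = re.compile(rb"(login|username): ?$", re.IGNORECASE)
PASSWORD_PROMPT = re.compile(rb"password: ?$", re.IGNORECASE)
LINE_ENDINGS = (b"\r\n", b"\r\x00", b"\n")


class TelnetAuthSniffer:
    """Watch backend prompts, capture the attacker's credentials, send the
    backend's real ones instead, then stop watching once authenticated."""

    def __init__(self, backend_user: bytes, backend_password: bytes) -> None:
        self.backend_user = backend_user
        self.backend_password = backend_password
        self.expecting = None  # None, "username" or "password"
        self.authenticated = False
        self.captured = {}
        self._line = b""

    def from_backend(self, data: bytes) -> bytes:
        """Inspect backend output; returns what to forward to the attacker."""
        if not self.authenticated:
            if PASSWORD_PROMPT.search(data):
                self.expecting = "password"
            elif LOGIN_PROMPT.search(data):
                self.expecting = "username"
        return data

    def from_client(self, data: bytes) -> bytes:
        """Inspect attacker input; returns what to forward to the backend."""
        if self.authenticated or self.expecting is None:
            return data
        self._line += data
        if not self._line.endswith(LINE_ENDINGS):
            return b""  # hold keystrokes until the attacker presses Enter
        value = self._line.rstrip(b"\r\n\x00")
        self._line = b""
        field, self.expecting = self.expecting, None
        self.captured[field] = value
        if field == "username":
            return self.backend_user + b"\r\n"
        self.authenticated = True  # later "password:" prompts are ignored
        return self.backend_password + b"\r\n"
```

The `authenticated` flag is what keeps a later prompt such as `[sudo] password for ...:` from being mistaken for the login password.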

Decoding the Hex in the Logs

The strange hex is another story. It comes mostly from the fact that, standard or not, every Telnet implementation does something a bit different. You see, I’d been using \r\x00 to detect the end of a line of output from the Ubuntu 18 server. When I asked Michel to test the proxy, nothing seemed to work well: he was using a different backend, which used \r\n. That actually seemed more natural, but we had to account for both cases, so we extended the “handler” to consider them. We still get some weird hex in the logs, probably related to the same problem, this time with some weirder client that’s neither OS X nor Ubuntu. The trouble is that we need to find out which client is being used in order to reproduce the messages and extend the “handler” further… These are rare cases for now, but handling them would be nice in the future.
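As far as I understand RFC 854, a carriage return in Telnet must be followed by either LF or NUL, which is why both \r\n and \r\x00 show up in practice. A minimal sketch of a line splitter that accepts both, assuming output arrives in arbitrary chunks (so a \r at the end of one chunk may only be completed by the next), could look like this; `LineBuffer` is a hypothetical name, not Cowrie’s handler.

```python
from __future__ import annotations

import re

# RFC 854 lets CR be followed by either LF or NUL; backends differ in which
# one ends a line, so accept both.
LINE_END = re.compile(rb"\r\n|\r\x00")


class LineBuffer:
    """Accumulate Telnet output and hand back complete lines."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, data: bytes) -> list[bytes]:
        """Add a chunk; return the lines it completed, without terminators."""
        self._pending += data
        # The last piece has no terminator yet, so keep it for the next chunk.
        *lines, self._pending = LINE_END.split(self._pending)
        return lines
```

Keeping the unterminated tail around matters: splitting each chunk on its own would turn a \r\n that straddles two TCP reads into a stray \r and \n, exactly the kind of leftover bytes that end up as odd hex in a log.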

Now Looking at the Backend Part

Things were taking shape and looking good, but Cowrie is a production project, and we needed it to be usable in the wild. Setting up a machine to safely act as a backend takes a lot of time, which would probably mean that few people would use the proxy, or that they would use it unsafely. Enter the backend pool and its locked-down VMs: easily deployable and (to the best of our abilities) as safe as Cowrie’s “shell” mode.

My first proposal was to use Docker, but Michel suggested Qemu instead. The advantage is significant: Qemu can run basically anything, while Docker containers share the host’s Linux kernel, which limits what you can run inside them. With Qemu you can also emulate other architectures, making it possible, for example, to emulate IoT devices, which are among the most insecure devices lying around.² This was my first contact with Qemu, but I must say you eventually get the hang of it: once you know how to create images and run VMs (or guests), it’s basically rinse-and-repeat (making it work across different host environments is a whole different story, though).
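One common way to make the rinse-and-repeat part cheap (an assumption here, not necessarily how Cowrie does it) is to keep a single base image and give each guest a throwaway copy-on-write qcow2 overlay created with `qemu-img`. The sketch only builds and runs the command; the function names and example paths are hypothetical.

```python
import subprocess
from pathlib import Path


def overlay_command(base_image: Path, overlay: Path) -> list:
    """qemu-img invocation creating a copy-on-write qcow2 overlay of base_image."""
    return [
        "qemu-img", "create",
        "-f", "qcow2",          # format of the new overlay
        "-b", str(base_image),  # read-only backing image shared by all guests
        "-F", "qcow2",          # format of the backing image
        str(overlay),
    ]


def create_overlay(base_image: Path, overlay: Path) -> None:
    """Run qemu-img (must be installed); raises CalledProcessError on failure."""
    # Use absolute paths: a relative backing path is resolved against the
    # overlay's directory, not the current working directory.
    subprocess.run(overlay_command(base_image, overlay), check=True)
```

Each attacker’s changes land in their own overlay, so deleting that file after the session restores a pristine guest without touching the base image.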

In Cowrie we wanted to manage VMs automatically, something Qemu on its own offers few features for. I started looking for Python-based APIs, and libvirt became the obvious choice: well-maintained, cross-platform, and it abstracts Qemu’s annoyingly long command lines into nice XML config files. Best of all, it includes network filters: the ability to restrict guest traffic by protocol and port with minimal configuration effort! It had everything in one package, so I set out to explore it and use it in Cowrie.
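To give an idea of what a libvirt network filter looks like, the sketch below generates nwfilter XML that accepts inbound TCP to a few ports (say, SSH and Telnet from the proxy) and drops all other IPv4 traffic. The filter name and port list are hypothetical, and this is not the filter Cowrie ships; check the rules against libvirt’s nwfilter documentation before relying on them.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_nwfilter(name: str, allowed_tcp_ports: list[int]) -> str:
    """Build libvirt nwfilter XML: accept inbound TCP to the given ports, drop the rest."""
    root = ET.Element("filter", name=name, chain="root")
    for port in allowed_tcp_ports:
        # Lower priority values are evaluated first in libvirt's nwfilter.
        rule = ET.SubElement(root, "rule", action="accept", direction="in", priority="500")
        ET.SubElement(rule, "tcp", dstportstart=str(port))
    # Catch-all: drop any other IPv4 traffic in either direction (no tunnels out).
    drop = ET.SubElement(root, "rule", action="drop", direction="inout", priority="1000")
    ET.SubElement(drop, "all")
    return ET.tostring(root, encoding="unicode")
```

With libvirt-python, the resulting string would be registered via `conn.nwfilterDefineXML(xml)` on a connection from `libvirt.open("qemu:///system")`, and attached to a guest by adding `<filterref filter='...'/>` inside its `<interface>` element. libvirt tracks connection state for such rules by default, so replies to the accepted inbound connections should still get out.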

Besides learning Qemu and more about virtualisation, this part was a nice throwback to my Concurrency and Parallelism classes, as I had to implement a producer-consumer buffer to handle VMs. We had to use locking to ensure, for example, that the same VM is never handed to two attackers at the same time. It wasn’t too hard a task, and since we aren’t expecting thousands of VMs at a time, simple Locks were enough, but concurrency is still another dimension to consider in this project. Besides that, we had to deal with some quirks of the VMs, such as them dying seemingly at random… We can’t pinpoint the cause exactly, but we added a health-check mechanism to at least mitigate it.
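The locking requirement can be illustrated with a small producer-consumer buffer guarded by a single `threading.Lock`. This is a sketch, not Cowrie’s pool: `Guest`, `GuestPool` and the `alive` flag (standing in for a real health probe of the VM) are hypothetical.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Guest:
    guest_id: int
    alive: bool = True  # stand-in for a real probe of the VM (hypothetical)


class GuestPool:
    """Producer-consumer buffer of ready guests, guarded by a single Lock."""

    def __init__(self, target_size: int, create_guest: Callable[[int], Guest]) -> None:
        self._lock = threading.Lock()
        self._available: list[Guest] = []
        self._in_use: dict[int, Guest] = {}
        self._target_size = target_size
        self._create_guest = create_guest
        self._next_id = 0

    def produce(self) -> None:
        """Producer: top the buffer up to target_size ready guests."""
        # Creating guests under the lock keeps the sketch simple; booting real
        # VMs is slow, so a real producer would do that outside the lock.
        with self._lock:
            while len(self._available) < self._target_size:
                self._available.append(self._create_guest(self._next_id))
                self._next_id += 1

    def request(self) -> Optional[Guest]:
        """Consumer: hand out a ready guest, or None if the buffer is empty."""
        with self._lock:
            if not self._available:
                return None
            guest = self._available.pop(0)
            self._in_use[guest.guest_id] = guest
            return guest

    def release(self, guest: Guest) -> None:
        """Attacker session ended; forget the guest (it would be destroyed)."""
        with self._lock:
            self._in_use.pop(guest.guest_id, None)

    def health_check(self) -> int:
        """Drop dead guests from the buffer; returns how many were removed."""
        with self._lock:
            before = len(self._available)
            self._available = [g for g in self._available if g.alive]
            return before - len(self._available)
```

Because `request` pops under the lock, two concurrent attackers can never receive the same guest, and a periodic `health_check` followed by `produce` replaces guests that died while waiting.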

There might be a hint or two of OOP scattered around as well, as I tried to separate the Pool Server from the Backend Service: the former receives requests for an abstract VM, the latter instantiates it via libvirt. In theory, you could implement a Docker Backend Service, for example, plug it into the Pool Server, and have it working the same way. Well, that’s still uncharted territory, but I like to leave things tidy and compartmentalised, so I’m happy with the structure as it is.
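The separation described above boils down to the Pool Server depending only on an abstract interface. A minimal sketch with Python’s `abc` module, where `BackendService`, `FakeBackendService`, `PoolServer` and their methods are hypothetical names rather than Cowrie’s classes:

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BackendService(ABC):
    """What the Pool Server needs from any VM provider (hypothetical interface)."""

    @abstractmethod
    def start_guest(self, guest_id: int) -> str:
        """Start a guest and return an address the proxy can connect to."""

    @abstractmethod
    def stop_guest(self, guest_id: int) -> None:
        """Tear the guest down."""


class FakeBackendService(BackendService):
    """In-memory stand-in; a libvirt- or Docker-based service would go here."""

    def __init__(self) -> None:
        self.running: dict[int, str] = {}

    def start_guest(self, guest_id: int) -> str:
        address = f"10.0.0.{10 + guest_id}"
        self.running[guest_id] = address
        return address

    def stop_guest(self, guest_id: int) -> None:
        self.running.pop(guest_id, None)


class PoolServer:
    """Hands out abstract VMs; knows nothing about how they are created."""

    def __init__(self, backend: BackendService) -> None:
        self._backend = backend
        self._next_id = 0

    def request_vm(self) -> tuple[int, str]:
        guest_id = self._next_id
        self._next_id += 1
        return guest_id, self._backend.start_guest(guest_id)

    def release_vm(self, guest_id: int) -> None:
        self._backend.stop_guest(guest_id)
```

Swapping virtualisation technologies then means writing one new `BackendService` subclass; `PoolServer` stays untouched, and the fake backend doubles as a test fixture.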

Ending with Satisfaction

All of this amounted to three PRs, one for each “big” module (plus some minor ones along the way). I’m satisfied with the result but, truth be told, there’s still room for improvement, particularly in the backend pool. Working with Cowrie has been a fun experience where I gained a lot of knowledge, from network protocols (the SSH wire protocol) and their intricacies (Telnet is tricky) to hypervisors and virtual machines (Qemu was totally new to me)! I also learned a new framework (Python Twisted) which, although twisted at times, makes you think in a new way and is a real challenge.

I believe our prototype will help us better understand and prevent attacks against unprotected machines, by letting attackers do their thing on a “real” machine instead of being constrained to a subset of bash commands. All in all, this was a great experience, and I could not possibly have learned as much as I did without it. Let’s see how the new modules get used now, and what needs to be fixed and improved. And, more importantly, what new intel we can gather from them.

As resources, I’m linking the official GSoC report, together with the three main documentation pages / tutorials I wrote, now part of the official Cowrie docs:

If you have any questions, feel free to contact me via email or Twitter. I’m also around on the official Cowrie Slack, where some discussion about the proxy, or Cowrie in general, is always taking place.

¹ Of course, the traditional emulated shell is still there. Running a set of VMs is not the lightest task for a Raspberry Pi, for example, so I’d recommend you keep using the shell if that’s your kind of device.
² Or, as my academic adviser referred to them, the Internet of Hackable Things.