Intro
Part of the Cyberforce competition I covered last week was the anomaly section, with ~60 CTF-style challenges to be solved. Admittedly, some of them were a lot more work than they were worth, but this year, many of them were weighted towards reverse engineering and some forensics. I didn’t get a lot of time to really work on these during the event as I was bogged down in incident response, but here are some writeups, mostly for me, if we’re being honest.
What’s Up Bro? (formerly brah)
Anomaly 13
Description
We have noticed some suspicious activity leaving a particular machine in the network. We have isolated the machine and recorded its behavior over the course of 20 minutes or so. Either way it is not a full day so hopefully it was enough to get what we need for analysis.
We are worried it is exfiltrating some data over a C2 channel but we have not been able to pinpoint the channel. Our only indicator is some shady website online that keeps showing data that has been leaked from our network and specifically data on that machine!
Author: @pascal_0x90 (LLNL - Nate)
Challenge
We’re only given the README.md with the description and the packet capture, so we can pop the packet capture right into Wireshark. Since the packet capture is ~14 MB, we can use the Statistics menu to get the gist of what’s at play here, and we find a lot of TLS traffic.
To decrypt TLS, we would need the session keys, and unless they were leaked in cleartext at some point (say, over HTTP), we’re not getting them. The next most frequent protocol is DNS, which is actually quite interesting given the description.
Solution
Unit 42 has a very solid blog explaining exactly how this technique works, and you can see examples of it documented with Sliver or Cobalt Strike. The core idea is that if I run a DNS server and keep track of the queries made to it, in this case for A records, I can treat whatever is in the subdomain as arbitrary data rather than actual DNS information. This is not the only way DNS can be leveraged for covert operations. For instance, a TXT record could be used to store payloads that are then pulled down and run from PowerShell (source: Alh4zr3d, John Hammond).
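To make the idea concrete, here is a minimal sketch of the sending side and its inverse. Everything in it is an assumption for illustration: the domain `c2.example.com`, the chunk size, and the function names are made up, and I've used base32 because it only produces characters a hostname allows (the traffic in this challenge used base64 instead).

```python
import base64

def to_dns_queries(data, domain="c2.example.com", chunk=30):
    """Split data into chunks and hide each chunk in a subdomain label."""
    queries = []
    for i in range(0, len(data), chunk):
        # 30 bytes become 48 base32 characters, under the 63-character label limit
        label = base64.b32encode(data[i:i + chunk]).decode().rstrip("=").lower()
        queries.append(f"{label}.{domain}")
    return queries

def from_dns_queries(queries):
    """What the attacker's DNS server would do with the logged query names."""
    out = b""
    for name in queries:
        label = name.split(".")[0].upper()
        label += "=" * (-len(label) % 8)  # restore the stripped base32 padding
        out += base64.b32decode(label)
    return out

for q in to_dns_queries(b"totally secret document contents that need to leave the network"):
    print(q)
```

The client never talks to the attacker directly; it just asks its resolver for names under a domain the attacker controls, and the resolver forwards the data for it.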
If we filter the packet capture for DNS, we see this in action.
The goal now is to recover the data sent through here. Rather than copy this out by hand, we can use pyshark like we did with corCTF 2022: whack-a-frog to automate this process.
One thing to note is the pkt.dns.qry_name not in domains check in the if statement. DNS runs over UDP, which offers no delivery guarantees and favors speed, so clients retry queries that go unanswered and the same name can show up in the capture more than once. This is not a perfect solution, as the same label may legitimately appear in two different places depending on how the plaintext was split up, but it ends up working fine here.
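A minimal sketch of that approach, not the exact script: the suffix `.evil.example` and the function names are placeholders I made up, and the demo at the bottom feeds in fake query names so it runs without a capture file. Only `pyshark.FileCapture`, its `display_filter` argument, and `pkt.dns.qry_name` are real pyshark pieces.

```python
import base64

def collect_labels(query_names, suffix):
    """Join the exfil labels from DNS query names, skipping repeats."""
    domains = []  # names already seen, in capture order
    for name in query_names:
        # Ignore ordinary lookups and repeated queries
        if name.endswith(suffix) and name not in domains:
            domains.append(name)
    # Drop the attacker's domain and keep only the data-carrying part
    return "".join(name[: -len(suffix)] for name in domains)

def query_names_from_pcap(path):
    """Yield DNS query names from a capture (needs pyshark and tshark)."""
    import pyshark  # imported lazily so collect_labels works without it
    cap = pyshark.FileCapture(path, display_filter="dns.flags.response == 0")
    try:
        for pkt in cap:
            yield pkt.dns.qry_name
    finally:
        cap.close()

# Self-contained demo with made-up names
secret = base64.b64encode(b"flag{not_the_real_flag}").decode()
names = [secret[i:i + 8] + ".evil.example" for i in range(0, len(secret), 8)]
names.insert(1, names[0])        # simulate a retried query
names.append("www.example.org")  # unrelated traffic
print(base64.b64decode(collect_labels(names, ".evil.example")))
```

Against the real capture you would pass `query_names_from_pcap("capture.pcap")` (path assumed) and the suffix you see in Wireshark. The dedupe has the weakness already mentioned: two identical chunks would collapse into one.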
If we run the script, we see some base64:
We can add an extra line to then decode the base64, and find the flag:
flag: flag{wh4t5_up_br0_w3r3_y0u_ch0pp1n_l0g5?}
EmojiWare
Anomaly 14
Description
This is an entire program written in emojis... Yes. You heard right. Emojis! The program that is in this challenge is an emulator that can interpret the emojis and provide emulation support. The code created acts like all the files on the computer have been encrypted. The goal of this challenge? Get the program to print out the encrypted flag.
Author: @pascal_0x90 (LLNL - Nate)
Challenge
We’re given a few files to work with here.
The README.txt contained the description, so that’s unimportant. The emojis.out is exactly what it sounds like: a lot of emojis.
The emulator is an ELF executable that seems like the point of interest here, but we’ll come back to that shortly.
The Dockerfile is an interesting addition, since emulator isn’t built on some esoteric architecture. Reading it, we get an interesting clue.
It doesn’t make sense to explicitly include python3.8-dev without actual python unless (a) this is a distraction or (b) there’s some Python-magic going on here. If we start running through our basic reversing checks, something immediately jumps out at us:
Solution
Notice the references to Python? Unless the author was trying to make it seem as if this was written with Python, there is a very real likelihood that this binary was produced by a tool that bundles Python code into an executable. We can test this by pointing pyinstxtractor at it.
Our hypothesis was right, and we can now dig into the emulator_extracted/ directory to find emulator.pyc, which holds the Python bytecode (i.e. a compiled Python file, which is what the interpreter actually executes).
I’m in the process of writing something on reversing Python malware, so I’ll save an in-depth discussion of the topic for then, but pycdc is nice in that we can clone the repo, build the project, and just point the decompiler at the .pyc file. The build instructions aren’t entirely clear from the repo, but you can just set up the Makefile using cmake and go from there.
I’ll move the emulator.pyc out to a different directory, and then point pycdc at it to obtain the original source code.
We get a couple of warnings, but it seems like we get most of the source code, which we can put in a separate .py file for easier viewing. The file itself comes out to be ~400 lines. I might put it in a gist or on GitHub later, so we’ll only be looking at the most relevant segments.
We’ve solved a challenge similar in concept to this before: HTB’s Alien Saboteur was a nice introduction to VM reversing challenges, and I highly recommend you check that out if you’re unfamiliar with the idea. Luckily for us here, the emulator is written in Python, so more of the work is in parsing emojis.out than in actually reversing the VM. Lines 36 - 72 tell us what each of the emojis means.
The emulator, as expected, tells us exactly how to interpret these opcodes in the interp_instr() function.
An interesting thing to note is that the Processor() class this is all coming from seems to have code to debug the registers.
However, it seems like these functions don’t get used in the actual execution of the program when we try to run the emulator (which I totally forgot to check until now).
From here, there’s a couple of different ways to go about this:
- Write a disassembler like we did for Alien Saboteur
- Clean up the emulator.py we got from pycdc and use that to debug the registers
- Debug emulator with GDB and find the right things to breakpoint on
The third is definitely the worst way to go about this, since it’s still using Python bytecode in the ELF, so we’re not only debugging the emoji VM, but also the underlying Python VM used to make the emulator. As for the other two options, I actually originally tried to do option 1, but when I did, I got 54873 instructions. For reference, the other VM challenge I did only had < 1000. For sanity’s sake (although the brain worms want me to look at the assembly), we’re reconstructing the emulator.
As good as pycdc is, it did not give us a perfect decompile. For one, there are various continue statements strewn across the program in weird spots, and I also have ambiguous things happening:
Part of the reason Python bytecode reversing is so funky is that with every new release of Python, the way control flow works is slightly different. pycdc’s merits come from the fact it’s written in C++ and generally does not care about the version up until the more recent ones. However, it seems as though we might need to use a Python tool instead, and I used decompyle3, because pycdc told us the version was 3.8, and that’s what decompyle3 was made for.
In order to use decompyle3, you’ll need an install of Python 3.8. The easiest solution would probably be using pyenv to manage the versions you have installed, but I used Docker. I pulled down the Python 3.8.5 Docker container, mounted my current directory using a volume, and then entered the container.
Note: when using volumes, make sure your path has no spaces or weird characters in it. Otherwise you’ll get an error like this and be confused until you remember why!
This emulator.py is already way better if you look at the source code. Let’s try running it and see what happens to make sure it works.
Ah. Very cool! This part took a minute to figure out, but the problem function is here:
The problem is that the REGS_INV dictionary doesn’t actually invert the keys and values, and the else statement here isn’t really necessary. We can fix that pretty quickly though:
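As a rough sketch of what a correct inversion looks like: the register names and numbers below are made up for illustration (the real table lives in the decompiled emulator.py), and `reg_name` is a hypothetical helper, not a function from the emulator.

```python
# Hypothetical register table; the real names and numbers come from emulator.py
REGS = {"r0": 0, "r1": 1, "r2": 2, "sp": 3}

# A true inverse swaps keys and values, so a lookup by number returns the name
REGS_INV = {number: name for name, number in REGS.items()}

def reg_name(number):
    """Resolve a register number, failing loudly instead of using an else branch."""
    try:
        return REGS_INV[number]
    except KeyError:
        raise ValueError(f"unknown register number: {number}") from None

print(REGS_INV)
```

The dict comprehension only gives a clean inverse when the values are unique, which is a fair assumption for register numbers.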
If we try it again, we can safely say we’ve restored the emulator!
So what’s changed? Now that we have the Python source code, we can change things however we want! In particular, we can change the code that interprets instructions to print out exactly what’s happening. Since this is a crackme, we can start by checking any uses of the CMP instruction.
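The general shape of that kind of hook, as a toy sketch: `ToyProcessor` and `op_cmp` are stand-ins I invented, not the real Processor() class or its handler names. In the actual emulator.py a single print inside the CMP branch does the same job.

```python
import functools

def trace(opcode):
    """Decorator that prints the operands every time a handler runs."""
    def wrap(handler):
        @functools.wraps(handler)
        def inner(self, a, b):
            print(f"[{opcode}] {a} vs {b}")
            return handler(self, a, b)
        return inner
    return wrap

class ToyProcessor:
    """Stand-in for the emulator's processor; not the real implementation."""
    def __init__(self):
        self.zero_flag = False

    @trace("CMP")
    def op_cmp(self, a, b):
        # A compare sets a flag that a later conditional jump would read
        self.zero_flag = (a == b)
        return self.zero_flag

cpu = ToyProcessor()
cpu.op_cmp(69, 119)
```

The same decorator can be stacked on other handlers (XOR, for example) without touching their bodies.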
If we try running the program now, we get a debug statement.
There are two things to be gleaned from this:
- Despite submitting an 8 character entry, we only had one comparison. This could either be because of a length check, or it is checking one byte at a time and exiting if it’s wrong.
- 69 (nice) and 119 are both decimal values, neither of which correspond to the first character. 119 could be the “w” in the middle of the password but that’s just weird.
If we make a hypothesis and assume that there’s some encryption going on, we could try to hook the XOR opcode as well and see what happens.
Turns out there’s a lot of XORs, ~100 to be exact. We also see that there’s an alternating pattern of XORs with 69 then 12, which we can only assume to be the key (remember that this is decimal!). I’ll use the cyclic tool from pwntools to generate a string of 100 characters to see if I can pass the length check.
It still didn’t give us additional CMP checks, but at least we can confirm that the XORs are being applied to the password. At this point, you could write a pwntools script to bruteforce the password, or we could try to “dump the memory” at the password check. I can modify the execvm() function as follows:
Now, when I hit CTRL+C at the password prompt, I see this.
Interestingly, this is only ~32 bytes. Still, we can pull these numbers out, apply the XOR key, and see what the plaintext is.
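Applying the key is a repeating-key XOR. A small sketch, assuming the key order is 69 then 12 as seen in the trace; the `dumped` bytes below are placeholder data generated on the spot, not the values from the memory dump.

```python
from itertools import cycle

KEY = (69, 12)  # alternating XOR values from the trace, in decimal

def xor_with_key(values, key=KEY):
    """XOR each byte with the key, repeating the key as needed."""
    return bytes(v ^ k for v, k in zip(values, cycle(key)))

# Placeholder standing in for the dumped memory; swap in the real numbers
dumped = list(xor_with_key(b"example"))
print(dumped)
print(xor_with_key(dumped))
```

Since XOR is its own inverse, the same function both produces and recovers the plaintext.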
If I paste this result into the password prompt 4 times (to reach the 100 characters), we get the flag.
flag: flag{3m0j1s_L1ght_Up_My_D4y}
WATT’s The Story Morning Glory?
Anomaly 44 - 47
Description
You are a seasoned QA Engineer at DER8.9 testing a new system named the 'SmartMeter Workstation for Administration of Telemetric Technologies (WATT) Control and Maintenance Interface' prior to deployment for business customers and residential field technicians. Before this software goes live, it's crucial to ensure that it is not only free from defects but also securely designed and implemented. Thoroughly test the application and identify all functional and security issues, spotting any vulnerabilities and insecure code practices. Retrieve a set of 4 associated flags for each insecure coding practice / vulnerability as a proof of discovery.
Author: @ANL - Jocelyn
To any Cyberforce competitor, I sincerely apologize for the existence of this challenge. I am friends with the challenge author outside of this event and I am the one who mentioned Nim when that was hot (before I realized writing C, or even better, PIC shellcode, was God’s way).
Solution 1: Hardcoded Key
We’re given a file called smartmeter_management_interface.exe, along with some additional information in Question.md that just adds context that isn’t necessarily worth mentioning here. If I run the binary, we’re immediately hit with roadblock #1.
Ah, a crackme. Of course.
Before jumping to disassembling the binary, let’s briefly take a look at what PEStudio tells us. There are no immediately weird looking imports (e.g. CreateRemoteThread, Nt*), but as soon as we look at strings, we know what we’re up against.
If it isn’t my old enemy Nim. For the uninitiated, Nim is a language that was very hyped up last year in the red teaming space as something extremely evasive and hard to reverse engineer, with Python-like simplicity in its syntax while still offering the memory management and macros of languages like Rust or C++. An in-depth discussion of the merits of using Nim and other esolangs for malware development is beyond the scope of this blog, but the key factor we’ll be dealing with is the reverse engineering bit.
Nim compiles to C, and then uses a compiler for C to finish it off, meaning symbols and functions get seriously garbled. It’s still very possible to work through it, it’s just a pain. For languages like Nim, I prefer using Cutter over Ghidra. Once the binary is loaded into Cutter, we can start by looking for the main function. If we use the side bar to search for main, we’re greeted with 6 different “mains”.
For the purposes of reverse engineering, we can usually jump right to NimMainModule. Keeping the window open in graph view, we have a little bit of noise from what Nim does at an assembly level, but we can stay focused if we just look for calls to other symbols that look like functions. Eventually, we should find main__smartmeter95management95interface_528, which is where we can actually see assembly that corresponds to what we saw with the execution. Let’s zoom into the first block of assembly here:
If you’ve been following along, or you see this mess, you begin to understand why looking at Nim can be challenging: there’s just a lot of stuff that you don’t need to be looking at 90% of the time. However, at 0x14003826a, we see a call to what looks like the function that prints the main banner. At 0x1400382ad, we have another call, this time to the accessMaintenanceInterface() function. Looking at that function, this bit of assembly jumps out at me.
I don’t know the exact calling convention at play here, but if I had to guess, those data.1400... addresses are being passed into the decodeBase64() function. Following those, we get two base64 strings: Y2VhZTA5YzM5YTRiMjczOQ== and ZDVlMDMwODBmNDI2YzkyMA==. We can decode them to get the values ceae09c39a4b2739 and d5e03080f426c920, respectively. Concatenating these and decoding as hex gives random bytes, so we’re still not entirely sure how this gets used. Later on in this function, however, we see the following:
That getMD5 suggests that the user’s input is actually being compared via hashing, so one might guess that the base64 strings encode the hash. We can try both ways, and using Crackstation, we eventually find that ceae09c39a4b2739d5e03080f426c920 corresponds to neverhere.
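The decoding and the "try both ways" step can be sketched in a few lines. The two base64 strings are the ones from the binary; the function names are my own, and the guess in the demo is a deliberate placeholder rather than the answer.

```python
import base64
import hashlib

PARTS = ["Y2VhZTA5YzM5YTRiMjczOQ==", "ZDVlMDMwODBmNDI2YzkyMA=="]

def candidate_hashes(parts=PARTS):
    """Decode both halves and return the two possible 32-character orderings."""
    first, second = (base64.b64decode(p).decode() for p in parts)
    return [first + second, second + first]

def matches(guess, hashes):
    """True if the MD5 of the guess equals any candidate hash."""
    return hashlib.md5(guess.encode()).hexdigest() in hashes

print(candidate_hashes())
print(matches("placeholder-guess", candidate_hashes()))
```

Swapping the placeholder for the Crackstation result is a quick offline way to confirm which ordering the binary uses before typing anything into the prompt.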
We can try this in the terminal and see that it works.
flag: neverhere
Solution 2: Data Deletion
Finding the other flags is a little bit of a challenge, as we don’t really have direction for what to do other than look for vulnerabilities. However, we can do a little bit of metagaming and look for interesting strings, and we find the following.
Since I am privy to the order of the flags, we’ll start with the “insecure data deletion”. First, let’s try to look at this in the program. We have a few options related to configuration management:
- (3) Add Configuration
- (4) Delete Configuration
- (5) Update Configuration
- (6) Display Configuration
- (7) Process Configuration
- (12) Restore Configurations
If we display configurations, we see what’s already loaded.
Let’s try to delete it and see what happens.
Well that was easy. Based on the string we found, let’s try to restore it.
Well that was even easier. Moral of the story, when data gets deleted, make sure it actually gets deleted and wiped from memory. We could dig into why this is happening by looking at the assembly, but this post is long enough as is and we have two more to get through. I might make a follow up post later to dive even deeper, but I’ll be honest, I’m too tired to dig into this right now.
flag: DONT-LOOK-BACK-AT-DELETED-DATA-I-HEARD-YOU-SAY
Solution 3: JSON Deserialization
Another one of the strings had to do with deserialization, and we have two options that would be related to this.
- (10) Export Configurations as JSON
- (11) Import Configurations as JSON
If we try to call (10), we get this:
It seems like we also have the option to submit our own data. I can submit a made up configuration and it seems like it goes through no problem.
Looks like we can’t submit arbitrary keys and values though:
If that’s the case, it seems like we need to find what the possible keys are. We have a couple of functions to look at as far as the Nim code goes: exportConfigurations(), importConfigurations(), processConfiguration(), updateConfiguration(). After a long-winded journey of exploring the various functions, first updateConfiguration() then importConfigurations(), we find the following assembly.
That hasKey__pureZjson_3212 is particularly interesting, considering there’s something going on with data.14004a2a0 before it. If I follow that variable, the nearest string in Cutter is isAdmin, which absolutely looks like a key. We can also see a later call to getBool, which may imply the data type to go with the isAdmin key is a boolean. All together, we can try injecting some JSON.
Enter your choice:
11
Provide JSON data for configurations:
[{"isAdmin":true}]
------------------------------------------------------------
IMPORT CONFIGURATION
------------------------------------------------------------
ADMIN ACCESS GRANTED || INSECURE DESERIALIZATION FLAG #4: WOO-HOO-AND-IM-INSECURE-WITH-DESERIALIZATION
DER8.9 SmartMeter WATT Control and Maintenance Interface
Despite the fact that we’re not using something like ysoserial to get RCE, this is still deserialization! The JSON gets loaded into the program as some kind of dictionary structure, which is how it’s checking for keys. Since there are no checks on whether a submitted config is allowed to carry that key, the malicious config gets evaluated and injects the admin condition.
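The same flaw is easy to reproduce in a few lines of Python. This is a sketch of the pattern, not a port of the Nim code: the function names and the `ALLOWED_KEYS` schema are hypothetical, and only the isAdmin key comes from the binary.

```python
import json

ALLOWED_KEYS = {"id", "values"}  # hypothetical schema for a configuration entry

def import_naive(raw):
    """Mirrors the flaw: a privilege flag is read straight from user input."""
    granted = False
    for entry in json.loads(raw):
        if entry.get("isAdmin") is True:
            granted = True
    return granted

def import_strict(raw):
    """Reject any entry carrying keys outside the expected schema."""
    entries = json.loads(raw)
    for entry in entries:
        unexpected = set(entry) - ALLOWED_KEYS
        if unexpected:
            raise ValueError(f"unexpected keys: {sorted(unexpected)}")
    return entries

payload = '[{"isAdmin":true}]'
print(import_naive(payload))
```

The fix is less about JSON itself and more about never letting the imported data decide its own privileges.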
flag: WOO-HOO-AND-IM-INSECURE-WITH-DESERIALIZATION
Solution 4: IDOR
Our last challenge has to do with an insecure direct object reference. We can use the success string to find exactly the code block we want to get to. One thing I learned while solving this is that you can’t directly check for X-Refs from the string; you want to scroll up (at least in Cutter) to the data.XXXXXXX reference and use that.
We’re inside the processConfiguration() function, and the control flow graph is a little bit more complicated than we might want to look at statically. Let’s take a look at what “processing” a configuration does.
While attempting to solve this one, I actually ended up reverse engineering the entire algorithm to go from configuration to processed string. It didn’t help at all, but you can see from when we looked at the default configuration, the indices are all less than 26, and the resulting string is entirely alphabetic. If you look at the strings, you find the string ZYXWVUTSRQPONMLKJIHGFEDCBA-123456#, which looks like a lookup array. For instance, the first number in the default config is 5, and U has index 5 (we’re starting from Z = 0, sorry Lua devs).
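The indexing step on its own looks like this. It is only a sketch of the table lookup: the function name is mine, and the real processing routine may do more than a plain lookup, so don't expect this to reproduce the program's processed string exactly.

```python
LOOKUP = "ZYXWVUTSRQPONMLKJIHGFEDCBA-123456#"

def lookup(indices):
    """Map each zero-based index to its character in the table."""
    return "".join(LOOKUP[i] for i in indices)

print(lookup([5]))      # first number of the default config
print(LOOKUP.index("A"))
```

Anything below 26 lands in the reversed alphabet, which fits the observation that the default config produces purely alphabetic output.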
Although figuring this out was fun, the solution is actually way simpler than that. Knowing what our end goal is, we can trace stuff back through the CFG to find what conditions are necessary to get to our end goal. At one point, we find this block:
The data in data.140048a80 is @5,7,0,2,24,7,14,16,5,14,16,19,2,23,6,8,19, which is what the default config was. Following this is a call to eqStrings, which probably does what it says: checks if two strings are equal. If we try submitting a new config like this, it doesn’t look like anything changes.
There might be an additional check going on. In the block immediately before the eqStrings, we see another interesting comparison.
Although this is Nim and there’s a bunch of random noise happening, the comparison at 0x140035ec4 is against such an oddly specific number. We could probably go backwards to confirm that this is a desired configuration ID, but there’s no harm in trying. I’ll restart the program and try again.
And that’s the flag! In most examples of IDOR, it’s usually about accessing information you shouldn’t have access to by changing a parameter, usually an index. While this might not be that, I would still call it an IDOR in the sense that no normal user should just be able to call the debug mode by calling a specific ID (after all, we did find an admin attribute). It’s mostly a problem of hard coded keys, but I don’t think calling this IDOR is extremely wrong.
flag: INSECURE-REFS-LIKE-INSECURE-PROGRAMS-DO
Conclusion
This year’s Cyberforce main event had way more difficult challenges than previous years, and while some of them were extremely stupid (looking at you Ste-what-graphy), the progress the anomaly team has made over the years has been great. I still remember back in 2020 and 2021 where you could cheese half of the reversing and steg challenges by doing strings binary | grep flag. I wish I had been able to solve these during the event instead of getting extremely bogged down in incident response, but I appreciate the work nonetheless.
If you’re interested in doing some of these challenges yourself (and are a US collegiate student), Cyberforce has some more events coming up that might be worth checking out. I’m not allowed back after (allegedly) stealing the agenda but that’s beside the point :p
Until next time! :D