Containing the Beast: Managing Inter-Thread and Inter-Process Complexity

Hello paranoids

 Lately I have been investigating Zeus Panda (MD5 82c6a5e05ceec286c79ae978bc746244, or check my repo), which, among other features, injects itself into two instances of svchost that it creates itself. The injected code is then executed using CreateRemoteThread. This is not uncommon, and it makes analysis harder: once the code lives in another process, it is out of the debugger's reach. While modern debuggers support debugging child processes, keeping execution within a single process is much easier and you are less likely to miss something. Another option is to attach a second debugger instance to the spawned process, but this brings two challenges:

  • The new process starts running before you attach, so you miss part of its execution
  • With injected code, you need to figure out where it starts. While you can take the bytes from IDA and search for them with the debugger, you need to pay attention to absolute versus relative addressing, which changes the bytes (because, well, the base address differs between the injector and the injected code)
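To make the addressing issue concrete, here is a minimal Python sketch with hypothetical addresses. It encodes an x86 call rel32 (opcode E8) and a push imm32 (opcode 68) that carries an absolute address, once at the malware's base and once at the injected copy's base. The relative call produces identical bytes because caller and callee move together, while the absolute operand changes.

```python
import struct

def call_rel32(site: int, target: int) -> bytes:
    """Encode x86 `call rel32` (E8 + 4-byte displacement from the next instruction)."""
    disp = (target - (site + 5)) & 0xFFFFFFFF
    return b"\xE8" + struct.pack("<I", disp)

def push_imm32(value: int) -> bytes:
    """Encode x86 `push imm32` (68 + 4-byte immediate), e.g. pushing an absolute address."""
    return b"\x68" + struct.pack("<I", value & 0xFFFFFFFF)

# Hypothetical layout: call site, callee and a string at fixed offsets from the code base.
CALL_OFF, CALLEE_OFF, STRING_OFF = 0x100, 0x400, 0x800

def encode_at(base: int) -> bytes:
    """Bytes of a `call callee` followed by `push offset string`, with code loaded at `base`."""
    return call_rel32(base + CALL_OFF, base + CALLEE_OFF) + push_imm32(base + STRING_OFF)

original = encode_at(0x00400000)  # hypothetical base of the unpacked malware
injected = encode_at(0x00590000)  # hypothetical base of the injected copy

print(original[:5].hex(), injected[:5].hex())  # identical: e8fb020000 twice
print(original[5:].hex(), injected[5:].hex())  # differ: 6800084000 vs 6800085900
```

This is why a byte search across the two images only works on stretches free of absolute operands.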

 Another source of complexity is debugging and analyzing multi-threaded code. Lately I have had the pleasure of looking at some samples obfuscated with Dotfuscator. Those samples spawned dedicated threads to detect analysis tools. If any were found, the thread would kill the process behind the scenes.

 Based on this, I decided to give an overview of some techniques you can use to manage inter-thread and inter-process complexity when reversing binaries. While I will focus on WinDbg here, other debuggers such as x64dbg and dnSpy support the ideas I am proposing.

Motivation

 The sample you are analyzing spawns threads and injects shellcode into one or more vessel processes. This raises the following issues:

  • How do I get inside the vessel process without causing any instability?
  • How can I have visibility across all executions without letting the malware run free? How can I switch and/or freeze contexts?
  • Assuming the malware does not modify itself much before injection, how can I keep adding comments to my current IDA DB without restarting from a clean one?
  • If all I want is to see the shellcode, how can I dump it if it is mapped in another process?
  • With multiple instances of injected code with different entrypoints, how do I keep track of all of them?

 Execution flows boil down to EIPs, ifs and handles. If you manipulate these variables, you will be able to bend the binary to your will.

Theory

 I am a big believer in static analysis, so I use IDA more than the debugger. However, when it comes to shellcode injection, encoding/encryption and their counterparts, it is best to treat things as a black box. Assuming you need fast results, you are mostly interested in the output.

 Zeus Panda employs a combination of VirtualAllocEx (with svchost’s process handle), WriteProcessMemory and CreateRemoteThread. Some malware families (e.g. WipBot, which you can find in my repo) rely instead on memory mappings using NtCreateSection and NtMapViewOfSection. In any case, we can summarize the injection process in three steps:

  • Allocate executable memory (e.g. VirtualAlloc(Ex), NtCreateSection)
  • Write shellcode to it
  • Execute it in some way within another process (e.g. jmp, call, CreateRemoteThread)

 Taking Panda as an example, after unpacking we can spot the following:

Process Infection Routine

 

0:002> dt ole32!LPPROCESS_INFORMATION
Ptr32 +0x000 hProcess : Ptr32 Void
      +0x004 hThread : Ptr32 Void
      +0x008 dwProcessId : Uint4B
      +0x00c dwThreadId : Uint4B

 ecx points to the memory region that receives this structure as the result of the call to CreateProcess. The first member of the structure is a handle to the svchost process just created. Simply put, a handle is an identifier through which a program asks the operating system for access to an object. Handles are used throughout the Windows API, but here they matter because they identify the process into which the shellcode will be written.
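The dt layout above can be mirrored with Python's ctypes. This minimal sketch assumes a 32-bit target, so the HANDLE fields are modeled as 32-bit integers (matching the Ptr32 entries) rather than native pointers. It shows that hProcess sits at offset 0, i.e. exactly at the structure's address, and what overwriting it with 0xffffffff looks like on a raw buffer. The handle and ID values are made up. In WinDbg the same edit is a single ed (enter dword) command on the address you noted before the call.

```python
import ctypes
import struct

class PROCESS_INFORMATION32(ctypes.Structure):
    # 32-bit layout of PROCESS_INFORMATION; handles are Ptr32 in the dt output.
    _fields_ = [
        ("hProcess", ctypes.c_uint32),
        ("hThread", ctypes.c_uint32),
        ("dwProcessId", ctypes.c_uint32),
        ("dwThreadId", ctypes.c_uint32),
    ]

# Print the offsets; they should match the dt output (+0x000 ... +0x00c).
for name, _ in PROCESS_INFORMATION32._fields_:
    print(f"+0x{getattr(PROCESS_INFORMATION32, name).offset:03x} {name}")

# Hypothetical contents written by CreateProcessW (all values are made up).
raw = bytearray(struct.pack("<4I", 0x1A4, 0x1A8, 0x0F3C, 0x0F40))

# Overwrite hProcess (offset 0) with the current-process pseudo-handle.
struct.pack_into("<I", raw, 0, 0xFFFFFFFF)

pi = PROCESS_INFORMATION32.from_buffer(raw)
print(hex(pi.hProcess))  # 0xffffffff
```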

 Whenever we talk about process handles, there is one special value, -1 (0xffffffff), the pseudo-handle that identifies the current process (it is what GetCurrentProcess returns). This means that if you force the malware to pass 0xffffffff as the process handle to:

  • VirtualAllocEx
  • WriteProcessMemory
  • CreateRemoteThread
  • NtMapViewOfSection

the shellcode will never leave the malware's own process and the created thread will fall within the domain of the debugger.
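On 32-bit Windows these APIs take their arguments on the stack. When a breakpoint hits on an API's first instruction, [esp] holds the return address and argument i (counting from 0) sits at esp+4*(i+1); on the call instruction in the caller, [esp] is the first argument instead. The Python sketch below (hypothetical helper names, stdcall assumed) derives those slots from the documented parameter order and prints WinDbg ed commands that overwrite the process handle with the pseudo-handle. Note that for NtMapViewOfSection the process handle is the second parameter, after the section handle.

```python
# Parameter order from the documented prototypes (32-bit stdcall assumed).
PARAMS = {
    "VirtualAllocEx": ["hProcess", "lpAddress", "dwSize", "flAllocationType", "flProtect"],
    "WriteProcessMemory": ["hProcess", "lpBaseAddress", "lpBuffer", "nSize",
                           "lpNumberOfBytesWritten"],
    "CreateRemoteThread": ["hProcess", "lpThreadAttributes", "dwStackSize", "lpStartAddress",
                           "lpParameter", "dwCreationFlags", "lpThreadId"],
    "NtMapViewOfSection": ["SectionHandle", "ProcessHandle", "BaseAddress", "ZeroBits",
                           "CommitSize", "SectionOffset", "ViewSize", "InheritDisposition",
                           "AllocationType", "Win32Protect"],
}

CURRENT_PROCESS = -1 & 0xFFFFFFFF  # (HANDLE)-1 as a 32-bit value: 0xffffffff

def arg_offset(api: str, param: str, at_entry: bool = True) -> int:
    """Offset of `param` relative to esp.

    at_entry=True: breakpoint on the API's first instruction, [esp] = return address.
    at_entry=False: breakpoint on the call in the caller, [esp] = first argument.
    """
    index = PARAMS[api].index(param)
    return 4 * (index + 1) if at_entry else 4 * index

def patch_handle_cmd(api: str) -> str:
    """WinDbg command replacing the target-process handle with the pseudo-handle."""
    param = "ProcessHandle" if api == "NtMapViewOfSection" else "hProcess"
    return f"ed @esp+0x{arg_offset(api, param):x} {CURRENT_PROCESS:08x}"

for api in PARAMS:
    print(f"{api:20} {patch_handle_cmd(api)}")
```

The same table gives the CreateRemoteThread slots used later: arg_offset("CreateRemoteThread", "lpStartAddress") is 0x10 (the shellcode entrypoint), and dwCreationFlags sits at 0x18, which is where CREATE_SUSPENDED (0x4) would go if you chose that route.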

Practice

We start by setting a breakpoint on the call to CreateProcess:

BP on CreateProcessW(svchost.exe)

We take note of the address where the PROCESS_INFORMATION structure will be written (the lpProcessInformation argument):

BreakPoint on CreateProcessW(svchost.exe) and Stack Display

 

We step over the call and change the returned process handle to 0xffffffff:

Changing Svchost Handle

 If, instead of passing a pointer to the whole structure, your sample passes the handle itself, you just need to change the register (or stack slot) holding it before the call. Next, we set a breakpoint on CreateRemoteThread:

CreateRemoteThread on IDA
BP on CreateRemoteThread and Stack Display

 

 The entrypoint for the shellcode is 00599861. To dump the memory block containing it, we do:

Dumping Section Containing Shellcode
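The dump boils down to two numbers that !address reports for the entrypoint: the region's base address and its size. The following Python sketch (hypothetical helper names; the region base and size are made-up values, only 00599861 comes from this sample) builds the matching .writemem command and computes where the entrypoint lands inside the dumped file, which is the offset to go to when you load the dump as a raw binary. It assumes WinDbg's default radix of 16 and uses the L? size form so that large regions are not rejected.

```python
def writemem_cmd(path: str, region_base: int, region_size: int) -> str:
    """Build a WinDbg .writemem command dumping a whole memory region.

    Numbers are written as bare hex, assuming WinDbg's default radix (16).
    "L?" lets the size exceed the normal range-size limit.
    """
    return f".writemem {path} {region_base:08x} L?{region_size:x}"

def entry_offset_in_dump(entry: int, region_base: int, region_size: int) -> int:
    """File offset of the entrypoint inside the dumped region."""
    if not region_base <= entry < region_base + region_size:
        raise ValueError("entrypoint lies outside the dumped region")
    return entry - region_base

ENTRY = 0x00599861          # shellcode entrypoint from CreateRemoteThread
REGION_BASE = 0x00590000    # hypothetical: read from !address
REGION_SIZE = 0x00011000    # hypothetical: read from !address

print(writemem_cmd(r"C:\dumps\svchost1.bin", REGION_BASE, REGION_SIZE))
print(hex(entry_offset_in_dump(ENTRY, REGION_BASE, REGION_SIZE)))  # 0x9861
```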

 

 Judging by the simple routine used to inject svchost, I can tell the malware did not modify itself to the point where it would be better to start from a clean IDA DB with the new dump. I want to reuse the one I already have. However, we need to find where the entrypoint lies within the old IDA DB. To that end, we can:

  1. Open the WinDbg memory window at address 00599861
  2. Copy the first couple of bytes
  3. Search for them in your old IDA DB
Disassembling 599861
Sequence of Bytes for Entrypoint (first svchost.exe)
Searching first bytes on IDA
Entrypoint for first svchost
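Once the byte search gives you the entrypoint's address in the old IDB, every other runtime address in the injected copy maps to the IDB by the same fixed delta, provided the shellcode was copied verbatim. A minimal sketch with hypothetical helper names; the IDB address is made up, and 00599861 is the runtime entrypoint from this sample:

```python
def runtime_to_idb(runtime_addr: int, runtime_entry: int, idb_entry: int) -> int:
    """Translate an address in the injected copy to the matching address in the old IDB.

    Assumes the shellcode was copied verbatim, so the distance between any
    instruction and the entrypoint is the same in both places.
    """
    return idb_entry + (runtime_addr - runtime_entry)

def idb_to_runtime(idb_addr: int, runtime_entry: int, idb_entry: int) -> int:
    """Inverse mapping, e.g. to set a WinDbg breakpoint on something found in IDA."""
    return runtime_entry + (idb_addr - idb_entry)

RUNTIME_ENTRY = 0x00599861  # shellcode entrypoint seen in CreateRemoteThread
IDB_ENTRY = 0x0040A861      # hypothetical: where the byte search matched in the old IDB

print(hex(runtime_to_idb(0x005998A0, RUNTIME_ENTRY, IDB_ENTRY)))  # 0x40a8a0
print(hex(idb_to_runtime(0x0040A900, RUNTIME_ENTRY, IDB_ENTRY)))  # 0x599900
```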

 Note: make sure the bytes you pick from the shellcode contain no absolute addresses. As I mentioned at the beginning, the base address of the injected shellcode is not the same as that of the original malware, and absolute addresses may have been adjusted to reflect that.
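One way to follow this advice without hand-picking bytes is to mask any operand that might hold an absolute address with a wildcard. The Python sketch below (hypothetical helper and data) searches a buffer, for example bytes exported from the old IDB, for a hex pattern in which ?? matches any byte. The pattern copied verbatim from the dump fails because of the relocated push operand, while the masked one matches.

```python
import re

def find_pattern(data: bytes, pattern: str) -> list[int]:
    """Return every offset where `pattern` matches `data`.

    `pattern` is space-separated hex bytes; "??" matches any single byte,
    which lets you mask out operands that may hold absolute addresses.
    """
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # Zero-width lookahead so overlapping matches are reported too;
    # DOTALL makes "." match every byte value, including 0x0a.
    regex = re.compile(b"(?=" + b"".join(parts) + b")", re.DOTALL)
    return [m.start() for m in regex.finditer(data)]

# Hypothetical bytes: same code in the old IDB and in the dump,
# except for the absolute operand of `push` (00400800 vs 00590800).
idb_bytes = b"\x90" * 16 + bytes.fromhex("558bec6800084000e8fb020000")
dump_bytes = bytes.fromhex("558bec6800085900e8fb020000")

exact = " ".join(f"{b:02x}" for b in dump_bytes)     # copied verbatim from the dump
masked = "55 8b ec 68 ?? ?? ?? ?? e8 fb 02 00 00"     # push operand masked out

print(find_pattern(idb_bytes, exact))   # []
print(find_pattern(idb_bytes, masked))  # [16]
```

The relative call (e8 fb 02 00 00) can stay unmasked, since its bytes do not depend on the base address.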

The shellcode injected into the second svchost is the same, except for one dword in each copy that dictates how the code branches within each svchost.

 We have not spawned our thread yet. The question now is: how do we stop the thread before it runs? One way would be to force CreateRemoteThread to start the thread suspended by setting dwCreationFlags to CREATE_SUSPENDED (0x4). However, we would then need to call ResumeThread, and I prefer the thread to run naturally. We know where the entrypoint is, and by default a WinDbg breakpoint applies to all threads. Since CreateRemoteThread will create the thread in the context of the malware process, execution will stop there:

Threads Before

 Once the breakpoint is set and you let the code run, the execution will transition to the new thread and stop:

Shellcode Entrypoint
Threads After

 

 There is a tricky part, however. A process can have multiple threads running in parallel. Also, a new thread is not created instantaneously: you can run a couple more instructions and the debugger will still show no trace of it. And when execution does switch to the new thread, the other threads keep running as well. There are multiple ways to approach this:

  1. Set a breakpoint a couple of instructions after CreateRemoteThread and let the malware run. Execution should switch to the new context while halting the first thread. Due to the uncertainties of the OS (e.g. time taken for threads to be spawned, management of threads by the debugger), this approach is not always viable.
  2. Freeze the current thread and let the malware run (e.g. g in WinDbg). This technique tends to be more foolproof.
Freezing Main Thread

 I chose the above picture because it depicts an uncommon case in WinDbg. Here I took the first approach (i.e. set a breakpoint a couple of instructions after CreateRemoteThread), but the context did not switch right away. However, the new thread was created in a frozen state.

 Once you freeze the main thread, you can run the malware normally and execution will switch to the new thread. Don’t forget to set the breakpoint on the new thread's entrypoint, as described earlier!
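Expressed as WinDbg commands, the freeze-and-run approach is short. This Python helper is only a sketch (the function name is hypothetical) that assembles them: a breakpoint on the shellcode entrypoint, a freeze of the injecting thread (~.f targets the current thread, i.e. the one you are stopped in after CreateRemoteThread returns) and g. Addresses are written as bare hex, assuming WinDbg's default radix of 16.

```python
def freeze_and_run(entry: int, injector_thread: str = ".") -> list[str]:
    """WinDbg commands to reach a new thread's entrypoint with the injector frozen.

    `injector_thread` is a WinDbg thread specifier: "." is the current thread,
    or a debugger thread index such as "0".
    """
    return [
        f"bp {entry:08x}",       # breakpoint on the shellcode entrypoint (all threads)
        f"~{injector_thread}f",  # freeze the thread that called CreateRemoteThread
        "g",                     # resume; the frozen thread stays suspended
    ]

for cmd in freeze_and_run(0x00599861):
    print(cmd)
```

When you are done with the shellcode, ~*u unfreezes all threads again.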

 From here you can run the shellcode normally. If you want to keep executing the main function but don’t want the shellcode to run freely, you can freeze the shellcode thread and switch to the main thread as follows:

Switching to Main Thread

 Notice that I put a breakpoint on the old thread to make sure I did not accidentally let it run free. In theory you would not need to go this far: a simple switch lands you on the last instruction executed before you switched threads.
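The thread switch can be written down the same way. The sketch below (hypothetical helper; thread numbers are WinDbg's indices as listed by ~, not OS thread IDs) freezes the shellcode thread, unfreezes and switches to the main thread, and optionally adds a thread-specific breakpoint (~N bp) on the old thread as a safety net. The guard address is hypothetical.

```python
from typing import Optional

def park_shellcode_and_switch(shellcode_thread: int, main_thread: int,
                              guard_bp: Optional[int] = None) -> list[str]:
    """WinDbg commands to freeze the shellcode thread and continue in the main thread."""
    cmds = [
        f"~{shellcode_thread}f",  # freeze the shellcode thread
        f"~{main_thread}u",       # make sure the main thread is not frozen
        f"~{main_thread}s",       # make it the debugger's current thread
    ]
    if guard_bp is not None:
        # Thread-specific breakpoint: only fires if the shellcode thread runs again.
        cmds.append(f"~{shellcode_thread} bp {guard_bp:08x}")
    return cmds

for cmd in park_shellcode_and_switch(2, 0, guard_bp=0x00599870):
    print(cmd)
```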

 Threads may wait for other threads to return using WaitForSingleObject and the like. It is a good idea to bypass such waits if you intend to keep the execution going. That bypass is out of the scope of this post.

Now for the second svchost we rinse and repeat:

  1. Let the main thread execute until svchost is created
  2. Modify new process handle to 0xffffffff
  3. Let it run until the CreateRemoteThread
  4. Take note of the entry point, use !address [ENTRYPOINT_ADDRESS] to locate the memory region and the size. Dump everything with .writemem.
  5. For those who want to reuse the old IDA DB (works well if the malware does not change itself much): copy the first couple of bytes of the new function and search for them in your old DB. Note: pay attention to the bytes you are copying. If they include calls or jumps that embed absolute addresses rather than relative displacements, you will not find a match in your old IDA DB, since the base address of the shellcode may differ from the base address of the main malware.
  6. Step over CreateRemoteThread and freeze the main thread. Run the malware (e.g. g on WinDbg). You will land on the shellcode as expected.

 Another simple alternative for threads is to set eip to the thread's entrypoint. Bear in mind that the thread may be designed to work with arguments. Make sure you pass them by modifying the stack accordingly.
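Doing this by hand means faking what a real thread start would leave on the stack. A minimal sketch, assuming a 32-bit stdcall ThreadProc(LPVOID lpParameter): at entry, [esp] must hold a return address and [esp+4] the parameter. The helper (hypothetical name; the return address and parameter are hypothetical, only 00599861 comes from this sample) emits WinDbg commands that reserve two stack slots, fill them, set a breakpoint on the fake return address so you regain control when the routine returns, and finally move eip.

```python
def emulate_thread_start(entry: int, param: int, fake_return: int) -> list[str]:
    """WinDbg commands that make the current thread 'call' a ThreadProc by hand.

    Assumes 32-bit stdcall ThreadProc(LPVOID lpParameter) and WinDbg's default
    radix (16), so addresses are written as bare hex.
    """
    return [
        "r esp=@esp-8",                # reserve two stack slots
        f"ed @esp {fake_return:08x}",  # [esp]   = return address
        f"ed @esp+4 {param:08x}",      # [esp+4] = lpParameter
        f"bp {fake_return:08x}",       # regain control when the routine returns
        f"r eip={entry:08x}",          # jump to the thread entrypoint
    ]

for cmd in emulate_thread_start(0x00599861, param=0, fake_return=0x00401000):
    print(cmd)
```

Write down eip and esp (with r) before running these, so you can restore the hijacked thread once the routine returns to the fake address.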

Conclusions

 In this post I have tried to address the complexities of parallel execution involving threads and injected processes. While there is a multitude of ways to manage such cases, I find it easier to confine the malware to its own process. This allows you to switch and monitor execution flows from the comfort of your current debugging session. There are cases where threads cooperate. Recently I analyzed an interesting sample that used aa[.]com as C2 through domain fronting. The malware contained an embedded, encoded PE file that was sent through a pipe to another thread, which was responsible for decoding it and calling its entrypoint. Such cases are trickier but still manageable as long as threads are frozen.

Stay safe 😉
