Outsmarting Advanced Packers: VMProtect – The Service That is Not of so Much Service

Hello paranoids

Recently, I came across an executable and a dropped DLL protected by VMProtect:

DIE VMProtect

This turned out to be virtualized VMProtect, which is never good news. I will demonstrate the trick using the DLL available on VT.

The Idea

When loading the sample in IDA, you will get a couple of warnings about the DLL, which seems malformed. Load it anyway and check the entrypoints/exports:

DLL Entrypoints and Exports

So we have a bunch of exports marked as D and the interesting entrypoints marked as f. Looking at the exported functions we see (all clean and no code):

DLL Exports

Since the TLS callback and DllEntryPoint run before ServiceMain is called, they must unpack the malicious code and fill the memory behind the exported functions. This is a big advantage from the start, because we could have faced a case where the whole ServiceMain stub contained the VM code, which would make things way harder. So this implies we must run TLS and DllEntryPoint (optionally) first.

I find WinDbg scripting a bit of a pain (not talking about the new JS-driven scripting), but there are two lines in the old-school debugger language that I find quite useful (ln lets you see where the instructions fall, and the sleep is necessary as a delay between t's):

  • Stops as soon as it is outside the boundaries of the malware (good to catch API calls or jumps to memory allocated outside the binary):
.for (;(@rip <= [MALICIOUS_DLL_ADDRESS_UPPER_BOUND]) & (@rip >= [MALICIOUS_DLL_ADDRESS_LOWER_BOUND]);){ ln rip; t; .sleep 1; }
  • The reverse of the one above (stops as soon as execution is back inside the boundaries of the malware):
.for (;(@rip > [MALICIOUS_DLL_ADDRESS_UPPER_BOUND]) | (@rip < [MALICIOUS_DLL_ADDRESS_LOWER_BOUND]);){ln rip; t; .sleep 1; }

Why are these good? Because with virtualised packers, there is a lot of code that has hardly any meaning but is responsible for logistics (e.g. pushing arguments on the stack, preparing API calls, etc). Going through that code is painful and time-consuming, even though it is quite simple to catch the pattern of calls at a certain point. The first loop above allowed me to spot anti-debug calls like IsDebuggerPresent, CheckRemoteDebuggerPresent and NtSetInformationThread (ThreadHideFromDebugger).
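The stop conditions of the two loops can also be replayed offline against a list of recorded instruction pointers. This is only a minimal sketch in Python: the module bounds and the trace values are hypothetical, and the helper names are mine, not WinDbg's.

```python
from typing import Iterable, Optional


def inside(rip: int, lower: int, upper: int) -> bool:
    """Same test as the first .for condition: lower <= rip <= upper."""
    return lower <= rip <= upper


def first_exit(trace: Iterable[int], lower: int, upper: int) -> Optional[int]:
    """First recorded address that falls outside the module (first loop)."""
    for rip in trace:
        if not inside(rip, lower, upper):
            return rip
    return None


def first_reentry(trace: Iterable[int], lower: int, upper: int) -> Optional[int]:
    """First recorded address that is back inside the module (second loop)."""
    for rip in trace:
        if inside(rip, lower, upper):
            return rip
    return None


# Hypothetical bounds of the malicious DLL and a hypothetical trace.
LOWER, UPPER = 0x180000000, 0x180200000
trace = [0x180001000, 0x180001005, 0x7FFB12340000, 0x7FFB12340004, 0x180001009]

left_at = first_exit(trace, LOWER, UPPER)            # where an API call was reached
back_at = first_reentry(trace[2:], LOWER, UPPER)     # where the module resumed
print(hex(left_at), hex(back_at))
```

Alternating the two checks is what turns hours of meaningless VM logistics into a short list of "left the module here, came back there" events.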

However, at a certain point I let the first loop above run for hours and no more API calls were seen. While I cannot tell with certainty why this happened, I can assume that it may have been some anti-debugging technique that did not involve any API calls (e.g. the infamous BeingDebugged flag in the PEB, reached through fs:[30h] on x86 or gs:[60h] on x64) or it could just be VM activity unpacking the malicious code (single-stepping through very complex routines is far from fast).
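To make the API-less check concrete, here is a small Python sketch of what reading the BeingDebugged flag boils down to. It works on a byte buffer standing in for a dumped PEB rather than on a live process, so the buffer contents are made up; the offsets are the documented ones (PEB pointer in the TEB at 30h on x86 and 60h on x64, BeingDebugged at offset 2 of the PEB).

```python
# Offset of the PEB pointer inside the TEB (fs:[30h] on x86, gs:[60h] on x64).
TEB_PEB_POINTER_OFFSET = {"x86": 0x30, "x64": 0x60}

# BYTE BeingDebugged sits at offset 2 of the PEB on both architectures.
PEB_BEING_DEBUGGED_OFFSET = 0x2


def being_debugged(peb_bytes: bytes) -> bool:
    """What a packer learns by reading one byte, without calling any API."""
    return peb_bytes[PEB_BEING_DEBUGGED_OFFSET] != 0


def clear_being_debugged(peb_bytes: bytes) -> bytes:
    """What an analyst would patch in memory to hide the debugger."""
    patched = bytearray(peb_bytes)
    patched[PEB_BEING_DEBUGGED_OFFSET] = 0
    return bytes(patched)


# Made-up first 16 bytes of a PEB with the flag set.
fake_peb = bytes([0x00, 0x00, 0x01, 0x00]) + bytes(12)
print(being_debugged(fake_peb), being_debugged(clear_being_debugged(fake_peb)))
```

Since no import is involved, a check like this never shows up when you only watch for execution leaving the module.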

As mentioned before, the sample analysed here is meant to be run as a service. At around the time of writing this blog post I wrote another blog post on how to debug Windows services using WinDbg. When I tried to implement the simple trick I am about to describe in the context of svchost rather than rundll, I was unable to debug the service or even reach ServiceMain in a clear and repeatable way. I faced problems such as:

  • Unexpected process exits
  • ServiceMain being executed before I was able to catch it
  • Process exits when single-stepping (even when I patched custom exception handlers and there were no hidden threads as far as my tools could tell)
  • Threads I suspended becoming unsuspended

Once I switched to rundll instead of svchost, I was able to unpack the sample and execute ServiceMain with no issues apart from some landmines left by VMProtect/the original sample (e.g. failed step-overs because the return addresses for some functions were not exactly the instructions I was seeing through static analysis). This may be due either to service internals I am not aware of or to the sample behaving differently when run inside svchost (as a service).

Assuming you are using rundll and calling DllMain as the first function, you should be able to get to TLS using part of the steps in my post about service debugging. Let us examine the stack:

So we know that the TLS callback is being called as a consequence of a call to LoadLibraryExW. We also know that ServiceMain should not be called by LoadLibraryExW, because ServiceMain is not present in all DLLs. That should be the job of svchost.exe or some other mythical creature. The combination of TLS and DllEntrypoint should leave ServiceMain and the other exports ready for use, or at least they should contain some instructions so that the service runs properly. Also, ServiceMain on services must call functions like RegisterServiceCtrlHandler and SetServiceStatus. Per the following Microsoft doc:

A ServiceMain function first calls the RegisterServiceCtrlHandlerEx function to get the service’s SERVICE_STATUS_HANDLE. Then it immediately calls the SetServiceStatus function to notify the service control manager that its status is SERVICE_START_PENDING. During initialization, the service can provide updated status to indicate that it is making progress but it needs more time. A common bug is for the service to have the main thread perform the initialization while a separate thread continues to call SetServiceStatus to prevent the service control manager from marking it as hung. However, if the main thread hangs, then the service start ends up in an infinite loop because the worker thread continues to report that the main thread is making progress.

As you can see, there are a couple of constraints that ServiceMain must obey. This leaves us with a couple of ways to catch the execution of ServiceMain as soon as possible.

The Procedure

Now, the temptation here is to set a BP on the return from LoadLibraryExW or on one of the service initialization functions from the Windows API. However, VMProtect's anti-debug mechanisms will screw up the plan. So here is the proposed approach:

  • Take note of the first two bytes at the address where you want the execution to stop (e.g. the return address from LoadLibraryExW)
  • Overwrite those bytes with EB FE (a short jmp to the current instruction)
  • .detach the debugger and let the sample run. Optionally, if you know/suspect that the packer may have checks on window names (e.g. looking for WinDbg, IDA), you can use .abandon and then use Process Hacker to resume the threads (a problem if Process Hacker is on the list of checks)
  • Wait a few seconds and attach again
  • At this point, both TLS and DllEntrypoint were called, so ServiceMain should be unpacked. Restore the two original bytes before continuing
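For anyone wondering why EB FE acts as a parking spot, the following Python sketch decodes the short jump and mimics the save/patch/restore sequence on a plain bytearray. The code bytes, the address and the function names are hypothetical; in the debugger the same thing is done by hand on the real return address.

```python
SELF_JMP = bytes.fromhex("EBFE")  # EB = jmp rel8, FE = displacement of -2


def short_jmp_target(address: int, encoded: bytes) -> int:
    """Decode a 2-byte `EB rel8` jump located at `address`."""
    if len(encoded) != 2 or encoded[0] != 0xEB:
        raise ValueError("not a short jmp")
    rel8 = int.from_bytes(encoded[1:2], "little", signed=True)
    # The displacement is relative to the end of the 2-byte instruction.
    return address + 2 + rel8


def plant_loop(memory: bytearray, offset: int) -> bytes:
    """Save the two original bytes, then overwrite them with the self-jump."""
    original = bytes(memory[offset:offset + 2])
    memory[offset:offset + 2] = SELF_JMP
    return original


def restore(memory: bytearray, offset: int, original: bytes) -> None:
    """Put the saved bytes back once the debugger is attached again."""
    memory[offset:offset + len(original)] = original


# Hypothetical bytes at the return address (test rax, rax; je ...; nop).
code = bytearray.fromhex("4885C0740A90")
saved = plant_loop(code, 0)
print(code.hex(), saved.hex())
print(hex(short_jmp_target(0x7FF600001000, SELF_JMP)))
restore(code, 0, saved)
```

The displacement of -2 cancels out the instruction length, so the jump lands on itself and the thread spins there until the bytes are restored.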

Because I know you love images:

Going back to the svchost approach that I failed to carry out: whenever I set a loop on the address after the call to LoadLibraryExW and detached, ServiceMain would be executed without me being able to catch it. The reason for this is unclear, and it may be due to an oversimplification of service internals on my part or to different behavior by VMProtect when the sample is inside svchost.

Potential Issues

When it comes to hard and fast solutions, it is a good idea to play devil's advocate and come up with issues that the solution may fail to address. So I spun up the hater in me, and here we go:

Problem: What if the sample loads itself within TLS/DllEntrypoint and calls all the malicious code before we can catch it?

Answer: This could explain part of the weird behavior I observed when debugging this sample inside svchost as a service. I tried to take advantage of the MO of this sample to skip VMProtect's protective measures. This is a service, so it is expected that the main malicious activity occurs within the ServiceMain unpacked by TLS and/or DllEntrypoint. The actual sample loads the malicious DLL again and even hides it by manipulating the loaded module lists within ServiceMain.

Problem: What if the sample spawns side-threads responsible for anti-debug mechanisms and hides them?

Answer: I caught VMProtect hiding the main thread using NtSetInformationThread. However, for some reason, when I attached the debugger after the calls to TLS and DllEntrypoint, I did not see any threads being hidden (confirmed with Process Hacker, which could be wrong). While I am not entirely familiar with how NtSetInformationThread works, a loop on the return address will cause any thread that reaches it to get stuck, because the memory is shared. You can then set the instruction pointer on some other thread to execute the unpacked ServiceMain. Alternatively, you can set a loop on NtSetInformationThread, change the inputs and change the output to return success.

Problem: What if the packer and sample check for malware analysis software or run fancy assembly instructions to determine whether it is running within a VM?

Answer: The purpose of this post is to let the sample run without the presence of a debugger. The mechanisms above are out of the scope of this blog post.

Problem: You did not debug this as a service. Aren’t you missing functionality? Also, how did you handle the ServiceMain’s arguments?

Answer: Not according to my static analysis of the service control handlers. I may have, though, and I am counting on someone with more experience with Windows internals to tell me why I ran into the issues I did with svchost. As for the arguments, you can check them on MSDN and simulate them yourself (e.g. write the service name in some memory region and then use the pointer to satisfy ServiceMain's needs).
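As a rough illustration of simulating the arguments, the Python sketch below builds the bytes you would write into a scratch memory region for ServiceMain(DWORD argc, LPWSTR *argv), where argv[0] is the service name. It assumes the Unicode variant of ServiceMain and a 64-bit process; the base address, the service name and the function name are hypothetical.

```python
import struct


def build_servicemain_args(base: int, service_name: str, ptr_size: int = 8):
    """Lay out a one-entry argv at `base`, followed by the name string.

    Returns (argc, argv_address, blob); `blob` is what gets written at `base`.
    """
    pointer_format = "<Q" if ptr_size == 8 else "<I"
    # Wide, NUL-terminated string, as expected for an LPWSTR.
    name = service_name.encode("utf-16-le") + b"\x00\x00"
    # The string is placed right after the single argv[0] pointer.
    string_address = base + ptr_size
    blob = struct.pack(pointer_format, string_address) + name
    return 1, base, blob


# Hypothetical scratch region and service name.
argc, argv, blob = build_servicemain_args(0x1000, "Svc")
print(argc, hex(argv), blob.hex())
```

With the Microsoft x64 calling convention, argc would then go in rcx and the argv address in rdx before pointing the instruction pointer at the unpacked ServiceMain.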

Problem: What if VMProtect detects a loop or tampered return addresses?

Answer: Once more, I am not an expert on packers, but I would say that for this case VMProtect would have to analyse the call stack and go back up the chain, checking every return address for loops or sketchy instructions. This is not very practical, and a memory scan would be way too slow and prone to FPs.

Final Thoughts

Too lazy to fully automate but too lazy to not automate at all is the conundrum I usually face. As mentioned in the first blog post on this topic, manually debugging a complex packer should always be the last resort. You are essentially competing with software designed by humans and executed by a machine, which is capable of generating a degree of complexity that can wear down the bravest soldier.

Some debugger plugins like ScyllaHide may be of help, since they are designed to take into account common anti-debug techniques and even have profiles for well-known packers (it didn't work for me even when I checked all the possible anti-debug options), but I find that the best way is really to let the binary run free with some guard rails that allow the reverser to catch it before it executes too much code.

Exploiting (can I use this word??) the underlying logic of the malware (i.e. that it is a service) sped up the unpacking process and saved me a week of work.

Stay safe 😉
