(Not) All She Wrote (Part 2): Rigged Office Documents (Part 1)

Hello paranoids

 Continuing our crusade through the world of malicious documents and following the previous post, I will now describe the approach for Office documents. One of the great things about these is that we now have a means to debug malicious code, which makes the job easier. Once more, I will start with an overview of the different types of Office documents (e.g. doc, docx, xls, xlsx, xlsm) and their internal structures (OLECF vs. OXML).

 Office documents are slightly more sophisticated because they are able to execute commands on the OS in the background through VBA macros (assuming the user allows them to run). You can execute cmd.exe from a PDF document, but the user gets a specific warning rather than a generic “Enable Content” bar at the top of the document. As with PDFs, there are also crafted documents that exploit specific vulnerabilities. As such, I will write two posts: the first will be (mostly) about generic execution of macros while the second will be about exploitation.

 And since “practice leads to perfection”, I will analyse some specimens and present the tools that you should have in your pocket to address the stickiest situations.

 After the publication of this post I got feedback from @VessOnSecurity and @decalage2 here and, as such, I have decided to update the post. Many thanks for this, guys!

Office Documents 101

  The number of formats and extensions is beyond confusing when it comes to Office documents, but we can narrow them down to the ones listed on Wikipedia. I am not a big fan of quoting Wikipedia, but I think it is accurate and there is no need to copy it. Basically, the extensions of concern (assuming we are looking for macros) are:

  • Word: doc, docm, dotm
  • Excel: xls, xlm, xlsm, xltm
  • PowerPoint: ppt, pot, pps, pptm, potm, ppsm, sldm

Structurally speaking, Office documents may have one of two formats:

  • OLE Compound Format (e.g. doc, dot, xls, xlt, pot): Format used by legacy versions of Office (97-2003). Data is stored within these documents using a FAT-like file system.
  • Open XML (e.g. docx, dotx, xlsx, xltx, potx): Basically a ZIP archive. 
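The two containers can be told apart by their first bytes, which helps when the extension is missing or misleading. Below is a minimal Python sketch, assuming the well-known OLE compound file signature (D0 CF 11 E0 A1 B1 1A E1) and the ZIP local file header (PK 03 04). The function name office_container is hypothetical and only serves the illustration.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file signature
ZIP_MAGIC = b"PK\x03\x04"                      # local file header of a ZIP archive


def office_container(data: bytes) -> str:
    """Return 'olecf', 'ooxml', 'zip' or 'unknown' for the given file bytes."""
    if data.startswith(OLE_MAGIC):
        return "olecf"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                # Open XML packages carry a [Content_Types].xml part at the root.
                if "[Content_Types].xml" in archive.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            return "unknown"
        return "zip"
    return "unknown"
```

The check only identifies the container. It does not say whether the document is Word, Excel or PowerPoint, nor whether macros are present.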

  While for older versions of Office (e.g. 97-2003) you can add macros to documents without changing their extensions, more contemporary versions of Office (e.g. 2007, 2013) require you to save documents with macros using a macro-enabled format such as docm, xlsm and pptm/potm/ppsm/sldm for Word, Excel and PowerPoint, respectively. Also, the icon changes and contains an exclamation mark as you can see below:

Macro-enabled document

 According to the Sophos article From the Labs: New developments in Microsoft Office malware, malicious actors tend to leverage old versions of Office. This may be explained in part by compatibility issues and the fact that newer versions of Office require macros to be saved with specific extensions (i.e. macro-enabled).

 When it comes to macro extraction, typical tools such as oledump or Decalage tools (described below) expect the files to be in OLECF format. In order to extract macros from Open XML, you need to open the archive (e.g. using 7-Zip) and extract vbaProject.bin from the word folder (for a Word document; Excel and PowerPoint documents use the xl and ppt folders, respectively) as shown below:


vbaProject.bin is a compound document that can be analysed as you would normally analyse an OLECF document.
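The manual 7-Zip step can also be scripted. Below is a minimal sketch using only Python's standard zipfile module. The function name extract_vba_projects is hypothetical, and the code assumes the VBA project is stored under a member name ending in vbaProject.bin, as described above.

```python
import io
import zipfile


def extract_vba_projects(data: bytes) -> dict:
    """Return {archive member name: raw bytes} for every vbaProject.bin found."""
    projects = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            # Match on the file name only, so word/, xl/ and ppt/ are all covered.
            if name.lower().endswith("vbaproject.bin"):
                projects[name] = archive.read(name)
    return projects
```

Each returned blob can be written to disk and handed to oledump like any other OLECF file.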

 In terms of the macros themselves, developers can define their own functions but there are some standard ones. With old Office versions (e.g. prior to 97), Subs such as Auto_Open were used to execute VBA as soon as the document was opened, assuming the end-user authorised execution. Starting with Office 97, Microsoft introduced the concept of Events, which is more contemporary and is found across programming languages. For the developer, the difference tends to be in terms of naming. The number of events is quite big so, based on some research, I have put together a list of events and keywords to look out for when analysing malicious documents. You can find a cheat-sheet on my GitHub.

 Knowing the events can be useful to understand the workflow of the malicious code. In any case, and according to my experience, expect to see only a tiny percentage of the events listed in the provided repository (e.g. Document_Open, Application_NewWorkbook). Developers may also add other features such as ActiveX controls, which translate to VBA code as well. However, ActiveX controls are meant for advanced features like forms, which are more oriented to user interaction. Based on what I find in my day-to-day job, detection of macros tends to be quite straightforward.
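A first triage pass over extracted VBA can be as simple as a keyword search. The sketch below is only an illustration: the keyword lists are a small assumed sample (names mentioned in this post plus Workbook_Open), not the cheat-sheet itself, and the function name find_keywords is hypothetical. Matching is case-insensitive because VBA is.

```python
import re

# Illustrative keyword lists (an assumption, far from exhaustive).
AUTOEXEC_KEYWORDS = ("Auto_Open", "AutoOpen", "Document_Open",
                     "Workbook_Open", "Application_NewWorkbook")
SUSPICIOUS_KEYWORDS = ("CreateObject", "WScript", "Shell", "Run")


def find_keywords(vba_source, keywords=AUTOEXEC_KEYWORDS + SUSPICIOUS_KEYWORDS):
    """Map each keyword found in the VBA source to the line numbers where it appears."""
    hits = {}
    for number, line in enumerate(vba_source.splitlines(), start=1):
        for keyword in keywords:
            # Whole-word match so that "Run" does not fire on e.g. "Truncate".
            if re.search(r"\b" + re.escape(keyword) + r"\b", line, re.IGNORECASE):
                hits.setdefault(keyword, []).append(number)
    return hits
```

The line numbers give a starting point for reading the code or for placing breakpoints. Obfuscated macros can hide these names inside strings, so an empty result does not prove the code is harmless.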

The Tools and the Approach

  1. Determine the type of file and do some recon: OLECF or the newer Open XML format. The extension should help but, if not, the Linux file command should tell you what kind of Office document you are dealing with (e.g. PowerPoint, Excel). Use oledump.py or Decalage’s VBA Tools and/or OLE Tools to perform some recon (e.g. find streams, macros). Structured Storage Viewer (a.k.a. SSView) provides a user-friendly interface to browse OLECF files.
  2. Extract the VBA files: oledump can be leveraged directly against OLECF files (e.g. doc, xls) but requires vbaProject.bin as input if the format is Open XML.
  3. Analyse the VBA code: If it is obfuscated, you can use debugging (using Word/Excel/PowerPoint) and/or selective execution of code (e.g. copying VBA segments to a new file and executing them) since VBA is a subset of VB.

According to @VessOnSecurity, ppt files are OLECF files where the macros are stored within streams that are OLECF files themselves. Fortunately, @decalage2’s olevba is able to extract those as well.

 Armed with this basic knowledge and the proper tools, I will now analyse two specimens which I consider good examples of what you may find.

Specimen Analysis


 This is an example of a Word document where the size, complexity and degree of obfuscation make it worth debugging. We start with some recon:


 Using all these tools is a bit of an overkill, but it suffices to know that recon methods are plentiful. In any case, we have a Word document with macros. oledump’s output indicates (look for M) that there is VBA code within the stream “Macros/VBA/ThisDocument”. olevba.py has some options to decode strings (e.g. Base64) and VBA expressions but, in this case, none of those mechanisms helped. You may sometimes see m instead of M, which means the stream holds no macro code worth examining. We extract the macros using:

#-s: select stream. The number after depends on the stream number and it is 8 for this case.
#-v: VBA extraction
oledump.py -s 8 -v [Document Path] > [Extracted VBA]

 There is only one Sub procedure, “Document_Open”, which is executed when the document is opened. The code is pretty messy and obfuscated so we do what humans do best: look for something understandable. You should be able to spot these:

Set avwwmowlqteqwfxlg = CreateObject(Join(ovocfdozqcvvzattr, ""))
nfpeslotgjxccn = avwwmowlqteqwfxlg.Run(gamesurround, antennaclarify)

 This first step is typical whenever you look at some obfuscated script. You start by looking for standard functions (e.g. CreateObject) or other recognisable strings (e.g. Run, WScript). In this case, avwwmowlqteqwfxlg is likely WScript.Shell (notice the .Run a couple of lines below). We need to extract the arguments. We can do this in one of two ways:

  • Debugging the code
  • Modify the VBA code to something recognizable by a VBScript engine (e.g. wscript, cscript), add prints before the Run and execute the code

 The second approach takes a bit more time and is error prone. We will debug the code using the Office developer tools. You need to enable the developer mode and do a couple of steps pictured below:


 Then, set the breakpoints on the lines I have previously referred to. For the CreateObject, you will need to right-click the variable ovocfdozqcvvzattr->Add Watch so you can see the variable below (an array of strings). The Join will concatenate all the strings into WScript.Shell, as suspected. Then, if you run until the .Run command, you will be able to see the arguments using either the Add Watch method or just by hovering the mouse over each argument.


Note: There are cases where the “Watch” window is not able to display the full content of a variable. In such cases, you have to do the following: 


 This is another Word document (skipping recon here). We list macros and extract them. The important macro is 12 since the others have no code:


 We need to find the value of Hammy.Cheesy. Looking at the file streams, we see some interesting ones: “Macros/VBA/Hammy” (macro), “Macros/Hammy/f” and “Macros/Hammy/o”. Hammy is an embedded object and, by dumping the contents of both f and o, you will see the string Cheesy and a long Base64 string, respectively.


We can obtain the decoded string using the following set of commands:

#-s 8: select stream 8; -x: hex dump, turned back into raw bytes by xxd -r -p
arr=($(oledump.py -s 8 -x [PATH FOR DOCUMENT] | xxd -r -p | strings)); echo "${arr[0]}" | base64 --decode

You should see a PowerShell script as a result.
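The same idea can be sketched in Python for cases where the shell pipeline is not available. This is not the command above: instead of taking the first printable string, the hypothetical helper decode_longest_base64 picks the longest Base64-looking run in the raw stream bytes, on the assumption that the payload is the largest such run. The minimum length of 40 characters is an arbitrary threshold.

```python
import base64
import re

# Runs of at least 10 four-character Base64 groups, with an optional padded tail.
BASE64_RUN = re.compile(
    rb"(?:[A-Za-z0-9+/]{4}){10,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def decode_longest_base64(blob: bytes) -> bytes:
    """Decode the longest Base64-looking run in blob, or return b'' if none is found."""
    candidates = [match.group(0) for match in BASE64_RUN.finditer(blob)]
    if not candidates:
        return b""
    return base64.b64decode(max(candidates, key=len))
```

The input would be the raw content of the stream (e.g. “Macros/Hammy/o” dumped to a file). If the decoded output shows a null byte after every character, the script is probably UTF-16LE encoded and can be read with .decode("utf-16-le").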

Corner Cases and Curiosities 

 In the examples above, the work consisted of identification, extraction and analysis of macros. VBA macros can be stored within documents in multiple forms (e.g. source code, P-code and execode). Execode, if present, can actually be leveraged to check previous versions of embedded macros (e.g. attackers reuse the document and macros but update the C2 URL). The analysis of P-code and execode differs slightly. In the following subsections I will give an overview of these concepts. I will rely mostly on references to other articles, since manufacturing documents and duplicating text would be a waste of time and space.

 There is also another approach to run code that does not require macros and has been used by malicious actors. It leverages a Windows technology called Dynamic Data Exchange (DDE). I will give an overview of this technology as well in the last subsection.

Source code, P-code and Execodes

 There is not a lot of information regarding this topic, so I will go through some observations and support my claims with some online articles. We have previously leveraged oledump and other tools to enumerate embedded macros, so we could assume (also taking into account what I see at work) that if the tools show no macros, there are no macros. It turns out that this is not true, according to Dr. Bontchev’s GitHub. There you can find a link to a Zip containing multiple Word documents as POCs. If you use oledump and other tools that assume compressed VBA within the file, you will see nothing. However, the macros are there, as you can see below (the picture on the right shows the output of pcodedmp):


 There are at least two representations for the macros within Office documents. The first is a compressed version of the source code, which is extractable by multiple tools (e.g. oledump). However, this is not the code that is executed when you open the document. When you write VBA code in the editor, it is compiled into an intermediate language called P-code. That P-code is stored within the file and executed when needed. The compressed VBA is not required for the macros to execute, as demonstrated by the example above. The absence of tools that strip away the compressed code may explain why this technique is not widely used by attackers. You can confirm that both forms coexist within a file by creating an Office document with the following code:

 Sub AutoOpen()
    CreateObject("WScript.Shell").Run "cmd.exe", 1, True
 End Sub

And running oledump and pcodedmp as shown below:



 There is a third, optional representation called Execode (SRP streams) which, according to the SANS article, can be leveraged to obtain older versions of macros. Lenny Zeltser simulated a case where an attacker would update a document to execute Calculator as opposed to Notepad, which was present in the previous version. The article is not explicit about the conditions that lead to the creation of the SRP streams but, through some experimentation (using Word 2003 and 2013), I have concluded that those streams are created when macros are added to the document, regardless of whether they are run. However, in order for traces of previous macros to be present, they must be executed (e.g. by opening the document) and then changed. You can test this with the following experiment:

  1. Create document that spawns cmd.exe
  2. Launch the document
  3. Change to mspaint.exe and save

You should see traces of cmd.exe as shown below:

 As mentioned in the resources I have pointed to, SRP streams are application and version specific (i.e. Office version), regardless of the VBA version. This explains why some malicious documents have them while others don’t. In any case, they represent a good forensic artifact. Suppose you detect the download of a malicious document from a server through network traffic analysis but, when you inspect the host, the file is no longer there. Assuming the attackers reuse documents and only change the embedded C2s, you can obtain the file currently being served and look for old embedded domains or host indicators in the SRP streams to assess whether the old document was opened on the host.

 As clarified later by @VessOnSecurity, these streams are created when the macro is first run and the document is saved. Running in this case can range from starting to trace the code in the debugger to running the macro completely and then saving.

Dynamic Data Exchange (DDE)

 I have recently come across the docx 1cb9a32af5b30aa26d6198c8b5c46168 (VirusTotal URL).  This file was being delivered through email to one of our clients and it had neither macros nor exploits but it was capable of spawning PowerShell. Upon opening you would see:

[Screenshots: “Update fields” prompt, followed by the cmd.exe prompt]

 It turns out that this document employs Dynamic Data Exchange (DDE) to execute cmd.exe. DDE is a protocol that allows exchange of data between applications in a client-server fashion.  Let us say that you want to tailor a document dynamically to suit a specific client. You could leverage DDE to fetch some information from another Windows component and update a field within the document with that information. 

In practical terms you can get a glimpse of DDE’s capabilities by adding a Formula to a Word document and setting the formula to:

        {DDEAUTO c:\\windows\\system32\\cmd.exe "[COMMAND]"}

  A formula can be added like this:


 When opening the document you should see:


 From an attacker’s point of view, I am not sure how effective this approach is when compared to typical macros. The victim has to say “Yes” to two warnings (with the second showing strings like cmd and PowerShell) instead of clicking “Enable Content” once, as with typical macros.
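Since a docx is a ZIP archive, DDE fields can be hunted without opening the document in Word. Below is a minimal sketch, assuming the field instruction lives in w:instrText elements (complex fields, possibly split over several runs) or in the w:instr attribute of w:fldSimple elements inside word/document.xml. The function name find_dde_fields is hypothetical, all complex field text is joined into a single string for simplicity, and headers, footers and other parts are not inspected unless passed explicitly.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DDE_PATTERN = re.compile(r"\bDDE(?:AUTO)?\b", re.IGNORECASE)


def find_dde_fields(docx_bytes: bytes, part: str = "word/document.xml") -> list:
    """Return field instructions from the given part that mention DDE or DDEAUTO."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read(part))
    # Complex fields: the instruction may be split over several w:instrText runs.
    complex_text = "".join(
        node.text or "" for node in root.iter(f"{{{W_NS}}}instrText")
    )
    # Simple fields keep the whole instruction in the w:instr attribute.
    simple = [
        node.get(f"{{{W_NS}}}instr", "") for node in root.iter(f"{{{W_NS}}}fldSimple")
    ]
    return [text.strip() for text in [complex_text] + simple if DDE_PATTERN.search(text)]
```

Attackers can split or obfuscate field instructions in ways this sketch does not handle, so a hit is a strong lead while an empty result is not proof of a clean document. As with any untrusted XML, parsing should happen in an analysis environment.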

Final Thoughts

 In this post I gave an overview of the structure of Office documents as well as the tools to dissect them. Assuming a simple scenario with compressed macros, your worst case should be highly obfuscated VBA. Analysis of obfuscated scripts is out of the scope of this post since that would/will require a dedicated one. Suffice it to say that a debugger can be your best bet. If you have experience with VBS, selective execution of code or simple static analysis can do the trick. The OLECF format is way more complex than what I have depicted here and you will find cases where macros use fields and objects stored somewhere within the document. The secret tends to be dumping streams and/or strings and seeing what stands out. Malicious documents are just a means to an end (e.g. drop/download and execute malware) and not the end itself, which reduces complexity drastically.

 Speaking from experience, malicious actors don’t tend to employ sophisticated documents (e.g. stripped VBA, DDE) and up to 90% of the cases are simple doc/xls files with extractable macros. Obfuscation tends to be employed quite frequently though.

Stay safe 😉
