- The structure and contents of a PDF file are defined using objects, which issue directives using ASCII based keywords
- Some risky keywords include
Execute Embedded JavaScript --> /JS /JavaScript /AcroForm /XFA
Try launching external or embedded programs --> /Launch /EmbeddedFiles
Take action automatically when the PDF file is opened --> /AA /OpenAction
Interact with websites --> /URI /SubmitForm
header --> %PDF-1.6
object --> object delimited with:
X Y obj
endobj
...
xref --> Table with offsets of objects in the file
trailer --> Lists the number of objects and the offset of xref
- Indirect object 1 0 references 43 0
1 0 obj
Type: /Page
<<
/AA /O 43 0 R
>>
endobj
44 0 obj
<<
/Filter
[/FlateDecode]
/Length 463
>>
stream
encoded contents
endstream
endobj
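The stream object above can be decoded by hand when a tool is not available. The following is a minimal sketch, assuming /Length is a direct integer (not an indirect reference such as 7 0 R) and that /FlateDecode is the only filter. The function name and the sample object are made up for illustration.

```python
import re
import zlib


def decode_flate_stream(obj_bytes: bytes) -> bytes:
    """Inflate the stream of a single PDF object that uses /FlateDecode.

    Assumption: /Length holds a direct integer and no other filter is chained.
    """
    length = int(re.search(rb"/Length\s+(\d+)", obj_bytes).group(1))
    # The stream data begins right after the "stream" keyword and its newline.
    start = re.search(rb"stream\r?\n", obj_bytes).end()
    return zlib.decompress(obj_bytes[start:start + length])


# Stand-in object built in memory instead of reading a real sample.
payload = b"app.alert('hello');"
compressed = zlib.compress(payload)
sample_obj = (
    b"44 0 obj\n<<\n/Filter [/FlateDecode]\n/Length %d\n>>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)

print(decode_flate_stream(sample_obj))
```

Slicing by /Length avoids guessing where the binary data ends, since compressed bytes can contain newline characters.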
unzip steel1.zip
code steel1.pdf
- Use pdfid.py for an initial perspective to check for risky keywords
- pdfid.py scans for suspicious keywords without formally parsing the PDF file
- It's useful for an initial review to inform the next steps
- The /URI keyword indicates clickable URLs, which can be used in PDFs as phishing bait
- We use "keyword" in a generic sense, though PDF specs use other terms
pdfid.py steel1.pdf
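To illustrate the idea of scanning for keywords without parsing the file, here is a rough sketch that counts risky names in raw bytes. It is not how pdfid.py is implemented: it ignores hex-escaped names and object streams, and the function name and sample bytes are hypothetical.

```python
import re

RISKY_KEYWORDS = [
    "/JS", "/JavaScript", "/AcroForm", "/XFA", "/Launch",
    "/EmbeddedFiles", "/AA", "/OpenAction", "/URI", "/SubmitForm",
]


def count_keywords(data: bytes) -> dict:
    """Count each risky keyword in raw PDF bytes without parsing objects."""
    counts = {}
    for keyword in RISKY_KEYWORDS:
        # The lookahead stops a short name from matching inside a longer one.
        pattern = re.escape(keyword.encode()) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, data))
    return counts


# Made-up fragment that stands in for a real file.
sample = (
    b"1 0 obj\n<< /Type /Page /AA << /O 43 0 R >> >>\nendobj\n"
    b"2 0 obj\n<< /S /URI /URI (http://example.com) >>\nendobj\n"
)

for name, hits in count_keywords(sample).items():
    if hits:
        print(name, hits)
```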
- Use pdf-parser.py for a more detailed look at the PDF file
- The -a parameter to pdf-parser.py shows statistics
- Because pdf-parser.py properly parses PDF syntax, its output is more accurate than that of pdfid.py
pdf-parser.py steel1.pdf -a
- The -k parameter shows just the values for the given key
pdf-parser.py steel1.pdf -k /URI
- The attacker tries to persuade the victim to click on the picture
- To locate images in the PDF file, look for objects of type /XObject
- Use the -o parameter to pdf-parser.py to examine object 6, which contains /XObject
pdf-parser.py steel1.pdf -o 6
obj 6 0
Type: /XObject
Referencing 7 0 R
Contains Stream <-- Object includes encoded data
<<
/Type /XObject
/Subtype /Image
/Width 625 <-- Image size is 625 x 155 pixels
/Height 155
/BitsPerComponent 8
/ColorSpace /DeviceRGB
/Length 7 0 R
/Filter /DCTDecode <-- This decoding is used for JPEG images
>>
pdf-parser.py steel1.pdf -o 6 -d object6.jpg
- Follow the trail of references that leads to object 6 to see if the trail starts with a link
- The -r parameter finds references to the specified object
- Object 6, which was of type /XObject, is referenced by object 13
obj 13 0
Type:
Referencing: 4 0 R, 3 0 R, 8 0 R, 9 0 R, 6 0 R
<<
/ColorSpace
<<
/PCSp 4 0 R
/CSp /DeviceRGB
/CSpg /DeviceGray
>>
/ExtGState
- Note: /Annots offers a way to associate a link with an object
- Continue to follow the trail of references
- If you see /Annots 14 0 R --> Look at object 14 now
- One-by-one requests using wget or curl
- Recommend spoofing HTTP headers to make these requests look more like a normal web browser, especially the UA strings for wget and curl
- Can also tweak the config files of wget and curl: ~/.wgetrc, ~/.curlrc
- Specialized tools such as Pinpoint or Scout
- Honeyclient software such as Thug
- Real browser on a purposefully vulnerable Windows system, enabling the website to infect the lab machine
Activate behavioral monitoring tools to observe the infection
Capture network traffic
If using a sniffer such as Fiddler configure it to save SSL keys
Visit the website from several different IPs to see if its behavior changes
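The header spoofing idea is not limited to wget and curl. As a small sketch, the same can be done from a script with Python's standard library. The User-Agent value and the extra headers below are illustrative assumptions, not values from the course material, and the actual fetch is left commented out so nothing is requested by accident.

```python
import urllib.request

# Hypothetical browser-like User-Agent; substitute one that fits your scenario.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Prepare a request whose headers resemble a normal web browser."""
    headers = {
        "User-Agent": BROWSER_UA,
        "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)


req = build_request("http://example.com/")
print(req.header_items())

# Only from an isolated lab network:
# body = urllib.request.urlopen(req, timeout=10).read()
```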
pdf-parser.py steel2.pdf -O -a
- If you see /ObjStm in the output of the pdf-parser.py steel2.pdf -a command, then you need to look inside the object streams
- pdf-parser.py does not examine object streams by default, so add the -O parameter
pdf-parser.py steel2.pdf -O -r 10
Additional Considerations with PDFs
- Look for risky objects, examine them, follow the trail of referenced or otherwise related objects
- If you see a suspicious object with a stream you can dump that stream to a file using parameters -f -w -d
- Malicious PDFs can include JS --> look for /JS /JavaScript /AcroForm /XFA
- PDF files could be password protected
- The structure will be visible but you'll need to decrypt streams to examine them
- You'll need to determine the password then decrypt with tools such as qpdf and pdftk
- Note: Even if the document or VBA project is password protected the macros are not stored in an encrypted way
- Office documents can follow two different formats
- The "legacy" binary format is OLE2 (a.k.a. structured storage etc.)
- OLE2 mimics capabilities of a file system using the concepts of storages (like folders) and streams (like files)
- The more modern XML-based format OOXML incorporates multiple files that include the document's contents in a ZIP file
- Both formats can carry macros
- Macros in an OOXML file are inside a binary OLE2 file which is inside the ZIP archive
- Normally VBA macro code is embedded inside streams as compiled code (p-code) and compressed source code
file particulars.doc
trid particulars.doc
- Open XML Format --> means it's an OOXML file
zipdump.py particulars.doc
unzip particulars.doc -d particulars-files
- Can extract individual files as well with zipdump.py
-s --> specify the file
-d --> extract or dump it
zipdump.py particulars.doc -s 5 -d > image1.jpeg
- Use feh image viewer to view the image
feh image1.jpeg &
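Since an OOXML document is a ZIP archive, the listing and extraction steps can also be sketched with Python's zipfile module. This is only an approximation of what zipdump.py does. The 1-based index and the in-memory sample archive are assumptions made for illustration.

```python
import io
import zipfile


def list_entries(doc_bytes: bytes) -> list:
    """Return (index, name, size) for each file in the ZIP container."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as archive:
        return [
            (index, info.filename, info.file_size)
            for index, info in enumerate(archive.infolist(), start=1)
        ]


def extract_entry(doc_bytes: bytes, index: int) -> bytes:
    """Return the raw contents of the entry at the given 1-based index."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as archive:
        return archive.read(archive.infolist()[index - 1])


# Tiny stand-in archive; a real document would be read from disk instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/vbaProject.bin", b"\xd0\xcf\x11\xe0")
sample_doc = buffer.getvalue()

for entry in list_entries(sample_doc):
    print(entry)
```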
olevba particulars.doc > particulars.olevba #extract
code particulars.olevba #view
- The olevba utility can locate, decode, and extract VBA macros from Office files. The tool also shows a summary of the risky keywords it located in the macro
- Any line that starts with ' is a comment in VBA
- When Office sees AutoOpen it automatically executes that function as soon as macros are allowed to run
- Example:
Sub AutoOpen()
g
End Sub
-----------------------------------------
Sub g()
' useless comment
' another useless comment for obfuscation
y
' blah
' blah blah
B
End Sub
- Can see that AutoOpen() calls Sub g(), which then calls function y and function B, which are defined later
- For deeper visibility into VBA macros and related artifacts examine streams
- Use oledump.py
oledump.py particulars.doc -i
M means there is a macro present
2823+809 --> Size of the compiled code is the first number, second number is the size of the compressed source code
- Example:
A3: M 3632 2823+809 'VBA/Pj
- Use the -s a parameter to oledump.py to extract VBA macros from all streams in particulars.doc
oledump.py particulars.doc -s a -v | more
- Pass the oledump.py output through grep to eliminate the comments
oledump.py particulars.doc -s a -v | grep -v "^'" | more
- Sometimes minor aspects of the document can offer additional context for your investigation
- They can sometimes reveal artifacts used in its previous version
- Use
oledump.py
to extract them
- Be on the lookout for obfuscated strings that are backwards
Public Const O As String = " 23rvsger"
...
Function U5(qe)
Dim bT As New WshShell
bT.exec StrReverse(O) & " " & DU(1)
End Function
- When this is executed it will use the LOLBin regsvr32
- Be aware of the LOLBin mshta as well
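A quick way to confirm what the reversed constant resolves to is to reverse it yourself. This sketch mirrors VBA's StrReverse with Python slicing. The helper name is made up for illustration.

```python
def str_reverse(value: str) -> str:
    """Python equivalent of VBA's StrReverse."""
    return value[::-1]


obfuscated = " 23rvsger"  # value of the constant O in the macro above
command = str_reverse(obfuscated)
print(repr(command))
```

The trailing space in the result is why the macro can append its argument right after the reversed string.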
exiftool filename.doc
- XML source code files sometimes include details such as:
- Hidden comments such as URLs from which images were pasted
- The language code of the system where the document was created
- You can unzip its contents and examine individual XML files
- Start with zipdump.py with no command line arguments
zipdump.py particulars.doc
- Once you have identified the index of the file you'd like to examine you can call zipdump.py again, specifying the desired file's index using -s
-d parameter will direct the tool to dump the file to STDOUT
- Can then pipe to xmldump.py with the parameter pretty to reformat the file
zipdump.py particulars.doc -s 9 -d | xmldump.py pretty | more
vmonkey particulars.doc > particulars.vmonkey
code particulars.vmonkey
- Tool will auto decode the VBA macros
- After performing analysis you notice a macro in A3
- When extracting it with oledump.py
oledump.py mydoc.docm -s A3 -v | more
- You see a lot of these lines
exec = exec & ChrW(112) & ChrW(111)...
- You can use numbers-to-string.py
oledump.py mydoc.docm -s A3 -v | numbers-to-string.py -j | more
- Make sure to add new lines and examine the output
oledump.py mydoc.docm -s A3 -v | numbers-to-string.py -j | sed "s/;/;\n/g" > mydoc.oledump
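The decoding idea behind this step can be sketched in a few lines: pull the numbers out of the ChrW() calls and turn them back into characters. This is an illustration of the concept, not the tool's implementation. The sample line is invented, and only its first two values (112, 111) come from the macro above.

```python
import re


def decode_chrw(vba_line: str) -> str:
    """Join the characters encoded as Chr(n) or ChrW(n) calls in a VBA line."""
    codes = re.findall(r"ChrW?\((\d+)\)", vba_line)
    return "".join(chr(int(code)) for code in codes)


# Hypothetical line; the values after 112 and 111 are made up for the example.
line = "exec = exec & ChrW(112) & ChrW(111) & ChrW(119) & ChrW(101) & ChrW(114)"
print(decode_chrw(line))
```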
- Can see the VBA macro using oledump.py even though MSFT Office refuses to show you the code due to the password being set
oledump.py invoice.doc -i
oledump.py invoice.doc -s 7 -v | more
oledump.py invoice.doc -s 7 -v | grep -v "^GoTo" | grep -v ":$" > invoice.oledump
- The tool xor-kpa.py is designed to derive an XOR key from the supplied plaintext and ciphertext
- It can also XOR a string with its multi-byte key, which mimics the algorithm employed by our malicious macro
-x tells the tool to XOR the data with the key
- Start each param with #h# to designate it as a hex-encoded string and enclose it in ''
xor-kpa.py -x '#h#89789FD89AF897AKJHF43HK23' '#h#66546F'
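For reference, the multi-byte XOR itself is simple to reproduce. The sketch below repeats the key over the data. It reuses the key 66546F from the command above, while the plaintext is a made-up stand-in because the real ciphertext is not reproduced in these notes.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(byte ^ key_byte for byte, key_byte in zip(data, cycle(key)))


key = bytes.fromhex("66546F")
plaintext = b"http://example.com"  # hypothetical value for the demonstration
ciphertext = xor_with_key(plaintext, key)

print(ciphertext.hex())
print(xor_with_key(ciphertext, key))
```

Because XOR is its own inverse, the same function both encodes and decodes.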
plugin_http_heuristics --> will automatically decode embedded URLs if they are encoded using a common obfuscation method
oledump.py invoice.doc -p plugin_http_heuristics
- Sometimes a faster approach to deobfuscate macros involves the VBA debugger built into MSFT Office
evilclippy -uu invoice.doc
- Will remove the macro password with the -uu flag
- Then open MSFT Word and click View tab --> Macros --> View Macros --> Edit
- Bring up the locals window so you can see the variables
- Add the following at the beginning of the macro (e.g. at the start of the AutoOpen function) so the macro starts the debugger
Sub AutoOpen() <-- Line already there
Debug.Assert False <-- Line you add
GoTo jlskdffjieoajioehjfueahfekjanufiw <-- Start of obfuscated mess
- Save the macro so the line you added doesn't get lost
- Switch to the MSFT Word main view and enable macros
- Once you enable the macros it will run and pause in the AutoOpen function on the line you set
- Set the breakpoint on the line that interests you
- Then click
Run > Continue
- Once it hits your breakpoint examine the locals window, it will show the current variables in the bottom window, you should see what you are looking for
- When a macro is added to an Office document, MSFT Office compiles it into a bytecode form known as p-code
- This is the code that is actually executed when the macro is run (most of the time: https://github.com/bontchev/pcodedmp)
- Malware authors could modify or fully delete the source code version of the macro while keeping the p-code version intact
- Our analysis tools focus on the source code of the macro and won't recognize the true nature of the file
Extract the file as always
olevba order.docm
- Now extract the file structure info
oledump.py order.docm -i
- Will see a ! which indicates an unusual start of source code
- Another sign of VBA stomping is the size of the compressed source code being 0
oledump.py can extract the p-code but it cannot decode it
oledump.py order.docm -s A3 -v <-- Will get an error "Cannot decompress"
oledump.py order.docm -s A3s -A <-- -A will show the contents the way a hex editor might show them
oledump.py order.docm -s A3c | more <-- adding c will show the compiled code (what c stands for)
- Use pcodedmp.py to disassemble VBA p-code
pcodedmp order.docm > order.pcodedmp
code order.pcodedmp
- Use pcode2code to decompile VBA p-code
pcode2code order.docm | more
- Note:
- MSFT Office automatically decompiles the p-code, generating the VBA source code, however:
- Macros without the source code will only run in the specific version of Office for which the p-code was created
- If you want to debug the macros you can decompile the p-code using pcode2code, then embed the macro in a document
- If you identify some base64 encoded PowerShell, make sure to use base64dump.py to convert it
oledump.py checkbox.doc -s 7 -d | base64dump.py -s 1 -t utf16 > checkbox1.ps1
more checkbox1.ps1
- However, when you view the dump you can see that it also contains gzip compressed data
- Extract the gzip data
base64dump.py checkbox1.ps1 -s 3 -d | gunzip - > checkbox2.ps1
code checkbox2.ps1
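The two decoding layers described here (base64 with UTF-16 text, then base64 with gzip) can be reproduced with the standard library. This is a sketch of the technique on invented data, not a replacement for base64dump.py. The function names and both sample scripts are hypothetical.

```python
import base64
import gzip


def decode_ps_encoded_command(b64_text: str) -> str:
    """Decode base64 text that wraps a UTF-16LE PowerShell script."""
    return base64.b64decode(b64_text).decode("utf-16-le")


def decode_gzip_blob(b64_text: str) -> bytes:
    """Decode base64 text that wraps gzip-compressed data."""
    return gzip.decompress(base64.b64decode(b64_text))


# Build a two-layer sample so the example runs without a real document.
inner_script = b"Write-Host 'stage two'"
inner_b64 = base64.b64encode(gzip.compress(inner_script)).decode()
outer_script = f"$s = '{inner_b64}'"
outer_b64 = base64.b64encode(outer_script.encode("utf-16-le")).decode()

stage1 = decode_ps_encoded_command(outer_b64)
stage2 = decode_gzip_blob(stage1.split("'")[1])
print(stage1)
print(stage2)
```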
- Shellcode is machine code that the CPU can understand
- It is represented as a series of bytes stored in a memory region
base64dump.py checkbox2.ps1 -n 10
-n parameter directs base64dump.py to only consider strings that, when decoded, are at least 10 bytes long
- You should now see the long shellcode string and see that it is the second stream; use -s 2 to extract that stream
base64dump.py checkbox2.ps1 -n 10 -s 2 -d | translate.py "byte ^ 35" > checkbox.bin
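The pipeline above amounts to a base64 decode followed by XOR of every byte with 35. A minimal sketch of the same transformation is shown below. The function name and the sample bytes are assumptions for illustration only.

```python
import base64


def decode_shellcode(b64_text: str, key: int = 35) -> bytes:
    """Base64-decode the text, then XOR each byte with a single-byte key."""
    raw = base64.b64decode(b64_text)
    return bytes(byte ^ key for byte in raw)


# Made-up bytes standing in for real shellcode.
original = bytes([0xFC, 0xE8, 0x89, 0x00, 0x00, 0x00])
encoded_text = base64.b64encode(bytes(byte ^ 35 for byte in original)).decode()

shellcode = decode_shellcode(encoded_text)
print(shellcode.hex())
```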
- Use scdbgc to emulate the execution of shellcode to understand its capabilities
scdbgc /f checkbox.bin /s -1
- Can now use yara-rules to identify known malware patterns in the file
yara-rules checkbox.bin
1768.py checkbox.bin
yara-rules command will scan the file to see if it matches any rules
1768.py is designed for parsing Cobalt Strike artifacts and is installed on REMnux
- In CS files the License ID is stored as a 32-bit integer in the last 4 bytes of the shellcode
- See more: https://isc.sans.edu/forums/diary/Finding+Metasploit+Cobalt+Strike+URLs/27204/
- RTF documents are supported by MSFT Word and many non-MSFT applications
- RTF does not support macros but it allows attackers to embed other dangerous files as OLE objects and other binary contents
- Users can be persuaded to open and execute the embedded file
- RTF files can also directly target a vulnerability using an exploit to execute the embedded shellcode payload
- When examining RTF documents, focus on the objects or other embedded artifacts
- https://cofense.com/rtf-malware-delivery/
- Usually formatted as ASCII plaintext and includes control words and groups
- Control words start with \ and specify how the RTF rendering application should format and display the characters
- A group encloses other elements in {} delimiters and specifies the text affected by the group and its formatting
- Groups can be nested
- Objects and other binary content are embedded as serialized strings that represent hex values
- You will see the \objdata control word followed by a string encoded in hex
- Use rtfdump.py and | more to get an overview of the RTF file's groups and to spot embedded objects
-O will allow you to examine the objects
rtfdump.py new-order.doc -O
-s parameter specifies the index of the object
-d tells the tool to dump the object in its raw form
rtfdump.py new-order.doc -O -s 1 -d > new-order.object
- Use oledump.py to examine the extracted object
oledump.py new-order.object -i
- If you now want to examine a specific stream use the -A parameter
oledump.py new-order.object -s 4 -A
- When analyzing malicious documents that might have exploits look for shellcode to understand the payload of the attack
- Use the -S parameter to examine the strings
oledump.py new-order.object -s 4 -S
- For parsing Equation Editor 3.0 data, format-bytes.py has the option -f name=eqn1
oledump.py new-order.object -s 4 -d | format-bytes.py -f name=eqn1
- When looking for shellcode look out for a lot of 0x90 bytes, also known as a NOP sled
- Use xorsearch to spot shellcode patterns in binary files
xorsearch -W -d 3 qa.bin
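Spotting a NOP sled can also be scripted. The sketch below reports runs of 0x90 bytes in a buffer. The minimum run length of 8 is an arbitrary assumption, and the sample bytes are invented.

```python
import re


def find_nop_sleds(data: bytes, min_len: int = 8) -> list:
    """Return (offset, length) for each run of 0x90 bytes of at least min_len."""
    pattern = rb"\x90{%d,}" % min_len
    return [(match.start(), len(match.group())) for match in re.finditer(pattern, data)]


# Two filler bytes, a 16-byte sled, then a breakpoint byte.
sample = b"\x41\x41" + b"\x90" * 16 + b"\xcc"
print(find_nop_sleds(sample))
```

A long run of 0x90 is only a hint; the same byte is common in ordinary binary data, so follow up with a disassembler or emulator.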
EIP points to the current instruction but assembly code cannot read it directly, so malware authors do it indirectly
CALL followed by a POP allows code to get its EIP contents
CALL 00401024
POP EAX
Shellcode developers attempt to evade detection by using other instructions to perform GetEIP
00401027 JMP SHORT 0040102C #Happens first and moves down to the CALL
00401029 POP ESI
0040102A JMP SHORT 00401031
0040102C CALL 00401029 #Call is made and it moves back up to the POP
00401031 ADD ESI, 9
This code succeeds at making the CALL and then POP in an indirect manner
- Shellcode needs to do some work before it can make API calls
- To load DLLs and resolve API function names, shellcode often seeks kernel32.dll for LoadLibrary and GetProcAddress
- Shellcode looks for the Process Environment Block (PEB) to locate kernel32.dll in memory of the exploited process
- For every process the Windows OS creates a structure called the PEB
- This data structure contains information about the process including the list of modules (DLLs) that have been loaded or mapped into the process's memory
- The FS register contains the address of the data structure called the Thread Information Block (TIB), which contains information about the currently running thread
- A pointer to the PEB resides within the TIB at offset 0x30 with respect to the beginning of the TIB
- Therefore a pointer to PEB is always located at FS:[0x30]
- This syntax directs the processor to look for the address stored 0x30 bytes away from the beginning of the TIB structure
- Two methods to retrieve the PEB
MOV EAX, DWORD PTR FS:[30h]
PUSH 30h
POP EBX
MOV EAX, FS:[EBX]
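The simplest forms of the GetEIP and PEB patterns have fixed byte encodings on 32-bit x86, so they can be searched for directly. The sketch below covers only two cases: a CALL to the next instruction followed by POP of a register, and the direct MOV EAX, FS:[30h]. The indirect variants shown above (JMP/CALL/POP, PUSH 30h/POP EBX) and any encoded shellcode are not covered. The names and the sample buffer are made up.

```python
import re

# Byte patterns for 32-bit x86; only the direct, unencoded variants.
PATTERNS = {
    "GetEIP: CALL next instruction, POP reg": rb"\xE8\x00\x00\x00\x00[\x58-\x5F]",
    "PEB access: MOV EAX, FS:[30h]": rb"\x64\xA1\x30\x00\x00\x00",
}


def scan_for_patterns(data: bytes) -> list:
    """Return sorted (offset, description) pairs for each pattern hit."""
    hits = []
    for description, pattern in PATTERNS.items():
        for match in re.finditer(pattern, data):
            hits.append((match.start(), description))
    return sorted(hits)


sample = (
    b"\x90\x90"
    + b"\xE8\x00\x00\x00\x00\x58"   # CALL $+5 ; POP EAX
    + b"\x64\xA1\x30\x00\x00\x00"   # MOV EAX, FS:[30h]
)
for offset, description in scan_for_patterns(sample):
    print(hex(offset), description)
```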
- Use scdbgc to analyze shellcode by emulating its execution
- The /foff parameter specifies the hex offset within the file where the shellcode starts
- This can be determined by xorsearch
- Press CTRL+C three times if scdbgc gets stuck
scdbgc /f qa.bin /s -1 /foff 3B
/s -1 parameter indicates to continue the emulation without restricting the max number of instructions
- Direct scdbgc to open a handle to the malicious file so the shellcode can find the overlay where it likely stores additional contents
- Hit CTRL+C three times after it starts to avoid too many repeating instructions from filling your screen
- Can hide the numerous READ/WRITE events with /norw
scdbgc /f qa.bin /s -1 /foff 3B /fopen qa.doc /norw
- If you see shellcode attempting to drop another file such as an exe, we can allow the shellcode to execute in order to capture the file
- Use runsc
- Can use it also on REMnux due to wine being installed
- To execute shellcode:
runsc32 -f qa.bin -o 0x3B -d qa.doc -n
find ~/.wine -name WINWORD.EXE -exec cp "{}" . \;
- Microsoft Excel 4 (XLM) macros are legacy technology that can offer attackers an alternative to VBA macros
- Were built in 1992 before the introduction of VBA in 1993
- Are being retired by MSFT but work in recent versions of Excel
- Are defined as formulas in cells of sheets
- Sheets are often hidden
- The formulas are often in white text on white background
- To see where the XLM macro execution starts use zipdump.py with the -s parameter to examine xl/workbook.xml
zipdump.py koti.xlsm -s "xl/workbook.xml" -d | xmldump.py pretty
- To see where execution starts look for:
<definedNames>
<definedName name="_xlnm.Auto_Open">Lodet!$A$154</definedName>
</definedNames>
- Execution starts in cell A154 in sheet Lodet
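Finding the Auto_Open entry can be automated once xl/workbook.xml has been extracted. The following is a sketch using Python's ElementTree, assuming the standard SpreadsheetML main namespace. The function name is hypothetical and the sample XML is trimmed to the part shown above.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def find_auto_open(workbook_xml: str):
    """Return (sheet, cell) for the Auto_Open defined name, or None if absent."""
    root = ET.fromstring(workbook_xml)
    for defined_name in root.findall("m:definedNames/m:definedName", NS):
        if defined_name.get("name", "").lower().startswith("_xlnm.auto_open"):
            sheet, _, cell = (defined_name.text or "").partition("!")
            return sheet, cell.replace("$", "")
    return None


sample_workbook = (
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<definedNames>"
    '<definedName name="_xlnm.Auto_Open">Lodet!$A$154</definedName>'
    "</definedNames>"
    "</workbook>"
)
print(find_auto_open(sample_workbook))
```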
- Look above at the <sheet name=> element to figure out which rId number is assigned to our sheet Lodet and whether it is hidden or not
- To see which XML files represent the sheets Lodet and kOTI look at the xl/_rels/workbook.xml.rels file
zipdump.py koti.xlsm -s "xl/_rels/workbook.xml.rels" -d | xmldump.py pretty
- It will show you:
- `<Relationship Id="rId3"...Target="worksheets/sheet2.xml"/>`
- Now examine the worksheets/sheet2.xml
- Now extract the contents of Lodet, which is macrosheets/sheet1.xml, using zipdump.py
zipdump.py koti.xlsm
zipdump.py koti.xlsm -s 6 -d | xmldump.py pretty | more
- For easier analysis, direct xmldump.py to display just the cell text
zipdump.py koti.xlsm -s 6 -d | xmldump.py celltext > koti.csv
- XLM macro obfuscation techniques include the following:
- Use formulas to compute sensitive values such as strings during the runtime of the macro
- Compute some values randomly during runtime i.e. the URL
Static analysis to compute possible values can be complex and time consuming
Cached value saves time but displays only one possible outcome
- Instead of including a string in the formula include a reference to a string that is stored in a shared table elsewhere in the document
- The shared strings are always in xl/sharedStrings.xml
- Shared strings can reveal IOCs
- You can direct xmldump.py to look up the strings for you by using the -j parameter and pointing to a stream that has the macros
zipdump.py koti.xlsm -j | xmldump.py -j 6 celltext
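The shared string lookup works by index: a cell marked t="s" stores a number that points into the list in xl/sharedStrings.xml. The sketch below resolves those references with ElementTree. It assumes the standard SpreadsheetML namespace and uses invented sample XML with a worksheet root; a real macro sheet may use a different root element.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}


def load_shared_strings(shared_xml: str) -> list:
    """Return the shared strings in index order."""
    root = ET.fromstring(shared_xml)
    return [
        "".join(t.text or "" for t in item.iter("{%s}t" % MAIN_NS))
        for item in root.findall("m:si", NS)
    ]


def resolve_cells(sheet_xml: str, shared: list) -> dict:
    """Map each cell reference to its value, resolving shared string indexes."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for cell in root.iter("{%s}c" % MAIN_NS):
        value = cell.find("m:v", NS)
        if value is None:
            continue
        is_shared = cell.get("t") == "s"
        cells[cell.get("r")] = shared[int(value.text)] if is_shared else value.text
    return cells


# Hypothetical sample data.
shared_xml = (
    '<sst xmlns="%s"><si><t>http://example.com/payload</t></si>'
    "<si><t>cmd.exe</t></si></sst>" % MAIN_NS
)
sheet_xml = (
    '<worksheet xmlns="%s"><sheetData><row r="154">'
    '<c r="A154" t="s"><v>1</v></c><c r="B154"><v>42</v></c>'
    "</row></sheetData></worksheet>" % MAIN_NS
)
print(resolve_cells(sheet_xml, load_shared_strings(shared_xml)))
```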
- MSFT Office is very helpful for decoding XLM macros
- Use the built-in debugger to examine and deobfuscate code
- Convert file format from OOXML to OLE2 and the other way
- Execute the macro the way a victim would to observe effects on the system from a behavioral perspective
- Use Windows AMSI functionality to observe which script commands end up executing
logman start AMSITrace -p Microsoft-Antimalware-Scan-Interface Event1 -o AMSITrace.etl -ets
- Run the suspicious script or macro you wish to examine
- Stop AMSI Monitoring
logman stop AMSITrace -ets
- Examine the AMSI data saved to the file using AMSIScriptContentRetrieval
- Additional tools and considerations for XLM macro analysis
oledump.py can examine XLM macros in OLE2 files
oledump.py file.xls -p plugin_biff --pluginoptions "-x"