3 changes: 2 additions & 1 deletion configure.py
@@ -167,6 +167,7 @@
 ]
 if args.debug:
     config.ldflags.append("-g")  # Or -gdwarf-2 for Wii linkers
+    config.ldflags.append("-sym full")
 if args.map:
     config.ldflags.append("-mapunused")
     # config.ldflags.append("-listclosure") # For Wii linkers
@@ -208,7 +209,7 @@
 # Debug flags
 if args.debug:
     # Or -sym dwarf-2 for Wii compilers
-    cflags_base.extend(["-sym on", "-DDEBUG=1"])
+    cflags_base.extend(["-sym full", "-DDEBUG=1"])
 else:
     cflags_base.append("-DNDEBUG=1")

83 changes: 83 additions & 0 deletions ghidra_scripts/README.md
@@ -0,0 +1,83 @@
# Ghidra Importer

## What do you get from using the importer?

`bfbb_import` is a script that takes basic symbols from the original game (in `symbols.txt`) and more detailed symbols from the reverse-engineered code we can compile so far, and imports them into Ghidra for easier reverse engineering.

Results of running the import:

* Full parameter types, return types, parameter names, global variable types, etc. are imported for the contents of cpp files listed as `Matching` in `configure.py`:

* ![test](gimport/function_with_return.png)

* All struct types referenced in `Matching` files are imported:

* ![test](gimport/struct_import.png)

* Names and parameter types, but _not_ return types, are imported for other name-mangled functions in `symbols.txt` (the mangled name encodes the parameter list but not the return type):

* ![test](gimport/function_with_paramn.png)

* All other remaining symbols from `symbols.txt` are annotated in some way in the main Ghidra listing via labels.

## Import Instructions

### Step 1: Install Ghidra

Download and "install" a recent version of Ghidra from https://github.com/NationalSecurityAgency/ghidra/releases. "Install" here just means unzipping the folder; there is no global install process for Ghidra.

Note: You may need to install the JDK if you don't have it already. You will be prompted for this when running Ghidra if you don't have it.

### Step 2: Install the DOL Extension

Ghidra can't understand GameCube DOL files out of the box. Install the Ghidra GameCube loader from https://github.com/Cuyler36/Ghidra-GameCube-Loader/releases.

### Step 3: Import the DOL

Open Ghidra and choose `File > Import File...`, selecting the DOL file you put in `bfbb/orig/GQPE78/sys/main.dol` when setting up the repo.

Open up the imported file and ***allow analysis to run when prompted***. This importer script expects the functions to already be created by analysis.

### Step 4: Install Ghidrathon

We need to give Ghidra the ability to run Python 3 code. We do this with the Ghidrathon extension. Download Ghidrathon from the releases page: https://github.com/mandiant/Ghidrathon/releases

Follow the installation instructions on that page. You probably don't need to create a venv in this case, but you do need to run `ghidrathon_configure.py`.

### Step 5: Install Importer Script Dependencies

The importer script has a single additional Python dependency: the `pyelftools` package (imported as `elftools`), which it uses to parse the ELF file. Install it with the following command:

```bash
pip install pyelftools
```
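
To confirm the package landed in the interpreter you expect, a quick check along these lines may help. This is only a sketch and not part of the importer: `has_module` is a hypothetical helper, and the one assumption it relies on is that `pyelftools` is imported under the module name `elftools`.

```python
import importlib.util


def has_module(module_name: str) -> bool:
    """Return True if this interpreter can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # pyelftools is imported as `elftools`, not `pyelftools`.
    if has_module("elftools"):
        print("elftools is available")
    else:
        print("elftools is missing; run: pip install pyelftools")
```

Run it with the same Python installation you configured for Ghidrathon, since a package installed for a different interpreter will not be visible to the script.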

### Step 6: Add Script Directory

In Ghidra, choose `Window > Script Manager` to open the Script Manager. This is what we will use to run the script.

In the script manager, at the top right, click the "Manage Script Directories" button: ![image](manage_script_directories.png)

Click `+` at the top right of the script manager, and add `bfbb/ghidra_scripts` to the list of script directories.

### Step 7: Run the Importer

In the Script Manager, you should now be able to filter for `bfbb_import.py`. Select it and run it through the context menu or the run button at the top of the Script Manager.

Importing takes about as long as a clean build, because the script temporarily makes a debug build of the executable to get parameter names and other information from already reverse-engineered functions. The script restores your previous build settings afterwards.

### Step 8: (Optionally) Change Additional Files to Matching

The importer script only imports types referenced in files linked into the final DOL file the build generates. To generate matching DOLs, the build normally links only compilation units that are 100% matching.

If you're working on a cpp file with structures you want to import into Ghidra, you're not bound by this limitation! As long as the file defines enough for it to link, you can import things from it.

Temporarily change the file in question to `Matching` in `configure.py`, and re-run the importer. Note that if you build with the file changed to `Matching` when it is not a 100% match yet, you will get a "not matching" error at the end of the build. That's expected: the importer can still import the symbols correctly because it uses the memory mapping in `symbols.txt`.
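
As a rough illustration of what that temporary change amounts to, here is a self-contained sketch. Everything in it is a stand-in: the `Object` class, the `Matching`/`NonMatching` constants, and the file paths are hypothetical, and the real entries in `configure.py` may be shaped differently.

```python
from dataclasses import dataclass

# Stand-ins for illustration only; the real definitions live in configure.py
# and may differ.
Matching = True
NonMatching = False


@dataclass
class Object:
    completed: bool  # True when the file is treated as matching and linked
    name: str        # Path of the source file (placeholder paths below)


objects = [
    Object(Matching, "example/already_done.cpp"),
    # Temporarily flipped from NonMatching so its types get linked and imported.
    Object(Matching, "example/work_in_progress.cpp"),
    Object(NonMatching, "example/not_started.cpp"),
]

# Only objects marked as matching end up in the linked executable.
linked = [obj.name for obj in objects if obj.completed]
print(linked)
```

Remember to switch the file back once the import is done, so normal builds keep producing a matching DOL.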

### Step 9: Enjoy The Results

Most functions should now have name / parameter info rather than just being `FUN_xxxxxxxx`. No more having to look stuff up in `symbols.txt`!

<!-- ## Ghidra Basics

TODO: Basic guide on using Ghidra -->
7 changes: 7 additions & 0 deletions ghidra_scripts/bfbb_import.py
@@ -0,0 +1,7 @@
import gimport.extract_info
import gimport.import_info

if __name__ == "__main__":
    extracted_info = gimport.extract_info.extract_info()
    print("Importing info into Ghidra")
    gimport.import_info.import_info(currentProgram(), extracted_info)
Empty file.
207 changes: 207 additions & 0 deletions ghidra_scripts/gimport/demangle.py
@@ -0,0 +1,207 @@
from typing import Tuple, List
import re
from .dwarf import DW_FT, DwarfSubscriptDataItem
from .gtypes import GType, GPointerType, GFundType, GArrayType


SPECIAL_NAME_TO_OPERATOR = {
    "__as": "=",
    "__ml": "*",
    "__amu": "*=",
    "__mi": "-",
    "__ami": "-=",
    "__dv": "/",
    "__adv": "/=",
    "__pl": "+",
    "__apl": "+=",
    "__nw": "new",
    "__dl": "delete",
    "__aor": "|=",
    "__or": "|",
    "__eq": "==",
    "__ne": "!=",
    "__vc": "[]",  # Subscript operator ("<<" is mangled as __ls)
    "__mm": "--",
    "__pp": "++",
    "__rf": "->",  # Member access through pointer (unary * shares __ml)
    "__cl": "()",
}


SPECIAL_IGNORE = {
    # Names containing a "T#" code, which the demangler does not handle
    # (it appears to be a back-reference to an earlier parameter type)
    "setevenodd__FUlPUlUlUlP5BLITST1": True,
    "YUV_blit__FPvUlUlUlT0UlUlUlUlUlUlUlT0P5BLITS": True,
    "YUV_blit_mask__FPvUlUlUlPUcUlT0UlUlUlUlUlUlUlT0P5BLITS": True,

    # Can't disambiguate with normal mangling
    "__end__catch": True,
}


def demangle(mangled_name: str, resolve_ud) -> Tuple[str, List[GType]]:
    # Cut off name
    index = mangled_name.find("__", 1)  # 1 instead of 0 to skip __ in names like __ct
    if index == -1:
        # Not a mangled function
        return None
    # Don't know how to demangle some things
    if mangled_name in SPECIAL_IGNORE:
        return None
    name = mangled_name[:index]  # Name part only
    without_name = mangled_name[index+2:]  # Cut off name

    # Cut off namespacing bits
    namespaces = []
    while len(without_name) > 0 and (without_name[0] == "Q" or str.isdigit(without_name[0])):
        if without_name[0] == "Q":
            qualification_count = int(without_name[1])
            without_name = without_name[2:]
            for i in range(qualification_count):
                (namespace_len_text, rest) = re.match(r"^(\d+)(.*)", without_name).groups()
                namespace_len = int(namespace_len_text)
                namespaces.append(rest[:namespace_len])
                without_name = rest[namespace_len:]
        else:
            (len_str, rest) = re.match(r"^(\d+)(.*)", without_name).groups()
            namespace_len = int(len_str)
            namespaces.append(rest[:namespace_len])
            without_name = rest[namespace_len:]
    this_type = resolve_ud(namespaces[-1]) if namespaces else None

    # Namespaced global variable, not a function
    if len(without_name) == 0:
        return None

    # Handle special names
    if name.startswith("__"):
        if name in SPECIAL_NAME_TO_OPERATOR:
            name = f"operator{SPECIAL_NAME_TO_OPERATOR[name]}"
        elif name == "__ct":
            name = namespaces[-1]
        elif name == "__dt":
            name = f"~{namespaces[-1]}"

    # Add namespaces to name
    name = "::".join(namespaces + [name])

    # C -> Const method.
    is_const = without_name[0] == "C"
    if is_const:
        without_name = without_name[1:]

    # F -> function, no F -> method.
    is_member = without_name[0] != "F"
    whole_text = without_name if is_member else without_name[1:]

    # Easier to handle this here
    if whole_text == "v":
        return (name, [])

    """
    Ann_ Array
    P pointer
    C constant
    Qn qualified name, n parts

    b bool
    c char
    s short
    i int
    l long
    x long long
    f float
    d double
    e vararg
    nn <name> struct
    """
    def parse_type(text: str) -> Tuple[GType, str]:
        if text.startswith("A"):
            (dim, rest) = re.match(r"^A([0-9]+)_(.*)", text).groups()
            (type, rest) = parse_type(rest)
            array_type = GArrayType()
            count = DwarfSubscriptDataItem()
            count.highBound.isConstant = True
            count.highBound.constant = int(dim) + 1
            element_type = DwarfSubscriptDataItem()
            element_type.type = type
            array_type.subscripts = [count, element_type]
            return (array_type, rest)
        elif text.startswith("F"):
            text = text[1:]
            if text.startswith("v"):
                text = text[1:]
            else:
                while text and not text.startswith("_"):
                    (param_type, text) = parse_type(text)
            assert text.startswith("_"), f"Expect _ after function type in {mangled_name}"
            # TODO: Actually handle function type
            return (GPointerType(GFundType(DW_FT.void)), text[1:])
        elif text.startswith("Q"):
            qualification_count = int(text[1])
            text = text[2:]
            parts = []
            for i in range(qualification_count):
                (namespace_len_text, rest) = re.match(r"^(\d+)(.*)", text).groups()
                namespace_len = int(namespace_len_text)
                parts.append(rest[:namespace_len])
                text = rest[namespace_len:]
            return (resolve_ud(parts[-1]), text)
        elif text.startswith("Pv"):
            # Pointer to void is special
            return (GPointerType(GFundType(DW_FT.void)), text[2:])
        elif text.startswith("PCv"):
            return (GPointerType(GFundType(DW_FT.void)), text[3:])
        elif text.startswith("P") or text.startswith("R"):
            (type, rest) = parse_type(text[1:])
            pointer_type = GPointerType(type)
            return (pointer_type, rest)
        elif text.startswith("C"):
            # Constness ignored here
            return parse_type(text[1:])
        elif text.startswith("b"):
            return (GFundType(DW_FT.bool), text[1:])
        elif text.startswith("c"):
            return (GFundType(DW_FT.S8), text[1:])
        elif text.startswith("s"):
            return (GFundType(DW_FT.S16), text[1:])
        elif text.startswith("i"):
            return (GFundType(DW_FT.S32), text[1:])
        elif text.startswith("l"):
            return (GFundType(DW_FT.SLong), text[1:])
        elif text.startswith("x"):
            return (GFundType(DW_FT.S64), text[1:])
        elif text.startswith("f"):
            return (GFundType(DW_FT.F32), text[1:])
        elif text.startswith("d"):
            return (GFundType(DW_FT.F64), text[1:])
        elif text.startswith("Uc"):
            return (GFundType(DW_FT.U8), text[2:])
        elif text.startswith("Us"):
            return (GFundType(DW_FT.U16), text[2:])
        elif text.startswith("Ui"):
            return (GFundType(DW_FT.U32), text[2:])
        elif text.startswith("Ul"):
            return (GFundType(DW_FT.ULong), text[2:])
        else:
            # Handle struct
            if match := re.match(r"^(\d+)(.*)", text):
                (ident_len_text, rest) = match.groups()
                ident_len = int(ident_len_text)
                ident = rest[:ident_len]
                rest = rest[ident_len:]
                return (resolve_ud(ident), rest)
            else:
                print("Unexpected mangle:", text, mangled_name)
                exit(0)

    result = []
    if this_type:
        result.append(GPointerType(this_type))
    while whole_text:
        # End of empty arg list, or variable args
        if whole_text.startswith("v") or whole_text.startswith("e"):
            return (name, result)
        (type, whole_text) = parse_type(whole_text)
        result.append(type)
    return (name, result)
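
For readers new to this mangling scheme, the first stage of `demangle` (splitting the name from its qualifiers and parameter codes) can be sketched on its own. This is a simplified illustration, not the PR's code: `split_mangled` is a hypothetical helper, it only handles plain length-prefixed qualifiers (no `Qn` lists), and the symbol names in the usage example are made up.

```python
import re
from typing import List, Optional, Tuple


def split_mangled(mangled: str) -> Optional[Tuple[str, List[str], str]]:
    """Split `name__<qualifiers><params>` into (name, qualifiers, params).

    Returns None when the symbol has no `__` separator after its first
    character, i.e. when it does not look like a mangled function.
    """
    # Search from index 1 so names such as __ct keep their leading underscores.
    index = mangled.find("__", 1)
    if index == -1:
        return None
    name, rest = mangled[:index], mangled[index + 2:]

    qualifiers: List[str] = []
    # Each qualifier is a decimal length followed by that many characters.
    while match := re.match(r"(\d+)(.*)", rest):
        length = int(match.group(1))
        tail = match.group(2)
        qualifiers.append(tail[:length])
        rest = tail[length:]
    return (name, qualifiers, rest)


if __name__ == "__main__":
    # Made-up symbols that follow the scheme, not entries from symbols.txt.
    print(split_mangled("Update__6WidgetFf"))  # ('Update', ['Widget'], 'Ff')
    print(split_mangled("__ct__6WidgetFv"))    # ('__ct', ['Widget'], 'Fv')
    print(split_mangled("main"))               # None
```

The remaining parameter string (`Ff`, `Fv`) is what `parse_type` consumes one type code at a time.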