-
-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Match fingerprints by instruction filters #329
base: dev
Are you sure you want to change the base?
feat: Match fingerprints by instruction filters #329
Conversation
Still a work in progress. But so far works well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation should be updated to match the new usage.
* @param opcodes An opcode pattern of instructions. | ||
* Wildcard or unknown opcodes can be specified by `null`. | ||
*/ | ||
fun instructions(vararg instructionFilters: InstructionFilter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency and ease of use, DSL APIs should be provided. So, instead of constructing objects for filters, the API could look like this:
fingerprint {
opcodes()
methods()
...
}
calling those functions would construct the respective filter and add it to the builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works with DSL unless I'm not understanding quite right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, the API exports constructors, however with DSL, the API should be purely functional. So instead of
fingerprint {
instructions(SomeFilter())
}
it should for example be
fingerprint {
some()
}
which gets rid of the constructor as part of the public API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to add DSL functions now, but running into an issue where DSL builder functions from the wrong DSL builders can be used:
fieldAccess {
definingClass("Landroid/os/Build;")
name("MODEL")
type("Ljava/lang/String;")
returns("Ljava/lang/String;") // Wrong function. Belongs to fingerprint(), and no compile error or warning is thrown.
}
I cannot find any way around this. It gets particularly bad because MethodCallFilter
has identical accessor functions as fingerprint() such as 'returns()', 'parameters()', etc.
Why not use regular constructors here? This would avoid this issue. I think applying DSL for these filters adds more problems than it solves, and it's easier to understand since the constructors are more explicit than figuring out all the DSL set functions. It also makes adding filters much easier since it's just 1 class and not 2 or 3 that DSL requires:
FieldAccessFilter(
definingClass = "Landroid/os/Build;",
name = "MODEL",
type = "Ljava/lang/String;"
)
Or even more simply:
FieldAccessFilter("Landroid/os/Build;", "MODEL", "Ljava/lang/String;")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, the filters with a single required parameter do not lend well to DSL.
Such as NewInstanceFilter
where it's usage is always:
NewInstanceFilter("com.company.ClassType")
There is nothing to gain using DSL here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find any way around this
Perhaps your issue is solved with https://kotlinlang.org/docs/type-safe-builders.html#scope-control-dslmarker. This way a DSL function will raise a compile time error if attempted to be called from the wrong scope.
I tried control markers, and it does not fix the issue. The wrong DSL functions can be used and no warning or compile error is thrown. Only at runtime does it finally throw an exception.
Why not use regular constructors here? This would avoid this issue
Consistency requires DSL functions. I strongly want to avoid mixing API style. The functional API is simpler to write and read.
Function declarations can be made for the constructors, giving what you suggested:
fieldAccess(
definingClass = "Landroid/os/Build;",
name = "MODEL",
type = "Ljava/lang/String;"
)
At this point why even separate the strings?
fingerprint { instructions { field("LDefiningClass;->name:LType;") // Instead of three constructor args } }
MehodCallFilter has fun parseJvmMethodCall(String)
and allows using the toString of non obfuscated class and method calls:
MethodCallFilter.parseJvmMethodCall("Landroid/view/View;->inflate(Landroid/content/Context;ILandroid/view/ViewGroup;)Landroid/view/View;",)
FieldAccess does not have a toString parser, because in practice it is rare for for a non obfuscated field to be accessed from a non obfuscated class (and there's only 3 parameters for the filter, making it simply to break up the instruction yourself).
Allowing declaring only part of the call (such as only parameters and return type) is needed if any part of the call is obfuscated, since defining class and method name differ between app targets and is not constant.
It is a matter of consistency to use functions, even if it has one parameter. Using classes you'd have to do
NewInstanceFilter()
and with DSL you haveinstance()
.
...
is a little abusive of DSL. DSL doesn't mean turning everything into lambdas. This is DSL too:FieldAccess( definingClass = "Landroid/os/Build;", name = "MODEL", type = "Ljava/lang/String;" )
I agree and the DSL constructor methods would make it more consistent. I'll try something like:
fun newInstance(type: String, maxInstructionsBefore: Int = METHOD_MAX_INSTRUCTIONS) =
NewInstanceFilter(type, maxInstructionsBefore)
And then the style will be the same as the FieldAccess() example above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of note, Fingerprint strings()
could be deprecated and instead the strings are declared with the instructions filters:
instructions {
// ...
string("Whatever text"),
fieldAccess(definingClass = "this", type = "Ljava/lang/String;"),
// ...
}
This would simplify things since stringMatchResults
is then part of the instruction filter results.
fingerprint#strings()
would need deprecation but still kept for the short term since all the existing fingerprints would need changing since the ordering of the string declaration then matters (too much of a hassle to change the old code), but all future fingerprints could declare the strings in the instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. For note, it may be possible to design the DSl so that you can repeat the instructions block. It could give access to a scope without overriding it so:
fingerprint {
instructions { }
instructions { }
}
Would not set instructions field in the builder twice. This way the deprecation could replace strings() with instructions { strings() }. However with instructions { } filters have order so perhaps this doesn't make too much sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the ordering makes it difficult to support string instructions with the old opcode patterns because of the instruction ordering.
It's not difficult to deprecate but still support the existing strings { }
declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is about ready.
84739dd
to
98c98e2
Compare
98c98e2
to
7547319
Compare
… `@context` usage, simplify instruction filter block calls.
7ef1357
to
4d38837
Compare
57a5370
to
319a8a7
Compare
src/main/kotlin/app/revanced/patcher/patch/BytecodePatchContext.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/app/revanced/patcher/patch/BytecodePatchContext.kt
Outdated
Show resolved
Hide resolved
*/ | ||
private const val LOG_RESOLVING_SPEED_MINIMUM_MS_TO_LOG = 50 | ||
|
||
internal fun parametersStartsWith( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved back to where the function is used. Closely connected code is better read when together. When reading this diff, I was confused what this function is or what and where it is used for, even when next to where it is used would avoid all those questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is used in both fingerprint and instruction filter.
This could be moved to a utils class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting moving instruction filters to the fingerprint class; that way, closely related code, as seen here, would be together. Moving it to a utility class makes it even farther away from the places it is used, perhaps duplicating code would be even OK here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not an internal utility class? Copy pasting code seems more sloppy than adding an internal shared class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a general consensus that such classes are not elegant or usually indicate that something isn't done the right way:
https://stackoverflow.com/questions/3340032/are-utility-classes-evil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the filters and the shared method into Fingerprint.kt
@@ -0,0 +1,633 @@ | |||
@file:Suppress("unused") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's multiple issues I have with this file and still confused from past discussions.
-
If InstructionFilter filters instructions as it implies, it is not supposed to have context about methods or classes, but instructions, regardless of the reason behind it. If context about classes or methods is needed, "InstructionFilter" needs to have a different name as it is something else than what the current name implies.
-
There are currently some existing filters implemented in Patcher under assumptions for relevance that should not be taken. Today field references, method calls, consts, and object instantiations may be relevant, tomorrow class references, string values, or other things that would require changing this file to adjust to a new assumption. APIs shouldn't be offered based on assumptions. Based on the current assumptions, you will restrict someone to current existing filters to avoid implementing the interface for whatever filter they need. Instead, an universal filter API should be offered that does not make any assumptions of what might or might not be useful and leave this decision to the API consumer. That said, those filters can be implemented somewhere in a separate module, outside of the patcher module, but can't be part of the patcher module just based on assumptions of relevance.
-
Comments should follow the current style. Constructor parameters should be commented as @param in an inline comment for the constructor. Constructors should start with a sentence explaining what the class is/does. Currently, some just jump to examples (such as in LiteralFilter)
-
Currently no DSL api is present, even though the patches & fingerprint API is currently fully DSL. Something like this would be acceptable:
fingerprint { instructions { field(...) Opcode.X() Opcode.Y() "string"() 123() } }
However with the filter API the current simple usage of opcode patterns now involves more boilerplate. Before:
fingerprint { opcodes(Opcode.X, Opcode.Y, ...) }
fingerprint { Opcode.X() Opcode.Y() ...() }
Every filter is added via invocation which has to be done as many times as many opcodes exist, a linear amount of times.
An alternative API would be
fingerprint { instructions { opcodes(Opcode.X, Opcode.Y) // Also works for one field() string("") .... } }
-
Filters look to me more like something suitable in custom block rather than replacing the opcode pattern. As explained in another review comment, fingerprints image a method, filters are not a direct attribute of a method making them suitable at most in custom (where also context about class and method both exist furthermore showing evidence of being a suitable place). Not sure how you'd want to pull that off, but replacing a direct attribute a fingerprint can image is something to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will always be the usage of method filters, field, and literal constants.
It's up to the patches to declare new filters if desired. Such as ResourceMappingPatch declaring it's own filter that finds decoded resource literals.
Instruction filter no longer has a classDef parameter. The instruction method is passed as a parameter and most filters don't use it, but some require it to check how many indexes the method has and others to parse out the enclosing class.
It's important to note this is not a custom block replacement. It's checking the instruction on a more fine grained scale, and checking the ordering of the instructions, and it produces a list of indexes for those matches that is then used by the patch itself. There should be little to no usage of method.indexOfFirst(previousIndex) { /* do checking here */ }
, as these checks are now part of the fingerprint itself.
With just opcodes you only get patternMatchResults
which is the start/end. But with instruction filters you get indexes of each filter since there can be variable spacing between each filter.
This is an expansion of what opcodes previously did, which is why opcode filters still exist and can still be used.
Previously with only opcode method calls you could only declare invoke_interface
, but there is no way to indicate what it's invoking, especially when it's a non obfuscated class such as List.add()
. Now you can declare more specific usage of these opcodes and not just patterns which are fragile, can match completely unrelated stuff, and frequently break when just a single extra register move opcode is added by the compiler.
DSL style declarations can be added, that's not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a simple example, where all the index searching was previously in the patch execute:
Before:
internal val shortsBottomBarContainerFingerprint = fingerprint {
accessFlags(AccessFlags.PUBLIC, AccessFlags.FINAL)
returns("V")
parameters("Landroid/view/View;", "Landroid/os/Bundle;")
strings("r_pfvc")
literal { bottomBarContainer }
}
shortsBottomBarContainerFingerprint.method.apply {
// Search for indexes after the fact, after the fingerprint already resolved.
// First instruction of interest.
val resourceIndex = indexOfFirstLiteralInstruction(bottomBarContainer)
// Second instruction of interest.
val index = indexOfFirstInstructionOrThrow(resourceIndex) {
getReference<MethodReference>()?.name == "getHeight"
}
// Third instruction of interest.
val heightRegister = getInstruction<OneRegisterInstruction>(index + 1).registerA
addInstructions(
index + 2,
"""
invoke-static { v$heightRegister }, $FILTER_CLASS_DESCRIPTOR->getNavigationBarHeight(I)I
move-result v$heightRegister
"""
)
}
Now the indexOfFirst() logic is in the fingerprint itself:
internal val shortsBottomBarContainerFingerprint by fingerprint {
accessFlags(AccessFlags.PUBLIC, AccessFlags.FINAL)
returns("V")
parameters("Landroid/view/View;", "Landroid/os/Bundle;")
strings("r_pfvc")
instructions(
// First instruction of interest.
ResourceMappingFilter("id", "bottom_bar_container"),
// Here lies other unrelated instructions.
// Second instruction of interest.
MethodFilter(methodName = "getHeight"),
// Third instruction of interest.
OpcodeFilter(Opcode.MOVE_RESULT)
)
}
shortsBottomBarContainerFingerprint.let {
it.method.apply {
val index = it.filterMatches.last().index
val heightRegister = getInstruction<OneRegisterInstruction>(targetIndex).registerA
addInstructions(
index + 1,
"""
invoke-static { v$heightRegister }, $FILTER_CLASS_DESCRIPTOR->getNavigationBarHeight(I)I
move-result v$heightRegister
"""
)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstructionFilter
can be renamed, but I'm unsure what other name to use.
MethodFilter
might be more appropriately named MethodCallFilter
, since it matches method calls based on specifics of that call.
FieldFilter
could be renamed to something like FieldAccessFilter
since it matches iget
, iput
, sget
, sput
, etc.
Edit: Renamed to MethodCallFilter
and FieldAccessFilter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will always be the usage of method filters, field, and literal constants.
It's up to the patches to declare new filters if desired. Such as ResourceMappingPatch declaring it's own filter that finds decoded resource literals.
While the current filters and functions like addInstructions
are useful for certain cases, they can't cover every scenario. Assuming that filters will always be necessary is a flawed approach, as there may be situations where none of the existing filters are suitable. These utilities are based on assumptions of common usage, but this can limit flexibility.
The same applies to addInstructions
. Its essentially just adding an item to a list, but introducing this specific functionality as an extension function is an overspecialization that conflicts with the goal of maintaining a generic library. Filters should follow the same principle: while the interface for filters is generic, providing a predefined set of filters creates unnecessary constraints. I’d prefer to move the actual filter implementations to an external module, ideally separate from the patcher repo.
Instruction filter no longer has a classDef parameter. The instruction method is passed as a parameter and most filters don't use it, but some require it to check how many indexes the method has and others to parse out the enclosing class.
An instruction filter should only have context about the instructions. Bringing the method into this context is problematic in terms of abstraction. A filter for instructions relying on a method does not sound right. If somehow it is necessary, it means you need to rethink what "instruction filters" actually are. Perhaps they are more than just that given that you need context about the method.
This is an expansion of what opcodes previously did, which is why opcode filters still exist and can still be used.
There should be one clear way to handle instruction fingerprinting. If we’re moving forward with filters, the old opcode filter approach should be replaced with the new approach and reimplemented in it if necessary.
Now you can declare more specific usage of these opcodes and not just patterns which are fragile, can match completely unrelated stuff, and frequently break when just a single extra register move opcode is added by the compiler.
I think here it also shows that there is a specialization in one direction that is assumed to be likely useful. However, it is nonetheless a specialization that shouldn't happen in a library context that is supposed to be abstract. An example is that you can now filter for method references, but how about filtering for the field type only in field references? You'd now ask to implement the interface to satisfy this situation and would have failed to provide an universal API via the existing filters implementations, because, albeit being likely useful, they are after all specialized for specific usecases.
val index = it.filterMatches.last().index
Regarding the API, it can also be useful, if you can declare a filter so that you can reference it later on. This avoids having to rely on the index of filterMatches. In your example it could look like that:
val opcodeFilter = OpcodeFilter(Opcode.MOVE_RESULT)
internal val shortsBottomBarContainerFingerprint by fingerprint {
accessFlags(AccessFlags.PUBLIC, AccessFlags.FINAL)
returns("V")
parameters("Landroid/view/View;", "Landroid/os/Bundle;")
strings("r_pfvc")
instructions(
// First instruction of interest.
ResourceMappingFilter("id", "bottom_bar_container"),
// Here lies other unrelated instructions.
// Second instruction of interest.
MethodFilter(methodName = "getHeight"),
// Third instruction of interest.
opcodeFilter()
)
}
shortsBottomBarContainerFingerprint.let {
it.method.apply {
val index = iopcodeFilter.index // Notice the reference to the val opcodeFilter
val heightRegister = getInstruction<OneRegisterInstruction>(targetIndex).registerA
addInstructions(
index + 1,
"""
invoke-static { v$heightRegister }, $FILTER_CLASS_DESCRIPTOR->getNavigationBarHeight(I)I
move-result v$heightRegister
"""
)
}
}
InstructionFilter can be renamed, but I'm unsure what other name to use.
After all you don't just filter based on the instructions. This is the reason I had initially assumed it to be similar to the custom block of fingerprints. For that reason, a different name is needed. Can you explain why you need anything else than instructions to filter instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the API, it can also be useful, if you can declare a filter so that you can reference it later on. This avoids having to rely on the index of filterMatches. In your example it could look like that:
The example you list is exactly how this works right now, and the code shown is taken from the patches PR of this change.
Assuming that filters will always be necessary is a flawed approach, as there may be situations where none of the existing filters are suitable.
The built in filters are no different than the previous opcode declarations, except the filters allow declaring more of the opcode call.
Before the opcode declarations were very limited such as INVOKE_STATIC
where its literally any static call with no way to restrict to specific defining classes, method names, return types, or parameters. Now you can specify what the method call is, such as methodCall(definingClass = "Ljava/lang/String;", name = "toString")
which makes fingerprinting way simpler and much more precise.
The JVM opcode instruction set is never going to remove any of the opcodes used for the built in filters, so there is no assumption of the built in filters becoming outdated.
I don't see any reason to make a different project just for the 9 built in filters, especially since declaring more precise method and field access calls are very basic. Making a separate project means a project needs to import both patcher and the basic declaration of filters, which means they're not really separate projects and should be one project.
If a patch wants to declare some very specific instruction filter they can do that in their own project (just as the RV Patches has it's own resourceLiteral
fingerprint.
Of note, with this PR I added YT 20.02 support in only a few minutes because the small changes in opcode patterns no longer break fingerprints, since the instruction fingerprinting allows picking out instructions that are common between all versions and then ignore all the junk opcodes such as register moves.
but how about filtering for the field type only in field references?
You can already do exactly that with this PR. A field access can declare any part of the access, and leave out the parts that are obfuscated or it's desired to ignore. Such as: fieldAccess(type = "Ljava/lang/String;")
Can you explain why you need anything else than instructions to filter instructions?
The enclosing method is needed for methodCall()
andfieldAccess()
to use the this
keyword, since it's impossible to declare an obfuscated class for the method/field access but using this
can be used to indicate it's a call to the enclosing class. The declaring class (which is part of the method object) is needed for support the functionality. The declaring method is also used by lastInstruction()
since it needs to know how many instructions are in the enclosed method. I don't see any issues with passing along the enclosing method of the instruction as it allows more flexibility such as here.
There should be one clear way to handle instruction fingerprinting. If we’re moving forward with filters, the old opcode filter approach should be replaced with the new approach and reimplemented in it if necessary.
I deprecated the old opcodes()
method, but unless someone wants to spend the possible 10+ hours updating all the old code (I definitely do not want to), then it's much easier and more reliably to deprecate but still support the old patches code.
6215c7c
to
5ff38c9
Compare
5ff38c9
to
132fa00
Compare
What is left to get this PR merged? I want to start using these features. The changes here are much simpler and more flexible than the old way of finding match indexes after the fingerprint has already resolved. (old way): (more old way): |
I am still contemplating about design decisions introduced in this PR. I still need to reply to some reviews, particularly regarding the pre-made filters. After these are cleared, we can merge the PR. |
…ul to sometimes use only opcodes and declaring entirely as instruction() is a lot of boilerplate code. This reverts commit f93f870.
@oSumAtrIX There's an existing bug where if multiple patches modify a single method, then when I added some debug code to show the issue with the matching patches PR. This can be fixed by changing |
This reverts commit 57d8087.
…e patches modify the same code
I committed a work in progress change that fixes the issue, but it's really slow since it's a linear search for each class on each fingerprint. This can be sped up by using a map and not a list, or using a contain class to hold both the immutable and mutable class together (mutable will be null until it's set). I'll try fixing the speed issue by one of these. |
…ting usage of this always immediately uses the mutable class thus there is no reason to lazy load.
I refactored the ProxyClassList and fingerprint lookup after the method is modified is fixed. I also changed the proxy class list to a set and patching is now ~25 seconds faster on my laptop. |
Those are likely changes that should be separate, unless there's some reason they need to be in this PR |
It needs to be here, because without it the fingerprint indexes are broken for some of the YouTube patch changes. |
src/main/kotlin/app/revanced/patcher/patch/BytecodePatchContext.kt
Outdated
Show resolved
Hide resolved
…p map. Encapsulate ProxyList into PatchClasses.
cc4f158
to
8951990
Compare
Adds instruction filters to support more flexible instruction fingerprinting.
Changes
Fingerprints can still use opcode patterns, or they can use instruction filters that allow more precise matching.
Basic support exists for matching instructions using:
Projects can define their own custom instruction filters, such as ResourceMappingPatch with it's own kind of LiteralFilter that matches resource literal values (no more mucking about with using a ResourcePatches to first set a resource value a fingerprint then uses).
Variable space allowed between instruction filters
By default, all filters allow any amount of space between them. But if filters are always immediately after each other, or there is a rough estimate of the maximum number of indexes until the next instruction, then a maximum distance can be set. An example is using an opcode filter of
MOVE_RESULT
orMOVE_RESULT_OBJECT
after a method call, where the max instruction spacing is always 0.Breaking changes
Fuzzy pattern match is now obsolete, as it's functionality is now part of the filtering itself. Variable spacing is allowed between instruction filters, and non important instructions are now ignored by simply not defining instruction filters for them.
Example fingerprint before and after this change
Before:
Now the indexOfFirst() logic is in the fingerprint itself: