
Add other attack mechanisms #2

Open

Description

@100

Right now we assume no feedback between adversary and classifier.

What if the adversary has access to the labels? What if the adversary has access to the raw probabilities? What if the adversary has access to some observation that can be linked back to the label or probability?

These questions are very broad, and while some have been addressed in the machine learning literature, there are many possible takes on them as they specifically apply to text classification.

Potential ideas (this list will grow):

  • Use LIME to identify the words that are most important to the classification result and apply targeted attacks to just those words (see the first sketch after this list)
  • Simulate a sequence of back-and-forths between the classifier and the adversary (see the second sketch after this list)
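
A minimal sketch of the LIME idea, assuming a classifier `clf` that exposes a `predict_proba` method over raw strings (e.g. a scikit-learn `Pipeline` with a vectorizer); the class names and the word-dropping "attack" are placeholders, not a committed design:

```python
# Sketch: use LIME to find the words that drive a prediction, then
# perturb exactly those words. Assumes `clf` is any text classifier
# whose predict_proba accepts a list of raw strings.
from lime.lime_text import LimeTextExplainer

def important_words(clf, text, num_features=5):
    """Return (word, weight) pairs ranked by influence on the prediction."""
    explainer = LimeTextExplainer(class_names=["ham", "spam"])  # placeholder names
    exp = explainer.explain_instance(text, clf.predict_proba,
                                     num_features=num_features)
    return exp.as_list()

def targeted_attack(clf, text, num_features=5):
    """Drop the most influential words -- one very crude targeted attack."""
    targets = {w for w, _ in important_words(clf, text, num_features)}
    return " ".join(tok for tok in text.split() if tok not in targets)
```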
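
And a sketch of the second idea, a greedy query loop in which the adversary only observes the probability the classifier returns for a target class; `clf` is the same placeholder as above, and the single-word-removal move is just one illustrative choice of perturbation:

```python
# Sketch: simulate a back-and-forth where the adversary may only query
# the classifier and observe the returned target-class probability,
# greedily dropping whichever word lowers that probability most.
def greedy_feedback_attack(clf, text, target_class=1, max_queries=100):
    words = text.split()
    queries = 0

    def prob(tokens):
        nonlocal queries
        queries += 1
        return clf.predict_proba([" ".join(tokens)])[0][target_class]

    best = prob(words)
    improved = True
    while improved and len(words) > 1 and queries < max_queries:
        improved = False
        # Try removing each word in turn; keep the single best removal.
        scored = [(prob(words[:i] + words[i + 1:]), i)
                  for i in range(len(words))]
        new_best, idx = min(scored)
        if new_best < best:
            best = new_best
            words = words[:idx] + words[idx + 1:]
            improved = True
    return " ".join(words), best, queries
```

The query budget (`max_queries`) matters here: it is the natural knob for comparing feedback settings, since an adversary with label-only access will need more queries than one who sees raw probabilities.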

Metadata

Labels

enhancement (New feature or request)
