forked from NationalSecurityAgency/ghidra
Commit
GP-4009 Introduced BSim functionality including support for postgresql,
elasticsearch and h2 databases. Added BSim correlator to Version Tracking.
Showing 509 changed files with 77,126 additions and 935 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
Installation of the Elasticsearch BSim Plug-in: | ||
|
||
In order to use Elasticsearch as the back-end database for a BSim instance, | ||
the lsh plug-in, included with this Ghidra extension, must be installed on | ||
the Elasticsearch cluster. | ||
|
||
The lsh plug-in is bundled in the standard plug-in format as the file | ||
'lsh.zip'. It must be installed separately on EVERY node of the cluster, | ||
and each node must be restarted after the install in order for the plug-in to | ||
become active. | ||
|
||
For a single node, installation is accomplished with the command-line | ||
'elasticsearch-plugin' script that comes with the standard Elasticsearch | ||
distribution. It expects a URL pointing to the plug-in to be installed. | ||
The basic command, executed in the Elasticsearch installation directory | ||
for the node, is | ||
|
||
bin/elasticsearch-plugin install file:///path/to/ghidra/Ghidra/Extensions/BSimElasticPlugin/data/lsh.zip | ||
|
||
Replace the initial portion of the absolute path in the URL to point to your | ||
particular Ghidra installation. | ||
|
||
Deployment: | ||
|
||
Follow the Elasticsearch documentation to do any additional configuration, | ||
starting, stopping, and management of your Elasticsearch cluster. | ||
|
||
To try BSim with a toy deployment, you can start a single node (as per the | ||
documentation) from the command-line by just running | ||
|
||
bin/elasticsearch | ||
|
||
This will dump logging messages to the console, and you should see '[lsh]' | ||
listed among the loaded plug-ins as the node starts up. | ||
|
||
Once the Elasticsearch node(s) are running, whether they are a toy or a full | ||
deployment, you can immediately proceed to the BSim 'bsim' command. | ||
The Ghidra/BSim client and 'bsim' command automatically assume an | ||
Elasticsearch server when they see the 'https' protocol in the provided URLs, | ||
although the 'elastic' protocol may also be specified and is equivalent. | ||
The use of the 'http' protocol for Elasticsearch is not supported. | ||
Adjust the hostname, port number, and repository name as appropriate. | ||
Use a command-line similar to the following to create a BSim instance: | ||
|
||
bsim createdatabase elastic://1.2.3.4:9200/repo medium_32 | ||
|
||
This is equivalent to: | ||
|
||
bsim createdatabase https://1.2.3.4:9200/repo medium_32 | ||
|
||
Use a command-line like this to generate and commit signatures from a Ghidra Server | ||
repository to the Elasticsearch database created above: | ||
|
||
bsim generatesigs ghidra://1.2.3.4/repo bsim=elastic://1.2.3.4:9200/repo | ||
|
||
Within Ghidra's BSim client, enter the same URL into the database connection | ||
panel in order to place queries to your Elasticsearch deployment. See the BSim | ||
documentation included with Ghidra for full details. | ||
|
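The scheme rule stated in this section ('https' and 'elastic' both select Elasticsearch, 'http' is not supported) can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not taken from the Ghidra/BSim client; the sketch only checks the URL scheme and says nothing about how other BSim back-ends are selected.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Ghidra or BSim.
final class BsimUrlSketch {

    // True when the URL uses one of the two schemes that INSTALL.txt says
    // select an Elasticsearch back-end ('https' or 'elastic').
    static boolean isElasticsearchUrl(String url) {
        String scheme = URI.create(url).getScheme();
        return "https".equalsIgnoreCase(scheme) || "elastic".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        System.out.println(isElasticsearchUrl("elastic://1.2.3.4:9200/repo"));
        System.out.println(isElasticsearchUrl("https://1.2.3.4:9200/repo"));
        System.out.println(isElasticsearchUrl("http://1.2.3.4:9200/repo"));
    }
}
```

Under this sketch the two example URLs from the text are treated identically, while a plain 'http' URL is not recognized.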
||
|
||
Version: | ||
|
||
The current BSim plug-in was designed and tested with Elasticsearch version 7.17.4. | ||
A change to the Elasticsearch scripting interface, starting with version 7.15, makes the BSim | ||
plug-in incompatible with previous versions, but the lsh plug-in jars may work without change | ||
across later Elasticsearch versions. | ||
|
||
Elasticsearch plug-ins explicitly encode the version of Elasticsearch they work with, and the | ||
plug-in script will refuse to install the lsh plug-in if its version does not match your | ||
particular installation. If your Elasticsearch version is slightly different, you can try | ||
unpacking the zip file, changing the version number to match your software, and then repacking | ||
the zip file. Within the zip archive, the version number is stored in a configuration file | ||
|
||
elasticsearch/plugin-descriptor.properties | ||
|
||
The file format is fairly simple: edit the line below to match your Elasticsearch version | ||
|
||
elasticsearch.version=7.17.4 | ||
|
||
The plug-in may work with other nearby versions, but proceed at your own risk. | ||
|
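The version edit described in the Version section amounts to replacing a single `elasticsearch.version=` line in the descriptor text. The following is a minimal sketch of that one step, assuming the descriptor has already been extracted from the zip and read into a string. The class and method names are hypothetical, and unpacking and repacking the archive are deliberately left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Ghidra or Elasticsearch.
final class DescriptorVersionPatcher {

    // Matches the whole "elasticsearch.version=..." line, wherever it appears in the text.
    private static final Pattern VERSION_LINE =
        Pattern.compile("(?m)^elasticsearch\\.version=.*$");

    // Returns the descriptor text with the version line rewritten to newVersion.
    // All other lines are left exactly as they were.
    static String withElasticsearchVersion(String descriptorText, String newVersion) {
        Matcher m = VERSION_LINE.matcher(descriptorText);
        if (!m.find()) {
            throw new IllegalArgumentException("no elasticsearch.version line found");
        }
        return m.replaceFirst(
            Matcher.quoteReplacement("elasticsearch.version=" + newVersion));
    }

    public static void main(String[] args) {
        String descriptor = "name=lsh\nelasticsearch.version=7.17.4\n";
        System.out.print(withElasticsearchVersion(descriptor, "7.17.10"));
    }
}
```

The target version `7.17.10` is only an example value. The rewritten text still has to be written back into the archive before running the install command, and the caveat above still applies: a matching version string does not guarantee that the plug-in works with that release.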
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
/* ### | ||
* IP: GHIDRA | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
apply from: "$rootProject.projectDir/gradle/distributableGhidraExtension.gradle" | ||
apply from: "$rootProject.projectDir/gradle/javaProject.gradle" | ||
apply plugin: 'eclipse' | ||
eclipse.project.name = 'Xtra BSimElasticPlugin' | ||
// This module is very different from other Ghidra modules. It is creating a stand-alone jar | ||
// file for an elastic database plugin. It is copying files from other modules into this module | ||
// before building a jar file from the files in this module and the cherry-picked files from | ||
// other modules (This is very brittle and will break if any of the files are renamed or moved.) | ||
project.ext.includeExtensionInInstallation = true | ||
|
||
apply plugin: 'java' | ||
|
||
sourceSets { | ||
elasticPlugin { | ||
java { | ||
srcDirs = [ 'src', 'srcdummy', 'build/genericSrc', 'build/utilitySrc', 'build/bsimSrc' ] | ||
} | ||
} | ||
} | ||
// this dependency block is needed for this code to compile in our eclipse environment. It is not needed | ||
// for the gradle build | ||
dependencies { | ||
|
||
implementation project(':BSim') | ||
} | ||
libsDirName='ziplayout' | ||
|
||
task copyGenericTask(type: Copy) { | ||
from project(':Generic').file('src/main/java') | ||
into 'build/genericSrc' | ||
include 'generic/lsh/vector/*.java' | ||
include 'generic/hash/SimpleCRC32.java' | ||
include 'ghidra/util/xml/SpecXmlUtils.java' | ||
} | ||
|
||
task copyUtilityTask(type: Copy) { | ||
from project(':Utility').file('src/main/java') | ||
into 'build/utilitySrc' | ||
include 'ghidra/xml/XmlPullParser.java' | ||
include 'ghidra/xml/XmlElement.java' | ||
} | ||
|
||
task copyBSimTask(type: Copy) { | ||
from project(':BSim').file('src/main/java') | ||
into 'build/bsimSrc' | ||
include 'ghidra/features/bsim/query/elastic/ElasticUtilities.java' | ||
include 'ghidra/features/bsim/query/elastic/Base64Lite.java' | ||
include 'ghidra/features/bsim/query/elastic/Base64VectorFactory.java' | ||
} | ||
|
||
task copyPropertiesFile(type: Copy) { | ||
from 'contribZipExclude/plugin-descriptor.properties' | ||
into 'build/ziplayout' | ||
} | ||
|
||
task elasticPluginJar(type: Jar) { | ||
from sourceSets.elasticPlugin.output | ||
archiveBaseName = 'lsh' | ||
excludes = [ | ||
'**/org/apache', | ||
'**/org/elasticsearch/common', | ||
'**/org/elasticsearch/env', | ||
'**/org/elasticsearch/index', | ||
'**/org/elasticsearch/indices', | ||
'**/org/elasticsearch/plugins', | ||
'**/org/elasticsearch/script', | ||
'**/org/elasticsearch/search' | ||
] | ||
} | ||
|
||
task elasticPluginZip(type: Zip) { | ||
from 'build/ziplayout' | ||
archiveBaseName = 'lsh' | ||
destinationDirectory = file("build/data") | ||
} | ||
|
||
compileElasticPluginJava.dependsOn copyGenericTask | ||
compileElasticPluginJava.dependsOn copyUtilityTask | ||
compileElasticPluginJava.dependsOn copyBSimTask | ||
|
||
elasticPluginZip.dependsOn elasticPluginJar | ||
elasticPluginZip.dependsOn copyPropertiesFile | ||
|
||
jar.dependsOn elasticPluginZip |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
##VERSION: 2.0 | ||
##MODULE IP: Apache License 2.0 | ||
INSTALL.txt||GHIDRA||||END| | ||
Module.manifest||GHIDRA||reviewed||END| | ||
contribZipExclude/plugin-descriptor.properties||GHIDRA||||END| | ||
extension.properties||GHIDRA||||END| |
6 changes: 6 additions & 0 deletions
Ghidra/Extensions/BSimElasticPlugin/contribZipExclude/plugin-descriptor.properties
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
description=Feature Vector Plugin | ||
version=1.0 | ||
name=lsh | ||
classname=org.elasticsearch.plugin.analysis.lsh.AnalysisLSHPlugin | ||
java.version=1.11 | ||
elasticsearch.version=8.8.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
name=BSimElasticPlugin | ||
description=Elastic search backend for BSim. | ||
author=Ghidra Team | ||
createdOn=11/23/20 | ||
version=@extversion@ |
134 changes: 134 additions & 0 deletions
...nsions/BSimElasticPlugin/src/org/elasticsearch/plugin/analysis/lsh/AnalysisLSHPlugin.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
/* ### | ||
* IP: GHIDRA | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package org.elasticsearch.plugin.analysis.lsh; | ||
|
||
import java.io.IOException; | ||
import java.util.*; | ||
|
||
import org.elasticsearch.common.settings.Settings; | ||
import org.elasticsearch.env.Environment; | ||
import org.elasticsearch.index.IndexModule; | ||
import org.elasticsearch.index.IndexSettings; | ||
import org.elasticsearch.index.analysis.TokenizerFactory; | ||
import org.elasticsearch.indices.analysis.AnalysisModule.AnalysisProvider; | ||
import org.elasticsearch.plugins.*; | ||
import org.elasticsearch.script.ScriptContext; | ||
import org.elasticsearch.script.ScriptEngine; | ||
|
||
import generic.lsh.vector.IDFLookup; | ||
import generic.lsh.vector.WeightFactory; | ||
import ghidra.features.bsim.query.elastic.Base64VectorFactory; | ||
import ghidra.features.bsim.query.elastic.ElasticUtilities; | ||
|
||
public class AnalysisLSHPlugin extends Plugin implements AnalysisPlugin, ScriptPlugin { | ||
|
||
public static final String TOKENIZER_SETTINGS_BASE = "index.analysis.tokenizer.lsh_"; | ||
public static String settingString = ""; | ||
|
||
static private Map<String, Base64VectorFactory> vecFactoryMap = new HashMap<>(); | ||
private Map<String, AnalysisProvider<TokenizerFactory>> tokFactoryMap; | ||
|
||
public class TokenizerFactoryProvider implements AnalysisProvider<TokenizerFactory> { | ||
|
||
@Override | ||
public TokenizerFactory get(IndexSettings indexSettings, Environment env, String name, | ||
Settings settings) throws IOException { | ||
// settingString = settingString + " : " + indexSettings.getIndex().getName() + '(' + name + ')'; | ||
return new LSHTokenizerFactory(indexSettings, env, name, settings); | ||
} | ||
} | ||
|
||
public AnalysisLSHPlugin() { | ||
TokenizerFactoryProvider provider = new TokenizerFactoryProvider(); | ||
tokFactoryMap = Collections.singletonMap("lsh_tokenizer", provider); | ||
} | ||
|
||
private static void setupVectorFactory(String name, String idfConfig, String lshWeights) { | ||
WeightFactory weightFactory = new WeightFactory(); | ||
String[] split = lshWeights.split(" "); | ||
double[] weightArray = new double[split.length]; | ||
for (int i = 0; i < weightArray.length; ++i) { | ||
weightArray[i] = Double.parseDouble(split[i]); | ||
} | ||
weightFactory.set(weightArray); | ||
IDFLookup idfLookup = new IDFLookup(); | ||
split = idfConfig.split(" "); | ||
int[] intArray = new int[split.length]; | ||
for (int i = 0; i < intArray.length; ++i) { | ||
intArray[i] = Integer.parseInt(split[i]); | ||
} | ||
idfLookup.set(intArray); | ||
Base64VectorFactory vectorFactory = new Base64VectorFactory(); | ||
// Server-side factory is never used to generate signatures, | ||
// so we don't need to specify settings | ||
vectorFactory.set(weightFactory, idfLookup, 0); | ||
vecFactoryMap.put(name, vectorFactory); | ||
} | ||
|
||
/** | ||
* Entry point for Tokenizer and Script factories to grab the global vector factory | ||
* @param name is the name of the tokenizer | ||
* @return the vector factory used by the tokenizer | ||
*/ | ||
public static Base64VectorFactory getVectorFactory(String name) { | ||
return vecFactoryMap.get(name); | ||
} | ||
|
||
@Override | ||
public void onIndexModule(IndexModule indexModule) { | ||
super.onIndexModule(indexModule); | ||
|
||
Settings settings = indexModule.getSettings(); | ||
String name = null; | ||
// Look for the specific kind of tokenizer settings, within the global settings for the index | ||
for (String key : settings.keySet()) { | ||
if (key.startsWith(TOKENIZER_SETTINGS_BASE)) { | ||
// We can have different settings for different indices, distinguished by this name | ||
int pos = key.indexOf('.', TOKENIZER_SETTINGS_BASE.length() + 1); | ||
if (pos > 0) { | ||
name = key.substring(TOKENIZER_SETTINGS_BASE.length(), pos); | ||
break; | ||
} | ||
} | ||
} | ||
if (name != null) { | ||
String tokenizerName = "lsh_" + name; | ||
if (getVectorFactory(tokenizerName) != null) { | ||
return; // Factory already exists | ||
} | ||
settingString = settingString + " : onModule(" + name + ')'; | ||
// If we found LSH tokenizer settings, pull them out and construct an LSHVectorFactory with them | ||
String baseKey = TOKENIZER_SETTINGS_BASE + name + '.'; | ||
String idfConfig = settings.get(baseKey + ElasticUtilities.IDF_CONFIG); | ||
String lshWeights = settings.get(baseKey + ElasticUtilities.LSH_WEIGHTS); | ||
if (idfConfig == null || lshWeights == null) { | ||
return; // IDF_CONFIG and LSH_WEIGHTS settings must be present to proceed | ||
} | ||
setupVectorFactory(tokenizerName, idfConfig, lshWeights); | ||
} | ||
} | ||
|
||
@Override | ||
public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts) { | ||
return new BSimScriptEngine(); | ||
} | ||
|
||
@Override | ||
public Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers() { | ||
return tokFactoryMap; | ||
} | ||
|
||
} |
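The two string-handling steps in `AnalysisLSHPlugin` (finding the per-index name in the settings keys, and splitting a space-separated setting into numbers) can be tried in isolation. The sketch below copies that logic into a stand-alone class with no Elasticsearch dependency. The class name and the example setting keys (`type`, `idf_config`) are hypothetical, since the actual values of `ElasticUtilities.IDF_CONFIG` and `ElasticUtilities.LSH_WEIGHTS` are not shown in this diff.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; mirrors the key scanning in onIndexModule
// and the weight parsing in setupVectorFactory, without Elasticsearch types.
final class LshSettingsSketch {

    static final String TOKENIZER_SETTINGS_BASE = "index.analysis.tokenizer.lsh_";

    // Returns the text between the "lsh_" prefix and the next '.', taken from
    // the first matching key, or null if no key carries the prefix.
    static String tokenizerSuffix(Iterable<String> keys) {
        for (String key : keys) {
            if (key.startsWith(TOKENIZER_SETTINGS_BASE)) {
                int pos = key.indexOf('.', TOKENIZER_SETTINGS_BASE.length() + 1);
                if (pos > 0) {
                    return key.substring(TOKENIZER_SETTINGS_BASE.length(), pos);
                }
            }
        }
        return null;
    }

    // Splits on single spaces and parses each piece as a double.
    // Throws NumberFormatException on malformed input, as the original would.
    static double[] parseWeights(String lshWeights) {
        String[] split = lshWeights.split(" ");
        double[] weights = new double[split.length];
        for (int i = 0; i < weights.length; ++i) {
            weights[i] = Double.parseDouble(split[i]);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "index.number_of_shards",
            "index.analysis.tokenizer.lsh_repo.type",        // hypothetical key
            "index.analysis.tokenizer.lsh_repo.idf_config"); // hypothetical key
        System.out.println(tokenizerSuffix(keys));                         // prints "repo"
        System.out.println(Arrays.toString(parseWeights("0.5 1.25 2.0"))); // prints "[0.5, 1.25, 2.0]"
    }
}
```

In the plug-in, the extracted name is prefixed with `lsh_` again to form the tokenizer name under which the vector factory is registered, so in this example the factory would be stored under `lsh_repo`.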