GP-4009 Introduced BSim functionality including support for postgresql,
elasticsearch and h2 databases.  Added BSim correlator to Version
Tracking.
caheckman authored and ghidra1 committed Dec 5, 2023
1 parent f0f5b8f commit 0865a3d
Showing 509 changed files with 77,126 additions and 935 deletions.
2 changes: 0 additions & 2 deletions Ghidra/Debug/Debugger-agent-frida/certification.manifest
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
##VERSION: 2.0
##MODULE IP: Apache License 2.0
##MODULE IP: Apache License 2.0 with LLVM Exceptions
.classpath||NONE||reviewed||END|
.project||NONE||reviewed||END|
FridaNotes.txt||GHIDRA||||END|
Module.manifest||GHIDRA||||END|
build.gradle||GHIDRA||||END|
2 changes: 0 additions & 2 deletions Ghidra/Debug/Debugger-agent-lldb/certification.manifest
@@ -1,8 +1,6 @@
##VERSION: 2.0
##MODULE IP: Apache License 2.0
##MODULE IP: Apache License 2.0 with LLVM Exceptions
.classpath||NONE||reviewed||END|
.project||NONE||reviewed||END|
Module.manifest||GHIDRA||||END|
build.gradle||GHIDRA||||END|
src/llvm-project/lldb/bindings/java/java-typemaps.swig||Apache License 2.0 with LLVM Exceptions||||END|
2 changes: 0 additions & 2 deletions Ghidra/Debug/Debugger-swig-lldb/certification.manifest
@@ -1,8 +1,6 @@
##VERSION: 2.0
##MODULE IP: Apache License 2.0
##MODULE IP: Apache License 2.0 with LLVM Exceptions
.classpath||NONE||reviewed||END|
.project||NONE||reviewed||END|
InstructionsForBuildingLLDBInterface.txt||GHIDRA||||END|
Module.manifest||GHIDRA||||END|
build.gradle||GHIDRA||||END|
81 changes: 81 additions & 0 deletions Ghidra/Extensions/BSimElasticPlugin/INSTALL.txt
@@ -0,0 +1,81 @@
Installation of the Elasticsearch BSim Plug-in:

In order to use Elasticsearch as the back-end database for a BSim instance,
the lsh plug-in, included with this Ghidra extension, must be installed on
the Elasticsearch cluster.

The lsh plug-in is bundled in the standard plug-in format as the file
'lsh.zip'. It must be installed separately on EVERY node of the cluster,
and each node must be restarted after the install in order for the plug-in to
become active.

For a single node, installation is accomplished with the command-line
'elasticsearch-plugin' script that comes with the standard Elasticsearch
distribution. It expects a URL pointing to the plug-in to be installed.
The basic command, executed in the Elasticsearch installation directory
for the node, is

bin/elasticsearch-plugin install file:///path/to/ghidra/Ghidra/Extensions/BSimElasticPlugin/data/lsh.zip

Replace the initial portion of the absolute path in the URL to point to your
particular Ghidra installation.

Deployment:

Follow the Elasticsearch documentation to do any additional configuration,
starting, stopping, and management of your Elasticsearch cluster.

To try BSim with a toy deployment, you can start a single node (as per the
documentation) from the command-line by just running

bin/elasticsearch

This will dump logging messages to the console, and you should see '[lsh]'
listed among the loaded plug-ins as the node starts up.

Once the Elasticsearch node(s) are running, whether they are a toy or a full
deployment, you can immediately proceed to the BSim 'bsim' command.
The Ghidra/BSim client and 'bsim' command automatically assume an
Elasticsearch server when they see the 'https' protocol in the provided URLs,
although the 'elastic' protocol may also be specified and is equivalent.
The use of the 'http' protocol for Elasticsearch is not supported.
Adjust the hostname, port number, and repository name as appropriate.
Use a command-line similar to the following to create a BSim instance:

bsim createdatabase elastic://1.2.3.4:9200/repo medium_32

This is equivalent to:

bsim createdatabase https://1.2.3.4:9200/repo medium_32
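
The scheme rule behind the two equivalent commands above can be sketched in
a few lines of Java. This is only an illustration of the rule as stated in
this file, not the code BSim actually uses; the class and method names are
hypothetical.

```java
import java.net.URI;

class BsimUrlScheme {
    // Hypothetical helper: true when the URL scheme is one that this file
    // says selects an Elasticsearch back end ('https' or 'elastic').
    static boolean looksLikeElasticsearch(String url) {
        String scheme = URI.create(url).getScheme();
        return "https".equalsIgnoreCase(scheme) || "elastic".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeElasticsearch("elastic://1.2.3.4:9200/repo"));
        System.out.println(looksLikeElasticsearch("https://1.2.3.4:9200/repo"));
        // 'http' is not supported for Elasticsearch, so it is not accepted here.
        System.out.println(looksLikeElasticsearch("http://1.2.3.4:9200/repo"));
    }
}
```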

Use a command-line like this to generate and commit signatures from a Ghidra Server
repository to the Elasticsearch database created above:

bsim generatesigs ghidra://1.2.3.4/repo bsim=elastic://1.2.3.4:9200/repo

Within Ghidra's BSim client, enter the same URL into the database connection
panel in order to place queries to your Elasticsearch deployment. See the BSim
documentation included with Ghidra for full details.


Version:

The current BSim plug-in was designed and tested with Elasticsearch version 7.17.4.
A change to the Elasticsearch scripting interface, starting with version 7.15, makes the BSim
plug-in incompatible with previous versions, but the lsh plug-in jars may work without change
across later Elasticsearch versions.

Elasticsearch plug-ins explicitly encode the version of Elasticsearch they work with, and the
plug-in script will refuse to install the lsh plug-in if its version does not match your
particular installation. If your Elasticsearch version is slightly different, you can try
unpacking the zip file, changing the version number to match your software, and then repacking
the zip file. Within the zip archive, the version number is stored in a configuration file

elasticsearch/plugin-descriptor.properties

The file format is fairly simple: edit the line

elasticsearch.version=7.17.4

The plug-in may work with other nearby versions, but proceed at your own risk.
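
As a rough illustration of the edit described above, the following Java
sketch rewrites the 'elasticsearch.version' line in the text of a
descriptor file. It works on the file contents as a string only; unpacking
and repacking lsh.zip is left to your usual zip tooling. The class name,
method name, and the target version 7.17.5 are assumptions made for this
example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DescriptorVersionEditor {
    // Matches the whole "elasticsearch.version=..." line, wherever it appears.
    private static final Pattern VERSION_LINE =
        Pattern.compile("(?m)^elasticsearch\\.version=.*$");

    // Returns the descriptor text with the version line replaced.
    static String withElasticsearchVersion(String descriptorText, String newVersion) {
        Matcher m = VERSION_LINE.matcher(descriptorText);
        if (!m.find()) {
            throw new IllegalArgumentException("no elasticsearch.version line found");
        }
        return m.replaceFirst(
            Matcher.quoteReplacement("elasticsearch.version=" + newVersion));
    }

    public static void main(String[] args) {
        String original = "name=lsh\nelasticsearch.version=7.17.4\n";
        // 7.17.5 is a made-up target version for illustration only.
        System.out.print(withElasticsearchVersion(original, "7.17.5"));
    }
}
```

All other lines of the descriptor are left untouched, so the result can be
written back to elasticsearch/plugin-descriptor.properties before repacking.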

Empty file.
99 changes: 99 additions & 0 deletions Ghidra/Extensions/BSimElasticPlugin/build.gradle
@@ -0,0 +1,99 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
apply from: "$rootProject.projectDir/gradle/distributableGhidraExtension.gradle"
apply from: "$rootProject.projectDir/gradle/javaProject.gradle"
apply plugin: 'eclipse'
eclipse.project.name = 'Xtra BSimElasticPlugin'
// This module is very different from other Ghidra modules. It is creating a stand-alone jar
// file for an elastic database plugin. It is copying files from other modules into this module
// before building a jar file from the files in this module and the cherry-picked files from
// other modules (This is very brittle and will break if any of the files are renamed or moved.)
project.ext.includeExtensionInInstallation = true

apply plugin: 'java'

sourceSets {
	elasticPlugin {
		java {
			srcDirs = [ 'src', 'srcdummy', 'build/genericSrc', 'build/utilitySrc', 'build/bsimSrc' ]
		}
	}
}
// this dependency block is needed for this code to compile in our eclipse environment. It is not needed
// for the gradle build
dependencies {

	implementation project(':BSim')
}
libsDirName='ziplayout'

task copyGenericTask(type: Copy) {
	from project(':Generic').file('src/main/java')
	into 'build/genericSrc'
	include 'generic/lsh/vector/*.java'
	include 'generic/hash/SimpleCRC32.java'
	include 'ghidra/util/xml/SpecXmlUtils.java'
}

task copyUtilityTask(type: Copy) {
	from project(':Utility').file('src/main/java')
	into 'build/utilitySrc'
	include 'ghidra/xml/XmlPullParser.java'
	include 'ghidra/xml/XmlElement.java'
}

task copyBSimTask(type: Copy) {
	from project(':BSim').file('src/main/java')
	into 'build/bsimSrc'
	include 'ghidra/features/bsim/query/elastic/ElasticUtilities.java'
	include 'ghidra/features/bsim/query/elastic/Base64Lite.java'
	include 'ghidra/features/bsim/query/elastic/Base64VectorFactory.java'
}

task copyPropertiesFile(type: Copy) {
	from 'contribZipExclude/plugin-descriptor.properties'
	into 'build/ziplayout'
}

task elasticPluginJar(type: Jar) {
	from sourceSets.elasticPlugin.output
	archiveBaseName = 'lsh'
	excludes = [
		'**/org/apache',
		'**/org/elasticsearch/common',
		'**/org/elasticsearch/env',
		'**/org/elasticsearch/index',
		'**/org/elasticsearch/indices',
		'**/org/elasticsearch/plugins',
		'**/org/elasticsearch/script',
		'**/org/elasticsearch/search'
	]
}

task elasticPluginZip(type: Zip) {
	from 'build/ziplayout'
	archiveBaseName = 'lsh'
	destinationDirectory = file("build/data")
}

compileElasticPluginJava.dependsOn copyGenericTask
compileElasticPluginJava.dependsOn copyUtilityTask
compileElasticPluginJava.dependsOn copyBSimTask

elasticPluginZip.dependsOn elasticPluginJar
elasticPluginZip.dependsOn copyPropertiesFile

jar.dependsOn elasticPluginZip
6 changes: 6 additions & 0 deletions Ghidra/Extensions/BSimElasticPlugin/certification.manifest
@@ -0,0 +1,6 @@
##VERSION: 2.0
##MODULE IP: Apache License 2.0
INSTALL.txt||GHIDRA||||END|
Module.manifest||GHIDRA||reviewed||END|
contribZipExclude/plugin-descriptor.properties||GHIDRA||||END|
extension.properties||GHIDRA||||END|
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
description=Feature Vector Plugin
version=1.0
name=lsh
classname=org.elasticsearch.plugin.analysis.lsh.AnalysisLSHPlugin
java.version=1.11
elasticsearch.version=8.8.1
5 changes: 5 additions & 0 deletions Ghidra/Extensions/BSimElasticPlugin/extension.properties
@@ -0,0 +1,5 @@
name=BSimElasticPlugin
description=Elastic search backend for BSim.
author=Ghidra Team
createdOn=11/23/20
version=@extversion@
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.elasticsearch.plugin.analysis.lsh;

import java.io.IOException;
import java.util.*;

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexModule;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.elasticsearch.indices.analysis.AnalysisModule.AnalysisProvider;
import org.elasticsearch.plugins.*;
import org.elasticsearch.script.ScriptContext;
import org.elasticsearch.script.ScriptEngine;

import generic.lsh.vector.IDFLookup;
import generic.lsh.vector.WeightFactory;
import ghidra.features.bsim.query.elastic.Base64VectorFactory;
import ghidra.features.bsim.query.elastic.ElasticUtilities;

public class AnalysisLSHPlugin extends Plugin implements AnalysisPlugin, ScriptPlugin {

	public static final String TOKENIZER_SETTINGS_BASE = "index.analysis.tokenizer.lsh_";
	public static String settingString = "";

	static private Map<String, Base64VectorFactory> vecFactoryMap = new HashMap<>();
	private Map<String, AnalysisProvider<TokenizerFactory>> tokFactoryMap;

	public class TokenizerFactoryProvider implements AnalysisProvider<TokenizerFactory> {

		@Override
		public TokenizerFactory get(IndexSettings indexSettings, Environment env, String name,
				Settings settings) throws IOException {
//			settingString = settingString + " : " + indexSettings.getIndex().getName() + '(' + name + ')';
			return new LSHTokenizerFactory(indexSettings, env, name, settings);
		}
	}

	public AnalysisLSHPlugin() {
		TokenizerFactoryProvider provider = new TokenizerFactoryProvider();
		tokFactoryMap = Collections.singletonMap("lsh_tokenizer", provider);
	}

	private static void setupVectorFactory(String name, String idfConfig, String lshWeights) {
		WeightFactory weightFactory = new WeightFactory();
		String[] split = lshWeights.split(" ");
		double[] weightArray = new double[split.length];
		for (int i = 0; i < weightArray.length; ++i) {
			weightArray[i] = Double.parseDouble(split[i]);
		}
		weightFactory.set(weightArray);
		IDFLookup idfLookup = new IDFLookup();
		split = idfConfig.split(" ");
		int[] intArray = new int[split.length];
		for (int i = 0; i < intArray.length; ++i) {
			intArray[i] = Integer.parseInt(split[i]);
		}
		idfLookup.set(intArray);
		Base64VectorFactory vectorFactory = new Base64VectorFactory();
		// Server-side factory is never used to generate signatures,
		// so we don't need to specify settings
		vectorFactory.set(weightFactory, idfLookup, 0);
		vecFactoryMap.put(name, vectorFactory);
	}

	/**
	 * Entry point for Tokenizer and Script factories to grab the global vector factory
	 * @param name is the name of the tokenizer
	 * @return the vector factory used by the tokenizer
	 */
	public static Base64VectorFactory getVectorFactory(String name) {
		return vecFactoryMap.get(name);
	}

	@Override
	public void onIndexModule(IndexModule indexModule) {
		super.onIndexModule(indexModule);

		Settings settings = indexModule.getSettings();
		String name = null;
		// Look for the specific kind of tokenizer settings, within the global settings for the index
		for (String key : settings.keySet()) {
			if (key.startsWith(TOKENIZER_SETTINGS_BASE)) {
				// We can have different settings for different indices, distinguished by this name
				int pos = key.indexOf('.', TOKENIZER_SETTINGS_BASE.length() + 1);
				if (pos > 0) {
					name = key.substring(TOKENIZER_SETTINGS_BASE.length(), pos);
					break;
				}
			}
		}
		if (name != null) {
			String tokenizerName = "lsh_" + name;
			if (getVectorFactory(tokenizerName) != null) {
				return; // Factory already exists
			}
			settingString = settingString + " : onModule(" + name + ')';
			// If we found LSH tokenizer settings, pull them out and construct an LSHVectorFactory with them
			String baseKey = TOKENIZER_SETTINGS_BASE + name + '.';
			String idfConfig = settings.get(baseKey + ElasticUtilities.IDF_CONFIG);
			String lshWeights = settings.get(baseKey + ElasticUtilities.LSH_WEIGHTS);
			if (idfConfig == null || lshWeights == null) {
				return; // IDF_CONFIG and LSH_WEIGHTS settings must be present to proceed
			}
			setupVectorFactory(tokenizerName, idfConfig, lshWeights);
		}
	}

	@Override
	public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts) {
		return new BSimScriptEngine();
	}

	@Override
	public Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers() {
		return tokFactoryMap;
	}

}
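
The key-scanning step in onIndexModule can be exercised on its own. The
stand-alone sketch below copies only the string logic from the loop above so
it runs without Elasticsearch on the classpath. The class name, the sample
index name "medium32", and the setting suffix "idf_config" are assumptions
for illustration; the real suffixes come from ElasticUtilities.IDF_CONFIG
and ElasticUtilities.LSH_WEIGHTS, whose values are not shown in this diff.

```java
class LshTokenizerKey {
    static final String TOKENIZER_SETTINGS_BASE = "index.analysis.tokenizer.lsh_";

    // Returns the per-index name embedded in a settings key,
    // or null when the key is not an lsh tokenizer setting.
    static String extractName(String key) {
        if (!key.startsWith(TOKENIZER_SETTINGS_BASE)) {
            return null;
        }
        // Search for the '.' that ends the name; starting one past the base
        // means the name must contain at least one character.
        int pos = key.indexOf('.', TOKENIZER_SETTINGS_BASE.length() + 1);
        if (pos <= 0) {
            return null;
        }
        return key.substring(TOKENIZER_SETTINGS_BASE.length(), pos);
    }

    public static void main(String[] args) {
        // Hypothetical key; prints "medium32".
        System.out.println(extractName("index.analysis.tokenizer.lsh_medium32.idf_config"));
        // Not an lsh tokenizer key; prints "null".
        System.out.println(extractName("index.analysis.tokenizer.standard.type"));
    }
}
```

The plug-in then prefixes the extracted name with "lsh_" to form the key
under which the vector factory is cached.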