This repository was archived by the owner on Dec 15, 2021. It is now read-only.

Commit ea61107

Merge pull request #2 from benjcooley/master

Fixed gem installation to include new required JARs added in a previous commit. Fixed installation documentation.

2 parents: 13238f1 + 63c4dc9

File tree: 2 files changed (+223, -121 lines)

README.md (+158, -118)
# Logstash Plugin for Amazon DynamoDB

The Logstash plugin for Amazon DynamoDB gives you a nearly real-time view of the data in your DynamoDB table. The plugin uses DynamoDB Streams to parse and output data as it is added to a DynamoDB table. After you install and activate the plugin, it scans the data in the specified table, then consumes your updates from the stream and outputs them to Elasticsearch or another Logstash output of your choice.

Logstash is a data pipeline service that processes data, parses it, and then outputs it to a selected location in a selected format. Elasticsearch is a distributed, full-text search server. For more information about Logstash and Elasticsearch, go to https://www.elastic.co/products/elasticsearch.

The following sections walk you through the process to:

1. Create a DynamoDB table and enable a new stream on the table.
2. Download, build, and install the Logstash plugin for DynamoDB.
3. Configure Logstash to output to Elasticsearch and the command line.
4. Run the Logstash plugin for DynamoDB.
5. Test Logstash by adding DynamoDB items to the table.

When this process is finished, you can search your data in the Elasticsearch cluster.

### Prerequisites

**The following items are required to use the Logstash plugin for Amazon DynamoDB:**

1. An Amazon Web Services (AWS) account with DynamoDB.
2. A running Elasticsearch cluster—To download Elasticsearch, go to https://www.elastic.co/products/elasticsearch.
3. Logstash—To download Logstash, go to https://www.elastic.co/products/logstash.
4. JRuby—To download JRuby, go to http://jruby.org/download.
5. Git—To download Git, go to http://git-scm.com/downloads.
6. Apache Maven—To get Apache Maven, go to http://maven.apache.org/.
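
As an optional sanity check before you build, you can confirm the required tools are on your PATH (no specific versions are prescribed here):

```
# Optional: confirm the build tools are available
java -version      # Java is needed by JRuby, Maven, and Elasticsearch
jruby -v           # JRuby builds and runs the Logstash gems
mvn -v             # Maven resolves the plugin's JAR dependencies
git --version      # Git clones the plugin repository
```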

### Before You Begin: Create a Source Table

In this step, you will create a DynamoDB table with DynamoDB Streams enabled. This will be the source table, and writes to this table will be processed by the Logstash plugin for DynamoDB.

**To create the source table**

1. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
2. Choose **Create Table**.
3. On the **Create Table** page, enter the following settings:
   1. **Table Name** — SourceTable
   2. **Primary Key Type** — Hash
   3. **Hash attribute data type** — Number
   4. **Hash Attribute Name** — Id
   5. Choose **Continue**.
4. On the **Add Indexes** page, choose **Continue**. You will not need any indexes for this exercise.
5. On the **Provisioned Throughput** page, choose **Continue**.
6. On the **Additional Options** page, do the following:
   1. Select **Enable Streams**, and then set the **View Type** to **New and Old Images**.
   2. Clear **Use Basic Alarms**. You will not need alarms for this exercise.
   3. When you are ready, choose **Continue**.
7. On the **Summary** page, choose **Create**.

The source table will be created within a few minutes.
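
If you prefer the command line, you can create a roughly equivalent table with the AWS CLI. This is only a sketch: it assumes the AWS CLI is installed and configured, it mirrors the console settings above, and the throughput values are illustrative.

```
# Sketch: create SourceTable with a stream enabled (NEW_AND_OLD_IMAGES)
aws dynamodb create-table \
    --table-name SourceTable \
    --attribute-definitions AttributeName=Id,AttributeType=N \
    --key-schema AttributeName=Id,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES \
    --region us-east-1
```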

### Setting Up the Logstash Plugin for Amazon DynamoDB

To use the Logstash plugin for DynamoDB, you need to build, install, and run the plugin, and then you can test it.

**IMPORTANT: In order to successfully build and install the plugin, you must have previously installed ```Maven``` (to satisfy JAR dependencies) and ```JRuby``` (to build and run the Logstash gem).**

**To build the Logstash plugin for DynamoDB**

At the command prompt, change to the directory where you want to install the Logstash plugin for DynamoDB and demo project.

In the directory where you want the Git project, clone the Git project:

```
git clone https://github.com/awslabs/logstash-input-dynamodb.git
```

**Install the Bundler gem by typing the following:**

```
jruby -S gem install bundler
```

**NOTE: The ```jruby -S``` syntax ensures that the gem is installed with ```jruby``` and not ```ruby```.**

The Bundler gem checks dependencies for Ruby gems and installs them for you.

To install the dependencies for the Logstash plugin for DynamoDB, type the following command:

```
jruby -S bundle install
```

To build the gem, type the following command:

```
jruby -S gem build logstash-input-dynamodb.gemspec
```

To install the gem, in the logstash-input-dynamodb folder type:

```
jruby -S gem install --local logstash-input-dynamodb-1.0.0-java.gem
```
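
Optionally, you can confirm that the gem is now visible to JRuby's RubyGems before moving on:

```
# Optional: list the gem to confirm it installed under JRuby
jruby -S gem list logstash-input-dynamodb
```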

### Installing the Logstash Plugin for DynamoDB

Now that you have built the plugin gem, you can install it.

Change directories to your local Logstash directory.

In the Logstash directory, open the Gemfile in a text editor and add the following line:

```
gem "logstash-input-dynamodb"
```

To install the plugin, in your Logstash folder type the following command:

```
bin/plugin install --no-verify logstash-input-dynamodb
```

To list all the installed plugins, type the following command:

```
bin/plugin list
```

If the logstash-output-elasticsearch or logstash-output-stdout plugins are not listed, you need to install them. For instructions on installing plugins, go to the Working with Plugins page in the Logstash documentation.

### Running the Logstash Plugin for Amazon DynamoDB

**NOTE: Before running Logstash, make sure you have *Enabled Streams* (see above) for your DynamoDB table(s). The Logstash plugin for DynamoDB requires that each table you are logging from has a stream enabled.**

In the local Logstash directory, create a ```logstash-dynamodb.conf``` file with the following contents:

```
input {
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_and_old_images"
    aws_access_key_id => "<access_key_id>"
    aws_secret_access_key => "<secret_key>"
    table_name => "SourceTable"
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
  stdout { }
}
```

**Important**

This is an example configuration. You must replace ```<access_key_id>``` and ```<secret_key>``` with your access key and secret key. If you have credentials saved in a credentials file, you can omit these configuration values.

To run Logstash, type the following command:

```
bin/logstash -f logstash-dynamodb.conf
```

Logstash should start successfully and begin indexing the records from your DynamoDB table.

You can change the other configuration options to match your particular use case. You can also configure the plugin to index multiple tables by adding additional ```dynamodb { }``` sections to the ```input``` section, as in the sketch below.
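
For example, a minimal two-table input might look like the following sketch. The second table name (`OrdersTable`) is only a placeholder, and credentials are omitted here on the assumption that they come from a credentials file:

```
input {
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_and_old_images"
    table_name => "SourceTable"
  }
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_and_old_images"
    table_name => "OrdersTable"   # placeholder table name
  }
}
```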

### Configuration Settings

The following table shows the configuration values.
Settings Id | Description
168+
------- | --------
169+
table_name | The name of the table to index. This table must exist.
170+
endpoint | The DynamoDB endpoint to use. If you are running DynamoDB on your computer, use http://localhost:8000 as the endpoint.
171+
streams_endpoint | The name of a checkpoint table. This does not need to exist prior to plugin activation.
172+
view_type | The view type of the DynamoDB stream. ("new_and_old_images", "new_image", "old_image", "keys_only" Note: these must match the settings for your table's stream configured in the DynamoDB console.)
173+
aws_access_key_id | Your AWS access key ID. This is optional if you have credentials saved in a credentials file. Note: If you are running DynamoDB on your computer, this ID must match the access key ID that you used to create the table. If it does not match, the Logstash plugin will fail because DynamoDB partitions data by access key ID and region.
174+
aws_secret_access_key | Your AWS access key ID. Your AWS access key ID. This is optional if you have credentials saved in a credentials file.
175+
perform_scan | A boolean flag to indicate whether or not Logstash should scan the entire table before streaming new records. Note: Set this option to false if your are restarting the Logstash plugin.
176+
checkpointer | A string that uniquely identifies the KCL checkpointer name and CloudWatch metrics name. This is used when one worker leaves a shard so that another worker knows where to start again.
177+
publish_metrics | Boolean option to publish metrics to CloudWatch using the checkpointer name.
178+
perform_stream | Boolean option to not automatically stream new data into Logstash from DynamoDB streams.
179+
read_ops | Number of read operations per second to perform when scanning the specified table.
180+
number_of_scan_threads | Number of threads to use when scanning the specified table.
181+
number_of_write_threads | Number of threads to write to the Logstash queue when scanning the table.
182+
log_format | Log transfer format. "plain" - Returns the object as a DynamoDB object. "json_drop_binary" - Translates the item format to JSON and drops any binary attributes. "json_binary_as_text" - Translates the item format to JSON and represents any binary attributes as 64-bit encoded binary strings. For more information, see the JSON Data Format topic in the DynamoDB documentation.
121183

122-
config :<variable name>, <type of expected variable>, :required => <true if required to run>, :default => <default value of configuration>
184+
### Testing the Logstash Plugin for Amazon DynamoDB
123185

124-
# The name of the table to copy and stream through Logstash
125-
config :table_name, :validate => :string, :required => true
186+
The Logstash plugin for DynamoDB starts scanning the DynamoDB table and indexing the table data when you run it. As you insert new records into the DynamoDB table, the Logstash plugin consumes the new records from DynamoDB streams to continue indexing.
126187

127-
# Configuration for what information from the scan and streams to include in the log.
128-
# keys_only will return the hash and range keys along with the values for each entry
129-
# new_image will return the entire new entry and keys
130-
# old_image will return the entire entry before modification and keys (NOTE: Cannot perform scan when using this option)
131-
# new_and_old_images will return the old entry before modification along with the new entry and keys
132-
config :view_type, :validate => ["keys_only", "new_image", "old_image", "new_and_old_images"], :required => true
188+
To test this, you can add items to the DynamoDB table in the AWS console, and view the output (stdout) in the command prompt window. The items are also inserted into Elasticsearch and indexed for searching.
133189

**To test the Logstash plugin for DynamoDB**

Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.

In the list of tables, open (double-click) **SourceTable**.

Choose **New Item**, add the following data, and then choose **PutItem**:

- Id—1
- Message—First item

Repeat the previous step to add the following data items:

- Id—2 and Message—Second item
- Id—3 and Message—Third item
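
Alternatively, you can insert the same test items from the command line. This is a sketch that assumes the AWS CLI is configured for the same account and region as SourceTable:

```
# Sketch: insert the test items with the AWS CLI instead of the console
aws dynamodb put-item --table-name SourceTable --region us-east-1 \
    --item '{"Id": {"N": "1"}, "Message": {"S": "First item"}}'
aws dynamodb put-item --table-name SourceTable --region us-east-1 \
    --item '{"Id": {"N": "2"}, "Message": {"S": "Second item"}}'
aws dynamodb put-item --table-name SourceTable --region us-east-1 \
    --item '{"Id": {"N": "3"}, "Message": {"S": "Third item"}}'
```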

Return to the command-prompt window and verify the Logstash output; it should show output for each item you added in the console.

**(Optional) Go back to SourceTable in us-east-1 and do the following:**

- Update item 2. Set the Message to Hello world!
- Delete item 3.
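
These optional changes can also be made with the AWS CLI; again, this is only a sketch under the same assumptions as above:

```
# Sketch: update item 2 and delete item 3 with the AWS CLI
aws dynamodb update-item --table-name SourceTable --region us-east-1 \
    --key '{"Id": {"N": "2"}}' \
    --update-expression 'SET Message = :m' \
    --expression-attribute-values '{":m": {"S": "Hello world!"}}'
aws dynamodb delete-item --table-name SourceTable --region us-east-1 \
    --key '{"Id": {"N": "3"}}'
```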

Go to the command-prompt window and verify the data output.

You can now search the DynamoDB items in Elasticsearch.

For information about accessing and searching data in Elasticsearch, see the Elasticsearch documentation.
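
As a quick check, you can query Elasticsearch directly. This sketch assumes Elasticsearch is running locally on its default port (9200) and simply searches all indices for the word "item" from the sample data:

```
# Sketch: search all Elasticsearch indices for the test items
curl 'http://localhost:9200/_search?q=item&pretty'
```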
