# Logstash Plugin for Amazon DynamoDB

The Logstash plugin for Amazon DynamoDB gives you a nearly real-time view of the data in your DynamoDB table. The plugin uses DynamoDB Streams to parse and output data as it is added to a DynamoDB table. After you install and activate the plugin, it scans the data in the specified table, then starts consuming your updates from the stream and outputs them to Elasticsearch or another Logstash output of your choice.
Logstash is a data pipeline service that collects, parses, and outputs data to a selected location in a selected format. Elasticsearch is a distributed, full-text search server. For more information about Logstash and Elasticsearch, go to https://www.elastic.co/products/elasticsearch.
## Overview

The following sections walk you through the process to:

1. Create a DynamoDB table and enable a new stream on the table.
2. Download, build, and install the Logstash plugin for DynamoDB.
3. Configure Logstash to output to Elasticsearch and the command line.
4. Run the Logstash plugin for DynamoDB.
5. Test Logstash by adding DynamoDB items to the table.

When this process is finished, you can search your data in the Elasticsearch cluster.
### Prerequisites

**The following items are required to use the Logstash plugin for Amazon DynamoDB:**

1. An Amazon Web Services (AWS) account with DynamoDB.
2. A running Elasticsearch cluster—To download Elasticsearch, go to https://www.elastic.co/products/elasticsearch.
3. Logstash—To download Logstash, go to https://www.elastic.co/products/logstash.
4. JRuby—To download JRuby, go to http://jruby.org/download.
5. Git—To download Git, go to http://git-scm.com/downloads.
6. Apache Maven—To get Apache Maven, go to http://maven.apache.org/.
### Before You Begin: Create a Source Table

In this step, you will create a DynamoDB table with DynamoDB Streams enabled. This will be the source table, and writes to this table will be processed by the Logstash plugin for DynamoDB.

**To create the source table**

1. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
2. Choose **Create Table**.
3. On the **Create Table** page, enter the following settings:
   1. **Table Name** — SourceTable
   2. **Primary Key Type** — Hash
   3. **Hash attribute data type** — Number
   4. **Hash Attribute Name** — Id
   5. Choose **Continue**.
4. On the **Add Indexes** page, choose **Continue**. You will not need any indexes for this exercise.
5. On the **Provisioned Throughput** page, choose **Continue**.
6. On the **Additional Options** page, do the following:
   1. Select **Enable Streams**, and then set the **View Type** to **New and Old Images**.
   2. Clear **Use Basic Alarms**. You will not need alarms for this exercise.
   3. When you are ready, choose **Continue**.
7. On the **Summary** page, choose **Create**.

The source table will be created within a few minutes.
### Setting Up the Logstash Plugin for Amazon DynamoDB

To use the Logstash plugin for DynamoDB, you need to build, install, and run the plugin; after that, you can test it.

**IMPORTANT: To successfully build and install the plugin, you must have `Apache Maven` installed to satisfy the jar dependencies, and `JRuby` installed to build and run the Logstash gem.**

**To build the Logstash plugin for DynamoDB**

At the command prompt, change to the directory where you want to install the Logstash plugin for DynamoDB, and then clone the Git project:

```
git clone https://github.com/awslabs/logstash-input-dynamodb.git
```
Install the Bundler gem by typing the following:

```
jruby -S gem install bundler
```

**NOTE: The `jruby -S` syntax ensures that the gem is installed with JRuby and not with your system Ruby.**

The Bundler gem checks dependencies for Ruby gems and installs them for you.

To install the dependencies for the Logstash plugin for DynamoDB, type the following command:

```
jruby -S bundle install
```

To build the gem, type the following command:

```
jruby -S gem build logstash-input-dynamodb.gemspec
```

To install the gem, in the logstash-input-dynamodb folder type:

```
jruby -S gem install --local logstash-input-dynamodb-1.0.0-java.gem
```
**To install the Logstash plugin for DynamoDB**

Now that you have built the plugin gem, you can install it.

Change directories to your local Logstash directory.

In the Logstash directory, open the Gemfile in a text editor and add the following line:

```
gem "logstash-input-dynamodb"
```

To install the plugin, in your Logstash folder type the following command:

```
bin/plugin install --no-verify logstash-input-dynamodb
```

To list all the installed plugins, type the following command:

```
bin/plugin list
```

If the logstash-output-elasticsearch or logstash-output-stdout plugins are not listed, you need to install them. For instructions on installing plugins, go to the Working with Plugins page in the Logstash documentation.
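If either output plugin is missing, a command along the following lines, run from the Logstash directory, would typically install it. This is a sketch that assumes the same `bin/plugin` tool and plugin naming used elsewhere in this guide:

```
bin/plugin install logstash-output-elasticsearch
bin/plugin install logstash-output-stdout
```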
### Running the Logstash Plugin for Amazon DynamoDB

**NOTE: Before you run Logstash, make sure you have enabled streams (see above) for your DynamoDB table(s). The Logstash plugin for DynamoDB requires each table that it reads from to have a stream enabled.**

In the local Logstash directory, create a `logstash-dynamodb.conf` file with the following contents:

```
input {
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_and_old_images"
    aws_access_key_id => "<access_key_id>"
    aws_secret_access_key => "<secret_key>"
    table_name => "SourceTable"
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
  stdout { }
}
```
**Important**

This is an example configuration. You must replace `<access_key_id>` and `<secret_key>` with your own access key and secret key. If you have credentials saved in a credentials file, you can omit these configuration values.
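For reference, a shared AWS credentials file (by default at `~/.aws/credentials` on Linux and macOS) uses the standard INI layout sketched below; the key values are placeholders to replace with your own:

```
[default]
aws_access_key_id = <access_key_id>
aws_secret_access_key = <secret_key>
```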
To run Logstash, type the following command:

```
bin/logstash -f logstash-dynamodb.conf
```

Logstash should start successfully and begin indexing the records from your DynamoDB table.

You can also change the other configuration options to match your particular use case.

You can also configure the plugin to index multiple tables by adding additional `dynamodb { }` sections to the `input` section.
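For example, a two-table input section might look like the following sketch. `SecondTable` is a hypothetical table name, and each table you add still needs a stream enabled:

```
input {
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_and_old_images"
    table_name => "SourceTable"
  }
  # Hypothetical second table; adjust the name and view_type for your setup.
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_and_old_images"
    table_name => "SecondTable"
  }
}
```

When indexing several tables, consider giving each `dynamodb { }` section a distinct `checkpointer` value so that each worker tracks its position independently.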
The following table shows the configuration values.

Setting | Description
------- | --------
table_name | The name of the table to index. This table must exist.
endpoint | The DynamoDB endpoint to use. If you are running DynamoDB on your computer, use http://localhost:8000 as the endpoint.
streams_endpoint | The DynamoDB Streams endpoint to use. Example: streams.dynamodb.us-east-1.amazonaws.com.
view_type | The view type of the DynamoDB stream: "new_and_old_images", "new_image", "old_image", or "keys_only". Note: This value must match the stream settings configured for your table in the DynamoDB console.
aws_access_key_id | Your AWS access key ID. This is optional if you have credentials saved in a credentials file. Note: If you are running DynamoDB on your computer, this ID must match the access key ID that you used to create the table. If it does not match, the Logstash plugin will fail because DynamoDB partitions data by access key ID and region.
aws_secret_access_key | Your AWS secret access key. This is optional if you have credentials saved in a credentials file.
perform_scan | A Boolean flag that indicates whether Logstash should scan the entire table before streaming new records. Note: Set this option to false if you are restarting the Logstash plugin.
checkpointer | A string that uniquely identifies the KCL checkpointer name and CloudWatch metrics name. This is used when one worker leaves a shard so that another worker knows where to start again.
publish_metrics | A Boolean flag that indicates whether to publish metrics to CloudWatch using the checkpointer name.
perform_stream | A Boolean flag that indicates whether to automatically stream new data into Logstash from DynamoDB Streams. Set this option to false to disable streaming.
read_ops | The number of read operations per second to perform when scanning the specified table.
number_of_scan_threads | The number of threads to use when scanning the specified table.
number_of_write_threads | The number of threads that write to the Logstash queue when scanning the table.
log_format | The log transfer format. "plain" returns the record as a DynamoDB object. "json_drop_binary" translates the item format to JSON and drops any binary attributes. "json_binary_as_text" translates the item format to JSON and represents any binary attributes as Base64-encoded strings. For more information, see the JSON Data Format topic in the DynamoDB documentation.
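As a quick illustration of the "json_binary_as_text" representation, Base64 turns raw bytes into printable text. The snippet below demonstrates generic Base64 encoding only; it is not plugin code:

```shell
# Encode the raw bytes "hello" as Base64 text, the representation used
# for binary attributes when log_format is "json_binary_as_text".
printf 'hello' | base64   # prints aGVsbG8=
```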
### Testing the Logstash Plugin for Amazon DynamoDB

When you run it, the Logstash plugin for DynamoDB starts scanning the DynamoDB table and indexing the table data. As you insert new records into the DynamoDB table, the plugin consumes the new records from DynamoDB Streams to continue indexing.

To test this, you can add items to the DynamoDB table in the AWS console and view the output (stdout) in the command-prompt window. The items are also inserted into Elasticsearch and indexed for searching.

**To test the Logstash plugin for DynamoDB**

1. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
2. In the list of tables, open (double-click) **SourceTable**.
3. Choose **New Item**, add the following data, and then choose **PutItem**:
   - **Id** — 1
   - **Message** — First item
4. Repeat the previous step to add the following data items:
   - **Id** — 2 and **Message** — Second item
   - **Id** — 3 and **Message** — Third item
5. Return to the command-prompt window and verify the Logstash output. You should see output for each item that you added in the console.
6. (Optional) Go back to **SourceTable** in us-east-1 and do the following:
   1. Update item 2. Set the **Message** to Hello world!
   2. Delete item 3.
7. Go to the command-prompt window and verify the data output.

You can now search the DynamoDB items in Elasticsearch.

For information about accessing and searching data in Elasticsearch, see the Elasticsearch documentation.
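As a quick check, assuming Elasticsearch is listening on its default port on localhost and Logstash wrote to its default `logstash-*` daily indexes, a query along these lines would typically return the indexed items; adjust the host and index pattern for your cluster:

```
curl "http://localhost:9200/logstash-*/_search?q=Message:item&pretty"
```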