Add support for pagination #139

Open
wants to merge 3 commits into
base: main
2 changes: 1 addition & 1 deletion .gitignore
@@ -4,4 +4,4 @@ Gemfile.lock
vendor
*~
.idea
*.log
125 changes: 94 additions & 31 deletions docs/index.asciidoc
@@ -163,7 +163,7 @@ input plugins.
 

[id="plugins-{type}s-{plugin}-automatic_retries"]
===== `automatic_retries`

* Value type is <<number,number>>
* Default value is `1`
@@ -173,39 +173,39 @@ to zero if keepalive is enabled. Some servers incorrectly end keepalives early r
Note: if `retry_non_idempotent` is not set, only GET, HEAD, PUT, DELETE, OPTIONS, and TRACE requests will be retried.

[id="plugins-{type}s-{plugin}-cacert"]
===== `cacert`

* Value type is <<path,path>>
* There is no default value for this setting.

If you need to use a custom X.509 CA (.pem certs) specify the path to that here

[id="plugins-{type}s-{plugin}-client_cert"]
===== `client_cert`

* Value type is <<path,path>>
* There is no default value for this setting.

If you'd like to use a client certificate (note, most people don't want this) set the path to the x509 cert here

[id="plugins-{type}s-{plugin}-client_key"]
===== `client_key`

* Value type is <<path,path>>
* There is no default value for this setting.

If you're using a client certificate specify the path to the encryption key here

[id="plugins-{type}s-{plugin}-connect_timeout"]
===== `connect_timeout`

* Value type is <<number,number>>
* Default value is `10`

Timeout (in seconds) to wait for a connection to be established. Default is `10s`

[id="plugins-{type}s-{plugin}-cookies"]
===== `cookies`

* Value type is <<boolean,boolean>>
* Default value is `true`
@@ -289,15 +289,15 @@ Example output:
----

[id="plugins-{type}s-{plugin}-follow_redirects"]
===== `follow_redirects`

* Value type is <<boolean,boolean>>
* Default value is `true`

Should redirects be followed? Defaults to `true`

[id="plugins-{type}s-{plugin}-keepalive"]
===== `keepalive`

* Value type is <<boolean,boolean>>
* Default value is `true`
@@ -306,15 +306,15 @@ Turn this on to enable HTTP keepalive support. We highly recommend setting `auto
one with this to fix interactions with broken keepalive implementations.

[id="plugins-{type}s-{plugin}-keystore"]
===== `keystore`

* Value type is <<path,path>>
* There is no default value for this setting.

If you need to use a custom keystore (`.jks`) specify that here. This does not work with .pem keys!

[id="plugins-{type}s-{plugin}-keystore_password"]
===== `keystore_password`

* Value type is <<password,password>>
* There is no default value for this setting.
@@ -323,15 +323,15 @@ Specify the keystore password here.
Note, most .jks files created with keytool require a password!

[id="plugins-{type}s-{plugin}-keystore_type"]
===== `keystore_type`

* Value type is <<string,string>>
* Default value is `"JKS"`

Specify the keystore type here. One of `JKS` or `PKCS12`. Default is `JKS`

[id="plugins-{type}s-{plugin}-metadata_target"]
===== `metadata_target`

* Value type is <<string,string>>
* Default value is `"@metadata"`
@@ -341,31 +341,31 @@ Set this value to the name of the field you'd like to store a nested
hash of metadata.

[id="plugins-{type}s-{plugin}-password"]
===== `password`

* Value type is <<password,password>>
* There is no default value for this setting.

Password to be used in conjunction with <<plugins-{type}s-{plugin}-user>> for HTTP authentication.

[id="plugins-{type}s-{plugin}-pool_max"]
===== `pool_max`

* Value type is <<number,number>>
* Default value is `50`

Max number of concurrent connections. Defaults to `50`

[id="plugins-{type}s-{plugin}-pool_max_per_route"]
===== `pool_max_per_route`

* Value type is <<number,number>>
* Default value is `25`

Max number of concurrent connections to a single host. Defaults to `25`

[id="plugins-{type}s-{plugin}-proxy"]
===== `proxy`

* Value type is <<string,string>>
* There is no default value for this setting.
@@ -377,23 +377,23 @@ If you'd like to use an HTTP proxy . This supports multiple configuration syntax
3. Proxy host in form: `{url => 'http://proxy.org:1234', user => 'username@host', password => 'password'}`

[id="plugins-{type}s-{plugin}-request_timeout"]
===== `request_timeout`

* Value type is <<number,number>>
* Default value is `60`

Timeout (in seconds) for the entire request.

[id="plugins-{type}s-{plugin}-retry_non_idempotent"]
===== `retry_non_idempotent`

* Value type is <<boolean,boolean>>
* Default value is `false`

If `automatic_retries` is enabled this will cause non-idempotent HTTP verbs (such as POST) to be retried.

[id="plugins-{type}s-{plugin}-schedule"]
===== `schedule`

* Value type is <<hash,hash>>
* There is no default value for this setting.
@@ -408,7 +408,7 @@ Examples:
See: rufus/scheduler for details about different schedule options and value string format

[id="plugins-{type}s-{plugin}-socket_timeout"]
===== `socket_timeout`

* Value type is <<number,number>>
* Default value is `10`
@@ -449,7 +449,7 @@ It is primarily intended as a temporary diagnostic mechanism when attempting to
Using `none` in production environments is strongly discouraged.

[id="plugins-{type}s-{plugin}-target"]
===== `target`

* Value type is <<string,string>>
* There is no default value for this setting.
@@ -461,15 +461,15 @@ Example: `codec => json { target => "TARGET_FIELD_NAME" }`


[id="plugins-{type}s-{plugin}-truststore"]
===== `truststore`

* Value type is <<path,path>>
* There is no default value for this setting.

If you need to use a custom truststore (`.jks`) specify that here. This does not work with .pem certs!

[id="plugins-{type}s-{plugin}-truststore_password"]
===== `truststore_password`

* Value type is <<password,password>>
* There is no default value for this setting.
@@ -478,15 +478,15 @@ Specify the truststore password here.
Note, most .jks files created with keytool require a password!

[id="plugins-{type}s-{plugin}-truststore_type"]
===== `truststore_type`

* Value type is <<string,string>>
* Default value is `"JKS"`

Specify the truststore type here. One of `JKS` or `PKCS12`. Default is `JKS`

[id="plugins-{type}s-{plugin}-urls"]
===== `urls`

* This is a required setting.
* Value type is <<hash,hash>>
@@ -501,21 +501,25 @@ The values in urls can be either:
* a sub-hash containing many useful keys provided by the Manticore backend:
** url: the String url
** method: (optional) the HTTP method to use (defaults to GET)
** user: (optional) the HTTP Basic Auth user. The user must be under
an auth sub-hash for Manticore, but this plugin also accepts it either way.
** password: (optional) the HTTP Basic Auth password. The password
must be under an auth sub-hash for Manticore, but this plugin accepts it either way.
** headers: a hash containing key-value pairs of headers.
** body: a string (supported only on POST and PUT requests)
** pagination: (optional) a hash containing options for pagination handling
** failure_mode: (optional) a string specifying what to do on failure
** retry_delay: (optional) the number of seconds to wait before retrying when failure_mode is set to retry
** success_status_codes: (optional) an array of HTTP status codes (integers) to be treated as successful
** possibly other options mentioned in the
https://www.rubydoc.info/github/cheald/manticore/Manticore/Client#http-instance_method[Manticore docs].
Note that Manticore options that are not explicitly documented above are not
thoroughly tested and therefore liable to break in unexpected ways if we
replace the backend.

*Notes:*

* Passwords specified as a part of `urls` are prone to exposure in plugin log output.
The plugin does not declare them as passwords, and therefore doesn't wrap them in
leak-reducing wrappers as we do elsewhere.
* We don't guarantee that boolean-type options like Manticore's `follow_redirects` are supported
@@ -525,7 +529,7 @@ string is "truthy."
as anything other than true

[id="plugins-{type}s-{plugin}-user"]
===== `user`

* Value type is <<string,string>>
* There is no default value for this setting.
@@ -534,7 +538,7 @@ Username to use with HTTP authentication for ALL requests. Note that you can als
If you set this you must also set the <<plugins-{type}s-{plugin}-password>> option.

[id="plugins-{type}s-{plugin}-validate_after_inactivity"]
===== `validate_after_inactivity`

* Value type is <<number,number>>
* Default value is `200`
@@ -550,6 +554,65 @@ being leased to the consumer. Non-positive value passed to this method disables
connection validation. This check helps detect connections that have become
stale (half-closed) while kept inactive in the pool."

[id="plugins-{type}s-{plugin}-failure_mode"]
===== `failure_mode`

* Value type is <<string,string>>
* Default value is `"continue"`

Specifies what to do when a request fails. Requests that get no response (for example, because the server is not responding) are always classified as failures. Specific status codes can also be made to trigger failure handling with the `success_status_codes` option.

Allowed values:

* `continue`: On failure, emit the event to the pipeline as usual.
* `retry`: On failure, wait the number of seconds specified in `retry_delay`, then try again.
* `stop`: On failure, stop the plugin.
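
The three modes can be pictured as a small dispatch around each request. The following Ruby sketch is only an illustration under stated assumptions, not the plugin's implementation: `handle_with_failure_mode`, its keyword arguments, the returned symbols, and the `max_attempts` limit are all hypothetical names introduced here. The block stands in for one request and returns `[ok, response]`.

```ruby
# Hypothetical sketch of the three failure modes; not the plugin's code.
# The block performs one request and returns [ok, response].
def handle_with_failure_mode(failure_mode:, retry_delay: 0, max_attempts: 3)
  attempts = 0
  loop do
    attempts += 1
    ok, response = yield
    return [:emit, response] if ok

    case failure_mode
    when "continue"
      # Emit the event to the pipeline even though the request failed.
      return [:emit, response]
    when "stop"
      # Tell the caller to stop the plugin.
      return [:stop, response]
    when "retry"
      # max_attempts is an assumption of this sketch so that it terminates;
      # the documented option does not mention a retry limit.
      return [:gave_up, response] if attempts >= max_attempts
      sleep(retry_delay)
    else
      raise ArgumentError, "unknown failure_mode: #{failure_mode.inspect}"
    end
  end
end
```

Returning a symbol instead of acting directly keeps the sketch free of side effects; a real input would emit the event or shut down at those points.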

[id="plugins-{type}s-{plugin}-retry_delay"]
===== `retry_delay`

* Value type is <<number,number>>
* There is no default value for this setting.

The time (in seconds) to wait before retrying when a request fails and `failure_mode` is set to `retry`.

[id="plugins-{type}s-{plugin}-success_status_codes"]
===== `success_status_codes`

* Value type is <<array,array>>
* There is no default value for this setting.

If specified, any response whose status code is not listed here is treated as a failure and handled according to `failure_mode`.
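
As a rough illustration of this rule, the Ruby sketch below classifies a response. `request_failed?` is a hypothetical helper, not part of the plugin. It assumes that a `nil` status stands for a request that received no response, and that when the option is unset any received response counts as a success.

```ruby
# Hypothetical helper: decide whether a response counts as a failure.
# success_status_codes is nil when the option is not set.
def request_failed?(status_code, success_status_codes = nil)
  # Assumption: a nil status means no response was received at all.
  return true if status_code.nil?
  # Assumption: without the option, any received response is a success.
  return false if success_status_codes.nil?

  # With the option set, only the listed codes are successful.
  !success_status_codes.include?(status_code)
end
```

For example, with `success_status_codes` set to `[200, 201]`, a `500` response would go to failure handling, while a `200` response would not.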

[id="plugins-{type}s-{plugin}-pagination"]
===== `pagination`

* Value type is <<hash,hash>>
* There is no default value for this setting.

If `pagination` is set, the request is handled as a paginated request, and a separate event is created for each page. The current page is stored in a file so that the query can continue where it left off if the process is stopped. State saving follows the at-least-once principle, so if Logstash shuts down abnormally, some pages might be queried twice.

The following values are required to use pagination:

* `start_page`: The page number to start from.
* `end_page`: The page number to end at.
* `page_parameter`: The name of the query parameter used to send the page number to the server.

The following values are optional:

* `concurrent_requests`: The number of requests processed by the HTTP client at the same time. Note that the number of requests actually sent to the server concurrently also depends on the HTTP client's own settings, which are set separately. The default value is `1`.
* `last_run_metadata_path`: Path to the file used to persist the current page in case the pipeline is stopped. If not specified, the file is created in the Logstash data directory at `data/plugins/inputs/http_poller/state/state_<url name>`. Note that if multiple urls with the same name are run in pipelines on Logstash instances that share a data directory, `last_run_metadata_path` should be set to prevent the pipelines from overwriting each other's file.
* `delete_last_run_metadata`: Whether to delete the last run metadata file once all pages have been queried, so that the query starts from the first page on the next run. Defaults to `true`.
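
The at-least-once behaviour of the state file can be sketched in Ruby as follows. This is a simplified, sequential illustration under assumptions, not the plugin's implementation: `run_pages` and its arguments are hypothetical, and the sketch assumes the file stores the next page still to be fetched. Because progress is recorded only after a page has been handled, a crash in between leads to that page being fetched again on the next run.

```ruby
# Hypothetical sketch of at-least-once page tracking; not the plugin's code.
# Assumption: the state file holds the next page that still has to be fetched.
def run_pages(start_page, end_page, state_path, delete_when_done: true)
  # Resume from the saved page if a previous run was interrupted.
  first = File.exist?(state_path) ? File.read(state_path).to_i : start_page
  (first..end_page).each do |page|
    yield page                               # fetch the page and emit its event
    File.write(state_path, (page + 1).to_s)  # record progress only afterwards
  end
  # Remove the state so that the next run starts again from start_page.
  File.delete(state_path) if delete_when_done && File.exist?(state_path)
end
```

Writing the state before handling the page would instead give at-most-once behaviour, where a crash could skip a page.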

For example, with the values

* `start_page` = 1
* `end_page` = 2
* `page_parameter` = page

two requests are sent to the server, with these URLs:

* `http://example.com/example?page=1`
* `http://example.com/example?page=2`
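
The way the page parameter is appended can be sketched with Ruby's standard `uri` library. `paginated_urls` is a hypothetical helper written for this illustration and is not taken from the plugin. Keeping any query parameters already present on the base URL is an assumption of the sketch. Called with the values above, it yields the two URLs shown.

```ruby
require "uri"

# Hypothetical helper: build one URL per page by adding the page parameter.
def paginated_urls(base_url, page_parameter, start_page, end_page)
  (start_page..end_page).map do |page|
    uri = URI.parse(base_url)
    # Assumption: query parameters already on the base URL are kept.
    params = URI.decode_www_form(uri.query || "")
    params << [page_parameter, page.to_s]
    uri.query = URI.encode_www_form(params)
    uri.to_s
  end
end

puts paginated_urls("http://example.com/example", "page", 1, 2)
```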

[id="plugins-{type}s-{plugin}-common-options"]
include::{include_path}/{type}.asciidoc[]
