
Commit 1c67294

Support replay mode with garbage collection (microsoft#845)
This change enables a new 'replay' mode option that replays requests from the trace database.

Changes include:

1. Refactor the driver to consume sequences from different sources (the trace database is another supported source of specific sequences, similar to the existing smoke test mode).

2. Generate 'replay blocks' as part of payload rendering. Replay blocks contain dynamic objects, but do not support custom payloads (this is future work).

3. Enable replay to be based on replay blocks. With this change, RESTler generates sequences from replay blocks if available, or from raw requests/responses (similar to the existing replay mode) when they are not available.

   The current version of replay is based on the rendered data sent at the time of original sequence rendering. This has several limitations, such as not being able to garbage-collect resources or plug in custom payloads. In particular, when a bug is reproduced in RESTler today, the same replay mechanism is used, which means that GC does not collect resources created while reproducing the bug. Moreover, if a resource-generating request has a unique ID parameter (e.g. PUT /resource/{resourceId}), re-running the same sequence without GC fails when the resource has not been deleted, so a real bug is falsely reported as non-reproducible.

   This change addresses this issue by implementing replay blocks-based replay, which is invoked when reproducing bugs.

   Note: the existing replay based on .replay.txt files will also be able to use this new mechanism if a grammar is provided, but this is not implemented as part of this change.

4. Add a setting to filter the origin during DB replay. To only replay specific origins, add the following to the settings file:

     "replay": {
       "include_origins": ["main_driver", "InvalidValueChecker"]
     }

Testing:

- Manual testing of demo_server replay as follows:

  1) Run the 'test' task and generate a trace database:

     >restler.exe test --grammar_file .\Compile\grammar.py --dictionary_file .\Compile\dict.json --host localhost --target_port 8888 --settings .\compile\engine_settings.json --no_ssl

     Engine settings:

     {
       "use_trace_database": true,
       "trace_database": {
         "root_dir": "d:\\demo_server\\trace_databases"
       }
     }

  2) Run the 'replay' task from the above trace database:

     >restler.exe replay --replay_log ./trace_data.ndjson --grammar_file .\Compile\grammar.py --dictionary_file .\Compile\dict.json --host localhost --target_port 8888 --settings .\compile\engine_settings.json --no_ssl

- Updated unit tests.
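As a rough illustration of the `include_origins` filter described in the commit message, the sketch below filters newline-delimited JSON records by origin. It is not RESTler's implementation: the record layout (an `origin` key on each line) and the helper name `filter_by_origin` are assumptions made for this example, and only the setting name and the "empty list means no filtering" behaviour come from this commit.

```python
import json
from typing import Iterable, Iterator, List, Optional


def filter_by_origin(lines: Iterable[str],
                     include_origins: Optional[List[str]] = None) -> Iterator[dict]:
    """Yield parsed NDJSON records whose 'origin' value is allowed.

    Hypothetical helper: the 'origin' key is an assumed field name.
    An empty or missing include_origins list means no filtering.
    """
    allowed = set(include_origins or [])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if not allowed or record.get("origin") in allowed:
            yield record


# Made-up trace records, for illustration only.
sample = [
    '{"origin": "main_driver", "request": "PUT /resource/1"}',
    '{"origin": "UseAfterFreeChecker", "request": "GET /resource/1"}',
    '{"origin": "InvalidValueChecker", "request": "PUT /resource/x"}',
]
settings = {"replay": {"include_origins": ["main_driver", "InvalidValueChecker"]}}

kept = list(filter_by_origin(sample, settings["replay"]["include_origins"]))
unfiltered = list(filter_by_origin(sample, []))
print([record["origin"] for record in kept])
```

With the settings shown, only the `main_driver` and `InvalidValueChecker` records are kept; passing an empty list keeps every record.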
1 parent 86581a2 commit 1c67294

27 files changed, +910 -416 lines changed

.vscode/launch.json

+43-1
Original file line number | Diff line number | Diff line change
@@ -17,7 +17,7 @@
1717
"request": "launch",
1818
"program": "${workspaceFolder}\\restler\\end_to_end_tests\\test_quick_start.py",
1919
"args": [
20-
"d:\\restlerdrop\\main",
20+
"d:\\restlerdrop\\tracedb2",
2121
]
2222
},
2323
{
@@ -106,6 +106,48 @@
106106
],
107107
"justMyCode": false
108108
},
109+
{
110+
"name": "Python: replay mode with grammar",
111+
"type": "python",
112+
"request": "launch",
113+
"program": "${workspaceFolder}\\restler\\restler.py",
114+
"args": [
115+
"--replay_log",
116+
"D:\\test\\demo_server\\replaytests3\\replay_trace_data.ndjson",
117+
"--restler_grammar",
118+
"d:\\test\\demo_server\\replaytests3\\Compile\\grammar.py",
119+
"--custom_mutations",
120+
"d:\\test\\demo_server\\replaytests3\\Compile\\dict.json",
121+
"--settings",
122+
"d:\\test\\demo_server\\replaytests3\\Compile\\engine_settings.json",
123+
"--no_ssl",
124+
"--host",
125+
"localhost",
126+
"--target_port",
127+
"8888",
128+
"--garbage_collection_interval",
129+
"30"
130+
],
131+
"justMyCode": false
132+
},
133+
{
134+
"name": "Python: replay mode no grammar",
135+
"type": "python",
136+
"request": "launch",
137+
"program": "${workspaceFolder}\\restler\\restler.py",
138+
"args": [
139+
"--replay_log",
140+
"D:\\test\\demo_server\\replaytests\\replay_trace_data.ndjson",
141+
"--no_ssl",
142+
"--host",
143+
"localhost",
144+
"--target_port",
145+
"8888",
146+
"--garbage_collection_interval",
147+
"30"
148+
],
149+
"justMyCode": false
150+
},
109151
{
110152
"name": "Python: examples checker",
111153
"type": "python",

docs/user-guide/Replay.md

+24-3
@@ -28,7 +28,7 @@ Any resources that were created during the replay will NOT be automatically dele
2828
unless the replaying sequence itself deletes the resource.
2929
Any resources created should be removed manually.
3030

31-
## Replay log format
31+
### Replay log format
3232

3333
The replay log is created anytime a new bug bucket is reported.
3434
This replay log consists of the full sequence of requests that were sent to create the bug.
@@ -60,7 +60,7 @@ You may notice that content-length and user-agent are not included in the replay
6060
These fields are populated automatically by RESTler when the request is sent to the server,
6161
so they are not needed (and shouldn't exist) in the log.
6262

63-
## Using replay logs to send custom sequences
63+
### Using replay logs to send custom sequences
6464
While the main purpose of replay logs are to re-test bugs previously found,
6565
it is also possible to use these files as a way to send custom sequences to RESTler, similar to how you may send a request through *curl* or *Postman*.
6666

@@ -84,4 +84,25 @@ while max_async_wait_time will attempt to perform an asynchronous polling-wait b
8484
with a maximum resource-creation-wait-time of the max_async_wait_time setting.
8585

8686

87-
##
87+
## Using the Trace Database
88+
89+
Previously executed RESTler sequences may be re-played by configuring a trace database to be written during `test` or `fuzz` tasks, then specifying it as the replay file for the `replay` task.
90+
91+
For example:
92+
93+
1. Generate the trace database by adding the following to the engine settings:
94+
```json
95+
{
96+
"use_trace_database": true,
97+
"trace_database": {
98+
"root_dir": "/path/to/trace_databases",
99+
},
100+
}
101+
```
102+
103+
2. Replay the same sequences of requests (in the same order) from the replay log. The below command specifies to run the RESTler `replay` task. The grammar, dictionary, and engine settings files must be specified to enable generating unique dynamic object names and garbage collection as in the original run (note: custom payloads from the specified dictionary will not currently be used for replay). If the grammar and dictionary are omitted, the replay will execute the same request text as sent
104+
in the original run, and GC will not be triggered.
105+
106+
>restler.exe replay --replay_log /path/to/trace_databases/trace_data.ndjson --grammar_file ./Compile/grammar.py --dictionary_file ./Compile/dict.json --host localhost --target_port 8888 --settings ./Compile/engine_settings.json
107+
108+
Replaying sequences from checkers is enabled for experimental purposes, but is not fully supported at this time.
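To make the distinction in the Replay.md text concrete (re-sending the original request text versus regenerating unique dynamic object names), here is a minimal sketch. The block representation used here, tuples of `("static", text)` and `("uuid_suffix", prefix)`, is purely hypothetical; the actual replay block format is not shown in this diff. The sketch only illustrates why re-rendering from blocks avoids re-using an ID such as the one in `PUT /resource/{resourceId}`.

```python
import uuid
from typing import List, Tuple

# Hypothetical block representation, for illustration only:
#   ("static", text)        -> text is emitted verbatim
#   ("uuid_suffix", prefix) -> prefix plus a freshly generated suffix
Block = Tuple[str, str]


def render_from_blocks(blocks: List[Block]) -> str:
    """Render a request, generating a new unique name on every call."""
    parts = []
    for kind, value in blocks:
        if kind == "static":
            parts.append(value)
        elif kind == "uuid_suffix":
            parts.append(value + uuid.uuid4().hex[:10])
        else:
            raise ValueError(f"unknown block kind: {kind}")
    return "".join(parts)


blocks = [
    ("static", "PUT /resource/"),
    ("uuid_suffix", "res"),
    ("static", " HTTP/1.1\r\n\r\n"),
]

# Raw replay re-sends the exact text recorded in the original run,
# so the same resource ID is used again.
recorded_text = render_from_blocks(blocks)
raw_replay = recorded_text

# Block-based replay re-renders the request, so the ID differs.
block_replay = render_from_blocks(blocks)
print(raw_replay == recorded_text, block_replay == recorded_text)
```

Under these assumptions, the raw replay collides with a resource left over from the original run, while the block-based replay targets a new resource name each time.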

docs/user-guide/SettingsFile.md

+15
@@ -385,6 +385,21 @@ Dictionary containing settings for the trace database.
385385

386386
`cleanup_time` float (default 10): The maximum amount of time, in seconds, to wait for the data serialization to be complete before exiting.
387387

388+
### replay: dict (default empty)
389+
390+
Dictionary containing replay settings.
391+
392+
`trace_database_file_path` str (default None): The path to the trace database from which to replay requests.
393+
Overrides the value of `--replay_file` if specified on the command line.
394+
395+
`include_origins` list (default empty list=No filtering): When replaying requests from the trace database, specify a list of origin values to use. For example:
396+
397+
```json
398+
"replay": {
399+
"include_origins": ["main_driver", "InvalidValueChecker"]
400+
}
401+
```
402+
388403
### request_throttle_ms: float (default None)
389404
The time, in milliseconds, to throttle each request being sent.
390405
This is here for special cases where the server will block requests from connections that arrive too quickly.

restler/checkers/checker_base.py

+8-3
@@ -119,15 +119,17 @@ def _render_and_send_data(self, seq, request, check_async=True):
119119
@rtype : Tuple(HttpResponse, HttpResponse)
120120
121121
"""
122-
rendered_data, parser, tracked_parameters, updated_writer_variables = request.render_current(self._req_collection.candidate_values_pool)
122+
rendered_data, parser, tracked_parameters, updated_writer_variables, replay_blocks =\
123+
request.render_current(self._req_collection.candidate_values_pool)
123124
rendered_data = seq.resolve_dependencies(rendered_data)
124125

125126
# We need to record that the request originates from the checker, but
126127
# there is not a clear sequence origin.
127128
SequenceTracker.initialize_sequence_trace(combination_id=seq.combination_id,
128129
tags={'hex_definition': seq.hex_definition})
129130
SequenceTracker.initialize_request_trace(combination_id=seq.combination_id,
130-
request_id=request.hex_definition)
131+
request_id=request.hex_definition,
132+
replay_blocks=replay_blocks)
131133

132134
response = self._send_request(parser, rendered_data)
133135
if response.has_valid_code():
@@ -141,7 +143,10 @@ def _render_and_send_data(self, seq, request, check_async=True):
141143
responses_to_parse, _, _ = async_request_utilities.try_async_poll(
142144
rendered_data, response, async_wait)
143145
request_utilities.call_response_parser(parser, None, responses=responses_to_parse)
144-
seq.append_data_to_sent_list(rendered_data, parser, response, producer_timing_delay=0, max_async_wait_time=async_wait)
146+
seq.append_data_to_sent_list(request.method_endpoint_hex_definition,
147+
rendered_data, parser, response, producer_timing_delay=0,
148+
max_async_wait_time=async_wait,
149+
replay_blocks=replay_blocks)
145150
SequenceTracker.clear_sequence_trace()
146151
return response, response_to_parse
147152

restler/checkers/demo_checker.py

+5-3
@@ -71,10 +71,10 @@ def apply(self, rendered_sequence, lock):
7171
# Add the sent prefix requests for replay
7272
checked_seq.set_sent_requests_for_replay(new_seq.sent_request_data_list)
7373
# Create a placeholder sent data, so it can be replaced below when bugs are detected for replays
74-
checked_seq.append_data_to_sent_list("GET /", None, HttpResponse(), max_async_wait_time=req_async_wait)
74+
checked_seq.append_data_to_sent_list("-", "GET /", None, HttpResponse(), max_async_wait_time=req_async_wait)
7575

7676
# Render the current request combination
77-
rendered_data, parser, tracked_parameters, updated_writer_variables = \
77+
rendered_data, parser, tracked_parameters, updated_writer_variables, replay_blocks = \
7878
next(last_request.render_iter(self._req_collection.candidate_values_pool,
7979
skip=last_request._current_combination_id - 1,
8080
preprocessing=False))
@@ -105,7 +105,9 @@ def apply(self, rendered_sequence, lock):
105105
responses=responses_to_parse)
106106

107107
if response and self._rule_violation(checked_seq, response, valid_response_is_violation=True):
108-
checked_seq.replace_last_sent_request_data(rendered_data, parser, response, max_async_wait_time=req_async_wait)
108+
checked_seq.replace_last_sent_request_data(request_hash,
109+
rendered_data, parser, response, max_async_wait_time=req_async_wait,
110+
replay_blocks=replay_blocks)
109111
self._print_suspect_sequence(checked_seq, response)
110112
BugBuckets.Instance().update_bug_buckets(checked_seq, response.status_code, origin=self.__class__.__name__)
111113
self.bugs_reported += 1

restler/checkers/invalid_dynamic_object_checker.py

+3-3
@@ -58,7 +58,7 @@ def apply(self, rendered_sequence, lock):
5858
InvalidDynamicObjectChecker.generation_executed_requests[generation].add(last_request.hex_definition)
5959

6060
# Get the current rendering of the sequence, which will be the valid rendering of the last request
61-
last_rendering, last_request_parser, tracked_parameters, updated_writer_variables =\
61+
last_rendering, last_request_parser, tracked_parameters, updated_writer_variables, replay_blocks =\
6262
last_request.render_current(self._req_collection.candidate_values_pool)
6363

6464
# Execute the sequence up until the last request
@@ -77,12 +77,12 @@ def apply(self, rendered_sequence, lock):
7777
request_utilities.call_response_parser(last_request_parser, response)
7878
if response and self._rule_violation(new_seq, response):
7979
# Append the data that we just sent to the sequence's sent list
80-
new_seq.append_data_to_sent_list(data, last_request_parser, response)
80+
new_seq.append_data_to_sent_list(last_request.method_endpoint_hex_definition,
81+
data, last_request_parser, response, replay_blocks=replay_blocks)
8182
BugBuckets.Instance().update_bug_buckets(new_seq, response.status_code, origin=self.__class__.__name__)
8283
self._print_suspect_sequence(new_seq, response)
8384

8485

85-
8686
def _prepare_invalid_requests(self, data):
8787
""" Prepares requests with invalid dynamic objects.
8888
Each combination of valid/invalid for requests with multiple

restler/checkers/invalid_value_checker.py

+8-4
@@ -245,11 +245,11 @@ def should_fuzz(req_block):
245245
# Add the sent prefix requests for replay
246246
checked_seq.set_sent_requests_for_replay(new_seq.sent_request_data_list)
247247
# Create a placeholder sent data, so it can be replaced below when bugs are detected for replays
248-
checked_seq.append_data_to_sent_list("GET /", None, HttpResponse(), max_async_wait_time=req_async_wait)
248+
checked_seq.append_data_to_sent_list("-", "GET /", None, HttpResponse(), max_async_wait_time=req_async_wait)
249249

250250
# Render the current request combination, but get the list of primitive
251251
# values before they are concatenated.
252-
rendered_values, parser, tracked_parameters, updated_writer_variables = \
252+
rendered_values, parser, tracked_parameters, updated_writer_variables, replay_blocks = \
253253
next(last_request.render_iter(self._req_collection.candidate_values_pool,
254254
skip=last_request._current_combination_id - 1,
255255
preprocessing=False,
@@ -310,7 +310,8 @@ def should_fuzz(req_block):
310310
if not isinstance(fuzzed_value, str):
311311
print("not a string!")
312312
rendered_data = "".join(rendered_values)
313-
313+
# Get the replay blocks that contain the value currently being fuzzed
314+
fuzzed_replay_blocks = request_utilities.get_replay_blocks(last_request.definition, rendered_values)
314315
# Check time budget
315316
if Monitor().remaining_time_budget <= 0:
316317
raise TimeOutException('Exceed Timeout')
@@ -339,7 +340,10 @@ def should_fuzz(req_block):
339340
status_code = response.status_code
340341

341342
if response and self._rule_violation(checked_seq, response, valid_response_is_violation=False):
342-
checked_seq.replace_last_sent_request_data(rendered_data, parser, response, max_async_wait_time=req_async_wait)
343+
checked_seq.replace_last_sent_request_data(last_request.method_endpoint_hex_definition,
344+
rendered_data, parser, response,
345+
max_async_wait_time=req_async_wait,
346+
replay_blocks=fuzzed_replay_blocks)
343347
self._print_suspect_sequence(checked_seq, response)
344348
BugBuckets.Instance().update_bug_buckets(checked_seq, response.status_code, origin=self.__class__.__name__)
345349

restler/checkers/namespace_rule_checker.py

+4-4
@@ -74,7 +74,7 @@ def _render_original_sequence_start(self, seq):
7474
self._checker_log.checker_print("\nRe-rendering start of original sequence")
7575

7676
for request in seq.requests[:-1]:
77-
rendered_data, parser, tracked_parameters, updated_writer_variables = request.render_current(
77+
rendered_data, parser, tracked_parameters, updated_writer_variables, replay_blocks = request.render_current(
7878
self._req_collection.candidate_values_pool
7979
)
8080
rendered_data = seq.resolve_dependencies(rendered_data)
@@ -106,7 +106,7 @@ def _namespace_rule(self):
106106
# Check if last request contains any trigger_object
107107

108108
last_request = self._sequence.last_request
109-
last_rendering, last_parser, _, _ = last_request.render_current(self._req_collection.candidate_values_pool)
109+
last_rendering, last_parser, _, _,_ = last_request.render_current(self._req_collection.candidate_values_pool)
110110

111111
last_request_contains_a_trigger_object = False
112112
for obj in self._trigger_objects:
@@ -182,7 +182,7 @@ def _render_attacker_subsequence(self, req):
182182

183183
for i in range(stopping_length):
184184
request = self._sequence.requests[i]
185-
rendered_data, parser, tracked_parameters = request.render_current(
185+
rendered_data, parser, tracked_parameters, replay_blocks = request.render_current(
186186
self._req_collection.candidate_values_pool
187187
)
188188
rendered_data = self._sequence.resolve_dependencies(rendered_data)
@@ -206,7 +206,7 @@ def _render_hijack_request(self, req):
206206
207207
"""
208208
self._checker_log.checker_print("Hijack request rendering")
209-
rendered_data, parser, tracked_parameters, updated_writer_variables = req.render_current(
209+
rendered_data, parser, tracked_parameters, updated_writer_variables, replay_blocks = req.render_current(
210210
self._req_collection.candidate_values_pool
211211
)
212212
rendered_data = self._sequence.resolve_dependencies(rendered_data)

restler/checkers/payload_body_checker.py

+3-2
@@ -1122,7 +1122,7 @@ def _exec_request_with_new_body(
11221122
cnt = 0
11231123

11241124
# iterate through different value combinations
1125-
for rendered_data, parser,_,updated_writer_variables in new_request.render_iter(
1125+
for rendered_data, parser,_,updated_writer_variables, replay_blocks in new_request.render_iter(
11261126
self._req_collection.candidate_values_pool
11271127
):
11281128
# check time budget
@@ -1230,7 +1230,8 @@ def _exec_request_with_new_body(
12301230
# analyze response -- error
12311231
if self._rule_violation(seq, response, valid_is_violation):
12321232
# Append the new request to the sequence before filing the bug
1233-
seq.replace_last_sent_request_data(rendered_data, parser, response)
1233+
seq.replace_last_sent_request_data(request.method_endpoint_hex_definition,
1234+
rendered_data, parser, response)
12341235
err_seq = sequences.Sequence(seq.requests[:-1] + [new_request])
12351236
err_seq.set_sent_requests_for_replay(seq.sent_request_data_list)
12361237
self._print_suspect_sequence(err_seq, response)

restler/checkers/use_after_free_checker.py

+3-4
@@ -109,7 +109,7 @@ def _render_last_request(self, seq):
109109
110110
"""
111111
request = seq.last_request
112-
for rendered_data, parser,_,updated_writer_variables in\
112+
for rendered_data, parser,_,updated_writer_variables, replay_blocks in\
113113
request.render_iter(self._req_collection.candidate_values_pool,
114114
skip=request._current_combination_id):
115115
# Hold the lock (because other workers may be rendering the same
@@ -138,11 +138,10 @@ def _render_last_request(self, seq):
138138
for name,v in updated_writer_variables.items():
139139
dependencies.set_variable(name, v)
140140

141-
142-
143141
# Append the rendered data to the sent list as we will not be rendering
144142
# with the sequence's render function
145-
seq.append_data_to_sent_list(rendered_data, parser, response)
143+
seq.append_data_to_sent_list(request.method_endpoint_hex_definition,
144+
rendered_data, parser, response, replay_blocks=replay_blocks)
146145
if response and self._rule_violation(seq, response):
147146
self._print_suspect_sequence(seq, response)
148147
BugBuckets.Instance().update_bug_buckets(seq, response.status_code, origin=self.__class__.__name__)
