The following services within the Manta data path are Node.js-based:
- Webapi (muskie)
- Electric-Moray
- Moray
- Authcache (mahi)
- Nameservice (not covered in this section)
Note: This section assumes that you’re looking at a specific tier of services (i.e., one of the ones listed above). If you’re looking at Moray specifically, as you follow this section, consider only the Moray instances for the particular shard you’re looking at.
First, have you already confirmed that this tier is reporting high latency? If not, check metric dashboards first to see whether latency is high. See [_investigating_an_increase_in_latency].
Next, have you also confirmed that the reason for the latency is not due to a dependent service? For example, if you’re here because of Electric-Moray, have you confirmed that Moray isn’t also seeing high latency? If you haven’t, see [_investigating_an_increase_in_latency] for how to locate which tier is the origin of overall latency.
At this point, you’ve confirmed that this tier appears to be the source of a latency increase. Now, use the latency metric dashboards to determine: is the increase in latency affecting nearly all instances?
- Yes, nearly all instances are reporting high latency. This might reflect insufficient capacity. Individual Node processes are usually close to capacity when they exceed about 80-85% of CPU utilization. Each component typically has multiple processes deployed in each zone, which means the saturation point (80-85% per process) is different for each type of zone:

  Service | Processes per zone | Saturation point | Equivalent CPUs
  ---|---|---|---
  webapi | 16 | 1300% (zone-wide) | 13 CPUs
  electric-moray | 4 | 325% (zone-wide) | 3.25 CPUs
  moray | 4 | 325% (zone-wide) | 3.25 CPUs
  authcache | 1 | 80% (zone-wide) | 0.8 CPUs

  Given the guidelines above, use CPU utilization graphs to determine: are most instances in this service close to CPU saturation? (The [_deployment_specific_details] section recommends having both a line graph showing the minimum zone-wide CPU usage for each service and a heat map of zone-wide CPU utilization. If the line graph is above these guidelines, the service definitely looks saturated. The heat map allows you to identify cases where some instances might have more headroom available, but most of them are still too busy.)
  - Yes, most instances are nearing saturation. Deploy more instances of the service in question. See [_scaling_up_a_component]. Additionally, if the workload has not changed substantially, you may want to check for a memory leak that may have affected many processes. If there is one, deploying more instances will likely only help for a little while, until those instances suffer the same leak.
  - No, many instances appear to have plenty of headroom. This is very unusual, so it’s worth double-checking that latency is elevated across the board, that latency at dependent services is not high, and that CPU utilization is not high. If this is really the case, pick any specific process showing high latency and see [_investigating_a_slow_process]. Other avenues to consider: is the dependent service close to CPU saturation? If so, clients of the dependent service may see much higher latency than the service reports because of queueing. Is there evidence of elevated packet loss? This can also increase client-side latency without manifesting as latency reported by the dependent service.
- No, only some instances are reporting high latency. In this case, this service does not appear generally overloaded, although it’s possible that some instances are. Next question: can you tell from the per-request metrics for this tier whether the workload is evenly distributed across instances?
  - Yes, the workload is evenly distributed across instances. In this case, it appears that processes are generally doing comparable work, but some are doing it much slower. The next step is to use the dashboard to identify a zone with particularly high latency and dig deeper into a specific slow process. See [_investigating_a_slow_process].
  - No, the workload is unevenly distributed. See [_investigating_service_discovery]. (Remember, if you’re looking at Moray, this section assumes you’re looking only at instances for a particular shard. As described under [_investigating_an_increase_in_latency], if you see latency at Moray, you should first isolate the shard and investigate latency in each shard separately.)
When trying to understand either an explicit error or high latency, it can be helpful to investigate the log entry written by "webapi" for a specific request.
Do you already have information about a specific request you want to investigate?
- Yes, I have information about a specific request. See [_investigating_a_specific_request_that_has_failed].
- No, I don’t have information about a specific request yet. Move on.
Next question: does the problem appear to be reproducible? Try reproducing the problem with the node-manta command-line tools (e.g., mls). You can use the -v flag and redirect stderr to the bunyan command to see request and response headers, like this:
$ mls -v /poseidon/public 2> >(bunyan)
[2018-07-19T21:18:48.146Z] DEBUG: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/lib/client.js:1536 in ls): ls: entered (req_id=4b4927be-fc1f-4dd8-88fe-2ae75dcbc262, path=/poseidon/public)
[2018-07-19T21:18:48.148Z] DEBUG: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/lib/client.js:1128 in info): info: entered (req_id=0f34c68e-2072-405a-be3a-248e8020f1ba, path=/poseidon/public, id=0f34c68e-2072-405a-be3a-248e8020f1ba, query={})
headers: {
"accept": "application/json, */*",
"x-request-id": "0f34c68e-2072-405a-be3a-248e8020f1ba"
}
[2018-07-19T21:18:48.189Z] TRACE: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:314 in rawRequest): request sent
HEAD /poseidon/public HTTP/1.1
Host: us-central.manta.mnx.io:null
accept: application/json, */*
x-request-id: 0f34c68e-2072-405a-be3a-248e8020f1ba
date: Thu, 19 Jul 2018 21:18:48 GMT
authorization: Signature keyId="/dap/keys/56:f3:e1:56:3d:e6:f7:83:a9:ce:19:5d:62:ba:5c:1f",algorithm="rsa-sha1",headers="date",signature="kG7IydhNO06ImfI6hFzFXXoSrWT6+2kCcDUC3swGebIr7YxeDcLEWMxGzB4z5lC29Vv7kgpLGaTc218m+63D0Y3M84LTNCvM1Va5COetXhIHkkAlBtXpJt5MUjqsRFK1xrpGKJjDc1QIBGSQIDmh4p6wNjofeaLX8jYnYa7FagW5iyQIHQmpAwe/AO+9Bg7fXBgzfvVZjWfhLaBA4G2CwuCSlkpF7mR7t04pTn+oxOmufE5h6XI/VLNsLZyQkc6prBFDoSiOLMgZsGfdsF11J9c/lCK/PW1y4MlTZBDGG8W1F0ssUEx0euLdm4TsqoBc1cfeIC43fV6sR2nN/CSiow=="
user-agent: restify/1.4.1 (x64-darwin; v8/4.5.103.53; OpenSSL/1.0.2o) node/4.9.1
accept-version: ~1.0
[2018-07-19T21:18:48.861Z] TRACE: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:210 in onResponse): Response received
HTTP/1.1 200 OK
content-type: application/x-json-stream; type=directory
result-set-size: 5
date: Thu, 19 Jul 2018 21:18:48 GMT
server: Manta
x-request-id: 0f34c68e-2072-405a-be3a-248e8020f1ba
x-response-time: 254
x-server-name: 511ea59e-2a4a-486b-9258-ef59016d064d
connection: keep-alive
x-request-received: 1532035128176
x-request-processing-time: 684
[2018-07-19T21:18:48.867Z] DEBUG: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/lib/client.js:820 in get): get: entered (req_id=dce478bd-6bc7-451b-ac2b-22d74d7bfd37, path=/poseidon/public, id=dce478bd-6bc7-451b-ac2b-22d74d7bfd37)
headers: {
"accept": "application/x-json-stream",
"x-request-id": "4b4927be-fc1f-4dd8-88fe-2ae75dcbc262",
"date": "Thu, 19 Jul 2018 21:18:48 GMT",
"authorization": "Signature keyId=\"/dap/keys/56:f3:e1:56:3d:e6:f7:83:a9:ce:19:5d:62:ba:5c:1f\",algorithm=\"rsa-sha1\",headers=\"date\",signature=\"kG7IydhNO06ImfI6hFzFXXoSrWT6+2kCcDUC3swGebIr7YxeDcLEWMxGzB4z5lC29Vv7kgpLGaTc218m+63D0Y3M84LTNCvM1Va5COetXhIHkkAlBtXpJt5MUjqsRFK1xrpGKJjDc1QIBGSQIDmh4p6wNjofeaLX8jYnYa7FagW5iyQIHQmpAwe/AO+9Bg7fXBgzfvVZjWfhLaBA4G2CwuCSlkpF7mR7t04pTn+oxOmufE5h6XI/VLNsLZyQkc6prBFDoSiOLMgZsGfdsF11J9c/lCK/PW1y4MlTZBDGG8W1F0ssUEx0euLdm4TsqoBc1cfeIC43fV6sR2nN/CSiow==\""
}
--
query: {
"limit": 1024
}
[2018-07-19T21:18:48.872Z] TRACE: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:314 in rawRequest): request sent
GET /poseidon/public?limit=1024 HTTP/1.1
Host: us-central.manta.mnx.io:null
accept: application/x-json-stream
x-request-id: 4b4927be-fc1f-4dd8-88fe-2ae75dcbc262
date: Thu, 19 Jul 2018 21:18:48 GMT
authorization: Signature keyId="/dap/keys/56:f3:e1:56:3d:e6:f7:83:a9:ce:19:5d:62:ba:5c:1f",algorithm="rsa-sha1",headers="date",signature="kG7IydhNO06ImfI6hFzFXXoSrWT6+2kCcDUC3swGebIr7YxeDcLEWMxGzB4z5lC29Vv7kgpLGaTc218m+63D0Y3M84LTNCvM1Va5COetXhIHkkAlBtXpJt5MUjqsRFK1xrpGKJjDc1QIBGSQIDmh4p6wNjofeaLX8jYnYa7FagW5iyQIHQmpAwe/AO+9Bg7fXBgzfvVZjWfhLaBA4G2CwuCSlkpF7mR7t04pTn+oxOmufE5h6XI/VLNsLZyQkc6prBFDoSiOLMgZsGfdsF11J9c/lCK/PW1y4MlTZBDGG8W1F0ssUEx0euLdm4TsqoBc1cfeIC43fV6sR2nN/CSiow=="
user-agent: restify/1.4.1 (x64-darwin; v8/4.5.103.53; OpenSSL/1.0.2o) node/4.9.1
accept-version: ~1.0
[2018-07-19T21:18:49.365Z] TRACE: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:210 in onResponse): Response received
HTTP/1.1 200 OK
content-type: application/x-json-stream; type=directory
result-set-size: 5
date: Thu, 19 Jul 2018 21:18:49 GMT
server: Manta
x-request-id: 4b4927be-fc1f-4dd8-88fe-2ae75dcbc262
x-response-time: 219
x-server-name: 60771e58-2ad0-4c50-8b23-86b72f9307f8
connection: keep-alive
transfer-encoding: chunked
x-request-received: 1532035128869
x-request-processing-time: 496
-rwxr-xr-x 1 poseidon 17 Dec 04 2015 agent.sh
drwxr-xr-x 1 poseidon 0 Sep 18 2014 manatee
drwxr-xr-x 1 poseidon 0 Jun 18 2013 medusa
drwxr-xr-x 1 poseidon 0 Aug 01 2013 minke
drwxr-xr-x 1 poseidon 0 Nov 07 2013 stud
[2018-07-19T21:18:49.480Z] DEBUG: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/lib/client.js:887 in onEnd): get: done (req_id=dce478bd-6bc7-451b-ac2b-22d74d7bfd37, path=/poseidon/public)
From the output, we can see that this operation made two requests. The second one has x-server-name: 60771e58-2ad0-4c50-8b23-86b72f9307f8 and x-request-id: 4b4927be-fc1f-4dd8-88fe-2ae75dcbc262. You can now use these to start [_investigating_a_specific_request_that_has_failed].
If the problem isn’t quite so easily reproducible, but you suspect it still affects a variety of requests, you can use manta-mlive to generate more requests and collect debug output for all of them. For mlive, you’ll want to set LOG_LEVEL in the environment to generate the debug logging, and you’ll likely want to redirect stderr to a file that you can search through later:
$ LOG_LEVEL=trace ./bin/mlive -S 2>debug.out
2018-07-19T21:24:01.307Z: reads okay, writes stuck (4/4 ok since start)
2018-07-19T21:24:02.310Z: all okay (17/17 ok since 2018-07-19T21:24:01.307Z)
^C
As before, you can use the bunyan tool to format the log:
$ bunyan debug.out
...
[2018-07-19T21:25:01.716Z] TRACE: mlive/MantaClient/9435 on zathras.local: request sent
HEAD /dap/stor/mlive HTTP/1.1
Host: us-central.manta.mnx.io:null
accept: application/json, */*
x-request-id: c317603c-82d4-4b2e-ac4b-066c9ece1864
date: Thu, 19 Jul 2018 21:25:01 GMT
authorization: Signature keyId="/dap/keys/56:f3:e1:56:3d:e6:f7:83:a9:ce:19:5d:62:ba:5c:1f",algorithm="rsa-sha1",headers="date",signature="oJZZIDh1qT8PeSSpz09bIzYT4LYK6rqXS2G5bHhh2r37SNOs0vBkFHUhfso6tSq1hmHIlkCEMXX9zGLIvYxQtHj6/KtiNgZgyWzGHms+qhc2gziXnOrMybxmWqJwipd8rAJCdDBV0B5FlCDeELWIA+1LifGDqqLdDZT4ScBUNOm9JG2+mha2U+pFbNtaXQQyyoPgopk+4ur4OHYpcaK/KY6WdC91quLIaIKV28VMtPoN/q/15lzRj6G6L7mbIMyd48ut0EbmTTR/CfYq9dquNsWDlyWgEJJVYyPZ9odAE34YQiYt/N4JXH7Crr9M6md9GtZonY+DbP8vvb5+7xr8dA=="
user-agent: restify/1.4.1 (x64-darwin; v8/4.5.103.53; OpenSSL/1.0.2o) node/4.9.1
accept-version: ~1.0
[2018-07-19T21:25:02.548Z] TRACE: mlive/MantaClient/9435 on zathras.local: Response received
HTTP/1.1 200 OK
last-modified: Tue, 16 Dec 2014 01:17:29 GMT
content-type: application/x-json-stream; type=directory
result-set-size: 45
date: Thu, 19 Jul 2018 21:25:02 GMT
server: Manta
x-request-id: c317603c-82d4-4b2e-ac4b-066c9ece1864
x-response-time: 462
x-server-name: 39adec6c-bded-4a14-9d80-5a8bfc1121f9
connection: keep-alive
x-request-received: 1532035501703
x-request-processing-time: 844
You can use grep, json, or other tools to filter the output for requests of interest (e.g., those with a particular HTTP response code or an x-response-time larger than some value). From the filtered results, you can identify an x-server-name and x-request-id and then see [_investigating_a_specific_request_that_has_failed].
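For example, here is one rough way to do that filtering with standard tools (the 1000ms threshold is arbitrary; adjust it and the patterns for what you’re looking for):
$ bunyan debug.out > formatted.out
$ awk '/x-response-time:/ && ($2 + 0) > 1000' formatted.out
Once you’ve found a slow response this way, search formatted.out for the surrounding request to read off its x-request-id and x-server-name headers.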
Real-time logs contain data from the current hour. These are typically stored as regular files within each zone. The specific file used varies by type of zone. See the Operator Guide for details. The manta-oneach command can be used as a low-level way to scan these real-time logs. For example, a common way to count recent 500-level errors in webapi logs is:
manta-oneach --service=webapi 'svcs -L muskie | xargs cat | grep -c "handled: 5"'
Since the real-time logs only store the current hour’s data, at 01:02Z this would only scan 2 minutes worth of data, while at 01:58Z it would scan 58 minutes worth of data.
If we were looking for a specific request that took place at 01:37Z, then we’d look at the real-time log file from immediately after the request happened until 02:00Z (i.e., for 23 minutes). After that, we’d have to look at the historical logs.
Historical logs are maintained by rotating the real-time logs at the top of each hour and then uploading them into Manta. Once in Manta, they’re stored under:
/poseidon/stor/logs/COMPONENT/YYYY/MM/DD/HH/SHORTZONE.log
where:
- COMPONENT varies based on the component you’re looking for
- YYYY, MM, DD, and HH represent the year, month, day, and hour for the entries in the log file
- SHORTZONE is the first 8 characters of the zone’s uuid
For example, to find the load balancer (haproxy) logs from zone f6817865-10fb-416c-a079-47941ac2aab4 from 2018-12-05T01:37Z, we would look for the object:
/poseidon/stor/logs/haproxy/2018/12/05/01/f6817865.log
You can scan a number of these logs at once using a compute job. For example, you could look for requests from IP address 10.1.2.3 in all of the load balancer logs from December 5, 2018 using:
mfind -t o /poseidon/stor/logs/haproxy/2018/12/05 | mjob create -o -m 'grep 10.1.2.3'
You can adjust the mfind invocation as needed to scan a broader or narrower range. You could also use the -n argument to mfind to select log files from a particular load balancer:
mfind -n f6817865.log -t o /poseidon/stor/logs/haproxy/2018/12/05 | mjob create -o -m 'grep 10.1.2.3'
When working with a Manta deployment where jobs aren’t functional, you can instead mget log files and do your filtering locally, like so:
mfind -t o /poseidon/stor/logs/haproxy/2018/12/05 | while read f; do mget $f | grep 10.1.2.3; done
Note that the above sequence of commands will download the entirety of every log file found by mfind, which may take an inconvenient amount of time and disk space. To mitigate this, you can use head to fetch a smaller number of log entries, like so:
mfind -t o /poseidon/stor/logs/haproxy/2018/12/05 | while read f; do mget $f | head -n 1000 | grep 10.1.2.3; done
When the above command sequence is run, each invocation of mget will terminate once head has read 1000 lines.
Note: the archival process for historical logs first rotates the logs to new files under /var/log/manta/upload. A few minutes later, these are uploaded to Manta and then removed from the local filesystem. If the upload fails, the files are kept in /var/log/manta/upload for up to two days. In extreme situations where Manta has been down for over an hour, you may find recent historical log files in /var/log/manta/upload, and you can scan them similarly to the live log files using manta-oneach.
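For example, a rough sketch (the service name and grep pattern follow the webapi example above; adjust them for the component you care about): list any not-yet-uploaded files sitting in webapi zones, then count 500-level errors in them the same way you would for the live log.
manta-oneach --service=webapi 'ls /var/log/manta/upload 2>/dev/null'
manta-oneach --service=webapi 'cat /var/log/manta/upload/* 2>/dev/null | grep -c "handled: 5"'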
Muskie (the Manta API server) logs an audit entry for every request that it completes. These logs are useful for:
- understanding how Manta handled a particular request (e.g., how long did it take? where was the time spent? what metadata shards were involved? what storage nodes were involved? what errors were encountered?)
- understanding the workload Manta is serving (e.g., what percentage of requests are GETs? what percentage are failing? what’s the distribution of sizes for uploaded objects? which accounts are making the most requests?)
Note that for real-time incident response, it’s often faster to start with [_checking_recent_historical_metrics]. These logs are most useful for digging deeper into a particular request or the workload overall.
Muskie logs in bunyan format, which is a JSON-based format. You typically use the bunyan tool to view them. You can also use the json tool to filter and aggregate them.
For more information, see [_muskie_log_entry_properties].
If you have a particular Muskie log entry with an error message in it and you want to know more about the error, see [_details_about_specific_error_messages].
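As a quick illustration of that kind of filtering (a sketch only: the field names are the ones shown in the example entries below, and /var/log/muskie.log is the real-time log location used elsewhere in this guide), this counts recent requests by response status code:
$ grep '"audit":true' /var/log/muskie.log | json -ga res.statusCode | sort | uniq -c | sort -n
This sort of breakdown is a common first step when characterizing a workload or an error spike before digging into individual entries.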
Below is an example log entry for a GET request, formatted using the bunyan command-line tool. See [_muskie_log_entry_properties] for more details.
[2017-08-01T03:03:13.985Z] INFO: muskie/HttpServer/79465 on 204ac483-7e7e-4083-9ea2-c9ea22f459fd: handled: 200 (audit=true, _audit=true, operation=getstorage, billable_operation=LIST, logicalRemoteAddress=172.27.4.22, remoteAddress=127.0.0.1, remotePort=64628, reqHeaderLength=754, resHeaderLength=269, err=false, latency=26, entryShard=tcp://3.moray.staging.joyent.us:2020, route=getstorage, req.owner=4d649f41-cf87-ca1d-c2c0-bb6a9004311d)
GET /poseidon/stor/manta_gc/mako/1.stor.staging.joyent.us?limit=1024 HTTP/1.1
accept: */*
x-request-id: a080d88b-8e42-4a98-a6ec-12e1b0dbf612
date: Tue, 01 Aug 2017 03:03:13 GMT
authorization: Signature keyId="/poseidon/keys/ef:0e:27:45:c5:95:4e:92:ba:ab:03:17:e5:3a:60:14",algorithm="rsa-sha256",headers="date",signature="Q74o9RHIwrDT15ogL2WeB/jankUIqJAtMM5t7+VzrHxzoB52/BoqEnq9uMY0wEvPJxv+Lf1VyLG/IBXCXeUx+fZlkhKWIWd2jkpLRdVLKwZ4nnqTfHM+YXhZ0vSN1X1W2demmnpPRTRK/RaG21pyvlbIrSTwI+N5MtKFDh9/4Ks43wSyM4MvqZZWywfs7LgKz7UtjL1Z+juhJDT8mrfQYCDpZw/NDhHmoslKsMFesMrMjPALy/CBSB23800+MhLiFB7LT0nTyCLonPBmIOjrQCZu99ICXbCxx096XCzZ2XBOK1Pe4eoDUHWx5ukTbCJV63QA+gvcvDCbS5BdDn0Xiw=="
user-agent: restify/1.4.1 (ia32-sunos; v8/3.14.5.9; OpenSSL/1.0.1i) node/0.10.32
accept-version: ~1.0
host: manta.staging.joyent.us
connection: keep-alive
x-forwarded-for: ::ffff:172.27.4.22
--
HTTP/1.1 200 OK
last-modified: Sat, 22 Mar 2014 01:17:01 GMT
content-type: application/x-json-stream; type=directory
result-set-size: 1
date: Tue, 01 Aug 2017 03:03:13 GMT
server: Manta
x-request-id: a080d88b-8e42-4a98-a6ec-12e1b0dbf612
x-response-time: 26
x-server-name: 204ac483-7e7e-4083-9ea2-c9ea22f459fd
--
req.caller: {
  "login": "poseidon",
  "uuid": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
  "groups": [
    "operators"
  ],
  "user": null
}
--
req.timers: {
  "earlySetup": 32,
  "parseDate": 8,
  "parseQueryString": 28,
  "handler-3": 127,
  "checkIfPresigned": 3,
  "enforceSSL": 3,
  "ensureDependencies": 5,
  "_authSetup": 5,
  "preSignedUrl": 3,
  "checkAuthzScheme": 4,
  "parseAuthTokenHandler": 36,
  "signatureHandler": 73,
  "parseKeyId": 59,
  "loadCaller": 133,
  "verifySignature": 483,
  "parseHttpAuthToken": 5,
  "loadOwner": 268,
  "getActiveRoles": 43,
  "gatherContext": 27,
  "setup": 225,
  "getMetadata": 5790,
  "storageContext": 8,
  "authorize": 157,
  "ensureEntryExists": 3,
  "assertMetadata": 3,
  "getDirectoryCount": 7903,
  "getDirectory": 10245
}
The raw JSON, formatted with the json tool, looks like this:
{
"name": "muskie",
"hostname": "204ac483-7e7e-4083-9ea2-c9ea22f459fd",
"pid": 79465,
"component": "HttpServer",
"audit": true,
"level": 30,
"_audit": true,
"operation": "getstorage",
"billable_operation": "LIST",
"logicalRemoteAddress": "172.27.4.22",
"remoteAddress": "127.0.0.1",
"remotePort": 64628,
"reqHeaderLength": 754,
"req": {
"method": "GET",
"url": "/poseidon/stor/manta_gc/mako/1.stor.staging.joyent.us?limit=1024",
"headers": {
"accept": "*/*",
"x-request-id": "a080d88b-8e42-4a98-a6ec-12e1b0dbf612",
"date": "Tue, 01 Aug 2017 03:03:13 GMT",
"authorization": "Signature keyId=\"/poseidon/keys/ef:0e:27:45:c5:95:4e:92:ba:ab:03:17:e5:3a:60:14\",algorithm=\"rsa-sha256\",headers=\"date\",signature=\"Q74o9RHIwrDT15ogL2WeB/jankUIqJAtMM5t7+VzrHxzoB52/BoqEnq9uMY0wEvPJxv+Lf1VyLG/IBXCXeUx+fZlkhKWIWd2jkpLRdVLKwZ4nnqTfHM+YXhZ0vSN1X1W2demmnpPRTRK/RaG21pyvlbIrSTwI+N5MtKFDh9/4Ks43wSyM4MvqZZWywfs7LgKz7UtjL1Z+juhJDT8mrfQYCDpZw/NDhHmoslKsMFesMrMjPALy/CBSB23800+MhLiFB7LT0nTyCLonPBmIOjrQCZu99ICXbCxx096XCzZ2XBOK1Pe4eoDUHWx5ukTbCJV63QA+gvcvDCbS5BdDn0Xiw==\"",
"user-agent": "restify/1.4.1 (ia32-sunos; v8/3.14.5.9; OpenSSL/1.0.1i) node/0.10.32",
"accept-version": "~1.0",
"host": "manta.staging.joyent.us",
"connection": "keep-alive",
"x-forwarded-for": "::ffff:172.27.4.22"
},
"httpVersion": "1.1",
"owner": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
"caller": {
"login": "poseidon",
"uuid": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
"groups": [
"operators"
],
"user": null
},
"timers": {
"earlySetup": 32,
"parseDate": 8,
"parseQueryString": 28,
"handler-3": 127,
"checkIfPresigned": 3,
"enforceSSL": 3,
"ensureDependencies": 5,
"_authSetup": 5,
"preSignedUrl": 3,
"checkAuthzScheme": 4,
"parseAuthTokenHandler": 36,
"signatureHandler": 73,
"parseKeyId": 59,
"loadCaller": 133,
"verifySignature": 483,
"parseHttpAuthToken": 5,
"loadOwner": 268,
"getActiveRoles": 43,
"gatherContext": 27,
"setup": 225,
"getMetadata": 5790,
"storageContext": 8,
"authorize": 157,
"ensureEntryExists": 3,
"assertMetadata": 3,
"getDirectoryCount": 7903,
"getDirectory": 10245
}
},
"resHeaderLength": 269,
"res": {
"statusCode": 200,
"headers": {
"last-modified": "Sat, 22 Mar 2014 01:17:01 GMT",
"content-type": "application/x-json-stream; type=directory",
"result-set-size": 1,
"date": "Tue, 01 Aug 2017 03:03:13 GMT",
"server": "Manta",
"x-request-id": "a080d88b-8e42-4a98-a6ec-12e1b0dbf612",
"x-response-time": 26,
"x-server-name": "204ac483-7e7e-4083-9ea2-c9ea22f459fd"
}
},
"err": false,
"latency": 26,
"entryShard": "tcp://3.moray.staging.joyent.us:2020",
"route": "getstorage",
"msg": "handled: 200",
"time": "2017-08-01T03:03:13.985Z",
"v": 0
}
Below is an example log entry for a PUT (directory creation) request, formatted using the json tool. See [_muskie_log_entry_properties] for more details.
{
"name": "muskie",
"hostname": "204ac483-7e7e-4083-9ea2-c9ea22f459fd",
"pid": 79465,
"component": "HttpServer",
"audit": true,
"level": 30,
"_audit": true,
"operation": "putdirectory",
"billable_operation": "PUT",
"logicalRemoteAddress": "172.27.3.22",
"reqHeaderLength": 655,
"req": {
"method": "PUT",
"url": "/poseidon/stor/logs/config-agent/2017/08/01/02",
"headers": {
"user-agent": "curl/7.37.0",
"host": "manta.staging.joyent.us",
"accept": "*/*",
"date": "Tue, 01 Aug 2017 03:01:10 GMT",
"authorization": "Signature keyId=\"/poseidon/keys/ef:0e:27:45:c5:95:4e:92:ba:ab:03:17:e5:3a:60:14\",algorithm=\"rsa-sha256\",signature=\"VkRkcUK7Y796whM3/IsAl+wVvsu9pKwVGNHIHxLqeBtJZqrR+cbgWZ/E9uchhsxsMezvVXVN7hMXhiSxlfnGJKjPoTKJzfJNSW8WEUhu7rMilRi9WkYGvxo/PpdplK0/Evx1dvxHSX2TiAoTgBs5s6IyP7j6LgySfDu6TzJu/9HJdLzIwAf/TTiHU15okOUoJGbcNb+OcGN/mp+EZpYbNbJ8+I585v1ZLTuta1eAPngUPWp5E7Vm5sUpJH87/8bx2H/3HaMB9YCCacorZ7NkVS5Mbiaz0ptYYEESj8DCJScKnEVrM/L97zGuTPOnQ38Il/CZfENAP7ZH2u029h3WSg==\"",
"connection": "close",
"content-type": "application/json; type=directory",
"x-forwarded-for": "::ffff:172.27.3.22"
},
"httpVersion": "1.1",
"owner": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
"caller": {
"login": "poseidon",
"uuid": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
"groups": [
"operators"
],
"user": null
},
"timers": {
"earlySetup": 94,
"parseDate": 45,
"parseQueryString": 32,
"handler-3": 268,
"checkIfPresigned": 8,
"enforceSSL": 7,
"ensureDependencies": 9,
"_authSetup": 10,
"preSignedUrl": 7,
"checkAuthzScheme": 8,
"parseAuthTokenHandler": 78,
"signatureHandler": 155,
"parseKeyId": 166,
"loadCaller": 346,
"verifySignature": 1164,
"parseHttpAuthToken": 12,
"loadOwner": 234,
"getActiveRoles": 43,
"gatherContext": 28,
"setup": 315,
"getMetadata": 13345,
"storageContext": 14,
"authorize": 409,
"ensureParent": 222,
"mkdir": 841
}
},
"resHeaderLength": 215,
"res": {
"statusCode": 204,
"headers": {
"connection": "close",
"last-modified": "Tue, 01 Aug 2017 03:01:01 GMT",
"date": "Tue, 01 Aug 2017 03:01:11 GMT",
"server": "Manta",
"x-request-id": "ac2a5780-7665-11e7-b9e8-cf86a4bf1253",
"x-response-time": 18,
"x-server-name": "204ac483-7e7e-4083-9ea2-c9ea22f459fd"
}
},
"latency": 18,
"entryShard": "tcp://3.moray.staging.joyent.us:2020",
"parentShard": "tcp://2.moray.staging.joyent.us:2020",
"route": "putdirectory",
"msg": "handled: 204",
"time": "2017-08-01T03:01:11.048Z",
"v": 0
}
All HTTP requests to Manta travel through an haproxy-based load balancer (in a component sometimes called "muppet") before reaching the Manta API ("webapi" or "muskie"). This load balancer is one of the first components that processes these requests when they arrive at Manta. For many problems internal to Manta, it’s more useful to look at log entries at Muskie (webapi). However, there are several cases where it’s helpful to investigate the load balancer:
- when investigating a problem where the client reports not having received a normal HTTP response (e.g., a "connection refused", "connection reset", or a connection timeout)
- when investigating a problem where the client reports having received a 500-level error with no "x-server-name" header. (This generally indicates the response was sent by the load balancer, which happens when Muskie sends an invalid response or fails to send a response within a given timeout.)
- when investigating a problem where Muskie reports surprising client behavior (e.g., client closed its connection mid-upload, or Muskie timed out waiting for a client to either upload or download data)
- when investigating a failed request for which there appears to be no Muskie log entry at all
- to identify the source IP address of a client in a case where Muskie fails to report that in its own log entry
Generally, if a client receives a well-formed response from Manta, the Muskie logs have more useful information than the load balancer logs. In these other cases where either the client or Muskie is doing something surprising, the load balancer log entries can provide more information about exactly what happened.
There’s a major caveat to the load balancer logs: haproxy is only able to interpret HTTP-level information about the first request on each TCP connection. For clients using HTTP keep-alive, where multiple HTTP requests are sent sequentially over a TCP connection, you may not find information in the haproxy logs about requests after the first one.
First, see [_locating_log_files] for information on where to find real-time and historical log files. For the load balancer, real-time log files are stored in /var/log/haproxy.log. Historical log files are stored in Manta under /poseidon/stor/logs/haproxy.
Usually when you’re looking for a load balancer log entry, you know one or more of the following:
- an approximate time of the request (almost always necessary)
- the URL that was requested
- the remote IP address that made the request
- the status code that was returned
- that the request experienced an unusual HTTP exchange (e.g., a malformed server response, a client timeout, or a server timeout)
- the particular load balancer that handled the request
Since these are plaintext logs, you can use grep or awk to filter or summarize them. See [_understanding_a_load_balancer_log_entry] for more.
Often you don’t know which load balancer handled a particular request. In that case, you need to scan all of them for a given time. That might involve manta-oneach or a compute job. Again, see [_locating_log_files].
The load balancer logs in a plaintext format described in the haproxy documentation. (The haproxy documentation is also plaintext so it’s not possible to link directly to the right section, but look for the section called "Log format".) Our load balancer logs these through syslog, which prepends a few fields.
Here’s an example entry from our load balancer logs:
2018-12-05T18:32:01+00:00 42563f8d-4d61-4045-ab87-c71560388399 haproxy[65158]: ::ffff:72.2.115.97:42121 [05/Dec/2018:18:30:01.859] https secure_api/be6 2/0/0/-1/120005 502 245 - - SH-- 247/192/240/19/0 0/0 "GET /thoth/stor/thoth?limit=1024 HTTP/1.1"
We have a tool called haplog for converting our haproxy log entries into JSON. Often, the easiest way to filter and summarize these log entries is to pass the log through haplog, use json to extract the relevant fields, and then use grep or awk to summarize. We can use it like this:
$ haplog 42563f8d.log | json
...
{
  "syslog_date": "2018-12-05T18:32:01.000Z",
  "syslog_hostname": "42563f8d-4d61-4045-ab87-c71560388399",
  "pid": 65158,
  "client_ip": "72.2.115.97",
  "client_port": 42121,
  "accept_date": "05/Dec/2018:18:30:01.859",
  "frontend_name": "https",
  "backend_name": "secure_api",
  "server_name": "be6",
  "Tq": 2,
  "Tw": 0,
  "Tc": 0,
  "Tr": -1,
  "Tt": 120005,
  "status_code": 502,
  "bytes_read": 245,
  "termination_state": {
    "raw": "SH--",
    "termination_cause": "BACKEND_ABORT",
    "state_at_close": "WAITING_FOR_RESPONSE_HEADERS",
    "persistence_cookie_client": "N/A",
    "persistence_cookie_server": "N/A"
  },
  "actconn": 247,
  "feconn": 192,
  "beconn": 240,
  "srv_conn": 19,
  "retries": 0,
  "srv_queue": 0,
  "backend_queue": 0,
  "http_request": "GET /thoth/stor/thoth?limit=1024 HTTP/1.1"
}
There’s quite a lot of information here! Among the most relevant bits:
Example value | haplog JSON field name | Meaning
---|---|---
2018-12-05T18:32:01.000Z | syslog_date | The timestamp when this entry was logged. This usually corresponds to the end of an HTTP request or TCP connection. This is very useful for constructing timelines of what happened.
42563f8d-4d61-4045-ab87-c71560388399 | syslog_hostname | On our systems, this is the zonename of the particular load balancer that handled this request.
72.2.115.97 | client_ip | The (remote) IP address of the client that connected to Manta.
05/Dec/2018:18:30:01.859 | accept_date | The timestamp when the TCP connection was accepted by the load balancer. This is very useful for constructing timelines of what happened.
be6 | server_name | On our system, this is a unique identifier that indicates which Muskie zone handled this request. This identifier varies between load balancer zones and over time. In order to know which Muskie zone this corresponds to, you need to find the corresponding line in the haproxy log file (at …).
SH-- | termination_state | This is a four-digit code that describes how the TCP session finally terminated. This is among the most useful fields for understanding abnormal behavior from the client or Muskie. This code can be used to tell whether the client or server either did something unexpected (like closed the TCP connection) or stopped responding. For details on what each code means, see the haproxy documentation linked above.
GET /thoth/stor/thoth?limit=1024 HTTP/1.1 | http_request | The first line of the HTTP request, which contains the HTTP method and request URL.
There are a few telltale symptoms here:
- The elapsed time between accept_date and syslog_date is exactly two minutes. The load balancer has a two-minute timeout for Muskie responding to requests.
- The termination status SH-- is documented (in the haproxy docs) to mean: "SH  The server aborted before sending its full headers, or it crashed."
This entry appears to reflect Muskie closing its TCP connection (without sending an HTTP response) after exactly two minutes. If we didn’t know what happened here, we at least know now that Muskie did something unexpected and not the client, and we also know which Muskie zone it was.
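As a quick check of the arithmetic using the fields above: Tt is the total session time in milliseconds, so the session ended at roughly accept_date + Tt = 18:30:01.859 + 120.005 s = 18:32:01.864, which matches the logged syslog_date of 18:32:01 (exactly the load balancer’s two-minute timeout after the connection was accepted).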
Building a request timeline can be incredibly valuable for understanding why a particular request failed or why a request took as long as it did. Timelines are especially useful for understanding apparently mysterious failure modes like request timeouts, "socket hang-up" errors, "connection reset" errors, and the like.
A request timeline is simply a list of events related to a request and the precise timestamps when they were reported. You can include as much or as little detail as is relevant to your problem. It’s common to focus on the phases that happen inside Muskie. These are the simplest to collect and they’re often sufficient when failure modes are explicit (e.g., an authentication failure or a 404 response). A pretty complete request timeline could include a lot more than that about what happens at the load balancer and storage nodes.
Note: Keep in mind that wall clocks vary across physical machines. When comparing timestamps recorded from different systems, beware that differences may simply be a result of differences in the wall clock time on those systems. Within Manta, clocks are usually synchronized to within a few hundred milliseconds. (You can check this for the servers you’re looking at using NTP tools.) Timestamps reported by clients can be much further off, but if they’re also using NTP to synchronize their clocks, they may well be pretty close.
A typical GET request for an object stored in Manta runs as follows:
- The client establishes a TCP connection to the Manta load balancer. As part of this process, the load balancer establishes a TCP connection to a Muskie instance to handle any requests that arrive over this connection.
- The client constructs an HTTP request and sends it over the TCP connection.
- The load balancer forwards the request to Muskie.
- Muskie begins processing the request.
- There are a number of (usually very fast) setup phases within Muskie.
- Muskie authenticates the request. This may involve fetching authentication and authorization information from Mahi, in which case an HTTP request will be sent over the network to Mahi to fetch basic information about the account.
- Muskie fetches metadata associated with the object. This step involves a few network RPC calls to the metadata tier: first to an Electric-Moray instance, which forwards the request to a Moray instance, which executes a SQL query against a PostgreSQL primary.
- Muskie authorizes the request. This involves comparing the authenticated user’s credentials to the credentials required to access the object (specified in its metadata).
- Muskie makes an HTTP request to a storage node hosting a copy of the object.
- Assuming this request succeeds, Muskie reports a successful response header to the client. Then data is sent from the storage node, through Muskie, through the load balancer, and to the client.
- The request concludes when the last data is sent to the client.
- PUT requests to upload objects are similar except that there’s an additional metadata RPC after all the data has streamed to the storage nodes. Other types of requests (e.g., creation and deletion of directories) are largely similar, but generally don’t involve storage nodes.
This is a simplification. For details, see the Muskie source code. (Don’t be afraid to crack it open!)
There are 1-2 dozen phases of request handling within Muskie, but most of the elapsed time of a request happens in only a handful of phases that relate to making requests to external services. These are described below.
The table below explains a number of events that happen while processing a request and where you can find more information about it. Many of these entries refer to entries in logs that are documented elsewhere in this guide. See [_understanding_a_muskie_log_entry] and [_understanding_a_load_balancer_log_entry].
Remember, it’s not necessary to collect all of these to start! Start with the basics and flesh out what looks relevant. Some of what’s below won’t apply to every request. However, if you’re stumped about a strange failure mode, it’s often helpful to construct a pretty complete timeline, as you’ll often find surprising gaps or unusual intervals (e.g., exactly 60 seconds from when something started until when some obscure error message was reported, which might suggest a timeout).
If you don’t know where to start, consider a timeline that just includes:
- when the request was constructed on the client
- when the request was received by the load balancer (if possible)
- when the request was received by Muskie
- when the metadata phases of request handling were completed in Muskie
- when the request was completed by Muskie
It’s common to start there, skim the req.timers field (mentioned below) to look for unusually long phases (e.g., those taking upwards of 1 second), and add those to the timeline as needed.
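If you want to do the "timestamp minus total handler time" computation described in the table below, a small Node.js helper works. This is a sketch only: REQUEST_ID is a placeholder, /var/log/muskie.log is the real-time log path used elsewhere in this guide, and the req.timers values are treated as microseconds (consistent with the worked timeline later in this section).
$ grep REQUEST_ID /var/log/muskie.log | node -e '
    // Read one muskie audit record from stdin, sum its handler timers
    // (microseconds), and print the total in milliseconds. Subtracting this
    // from the record time field approximates when Muskie began processing
    // the request.
    var data = "";
    process.stdin.on("data", function (c) { data += c; });
    process.stdin.on("end", function () {
        var entry = JSON.parse(data.trim().split("\n")[0]);
        var timers = entry.req.timers;
        var totalUs = Object.keys(timers).reduce(function (sum, name) {
            return sum + timers[name];
        }, 0);
        console.log((totalUs / 1000).toFixed(1) + " ms");
    });'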
Event | Component where it happens | Where to find the timestamp | Notes
---|---|---|---
Client establishes a TCP connection to the load balancer. | Load balancer | Load balancer log entry, accept_date | Both the client and server may use timeouts that measure from this timestamp. It can be helpful to check for intervals of round numbers (e.g., exactly 1 or 2 minutes).
Client constructs the HTTP request. | Client (outside Manta) | Muskie log entry, the date header in req.headers | Large gaps between when the client constructed the request and when Muskie began processing it could indicate queueing in Muskie or somewhere in the network before reaching Muskie.
Muskie begins processing the request. | Muskie | Muskie log entry. There’s no discrete field for this, but you can compute this by taking the timestamp when Muskie completed the request (see below) and subtracting the total latency of the request (including time to transfer data). To calculate the total latency, sum all the timers in req.timers. | This is a very useful point at which to divide an investigation, since large gaps in time before this point indicate queueing prior to the request reaching Muskie, which generally involve different tools and data sources than latency after this point.
Muskie makes HTTP request to Mahi (authcache) to load information needed to authenticate and authorize this request. | Muskie | Muskie log entry, req.timers (the authentication-related phases, such as loadCaller and loadOwner) | Most requests do not actually involve any of these HTTP requests because the information is usually cached in Muskie. However, latency here may indicate a problem with Mahi or the network.
Muskie makes RPCs to load metadata. | Muskie | Muskie log entry, req.timers (the getMetadata phase) | High latency here indicates a slow read RPC to the metadata tier.
Muskie issues HTTP requests to storage nodes. | Muskie | Muskie log entry, the sharksContacted property (each entry includes a _startTime) | This step is often forgotten, but it’s important to complete before Muskie sends successful response headers to the client. High latency between when these requests are issued and when storage nodes send headers back may indicate a DNS problem, a network problem, or a storage node that’s offline or overloaded.
Muskie sends response headers. | Muskie | Muskie log entry, latencyToFirstByte (measured from when Muskie began processing the request) | Sometimes called latency-to-first-byte, this is a significant point in the request because until this point, the client has heard nothing from the server while the server authenticates the client, loads metadata, and authorizes the request. Many clients have timeouts (sometimes as short as a few seconds) from when they send the request until they see response headers. If there’s a long gap in this range and then Muskie subsequently reports a client error (e.g., timed out waiting for the client or a closed connection from the client), the client may have given up. Also, the work associated with the rest of this request depends on how large the end user’s object is and how much network bandwidth they have available to Manta. As a result, it’s common to summarize this latency-to-first-byte rather than the whole latency of the request, as this number is more comparable across different types of requests.
Muskie begins streaming data between storage nodes and the client. | Muskie | Muskie log entry, req.timers (the streamFromSharks phase) | This should be about the same time as the response headers are sent.
Storage nodes process HTTP requests | Mako | The storage node’s request log (e.g., /var/log/mako-access.log) | High latency between when Muskie began transferring data to or from storage nodes and when storage nodes report having started may indicate overload on the storage node or a network problem.
Muskie finishes streaming data between storage nodes and the client. | Muskie | See the row above where Muskie begins streaming data. The end of the corresponding phase indicates when this finished. | The time for this phase is highly variable depending on how much data is requested and the bandwidth available to the client. It’s more useful to compute throughput (as the total bytes sent divided by the latency of this phase) than look at the latency directly. Low throughput here can be indicative of almost anything: a slow client, a slow network, overloaded Muskie, or an overloaded storage node.
Muskie makes RPCs to save metadata. | Muskie | Muskie log entry, req.timers. Another possible indicator of this timestamp may be given by the … | High latency here indicates a slow write RPC to the metadata tier.
Load balancer indicates the TCP session ended. | Load balancer | Load balancer log, syslog_date | This is generally the last time the client heard anything from the server associated with this request. Normally, this indicates the end of the request. In pathological cases where a metadata operation takes several minutes, the load balancer may terminate the request, logging this entry and sending a failure response to the client, even though Muskie is still processing the request. This is typically followed by a Muskie log entry with a timestamp after this point. The latency of the various phases in the Muskie request point to what took so long.
There’s loads more information available in the system. Depending on your problem, you may need to get more creative. Examples:
- The approximate time when any system (or zone) halted or booted can be determined from the output of last reboot, uptime, SMF service times, or process start times.
- The precise time when a process started can be extracted with ps(1) (see the example after this list).
- The precise time when a system panicked can be extracted from a crash dump, from the panic_hrestime and panic_hrtime variables.
- The time that a process crashed (dumped core) can be inferred from the mtime of the core file or entries in the SMF log file.
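For example (a sketch: <PID> is a placeholder for the process of interest, and the output fields are those of illumos ps(1)):
$ ps -o pid,stime,etime,args -p <PID>
The stime column shows when the process started (just the date for processes more than a day old), and etime shows the elapsed time since then.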
Let’s make our own test request and make a timeline for it:
$ mget -v /dap/stor/cmd.tgz 2> >(bunyan) > /dev/null
...
[2019-05-22T18:22:55.329Z] TRACE: mget/MantaClient/12438 on blinky.local (/Users/dap/install/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:210 in onResponse): Response received (client_res={})
HTTP/1.1 200 OK
etag: cb1036e4-3b57-c118-cd46-961f6ebe12d0
last-modified: Tue, 08 Apr 2014 21:56:13 GMT
accept-ranges: bytes
content-type: application/octet-stream
content-md5: vVRjo74mJquDRsoW2HJM/g==
content-length: 17062152
durability-level: 2
date: Wed, 22 May 2019 18:22:55 GMT
server: Manta
x-request-id: 0b241d9c-d076-4c9b-b954-3b65adb73c73
x-response-time: 187
x-server-name: 204ac483-7e7e-4083-9ea2-c9ea22f459fd
connection: keep-alive
x-request-received: 1558549375054
x-request-processing-time: 275
With the debug output from the client command, we can easily find the Muskie instance (204ac483-7e7e-4083-9ea2-c9ea22f459fd) that handled this request and its request-id (0b241d9c-d076-4c9b-b954-3b65adb73c73). (This is just an example. If you don’t have these in your case, see [_finding_or_generating_a_failed_request].)
To find the Muskie log entry, we must first find Muskie zone 204ac483-7e7e-4083-9ea2-c9ea22f459fd. We can see which datacenter it’s in using manta-adm show:
[root@headnode (staging-1) ~]# manta-adm show -a webapi
SERVICE SH DATACENTER ZONENAME
webapi 1 staging-1 380920d9-ed44-4bcd-b61c-4b99f49c1329
webapi 1 staging-2 204ac483-7e7e-4083-9ea2-c9ea22f459fd
webapi 1 staging-3 6e59a763-6f6a-46a1-926e-90c1b7fc370b
It’s in staging-2. Following the instructions above, we can search for the log entry for this request:
[root@headnode (staging-2) ~]# manta-oneach -z 204ac483-7e7e-4083-9ea2-c9ea22f459fd 'grep 0b241d9c-d076-4c9b-b954-3b65adb73c73 /var/log/muskie.log'
SERVICE ZONE OUTPUT
webapi 204ac483 {"name":"muskie","hostname":"204ac483-7e7e-4083-9ea2-c9ea22f459fd","pid":969231,"component":"HttpServer","audit":true,"level":30,"_audit":true,"operation":"getstorage","billable_operation":"GET","bytesTransferred":"17062152","logicalRemoteAddress":"172.20.5.18","remoteAddress":"127.0.0.1","remotePort":53399,"reqHeaderLength":503,"req":{"method":"GET","url":"/dap/stor/cmd.tgz","headers":{"accept":"*/*","x-request-id":"0b241d9c-d076-4c9b-b954-3b65adb73c73","date":"Wed, 22 May 2019 18:22:55 GMT","authorization":"Signature keyId=\"/dap/keys/c0:24:6d:54:c4:5e:72:15:b5:65:21:80:69:81:cb:14\",algorithm=\"ecdsa-sha256\",headers=\"date\",signature=\"MEUCIBaFtaKwO2XwOfJGXPw2ITfr1C2v7fHi0lMcsCa4049FAiEAkaZ/oPTYNbYdVvAXJM9pFq3rJgZB2F7gUQjKu2UG1m4=\"","user-agent":"restify/1.4.1 (x64-darwin; v8/3.14.5.9; OpenSSL/1.0.1t) node/0.10.45","accept-version":"~1.0","host":"manta.staging.joyent.us","connection":"keep-alive","x-forwarded-for":"::ffff:172.20.5.18"},"httpVersion":"1.1","owner":"bc8cd146-fecb-11e1-bd8a-bb6f54b49808","caller":{"login":"dap","uuid":"bc8cd146-fecb-11e1-bd8a-bb6f54b49808","groups":[],"user":null},"timers":{"earlySetup":99,"parseDate":30,"parseQueryString":35,"handler-3":202,"checkIfPresigned":10,"enforceSSL":8,"ensureDependencies":10,"_authSetup":9,"preSignedUrl":7,"checkAuthzScheme":14,"parseAuthTokenHandler":300,"signatureHandler":152,"parseKeyId":149,"loadCaller":3396,"verifySignature":4258,"parseHttpAuthToken":15,"loadOwner":4353,"getActiveRoles":192,"gatherContext":34,"setup":303,"getMetadata":169779,"storageContext":17,"authorize":201,"ensureEntryExists":5,"assertMetadata":3,"getDirectoryCount":3,"getDirectory":3,"negotiateContent":65,"checkIfMatch":10,"checkIfNoneMatch":5,"checkIfModified":10,"checkIfUnmodified":5,"verifyRange":4,"streamFromSharks":576907}},"resHeaderLength":369,"res":{"statusCode":200,"headers":{"etag":"cb1036e4-3b57-c118-cd46-961f6ebe12d0","last-modified":"Tue, 08 Apr 2014 21:56:13 GMT","accept-ranges":"bytes","content-type":"application/octet-stream","content-md5":"vVRjo74mJquDRsoW2HJM/g==","content-length":"17062152","durability-level":2,"date":"Wed, 22 May 2019 18:22:55 GMT","server":"Manta","x-request-id":"0b241d9c-d076-4c9b-b954-3b65adb73c73","x-response-time":187,"x-server-name":"204ac483-7e7e-4083-9ea2-c9ea22f459fd"}},"latency":187,"objectId":"cb1036e4-3b57-c118-cd46-961f6ebe12d0","sharksContacted":[{"shark":"1.stor.staging.joyent.us","result":"ok","timeToFirstByte":2,"timeTotal":577,"_startTime":1558549375336}],"entryShard":"tcp://3.moray.staging.joyent.us:2020","route":"getstorage","_auditData":true,"dataLatency":762,"dataSize":17062152,"latencyToFirstByte":187,"msg":"handled: 200","time":"2019-05-22T18:22:55.914Z","v":0}
or, formatted with bunyan:
[2019-05-22T18:22:55.914Z] INFO: muskie/HttpServer/969231 on 204ac483-7e7e-4083-9ea2-c9ea22f459fd: handled: 200 (audit=true, _audit=true, operation=getstorage, billable_operation=GET, bytesTransferred=17062152, logicalRemoteAddress=172.20.5.18, remoteAddress=127.0.0.1, remotePort=53399, reqHeaderLength=503, resHeaderLength=369, latency=187, objectId=cb1036e4-3b57-c118-cd46-961f6ebe12d0, entryShard=tcp://3.moray.staging.joyent.us:2020, route=getstorage, _auditData=true, dataLatency=762, dataSize=17062152, latencyToFirstByte=187, req.owner=bc8cd146-fecb-11e1-bd8a-bb6f54b49808)
GET /dap/stor/cmd.tgz HTTP/1.1
accept: */*
x-request-id: 0b241d9c-d076-4c9b-b954-3b65adb73c73
date: Wed, 22 May 2019 18:22:55 GMT
authorization: Signature keyId="/dap/keys/c0:24:6d:54:c4:5e:72:15:b5:65:21:80:69:81:cb:14",algorithm="ecdsa-sha256",headers="date",signature="MEUCIBaFtaKwO2XwOfJGXPw2ITfr1C2v7fHi0lMcsCa4049FAiEAkaZ/oPTYNbYdVvAXJM9pFq3rJgZB2F7gUQjKu2UG1m4="
user-agent: restify/1.4.1 (x64-darwin; v8/3.14.5.9; OpenSSL/1.0.1t) node/0.10.45
accept-version: ~1.0
host: manta.staging.joyent.us
connection: keep-alive
x-forwarded-for: ::ffff:172.20.5.18
--
HTTP/1.1 200 OK
etag: cb1036e4-3b57-c118-cd46-961f6ebe12d0
last-modified: Tue, 08 Apr 2014 21:56:13 GMT
accept-ranges: bytes
content-type: application/octet-stream
content-md5: vVRjo74mJquDRsoW2HJM/g==
content-length: 17062152
durability-level: 2
date: Wed, 22 May 2019 18:22:55 GMT
server: Manta
x-request-id: 0b241d9c-d076-4c9b-b954-3b65adb73c73
x-response-time: 187
x-server-name: 204ac483-7e7e-4083-9ea2-c9ea22f459fd
--
sharksContacted: [
{
"shark": "1.stor.staging.joyent.us",
"result": "ok",
"timeToFirstByte": 2,
"timeTotal": 577,
"_startTime": 1558549375336
}
]
--
req.caller: {
"login": "dap",
"uuid": "bc8cd146-fecb-11e1-bd8a-bb6f54b49808",
"groups": [],
"user": null
}
--
req.timers: {
"earlySetup": 99,
"parseDate": 30,
"parseQueryString": 35,
"handler-3": 202,
"checkIfPresigned": 10,
"enforceSSL": 8,
"ensureDependencies": 10,
"_authSetup": 9,
"preSignedUrl": 7,
"checkAuthzScheme": 14,
"parseAuthTokenHandler": 300,
"signatureHandler": 152,
"parseKeyId": 149,
"loadCaller": 3396,
"verifySignature": 4258,
"parseHttpAuthToken": 15,
"loadOwner": 4353,
"getActiveRoles": 192,
"gatherContext": 34,
"setup": 303,
"getMetadata": 169779,
"storageContext": 17,
"authorize": 201,
"ensureEntryExists": 5,
"assertMetadata": 3,
"getDirectoryCount": 3,
"getDirectory": 3,
"negotiateContent": 65,
"checkIfMatch": 10,
"checkIfNoneMatch": 5,
"checkIfModified": 10,
"checkIfUnmodified": 5,
"verifyRange": 4,
"streamFromSharks": 576907
}
Similarly, we can look for a load balancer log entry, which gets us this:
2019-05-22T18:22:56+00:00 710f85a9-6f13-4fcf-8f3f-a03c4a508714 haproxy[550758]: ::ffff:172.20.5.18:56707 [22/May/2019:18:22:55.125] https secure_api/be1 6/0/0/188/934 200 17062612 - - ---- 25/8/24/8/0 0/0 "GET /dap/stor/cmd.tgz HTTP/1.1"
From these log entries, we can put together this timeline:
18:22:55 Z Client generates "Date" header for request
(source: Muskie log entry, request headers)
18:22:55.125Z Load balancer haproxy accepts TCP request from client (172.20.5.18)
(source: Load balancer log entry, accept timestamp)
18:22:55.153Z Muskie begins processing request
(source: Muskie log entry timestamp minus the sum of all handler times)
18:22:55.166Z Muskie begins loading metadata for request
(source: Muskie log entry, inferred wall time for "getMetadata" handler start)
18:22:55.336Z Muskie begins HTTP request to storage node "1.stor.staging.joyent.us"
(source: Muskie log entry, "sharksContacted" object)
18:22:55.337Z Muskie begins streaming data from storage node
(source: Muskie log entry, inferred wall time for "streamFromSharks" handler start)
18:22:55.913Z Muskie finishes streaming data from storage node
(source: Muskie log entry, "sharksContacted" object)
18:22:55.914Z Muskie records the request completed
(source: Muskie log entry timestamp)
18:22:56 Z Load balancer haproxy terminates session
(source: haproxy log message timestamp)
18:22:56.059Z Load balancer haproxy terminates session
(source: accept time (see above) plus 934ms total latency reported by haproxy)
You may observe any of the following symptoms:
- a 413 ("Request Entity Too Large") response code
- an error with name MaxContentLengthExceededError
- an error with a message like "request has exceeded 11 bytes" (the actual number of bytes varies)
- a Muskie log entry showing MaxSizeExceededError
All of these indicate that the client attempted a streaming upload but tried to upload more bytes than Manta expected. If the max-content-length header was provided, its value was too small for the content that was uploaded. If the header was not provided, then the default value picked for the maximum streaming upload size was not sufficient. In both cases, the issue can be resolved by having the client provide a max-content-length that’s at least as large as the object that it’s attempting to upload.
Here’s an example that demonstrates the case:
$ echo 1234567890 | mput -q -H 'max-content-length: 7' /dap/public/foo
mput: MaxContentLengthExceededError: request has exceeded 11 bytes
$
See the section on [_not_enough_free_space_for_mb] for more details on streaming uploads and the max-content-length header. Note that the error message incorrectly implies that 11 bytes was the limit. This is a bug. The actual limit was whatever was specified by the max-content-length header or else a default value described in that linked section.
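For completeness, the same upload succeeds once max-content-length covers the data actually sent (11 bytes in the example above, since echo appends a newline); this is illustrative only and reuses the paths from that example:
$ echo 1234567890 | mput -q -H 'max-content-length: 11' /dap/public/foo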
Note that for non-streaming uploads, the failure mode for attempting to upload too many bytes is less well-defined, as this is likely to break the framing of HTTP requests on the TCP connection.
Certain underlying conditions (described below) result in the following symptoms:
- an error (reported either from the client or server) with message "No storage nodes available for this request"
- a 503 response code (reported either from the client or server), likely having a "Retry-After" header
- a client-reported error called "InternalError"
- a server-reported error called "SharksExhaustedError"
Note: This section does not apply to all 503 response codes nor all instances of InternalError. However, the condition described here always produces a 503 response code, an InternalError reported to the client, and a SharksExhaustedError reported internally, and both the client and the server errors always include this message.
These symptoms occur when Manta gave up on an object upload request (i.e., a PUT request) while trying to connect to storage nodes to store copies of the requested object. In many cases, retrying the request is likely to succeed because different storage nodes will be selected that are likely to be available. For that reason, the response generally includes a Retry-After HTTP header. If the problem is persistent, there may be an unusual problem affecting a lot of storage nodes (e.g., a whole datacenter partition).
More specifically: for each upload request, Manta selects up to three sets of storage nodes that will be used to store copies of the object. Each set contains as many storage nodes as there will be copies of the object (two by default, but this can be overridden by the client). Manta first attempts to initiate an object upload to all the storage nodes in the first set. If any of those fails, it moves on to the second set, and so on until all sets are exhausted, at which point the request fails.
Based on the way this works, it’s possible for Manta to report this failure even when most of the system is functioning normally — it only takes three storage node failures. Keep in mind, though, that when Manta selects storage nodes for these sets (before even attempting to connect to them), it only selects storage nodes that have reported their own capacity relatively recently (typically within the last minute or so), so it’s expected that most of them should be online and functioning. That’s why retries are likely to succeed and why persistent occurrence of this problem may indicate a more unusual network issue.
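To make the set-by-set behavior concrete, here is a simplified sketch in JavaScript. This is not Muskie’s actual code; the function and argument names are made up for illustration.
// Try each candidate set of storage nodes in order; the request fails only
// if every set fails. "sets" is an array of arrays of storage nodes, and
// "startUpload" initiates an upload to one node, returning a promise.
async function chooseSharks(sets, startUpload) {
    for (const set of sets) {            // typically up to three sets
        try {
            // attempt to initiate the upload on every node in this set
            return await Promise.all(set.map(startUpload));
        } catch (err) {
            // a node in this set failed; fall through to the next set
        }
    }
    throw new Error('no set of storage nodes was reachable');
}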
To understand the problem in more detail:
- First, follow the steps to find the Muskie log entry for the failed request.
- The log entry for object uploads includes a property called sharksContacted, which describes each of the sharks (storage nodes) that Muskie attempted to reach. Each entry indicates the time Manta started trying to reach it, how long it took (if successful), and what the result was (e.g., ok). See the entry for sharksContacted under [_muskie_provided_properties_for_debugging_only] for an example. Also see [_understanding_a_muskie_log_entry].
Using the sharksContacted property, you can [_build_a_request_timeline]. You should be able to tell from this property which storage nodes were part of which set (e.g., the first set of two, the second set of two, and so on). This will allow you to confirm that Manta failed to reach at least one of the storage nodes in each set, and it will indicate which storage nodes it failed to contact.
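One quick way to pull that property out of the log entry (a sketch: REQUEST_ID is a placeholder, and /var/log/muskie.log is the real-time log path used elsewhere in this guide):
$ grep REQUEST_ID /var/log/muskie.log | json -ga sharksContacted
Each element shows the shark’s hostname, the result, and the start time and duration of the attempt, which is usually enough to group the sharks into their sets and see which attempts failed.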
What if there are fewer than three sets of storage nodes? This can happen when there aren’t enough storage nodes meeting the desired storage requirements (e.g., that copies are in different datacenters). In small deployments, this can lead to higher error rates than expected.
Once you’ve determined which storage nodes Manta failed to reach, you can dig further into why Manta failed to reach them. This is largely beyond the scope of this guide, but below are some questions you might start with:
-
Was the storage node actually online at this time? You may be able to use logs in the zone (e.g., /var/log/mako-access.log) or the output of commands like last reboot or uptime to demonstrate that the zone was online (or not). Similarly, you could look for evidence that the server itself was online using logs or these same commands from the global zone. You could also look for crash dumps (indicating an operating system panic) in /var/crash/volatile in the global zone, or look at logs in /var/log in the global zone.
-
Was the storage node reachable at this time? This can be difficult to debug after the fact, but you can look for evidence in some of these same logs (e.g., /var/log/mako-access.log). Evidence of serving requests indicates at least some remote systems were able to reach it. A sudden drop in request activity is consistent with network problems. However, many of the symptoms of network issues (including disconnects, timeouts, and reset connections) are also consistent with systemic problems that caused important programs to stop running for an extended period (e.g., major I/O problems or problems with the operating system).
If you’re debugging the problem right after it happened or if it’s currently
reproducible, these questions are easier to answer. For example, if you can log
into a zone, then it’s not offline. You can check for network connectivity with
tools like ping(1).
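For instance, the following commands are one way to gather that evidence. This is only a sketch: the storage node name is a placeholder, and the log paths are the ones mentioned above.
# In the storage zone: has it been up across the window in question?
$ uptime
$ last reboot | head

# Recent entries in the mako access log show the zone was serving requests;
# a sudden gap is consistent with (but does not prove) a network problem.
$ tail -n 20 /var/log/mako-access.log

# From another zone (e.g., a webapi zone), check current reachability.
# STORAGE_NODE is a placeholder for the storage node you failed to reach.
$ ping "$STORAGE_NODE"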
This message (associated with 507 errors) indicates that Manta does not have enough space available on enough storage nodes for the write that was requested. This would be surprising in a production environment, although it’s easy to induce even there by requesting an absurd size. For example, you’ll see this if you attempt to upload an enormous object:
$ mput -H 'max-content-length: 1125899906842624' /dap/stor/waytoobig
mput: NotEnoughSpaceError: not enough free space for 1073741824 MB
Manta determines a maximum object size for each upload request and validates that enough storage is available to satisfy the request. When there aren’t enough storage nodes that have the required amount of space, the request fails with this message. (A request will also fail if the client attempts to upload more than the maximum expected size, but that produces a different error.)
To determine the maximum object size for an upload request, Manta distinguishes two kinds of uploads: fixed-size and streaming. A fixed-size upload specifies the size of the object using the content-length request header, and Manta uses this value directly as the maximum object size. A streaming upload is specified using the transfer-encoding: chunked header. In this case, the space that Manta allocates up front (and validates) is given by the max-content-length header, if provided, or else by a server-side, configurable default value. The storage.defaultMaxStreamingSizeMB property in Muskie’s configuration file determines this value. It is typically populated from the MUSKIE_DEFAULT_MAX_STREAMING_SIZE_MB SAPI property, and the typical value is 5120 MB. This default is documented in the Manta API, so changing it may break working clients. Note that it can be too large for some development environments, which can cause streaming uploads to fail with this message. An object PUT must specify either transfer-encoding: chunked or content-length.
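To make the distinction concrete, here are two illustrative mput invocations (object paths and sizes are arbitrary examples):
# Fixed-size upload: mput sends a content-length based on the local file,
# and Manta uses that value directly as the maximum object size.
$ mput -f ./report.tgz /dap/stor/report.tgz

# Streaming upload (reading from a pipe, so there is no content-length):
# Manta reserves max-content-length up front if provided (here 2147483648
# bytes, i.e., 2048 MB), or storage.defaultMaxStreamingSizeMB otherwise.
$ cat ./bigfile | mput -H 'max-content-length: 2147483648' /dap/stor/bigfile
If the streamed data ends up exceeding the reserved size, the request fails with a different error, as noted above.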
To understand a specific instance of this error, ask:
-
Does the size in the error message match what you (and the end user) expected would be required for this request? The process by which Manta determines this size is described above. If it does not match what you and the end user expect, that may reflect a bug. (More commonly, end users may not understand the semantics of content-length and max-content-length, or operators may not understand the defaultMaxStreamingSizeMB default value. Again, see above and the linked API documentation for an explanation of those.)
-
If the value matches what you expect, does Manta actually have that much space? This question is more complex than it sounds. Keep in mind that:
-
Manta will only consider storage zones currently using less than a configurable maximum amount of space. This is determined by the storage.maxUtilizationPct property of Muskie configuration files, which comes from the MUSKIE_MAX_UTILIZATION_PCT SAPI property. For requests from operator accounts, the storage.maxOperatorUtilizationPct property is used instead, which comes from the MUSKIE_MAX_OPERATOR_UTILIZATION_PCT SAPI property. (One way to inspect these values appears after this list.)
-
Multiple copies are written for each new object. The number of copies is determined by the durability-level header specified by the client. All copies of the object must be assigned to separate storage zones, so the requested amount of space must be available on as many storage zones as there are copies to be created.
-
If this is a multi-datacenter deployment (i.e., if the storage.multiDC Muskie configuration property, which comes from the MUSKIE_MULTI_DC SAPI property, is set to a true value), then copies must be spread across datacenters as well. If the number of copies is less than the number of datacenters (e.g., 2 copies in a 3-datacenter deployment), then all copies must be placed in separate datacenters (so there must be a storage zone in each datacenter with enough capacity). If the number of copies is at least the number of datacenters, at least one copy must be placed in each datacenter.
-
Example: in the common case of durability-level: 2 (the default) in a multi-DC deployment with a content-length of 10 GB, there must be 10 GB of space available on at least one storage zone in each of two datacenters. The request could fail with this error even if two storage zones have enough space, if both of those storage zones are in the same datacenter.
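One way to check the effective values of these settings is to inspect Muskie’s configuration with the json tool. This is a sketch: the configuration path shown is typical for webapi zones but may differ in your deployment.
[root@380920d9 (webapi) ~]$ json storage.maxUtilizationPct storage.maxOperatorUtilizationPct storage.multiDC < /opt/smartdc/muskie/etc/config.json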
The mpicker
tool can be used to determine how much space is available on each
storage zone. The mpicker poll
command summarizes available space on each
storage zone that’s under the configured utilization percentage. For example:
[root@380920d9 (webapi) ~]$ /opt/smartdc/muskie/bin/mpicker poll
polling options:
max percentage utilization: 92
DATACENTER MANTA_STORAGE_ID %USED AVAIL_MB TIMESTAMP
staging-3 3.stor.staging.joyent.us 91 197218 2019-04-10T21:15:35.726Z
staging-2 2.stor.staging.joyent.us 87 306039 2019-04-10T21:15:35.221Z
staging-1 1.stor.staging.joyent.us 81 456521 2019-04-10T21:15:37.506Z
[root@380920d9 (webapi) ~]$
The mpicker choose
command can be used to simulate Manta’s storage allocation
behavior for a request of a given size. Here’s an example request for a size
that would fail, given the above output:
[root@380920d9 (webapi) ~]$ /opt/smartdc/muskie/bin/mpicker poll -j | /opt/smartdc/muskie/bin/mpicker choose -s 310000
replicas requested: 2
size requested: 310000 MB
storage server count by datacenter:
staging-3: 1 (0 available)
staging-2: 1 (0 available)
staging-1: 1 (1 available)
total storage server count: 3 (1 available)
could not choose storage node set: 2 copies requested, but only 1 DC(s) have sufficient space
If you run mpicker choose
in this way using the same size as you see in the
"No space available for … " message, you’d typically expected to see the same
error.
The mpicker
command has a number of options to control the output and simulate
different conditions. See the help output for details.
See Locating object data in the Manta Operator’s Guide.
See Locating servers in the Manta Operator’s Guide.
See Locating Manta component zones in the Manta Operator’s Guide.
It’s very valuable to briefly and precisely characterize a problem:
-
when asking for help from others, so they can quickly understand the relevant context
-
for summarizing status to stakeholders
-
for handing off work to others (e.g., when escalating to another team or changing shifts)
-
as a form of rubber-ducking — it often helps spark insight
When characterizing a problem, include the basic facts:
-
Exactly what you’ve observed: 500 errors? 503 errors? Elevated latency? It’s useful to be as specific as you can (e.g., "a 5% increase in 500 errors"), but it’s better to be vague (e.g., "an increase in latency") than to say something false (e.g., "an increase in average latency" when only tail latency is affected).
-
Something about when the observations started. Again, it’s useful to be as specific as possible (e.g., "starting at 2018-08-09T16:47Z"), but it’s better to be vague (e.g., "seems to have increased since yesterday") than incorrect (e.g., "it’s higher than yesterday" when you’re really just eyeballing an average value from a graph).
-
What other observations you’ve made (or anything else you’ve tried, and the results)
-
(if you’re asking for help) Any specific questions you have.
It’s useful to mention what conclusions you’ve drawn, but try to distinguish facts (e.g., "moray queue lengths are high") from inferences ("moray is overloaded").
Here’s a good status summary:
Webapi is reporting an increase in 500 and 503 errors since 2018-08-09T16:58Z. These do not seem correlated with increased latency, nor with any errors from the metadata tier. How do we check for errors from the storage tier?
It’s often very helpful to include screenshots (see the MacOS instructions for how to take them). Here’s an example:
Here’s one way to characterize this:
We saw elevated latency since early on 8/1 (UTC)
Better:
We saw a modest increase in tail latency (both p90 and p99) starting early on 8/1 and continuing through 8/4 (UTC). This was followed by a sharp, significant increase on 8/6.
More detailed (may be better, depending on the situation):
We saw a modest increase in tail latency (both p90 and p99) starting early on 8/1 and continuing through 8/4 (UTC). This was followed by a sharp, significant increase on 8/6. During both increases, latency was much less stable than before. p90 peaked at about 50% higher than normal, while p99 spiked to about 3x what it was before.
Of course, it’s easy to give too much detail, too. Think about what’s likely to be relevant for your audience.
Grafana’s features can help you make quantitative statements. For example, take this graph:
About the dip on 8/6, you could say:
We saw a dip in throughput on 8/6.
Better would be:
There was a brief dip in inbound throughput on 8/6 around 12:36Z.
Since Grafana can show you both the average for the whole period, plus the value at a point, you can make more specific statements:
There was a brief dip in inbound throughput on 8/6 around 12:36Z. It dropped about 38% relative to the average for 14 days around it (3.3 GBps during the dip, compared to a recent average of about 5.3 GBps).