Basic investigation tasks

Investigating a slow tier of Node processes

The following services within the Manta data path are Node.js-based:

  • Webapi (muskie)

  • Electric-Moray

  • Moray

  • Authcache (mahi)

  • Nameservice (not covered in this section)

Note
This section assumes that you’re looking at a specific tier of services (i.e., one of the ones listed above). If you’re looking at Moray specifically, as you follow this section, consider only the Moray instances for the particular shard you’re looking at.

First, have you already confirmed that this tier is reporting high latency? If not, check metric dashboards first to see whether latency is high. See [_investigating_an_increase_in_latency].

Next, have you also confirmed that the reason for the latency is not due to a dependent service? For example, if you’re here because of Electric-Moray, have you confirmed that Moray isn’t also seeing high latency? If you haven’t, see [_investigating_an_increase_in_latency] for how to locate which tier is the origin of overall latency.

At this point, you’ve confirmed that this tier appears to be the source of a latency increase. Now, use the latency metric dashboards to determine whether the increase in latency is affecting nearly all instances.

  • Yes, nearly all instances are reporting high latency. This might reflect insufficient capacity. Individual Node processes are usually close to capacity when they exceed about 80-85% of CPU utilization. Each component typically has multiple processes deployed in each zone, which means the saturation point (80-85% per process) is different for each type of zone:

    Service          Processes per zone   Saturation point    Equivalent CPUs
    webapi           16                   1300% (zone-wide)   13 CPUs
    electric-moray   4                    325% (zone-wide)    3.25 CPUs
    moray            4                    325% (zone-wide)    3.25 CPUs
    authcache        1                    80% (zone-wide)     0.8 CPUs

    Given the guidelines above, use CPU utilization graphs to determine: are most instances in this service close to CPU saturation? (The [_deployment_specific_details] section recommends having both a line graph showing the minimum zone-wide CPU usage for each service and a heat map of zone-wide CPU utilization. If the line graph is above these guidelines, the service definitely looks saturated. The heat map allows you to identify cases where some instances might have more headroom available, but most of them are still too busy.) A command-line spot check is sketched just after this list.

    • Yes, most instances are nearing saturation. Deploy more instances of the service in question. See [_scaling_up_a_component]. Additionally, if the workload has not changed substantially, you may want to check for a memory leak that may have affected many processes. If it has, then deploying more instances will likely only help for a little while — until those suffer the same leak.

    • No, many instances appear to have plenty of headroom. This is very unusual, so it’s worth double-checking that latency really is elevated across the board even though latency at dependent services is not high and CPU utilization is not high. If this is really the case, pick any specific process showing high latency and see [_investigating_a_slow_process]. Other avenues to consider: is the dependent service close to CPU saturation? If so, clients of the dependent service may see much higher latency than the service reports because of queueing. Is there evidence of elevated packet loss? This can also increase client-side latency without manifesting as latency reported by the dependent service.

  • No, only some instances are reporting high latency. In this case, this service does not appear generally overloaded, although it’s possible that some instances are. Next question: can you tell from the per-request metrics for this tier whether the workload is evenly distributed across instances?

    • Yes, the workload is evenly distributed across instances. In this case, it appears that processes are generally doing comparable work, but some are doing it much slower. The next step is to use the dashboard to identify a zone with particularly high latency and dig deeper into a specific slow process. See [_investigating_a_slow_process].

    • No, the workload is unevenly distributed. See [_investigating_service_discovery]. (Remember, if you’re looking at Moray, this section assumes you’re looking only at instances for a particular shard. As described under [_investigating_an_increase_in_latency], if you see latency at Moray, you should first isolate the shard and investigate latency in each shard separately.)
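For the CPU-saturation check mentioned in the list above, a command-line spot check can stand in for the dashboards in a pinch. This is only a sketch: it samples the busiest processes in every webapi zone once (substitute the service you’re investigating), and note that prstat reports CPU as a share of total system CPU, so convert before comparing against the per-process 80-85% guideline:

manta-oneach --service=webapi 'prstat -c -n 8 1 1 | head -12'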

Investigating PostgreSQL latency

Finding (or generating) a failed request

When trying to understand either an explicit error or high latency, it can be helpful to investigate the log entry written by "webapi" for a specific request.

Do you already have information about a specific request you want to investigate?

Next question: does the problem appear to be reproducible? Try reproducing the problem with the node-manta command-line tools (e.g., mls). You can use the -v flag and redirect stderr to the bunyan command to see request and response headers, like this:

$ mls -v /poseidon/public 2> >(bunyan)
[2018-07-19T21:18:48.146Z] DEBUG: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/lib/client.js:1536 in ls): ls: entered (req_id=4b4927be-fc1f-4dd8-88fe-2ae75dcbc262, path=/poseidon/public)
[2018-07-19T21:18:48.148Z] DEBUG: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/lib/client.js:1128 in info): info: entered (req_id=0f34c68e-2072-405a-be3a-248e8020f1ba, path=/poseidon/public, id=0f34c68e-2072-405a-be3a-248e8020f1ba, query={})
    headers: {
      "accept": "application/json, */*",
      "x-request-id": "0f34c68e-2072-405a-be3a-248e8020f1ba"
    }
[2018-07-19T21:18:48.189Z] TRACE: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:314 in rawRequest): request sent
    HEAD /poseidon/public HTTP/1.1
    Host: us-central.manta.mnx.io:null
    accept: application/json, */*
    x-request-id: 0f34c68e-2072-405a-be3a-248e8020f1ba
    date: Thu, 19 Jul 2018 21:18:48 GMT
    authorization: Signature keyId="/dap/keys/56:f3:e1:56:3d:e6:f7:83:a9:ce:19:5d:62:ba:5c:1f",algorithm="rsa-sha1",headers="date",signature="kG7IydhNO06ImfI6hFzFXXoSrWT6+2kCcDUC3swGebIr7YxeDcLEWMxGzB4z5lC29Vv7kgpLGaTc218m+63D0Y3M84LTNCvM1Va5COetXhIHkkAlBtXpJt5MUjqsRFK1xrpGKJjDc1QIBGSQIDmh4p6wNjofeaLX8jYnYa7FagW5iyQIHQmpAwe/AO+9Bg7fXBgzfvVZjWfhLaBA4G2CwuCSlkpF7mR7t04pTn+oxOmufE5h6XI/VLNsLZyQkc6prBFDoSiOLMgZsGfdsF11J9c/lCK/PW1y4MlTZBDGG8W1F0ssUEx0euLdm4TsqoBc1cfeIC43fV6sR2nN/CSiow=="
    user-agent: restify/1.4.1 (x64-darwin; v8/4.5.103.53; OpenSSL/1.0.2o) node/4.9.1
    accept-version: ~1.0
[2018-07-19T21:18:48.861Z] TRACE: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:210 in onResponse): Response received
    HTTP/1.1 200 OK
    content-type: application/x-json-stream; type=directory
    result-set-size: 5
    date: Thu, 19 Jul 2018 21:18:48 GMT
    server: Manta
    x-request-id: 0f34c68e-2072-405a-be3a-248e8020f1ba
    x-response-time: 254
    x-server-name: 511ea59e-2a4a-486b-9258-ef59016d064d
    connection: keep-alive
    x-request-received: 1532035128176
    x-request-processing-time: 684
[2018-07-19T21:18:48.867Z] DEBUG: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/lib/client.js:820 in get): get: entered (req_id=dce478bd-6bc7-451b-ac2b-22d74d7bfd37, path=/poseidon/public, id=dce478bd-6bc7-451b-ac2b-22d74d7bfd37)
    headers: {
      "accept": "application/x-json-stream",
      "x-request-id": "4b4927be-fc1f-4dd8-88fe-2ae75dcbc262",
      "date": "Thu, 19 Jul 2018 21:18:48 GMT",
      "authorization": "Signature keyId=\"/dap/keys/56:f3:e1:56:3d:e6:f7:83:a9:ce:19:5d:62:ba:5c:1f\",algorithm=\"rsa-sha1\",headers=\"date\",signature=\"kG7IydhNO06ImfI6hFzFXXoSrWT6+2kCcDUC3swGebIr7YxeDcLEWMxGzB4z5lC29Vv7kgpLGaTc218m+63D0Y3M84LTNCvM1Va5COetXhIHkkAlBtXpJt5MUjqsRFK1xrpGKJjDc1QIBGSQIDmh4p6wNjofeaLX8jYnYa7FagW5iyQIHQmpAwe/AO+9Bg7fXBgzfvVZjWfhLaBA4G2CwuCSlkpF7mR7t04pTn+oxOmufE5h6XI/VLNsLZyQkc6prBFDoSiOLMgZsGfdsF11J9c/lCK/PW1y4MlTZBDGG8W1F0ssUEx0euLdm4TsqoBc1cfeIC43fV6sR2nN/CSiow==\""
    }
    --
    query: {
      "limit": 1024
    }
[2018-07-19T21:18:48.872Z] TRACE: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:314 in rawRequest): request sent
    GET /poseidon/public?limit=1024 HTTP/1.1
    Host: us-central.manta.mnx.io:null
    accept: application/x-json-stream
    x-request-id: 4b4927be-fc1f-4dd8-88fe-2ae75dcbc262
    date: Thu, 19 Jul 2018 21:18:48 GMT
    authorization: Signature keyId="/dap/keys/56:f3:e1:56:3d:e6:f7:83:a9:ce:19:5d:62:ba:5c:1f",algorithm="rsa-sha1",headers="date",signature="kG7IydhNO06ImfI6hFzFXXoSrWT6+2kCcDUC3swGebIr7YxeDcLEWMxGzB4z5lC29Vv7kgpLGaTc218m+63D0Y3M84LTNCvM1Va5COetXhIHkkAlBtXpJt5MUjqsRFK1xrpGKJjDc1QIBGSQIDmh4p6wNjofeaLX8jYnYa7FagW5iyQIHQmpAwe/AO+9Bg7fXBgzfvVZjWfhLaBA4G2CwuCSlkpF7mR7t04pTn+oxOmufE5h6XI/VLNsLZyQkc6prBFDoSiOLMgZsGfdsF11J9c/lCK/PW1y4MlTZBDGG8W1F0ssUEx0euLdm4TsqoBc1cfeIC43fV6sR2nN/CSiow=="
    user-agent: restify/1.4.1 (x64-darwin; v8/4.5.103.53; OpenSSL/1.0.2o) node/4.9.1
    accept-version: ~1.0
[2018-07-19T21:18:49.365Z] TRACE: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:210 in onResponse): Response received
    HTTP/1.1 200 OK
    content-type: application/x-json-stream; type=directory
    result-set-size: 5
    date: Thu, 19 Jul 2018 21:18:49 GMT
    server: Manta
    x-request-id: 4b4927be-fc1f-4dd8-88fe-2ae75dcbc262
    x-response-time: 219
    x-server-name: 60771e58-2ad0-4c50-8b23-86b72f9307f8
    connection: keep-alive
    transfer-encoding: chunked
    x-request-received: 1532035128869
    x-request-processing-time: 496
-rwxr-xr-x 1 poseidon            17 Dec 04  2015 agent.sh
drwxr-xr-x 1 poseidon             0 Sep 18  2014 manatee
drwxr-xr-x 1 poseidon             0 Jun 18  2013 medusa
drwxr-xr-x 1 poseidon             0 Aug 01  2013 minke
drwxr-xr-x 1 poseidon             0 Nov 07  2013 stud
[2018-07-19T21:18:49.480Z] DEBUG: mls/MantaClient/7054 on zathras.local (/Users/dap/install/node-v4.9.1-darwin-x64/lib/node_modules/manta/lib/client.js:887 in onEnd): get: done (req_id=dce478bd-6bc7-451b-ac2b-22d74d7bfd37, path=/poseidon/public)

From the output, we can see that this operation made two requests. The second one has x-server-name: 60771e58-2ad0-4c50-8b23-86b72f9307f8 and x-request-id: 4b4927be-fc1f-4dd8-88fe-2ae75dcbc262. You can now use these to start [_investigating_a_specific_request_that_has_failed].

If the problem isn’t quite so easily reproducible, but you suspect it still affects a variety of requests, you can use manta-mlive to generate more requests and collect debug output for all of them. For mlive, you’ll want to set LOG_LEVEL in the environment to generate the debug logging and you’ll likely want to redirect stderr to a file that you can search through later:

$ LOG_LEVEL=trace ./bin/mlive -S 2>debug.out
2018-07-19T21:24:01.307Z: reads okay, writes stuck (4/4 ok since start)
2018-07-19T21:24:02.310Z: all okay (17/17 ok since 2018-07-19T21:24:01.307Z)
^C

As before, you can use the bunyan tool to format the log:

$ bunyan debug.out
...
[2018-07-19T21:25:01.716Z] TRACE: mlive/MantaClient/9435 on zathras.local: request sent
    HEAD /dap/stor/mlive HTTP/1.1
    Host: us-central.manta.mnx.io:null
    accept: application/json, */*
    x-request-id: c317603c-82d4-4b2e-ac4b-066c9ece1864
    date: Thu, 19 Jul 2018 21:25:01 GMT
    authorization: Signature keyId="/dap/keys/56:f3:e1:56:3d:e6:f7:83:a9:ce:19:5d:62:ba:5c:1f",algorithm="rsa-sha1",headers="date",signature="oJZZIDh1qT8PeSSpz09bIzYT4LYK6rqXS2G5bHhh2r37SNOs0vBkFHUhfso6tSq1hmHIlkCEMXX9zGLIvYxQtHj6/KtiNgZgyWzGHms+qhc2gziXnOrMybxmWqJwipd8rAJCdDBV0B5FlCDeELWIA+1LifGDqqLdDZT4ScBUNOm9JG2+mha2U+pFbNtaXQQyyoPgopk+4ur4OHYpcaK/KY6WdC91quLIaIKV28VMtPoN/q/15lzRj6G6L7mbIMyd48ut0EbmTTR/CfYq9dquNsWDlyWgEJJVYyPZ9odAE34YQiYt/N4JXH7Crr9M6md9GtZonY+DbP8vvb5+7xr8dA=="
    user-agent: restify/1.4.1 (x64-darwin; v8/4.5.103.53; OpenSSL/1.0.2o) node/4.9.1
    accept-version: ~1.0
[2018-07-19T21:25:02.548Z] TRACE: mlive/MantaClient/9435 on zathras.local: Response received
    HTTP/1.1 200 OK
    last-modified: Tue, 16 Dec 2014 01:17:29 GMT
    content-type: application/x-json-stream; type=directory
    result-set-size: 45
    date: Thu, 19 Jul 2018 21:25:02 GMT
    server: Manta
    x-request-id: c317603c-82d4-4b2e-ac4b-066c9ece1864
    x-response-time: 462
    x-server-name: 39adec6c-bded-4a14-9d80-5a8bfc1121f9
    connection: keep-alive
    x-request-received: 1532035501703
    x-request-processing-time: 844

You can use grep, json, or other tools to filter the output for requests of interest (e.g., those with a particular HTTP response code or an x-response-time larger than some value). From the filtered results, you can identify an x-server-name and x-request-id and then see [_investigating_a_specific_request_that_has_failed].
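As a rough sketch (the exact header ordering in the bunyan-formatted output can vary), you could pull the request id and server name out of any 500-level responses in the collected debug log like this:

bunyan debug.out | grep -A 10 'HTTP/1.1 5' | grep -E 'x-request-id|x-server-name'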

Checking recent historical metrics

Locating log files

Real-time logs contain data from the current hour. These are typically stored as regular files within each zone. The specific file used varies by type of zone. See the Operator Guide for details. The manta-oneach command can be used as a low-level way to scan these real-time logs. For example, a common way to count recent 500-level errors in webapi logs is:

manta-oneach --service=webapi 'svcs -L muskie | xargs cat | grep -c "handled: 5"'

Since the real-time logs only store the current hour’s data, at 01:02Z, this would only scan 2 minutes worth of data. At 01:58Z, this would scan 58 minutes worth of data.

If we were looking for a specific request that took place at 01:37Z, then we’d look at the real-time log file immediately after the request happened until 02:00Z (i.e., for 23 minutes). After that, we’d have to look at the historical logs.

Historical logs are maintained by rotating the real-time logs at the top of each hour and then uploading them into Manta. Once in Manta, they’re stored under:

/poseidon/stor/logs/COMPONENT/YYYY/MM/DD/HH/SHORTZONE.log

where:

  • COMPONENT varies based on the component you’re looking for

  • YYYY, MM, DD, and HH represent the year, month, day, and hour for the entries in the log file

  • SHORTZONE is the first 8 characters of the zone’s uuid.

For example, to find the load balancer (haproxy) logs from zone f6817865-10fb-416c-a079-47941ac2aab4 from 2018-12-05T01:37Z, we would look for the object:

/poseidon/stor/logs/haproxy/2018/12/05/01/f6817865.log
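If you don’t know which load balancer zone you’re interested in, you can list the directory for that hour to see which logs exist. A sketch, using the same component and hour as above:

mls /poseidon/stor/logs/haproxy/2018/12/05/01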

You can scan a number of these logs at once using a compute job. For example, you could look for requests from IP address 10.1.2.3 in all of the load balancer logs from December 5, 2018 using:

mfind -t o /poseidon/stor/logs/haproxy/2018/12/05 |
    mjob create -o -m 'grep 10.1.2.3'

You can adjust the mfind invocation as needed to scan a broader or narrower range. You could also use the -n argument to mfind to select log files from a particular load balancer:

mfind -n f6817865.log -t o /poseidon/stor/logs/haproxy/2018/12/05 |
    mjob create -o -m 'grep 10.1.2.3'

When working with a Manta deployment where jobs aren’t functional, you can instead mget log files and do your filtering locally, like so:

mfind -t o /poseidon/stor/logs/haproxy/2018/12/05 |
    while read f; do mget $f | grep 10.1.2.3; done

Note that the above sequence of commands will download the entirety of every log file found by mfind, which may consume an inconvenient amount of time and disk space. To mitigate this, you can use head to fetch a smaller number of log entries, like so:

mfind -t o /poseidon/stor/logs/haproxy/2018/12/05 |
    while read f; do mget $f | head -n 1000 | grep 10.1.2.3; done

When the above command sequence is run, each invocation of mget will terminate once head has read 1000 lines.

Note: the archival process for historical logs first rotates the logs to new files under /var/log/manta/upload. A few minutes later, these are uploaded to Manta and then removed from the local filesystem. If the upload fails, the files are kept in /var/log/manta/upload for up to two days. In extreme situations where Manta has been down for over an hour, you may find recent historical log files in /var/log/manta/upload, and you can scan them similar to the live log files using manta-oneach.
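As a sketch, you can check whether any rotated-but-not-yet-uploaded logs are sitting in the upload directory (shown here for the webapi zones; substitute whichever service you’re investigating):

manta-oneach --service=webapi 'ls /var/log/manta/upload | head -5'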

Understanding a Muskie log entry

Muskie (the Manta API server) logs an audit entry for every request that it completes. These logs are useful for:

  • understanding how Manta handled a particular request (e.g., how long did it take? where was the time spent? what metadata shards were involved? what storage nodes were involved? what errors were encountered?)

  • understanding the workload Manta is serving (e.g., what percentage of requests are GETs? what percentage are failing? what’s the distribution of sizes for uploaded objects? which accounts are making the most requests?)

Note that for real-time incident response, it’s often faster to start with [_checking_recent_historical_metrics]. These logs are most useful for digging deeper into a particular request or the workload overall.

Muskie logs in bunyan format, which is a JSON-based format. You typically use the bunyan tool to view them. You can also use the json tool to filter and aggregate them.

For more information, see [_muskie_log_entry_properties].

If you have a particular Muskie log entry with an error message in it and you want to know more about the error, see [_details_about_specific_error_messages].
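As a quick sketch of the kind of aggregation the json tool enables: assuming you’ve downloaded a Muskie log file (one JSON entry per line) to a local file called muskie.log, you can count completed requests by HTTP status code like this:

json -ga res.statusCode < muskie.log | sort | uniq -c | sort -rn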

Contents of a GET log entry

Below is an example log entry for a GET request, formatted using the bunyan command-line tool. See [_muskie_log_entry_properties] for more details.

[2017-08-01T03:03:13.985Z]  INFO: muskie/HttpServer/79465 on 204ac483-7e7e-4083-9ea2-c9ea22f459fd: handled: 200 (audit=true, _audit=true, operation=getstorage, billable_operation=LIST, logicalRemoteAddress=172.27.4.22, remoteAddress=127.0.0.1, remotePort=64628, reqHeaderLength=754, resHeaderLength=269, err=false, latency=26, entryShard=tcp://3.moray.staging.joyent.us:2020, route=getstorage, req.owner=4d649f41-cf87-ca1d-c2c0-bb6a9004311d)
    GET /poseidon/stor/manta_gc/mako/1.stor.staging.joyent.us?limit=1024 HTTP/1.1
    accept: */*
    x-request-id: a080d88b-8e42-4a98-a6ec-12e1b0dbf612
    date: Tue, 01 Aug 2017 03:03:13 GMT
    authorization: Signature keyId="/poseidon/keys/ef:0e:27:45:c5:95:4e:92:ba:ab:03:17:e5:3a:60:14",algorithm="rsa-sha256",headers="date",signature="Q74o9RHIwrDT15ogL2WeB/jankUIqJAtMM5t7+VzrHxzoB52/BoqEnq9uMY0wEvPJxv+Lf1VyLG/IBXCXeUx+fZlkhKWIWd2jkpLRdVLKwZ4nnqTfHM+YXhZ0vSN1X1W2demmnpPRTRK/RaG21pyvlbIrSTwI+N5MtKFDh9/4Ks43wSyM4MvqZZWywfs7LgKz7UtjL1Z+juhJDT8mrfQYCDpZw/NDhHmoslKsMFesMrMjPALy/CBSB23800+MhLiFB7LT0nTyCLonPBmIOjrQCZu99ICXbCxx096XCzZ2XBOK1Pe4eoDUHWx5ukTbCJV63QA+gvcvDCbS5BdDn0Xiw=="
    user-agent: restify/1.4.1 (ia32-sunos; v8/3.14.5.9; OpenSSL/1.0.1i) node/0.10.32
    accept-version: ~1.0
    host: manta.staging.joyent.us
    connection: keep-alive
    x-forwarded-for: ::ffff:172.27.4.22
    --
    HTTP/1.1 200 OK
    last-modified: Sat, 22 Mar 2014 01:17:01 GMT
    content-type: application/x-json-stream; type=directory
    result-set-size: 1
    date: Tue, 01 Aug 2017 03:03:13 GMT
    server: Manta
    x-request-id: a080d88b-8e42-4a98-a6ec-12e1b0dbf612
    x-response-time: 26
    x-server-name: 204ac483-7e7e-4083-9ea2-c9ea22f459fd
    --
    req.caller: {
      "login": "poseidon",
      "uuid": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
      "groups": [
        "operators"
      ],
      "user": null
    }
    --
    req.timers: {
      "earlySetup": 32,
      "parseDate": 8,
      "parseQueryString": 28,
      "handler-3": 127,
      "checkIfPresigned": 3,
      "enforceSSL": 3,
      "ensureDependencies": 5,
      "_authSetup": 5,
      "preSignedUrl": 3,
      "checkAuthzScheme": 4,
      "parseAuthTokenHandler": 36,
      "signatureHandler": 73,
      "parseKeyId": 59,
      "loadCaller": 133,
      "verifySignature": 483,
      "parseHttpAuthToken": 5,
      "loadOwner": 268,
      "getActiveRoles": 43,
      "gatherContext": 27,
      "setup": 225,
      "getMetadata": 5790,
      "storageContext": 8,
      "authorize": 157,
      "ensureEntryExists": 3,
      "assertMetadata": 3,
      "getDirectoryCount": 7903,
      "getDirectory": 10245
    }

The raw JSON, formatted with the json tool, looks like this:

{
  "name": "muskie",
  "hostname": "204ac483-7e7e-4083-9ea2-c9ea22f459fd",
  "pid": 79465,
  "component": "HttpServer",
  "audit": true,
  "level": 30,
  "_audit": true,
  "operation": "getstorage",
  "billable_operation": "LIST",
  "logicalRemoteAddress": "172.27.4.22",
  "remoteAddress": "127.0.0.1",
  "remotePort": 64628,
  "reqHeaderLength": 754,
  "req": {
    "method": "GET",
    "url": "/poseidon/stor/manta_gc/mako/1.stor.staging.joyent.us?limit=1024",
    "headers": {
      "accept": "*/*",
      "x-request-id": "a080d88b-8e42-4a98-a6ec-12e1b0dbf612",
      "date": "Tue, 01 Aug 2017 03:03:13 GMT",
      "authorization": "Signature keyId=\"/poseidon/keys/ef:0e:27:45:c5:95:4e:92:ba:ab:03:17:e5:3a:60:14\",algorithm=\"rsa-sha256\",headers=\"date\",signature=\"Q74o9RHIwrDT15ogL2WeB/jankUIqJAtMM5t7+VzrHxzoB52/BoqEnq9uMY0wEvPJxv+Lf1VyLG/IBXCXeUx+fZlkhKWIWd2jkpLRdVLKwZ4nnqTfHM+YXhZ0vSN1X1W2demmnpPRTRK/RaG21pyvlbIrSTwI+N5MtKFDh9/4Ks43wSyM4MvqZZWywfs7LgKz7UtjL1Z+juhJDT8mrfQYCDpZw/NDhHmoslKsMFesMrMjPALy/CBSB23800+MhLiFB7LT0nTyCLonPBmIOjrQCZu99ICXbCxx096XCzZ2XBOK1Pe4eoDUHWx5ukTbCJV63QA+gvcvDCbS5BdDn0Xiw==\"",
      "user-agent": "restify/1.4.1 (ia32-sunos; v8/3.14.5.9; OpenSSL/1.0.1i) node/0.10.32",
      "accept-version": "~1.0",
      "host": "manta.staging.joyent.us",
      "connection": "keep-alive",
      "x-forwarded-for": "::ffff:172.27.4.22"
    },
    "httpVersion": "1.1",
    "owner": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
    "caller": {
      "login": "poseidon",
      "uuid": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
      "groups": [
        "operators"
      ],
      "user": null
    },
    "timers": {
      "earlySetup": 32,
      "parseDate": 8,
      "parseQueryString": 28,
      "handler-3": 127,
      "checkIfPresigned": 3,
      "enforceSSL": 3,
      "ensureDependencies": 5,
      "_authSetup": 5,
      "preSignedUrl": 3,
      "checkAuthzScheme": 4,
      "parseAuthTokenHandler": 36,
      "signatureHandler": 73,
      "parseKeyId": 59,
      "loadCaller": 133,
      "verifySignature": 483,
      "parseHttpAuthToken": 5,
      "loadOwner": 268,
      "getActiveRoles": 43,
      "gatherContext": 27,
      "setup": 225,
      "getMetadata": 5790,
      "storageContext": 8,
      "authorize": 157,
      "ensureEntryExists": 3,
      "assertMetadata": 3,
      "getDirectoryCount": 7903,
      "getDirectory": 10245
    }
  },
  "resHeaderLength": 269,
  "res": {
    "statusCode": 200,
    "headers": {
      "last-modified": "Sat, 22 Mar 2014 01:17:01 GMT",
      "content-type": "application/x-json-stream; type=directory",
      "result-set-size": 1,
      "date": "Tue, 01 Aug 2017 03:03:13 GMT",
      "server": "Manta",
      "x-request-id": "a080d88b-8e42-4a98-a6ec-12e1b0dbf612",
      "x-response-time": 26,
      "x-server-name": "204ac483-7e7e-4083-9ea2-c9ea22f459fd"
    }
  },
  "err": false,
  "latency": 26,
  "entryShard": "tcp://3.moray.staging.joyent.us:2020",
  "route": "getstorage",
  "msg": "handled: 200",
  "time": "2017-08-01T03:03:13.985Z",
  "v": 0
}
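The req.timers values record the time spent in each handler (in microseconds, consistent with the timeline arithmetic later in this document). As a sketch, assuming you’ve saved one raw JSON entry to a file called entry.json, you can total them to estimate the overall handler time:

json req.timers < entry.json |
    awk -F: 'NF == 2 { gsub(/[ ,]/, "", $2); sum += $2 }
             END { printf "%d us total (%.1f ms)\n", sum, sum / 1000 }'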

Contents of a PUT log entry

Below is an example log entry for a PUT request, formatted using the json tool. See [_muskie_log_entry_properties] for more details.

{
  "name": "muskie",
  "hostname": "204ac483-7e7e-4083-9ea2-c9ea22f459fd",
  "pid": 79465,
  "component": "HttpServer",
  "audit": true,
  "level": 30,
  "_audit": true,
  "operation": "putdirectory",
  "billable_operation": "PUT",
  "logicalRemoteAddress": "172.27.3.22",
  "reqHeaderLength": 655,
  "req": {
    "method": "PUT",
    "url": "/poseidon/stor/logs/config-agent/2017/08/01/02",
    "headers": {
      "user-agent": "curl/7.37.0",
      "host": "manta.staging.joyent.us",
      "accept": "*/*",
      "date": "Tue, 01 Aug 2017 03:01:10 GMT",
      "authorization": "Signature keyId=\"/poseidon/keys/ef:0e:27:45:c5:95:4e:92:ba:ab:03:17:e5:3a:60:14\",algorithm=\"rsa-sha256\",signature=\"VkRkcUK7Y796whM3/IsAl+wVvsu9pKwVGNHIHxLqeBtJZqrR+cbgWZ/E9uchhsxsMezvVXVN7hMXhiSxlfnGJKjPoTKJzfJNSW8WEUhu7rMilRi9WkYGvxo/PpdplK0/Evx1dvxHSX2TiAoTgBs5s6IyP7j6LgySfDu6TzJu/9HJdLzIwAf/TTiHU15okOUoJGbcNb+OcGN/mp+EZpYbNbJ8+I585v1ZLTuta1eAPngUPWp5E7Vm5sUpJH87/8bx2H/3HaMB9YCCacorZ7NkVS5Mbiaz0ptYYEESj8DCJScKnEVrM/L97zGuTPOnQ38Il/CZfENAP7ZH2u029h3WSg==\"",
      "connection": "close",
      "content-type": "application/json; type=directory",
      "x-forwarded-for": "::ffff:172.27.3.22"
    },
    "httpVersion": "1.1",
    "owner": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
    "caller": {
      "login": "poseidon",
      "uuid": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
      "groups": [
        "operators"
      ],
      "user": null
    },
    "timers": {
      "earlySetup": 94,
      "parseDate": 45,
      "parseQueryString": 32,
      "handler-3": 268,
      "checkIfPresigned": 8,
      "enforceSSL": 7,
      "ensureDependencies": 9,
      "_authSetup": 10,
      "preSignedUrl": 7,
      "checkAuthzScheme": 8,
      "parseAuthTokenHandler": 78,
      "signatureHandler": 155,
      "parseKeyId": 166,
      "loadCaller": 346,
      "verifySignature": 1164,
      "parseHttpAuthToken": 12,
      "loadOwner": 234,
      "getActiveRoles": 43,
      "gatherContext": 28,
      "setup": 315,
      "getMetadata": 13345,
      "storageContext": 14,
      "authorize": 409,
      "ensureParent": 222,
      "mkdir": 841
    }
  },
  "resHeaderLength": 215,
  "res": {
    "statusCode": 204,
    "headers": {
      "connection": "close",
      "last-modified": "Tue, 01 Aug 2017 03:01:01 GMT",
      "date": "Tue, 01 Aug 2017 03:01:11 GMT",
      "server": "Manta",
      "x-request-id": "ac2a5780-7665-11e7-b9e8-cf86a4bf1253",
      "x-response-time": 18,
      "x-server-name": "204ac483-7e7e-4083-9ea2-c9ea22f459fd"
    }
  },
  "latency": 18,
  "entryShard": "tcp://3.moray.staging.joyent.us:2020",
  "parentShard": "tcp://2.moray.staging.joyent.us:2020",
  "route": "putdirectory",
  "msg": "handled: 204",
  "time": "2017-08-01T03:01:11.048Z",
  "v": 0
}
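As a sketch of filtering these entries in bulk: assuming a Muskie log with one JSON entry per line in a local file called muskie.log, this prints the method, URL, latency, and entry shard for requests that took longer than one second (latency is in milliseconds):

json -ga -c 'this.latency > 1000' req.method req.url latency entryShard < muskie.log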

Understanding latency for a specific request

Finding a load balancer log entry

When to investigate the load balancer

All HTTP requests to Manta travel through an haproxy-based load balancer (in a component sometimes called "muppet") before reaching the Manta API ("webapi" or "muskie"). This load balancer is one of the first components that processes these requests when they arrive at Manta. For many problems internal to Manta, it’s more useful to look at log entries at Muskie (webapi). However, there are several cases where it’s helpful to investigate the load balancer:

  • when investigating a problem where the client reports not having received a normal HTTP response (e.g., a "connection refused", "connection reset", or a connection timeout)

  • when investigating a problem where the client reports having received a 500-level error with no "x-server-name" header. (This generally indicates the response was sent by the load balancer, which happens when Muskie sends an invalid response or fails to send a response within a given timeout.)

  • when investigating a problem where Muskie reports surprising client behavior (e.g., client closed its connection mid-upload, or Muskie timed out waiting for a client to either upload or download data)

  • when investigating a failed request for which there appears to be no Muskie log entry at all

  • to identify the source IP address of a client in a case where Muskie fails to report that in its own log entry

Generally, if a client receives a well-formed response from Manta, the Muskie logs have more useful information than the load balancer logs. In these other cases where either the client or Muskie is doing something surprising, the load balancer log entries can provide more information about exactly what happened.

There’s a major caveat to the load balancer logs: haproxy is only able to interpret HTTP-level information about the first request on each TCP connection. For clients using HTTP keep-alive, where multiple HTTP requests are sent sequentially over a TCP connection, you may not find information in the haproxy logs about requests after the first one.

Finding load balancer log entries

First, see [_locating_log_files] for information on where to find real-time and historical log files. For the load balancer, real-time log files are stored in /var/log/haproxy.log. Historical log files are stored in Manta under /poseidon/stor/logs/haproxy.

Usually when you’re looking for a load balancer log entry, you know one or more of the following:

  • an approximate time of the request (almost always necessary)

  • the URL that was requested

  • the remote IP address that made the request

  • the status code that was returned

  • that the request experienced an unusual HTTP exchange (e.g., a malformed server response, a client timeout, or a server timeout)

  • the particular load balancer that handled the request

Since these are plaintext logs, you can use grep or awk to filter or summarize them. See [_understanding_a_load_balancer_log_entry] for more.

Often you don’t know which load balancer handled a particular request. In that case, you need to scan all of them for a given time. That might involve manta-oneach or a compute job. Again, see [_locating_log_files].
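As a sketch (assuming the service name "loadbalancer", as reported by manta-adm), you could scan the current hour’s log on every load balancer for a particular client IP:

manta-oneach --service=loadbalancer 'grep -c 72.2.115.97 /var/log/haproxy.log'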

Understanding a load balancer log entry

The load balancer logs in a plaintext format described in the haproxy documentation. (The haproxy documentation is also plaintext so it’s not possible to link directly to the right section, but look for the section called "Log format".) Our load balancer logs these through syslog, which prepends a few fields.

Here’s an example entry from our load balancer logs:

2018-12-05T18:32:01+00:00 42563f8d-4d61-4045-ab87-c71560388399 haproxy[65158]: ::ffff:72.2.115.97:42121 [05/Dec/2018:18:30:01.859] https secure_api/be6 2/0/0/-1/120005 502 245 - - SH-- 247/192/240/19/0 0/0 "GET /thoth/stor/thoth?limit=1024 HTTP/1.1"

We have a tool called haplog for converting our haproxy log entries into JSON. Often, the easiest way to filter and summarize these log entries is to pass the log through haplog, use json to extract the relevant fields, and then use grep or awk to summarize. We can use it like this:

$ haplog 42563f8d.log | json
...
{
  "syslog_date": "2018-12-05T18:32:01.000Z",
  "syslog_hostname": "42563f8d-4d61-4045-ab87-c71560388399",
  "pid": 65158,
  "client_ip": "72.2.115.97",
  "client_port": 42121,
  "accept_date": "05/Dec/2018:18:30:01.859",
  "frontend_name": "https",
  "backend_name": "secure_api",
  "server_name": "be6",
  "Tq": 2,
  "Tw": 0,
  "Tc": 0,
  "Tr": -1,
  "Tt": 120005,
  "status_code": 502,
  "bytes_read": 245,
  "termination_state": {
    "raw": "SH--",
    "termination_cause": "BACKEND_ABORT",
    "state_at_close": "WAITING_FOR_RESPONSE_HEADERS",
    "persistence_cookie_client": "N/A",
    "persistence_cookie_server": "N/A"
  },
  "actconn": 247,
  "feconn": 192,
  "beconn": 240,
  "srv_conn": 19,
  "retries": 0,
  "srv_queue": 0,
  "backend_queue": 0,
  "http_request": "GET /thoth/stor/thoth?limit=1024 HTTP/1.1"
}

There’s quite a lot of information here! Among the most relevant bits:

  • syslog_date (e.g., "2018-12-05T18:32:01.000Z"): the timestamp when this entry was logged. This usually corresponds to the end of an HTTP request or TCP connection. This is very useful for constructing timelines of what happened.

  • syslog_hostname (e.g., "42563f8d-4d61-4045-ab87-c71560388399"): on our systems, this is the zonename of the particular load balancer that handled this request.

  • client_ip (e.g., "72.2.115.97"): the (remote) IP address of the client that connected to Manta.

  • accept_date (e.g., "05/Dec/2018:18:30:01.859"): the timestamp when the TCP connection was accepted by the load balancer. This is also very useful for constructing timelines of what happened.

  • server_name (e.g., "be6"): on our systems, a unique identifier that indicates which Muskie zone handled this request. This identifier varies between load balancer zones and over time. To know which Muskie zone it corresponds to, find the corresponding line in the haproxy configuration file (at /opt/smartdc/muppet/etc/haproxy.cfg), which contains the IP address of the Muskie zone that handled the request. This is useful for understanding the flow of the request through Manta.

  • termination_state (e.g., "SH--"): a four-character code that describes how the TCP session finally terminated. This is among the most useful fields for understanding abnormal behavior from the client or Muskie: it can tell you whether the client or the server did something unexpected (like closing the TCP connection) or stopped responding. For details on what each code means, see the haproxy documentation linked above.

  • http_request (e.g., "GET /thoth/stor/thoth?limit=1024 HTTP/1.1"): the first line of the HTTP request, which contains the HTTP method and request URL.

There are a few telltale symptoms here:

  • The elapsed time between accept_date and syslog_date is exactly two minutes. The load balancer has a two-minute timeout for Muskie responding to requests.

  • The termination status SH-- is documented (in the haproxy docs) to mean:

    SH The server aborted before sending its full headers, or it crashed.

This entry appears to reflect Muskie closing its TCP connection (without sending an HTTP response) after exactly two minutes. If we didn’t know what happened here, we at least know that it was Muskie, not the client, that did something unexpected, and we also know which Muskie zone it was.
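To find other requests that ended the same way, a sketch (assuming, as with bunyan logs, that haplog emits one JSON object per line):

haplog 42563f8d.log | json -ga -c 'this.termination_state.raw === "SH--"' syslog_date client_ip http_request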

Build a request timeline

Building a request timeline can be incredibly valuable for understanding why a particular request failed or why a request took as long as it did. Timelines are especially useful for understanding apparently mysterious failure modes like request timeouts, "socket hang-up" errors, "connection reset" errors, and the like.

A request timeline is simply a list of events related to a request and the precise timestamps when they were reported. You can include as much or as little detail as is relevant to your problem. It’s common to focus on the phases that happen inside Muskie. These are the simplest to collect and they’re often sufficient when failure modes are explicit (e.g., an authentication failure or a 404 response). A pretty complete request timeline could include a lot more than that about what happens at the load balancer and storage nodes.

Note
Keep in mind that wall clocks vary across physical machines. When comparing timestamps recorded from different systems, beware that differences may simply be a result of differences in the wall clock time on those systems. Within Manta, clocks are usually synchronized to within a few hundred milliseconds. (You can check this for the servers you’re looking at using NTP tools.) Timestamps reported by clients can be much further off, but if they’re also using NTP to synchronize their clocks, they may well be pretty close.

A brief review of request handling

A typical GET request for an object stored in Manta runs as follows:

  • The client establishes a TCP connection to the Manta load balancer. As part of this process, the load balancer establishes a TCP connection to a Muskie instance to handle any requests that arrive over this connection.

  • The client constructs an HTTP request and sends it over the TCP connection.

  • The load balancer forwards the request to Muskie.

  • Muskie begins processing the request.

    • There are a number of (usually very fast) setup phases within Muskie.

    • Muskie authenticates the request. This may involve fetching authentication and authorization information from Mahi, in which case an HTTP request will be sent over the network to Mahi to fetch basic information about the account.

    • Muskie fetches metadata associated with the object. This step involves a few network RPC calls to the metadata tier: first to an Electric-Moray instance, which forwards the request to a Moray instance, which executes a SQL query against a PostgreSQL primary.

    • Muskie authorizes the request. This involves comparing the authenticated user’s credentials to the credentials required to access the object (specified in its metadata).

    • Muskie makes an HTTP request to a storage node hosting a copy of the object.

    • Assuming this request succeeds, Muskie reports a successful response header to the client. Then data is sent from the storage node, through Muskie, through the load balancer, and to the client.

    • The request concludes when the last data is sent to the client.

PUT requests to upload objects are similar except that there’s an additional metadata RPC after all the data has streamed to the storage nodes. Other types of requests (e.g., creation and deletion of directories) are largely similar, but generally don’t involve storage nodes.

This is a simplification. For details, see the Muskie source code. (Don’t be afraid to crack it open!)

There are 1-2 dozen phases of request handling within Muskie, but most of the elapsed time of a request happens in only a handful of phases that relate to making requests to external services. These are described below.

Key events during a request lifetime

The list below explains a number of events that happen while processing a request and where you can find more information about each. Many of these entries refer to entries in logs that are documented elsewhere in this guide. See [_understanding_a_muskie_log_entry] and [_understanding_a_load_balancer_log_entry].

Remember, it’s not necessary to collect all of these to start! Start with the basics and flesh out what looks relevant. Some of what’s below won’t apply to every request. However, if you’re stumped about a strange failure mode, it’s often helpful to construct a pretty complete timeline, as you’ll often find surprising gaps or unusual intervals (e.g., exactly 60 seconds from when something started until when some obscure error message was reported, which might suggest a timeout).

If you don’t know where to start, consider a timeline that just includes:

  • when the request was constructed on the client

  • when the request was received by the load balancer (if possible)

  • when the request was received by Muskie

  • when the metadata phases of request handling were completed in Muskie

  • when the request was completed by Muskie

It’s common to start there, skim the req.timers field (mentioned below) to look for unusually long phases (e.g., those taking upwards of 1 second), and add those to the timeline as needed.
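As a sketch of that skim, assuming you’ve saved one raw JSON log entry to a file called entry.json, this lists the five longest phases:

json req.timers < entry.json |
    awk -F: 'NF == 2 { gsub(/[" ,]/, ""); print $2, $1 }' |
    sort -rn | head -5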

Event: Client establishes a TCP connection to the load balancer.
Component: Load balancer.
Where to find the timestamp: load balancer log entry, accept_date field. Note that this is not the date at the very start of the load balancer log entry.
Notes: both the client and server may use timeouts that measure from this timestamp. It can be helpful to check for intervals of round numbers (e.g., exactly 1 or 2 minutes).

Event: Client constructs the HTTP request.
Component: Client (outside Manta).
Where to find the timestamp: Muskie log entry, req.headers.date (the Date header supplied by the client with the HTTP request).
Notes: large gaps between when the client constructed the request and when Muskie began processing it could indicate queueing in Muskie or somewhere in the network before reaching Muskie.

Event: Muskie begins processing the request.
Component: Muskie.
Where to find the timestamp: Muskie log entry. There’s no discrete field for this, but you can compute it by taking the timestamp when Muskie completed the request (see below) and subtracting the total latency of the request (including time to transfer data). To calculate the total latency, sum all the timers in req.timers.
Notes: this is a very useful point at which to divide an investigation, since large gaps in time before this point indicate queueing prior to the request reaching Muskie, which generally involves different tools and data sources than latency after this point.

Event: Muskie makes an HTTP request to Mahi (authcache) to load information needed to authenticate and authorize this request.
Component: Muskie.
Where to find the timestamp: Muskie log entry, req.timers. These requests generally occur during the loadCaller and loadOwner phases, though they can also happen during parseHttpAuthToken (if the x-auth-token header is provided by the client) or getActiveRoles. You can generally infer that if these HTTP requests were made, they started after one of these phases began and completed before the same phase ended. To determine the actual start and end timestamps, you have to compute them from the latency of the previous phases and the time when Muskie began processing the request (or the latency of the subsequent phases and the time when Muskie finished processing the request).
Notes: most requests do not actually involve any of these HTTP requests because the information is usually cached in Muskie. However, latency here may indicate a problem with Mahi or the network.

Event: Muskie makes RPCs to load metadata.
Component: Muskie.
Where to find the timestamp: Muskie log entry, req.timers. These requests most commonly occur during the getMetadata phase. For the actual start and end timestamps, you must compute them from the latency of the other phases and the time the request started (or completed).
Notes: high latency here indicates a slow read RPC to the metadata tier.

Event: Muskie issues HTTP requests to storage nodes.
Component: Muskie.
Where to find the timestamp: Muskie log entry, req.timers and req.sharksContacted. For GET requests, this happens during the streamFromSharks phase. For PUT requests, this happens during the startSharkStreams phase. Since multiple storage nodes may be contacted, sometimes in parallel and sometimes sequentially, the sharksContacted field has more fine-grained information about the time for each particular storage node.
Notes: this step is often forgotten, but it must complete before Muskie sends successful response headers to the client. High latency between when these requests are issued and when storage nodes send headers back may indicate a DNS problem, a network problem, or a storage node that’s offline or overloaded.

Event: Muskie sends response headers.
Component: Muskie.
Where to find the timestamp: Muskie log entry, latency field (elapsed time in milliseconds, which must be added to the initial timestamp).
Notes: sometimes called latency-to-first-byte, this is a significant point in the request because until this point, the client has heard nothing from the server while the server authenticates the client, loads metadata, and authorizes the request. Many clients have timeouts (sometimes as short as a few seconds) from when they send the request until they see response headers. If there’s a long gap in this range and Muskie subsequently reports a client error (e.g., a timeout waiting for the client or a connection closed by the client), the client may have given up. Also, the work associated with the rest of this request depends on how large the end user’s object is and how much network bandwidth they have available to Manta. As a result, it’s common to summarize this latency-to-first-byte rather than the whole latency of the request, as this number is more comparable across different types of requests.

Event: Muskie begins streaming data between storage nodes and the client.
Component: Muskie.
Where to find the timestamp: Muskie log entry, req.timers field. For GET operations, the phase where data is transferred is called streamFromSharks. For PUT operations, the phase is called sharkStreams.
Notes: this should be about the same time as the response headers are sent.

Event: Storage nodes process HTTP requests.
Component: Mako.
Where to find the timestamp: mako-access log. Entries in the mako-access log include both a timestamp for when the request completed and a latency, which allows you to compute the start timestamp.
Notes: high latency between when Muskie began transferring data to or from storage nodes and when storage nodes report having started may indicate overload on the storage node or a network problem.

Event: Muskie finishes streaming data between storage nodes and the client.
Component: Muskie.
Where to find the timestamp: see the entry above where Muskie begins streaming data. The end of the corresponding phase indicates when this finished.
Notes: the time for this phase is highly variable depending on how much data is requested and the bandwidth available to the client. It’s more useful to compute throughput (total bytes sent divided by the latency of this phase) than to look at the latency directly. Low throughput here can be indicative of almost anything: a slow client, a slow network, an overloaded Muskie, or an overloaded storage node.

Event: Muskie makes RPCs to save metadata.
Component: Muskie.
Where to find the timestamp: Muskie log entry, req.timers. These requests most commonly occur during the saveMetadata phase. For the actual start and end timestamps, you must compute them from the latency of the other phases and the time the request started (or completed). Another possible indicator of this timestamp is the mtime stored in the object metadata, assuming it hasn’t changed since the request you’re investigating completed.
Notes: high latency here indicates a slow write RPC to the metadata tier.

Event: Load balancer indicates the TCP session ended.
Component: Load balancer.
Where to find the timestamp: load balancer log, syslog_date field.
Notes: this is generally the last time the client heard anything from the server associated with this request. Normally, it indicates the end of the request. In pathological cases where a metadata operation takes several minutes, the load balancer may terminate the request, logging this entry and sending a failure response to the client, even though Muskie is still processing the request. That is typically followed by a Muskie log entry with a timestamp after this point; the latency of the various phases in that Muskie log entry points to what took so long.
There’s loads more information available in the system. Depending on your problem, you may need to get more creative. Examples:

  • The approximate time when any system (or zone) halted or booted can be determined from the output of last reboot, uptime, SMF service times, or process start times.

  • The precise time when a process started can be extracted from ps(1).

  • The precise time when a system panicked can be extracted from a crash dump from the panic_hrestime and panic_hrtime variables.

  • The time that a process crashed (dumped core) can be inferred from the mtime of the core file or entries in the SMF log file.
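For example, a sketch for checking when a particular process started and how long it has been running (assuming the stime and etime output fields are available to ps on your systems; PID is whatever process you’re investigating):

ps -o pid,stime,etime,args -p PID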

Example request timeline

Let’s make our own test request and make a timeline for it:

$ mget -v /dap/stor/cmd.tgz 2> >(bunyan) > /dev/null
...
[2019-05-22T18:22:55.329Z] TRACE: mget/MantaClient/12438 on blinky.local (/Users/dap/install/lib/node_modules/manta/node_modules/restify-clients/lib/HttpClient.js:210 in onResponse): Response received (client_res={})
    HTTP/1.1 200 OK
    etag: cb1036e4-3b57-c118-cd46-961f6ebe12d0
    last-modified: Tue, 08 Apr 2014 21:56:13 GMT
    accept-ranges: bytes
    content-type: application/octet-stream
    content-md5: vVRjo74mJquDRsoW2HJM/g==
    content-length: 17062152
    durability-level: 2
    date: Wed, 22 May 2019 18:22:55 GMT
    server: Manta
    x-request-id: 0b241d9c-d076-4c9b-b954-3b65adb73c73
    x-response-time: 187
    x-server-name: 204ac483-7e7e-4083-9ea2-c9ea22f459fd
    connection: keep-alive
    x-request-received: 1558549375054
    x-request-processing-time: 275

With the debug output from the client command, we can easily find the Muskie instance (204ac483-7e7e-4083-9ea2-c9ea22f459fd) that handled this request and request-id (0b241d9c-d076-4c9b-b954-3b65adb73c73). (This is just an example. If you don’t have these in your case, see [_finding_or_generating_a_failed_request].)

To find the Muskie log entry, we must first find Muskie zone 204ac483-7e7e-4083-9ea2-c9ea22f459fd. We can see which datacenter it’s in using manta-adm show:

[root@headnode (staging-1) ~]# manta-adm show -a webapi
SERVICE          SH DATACENTER ZONENAME
webapi            1 staging-1  380920d9-ed44-4bcd-b61c-4b99f49c1329
webapi            1 staging-2  204ac483-7e7e-4083-9ea2-c9ea22f459fd
webapi            1 staging-3  6e59a763-6f6a-46a1-926e-90c1b7fc370b

It’s in staging-2. Following the instructions above, we can search for the log entry for this request:

[root@headnode (staging-2) ~]# manta-oneach -z 204ac483-7e7e-4083-9ea2-c9ea22f459fd 'grep 0b241d9c-d076-4c9b-b954-3b65adb73c73 /var/log/muskie.log'
SERVICE          ZONE     OUTPUT
webapi           204ac483 {"name":"muskie","hostname":"204ac483-7e7e-4083-9ea2-c9ea22f459fd","pid":969231,"component":"HttpServer","audit":true,"level":30,"_audit":true,"operation":"getstorage","billable_operation":"GET","bytesTransferred":"17062152","logicalRemoteAddress":"172.20.5.18","remoteAddress":"127.0.0.1","remotePort":53399,"reqHeaderLength":503,"req":{"method":"GET","url":"/dap/stor/cmd.tgz","headers":{"accept":"*/*","x-request-id":"0b241d9c-d076-4c9b-b954-3b65adb73c73","date":"Wed, 22 May 2019 18:22:55 GMT","authorization":"Signature keyId=\"/dap/keys/c0:24:6d:54:c4:5e:72:15:b5:65:21:80:69:81:cb:14\",algorithm=\"ecdsa-sha256\",headers=\"date\",signature=\"MEUCIBaFtaKwO2XwOfJGXPw2ITfr1C2v7fHi0lMcsCa4049FAiEAkaZ/oPTYNbYdVvAXJM9pFq3rJgZB2F7gUQjKu2UG1m4=\"","user-agent":"restify/1.4.1 (x64-darwin; v8/3.14.5.9; OpenSSL/1.0.1t) node/0.10.45","accept-version":"~1.0","host":"manta.staging.joyent.us","connection":"keep-alive","x-forwarded-for":"::ffff:172.20.5.18"},"httpVersion":"1.1","owner":"bc8cd146-fecb-11e1-bd8a-bb6f54b49808","caller":{"login":"dap","uuid":"bc8cd146-fecb-11e1-bd8a-bb6f54b49808","groups":[],"user":null},"timers":{"earlySetup":99,"parseDate":30,"parseQueryString":35,"handler-3":202,"checkIfPresigned":10,"enforceSSL":8,"ensureDependencies":10,"_authSetup":9,"preSignedUrl":7,"checkAuthzScheme":14,"parseAuthTokenHandler":300,"signatureHandler":152,"parseKeyId":149,"loadCaller":3396,"verifySignature":4258,"parseHttpAuthToken":15,"loadOwner":4353,"getActiveRoles":192,"gatherContext":34,"setup":303,"getMetadata":169779,"storageContext":17,"authorize":201,"ensureEntryExists":5,"assertMetadata":3,"getDirectoryCount":3,"getDirectory":3,"negotiateContent":65,"checkIfMatch":10,"checkIfNoneMatch":5,"checkIfModified":10,"checkIfUnmodified":5,"verifyRange":4,"streamFromSharks":576907}},"resHeaderLength":369,"res":{"statusCode":200,"headers":{"etag":"cb1036e4-3b57-c118-cd46-961f6ebe12d0","last-modified":"Tue, 08 Apr 2014 21:56:13 GMT","accept-ranges":"bytes","content-type":"application/octet-stream","content-md5":"vVRjo74mJquDRsoW2HJM/g==","content-length":"17062152","durability-level":2,"date":"Wed, 22 May 2019 18:22:55 GMT","server":"Manta","x-request-id":"0b241d9c-d076-4c9b-b954-3b65adb73c73","x-response-time":187,"x-server-name":"204ac483-7e7e-4083-9ea2-c9ea22f459fd"}},"latency":187,"objectId":"cb1036e4-3b57-c118-cd46-961f6ebe12d0","sharksContacted":[{"shark":"1.stor.staging.joyent.us","result":"ok","timeToFirstByte":2,"timeTotal":577,"_startTime":1558549375336}],"entryShard":"tcp://3.moray.staging.joyent.us:2020","route":"getstorage","_auditData":true,"dataLatency":762,"dataSize":17062152,"latencyToFirstByte":187,"msg":"handled: 200","time":"2019-05-22T18:22:55.914Z","v":0}

or, formatted with bunyan:

[2019-05-22T18:22:55.914Z]  INFO: muskie/HttpServer/969231 on 204ac483-7e7e-4083-9ea2-c9ea22f459fd: handled: 200 (audit=true, _audit=true, operation=getstorage, billable_operation=GET, bytesTransferred=17062152, logicalRemoteAddress=172.20.5.18, remoteAddress=127.0.0.1, remotePort=53399, reqHeaderLength=503, resHeaderLength=369, latency=187, objectId=cb1036e4-3b57-c118-cd46-961f6ebe12d0, entryShard=tcp://3.moray.staging.joyent.us:2020, route=getstorage, _auditData=true, dataLatency=762, dataSize=17062152, latencyToFirstByte=187, req.owner=bc8cd146-fecb-11e1-bd8a-bb6f54b49808)
    GET /dap/stor/cmd.tgz HTTP/1.1
    accept: */*
    x-request-id: 0b241d9c-d076-4c9b-b954-3b65adb73c73
    date: Wed, 22 May 2019 18:22:55 GMT
    authorization: Signature keyId="/dap/keys/c0:24:6d:54:c4:5e:72:15:b5:65:21:80:69:81:cb:14",algorithm="ecdsa-sha256",headers="date",signature="MEUCIBaFtaKwO2XwOfJGXPw2ITfr1C2v7fHi0lMcsCa4049FAiEAkaZ/oPTYNbYdVvAXJM9pFq3rJgZB2F7gUQjKu2UG1m4="
    user-agent: restify/1.4.1 (x64-darwin; v8/3.14.5.9; OpenSSL/1.0.1t) node/0.10.45
    accept-version: ~1.0
    host: manta.staging.joyent.us
    connection: keep-alive
    x-forwarded-for: ::ffff:172.20.5.18
    --
    HTTP/1.1 200 OK
    etag: cb1036e4-3b57-c118-cd46-961f6ebe12d0
    last-modified: Tue, 08 Apr 2014 21:56:13 GMT
    accept-ranges: bytes
    content-type: application/octet-stream
    content-md5: vVRjo74mJquDRsoW2HJM/g==
    content-length: 17062152
    durability-level: 2
    date: Wed, 22 May 2019 18:22:55 GMT
    server: Manta
    x-request-id: 0b241d9c-d076-4c9b-b954-3b65adb73c73
    x-response-time: 187
    x-server-name: 204ac483-7e7e-4083-9ea2-c9ea22f459fd
    --
    sharksContacted: [
      {
        "shark": "1.stor.staging.joyent.us",
        "result": "ok",
        "timeToFirstByte": 2,
        "timeTotal": 577,
        "_startTime": 1558549375336
      }
    ]
    --
    req.caller: {
      "login": "dap",
      "uuid": "bc8cd146-fecb-11e1-bd8a-bb6f54b49808",
      "groups": [],
      "user": null
    }
    --
    req.timers: {
      "earlySetup": 99,
      "parseDate": 30,
      "parseQueryString": 35,
      "handler-3": 202,
      "checkIfPresigned": 10,
      "enforceSSL": 8,
      "ensureDependencies": 10,
      "_authSetup": 9,
      "preSignedUrl": 7,
      "checkAuthzScheme": 14,
      "parseAuthTokenHandler": 300,
      "signatureHandler": 152,
      "parseKeyId": 149,
      "loadCaller": 3396,
      "verifySignature": 4258,
      "parseHttpAuthToken": 15,
      "loadOwner": 4353,
      "getActiveRoles": 192,
      "gatherContext": 34,
      "setup": 303,
      "getMetadata": 169779,
      "storageContext": 17,
      "authorize": 201,
      "ensureEntryExists": 5,
      "assertMetadata": 3,
      "getDirectoryCount": 3,
      "getDirectory": 3,
      "negotiateContent": 65,
      "checkIfMatch": 10,
      "checkIfNoneMatch": 5,
      "checkIfModified": 10,
      "checkIfUnmodified": 5,
      "verifyRange": 4,
      "streamFromSharks": 576907
    }

Similarly, we can look for a load balancer log entry, which gets us this:

2019-05-22T18:22:56+00:00 710f85a9-6f13-4fcf-8f3f-a03c4a508714 haproxy[550758]: ::ffff:172.20.5.18:56707 [22/May/2019:18:22:55.125] https secure_api/be1 6/0/0/188/934 200 17062612 - - ---- 25/8/24/8/0 0/0 "GET /dap/stor/cmd.tgz HTTP/1.1"

From these log entries, we can put together this timeline:

18:22:55    Z  Client generates "Date" header for request
               (source: Muskie log entry, request headers)

18:22:55.125Z  Load balancer haproxy accepts TCP request from client (172.20.5.18)
               (source: Load balancer log entry, accept timestamp)

18:22:55.153Z  Muskie begins processing request
               (source: Muskie log entry timestamp minus the sum of all handler times)

18:22:55.166Z  Muskie begins loading metadata for request
               (source: Muskie log entry, inferred wall time for "getMetadata" handler start)

18:22:55.336Z  Muskie begins HTTP request to storage node "1.stor.staging.joyent.us"
               (source: Muskie log entry, "sharksContacted" object)

18:22:55.337Z  Muskie begins streaming data from storage node
               (source: Muskie log entry, inferred wall time for "streamFromSharks" handler start)

18:22:55.913Z  Muskie finishes streaming data from storage node
               (source: Muskie log entry, "sharksContacted" object)

18:22:55.914Z  Muskie records the request completed
               (source: Muskie log entry timestamp)

18:22:56    Z  Load balancer haproxy terminates session
               (source: haproxy log message timestamp)

18:22:56.059Z  Load balancer haproxy terminates session
               (source: accept time (see above) plus 934ms total latency reported by haproxy)

Details about specific error messages

"Request has exceeded …​ bytes"

You may observe any of the following symptoms:

  • a 413 ("Request Entity Too Large") response code

  • an error with name MaxContentLengthExceededError

  • an error with a message like request has exceeded 11 bytes (the actual number of bytes varies)

  • a Muskie log entry showing MaxSizeExceededError

All of these indicate that the client attempted a streaming upload and uploaded more bytes than Manta expected. If the max-content-length header was provided, its value was too small for the content that was uploaded. If the header was not provided, then the default value picked for the maximum streaming upload size was not sufficient. In either case, the issue can be resolved by having the client provide a max-content-length that’s at least as large as the object that it’s attempting to upload.

Here’s an example that demonstrates the case:

$ echo 1234567890 | mput -q -H 'max-content-length: 7' /dap/public/foo
mput: MaxContentLengthExceededError: request has exceeded 11 bytes
$
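
Conversely, the same upload succeeds when max-content-length is at least as large as the data being written (a sketch reusing the same hypothetical path; the value is a byte count, and any sufficiently large upper bound works):

$ echo 1234567890 | mput -q -H 'max-content-length: 11' /dap/public/foo
$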

See the section on [_not_enough_free_space_for_mb] for more details on streaming uploads and the max-content-length header. Note that the error message incorrectly implies that 11 bytes was the limit. This is a bug. The actual limit was whatever was specified by the max-content-length header or else a default value described in that linked section.

Note that for non-streaming uploads, the failure mode for attempting to upload too many bytes is less well-defined, as this is likely to break the framing of HTTP requests on the TCP connection.

"No storage nodes available for this request"

Certain underlying conditions (described below) result in the following symptoms:

  • an error (reported either from the client or server) with message "No storage nodes available for this request"

  • a 503 response code (reported either from the client or server), likely having a "Retry-After" header

  • a client-reported error called "InternalError"

  • a server-reported error called "SharksExhaustedError"

If you see these symptoms, read on for more information.

Note
This section does not apply to all 503 response codes, nor to all instances of InternalError. However, the condition described here always produces a 503 response code, an InternalError reported to the client, and a SharksExhaustedError reported internally; both the client-side and server-side errors always include this message.

These symptoms occur when Manta gives up on an object upload request (i.e., a PUT request) while trying to connect to storage nodes to store copies of the requested object. In many cases, retrying the request is likely to succeed because different storage nodes will be selected that are likely to be available. For that reason, the response generally includes a Retry-After HTTP header. If the problem is persistent, there may be an unusual problem affecting many storage nodes (e.g., a network partition affecting a whole datacenter).

More specifically: for each upload request, Manta selects up to three sets of storage nodes that will be used to store copies of the object. Each set contains as many storage nodes as there will be copies of the object (two by default, but this can be overridden by the client). Manta first attempts to initiate an object upload to all the storage nodes in the first set. If any of those fails, it moves on to the second set, and so on until all sets are exhausted, at which point the request fails.

Based on the way this works, it’s possible for Manta to report this failure even when most of the system is functioning normally — it only takes three storage node failures. Keep in mind, though, that when Manta selects storage nodes for these sets (before even attempting to connect to them), it only selects storage nodes that have reported their own capacity relatively recently (typically within the last minute or so), so it’s expected that most of them should be online and functioning. That’s why retries are likely to succeed and why persistent occurrence of this problem may indicate a more unusual network issue.

To understand the problem in more detail:

Using the sharksContacted property, you can [_build_a_request_timeline]. You should be able to tell from this property which storage nodes were part of which set (e.g., the first set of two, the second set of two, and so on). This will allow you to confirm that Manta failed to reach at least one of the storage nodes in each set, and it will indicate which storage nodes it failed to contact.

What if there are fewer than three sets of storage nodes? This can happen when there aren’t enough storage nodes meeting the desired storage requirements (e.g., that copies are in different datacenters). In small deployments, this can lead to higher error rates than expected.

Once you’ve determined which storage nodes Manta failed to reach, you can dig further into why Manta failed to reach them. This is largely beyond the scope of this guide, but below are some questions you might start with:

  • Was the storage node actually online at this time? You may be able to use logs in the zone (e.g., /var/log/mako-access.log) or the output of commands like last reboot or uptime to demonstrate that the zone was online (or not). Similarly, you could look for evidence that the server itself was online using logs or these same commands from the global zone. You could also look for crash dumps (indicating an operating system panic) in /var/crash/volatile in the global zone, or check logs in /var/log in the global zone.

  • Was the storage node reachable at this time? This can be difficult to debug after the fact, but you can look for evidence in some of these same logs (e.g., /var/log/mako-access.log). Evidence of serving requests indicates at least some remote systems were able to reach it. A sudden drop in request activity is consistent with network problems. However, many of the symptoms of network issues (including disconnects, timeouts, and reset connections) are also consistent with systemic problems that caused important programs to stop running for an extended period (e.g., major I/O problems or problems with the operating system).

If you’re debugging the problem right after it happened or if it’s currently reproducible, these questions are easier to answer. For example, if you can log into a zone, then it’s not offline. You can check for network connectivity with tools like ping(1).
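
For example, a quick reachability check from another Manta zone (using the storage node from the earlier log entry; the "is alive" output is what the illumos ping(1) prints by default) might look like this:

$ ping 1.stor.staging.joyent.us
1.stor.staging.joyent.us is alive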

"Not enough free space for …​ MB"

This message (associated with 507 errors) indicates that Manta does not have enough space available on enough storage nodes for the write that was requested. This would be surprising in production environments, although it’s easy to induce even in production by requesting an absurd size. For example, you’ll see this if you attempt to upload an enormous object:

$ mput -H 'max-content-length: 1125899906842624' /dap/stor/waytoobig
mput: NotEnoughSpaceError: not enough free space for 1073741824 MB

Manta determines a maximum object size for each upload request and validates that enough storage is available to satisfy the request. When there aren’t enough storage nodes that have the required amount of space, the request fails with this message. (A request will also fail if the client attempts to upload more than the maximum expected size, but that produces a different error.)

In determining the maximum object size for an upload request, Manta supports two kinds of uploads: fixed-size and streaming. A fixed-size upload specifies the size of the object using the content-length request header; Manta uses this value directly as the maximum object size. A streaming upload is specified using the transfer-encoding: chunked header. In this case, the space that Manta allocates up front (and validates) is given by the max-content-length header, if provided, or else by a server-side, configurable default value. The storage.defaultMaxStreamingSizeMB property in Muskie’s configuration file determines this value. It is typically populated from the MUSKIE_DEFAULT_MAX_STREAMING_SIZE_MB SAPI property, and the typical value is 5120 MB. This default is documented in the Manta API, so changing it may break working clients. Note that the default can be too large for small development environments, which can cause streaming uploads to fail with this message. An object PUT must specify either transfer-encoding: chunked or content-length.
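
As a rough illustration of the two upload types (the object paths here are hypothetical, and mput’s exact behavior depends on the client version): uploading from a local file with -f lets the client send a content-length, while piping data through stdin generally results in a chunked (streaming) upload, whose cap can be raised explicitly with a max-content-length header (a byte count, as in the examples above):

$ mput -f ./report.tgz /dap/stor/report.tgz
$ tar czf - ./report | mput /dap/stor/report.tgz
$ tar czf - ./report | mput -H 'max-content-length: 10737418240' /dap/stor/report.tgz

The first command is a fixed-size upload; the second and third are streaming uploads, with the third raising the streaming cap to roughly 10 GiB instead of relying on the server-side default.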

To understand a specific instance of this error, ask:

  • Does the size in the error message match what you (and the end user) expected would be required for this request? The process by which Manta determines this size is described above. If it does not match what you and the end user expect, that may reflect a bug. (More commonly, end users may not understand the semantics of content-length and max-content-length, or operators may not understand the defaultMaxStreamingSizeMB default value. Again, see above and the linked API documentation for an explanation of those.)

  • If the value matches what you expect, does Manta actually have that much space? This question is more complex than it sounds. Keep in mind that:

    • Manta will only consider storage zones currently using less than a configurable maximum amount of space. This is determined by the storage.maxUtilizationPct property of Muskie configuration files, which comes from the MUSKIE_MAX_UTILIZATION_PCT SAPI property. For requests from operator accounts, the storage.maxOperatorUtilizationPct property is used instead, which comes from the MUSKIE_MAX_OPERATOR_UTILIZATION_PCT SAPI property.

    • Multiple copies are written for each new object. The number of copies is determined by the durability-level header that is specified by the client. All copies of the object must be assigned to separate storage zones. The requested amount of space must be present on as many storage zones as there are copies to be created.

    • If this is a multi-datacenter deployment (i.e., if the storage.multiDC Muskie configuration property (which comes from the MUSKIE_MULTI_DC SAPI property) is set to a true value), then copies must be spread across datacenters as well. If the number of copies is less than the number of datacenters (e.g., 2 copies in a 3-datacenter deployment), then all copies must be placed into separate datacenters (so there must be a storage zone in each datacenter with enough capacity). If the number of copies is at least the number of datacenters, at least one copy must be placed in each datacenter.

Example: in the common case of durability-level: 2 (the default) in a multi-DC deployment with a content-length of 10 GB, there must be 10 GB of space available on at least one storage zone in each of two datacenters. The request could fail with this error even if two storage zones have enough space, if both of those zones are in the same datacenter.

The mpicker tool can be used to determine how much space is available on each storage zone. The mpicker poll command summarizes available space on each storage zone that’s under the configured utilization percentage. For example:

[root@380920d9 (webapi) ~]$ /opt/smartdc/muskie/bin/mpicker poll
polling options:
 max percentage utilization: 92

DATACENTER MANTA_STORAGE_ID          %USED         AVAIL_MB      TIMESTAMP
staging-3  3.stor.staging.joyent.us  91            197218        2019-04-10T21:15:35.726Z
staging-2  2.stor.staging.joyent.us  87            306039        2019-04-10T21:15:35.221Z
staging-1  1.stor.staging.joyent.us  81            456521        2019-04-10T21:15:37.506Z
[root@380920d9 (webapi) ~]$

The mpicker choose command can be used to simulate Manta’s storage allocation behavior for a request of a given size. Here’s an example request for a size that would fail, given the above output:

[root@380920d9 (webapi) ~]$ /opt/smartdc/muskie/bin/mpicker poll -j | /opt/smartdc/muskie/bin/mpicker choose -s 310000
replicas requested: 2
size requested: 310000 MB
storage server count by datacenter:
        staging-3: 1 (0 available)
        staging-2: 1 (0 available)
        staging-1: 1 (1 available)
total storage server count: 3 (1 available)

could not choose storage node set: 2 copies requested, but only 1 DC(s) have sufficient space

If you run mpicker choose in this way using the same size as you see in the "not enough free space for … MB" message, you’d typically expect to see the same error.

The mpicker command has a number of options to control the output and simulate different conditions. See the help output for details.

Locating metadata for an object

See Locating object data in the Manta Operator’s Guide.

Locating actual data for an object

See Locating object data in the Manta Operator’s Guide.

Locating a particular server

See Locating servers in the Manta Operator’s Guide.

Locating a particular zone

See Locating Manta component zones in the Manta Operator’s Guide.

Locating a particular database shard

Finding what shard a particular zone is part of

Save debugging state and restart a process

Investigating service discovery

Investigating a slow process

Investigating why a process is on-CPU

Investigating why a process is off-CPU

Check for a garbage collection issue (or memory leak)

Scaling up a component

Characterizing a problem

It’s very valuable to briefly and precisely characterize a problem:

  • when asking for help from others, so they can quickly understand the relevant context

  • for summarizing status to stakeholders

  • for handing off work to others (e.g., when escalating to another team or changing shifts)

  • as a form of rubber-ducking — it often helps spark insight

When characterizing a problem, include the basic facts:

  • Exactly what you’ve observed: 500 errors? 503 errors? Elevated latency? It’s useful to be as specific as you can (e.g., "a 5% increase in 500 errors"), but it’s better to be vague (e.g., "an increase in latency") than to say something false (e.g., "an increase in average latency" when only tail latency is affected).

  • Something about when the observations started. Again, it’s useful to be as specific as possible (e.g., "starting at 2018-08-09T16:47Z"), but it’s better to be vague (e.g., "seems to have increased since yesterday") than incorrect (e.g., "it’s higher than yesterday" when you’re really just eyeballing an average value from a graph).

  • What other observations you’ve made (or anything else you’ve tried, and the results)

  • (if you’re asking for help) Any specific questions you have.

It’s useful to mention what conclusions you’ve drawn, but try to distinguish facts (e.g., "moray queue lengths are high") from inferences ("moray is overloaded").

Here’s a good status summary:

Webapi is reporting an increase in 500 and 503 errors since 2018-08-09T16:58Z. These do not seem correlated with increased latency, nor with any errors from the metadata tier. How do we check for errors from the storage tier?

It’s often very helpful to include screenshots. Here’s an example:

Elevated latency at Webapi

Here’s one way to characterize this:

We saw elevated latency since early on 8/1 (UTC)

Better:

We saw a modest increase in tail latency (both p90 and p99) starting early on 8/1 and continuing through 8/4 (UTC). This was followed by a sharp, significant increase on 8/6.

More detailed (may be better, depending on the situation):

We saw a modest increase in tail latency (both p90 and p99) starting early on 8/1 and continuing through 8/4 (UTC). This was followed by a sharp, significant increase on 8/6. During both increases, latency was much less stable than before. p90 peaked at about 50% higher than normal, while p99 spiked to about 3x its earlier value.

Of course, it’s easy to give too much detail, too. Think about what’s likely to be relevant for your audience.

Grafana’s features can help you make quantitative statements. For example, take this graph:

Dips in throughput at Webapi

About the dip on 8/6, you could say:

We saw a dip in throughput on 8/6.

Better would be:

There was a brief dip in inbound throughput on 8/6 around 12:36Z.

Since Grafana can show you both the average for the whole period and the value at a point, you can make more specific statements:

There was a brief dip in inbound throughput on 8/6 around 12:36Z. It dropped about 38% relative to the average for 14 days around it (3.3 GBps during the dip, compared to a recent average of about 5.3 GBps).