Only 14 add node events in AWS dev test environment with PR100 #110

Closed
q131172019 opened this issue Jul 27, 2022 · 2 comments

Comments

@q131172019
Collaborator

q131172019 commented Jul 27, 2022

We tested PR100 ("Split metrics report for new node events and update node events") in the following AWS dev test environment, sized for 1M nodes: 5 regions / 10 RPs per region / 20K nodes per RP / 200K nodes per region / 40 schedulers / 25K nodes per scheduler.

test environment:

-- 1 service_api running on an AWS EC2 instance (t2.2xlarge - 8 vCPUs and 32GB memory)
-- 5 resource region simulators on 5 AWS EC2 instances in 5 different AWS regions (us-west-2, us-west-1, us-east-2, us-east-1 and ca-central-1)
    * each EC2 instance simulates one region of 200K nodes: 10 RPs, 20K nodes per RP
-- 41 scheduler client simulators on 3 AWS EC2 instances in 2 different AWS regions (us-west-2 and us-west-1)
    * 14 scheduler client simulators on 1 AWS EC2 instance in AWS region us-west-2
    * 14 scheduler client simulators on 1 AWS EC2 instance in AWS region us-west-2
    * 13 scheduler client simulators on 1 AWS EC2 instance in AWS region us-west-1
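
The sizing figures above can be cross-checked with a few multiplications. The sketch below is only an illustration: the `Topology` type and its method names are hypothetical and not part of the project; the numbers are the ones quoted in this issue.

```go
package main

import "fmt"

// Topology is a hypothetical helper holding the figures quoted in the issue.
type Topology struct {
	Regions           int
	RPsPerRegion      int
	NodesPerRP        int
	SimulatorsPerHost []int // scheduler client simulators on each EC2 instance
}

// NodesPerRegion is RPs per region times nodes per RP.
func (t Topology) NodesPerRegion() int { return t.RPsPerRegion * t.NodesPerRP }

// TotalNodes is the node count across all regions.
func (t Topology) TotalNodes() int { return t.Regions * t.NodesPerRegion() }

// SchedulerSimulators sums the per-host simulator counts.
func (t Topology) SchedulerSimulators() int {
	sum := 0
	for _, n := range t.SimulatorsPerHost {
		sum += n
	}
	return sum
}

// IssueTopology returns the environment described in this issue.
func IssueTopology() Topology {
	return Topology{
		Regions:           5,
		RPsPerRegion:      10,
		NodesPerRP:        20000,
		SimulatorsPerHost: []int{14, 14, 13},
	}
}

func main() {
	topo := IssueTopology()
	fmt.Println("nodes per region:", topo.NodesPerRegion())
	fmt.Println("total nodes:", topo.TotalNodes())
	fmt.Println("scheduler client simulators:", topo.SchedulerSimulators())
	// The summary line quotes 40 schedulers at 25K nodes each.
	fmt.Println("nodes per scheduler at 40 schedulers:", topo.TotalNodes()/40)
}
```

The per-host list adds up to 41 simulators, while the summary line quotes 40 schedulers at 25K nodes each; both figures are taken from the issue as written.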

git log output:

commit 5e23651e28ebd9bdfdd5ee523881613c3de09f42 (HEAD, ying/perf-checkpoint)
Author: Ying Huang <[email protected]>
Date:   Thu Jul 21 20:36:53 2022 +0000

    Split metrics report for new node events and update node events

commit 526147a811c1704675cdde2f460f64804d78e33b (ying/main, origin/main, origin/HEAD, main, Carl_Test_By_PR93)
Author: Ying Huang <[email protected]>
Date:   Thu Jul 21 09:01:52 2022 -0700

    Change checkpoints from map to array (#93)

    * Revert unnecessary changes in PR 85

    * Move feature check out of business logic

    * Fix spelling error

    * Change checkpoints from map to array

    * Add test case to TestSingleRPMutipleClients_Workflow: 1M nodes with 50 clients each has 15000 , each got 100K update events

    * Print registration result properly

    * Log latency detail for each event

    * Add back testcases

    * Use constants for checkpoint name

    * Add perf data for distributor concurrency test after adding checkpoints with array

    * Update per CR

commit 1ac3b09c60c1a25131f8ca6bbf718a9bf94c5730
Author: Yunwen Bai <[email protected]>
Date:   Wed Jul 20 19:04:28 2022 -0700

    aggregator PULL() optimization and logging adjust (#90)

    * perf optimization and high logging level adjust

    * minor fix to reduce client log

We found only 14 ADDED events in the service log /home/ubuntu/TMP/service.log.2022-07-21.v000060 under account 'ubuntu'.

I0721 21:43:16.182459   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 0s, perc90 0s, perc99 0s. Total count 0 
I0721 21:43:16.182506   30506 event_metrics.go:133] [Metrics][ADDED][DIS_RECEIVED] perc50 0s, perc90 0s, perc99 0s. Total count 0
I0721 21:43:16.182516   30506 event_metrics.go:134] [Metrics][ADDED][DIS_SENDING] perc50 0s, perc90 0s, perc99 0s. Total count 0
I0721 21:43:16.182524   30506 event_metrics.go:135] [Metrics][ADDED][DIS_SENT] perc50 0s, perc90 0s, perc99 0s. Total count 0
I0721 21:43:16.182532   30506 event_metrics.go:136] [Metrics][ADDED][SER_ENCODED] perc50 0s, perc90 0s, perc99 0s. Total count 0
I0721 21:43:16.182539   30506 event_metrics.go:137] [Metrics][ADDED][SER_SENT] perc50 0s, perc90 0s, perc99 0s. Total count 0
I0721 21:48:16.193209   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 52.681598ms, perc90 52.681598ms, perc99 52.681598ms. Total count 2
I0721 21:48:16.193261   30506 event_metrics.go:133] [Metrics][ADDED][DIS_RECEIVED] perc50 54.754101ms, perc90 54.754101ms, perc99 54.754101ms. Total count 2
I0721 21:48:16.193275   30506 event_metrics.go:134] [Metrics][ADDED][DIS_SENDING] perc50 65.07107ms, perc90 65.099251ms, perc99 65.099251ms. Total count 2
I0721 21:48:16.193284   30506 event_metrics.go:135] [Metrics][ADDED][DIS_SENT] perc50 65.07346ms, perc90 65.100694ms, perc99 65.100694ms. Total count 2
I0721 21:48:16.193293   30506 event_metrics.go:136] [Metrics][ADDED][SER_ENCODED] perc50 65.082258ms, perc90 65.980772ms, perc99 65.980772ms. Total count 2
I0721 21:48:16.193301   30506 event_metrics.go:137] [Metrics][ADDED][SER_SENT] perc50 65.08718ms, perc90 65.982058ms, perc99 65.982058ms. Total count 2
I0721 21:53:16.262955   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 608.194258ms, perc90 683.11118ms, perc99 683.11118ms. Total count 6
I0721 21:53:16.263012   30506 event_metrics.go:133] [Metrics][ADDED][DIS_RECEIVED] perc50 645.53421ms, perc90 703.593398ms, perc99 703.593398ms. Total count 6
I0721 21:53:16.263023   30506 event_metrics.go:134] [Metrics][ADDED][DIS_SENDING] perc50 777.862734ms, perc90 784.74085ms, perc99 784.74085ms. Total count 6
I0721 21:53:16.263033   30506 event_metrics.go:135] [Metrics][ADDED][DIS_SENT] perc50 777.864072ms, perc90 784.74216ms, perc99 784.74216ms. Total count 6
I0721 21:53:16.263066   30506 event_metrics.go:136] [Metrics][ADDED][SER_ENCODED] perc50 781.430278ms, perc90 802.827566ms, perc99 802.827566ms. Total count 6
I0721 21:53:16.263075   30506 event_metrics.go:137] [Metrics][ADDED][SER_SENT] perc50 781.431689ms, perc90 802.828712ms, perc99 802.828712ms. Total count 6
I0721 21:58:16.431757   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 608.194258ms, perc90 683.11118ms, perc99 683.11118ms. Total count 6
I0721 21:58:16.431805   30506 event_metrics.go:133] [Metrics][ADDED][DIS_RECEIVED] perc50 645.53421ms, perc90 703.593398ms, perc99 703.593398ms. Total count 6
I0721 21:58:16.431819   30506 event_metrics.go:134] [Metrics][ADDED][DIS_SENDING] perc50 777.862734ms, perc90 784.74085ms, perc99 784.74085ms. Total count 6
I0721 21:58:16.431829   30506 event_metrics.go:135] [Metrics][ADDED][DIS_SENT] perc50 777.864072ms, perc90 784.74216ms, perc99 784.74216ms. Total count 6
I0721 21:58:16.431838   30506 event_metrics.go:136] [Metrics][ADDED][SER_ENCODED] perc50 781.430278ms, perc90 802.827566ms, perc99 802.827566ms. Total count 6
I0721 21:58:16.431848   30506 event_metrics.go:137] [Metrics][ADDED][SER_SENT] perc50 781.431689ms, perc90 802.828712ms, perc99 802.828712ms. Total count 6
I0721 22:03:16.644943   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 608.194258ms, perc90 1.268068174s, perc99 1.268068174s. Total count 8
I0721 22:03:16.644996   30506 event_metrics.go:133] [Metrics][ADDED][DIS_RECEIVED] perc50 645.53421ms, perc90 1.287790313s, perc99 1.287790313s. Total count 8
I0721 22:03:16.645025   30506 event_metrics.go:134] [Metrics][ADDED][DIS_SENDING] perc50 777.862734ms, perc90 1.37189753s, perc99 1.37189753s. Total count 8
I0721 22:03:16.645036   30506 event_metrics.go:135] [Metrics][ADDED][DIS_SENT] perc50 777.864072ms, perc90 1.371899029s, perc99 1.371899029s. Total count 8
I0721 22:03:16.645049   30506 event_metrics.go:136] [Metrics][ADDED][SER_ENCODED] perc50 781.447447ms, perc90 1.371906371s, perc99 1.371906371s. Total count 8
I0721 22:03:16.645059   30506 event_metrics.go:137] [Metrics][ADDED][SER_SENT] perc50 781.448817ms, perc90 1.37190933s, perc99 1.37190933s. Total count 8
I0721 22:08:16.877998   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 683.11118ms, perc90 1.268068174s, perc99 1.268068174s. Total count 12
I0721 22:08:16.878056   30506 event_metrics.go:133] [Metrics][ADDED][DIS_RECEIVED] perc50 703.593398ms, perc90 1.287790313s, perc99 1.287790313s. Total count 12
I0721 22:08:16.878068   30506 event_metrics.go:134] [Metrics][ADDED][DIS_SENDING] perc50 784.74085ms, perc90 1.371875023s, perc99 1.37189753s. Total count 12
I0721 22:08:16.878078   30506 event_metrics.go:135] [Metrics][ADDED][DIS_SENT] perc50 784.74216ms, perc90 1.37187673s, perc99 1.371899029s. Total count 12
I0721 22:08:16.878087   30506 event_metrics.go:136] [Metrics][ADDED][SER_ENCODED] perc50 802.827566ms, perc90 1.371885272s, perc99 1.371906371s. Total count 12
I0721 22:08:16.878097   30506 event_metrics.go:137] [Metrics][ADDED][SER_SENT] perc50 802.828712ms, perc90 1.371890149s, perc99 1.37190933s. Total count 12
I0721 22:13:17.207249   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 795.978596ms, perc90 1.696185577s, perc99 1.696185577s. Total count 14
I0721 22:13:17.207302   30506 event_metrics.go:133] [Metrics][ADDED][DIS_RECEIVED] perc50 803.391119ms, perc90 1.709851399s, perc99 1.709851399s. Total count 14
I0721 22:13:17.207314   30506 event_metrics.go:134] [Metrics][ADDED][DIS_SENDING] perc50 839.493536ms, perc90 1.778152243s, perc99 1.778176709s. Total count 14
I0721 22:13:17.207323   30506 event_metrics.go:135] [Metrics][ADDED][DIS_SENT] perc50 839.495307ms, perc90 1.77815393s, perc99 1.778178074s. Total count 14
I0721 22:13:17.207333   30506 event_metrics.go:136] [Metrics][ADDED][SER_ENCODED] perc50 839.503736ms, perc90 1.778163512s, perc99 1.778185418s. Total count 14
I0721 22:13:17.207344   30506 event_metrics.go:137] [Metrics][ADDED][SER_SENT] perc50 839.5086ms, perc90 1.77816789s, perc99 1.778188511s. Total count 14
I0721 22:18:17.592768   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 795.978596ms, perc90 1.696185577s, perc99 1.696185577s. Total count 14
I0721 22:18:17.592813   30506 event_metrics.go:133] [Metrics][ADDED][DIS_RECEIVED] perc50 803.391119ms, perc90 1.709851399s, perc99 1.709851399s. Total count 14
I0721 22:18:17.592825   30506 event_metrics.go:134] [Metrics][ADDED][DIS_SENDING] perc50 839.493536ms, perc90 1.778152243s, perc99 1.778176709s. Total count 14
I0721 22:18:17.592835   30506 event_metrics.go:135] [Metrics][ADDED][DIS_SENT] perc50 839.495307ms, perc90 1.77815393s, perc99 1.778178074s. Total count 14
I0721 22:18:17.592844   30506 event_metrics.go:136] [Metrics][ADDED][SER_ENCODED] perc50 839.503736ms, perc90 1.778163512s, perc99 1.778185418s. Total count 14
I0721 22:18:17.592853   30506 event_metrics.go:137] [Metrics][ADDED][SER_SENT] perc50 839.5086ms, perc90 1.77816789s, perc99 1.778188511s. Total count 14
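
To track how the ADDED count evolves without reading every line by eye, the `Total count` field can be pulled out of each metrics line. The following is a minimal sketch that assumes the line layout shown in the log excerpt above; `ParseMetricsLine`, `LatestCounts`, and the `Sample` type are hypothetical names for illustration and do not come from the project.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// metricsLine matches lines of the form seen in the excerpt:
// I<mmdd> <hh:mm:ss>.<usec> <pid> <file>:<line>] [Metrics][<type>][<checkpoint>] ... Total count <n>
var metricsLine = regexp.MustCompile(
	`^I(\d{4}) (\d{2}:\d{2}:\d{2})\.\d+\s+\d+ \S+\] \[Metrics\]\[(\w+)\]\[(\w+)\] .*Total count (\d+)\s*$`)

// Sample is one parsed metrics line.
type Sample struct {
	Time       string // wall-clock time, e.g. "22:13:17"
	EventType  string // e.g. "ADDED"
	Checkpoint string // e.g. "AGG_RECEIVED"
	Count      int    // value after "Total count"
}

// ParseMetricsLine extracts a Sample; ok is false for non-matching lines.
func ParseMetricsLine(line string) (Sample, bool) {
	m := metricsLine.FindStringSubmatch(line)
	if m == nil {
		return Sample{}, false
	}
	n, err := strconv.Atoi(m[5])
	if err != nil {
		return Sample{}, false
	}
	return Sample{Time: m[2], EventType: m[3], Checkpoint: m[4], Count: n}, true
}

// LatestCounts keeps the last count seen for each "type/checkpoint" key.
func LatestCounts(lines []string) map[string]int {
	latest := make(map[string]int)
	for _, line := range lines {
		if s, ok := ParseMetricsLine(line); ok {
			latest[s.EventType+"/"+s.Checkpoint] = s.Count
		}
	}
	return latest
}

func main() {
	// Two lines copied from the log excerpt in this issue.
	lines := []string{
		"I0721 21:48:16.193209   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 52.681598ms, perc90 52.681598ms, perc99 52.681598ms. Total count 2",
		"I0721 22:13:17.207249   30506 event_metrics.go:132] [Metrics][ADDED][AGG_RECEIVED] perc50 795.978596ms, perc90 1.696185577s, perc99 1.696185577s. Total count 14",
	}
	counts := LatestCounts(lines)
	fmt.Println("ADDED/AGG_RECEIVED:", counts["ADDED/AGG_RECEIVED"])
}
```

Run over the full excerpt, this approach would report the same figure at every checkpoint that the last two reporting windows show: 14.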

Initial investigation: add node events were not in the event queue. However, watch-related events (i.e. updates) are not lost.

@yb01
Collaborator

yb01 commented Aug 2, 2022

Did some ad hoc checks and calculations of the watch node event counts from the simulator log and the service metrics. The event counts match, which shows that no data change events were lost.

@yb01
Collaborator

yb01 commented Aug 8, 2022

Closing as by-design behavior.
