Lecture 13 Spanner.srt
1
00:00:00,110 --> 00:00:10,050
um, maybe we should get started, um,
it's been a long time since we've all
2
00:00:10,050 --> 00:00:16,020
been in the same place at night. I hope
everybody's doing well. Today I'd like to
3
00:00:16,020 --> 00:00:21,480
talk about Spanner. The reason to talk
about this paper is that it's a rare
4
00:00:21,480 --> 00:00:27,210
example of a system that provides distributed
transactions over data that's widely
5
00:00:27,210 --> 00:00:30,449
separated, that is, data that might be
scattered all over the internet in
6
00:00:30,449 --> 00:00:36,149
different data centers, which is almost
never done in production systems. Of
7
00:00:36,149 --> 00:00:39,420
course it's extremely desirable to be
able to have transactions, since
8
00:00:39,420 --> 00:00:44,730
programmers really like them, and also
extremely desirable to have data spread
9
00:00:44,730 --> 00:00:50,160
all over the network, both for fault
tolerance and to ensure that data is
10
00:00:50,160 --> 00:01:01,039
near its users, that there's a copy of the data
near everybody who wants to use it. And
11
00:01:01,039 --> 00:01:07,920
on the way to achieving this, Spanner
used at least two neat ideas. One is that
12
00:01:07,920 --> 00:01:12,020
they run two-phase commit, but they
actually run it over Paxos-replicated
13
00:01:12,020 --> 00:01:17,850
participants, in order to avoid the
problem in two-phase commit that a
14
00:01:17,850 --> 00:01:22,650
crashed coordinator can block everyone.
And the other interesting idea is that
15
00:01:22,650 --> 00:01:26,820
they use synchronized time in order to
have very efficient read-only
16
00:01:26,820 --> 00:01:32,520
transactions. And the system has
actually been very successful: it's used
17
00:01:32,520 --> 00:01:38,040
a lot by many, many different services
inside of Google, it's been turned by
18
00:01:38,040 --> 00:01:43,320
Google into a product, a service for
their cloud-based customers, and it's
19
00:01:43,320 --> 00:01:48,689
inspired a bunch of other research and
other systems, both sort of by the
20
00:01:48,689 --> 00:01:53,090
example that wide-area
transactions are possible, and also
21
00:01:53,090 --> 00:01:58,229
specifically there's at least one open-source
system, CockroachDB, that uses a lot
22
00:01:58,229 --> 00:02:05,100
of the design quite explicitly. The
motivating use case, the reason that
23
00:02:05,100 --> 00:02:09,239
the paper says they first
started to design Spanner, was that they
24
00:02:09,239 --> 00:02:13,800
already had many big database systems at
25
00:02:13,800 --> 00:02:19,290
Google, but in their advertising system in
particular the data was sharded over
26
00:02:19,290 --> 00:02:25,770
many, many distinct MySQL and
BigTable databases, and maintaining that
27
00:02:25,770 --> 00:02:30,630
sharding was just an awkward, manual,
and time-consuming process. In
28
00:02:30,630 --> 00:02:36,180
addition, their previous advertising
database system didn't allow
29
00:02:36,180 --> 00:02:40,020
transactions that spanned more than a
single, basically, more than a single
30
00:02:40,020 --> 00:02:44,820
server, but they really wanted to be able
to spread their data out more
31
00:02:44,820 --> 00:02:51,000
widely for better performance, and to
have transactions over the multiple
32
00:02:51,000 --> 00:02:56,970
shards of the data for their advertising
database. Apparently the workload was
33
00:02:56,970 --> 00:03:00,780
dominated by read-only transactions; I
mean, you can see this in Table 6, where
34
00:03:00,780 --> 00:03:06,860
there's billions of read-only
transactions and only millions of
35
00:03:06,860 --> 00:03:11,880
read-write transactions, so they're very
interested in the performance of
36
00:03:11,880 --> 00:03:16,440
read-only transactions, transactions that only do
reads. And apparently they also required
37
00:03:16,440 --> 00:03:21,209
strong consistency, and, you know,
for transactions in particular, so they
38
00:03:21,209 --> 00:03:27,060
wanted serializable transactions, and
they also wanted external consistency,
39
00:03:27,060 --> 00:03:33,450
which means that if one transaction
commits, and then after it finishes
40
00:03:33,450 --> 00:03:37,700
committing another transaction starts,
the second transaction needs to see any
41
00:03:37,700 --> 00:03:43,560
modifications done by the first. And
this external consistency turns out to
42
00:03:43,560 --> 00:03:52,010
be interesting with replicated data. All
right, so
43
00:03:52,010 --> 00:03:57,480
first, here's just a basic arrangement, sort
of physical arrangement, of the servers
44
00:03:57,480 --> 00:04:03,680
that Spanner uses. Its
servers are spread over data centers,
45
00:04:03,680 --> 00:04:08,880
presumably all over the world, certainly
all over the United States, and each
46
00:04:08,880 --> 00:04:14,030
piece of data is replicated at multiple
data centers, so the diagram's got to have
47
00:04:14,030 --> 00:04:19,649
multiple data centers. Let's say there's
three data centers; really
48
00:04:19,649 --> 00:04:23,330
there'd be many more. Oops.
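As a side note, the external-consistency requirement stated a minute ago can be written down as a concrete check. This is a minimal sketch, not anything from Spanner itself: representing each transaction as a triple of real start time, real commit time, and assigned commit timestamp is an invented device for illustration.

```python
# External consistency: if T1 finishes committing (in real time) before
# T2 starts, T2 must see T1's writes, which requires T2's commit
# timestamp to come after T1's. Each transaction here is an invented
# triple of (real start time, real commit time, commit timestamp).

def externally_consistent(txns):
    for (s1, c1, ts1) in txns:
        for (s2, c2, ts2) in txns:
            # T1 committed before T2 even began, in real time...
            if c1 < s2 and not (ts1 < ts2):
                return False  # ...yet timestamp order disagrees: violation
    return True

# T2 starts after T1 commits and gets a later timestamp: consistent.
ok = externally_consistent([(0, 5, 10), (6, 9, 11)])
# Same real-time order, but T2 was assigned an earlier timestamp: not.
bad = externally_consistent([(0, 5, 10), (6, 9, 8)])
```

Transactions whose real-time intervals overlap are unconstrained by this check; only a transaction that starts strictly after another finishes must observe it.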
49
00:04:26,009 --> 00:04:29,990
So we have our data
centers. Then the data is sharded, that is,
50
00:04:29,990 --> 00:04:35,389
it's broken up. You can think of it as
being broken up by key and
51
00:04:35,389 --> 00:04:39,770
split over many servers, so maybe there's
one server that serves keys starting
52
00:04:39,770 --> 00:04:46,490
with A in this data center, or another
starting with B, and so forth.
53
00:04:46,490 --> 00:04:52,520
Lots and lots of sharding, with lots of servers. In
fact, any piece
54
00:04:52,520 --> 00:04:57,319
of data, any shard, is replicated at
more than one data center, so there's
55
00:04:57,319 --> 00:05:01,520
going to be another copy, another replica,
of the A keys and the B keys and so on
56
00:05:01,520 --> 00:05:08,090
at the second data center, and yet
another, hopefully identical, copy of all
57
00:05:08,090 --> 00:05:14,300
this data at the third data center. In
addition, each data center has multiple
58
00:05:14,300 --> 00:05:19,940
clients, or rather clients of Spanner, and
what these clients really are is web
59
00:05:19,940 --> 00:05:24,250
servers. So if an ordinary human being
sitting in front of a web browser
60
00:05:24,250 --> 00:05:28,520
connects to some Google service that
uses Spanner,
61
00:05:28,520 --> 00:05:31,550
they'll connect to some web server in
one of the data centers, and that's going
62
00:05:31,550 --> 00:05:40,580
to be one of these Spanner
clients. All right, so the data is replicated;
63
00:05:40,580 --> 00:05:45,680
the replication is managed by Paxos, in
fact really a variant of Paxos that
64
00:05:45,680 --> 00:05:50,020
has leaders and is really very much like
the Raft that we're all familiar with.
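The layout described so far, keys sharded across servers and each shard replicated by its own leader-based Paxos group, can be sketched roughly like this. The first-letter sharding rule, the group names, and the data-center labels are all made up for illustration; Spanner's real placement is far more elaborate.

```python
# Toy model of the layout: the key space is sharded (here by a made-up
# first-letter rule), and each shard is replicated across data centers
# by its own independent, leader-based Paxos group.

DATA_CENTERS = ["DC1", "DC2", "DC3"]

class PaxosGroup:
    def __init__(self, name, leader_dc):
        self.name = name
        self.replicas = list(DATA_CENTERS)  # one replica per data center
        self.leader_dc = leader_dc          # a leader, much as in Raft

SHARDS = {
    "a": PaxosGroup("shard-A", leader_dc="DC1"),
    "b": PaxosGroup("shard-B", leader_dc="DC2"),
}

def group_for_key(key):
    # Route each key to the Paxos group serving its shard.
    return SHARDS[key[0].lower()]

# A write to key "alice" must be sent to shard-A's leader, in DC1.
leader = group_for_key("alice").leader_dc
```

Because the groups are independent, requests for keys in different shards proceed in parallel, which is the throughput argument the lecture makes next.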
65
00:05:50,020 --> 00:05:56,840
And each Paxos instance manages all the
replicas of a given shard of the data. So
66
00:05:56,840 --> 00:06:06,620
all the copies of this shard
form one Paxos group, and all the
67
00:06:06,620 --> 00:06:09,740
replicas of that shard form another Paxos
group, and each of
68
00:06:09,740 --> 00:06:14,900
these Paxos instances is independent: it has
its own leader and runs its own
69
00:06:14,900 --> 00:06:21,740
instance of the Paxos
protocol. And the reason for the
70
00:06:21,740 --> 00:06:29,539
sharding, and for the independent Paxos
instances per shard, is to allow parallel
71
00:06:29,539 --> 00:06:34,190
speed-up and a lot of parallel
throughput, because there's a vast number
72
00:06:34,190 --> 00:06:37,490
of clients, you know, which are
working on behalf of web
73
00:06:37,490 --> 00:06:41,190
browsers, so there's typically a huge number
of concurrent
74
00:06:41,190 --> 00:06:46,780
requests, and so it pays
immensely to split them up over multiple
75
00:06:46,780 --> 00:06:56,620
shards and multiple Paxos groups
that are running in parallel. Okay, and
76
00:06:56,620 --> 00:07:02,680
you can think of it as each of these Paxos
groups having a leader, a lot like Raft. So
77
00:07:02,680 --> 00:07:06,669
maybe the leader for this shard's
data is a replica in data center one, and
78
00:07:06,669 --> 00:07:13,599
the leader for this shard might be the
replica in data center two, and so
79
00:07:13,599 --> 00:07:21,250
forth. And, you know, that means that if
a client needs to do a
80
00:07:21,250 --> 00:07:28,360
write, it has to send that write to the
leader of the shard whose data it
81
00:07:28,360 --> 00:07:34,750
needs to write. Just as with Raft, what these
Paxos instances are really
82
00:07:34,750 --> 00:07:38,530
doing is sending out a log: the leader is
sort of replicating a log of operations
83
00:07:38,530 --> 00:07:42,819
to all the followers, and the followers
execute that log, which for data is
84
00:07:42,819 --> 00:07:53,199
gonna be reads and writes, so they execute
those logs all in the same order. All
85
00:07:53,199 --> 00:07:58,389
right, so the reason for this
setup: the sharding, as I mentioned, is for
86
00:07:58,389 --> 00:08:03,759
throughput; the multiple copies in
different data centers are for two
87
00:08:03,759 --> 00:08:07,719
reasons. One is you want copies in
different data centers in case one data
88
00:08:07,719 --> 00:08:12,729
center fails: if, you know, maybe power
fails to the entire city the data
89
00:08:12,729 --> 00:08:16,930
center's in, or there's an earthquake or a
fire or something, you'd like other
90
00:08:16,930 --> 00:08:20,560
copies at other data centers that are
maybe not going to fail at the same time.
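One reason this replication actually survives a lost data center is that Paxos, like Raft, only needs a majority of the replicas to acknowledge a log entry before it counts as committed. A toy illustration of that majority rule, with invented replica names and acknowledgment sets:

```python
# Paxos (like Raft) commits a log entry once a majority of the replicas
# acknowledge it, so losing one of three data centers does not block
# progress. Toy majority check over invented acknowledgment sets.

def is_committed(acks, num_replicas=3):
    """An entry is committed once a strict majority has acked it."""
    return len(acks) > num_replicas // 2

# Two of three replicas acked; the third data center is down: committed.
survives_failure = is_committed({"DC1", "DC2"})
# Only one replica has acked so far: not yet committed.
still_waiting = is_committed({"DC1"})
```

The same arithmetic is what lets the system keep accepting requests when one data center is merely slow, a point the lecture returns to shortly.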
91
00:08:20,560 --> 00:08:24,550
And then, you know, there's a price to pay
for that, because now the Paxos protocol
92
00:08:24,550 --> 00:08:29,409
has to talk, maybe over long
distances, to followers in
93
00:08:29,409 --> 00:08:33,309
different data centers. The other reason
to have data in multiple data centers is
94
00:08:33,309 --> 00:08:39,009
that it lets you have copies of the data
near all the different clients
95
00:08:39,010 --> 00:08:42,429
that use it. So if you have a piece of
data that may be read in both California
96
00:08:42,429 --> 00:08:48,250
and New York, maybe it's nice to have a
copy of that data, one copy in California,
97
00:08:48,250 --> 00:08:53,140
one copy in New York, so that reads can
be very fast. And indeed a lot of the
98
00:08:53,140 --> 00:08:57,529
focus of the design
is to make reads from the local, the
99
00:08:57,529 --> 00:09:04,790
nearest, replica both fast and correct.
Finally, another interesting interaction
100
00:09:04,790 --> 00:09:07,130
between Paxos and multiple data centers
is that
101
00:09:07,130 --> 00:09:13,400
Paxos, like Raft, only requires a majority
in order to replicate a log entry and
102
00:09:13,400 --> 00:09:18,020
proceed, and that means if there's one
slow or distant or flaky data center, the
103
00:09:18,020 --> 00:09:22,760
Paxos system can keep chugging along and
accepting new requests even if one data
104
00:09:22,760 --> 00:09:31,460
center is being slow. All right, so
with this arrangement there's a couple
105
00:09:31,460 --> 00:09:35,810
of big challenges the paper has to bite
off. One is they really want to do reads
106
00:09:35,810 --> 00:09:41,720
from local data centers, but because
they're using Paxos, and because Paxos
107
00:09:41,720 --> 00:09:47,870
only requires each log entry to be
replicated on a majority, that means a
108
00:09:47,870 --> 00:09:52,910
minority of the replicas may be lagging
and may not have seen the latest data
109
00:09:52,910 --> 00:09:58,970
that's been committed by Paxos. And that
means that if we allow clients to read
110
00:09:58,970 --> 00:10:04,610
from the local replicas for speed, they
may be reading out-of-date data, if their
111
00:10:04,610 --> 00:10:08,120
replica happens to be in the minority
that didn't see the latest updates. So
112
00:10:08,120 --> 00:10:11,630
since they're requiring
correctness, they're requiring this
113
00:10:11,630 --> 00:10:18,470
external consistency idea that every
read sees the most up-to-date data, they
114
00:10:18,470 --> 00:10:22,850
have to have some way of dealing with
the possibility that the local replicas
115
00:10:22,850 --> 00:10:28,310
may be lagging. Another issue they have
to deal with is that a transaction may
116
00:10:28,310 --> 00:10:32,300
involve multiple shards and therefore
multiple Paxos groups, so you may be
117
00:10:32,300 --> 00:10:35,780
reading or writing, a single transaction
may be reading or writing, multiple
118
00:10:35,780 --> 00:10:40,250
records in the database that are stored
in multiple shards and multiple Paxos
119
00:10:40,250 --> 00:10:49,700
groups, so for those we need
distributed transactions. Okay, so I'm
120
00:10:49,700 --> 00:10:52,700
going to explain how the transactions
work; that's going to be the kind of
121
00:10:52,700 --> 00:10:58,720
focus of the lecture. Spanner actually
implements read-write transactions
122
00:10:58,720 --> 00:11:02,389
quite differently from read-only
transactions, so let me start with
123
00:11:02,389 --> 00:11:07,190
read-write transactions, which
are a lot more conventional in
124
00:11:07,190 --> 00:11:27,490
their design. All right, so first, read-write
transactions. Let me just remind you what a
125
00:11:27,490 --> 00:11:32,020
transaction looks like; let's just
choose a simple one that's like
126
00:11:32,020 --> 00:11:39,010
mimicking a bank transfer. So on one of
those client machines, a client of
127
00:11:39,010 --> 00:11:42,460
Spanner, you'd run some code, you'd run this
transaction code. The code would say, oh,
128
00:11:42,460 --> 00:11:46,330
I'm beginning a transaction, and then it
would say, oh, I want to read and write
129
00:11:46,330 --> 00:11:50,350
these records. So maybe you have a bank
balance in database record x, and we want
130
00:11:50,350 --> 00:11:56,740
to, you know, increase this
bank balance and decrease y's bank
131
00:11:56,740 --> 00:12:01,090
balance, and, oh, that's the end of the
transaction, and now the client hopes the
132
00:12:01,090 --> 00:12:04,350
database will go off and commit that.
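In code form, the transaction being described looks roughly like this. It is a sketch: the `Txn` class is an invented in-memory stand-in, and none of these method names are Spanner's real client API.

```python
# The lecture's bank-transfer transaction: begin, read both balances,
# update both, commit. Txn is an invented in-memory stand-in; in real
# Spanner the reads take locks at shard leaders and commit runs
# two-phase commit across the Paxos groups holding x and y.

class Txn:
    def __init__(self, store):
        self.store = store
        self.buffered = {}                # writes held back until commit
    def read(self, key):
        return self.buffered.get(key, self.store[key])
    def write(self, key, value):
        self.buffered[key] = value
    def commit(self):
        self.store.update(self.buffered)  # stand-in for two-phase commit

def transfer(store, amount):
    txn = Txn(store)                      # BEGIN
    x = txn.read("x")
    y = txn.read("y")
    txn.write("x", x + amount)            # move money from y to x
    txn.write("y", y - amount)
    txn.commit()                          # END: the client hopes it commits

accounts = {"x": 100, "y": 100}
transfer(accounts, 10)                    # accounts: {"x": 110, "y": 90}
```

Buffering the writes at the client until commit matches the point made later in the lecture: the transaction code effectively does all its reads first, and the writes take effect as part of the commit.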
133
00:12:05,160 --> 00:12:11,080
All right, so I want to trace through all
the steps that have to happen in
134
00:12:11,080 --> 00:12:17,560
order for Spanner to execute
this read-write transaction. So first of
135
00:12:17,560 --> 00:12:21,540
all, there's a client in one of the data
centers that's driving this transaction,
136
00:12:21,540 --> 00:12:25,690
so I'll draw this client here. Let's
imagine that x and y are on different
137
00:12:25,690 --> 00:12:31,990
shards, since that's the
interesting case, and that those shards,
138
00:12:31,990 --> 00:12:38,740
each of the two shards, is replicated in
three different data centers. So now we've
139
00:12:38,740 --> 00:12:47,970
got our three data centers here, and at
each data center there's a server that
140
00:12:47,970 --> 00:12:55,120
I'm just going to write x, for the
replicas of the shard that's holding the record
141
00:12:55,120 --> 00:13:03,360
with the bank balance for x, and y for
these three servers. Spanner uses
142
00:13:03,360 --> 00:13:08,470
two-phase commit, just totally standard
two-phase commit and two-phase
143
00:13:08,470 --> 00:13:16,060
locking, almost exactly as described in
the reading from last week from the 6.033
144
00:13:16,060 --> 00:13:22,540
textbook, and the huge difference
is that instead of the participants and
145
00:13:22,540 --> 00:13:26,590
the transaction manager being individual
computers, the participants and the
146
00:13:26,590 --> 00:13:33,270
transaction manager are Paxos-replicated
147
00:13:33,340 --> 00:13:37,570
groups of servers, for increased fault
tolerance. So that means, just to remind
148
00:13:37,570 --> 00:13:44,050
you, that the three replicas of
the shard that stores x form a
149
00:13:44,050 --> 00:13:49,330
Paxos group, same with these three
replicas storing y, and we'll just imagine
150
00:13:49,330 --> 00:13:53,790
that for each of these, one of the three
servers is the leader. So let's say the
151
00:13:53,790 --> 00:14:00,760
server in data center 2 is the Paxos
leader for the x shard, and the
152
00:14:00,760 --> 00:14:08,080
server in data center 1 is the Paxos
leader for the y shard. Okay, so the first
153
00:14:08,080 --> 00:14:11,770
thing that happens is that the client
picks a unique transaction ID, which is
154
00:14:11,770 --> 00:14:16,470
going to be carried on all these
messages, so that the system knows that
155
00:14:16,470 --> 00:14:21,100
all the different operations are
associated with a single transaction. The
156
00:14:21,100 --> 00:14:25,270
first thing the client has to
do: so, despite the way the code looks,
157
00:14:25,270 --> 00:14:30,520
where it reads and writes x, then reads
and writes y, in fact the way the
158
00:14:30,520 --> 00:14:34,330
transaction code has to be organized,
it has to do all its reads first, and
159
00:14:34,330 --> 00:14:39,220
then, at the very end, do all the writes
at the same time, essentially as part of
160
00:14:39,220 --> 00:14:49,050
the commit. So the client does the
reads. It turns out that, in order to
161
00:14:49,050 --> 00:14:57,520
maintain locks, just as in last
week's 6.033 reading, every time you read
162
00:14:57,520 --> 00:15:03,790
or write a data item, the server
responsible for it has to associate a
163
00:15:03,790 --> 00:15:09,280
lock with that data item. The locks,
the read locks in Spanner, are
164
00:15:09,280 --> 00:15:14,530
maintained only in the Paxos leader. So
when the client transaction wants to
165
00:15:14,530 --> 00:15:23,560
read x, it sends a read-x request to
the leader of x's shard, and that leader
166
00:15:23,560 --> 00:15:28,630
of the shard returns the current value
of x, plus sets a lock on x. Of course, if
167
00:15:28,630 --> 00:15:32,250
the lock's already set, then it won't
respond to the client until whatever
168
00:15:32,250 --> 00:15:36,370