ROLL                                                         K. Iwanicki
Internet-Draft                                      University of Warsaw
Intended status: Standards Track                      September 08, 2024
Expires: March 12, 2025
RNFD: Fast border router crash detection in RPL
draft-ietf-roll-rnfd-04
Abstract
By and large, the correct operation of a RPL network requires border
routers to be up. In many applications, it is beneficial for the
nodes to detect a crash of a border router as soon as possible to
trigger fallback actions. This document describes RNFD, an extension
to RPL that expedites border router failure detection, even by an
order of magnitude, by having nodes collaboratively monitor the
status of a given border router. The extension introduces an
additional state at each node, a new type of RPL Control Message
Option for synchronizing this state among different nodes, and the
coordination algorithm itself.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 12, 2025.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Iwanicki Expires March 12, 2025 [Page 1]
Internet-Draft RNFD September 2024
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
  1.1. Effects of LBR Crashes
  1.2. Design Principles
  1.3. Other Solutions
2. Terminology
3. Overview
  3.1. Protocol State Machine
  3.2. Counters and Communication
4. The RNFD Option
  4.1. General CFRC Requirements
  4.2. Format of the Option
5. RPL Router Behavior
  5.1. Joining a DODAG Version and Changing the RNFD Role
  5.2. Detecting and Verifying Problems with the DODAG Root
  5.3. Disseminating Observations and Reaching Agreement
  5.4. DODAG Root's Behavior
  5.5. Activating and Deactivating the Protocol on Demand
  5.6. Processing CFRCs of Incompatible Lengths
  5.7. Summary of RNFD's Interactions with RPL
  5.8. Summary of RNFD's Constants
6. Manageability Considerations
  6.1. Role Assignment and CFRC Size Adjustment
  6.2. Virtual DODAG Roots
7. Security Considerations
8. IANA Considerations
9. Acknowledgements
10. References
  10.1. Normative References
  10.2. Informative References
  10.3. URIs
Author's Address
1. Introduction
RPL is an IPv6 routing protocol for low-power and lossy networks
(LLNs) [RFC6550]. Such networks are usually constrained in device
energy and channel capacity. They are formed largely of nodes that
offer little processing power and memory, and links that are of
variable qualities and support low data rates. Therefore, a
significant challenge that a routing protocol for LLNs has to address
is minimizing resource consumption without sacrificing reaction time
to network changes.
One of the main design principles adopted in RPL to minimize node
resource consumption is delegating much of the responsibility for
routing to LLN border routers (LBRs). A network is organized into
destination-oriented directed acyclic graphs (DODAGs), each
corresponding to an LBR and having all its paths terminate at the
LBR. To this end, every node is dynamically assigned a rank
representing its distance, measured in some metric, to a given LBR,
with the LBR having the minimal rank, which reflects its role as the
DODAG root. The ranks allow each non-LBR node to select from among
its neighbors (i.e., nodes to which the node has links) those
that are closer to the LBR than the node itself: the node's parents
in the graph. The resulting DODAG paths, consisting of the node-
parent links, are utilized for routing packets upward: to the LBR and
outside the LLN. They are also used by nodes to periodically report
their connectivity upward to the LBR, which allows in turn for
directing packets downward, from the LBR to these nodes, for
instance, by means of source routing [RFC6554]. All in all, not only
do LBRs participate in routing, but they also drive the process of DODAG
construction and maintenance underlying the protocol.
To play this central role, LBRs are expected to be more capable than
regular LLN nodes. They are assumed not to be constrained in
computing power, memory, and energy, which often entails a more
involved hardware-software architecture and tethered power supply.
This, however, also makes them prone to failures, especially since in
large deployments it is often difficult to ensure a backup power
supply for every LBR.
1.1. Effects of LBR Crashes
When an LBR crashes, the nodes in its DODAG lose the ability to
communicate with other Internet hosts. In addition, a significant
fraction of DODAG paths interconnecting the nodes become invalid, as
they pass through the dead LBR. The others also degenerate as a
result of DODAG repair attempts, which are bound to fail. In effect,
routing inside the DODAG also becomes largely impossible.
Consequently, it is desirable that an LBR crash be detected by the
nodes fast, so that they can leave the broken DODAG and join another
one or trigger additional application- or deployment-dependent
fallback mechanisms, thereby minimizing the negative impact of the
disconnection.
Since all DODAG paths lead to the corresponding LBR, detecting its
crash by a node entails dropping all parents and adopting an infinite
rank, which reflects the node's inability to reach the dead LBR.
Depending on the deployment settings, the node can then remain in
such a state, join a different DODAG, or even become itself the root
of a floating DODAG. In any case, however, achieving this state for
all nodes is slow, can generate heavy traffic, and is difficult to
implement correctly [Iwanicki16] [Paszkowska19] [Ciolkosz19].
To start with, tearing down all DODAG paths requires each of the dead
LBR's neighbors to detect that its link with the LBR is no longer up.
Otherwise, any of the neighbors unaware of this fact can keep
advertising a finite rank and can thus be other nodes' parent or
ancestor in the DODAG: such nodes will incorrectly believe they have
a valid path to the dead LBR. Detecting a crash of a link by a node
normally happens when the node has observed sufficiently many
forwarding failures over the link. Therefore, considering the low-
data-rate applications of LLNs, the period from the crash to the
moment of eliminating from the DODAG the last link to the dead LBR
may be long. The time all nodes subsequently need to learn that none
of their links can form a path to the dead LBR adds further latency,
partly due to parent changes that the nodes independently perform in
attempts to repair their broken paths locally. Since a non-LBR node
has only local knowledge of the network, potentially inconsistent
with that of other nodes, such parent changes often produce paths
containing loops, which have to be broken before all nodes can
conclude that no path to the dead LBR exists globally. Even with
RPL's dedicated loop detection mechanisms [RFC6553], this also
requires traffic, and hence time. Finally, switching a parent or
discovering a loop can also generate cascaded bursts of control
traffic, owing to the adaptive Trickle algorithm for exchanging DODAG
information [RFC6206]. Overall, the behavior of the network when
handling an LBR crash is highly suboptimal, thereby not being in line
with RPL's goals of minimizing resource consumption and reaction
latencies.
1.2. Design Principles
To address this issue, this document proposes an extension to RPL,
dubbed Root Node Failure Detector (RNFD). To minimize the time and
traffic required to handle an LBR crash, the RNFD algorithm adopts
the following design principles, derived directly from the previous
observations:
1. Explicitly coordinating LBR monitoring between nodes instead of
relying only on the emergent behavior resulting from their
independent operation.
2. Avoiding probing all links to the dead LBR so as to reduce the
tail latency when eliminating these links from the DODAG.
3. Exploiting concurrency by prompting proactive checking for a
possible LBR crash when some nodes suspect such a failure may
have taken place, which aims to further reduce the overall
latency.
4. Minimizing changes to RPL's existing algorithms by operating in
parallel and largely independently (in the background), and
introducing few additional assumptions.
While these principles do improve RPL's performance under a wide
range of LBR crashes, their probabilistic nature precludes hard
guarantees for all possible corner cases. In particular, in some
scenarios, RNFD's operation may result in false negatives, but these
situations are peculiar and will eventually be handled by RPL's own
aforementioned mechanisms. Likewise, in some scenarios, notably
involving highly unstable links, false positives may occur, but they
can be alleviated as well. In any case, the principles also
guarantee that RNFD can be deactivated at any time, if needed, in
which case RPL's operation is unaffected.
1.3. Other Solutions
Given the consequences of LBR failures, if possible, it is also worth
considering other solutions to the problem. More specifically, power
outages can be alleviated by provisioning redundant power sources or
emergency batteries. Likewise, RPL's so-called virtual DODAG roots
can help tolerate some failures of individual LBRs.
As mentioned previously, RNFD has been designed to be largely
independent of those solutions, that is, rather than aiming to be
their replacement, it can complement them. In particular, the
operation of RNFD with different variants of virtual DODAG roots is
covered in Section 6.2.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
The terminology used in this document is consistent with and
incorporates that described in "Terms Used in Routing for Low-Power
and Lossy Networks (LLNs)" [RFC7102], "RPL: IPv6 Routing Protocol for
Low-Power and Lossy Networks" [RFC6550], and "The Routing Protocol
for Low-Power and Lossy Networks (RPL) Option for Carrying RPL
Information in Data-Plane Datagrams" [RFC6553]. Other terms in use
in LLNs can be found in "Terminology for Constrained-Node Networks"
[RFC7228].
In particular, the following acronyms appear in the document:
DIO    DODAG Information Object (a RPL message)
DIS    DODAG Information Solicitation (a RPL message)
DODAG  Destination-Oriented Directed Acyclic Graph
LLN    Low-power and Lossy Network
LBR    LLN Border Router
In addition, the document introduces the following concepts:
Sentinel One of the two roles that a node can play in RNFD. For a
given DODAG Version, a Sentinel node is a DODAG root's neighbor
that monitors the DODAG root's status. There are normally
multiple Sentinels for a DODAG root. However, being the DODAG
root's neighbor need not imply being Sentinel.
Acceptor The other of the two roles that a node can play in RNFD.
For a given DODAG Version, an Acceptor node is a node that is not
Sentinel.
Locally Observed DODAG Root's State (LORS) A node's local knowledge
of the DODAG root's status, specifying in particular whether the
DODAG root is up.
Conflict-Free Replicated Counter (CFRC) Conceptually represents a
dynamic set whose cardinality can be estimated. It defines a
partial order on its values and supports element addition and
union. The union operation is order- and duplicate-insensitive,
that is, idempotent, commutative, and associative.
3. Overview
As mentioned previously, LBRs are DODAG roots in RPL, and hence a
crash of an LBR is global in that it affects all nodes in the
corresponding DODAG. Therefore, each node running RNFD for a given
DODAG explicitly tracks the DODAG root's current condition, which is
referred to as Locally Observed DODAG Root's State (LORS), and
synchronizes its local knowledge with other nodes.
Since monitoring the condition of the DODAG root is performed by
tracking the status of its links (i.e., whether they are up or down),
it must be done by the root's neighbors; other nodes must accept
their observations. Consequently, depending on their roles, non-root
nodes are divided in RNFD into two disjoint groups: Sentinels and
Acceptors. A Sentinel node is the DODAG root's neighbor that
monitors its link with the root. The DODAG root thus normally has
multiple Sentinels but being its neighbor need not imply being
Sentinel. An Acceptor node is in turn a node that is not Sentinel.
Acceptors thus mainly collect and propagate Sentinels' observations.
More information on Sentinel selection can be found in Section 6.1.
3.1. Protocol State Machine
The possible values of LORS and transitions between them are depicted
in Figure 1. States "UP" and "GLOBALLY DOWN" can be attained by both
Sentinels and Acceptors; states "SUSPECTED DOWN" and "LOCALLY DOWN"
-- by Sentinels only.
  +---------------------------------------------------------+
  |                      |---------------------------+ 3a   |
  |    +-----------------+---------+ 3b              |      |
  |    |              2b |         v                 v      v
+-+----+-+  1  +---------+-+     +-----------+     +-+------+-+
|   UP   +---->+ SUSPECTED +---->+  LOCALLY  +---->+ GLOBALLY |
|        +<----+   DOWN    | 2a  |   DOWN    | 3c  |   DOWN   |
+-+----+-+ 4a  +-----------+     +-+---------+     +-+--------+
  ^    ^                           |                 |
  |    |            4b             |                 |
  |    +---------------------------+              5  |
  +--------------------------------------------------+
Figure 1: RNFD States and Transitions
To begin with, when any node joins a DODAG Version, the DODAG root
must appear alive, so the node initializes RNFD with its LORS equal
to "UP". For a properly working DODAG root, the node remains in
state "UP".
However, when a node -- acting as Sentinel -- starts suspecting that
the root may have crashed, it changes its LORS to "SUSPECTED DOWN"
(transition 1 in Figure 1). The transition from "UP" to "SUSPECTED
DOWN" can happen based on the node's observations at either the data
plane, for instance, link-layer triggers about missing hop-by-hop
acknowledgments for packets forwarded over the node's link to the
root, or the control plane, for example, a significant growth in the
number of Sentinels already suspecting the root to be dead. In state
"SUSPECTED DOWN", the Sentinel node may verify its suspicion and/or
inform other nodes about the suspicion. When this has been done, it
changes its LORS to "LOCALLY DOWN" (transition 2a). In some cases,
the verification need not be performed and, as an optimization, a
direct transition from "UP" to "LOCALLY DOWN" (transition 2b) can be
done instead.
If sufficiently many Sentinels have their LORS equal to "LOCALLY
DOWN", all nodes -- Sentinels and Acceptors -- consent globally that
the DODAG root must have crashed and set their LORS to "GLOBALLY
DOWN", irrespective of the previous value (transitions 3a, 3b, and
3c). State "GLOBALLY DOWN" is terminal in that the only transition
any node can perform from this to another state (transition 5) takes
place when the node joins a new DODAG Version. When a node is in
state "GLOBALLY DOWN", RNFD forces RPL to maintain an infinite rank
and no parent, thereby preventing routing packets upward in the
DODAG. In other words, this state represents a situation in which
all non-root nodes agree that the current DODAG Version is unusable,
and hence, to recover, the root has to give a proof of being alive by
initiating a new DODAG Version.
In contrast, if a node -- either Sentinel or Acceptor -- is in state
"UP", RNFD does not influence RPL's packet forwarding: a node can
route packets upward if it has a parent. The same is true for states
"SUSPECTED DOWN" and "LOCALLY DOWN", attainable only by Sentinels.
Finally, while in either of the two states, a Sentinel node may observe
some activity of the DODAG root, and hence decide that its suspicion
is a mistake. In such a case, it returns to state "UP" (transitions
4a and 4b).
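The states and transitions described above can be summarized in a
small lookup table. The following is a minimal sketch, not part of
the protocol definition: the names Lors, Role, TRANSITIONS, and
transition_label are hypothetical and chosen only for illustration.
The sketch encodes the labels from Figure 1 and the rule that states
"SUSPECTED DOWN" and "LOCALLY DOWN" are attainable by Sentinels only.

```python
from enum import Enum


class Lors(Enum):
    UP = "UP"
    SUSPECTED_DOWN = "SUSPECTED DOWN"
    LOCALLY_DOWN = "LOCALLY DOWN"
    GLOBALLY_DOWN = "GLOBALLY DOWN"


class Role(Enum):
    SENTINEL = "Sentinel"
    ACCEPTOR = "Acceptor"


# Transition labels from Figure 1, keyed by (source, destination).
TRANSITIONS = {
    (Lors.UP, Lors.SUSPECTED_DOWN): "1",
    (Lors.SUSPECTED_DOWN, Lors.LOCALLY_DOWN): "2a",
    (Lors.UP, Lors.LOCALLY_DOWN): "2b",
    (Lors.UP, Lors.GLOBALLY_DOWN): "3a",
    (Lors.SUSPECTED_DOWN, Lors.GLOBALLY_DOWN): "3b",
    (Lors.LOCALLY_DOWN, Lors.GLOBALLY_DOWN): "3c",
    (Lors.SUSPECTED_DOWN, Lors.UP): "4a",
    (Lors.LOCALLY_DOWN, Lors.UP): "4b",
    # Transition 5 takes place only when the node joins a new DODAG Version.
    (Lors.GLOBALLY_DOWN, Lors.UP): "5",
}

# States that only a Sentinel can attain.
SENTINEL_ONLY = {Lors.SUSPECTED_DOWN, Lors.LOCALLY_DOWN}


def transition_label(role, src, dst):
    """Return the Figure 1 label for src -> dst, or None if not allowed."""
    if role is Role.ACCEPTOR and (src in SENTINEL_ONLY or dst in SENTINEL_ONLY):
        return None
    return TRANSITIONS.get((src, dst))
```

For example, transition_label(Role.ACCEPTOR, Lors.UP,
Lors.GLOBALLY_DOWN) yields "3a", whereas an Acceptor asking for a
transition to "SUSPECTED DOWN" gets None.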
3.2. Counters and Communication
To enable arriving at a global conclusion that the DODAG root has
crashed (i.e., transitioning to state "GLOBALLY DOWN"), all nodes count
locally and synchronize among each other the number of Sentinels
considering the root to be dead (i.e., those in state "LOCALLY
DOWN"). This process employs structures referred to as conflict-free
replicated counters (CFRCs). They are stored and modified
independently by each node and are disseminated throughout the
network in options added to RPL link-local control messages: DODAG
Information Objects (DIOs) and DODAG Information Solicitations
(DISs). Upon reception of such an option from its neighbor, a node
merges the received counter with its local one, thereby obtaining a
new content for its local counter.
The merging operation is idempotent, commutative, and associative.
Moreover, all possible counter values are partially ordered. This
enables ensuring eventual consistency of the counters across all
nodes, irrespective of the particular sequence of merges, shape of
the DODAG, or general network topology.
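To illustrate why these properties give eventual consistency, the
sketch below models a counter in an idealized way, as a plain set of
node identifiers merged by set union. This is an assumption made for
illustration only: the identifiers "s1" to "s3" are hypothetical, and
the model is not the encoding used on the wire. Whatever the order of
merges, and even with duplicate deliveries, every sequence ends in
the same counter.

```python
import itertools


def merge(c1, c2):
    """Union of two idealized counters, each a frozenset of node IDs."""
    return c1 | c2


# Three counters, each initially counting one hypothetical Sentinel.
initial = [frozenset({"s1"}), frozenset({"s2"}), frozenset({"s3"})]

results = set()
for order in itertools.permutations(initial):
    acc = frozenset()
    for counter in order:
        acc = merge(acc, counter)
        acc = merge(acc, counter)  # a duplicate delivery changes nothing
    results.add(acc)

# Every merge order converged to the same counter.
converged = len(results) == 1
```

In this model, the partial order on counter values corresponds to
set inclusion, which Python exposes through the "<=" operator on
frozensets.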
Each node in RNFD maintains two CFRCs for a DODAG:
o PositiveCFRC, counting Sentinels that consider or have previously
considered the root node as alive in the current DODAG Version,
o NegativeCFRC, counting Sentinels that consider or have previously
considered the root node as dead in the current DODAG Version.
PositiveCFRC is always greater than or equal to the NegativeCFRC in
terms of the partial order defined for the counters. The difference
between the value of PositiveCFRC and the value of NegativeCFRC is
thus nonnegative and estimates the number of Sentinels that still
consider the DODAG root node as alive.
4. The RNFD Option
RNFD state synchronization between nodes takes place through the RNFD
Option. It is a new type of RPL Control Message Option that is
carried in link-local RPL control messages, notably DIOs and DISs.
Its main task is allowing the receivers to merge their two CFRCs with
the sender's CFRCs.
4.1. General CFRC Requirements
CFRCs in RNFD MUST support the following operations:
value(c) Returns a nonnegative integer value corresponding to the
number of nodes counted by a given CFRC, c.
zero() Returns a CFRC that counts no nodes, that is, has its value
equal to 0.
self() Returns a CFRC that counts only the node executing the
operation.
infinity() Returns a CFRC that counts all possible nodes and
represents a special value, infinity.
merge(c1, c2) Returns a CFRC that is a union of c1 and c2 (i.e.,
counts all nodes that are counted by either c1, c2, or both c1 and
c2).
compare(c1, c2) Returns the result of comparing c1 to c2.
saturated(c) Returns TRUE if a given CFRC, c, is saturated (i.e., no
more new nodes should be counted by it) or FALSE otherwise.
The partial ordering of CFRCs implies that the result of compare(c1,
c2) can be either:
o smaller, if c1 is ordered before c2 (i.e., c2 counts all nodes
that c1 counts and at least one node that c1 does not count);
o greater, if c1 is ordered after c2 (i.e., c1 counts all nodes that
c2 counts and at least one node that c2 does not count);
o equal, if c1 and c2 are the same (i.e., they count the same
nodes);
o incomparable, otherwise.
In particular, zero() is smaller than all other values and infinity()
is greater than any other value.
The properties of merging in turn can be formalized as follows for
any c1, c2, and c3:
o idempotence: c1 = merge(c1, c1);
o commutativity: merge(c1, c2) = merge(c2, c1);
o associativity: merge(c1, merge(c2, c3)) = merge(merge(c1, c2),
c3).
In particular, merge(c, zero()) always equals c while merge(c,
infinity()) always equals infinity().
There are many algorithmic structures that can provide the
aforementioned properties of CFRC. Although in principle RNFD does
not rely on any specific one, the option adopts so-called linear
counting [Whang90].
4.2. Format of the Option
The format of the RNFD Option conforms to the generic format of RPL
Control Message Options:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Type = TBD1  | Option Length |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
+                                                               +
|              PosCFRC, NegCFRC (Variable Length*)              |
.                                                               .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The '*' denotes that, if present, the fields have equal lengths.
Figure 2: Format of the RNFD Option
Option Type TBD1
Option Length 8-bit unsigned integer. Denotes the length of the
option in octets excluding the Option Type and Option Length
fields. Its value MUST be even. A value of 0 denotes that RNFD
is disabled in the current DODAG Version.
PosCFRC, NegCFRC Two variable-length, octet-aligned bit arrays
carrying the sender's PositiveCFRC and NegativeCFRC, respectively.
The length of the arrays constituting the PosCFRC and NegCFRC fields
is the same and is derived from Option Length as follows. The value
of Option Length is divided by 2 to obtain the number of octets each
of the two arrays occupies. The resulting number of octets is
multiplied by 8, which yields an upper bound on the number of bits in
each array. The actual bit length of each array is taken to be the
largest prime number less than this upper bound. For
example, if the value of Option Length is 16, then each array
occupies 8 octets, and its actual bit length is 61, as this is the
largest prime number less than 64.
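The derivation above can be sketched as follows. The function names
is_prime and cfrc_bit_length are hypothetical, and returning 0 for an
Option Length of 0 is an assumption of this sketch (in that case no
arrays are carried).

```python
def is_prime(n):
    """Trial-division primality test, adequate for values below 1024."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def cfrc_bit_length(option_length):
    """Bit length of each CFRC array for a given Option Length (octets)."""
    if not 0 <= option_length <= 255 or option_length % 2 != 0:
        raise ValueError("Option Length must be an even 8-bit value")
    if option_length == 0:
        return 0  # assumption: RNFD disabled, no arrays present
    upper_bound = (option_length // 2) * 8  # bits available per array
    candidate = upper_bound - 1
    while not is_prime(candidate):
        candidate -= 1
    return candidate
```

With an Option Length of 16, the function reproduces the example
from the text and returns 61.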
Furthermore, for any bit equal to 1 in the NegCFRC, the bit with the
same index MUST be equal to 1 also in the PosCFRC. Any unused bits
(i.e., the bits beyond the actual bit length of each of the arrays)
MUST be equal to 0. Finally, if PosCFRC has all its bits equal to 1,
then NegCFRC MUST also have all its bits equal to 1.
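A receiver could check these three constraints along the lines of
the sketch below. It assumes, for illustration only, that each array
has already been decoded into a nonnegative Python integer in which
bit i of the integer corresponds to array index i, and it reads "all
its bits" as all bits within the actual bit length. The name
option_arrays_valid is hypothetical.

```python
def option_arrays_valid(pos, neg, bit_length):
    """Check the PosCFRC/NegCFRC constraints for arrays given as ints."""
    used_mask = (1 << bit_length) - 1
    # Unused bits (indices >= bit_length) must be 0 in both arrays.
    if (pos & ~used_mask) or (neg & ~used_mask):
        return False
    # Every bit equal to 1 in NegCFRC must also be 1 in PosCFRC.
    if neg & ~pos:
        return False
    # If PosCFRC is all ones, NegCFRC must be all ones as well.
    if pos == used_mask and neg != used_mask:
        return False
    return True
```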
The CFRC operations are defined for such bit arrays of a given length
as follows:
value(c) Returns the smallest integer value not less than
-LT*ln(L0/LT), where ln() is the natural logarithm function, L0 is
the number of bits equal to 0 in the array corresponding to "c", and
LT is the bit length of the array.
zero() Returns an array with all bits equal to 0.
self() Returns an array with a single bit, selected uniformly at
random, equal to 1.
infinity() Returns an array with all bits equal to 1.
merge(c1, c2) Returns a bit array that constitutes a bitwise OR of
c1 and c2, that is, a bit in the resulting array is equal to 0
only if the same bit is equal to 0 in both c1 and c2.
compare(c1, c2) Returns:
o equal if each bit of c1 is equal to the corresponding bit of c2;
o smaller if c1 and c2 are not equal and, for each bit equal to 1 in
c1, the corresponding bit in c2 is also equal to 1;
o greater if c1 and c2 are not equal and, for each bit equal to 1 in
c2, the corresponding bit in c1 is also equal to 1;
o incomparable, otherwise.
saturated(c) Returns TRUE, if more than
RNFD_CFRC_SATURATION_THRESHOLD of the bits in c are equal to 1, or
FALSE, otherwise.
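The operations above can be sketched over Python integers used as
bit arrays. This is an illustrative sketch under stated assumptions,
not a reference implementation: the function names are hypothetical,
the comparison results use the names from Section 4.1, the
saturation threshold is passed in as a fraction because its value is
not given in this section, and returning infinity from value() when
no bit equals 0 (where ln(0) is undefined) is a choice made here.

```python
import math
import random


def zero():
    return 0


def self_counter(bit_length, rng=random):
    """Array with a single bit, selected uniformly at random, set to 1."""
    return 1 << rng.randrange(bit_length)


def infinity(bit_length):
    return (1 << bit_length) - 1


def merge(c1, c2):
    """Bitwise OR of the two arrays."""
    return c1 | c2


def compare(c1, c2):
    if c1 == c2:
        return "equal"
    if (c1 & ~c2) == 0:   # every 1 in c1 is also 1 in c2
        return "smaller"
    if (c2 & ~c1) == 0:   # every 1 in c2 is also 1 in c1
        return "greater"
    return "incomparable"


def value(c, bit_length):
    """Smallest integer not less than -LT*ln(L0/LT)."""
    zeros = bit_length - bin(c).count("1")
    if zeros == 0:
        return math.inf   # assumption: an all-ones array counts as infinity
    return math.ceil(-bit_length * math.log(zeros / bit_length))


def saturated(c, bit_length, threshold):
    """TRUE if more than `threshold` (a fraction) of the bits are 1."""
    return bin(c).count("1") / bit_length > threshold
```

Note that, because of the rounding up, an array of bit length 61
with a single bit set has a value of 2 rather than 1.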
5. RPL Router Behavior
Although RNFD operates largely independently of RPL, it does need to
interact with RPL and the overall protocol stack. These interactions
are described next and can be realized, for instance, by means of
event triggers.
5.1. Joining a DODAG Version and Changing the RNFD Role
Whenever RPL running at a node joins a DODAG Version, RNFD -- if
active -- MUST assume for the node the role of Acceptor.
Accordingly, it MUST set its LORS to "UP" and its PositiveCFRC and
NegativeCFRC to zero().
The role MAY then change between Acceptor and Sentinel at any time.
However, while a switch from Sentinel to Acceptor has no
preconditions, for a switch from Acceptor to Sentinel to be possible,
_all_ of the following conditions MUST hold:
1. LORS is "UP";
2. saturated(PositiveCFRC) is FALSE;
3. a neighbor entry for the DODAG root is present in RPL's DODAG
parent set;
4. the neighbor is considered reachable via its link-local IPv6
address.
A role change also REQUIRES appropriate updates to LORS and CFRCs, so
that the node is properly accounted for. More specifically, when
changing its role from Acceptor to Sentinel, the node MUST add itself
to its PositiveCFRC as follows. It MUST generate a new CFRC value,
selfc = self(), and MUST replace its PositiveCFRC, denoted oldpc,
with newpc = merge(oldpc, selfc). In contrast, the effects of a
switch from Sentinel to Acceptor vary depending on the node's value
of LORS before the switch:
o for "GLOBALLY DOWN", the node MUST NOT modify its LORS,
PositiveCFRC, and NegativeCFRC;
o for "LOCALLY DOWN", the node MUST set its LORS to "UP" but MUST
NOT modify its PositiveCFRC and NegativeCFRC;
o for "UP" and "SUSPECTED DOWN", the node MUST set its LORS to "UP",
MUST NOT modify its PositiveCFRC, but MUST add itself to
NegativeCFRC, that is, replace its NegativeCFRC, denoted oldnc,
with newnc = merge(oldnc, selfc), where selfc is the counter
generated with self() when the node last added itself to its
PositiveCFRC.
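The role-change rules above can be sketched as a small state holder.
The class RnfdNode and its method and field names are hypothetical.
The CFRCs are modelled as integer bit arrays merged with bitwise OR,
and the precondition inputs (root present in the parent set, root
reachable, counter saturated) are supplied by the caller, since how
they are obtained is outside the scope of this sketch.

```python
import random
from dataclasses import dataclass


@dataclass
class RnfdNode:
    bit_length: int
    lors: str = "UP"
    role: str = "Acceptor"
    positive: int = 0  # PositiveCFRC as an integer bit array
    negative: int = 0  # NegativeCFRC as an integer bit array
    selfc: int = 0     # counter generated by the last self() call

    def become_sentinel(self, root_in_parent_set, root_reachable,
                        positive_saturated, rng=random):
        """Acceptor -> Sentinel, allowed only if all conditions hold."""
        if self.role != "Acceptor":
            return False
        if not (self.lors == "UP" and not positive_saturated
                and root_in_parent_set and root_reachable):
            return False
        self.selfc = 1 << rng.randrange(self.bit_length)  # selfc = self()
        self.positive |= self.selfc  # newpc = merge(oldpc, selfc)
        self.role = "Sentinel"
        return True

    def become_acceptor(self):
        """Sentinel -> Acceptor; the effects depend on the current LORS."""
        if self.lors == "GLOBALLY DOWN":
            pass                         # LORS and CFRCs stay unchanged
        elif self.lors == "LOCALLY DOWN":
            self.lors = "UP"             # CFRCs stay unchanged
        else:                            # "UP" or "SUSPECTED DOWN"
            self.lors = "UP"
            self.negative |= self.selfc  # newnc = merge(oldnc, selfc)
        self.role = "Acceptor"
```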
5.2. Detecting and Verifying Problems with the DODAG Root
Only nodes that are Sentinels take active part in detecting crashes
of the DODAG root; Acceptors just disseminate the Sentinels'
observations, reflected in the CFRCs.
The DODAG root monitoring SHOULD be based on both internal inputs,
notably the values of CFRCs and LORS, and external inputs, such as
triggers from RPL and other protocols. External input monitoring
SHOULD be performed preferably in a reactive fashion, also
independently of RPL, and at both data plane and control plane. In
particular, it is RECOMMENDED that RNFD be directly notified of
events relevant to the routing adjacency maintenance mechanisms on
which RPL relies, such as Layer 2 triggers [RFC5184] or the Neighbor
Unreachability Detection [RFC4861] mechanism. In addition, depending
on the underlying protocol stack, there may be other potential
sources of such events, for instance, neighbor communication
overhearing. In any case, only events concerning the DODAG root need
be monitored. For example, RNFD can conclude that there may be
problems with the DODAG root if it observes a lack of multiple
consecutive L2 acknowledgments for packets transmitted by the node
via the link to the DODAG root. Internally, in turn, it is
RECOMMENDED that RNFD take action whenever there is a change to its
local CFRCs, so that a node can have a chance to participate in
detecting potential problems even when normally it would not exchange
packets over the link with the DODAG root during some period. In
particular, RNFD SHOULD conclude that there may be problems with the
DODAG root, when the fraction value(NegativeCFRC)/value(PositiveCFRC)
has grown by at least RNFD_SUSPICION_GROWTH_THRESHOLD since the node
last set its LORS to "UP".
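The internal trigger can be illustrated with a small check.  The
sketch rests on several assumptions that this section does not fix:
value() is taken to be the number of set bits, "grown by" is read as
an additive difference between the current fraction and the fraction
recorded when LORS was last set to "UP", the fraction is treated as
zero while value(PositiveCFRC) is zero, and the threshold value is
only a placeholder.

```python
# Placeholder value chosen for illustration; not taken from this section.
RNFD_SUSPICION_GROWTH_THRESHOLD = 0.12


def value(cfrc: int) -> int:
    """Assumed value(): the number of bits set in the counter."""
    return bin(cfrc).count("1")


def fraction(negative: int, positive: int) -> float:
    """value(NegativeCFRC)/value(PositiveCFRC), or 0.0 if undefined."""
    pos = value(positive)
    return value(negative) / pos if pos > 0 else 0.0


def root_suspected(fraction_at_last_up: float,
                   negative: int, positive: int) -> bool:
    """True if the fraction grew by at least the threshold (additive)."""
    growth = fraction(negative, positive) - fraction_at_last_up
    return growth >= RNFD_SUSPICION_GROWTH_THRESHOLD
```

An implementation would record fraction_at_last_up each time it sets
its LORS to "UP" and call root_suspected() after every change to its
local CFRCs.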
Whenever RNFD, with its LORS set to "UP", concludes -- based on
either external or internal inputs -- that there may be problems with
the link with the DODAG root, it MUST set its LORS to either
"SUSPECTED DOWN" or, as an optimization, to "LOCALLY DOWN".
The "SUSPECTED DOWN" value of LORS is temporary: its aim is to give
RNFD an additional opportunity to verify whether the link with the
DODAG root is indeed down. Depending on the outcome of such
verification, RNFD MUST set its LORS to either "UP", if the link has
been confirmed not to be down, or "LOCALLY DOWN", otherwise. The
verification can be performed, for example, by transmitting RPL DIS
or ICMPv6 Echo Request messages to the DODAG root's link-local IPv6
address and expecting replies confirming that the root is up and
reachable through the link. Care SHOULD be taken not to overload the
DODAG root with traffic due to simultaneous probes; for instance,
random backoffs can be employed to this end. It is RECOMMENDED that
the "SUSPECTED DOWN" value of LORS be attained and verification take
place if RNFD's conclusion on the state of the DODAG root is based
only on indirect observations, for example, the aforementioned growth
of the CFRC values. In contrast, for direct observations, such as
missing L2 acknowledgments, the verification MAY be skipped, with the
node's LORS effectively changing from "UP" directly to "LOCALLY
DOWN".
For consistency with RPL, when detecting potential problems with the
DODAG root, RNFD also MUST make use of RPL's independent knowledge.
More specifically, a node MUST switch its LORS from "UP" or
"SUSPECTED DOWN" directly to "LOCALLY DOWN" if a neighbor entry for
the DODAG root is removed from RPL's DODAG parent set or the neighbor
ceases to be considered reachable via its link-local IPv6 address.
Finally, while having its LORS already equal to "LOCALLY DOWN", a
node may make an observation confirming that its link with the DODAG
root is actually up. In such a case, it SHOULD set its LORS back to
"UP" but MUST NOT do so unless conditions 2-4 required for a node to
change its role from Acceptor to Sentinel all hold (see Section 5.1).
To appropriately account for the node's observations on the state of
the DODAG root, the aforementioned LORS transitions are accompanied
by changes to the node's local CFRCs as follows. Transitions between
"UP" and "SUSPECTED DOWN" do not affect either of the two CFRCs. During
a switch from "UP" or "SUSPECTED DOWN" to "LOCALLY DOWN", in turn,
the node MUST add itself to its NegativeCFRC, as explained
previously. By symmetry, a transition from "LOCALLY DOWN" to "UP"
REQUIRES the node to add itself to its PositiveCFRC, again, as
explained previously.
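The coupling between LORS transitions and CFRC updates at a Sentinel
can be sketched as a small state holder.  The example is a sketch
under stated assumptions: CFRCs are bit arrays held in Python
integers, self() sets one random bit, merge() is a bitwise OR, and
the SentinelState class with its set_lors() method is hypothetical.
The transition to "GLOBALLY DOWN" is deliberately left out.

```python
import random


class SentinelState:
    """Hypothetical local RNFD state of a node acting as Sentinel."""

    def __init__(self, bits: int = 16, seed=None):
        self.bits = bits
        self.rng = random.Random(seed)
        self.lors = "UP"
        # Assumed self(): one randomly chosen bit set.
        self.selfc = 1 << self.rng.randrange(bits)
        self.positive = self.selfc  # added on becoming Sentinel
        self.negative = 0

    def set_lors(self, new: str) -> None:
        old = self.lors
        if old == "GLOBALLY DOWN":
            raise ValueError("GLOBALLY DOWN is terminal")
        if new not in ("UP", "SUSPECTED DOWN", "LOCALLY DOWN"):
            raise ValueError("transition not covered by this sketch")
        if new == "LOCALLY DOWN" and old in ("UP", "SUSPECTED DOWN"):
            # The node adds itself to its NegativeCFRC (merge as OR).
            self.negative |= self.selfc
        elif new == "UP" and old == "LOCALLY DOWN":
            # The node adds itself anew to its PositiveCFRC.
            self.selfc = 1 << self.rng.randrange(self.bits)
            self.positive |= self.selfc
        # "UP" <-> "SUSPECTED DOWN" leaves both CFRCs unchanged.
        self.lors = new
```

Every false "LOCALLY DOWN" decision followed by a recovery may set
one more bit in each CFRC, which is why repeated incorrect
transitions push the counters toward saturation.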
Such changes to a node's local CFRCs, if performed repeatedly due to
incorrect decisions regarding the status of the node's link with the
DODAG root, may lead to those CFRCs becoming saturated. An
implementation SHOULD thus try to minimize false-positive transitions
from "UP" and "SUSPECTED DOWN" to "LOCALLY DOWN". The exact approach
depends on the specific solutions employed for assessing the state of
a link. For instance, one can utilize additional mechanisms for
increasing the confidence of individual decisions, such as during the
aforementioned verification in the "SUSPECTED DOWN" state, or can
limit the number of transitions per node, possibly in an adaptive
fashion.
5.3. Disseminating Observations and Reaching Agreement
Nodes disseminate their observations by exchanging CFRCs in the RNFD
Options embedded in link-local RPL control messages, notably DIOs and
DISs. When processing such a received option, a node -- acting as
Sentinel or Acceptor -- MUST update its PositiveCFRC and NegativeCFRC
to respectively newpc = merge(oldpc, recvpc) and newnc = merge(oldnc,
recvnc), where oldpc and oldnc are the values of the node's
PositiveCFRC and NegativeCFRC before the update, while recvpc and
recvnc are the received values of option fields PosCFRC and NegCFRC,
respectively.
As a result, the node's value of the fraction
value(NegativeCFRC)/value(PositiveCFRC) may change. If the fraction
reaches at least RNFD_CONSENSUS_THRESHOLD (with value(PositiveCFRC)
being greater than zero), then the node concludes that there is
consensus on the DODAG root being down. Accordingly, it MUST change
its LORS to "GLOBALLY DOWN" and set its PositiveCFRC and NegativeCFRC
both to infinity().
The "GLOBALLY DOWN" value of LORS is terminal: the node MUST NOT
change it and MUST NOT modify its CFRCs until it joins a new DODAG
Version. With this value of LORS, RNFD at the node MUST also prevent
RPL from having any DODAG parent and advertising any Rank other than
INFINITE_RANK.
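The processing of a received RNFD Option, together with the
consensus check and the terminal nature of "GLOBALLY DOWN", can be
sketched as follows.  The sketch assumes that CFRCs are bit arrays
held in Python integers of equal length, that merge() is a bitwise
OR, that value() counts set bits, and that infinity() is the all-ones
array.  The threshold value, the RnfdState record, and the function
name are illustrative assumptions.

```python
from dataclasses import dataclass

# Placeholder value chosen for illustration; not taken from this section.
RNFD_CONSENSUS_THRESHOLD = 0.51


def value(cfrc: int) -> int:
    """Assumed value(): the number of bits set in the counter."""
    return bin(cfrc).count("1")


@dataclass
class RnfdState:
    bits: int
    lors: str = "UP"
    positive: int = 0
    negative: int = 0


def process_rnfd_option(state: RnfdState, recvpc: int, recvnc: int) -> bool:
    """Merge received CFRCs; return True if the local CFRCs changed."""
    if state.lors == "GLOBALLY DOWN":
        return False  # terminal: LORS and CFRCs must not change
    before = (state.positive, state.negative)
    state.positive |= recvpc  # newpc = merge(oldpc, recvpc)
    state.negative |= recvnc  # newnc = merge(oldnc, recvnc)
    pos = value(state.positive)
    if pos > 0 and value(state.negative) / pos >= RNFD_CONSENSUS_THRESHOLD:
        state.lors = "GLOBALLY DOWN"
        state.positive = state.negative = (1 << state.bits) - 1
    return before != (state.positive, state.negative)
```

The Boolean result is a convenience of this sketch: a caller could
use it to decide whether the change warrants any further local
action.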
Since the RNFD Option is embedded, among others, in RPL DIO control
messages, updates to a node's CFRCs may affect the sending schedule
of these messages, which is driven by the DIO Trickle timer
[RFC6206]. It is RECOMMENDED to use for RNFD a dedicated Trickle
timer, different from RPL's original DIO Trickle timer. In such a
setting, whenever the dedicated timer fires and no DIO message
containing the RNFD Option has been sent to the link-local all-RPL-
nodes multicast IPv6 address since the previous firing, the node
sends a DIO message containing the RNFD Option to the address. In
contrast, in the absence of the dedicated Trickle timer for RNFD, an
implementation SHOULD ensure that the RNFD Option is present in
multicast DIO messages sufficiently often to quickly propagate
changes to the node's CFRCs, and notably as soon as possible after a
reset of the timer triggered by RNFD. In the remainder of this
document, we will refer to the Trickle timer utilized by RNFD --
either the dedicated one or RPL's original one, depending on the
implementation -- simply as "Trickle timer". In particular, a node
MUST reset its Trickle timer when it changes its LORS to "GLOBALLY
DOWN", so that information about the detected crash of the DODAG root
is disseminated quickly in the DODAG. Likewise, a node SHOULD reset its
Trickle timer when any of its local CFRCs changes significantly.
5.4. DODAG Root's Behavior
The DODAG root node MUST assume the role of Acceptor in RNFD and MUST
NOT ever switch this role. It MUST also monitor its LORS and local
CFRCs, so that it can react to various events.
To start with, the DODAG root MUST generate a new DODAG Version,
thereby restarting the protocol, if it changes its LORS to "GLOBALLY
DOWN", which may happen when the root has restarted after a crash or
the nodes have falsely detected its crash. It MAY also generate a
new DODAG Version if fraction value(NegativeCFRC)/value(PositiveCFRC)
approaches RNFD_CONSENSUS_THRESHOLD, so as to avoid potential
interruptions to routing.
Furthermore, the DODAG root SHOULD either generate a new DODAG
Version or increase the bit length of its CFRCs if
saturated(PositiveCFRC) becomes TRUE. This is a self-regulation
mechanism that helps adjust the CFRCs to a potentially large number
of Sentinels (see Section 6.1).
In general, issuing a new DODAG Version effectively restarts RNFD.
The DODAG root MAY thus perform this operation also in other
situations.
5.5. Activating and Deactivating the Protocol on Demand
RNFD can be activated and deactivated on demand, once per DODAG
Version. The particular policies for activating and deactivating the
protocol are outside the scope of this document. However, the
activation and deactivation SHOULD be done at the DODAG root node;
other nodes MUST comply.
More specifically, when a non-root node joins a DODAG Version, RNFD
at the node is initially inactive. The node MUST NOT activate the
protocol unless it receives for this DODAG Version a valid RNFD
Option containing some CFRCs, that is, having its Option Length field
positive. In particular, if the option accompanies the message that
causes the node to join the DODAG Version, the protocol MUST be
active from the moment of the joining. RNFD then remains active at
the node until it is explicitly deactivated or the node joins a new
DODAG Version. An explicit deactivation MUST take place when the
node receives an RNFD Option for the DODAG Version with no CFRCs,
that is, having its Option Length field equal to zero. When
explicitly deactivated, RNFD MUST NOT be reactivated unless the node
joins a new DODAG Version. In particular, when the first RNFD Option
received by the node has its Option Length field equal to zero, the
protocol MUST remain deactivated for the entire time the node belongs
to the current DODAG Version.
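The activation rules for a non-root node amount to a three-state
machine that is reset whenever the node joins a new DODAG Version.
The sketch below is illustrative: the class, its method names, and
the state labels are hypothetical, and validation of the RNFD Option
is assumed to have happened before on_option() is called.

```python
from enum import Enum


class Activation(Enum):
    INACTIVE = "inactive"        # initial state for a DODAG Version
    ACTIVE = "active"
    DEACTIVATED = "deactivated"  # explicit, sticky until a new Version


class RnfdActivation:
    """Hypothetical per-node activation tracker for RNFD."""

    def __init__(self) -> None:
        self.state = Activation.INACTIVE

    def on_join_new_version(self) -> None:
        # Joining a new DODAG Version starts over from inactive.
        self.state = Activation.INACTIVE

    def on_option(self, option_length: int) -> None:
        """Handle a valid RNFD Option for the current DODAG Version."""
        if self.state is Activation.DEACTIVATED:
            return  # no reactivation within this DODAG Version
        if option_length == 0:
            self.state = Activation.DEACTIVATED  # option without CFRCs
        elif self.state is Activation.INACTIVE:
            self.state = Activation.ACTIVE       # option with some CFRCs
```

If the option arrives in the very message that makes the node join
the DODAG Version, on_join_new_version() followed immediately by
on_option() yields the required "active from the moment of the
joining" behavior.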
When RNFD at a node is initially inactive for a DODAG Version, the
node MUST NOT attach any RNFD Option to the messages it sends (in
particular, because it may not know the desired CFRC length -- see
Section 5.6). When the protocol has been explicitly deactivated, the
node MAY also decide not to attach the option to its outgoing
messages. However, it is RECOMMENDED that it send sufficiently many
messages with the option to the link-local all-RPL-nodes multicast
IPv6 address to allow its neighbors to learn that RNFD has been
deactivated in the current DODAG Version. In particular, it MAY
reset its Trickle timer to this end but also MAY use some reactive
mechanisms, for example, replying with a unicast DIO or DIS
containing the RNFD Option with no CFRCs to a message from a neighbor
that contains the option with some CFRCs, as such a neighbor appears
not to have learned about the deactivation of RNFD.
5.6. Processing CFRCs of Incompatible Lengths
The merge() and compare() operations on CFRCs require both arguments
to be compatible, that is, to have the same bit length. However, the
processing rules for the RNFD Option (see Section 4.2) do not
necessitate this. This fact is made use of not only in the
mechanisms for activating and deactivating the protocol (see
Section 5.5), but also in mechanisms for dynamic adjustments of
CFRCs, which aim to enable deployment-specific policies (see
Section 6.1). A node thus MUST be prepared to receive the RNFD
Option with fields PosCFRC and NegCFRC of a different bit length than
the node's own PositiveCFRC and NegativeCFRC. Assuming that it has
RNFD active and that fields PosCFRC and NegCFRC in the option have a
positive length, the node MUST react as follows.
If the bit length of fields PosCFRC and NegCFRC is the same as that
of the node's local PositiveCFRC and NegativeCFRC, then the node MUST
perform the merges, as detailed previously (see Section 5.3).
If the bit length of fields PosCFRC and NegCFRC is smaller than that
of the node's local PositiveCFRC and NegativeCFRC, then the node MUST
ignore the option and MAY reset its Trickle timer.
If the bit length of fields PosCFRC and NegCFRC is greater than that
of the node's local PositiveCFRC and NegativeCFRC, then the node MUST
extend the bit length of its local CFRCs to be equal to that in the
option and set the CFRCs as follows:
o If the node's LORS is "GLOBALLY DOWN", then both its local CFRCs
MUST be set to infinity().
o Otherwise, they both MUST be set to zero(), and the node MUST
account for itself in the CFRCs so initialized. More specifically, if
the node is Sentinel, then it MUST add itself to its PositiveCFRC,
as detailed previously. In addition, if its LORS is "LOCALLY
DOWN", then it MUST also add itself to its NegativeCFRC, again, as
explained previously. Finally, the node MUST perform merges of
its local CFRCs and the ones received in the option (see
Section 5.3) and MAY reset its Trickle timer.
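The three length cases can be sketched as one handler.  The example
is a sketch under stated assumptions: CFRCs are bit arrays held in
Python integers, self() sets one random bit, merge() is a bitwise OR,
and infinity() is the all-ones array.  The Node record, the function
name, and the returned labels are hypothetical; the optional Trickle
timer reset and the consensus check that follows a merge are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    bits: int
    role: str      # "Sentinel" or "Acceptor"
    lors: str
    positive: int
    negative: int
    selfc: int = 0


def handle_option(node: Node, recv_bits: int, recvpc: int, recvnc: int,
                  rng: random.Random) -> str:
    """Process PosCFRC/NegCFRC of recv_bits bits at an RNFD-active node."""
    if recv_bits < node.bits:
        return "ignored"  # shorter CFRCs: the option is ignored
    result = "merged"
    if recv_bits > node.bits:
        result = "extended"
        node.bits = recv_bits
        if node.lors == "GLOBALLY DOWN":
            # Both CFRCs become infinity(); a merge would change nothing.
            node.positive = node.negative = (1 << recv_bits) - 1
            return result
        node.positive = node.negative = 0  # zero()
        if node.role == "Sentinel":
            node.selfc = 1 << rng.randrange(recv_bits)  # assumed self()
            node.positive |= node.selfc
            if node.lors == "LOCALLY DOWN":
                node.negative |= node.selfc
    # Equal lengths (possibly after extension): merge as in Section 5.3.
    node.positive |= recvpc
    node.negative |= recvnc
    return result
```

Nesting the "LOCALLY DOWN" test under the Sentinel test is a
simplification of this sketch; it relies on the observation that a
switch to Acceptor resets a "LOCALLY DOWN" LORS to "UP".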
In contrast, if the node is unable to extend its local CFRCs, for
example, because it lacks resources, then it MUST stop participating
in RNFD, that is, until it joins a new DODAG Version, it MUST NOT
send the RNFD Option and MUST ignore this option in received