You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As described in the Binary Protocol, the Error flag was intended to be used for payload validation only. This proposal clarifies what this means exactly and aims to strongly discourage use of this bit to signal anything else, including device error states or errors in execution of long-actions.
Motivation
The state of error reporting and handling in the Harp protocol is not great. Currently, the only explicit error reporting information is the Error flag in the message format. Because of this, there has been confusion over when this bit should be used. Specifically, some devices have used this bit to signal internal error state or failure to execute a long-action.
This is a problem because by default errors reported through this bit will cause an exception to be raised by the Device node in clients, and communication will terminate. If we start assuming that the state of device internal logic needs to be propagated and handled through this bit, this would greatly complicate error handling in workflows.
At best, new operators may need to be introduced. At worst, this will drive people to always turn on the IgnoreErrors property and start being blind to all kinds of validation errors which would then silently propagate unchecked.
The Error flag can be limited to a single bit in the message protocol if we assume the context required to interpret the Error can be derived exclusively from the message contents. This already works if we match message contents to the register specification in the device.yml and is what is used in the vast majority of cases.
If instead what we want is to communicate issues with the ongoing device state, then more information than just 1-bit is definitely required. There is currently no way to map a 1-bit flag to a meaningful error message on a register-by-register basis. We would need at least an error code, and ideally a documented register address containing more information about error states. This is why we do not think it is appropriate to try and force logical errors into this single bit. For instance, we may not be able to distinguish between a logical error and a payload type, or out-of-bounds error.
The other problem of using this bit to communicate device state errors, is there is no way to specify and document this behavior in the device.yml metadata. While format errors such as message and payload types, or allowed value ranges, are declared in the register metadata, the information about special uses of the error bit cannot easily be made explicit anywhere, leaving people wondering whether they need to worry about logical errors.
By clarifying that error states in the device must be communicated via their own registers, we will be promoting explicitness in the protocol, and encouraging device developers to create specific group masks with error codes, and error register documentation for any known device-specific failure modes.
Detailed Design
Improve the binary protocol specification to clarify that the Error flag should not be used for anything else other than immediate validation of payload conformity. Adapting from the original specification (annotated):
The field [Command] has an Error flag on the 4th least significant bit. When this bit is set it means that an error has occurred. THIS IS TOO GENERAL, WE NEED TO CONSTRAIN WHICH ERRORS ARE ALLOWED HERE
Examples of possible errors can be: THIS SHOULD BE A COMPLETE LIST OF ERRORS
The controller tries to read from (or write to) a register that doesn’t exist; THIS IS FINE
The controller tries to write on a read-only register. THIS IS MISSING
The controller tries to write invalid data to a certain register; NEED TO CLARIFY THE MEANING OF INVALID DATA
The [PayloadType] doesn’t match the target register specification. THIS IS FINE
The length of the payload doesn't match the target register specification. THIS IS MISSING
As noted in bold the current specification is too ambiguous. The proposal is to constrain the Error flag to only signal payload compliance with the device schema so that errors signaled by this bit are only protocol errors.
As for the examples, even if we make these examples a complete list, we would still have an issue with the meaning of "invalid data". From the context in the above specification, we propose at least the following clarifications:
CONSIDER using the Error flag only for Reply messages.
DO NOT set the Error flag on messages of type Event.
DO NOT set the Error flag on messages sent from the device as a result of long-actions triggered by a command, or any invalid state entered by the device as a result of external or internal actions not caused by an invalid command.
A final clarification to be discussed is how we should recommend handling situations in which data might be "valid" contingent on the state of other registers, or other internal device state. As a general principle, this should be avoided as much as possible, as it requires the controller to keep track of device state.
However, this is sometimes unavoidable, especially for actions which may require configuration of a number of registers prior to the action being triggered (e.g. starting a pulse train, moving a motor), and where the action might be invalid only for specific combinations of register values (i.e. each register value may be "valid" by itself, but not the specific combination of all the register values taken together).
We propose here that this situation should NOT be handled by replying to the write command with a message flagged with the Error bit.
Instead, this should either be handled by sending a message from another register, or replying to the write with an alternative register value which is allowed (e.g. consider the case where setting an invalid frame rate results in setting the register to the closest allowed frame rate).
If sending a message from another register, the documentation of that register should clearly document the values indicating error, e.g. through a group mask.
Drawbacks
Waiting for review.
Alternatives
The impact of not clarifying expected behavior of the Error flag is proliferation of unexpected behavior in Reply messages, undocumented failure modes, and growing complexity in handling device event streams.
A complementary proposal might be to introduce a specific register address for device error handling which could be monitored by the generic Device node and resolved by the controller to throw specific exceptions following lookup using the device.yml.
Unresolved Questions
None at the moment.
Design Meetings
The text was updated successfully, but these errors were encountered:
Summary
As described in the Binary Protocol, the
Error
flag was intended to be used for payload validation only. This proposal clarifies what this means exactly and aims to strongly discourage use of this bit to signal anything else, including device error states or errors in execution of long-actions.Motivation
The state of error reporting and handling in the Harp protocol is not great. Currently, the only explicit error reporting information is the
Error
flag in the message format. Because of this, there has been confusion over when this bit should be used. Specifically, some devices have used this bit to signal internal error state or failure to execute a long-action.This is a problem because by default errors reported through this bit will cause an exception to be raised by the
Device
node in clients, and communication will terminate. If we start assuming that the state of device internal logic needs to be propagated and handled through this bit, this would greatly complicate error handling in workflows.At best, new operators may need to be introduced. At worst, this will drive people to always turn on the
IgnoreErrors
property and start being blind to all kinds of validation errors which would then silently propagate unchecked.The
Error
flag can be limited to a single bit in the message protocol if we assume the context required to interpret theError
can be derived exclusively from the message contents. This already works if we match message contents to the register specification in thedevice.yml
and is what is used in the vast majority of cases.If instead what we want is to communicate issues with the ongoing device state, then more information than just 1-bit is definitely required. There is currently no way to map a 1-bit flag to a meaningful error message on a register-by-register basis. We would need at least an error code, and ideally a documented register address containing more information about error states. This is why we do not think it is appropriate to try and force logical errors into this single bit. For instance, we may not be able to distinguish between a logical error and a payload type, or out-of-bounds error.
The other problem of using this bit to communicate device state errors, is there is no way to specify and document this behavior in the
device.yml
metadata. While format errors such as message and payload types, or allowed value ranges, are declared in the register metadata, the information about special uses of the error bit cannot easily be made explicit anywhere, leaving people wondering whether they need to worry about logical errors.By clarifying that error states in the device must be communicated via their own registers, we will be promoting explicitness in the protocol, and encouraging device developers to create specific group masks with error codes, and error register documentation for any known device-specific failure modes.
Detailed Design
Improve the binary protocol specification to clarify that the
Error
flag should not be used for anything else other than immediate validation of payload conformity. Adapting from the original specification (annotated):The field [
Command
] has an Error flag on the 4th least significant bit. When this bit is set it means that an error has occurred.THIS IS TOO GENERAL, WE NEED TO CONSTRAIN WHICH ERRORS ARE ALLOWED HERE
Examples ofpossible errors can be: THIS SHOULD BE A COMPLETE LIST OF ERRORSPayloadType
] doesn’t match the target register specification. THIS IS FINEAs noted in bold the current specification is too ambiguous. The proposal is to constrain the
Error
flag to only signal payload compliance with the device schema so that errors signaled by this bit are only protocol errors.As for the examples, even if we make these examples a complete list, we would still have an issue with the meaning of "invalid data". From the context in the above specification, we propose at least the following clarifications:
Error
flag only forReply
messages.Error
flag on messages of typeEvent
.Error
flag on messages sent from the device as a result of long-actions triggered by a command, or any invalid state entered by the device as a result of external or internal actions not caused by an invalid command.A final clarification to be discussed is how we should recommend handling situations in which data might be "valid" contingent on the state of other registers, or other internal device state. As a general principle, this should be avoided as much as possible, as it requires the controller to keep track of device state.
However, this is sometimes unavoidable, especially for actions which may require configuration of a number of registers prior to the action being triggered (e.g. starting a pulse train, moving a motor), and where the action might be invalid only for specific combinations of register values (i.e. each register value may be "valid" by itself, but not the specific combination of all the register values taken together).
We propose here that this situation should NOT be handled by replying to the write command with a message flagged with the
Error
bit.Instead, this should either be handled by sending a message from another register, or replying to the write with an alternative register value which is allowed (e.g. consider the case where setting an invalid frame rate results in setting the register to the closest allowed frame rate).
If sending a message from another register, the documentation of that register should clearly document the values indicating error, e.g. through a group mask.
Drawbacks
Waiting for review.
Alternatives
The impact of not clarifying expected behavior of the
Error
flag is proliferation of unexpected behavior inReply
messages, undocumented failure modes, and growing complexity in handling device event streams.A complementary proposal might be to introduce a specific register address for device error handling which could be monitored by the generic
Device
node and resolved by the controller to throw specific exceptions following lookup using thedevice.yml
.Unresolved Questions
None at the moment.
Design Meetings
The text was updated successfully, but these errors were encountered: