sntp: fix the handling of the rtc counter wrapping #148
base: master
Conversation
```c
#define tim_ref (RTC.SCRATCH[2])
// Calibration value
#define cal (RTC.SCRATCH[3])

// To protect access to the above.
xSemaphoreHandle sntp_sem = NULL;
```
Can this be declared static, as it's only accessed inside this source file?
Thank you for the feedback. Changed the semaphore to be static. I have since tried other ways to improve this code and think more could be done, but this PR fixes some significant issues. For what it's worth, I found the RTC and/or the calibration to be rather unstable, so I gave up on using SNTP for obtaining times for a data-logging application; instead I will log the rtc count and the calibration and have the time estimated when analysing the data.
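For reference, a minimal sketch of the changed declaration, assuming the legacy FreeRTOS `xSemaphoreHandle` type already used in this file:

```c
// File-scope mutex guarding tim_ref, cal and the wrap-tracking state.
// Declared static so it is not visible outside this source file.
static xSemaphoreHandle sntp_sem = NULL;
```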
Just to add a bit more: I also tried to use the SNTP times to estimate the calibration, only using the internal calibration to reset it when the time step was large. The corrections to the calibration were filtered, and it did converge to give much better time than just using the internal calibration. However, it still received impulse errors larger than a second and then took time to settle. When there was a network outage it drifted significantly; the error had typically not settled, so it had little chance. I guess it needed to also take the jitter etc. into account.

Also, resetting the time with each SNTP response causes jumps in the time, and these can be backwards, so I think that is the wrong approach. When the time difference is within a small bound it should probably just make small periodic adjustments, to avoid jumps and to avoid going backwards. Surely there is a lot of state-of-the-art NTP code that could be followed.

For the application at hand it seemed less demanding on the node to just log the times when the node interacts with the server etc., and it might even have GPS events logged with a time in some deployments. A smooth timeline might then be fitted to the events during analysis.
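As a rough illustration of the "small periodic adjustments" idea, here is a sketch of stepping versus slewing; all names and thresholds are assumptions for the sketch, not code from this PR:

```c
#include <stdint.h>

// Illustrative constants; these values are assumptions, not from the PR.
#define STEP_THRESHOLD_US  1000000   // step the clock only if the error exceeds 1 s
#define SLEW_INTERVAL_US   15000000  // expected interval between SNTP updates
#define MAX_SLEW_PPM       500       // cap the rate correction at 500 ppm

// offset_us is the measured (server - local) error from the last SNTP exchange.
// step_us receives a one-off step to apply (0 when slewing); the return value
// is a parts-per-million rate correction to fold into the calibration.
static int32_t clock_correction(int64_t offset_us, int64_t *step_us)
{
    if (offset_us > STEP_THRESHOLD_US || offset_us < -STEP_THRESHOLD_US) {
        // Large error (e.g. first sync or after an outage): accept a step.
        *step_us = offset_us;
        return 0;
    }
    // Small error: leave the value alone and trim the rate instead, so the
    // clock never jumps and never runs backwards.
    *step_us = 0;
    int64_t ppm = (offset_us * 1000000) / SLEW_INTERVAL_US;
    if (ppm > MAX_SLEW_PPM)  ppm = MAX_SLEW_PPM;
    if (ppm < -MAX_SLEW_PPM) ppm = -MAX_SLEW_PPM;
    return (int32_t)ppm;
}
```

The returned ppm value would be folded into whatever calibration the clock already applies, so small errors are absorbed by running slightly faster or slower rather than by jumping.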
Thanks for this fix and the comments. I think the suggestions you make sound sensible, although I agree that for a lot of purposes this is good enough. I'm keen to merge this, but I just want to check first if @doragasu - who contributed the original SNTP implementation - has any comments.
I had a quick look and it looks good, thanks for the work! I already knew that synchronizing access to the time keeping structures was needed, but wasn't brave enough to try because I don't know newlib internals, and I thought that it provided a built-in synchronization mechanism (instead of having to use semaphores). But after reading your patch, I suppose that's not the case. Other than that, I'd like to throw a comment: inside the update it looks like the RTC counter could be read before taking the semaphore. If I get some time I'll try reviewing the code more in-depth and testing it.
This reworks the time accounting to handle wrapping of the RTC counter. It currently assumes that the time is accessed more frequently than the counter would wrap. It also adds a semaphore to synchronize access to the time storage.
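For readers following along, a minimal sketch of the wrap-handling approach described above; `read_rtc_counter()` is a placeholder, and the PR itself keeps its state in the RTC scratch registers rather than in plain statics:

```c
#include <stdint.h>
#include "FreeRTOS.h"
#include "semphr.h"

// Placeholder for however the 32-bit RTC counter is read on this platform.
extern uint32_t read_rtc_counter(void);

static xSemaphoreHandle sntp_sem;   // created elsewhere with xSemaphoreCreateMutex()
static uint32_t last_count;         // counter value at the previous read
static uint64_t total_ticks;        // accumulated ticks since the time was set

// Returns the total tick count, extending the 32-bit counter to 64 bits.
// Correct as long as it is called at least once per counter wrap period.
uint64_t rtc_total_ticks(void)
{
    xSemaphoreTake(sntp_sem, portMAX_DELAY);
    uint32_t now = read_rtc_counter();
    // Unsigned subtraction gives the elapsed ticks even across a wrap.
    total_ticks += (uint32_t)(now - last_count);
    last_count = now;
    uint64_t ticks = total_ticks;
    xSemaphoreGive(sntp_sem);
    return ticks;
}
```

Because the subtraction is done in unsigned 32-bit arithmetic, a single wrap between reads is accounted for correctly, which is why the time must be read more often than the counter wraps.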
Thank you for the feedback. @doragasu the mutex protects the state that handles the counter wrapping too, so if the RTC counter were read before taking the mutex then another task might win a race to update the tim_ref.
@Iosen I understand, thanks for explaining. If the timer is read inside the locked section, the race is avoided. OK, having that cleared, I have absolutely no problem with the changes, they look good to me.

BTW, when I did some tests of the implementation's accuracy, I saw that it is heavily dependent on the network you use. When I did tests at my workplace, I got much lower correction values each time SNTP was updated than when I did the same tests at home (my home network is slower and has greater ping). As SNTP does not take RTT into account, I suppose this is expected. I don't think it makes too much sense trying to improve this implementation's accuracy: if you want better accuracy, maybe the correct thing would be to implement a full-blown NTP client.
About the accuracy of the algorithm, I was looking at the relevant RFC 4330, and in section 5 there is a simple algorithm that compensates for the round-trip time of the packets. I've hacked together a proof of concept and it seems to work well here: I converted every timestamp to int64_t and changed sntp_update_rtc() to take an int64_t offset instead. However, I also deeply reworked the source of sntp.c, because it's a mess of #ifdefs and I wanted to understand the code before doing anything, so I'm afraid the changes I made are not directly applicable.
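For reference, a minimal sketch of the RFC 4330 section 5 arithmetic with 64-bit microsecond timestamps; the names are illustrative and not taken from the proof of concept above:

```c
#include <stdint.h>

// T1: client transmit, T2: server receive, T3: server transmit,
// T4: client receive -- all in microseconds on their respective clocks.
// RFC 4330, section 5:
//   offset = ((T2 - T1) + (T3 - T4)) / 2
//   delay  =  (T4 - T1) - (T3 - T2)
typedef struct {
    int64_t offset_us;   // how far the local clock lags the server
    int64_t delay_us;    // round-trip delay of the exchange
} sntp_sample_t;

static sntp_sample_t sntp_compute(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
    sntp_sample_t s;
    s.offset_us = ((t2 - t1) + (t3 - t4)) / 2;
    s.delay_us  = (t4 - t1) - (t3 - t2);
    return s;
}
```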
After a lot of experiments this is how it converges with an update interval of 15 sec (Delay and Offset are in usecs):
While the offsets become very small, it seems to me that the embedded RTC is quite unstable. The calibration value keeps increasing and still cannot compensate for the offsets; and then suddenly the same calibration value is too big. With longer update intervals the situation gets worse. Besides, my code updates the RTC.SCRATCH[0..1] locations directly, so the clock is not monotonic. I tried to adjust gradually, but with those fluctuations it doesn't really work. For the record, this is how the sntp_update_rtc() function looks right now:
I see there is a fix for the race conditions in sntp_fun, but nobody merged it?