Skip to content

Conversation

@dterrahe
Copy link
Member

@dterrahe dterrahe commented Nov 25, 2025

We sometimes find elusive bugs caused by signals being connected to handlers that don't provide the correct return value, i.e. void instead of gboolean. GTK/GObject don't have built-in signature checking when connecting signals, as this is not really natively supported in c. This PR adds such checking by extensively using the _Generic construct. For this it is necessary to separately provide the expected handler signatures for each type of signal (see the second commit which adds this to gtk.h). Since some signals expect different handlers depending on which type of object they are sent for, there is some loss of specificity, but for the common cases, for example button-press and clicked, this has already helped me find a couple of mismatches (that are now well-hidden in this rather large PR).

It is common practice to send objects as user_data that are implicitly converted to gpointer and then upon receipt in the handler explicitly converted back to the real type (obviously without any type checking happening). These now need to be declared as their real type in the handler prototype or they'll throw an error. Similarly, generic objects (like widgets) that get a signal connected are often declared as their real type (for example button) in the handler prototype. These implicit conversions now need to be made explicit (either when connecting or when receiving the signal).

So for the moment many errors are generated. Until these are largely resolved, it is easier to ignore them at compile time (so the compile is allowed to finish) and only print them at runtime. With a decent IDE it is then faster to fix a number of them at the same time. Obviously this has the drawback that one actually has to activate all the signal_connects (by opening all dialogs etc) otherwise the type mismatch will still be ignored.

Eventually, when all existing mismatches are resolved, this can be made into a compile time fail (so that CI will also catch it) by uncommenting the _Static_assert. All the checks are already done at compile time only and don't generate any code, except when an error needs to be printed.

Obviously this PR is already rather large. Every signal_connect needs to be touched, because the unnecessary casts to G_OBJECT used to hide the real object type making it impossible to check for a signature with it as the first parameter in _Generic. It may not be too useful to complete the whole exercise before GTK4, although getting rid of all the remaining false positives before the end of porting may help find real issues.

@dterrahe dterrahe marked this pull request as draft November 25, 2025 10:28
@dterrahe dterrahe force-pushed the callback_check branch 2 times, most recently from 936cac1 to f0c920c Compare November 25, 2025 16:48
@TurboGit TurboGit added this to the 5.6 milestone Nov 25, 2025
@TurboGit
Copy link
Member

Sounds like a nice project :)

@dterrahe
Copy link
Member Author

Sounds like a nice project :)

Of course when I started I thought this might be worth spending a few hours on...

I would appreciate if you had a look at the code "style" changes in the first commit, since many more of these are needed to complete this. There would be some value in merging the second commit in disabled state so it can be enabled at will to check new code. But of course the real value only comes when the static assert is enabled (maybe only in Debug mode, since we have that running on CI). For that I could really use some help. Most of the changes are quite mechanical (just some basic understanding needed and a willingness to read the changes already made as examples).

@jenshannoschwalm
Copy link
Collaborator

@dterrahe for me this seems to be another big round of important code maintenance work.

I dont have a lot on my personal agenda and can't contribute to the gtk4 project (not knowing howto do that well). I could take this pr and continue your work if you would like me to do so you can concentrate on the gtk thing. Just let me know...

Remove the G_OBJECT(wrapper) since it isn't necessary and hides the real type
void(*)() fails callback check; use void(*)(void)
@dterrahe
Copy link
Member Author

Some progress and some blockers.

The static compile time identification of the function signature (using _Generic) needs an exact match. So void(*)(gpointer) can't be used to match void(*)(GtkWidget*) for example. This makes it necessary to list all the possible function signatures that might be encountered in the wild. Using __typeof__(instance) and __typeof(user_data) adds level of flexibility, but still requires the exact type of Widget being connected to match the type of widget received by the handler. Even if it is not used in the handler at all or if and implicit conversion is handy. I've now added a bunch of GtkWidget types as acceptable "instances" in all cases (if they're used incorrectly you'll generally get a warning at runtime anyway, rather than a crash) and I'm allowing any type that's attached as data to be received as a gpointer. This reduces the number of false positives significantly and makes it much easier to find the real problem cases (and the reason I started this) where the return type is wrong. Most of those have been fixed now and the number of (probably mostly false) positives reported at runtime (just start, not opening dialogs etc) is down to 58.

It might still be nice to make all the type conversions explicit over time and then maybe these fallbacks could/should be removed. Also, the macros involved are by now ridiculous in size, when expanded,

It turns out that the static checking only works in llvm. gcc treats the const ints required_signature and found_signature as run-time const. Which means we can't fail the compile (in CI) if a wrong signature is introduced and any run-time reporting would generate code for each and every call of g_signal_connect (and still only signal problems when they are all covered in manual testing). So we have to think how we'd want to productionize this. The Debug variant built in CI is current gcc; maybe there are no obstacles to switching it to llvm @victoryforce?

Anyway, this is a little premature, because the other issue that surfaced is that when a parameter in the signature is declared const there is no way to automatically match it with an argument that is non-const. The most common case is when we pass a module *self and receive it as a const module *self. __typeof__(data) will not match in _Generic and neither will const __typeof__(data). There's no way to convert/cast self in the call without respecifying the whole type (const dt_iop_module_t *) and doing that everywhere means that we effectively disable type checking completely because the compiler assumes we know what we're doing even if we incorrectly cast wildly different types. In C++ this issue is somewhat addressed, but for C the language designers basically gave up because there's no compatible way to fix this problem and make const safe and useful. The only other "solution" is to remove most of the const specifiers (for user_data parameters) that have been painstakingly added, but I expect Pascal to have an issue with that. So it seems unlikely we'll get rid of the remaining false positives (in fact, they may increase) and the usability of this feature becomes doubtful.

Sigh. OK, after all that I now realise that I could have achieved my original goal of identifying incorrect signal handler returns types with just this:

  guint signal_id = g_signal_lookup(signal, G_OBJECT_TYPE(instance)); \
  GSignalQuery type_query; \
  g_signal_query(signal_id, &type_query); \
  if((type_query.return_type  == G_TYPE_BOOLEAN) \
      ^ (_Generic((DISABLINGPREFIX##c_handler), \
                  gboolean(*)() : TRUE, void(*)() : FALSE, gpointer: FALSE))) \
    dt_print_nts_ext("%s:%d: connecting signal %s to %s found wrong return\n", __FILE__, __LINE__, signal, #c_handler); \

This still (after all the fixes in the first commit) shows this

/src/iop/temperature.c:2149: connecting signal button-release-event to G_CALLBACK(temp_label_click) found wrong return
/src/iop/colorequal.c:3037: connecting signal size-allocate to G_CALLBACK(_area_size_callback) found wrong return
/src/iop/overlay.c:1085: connecting signal draw to G_CALLBACK(_draw_thumb) found wrong return
/src/iop/channelmixerrgb.c:4699: connecting signal button-press-event to G_CALLBACK(_commit_profile_callback) found wrong return
/src/iop/channelmixerrgb.c:4707: connecting signal button-press-event to G_CALLBACK(_run_profile_callback) found wrong return
/src/iop/colorbalance.c:1901: connecting signal button-release-event to G_CALLBACK(_cycle_layout_callback) found wrong return
/src/libs/tools/viewswitcher.c:266: connecting signal enter-notify-event to G_CALLBACK(_lib_viewswitcher_enter_leave_notify_callback) found wrong return
/src/libs/tools/viewswitcher.c:267: connecting signal leave-notify-event to G_CALLBACK(_lib_viewswitcher_enter_leave_notify_callback) found wrong return
/src/libs/tools/viewswitcher.c:266: connecting signal enter-notify-event to G_CALLBACK(_lib_viewswitcher_enter_leave_notify_callback) found wrong return
/src/libs/tools/viewswitcher.c:267: connecting signal leave-notify-event to G_CALLBACK(_lib_viewswitcher_enter_leave_notify_callback) found wrong return

@victoryforce
Copy link
Collaborator

The Debug variant built in CI is current gcc; maybe there are no obstacles to switching it to llvm @victoryforce?

Absolutely, this is just one run in the CI matrix to make sure we haven't broken the Debug build somehow. If it makes more sense to use another compiler here - why not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants