Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/string/strdup/: STRNDUPA(): Reimplement in terms of strndupa(3) #1189

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

alejandro-colomar
Copy link
Collaborator

@alejandro-colomar alejandro-colomar commented Jan 18, 2025

Cc: @chqrlie


Revisions:

v2
  • Implement STRDUPA() instead of a macro that checks if the argument is a string literal. This is what we want in the first place.
$ git range-diff master gh/sl sl 
1:  76257601 < -:  -------- lib/typetraits.h: is_string_literal(): Add macro
-:  -------- > 1:  dd4fa7bb lib/strdup/: STRDUPA(): Add macro
v2b
  • Fix commit message
$ git range-diff master gh/sl sl 
1:  dd4fa7bb ! 1:  6cc16778 lib/strdup/: STRDUPA(): Add macro
    @@ Metadata
     Author: Alejandro Colomar <[email protected]>
     
      ## Commit message ##
    -    lib/strdup/: STRDUPA(): Add macro
    +    lib/string/strdup/: STRDUPA(): Add macro
     
         This is like strdupa(3), but we make sure that the argument is a string
         literal, which makes sure that we won't have a stack overflow.
v3
  • Simplify STRNDUPA().
  • Make comment consistent with the one in STRNDUPA().
$ git range-diff master gh/sl sl 
1:  6cc16778 ! 1:  1838660c lib/string/strdup/: STRDUPA(): Add macro
    @@ lib/string/strdup/strdupa.h (new)
     +#include <string.h>
     +
     +
    -+// strdupa(3), but make sure that the argument is a string literal.
    ++// strdupa(3), but ensure that 's' is a string literal.
     +#define STRDUPA(s)  strdupa("" s "")
     +
     +
-:  -------- > 2:  9a4b3426 lib/string/strdup/strndupa.h: STRNDUPA(): Simplify implementation
v4
  • Implement strndupa(3), for musl.
$ git range-diff shadow/master gh/sl sl 
1:  1838660c = 1:  1838660c lib/string/strdup/: STRDUPA(): Add macro
-:  -------- > 2:  0e95f992 lib/string/strdup/strndupa.h: Add strndupa(3)
2:  9a4b3426 ! 3:  9227a343 lib/string/strdup/strndupa.h: STRNDUPA(): Simplify implementation
    @@ Metadata
      ## Commit message ##
         lib/string/strdup/strndupa.h: STRNDUPA(): Simplify implementation
     
    -    I don't know why I implemented it so complicated.  :|
    -
    -    Probably, I had implemented XSTRNDUP() recently, and I just had that
    -    too recent in memory.
    -
         Signed-off-by: Alejandro Colomar <[email protected]>
     
      ## lib/string/strdup/strndupa.h ##
    @@ lib/string/strdup/strndupa.h
      
      
     @@
    - 
    - #include <config.h>
    - 
    --#include <alloca.h>
      #include <string.h>
      
      #include "sizeof.h"
     -#include "string/strcpy/strncat.h"
      
      
    - // Similar to strndupa(3), but ensure that 's' is an array.
    + #ifndef  strndupa
    +@@
    + #endif
    + 
    + 
    +-// Similar to strndupa(3), but ensure that 's' is an array.
     -#define STRNDUPA(s)                                                           \
     -(                                                                             \
     -  STRNCAT(strcpy(alloca(strnlen(s, NITEMS(s)) + 1), ""), s)             \
     -)
    -+#define STRNDUPA(s)  strndupa(s, NITEMS(s))
    ++// strndupa(3), but ensure that 's' is an array.
    ++#define STRNDUPA(s)      strndupa(s, NITEMS(s))
      
      
      #endif  // include guard
v4b
  • Fix commit message.
$ git range-diff master gh/sl sl 
1:  1838660c = 1:  1838660c lib/string/strdup/: STRDUPA(): Add macro
2:  0e95f992 ! 2:  8a584bb3 lib/string/strdup/strndupa.h: Add strndupa(3)
    @@ Metadata
     Author: Alejandro Colomar <[email protected]>
     
      ## Commit message ##
    -    lib/string/strdup/strndupa.h: Add strndupa(3)
    +    lib/string/strdup/strndupa.h: strndupa(3): Add macro
     
         musl doesn't provide strndupa(3).
     
3:  9227a343 = 3:  f848a85d lib/string/strdup/strndupa.h: STRNDUPA(): Simplify implementation
v4c
  • Remove superfluous comment. Now after the simplification it's obvious.
$ git range-diff master gh/sl sl 
1:  1838660c = 1:  1838660c lib/string/strdup/: STRDUPA(): Add macro
2:  8a584bb3 = 2:  8a584bb3 lib/string/strdup/strndupa.h: strndupa(3): Add macro
3:  f848a85d ! 3:  3bece290 lib/string/strdup/strndupa.h: STRNDUPA(): Simplify implementation
    @@ lib/string/strdup/strndupa.h
     -(                                                                             \
     -  STRNCAT(strcpy(alloca(strnlen(s, NITEMS(s)) + 1), ""), s)             \
     -)
    -+// strndupa(3), but ensure that 's' is an array.
     +#define STRNDUPA(s)      strndupa(s, NITEMS(s))
      
      
v5
  • Drop STRDUPA(), since the PR that needs it might actually not need it.
  • Use 2 spaces.
$ git range-diff master gh/sl sl 
1:  1838660c < -:  -------- lib/string/strdup/: STRDUPA(): Add macro
2:  8a584bb3 = 1:  4ec34e64 lib/string/strdup/strndupa.h: strndupa(3): Add macro
3:  3bece290 ! 2:  f8f938a8 lib/string/strdup/strndupa.h: STRNDUPA(): Simplify implementation
    @@ lib/string/strdup/strndupa.h
     -(                                                                             \
     -  STRNCAT(strcpy(alloca(strnlen(s, NITEMS(s)) + 1), ""), s)             \
     -)
    -+#define STRNDUPA(s)      strndupa(s, NITEMS(s))
    ++#define STRNDUPA(s)  strndupa(s, NITEMS(s))
      
      
      #endif  // include guard

@alejandro-colomar

This comment was marked as outdated.

lib/typetraits.h Outdated Show resolved Hide resolved
@alejandro-colomar alejandro-colomar changed the title lib/typetraits.h: is_string_literal(): Add macro lib/string/strdup/: STRDUPA(): Add macro Jan 18, 2025
@alejandro-colomar alejandro-colomar added the Simpler A good issue for a new beginner label Jan 18, 2025
@alejandro-colomar alejandro-colomar force-pushed the sl branch 2 times, most recently from 9a4b342 to 9227a34 Compare January 19, 2025 15:10
@alejandro-colomar
Copy link
Collaborator Author

Huh! Alpine has strdupa(3) but not strndupa(3).

@chqrlie
Copy link

chqrlie commented Jan 19, 2025

Hello @alejandro-colomar,

I am not sure why you cc'd me on this PR... what puzzles me is the sheer amount of code for such simple tasks and the need for dubious and risky optimisations for utilities for which the only focus should be correctness and sturdiness, not efficiency. strdupa and friends do not seem a good approach to me. strdup and strndup along with explicit memory management seems easier to get right.

Best regards

Chqrlie.

@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 19, 2025

Hi @chqrlie ,

I am not sure why you cc'd me on this PR...

I was curious about your opinion of "" s "" to make sure that a macro argument is a string literal. You, because of https://stackoverflow.com/questions/5691232/can-i-determine-if-an-argument-is-string-literal/5726814#comment108597005_5726814. I didn't like any of the approaches shown there, and think "" s "" should be enough (I couldn't come up with an argument that would break it), but since you found issues with a similar one, maybe you can review mine too. :)

what puzzles me is the sheer amount of code for such simple tasks and the need for dubious and risky optimisations for utilities for which the only focus should be correctness and sturdiness, not efficiency. strdupa and friends do not seem a good approach to me. strdup and strndup along with explicit memory management seems easier to get right.

The reason to avoid strdup(3) is to make sure that a function can never fail.

See #1188 (comment). I want to call STRDUPA() there.

Adding error handling would mean that we need to decide what to do on error, which isn't necessarily an easy decision.

Since strdupa(3) cannot fail, and should be safe for small string literals, it's actually easier to use than strdup(3) in this case. But yeah, we minimize the use of alloca(3). We have only 3 calls to STRNDUPA(), and I expect to only add a similar amount (or less) of STRDUPA() calls.

BTW, we only use STRNDUPA() with input from utmp(5) members, which are 32-byte arrays or so.

Best regards,
Alex

@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 19, 2025

Another approach would be to use a compound literal (char []) {"" s ""} where we need a writable string literal, which except for having a shorter lifetime than strdupa(3), is equivalent to it. However, I find it more readable to use strdupa("" s "") (it's easier to grep(1)).

@chqrlie
Copy link

chqrlie commented Jan 19, 2025

I am not sure why you cc'd me on this PR...

I was curious about your opinion of "" s "" to make sure that a macro argument is a string literal. You, because of https://stackoverflow.com/questions/5691232/can-i-determine-if-an-argument-is-string-literal/5726814#comment108597005_5726814. I didn't like any of the approaches shown there, and think "" s "" should be enough (I couldn't come up with an argument that would break it), but since you found issues with a similar one, maybe you can review mine too. :)

I thought it might be related to this post on StackOverflow (among others...) and indeed it should be enough for this context. It would not catch pathological cases such as STRDUPA(""[0] + "Hello") which is still technically a string constant or STRDUPA(""[0] + p + *"") which can be used to wrap any char pointer.

what puzzles me is the sheer amount of code for such simple tasks and the need for dubious and risky optimisations for utilities for which the only focus should be correctness and sturdiness, not efficiency. strdupa and friends do not seem a good approach to me. strdup and strndup along with explicit memory management seems easier to get right.

The reason to avoid strdup(3) is to make sure that a function can never fail.
See #1188 (comment). I want to call STRDUPA() there.
Adding error handling would mean that we need to decide what to do on error, which isn't necessarily an easy decision.

If there is a reason for strdup() to fail, the same amount of memory allocated by strdupa() would probably crash too. It is better to test for failure and exit gracefully than to hope for the best and crash.

Another advantage of strdupa() is its inline implementation, which prevents C library hijacking, which is a potential problem for SUID programs. I'm not savvy enough to know if such attacks can be crafted, I hope SUID programs are protected from them, but static linking with the bare minimum support functions seems a good idea.

Since strdupa(3) cannot fail, and should be safe for small string literals, it's actually easier to use than strdup(3) in this case. But yeah, we minimize the use of alloca(3). We have only 3 calls to STRNDUPA(), and I expect to only add a similar amount (or less) of STRDUPA() calls.

You could also use local arrays where you copy the string literals or compound literals whose lifetime is more precisely defined.

BTW, we only use STRNDUPA() with input from utmp(5) members, which are 32-byte arrays or so.

The whole utmp(5) interface is quite obsolete, I am not sure what supersedes it in linux systems, and error prone too. These strings are not necessarily null terminated IIRC.

Always happy to help.
Cheers.

Chqrlie.

@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 19, 2025

I thought it might be related to this post on StackOverflow (among others...) and indeed it should be enough for this context. It would not catch pathological cases such as STRDUPA(""[0] + "Hello") which is still technically a string constant or STRDUPA(""[0] + p + *"") which can be used to wrap any char pointer.

Heh, yeah, I guess it's good enough. We should be able to catch those in code review. :)

I eventually thought of implementing it as

#define STRDUPA(s)  STRNDUPA("" s "")

which would make sure that the input is also an array, which would reject these pathological cases, by requiring an array.
But I guess it's not worth it. (Or maybe it is...)

what puzzles me is the sheer amount of code for such simple tasks and the need for dubious and risky optimisations for utilities for which the only focus should be correctness and sturdiness, not efficiency. strdupa and friends do not seem a good approach to me. strdup and strndup along with explicit memory management seems easier to get right.

The reason to avoid strdup(3) is to make sure that a function can never fail.
See #1188 (comment). I want to call STRDUPA() there.
Adding error handling would mean that we need to decide what to do on error, which isn't necessarily an easy decision.

If there is a reason for strdup() to fail, the same amount of memory allocated by strdupa() would probably crash too. It is better to test for failure and exit gracefully than to hope for the best and crash.

Yeah, strdup(3) shouldn't fail either. But then, I would find it gross calling strdup(3) and not checking for errors, so I would probably add them anyway. And if we add the error handling, we still need to think on what to do in that case, and I don't think there's a good answer to that in this specific use case.

Plus, actually crashing might be a good thing to do here. If I were to call strdup(3) here, I would probably just exit(1) on error.

Plus, strdupa(3) makes it easier to justify why we add guarantees that the input is a string literal. With strdup(3), since we would have error handling anyway, we could just accept anything, which itself could prompt strdup(3) to actually fail.

Another advantage of strdupa() is its inline implementation, which prevents C library hijacking, which is a potential problem for SUID programs. I'm not savvy enough to know if such attacks can be crafted, I hope SUID programs are protected from them, but static linking with the bare minimum support functions seems a good idea.

Hmmm.

Since strdupa(3) cannot fail, and should be safe for small string literals, it's actually easier to use than strdup(3) in this case. But yeah, we minimize the use of alloca(3). We have only 3 calls to STRNDUPA(), and I expect to only add a similar amount (or less) of STRDUPA() calls.

You could also use local arrays where you copy the string literals or compound literals whose lifetime is more precisely defined.

Actually, I recently had a discussion with a WG14 member, and he showed me some danger with compound literals passed as arguments, in combination with GNU's ({}), which we use in this project in a few places.

#define overwrite(s)  \
({ \
        strcpy(s, "bar"); \
})


puts(overwrite((char []){"foo"}));

The lifetime of the CL has finished before puts(3) tries to read it, so UB. strdup(3) has a safer lifetime.

BTW, we only use STRNDUPA() with input from utmp(5) members, which are 32-byte arrays or so.

The whole utmp(5) interface is quite obsolete, I am not sure what supersedes it in linux systems,

Yes, utmp(5) is deprecated. There's utmpx(5) and utmps, and those are not necessarily deprecated. And they're mostly similar, I think. But yeah, we had some changes recently, and expect we'll do more in the future.

For now, I'm just cleaning up string and memory handling, to make code safer, without thinking too much on that.

and error prone too. These strings are not necessarily null terminated IIRC.

Yep, we're quite aware of the lack of null termination with utmp(5). I did some extensive fixes there, and came up with a conclusion:

For using those non-strings, always copy them into a proper string first, with STRNDUPA(), STRNDUP(), or STRNCAT() (depending on the lifetime we want, and if we had a buffer already). These macros make sure that the input is an array (this makes sure we don't pass random pointers to it, but actual utmp(5) fields, which are small arrays).

$ grep -rn 'STRNDUPA('
src/logoutd.c:57:	user = STRNDUPA(ut->ut_user);
src/logoutd.c:58:	line = STRNDUPA(ut->ut_line);
src/logoutd.c:228:			         STRNDUPA(ut->ut_user),
lib/string/strdup/strndupa.h:19:#define STRNDUPA(s)                                                           \
$ grep -rn 'STRNDUP('
lib/string/strdup/xstrndup.h:19:#define XSTRNDUP(s)                                                           \
lib/utmp.c:192:		*out = XSTRNDUP(ut->ut_host);
lib/utmp.c:261:		hostname = XSTRNDUP(ut->ut_host);
$ grep -rn 'STRNCAT('
src/logoutd.c:208:			STRNCAT(tty_name, ut->ut_line);
lib/string/strcpy/strncat.h:16:#define STRNCAT(dst, src)  strncat(dst, src, NITEMS(src))
lib/string/strdup/strndupa.h:21:	STRNCAT(strcpy(alloca(strnlen(s, NITEMS(s)) + 1), ""), s)             \
lib/string/strdup/xstrndup.h:21:	STRNCAT(strcpy(XMALLOC(strnlen(s, NITEMS(s)) + 1, char), ""), s)      \

We work with those strings, and eventually copy back into utmp(5) fields with STRNCPY(), which makes sure that the destination is an array (so we avoid passing bogus pointers to it).

$ grep -rn 'STRNCPY(' lib
lib/string/strcpy/strncpy.h:16:#define STRNCPY(dst, src)  strncpy(dst, src, NITEMS(dst))
lib/utmp.c:274:	STRNCPY(utent->ut_line, line);
lib/utmp.c:276:		STRNCPY(utent->ut_id, ut->ut_id);
lib/utmp.c:279:		STRNCPY(utent->ut_id, line + 3);
lib/utmp.c:282:	STRNCPY(utent->ut_name, name);
lib/utmp.c:284:	STRNCPY(utent->ut_user, name);
lib/utmp.c:288:		STRNCPY(utent->ut_host, hostname);
lib/log.c:86:	STRNCPY(newlog.ll_host, host);

Those 4 macros are the only string.h-based APIs allowed to interact with utmp(5) members in this project, and those APIs are not allowed to be used with anything other than utmp(5) in this project. These guidelines applied in review, plus the array checks to prevent accidents, seem to work quite well.

Always happy to help.

Thanks! :-)

Cheers.

Chqrlie.

Cheers,
Alex

@stevegrubb
Copy link
Contributor

Note MUSL C does not have a strdupa function. I had to work around it with alloca & strcpy on the audit project.

@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 21, 2025

Note MUSL C does not have a strdupa function. I had to work around it with alloca & strcpy on the audit project.

Hi!

It seems it does have strdupa(3). What it doesn't have is strndupa(3) (which is a niche function for utmpx(5) members).

alx@devuan:~/src/musl/libc/master$ grepc strdupa .
./include/string.h:#define	strdupa(x)	strcpy(alloca(strlen(x)+1),x)
alx@devuan:~/src/musl/libc/master$ grepc strndupa .
alx@devuan:~/src/musl/libc/master$ git blame -- include/string.h | grep strdupa
419ae6d5c (Rich Felker   2012-05-22 21:52:08 -0400  92) #define	strdupa(x)	strcpy(alloca(strlen(x)+1),x)

@alejandro-colomar alejandro-colomar changed the title lib/string/strdup/: STRDUPA(): Add macro lib/string/strdup/: STRNDUPA(): Reimplement in terms of strndupa(3) Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Simpler A good issue for a new beginner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants