impl Display for CStr #550

Open
Darksonn opened this issue Feb 27, 2025 · 7 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@Darksonn

Darksonn commented Feb 27, 2025

Proposal

Problem statement

In multi-language projects that need to perform interop with C or C++, you will often need to manipulate most of your strings using the core::ffi::CStr type rather than the usual str type, because you need the string to be nul-terminated before you can pass it into the C/C++ codebase. This means that working with nul-terminated strings should be somewhat convenient.

Unfortunately, it's currently very difficult to print a nul-terminated string. The type does not implement Display and there's no display() function either. Currently, you have to do something like this to print it:

println!("{}", my_string.to_string_lossy());

Or use an extension trait for CStr.
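For illustration, such an extension trait could look roughly like the sketch below. The trait name `CStrDisplayExt` and its `display` method are hypothetical and not part of the standard library; the body simply forwards to `to_string_lossy`.

```rust
use std::borrow::Cow;
use std::ffi::CStr;

/// Hypothetical extension trait (not part of std): adds a `display()` helper to `CStr`.
pub trait CStrDisplayExt {
    /// Returns the string with invalid UTF-8 sequences replaced by U+FFFD.
    fn display(&self) -> Cow<'_, str>;
}

impl CStrDisplayExt for CStr {
    fn display(&self) -> Cow<'_, str> {
        // Borrows when the bytes are valid UTF-8, allocates a String otherwise.
        self.to_string_lossy()
    }
}
```

With the trait in scope, a call site would read `println!("{}", my_string.display())`, which still does not allow the inline `{my_string}` form.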

What about Debug?

It doesn't print the right thing.

println!("{:?}", c"Hello, here is a string: \n\"\'\xff.");
"Hello, here is a string: \n\"\'\xff."

It wraps the string in quotes, and also escapes various characters such as quotes and newlines.
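To make the difference concrete, here is a small sketch that renders the same C string both ways. The helper name `debug_and_lossy` is made up for this example.

```rust
use std::ffi::CStr;

/// Returns (Debug rendering, lossy rendering) of the same C string.
/// Hypothetical helper, only meant to contrast the two outputs.
pub fn debug_and_lossy(s: &CStr) -> (String, String) {
    (format!("{:?}", s), s.to_string_lossy().into_owned())
}
```

For an input containing a newline, the Debug rendering includes the surrounding quotes and a literal backslash-n, while the lossy rendering contains the actual newline character.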

Motivating examples or use cases

This is motivated by the Linux Kernel, where we currently have a custom CStr type that implements Display. We would like to move away from the custom CStr so that we can use c"" literals. However, there are many places where we print a nul-terminated string for various reasons.

dev_debug!("Registered {name}.");

This becomes:

use kernel::prelude::*; // for CStrExt

dev_debug!("Registered {}.", name.display());

Ideally we would like to be able to continue printing strings with the original syntax.

We don't really care about UTF-8 here, and the thing we are printing to doesn't either. Printing replacement characters, the way String::from_utf8_lossy does, would be fine.

Solution sketch

The solution sketch is to implement Display for CStr and CString. The implementation would be the same as for the ByteStr type.
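As a rough sketch of what such an impl might do, the wrapper below prints valid UTF-8 as-is and writes U+FFFD for each invalid sequence, without allocating. The wrapper type `LossyCStr` is hypothetical and exists only because a foreign trait cannot be implemented for `CStr` outside the standard library; this is not the actual ByteStr code, and it ignores width and padding flags. It assumes a toolchain where `<[u8]>::utf8_chunks` is available.

```rust
use std::ffi::CStr;
use std::fmt;

/// Hypothetical wrapper standing in for a direct `impl Display for CStr`.
pub struct LossyCStr<'a>(pub &'a CStr);

impl<'a> fmt::Display for LossyCStr<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `to_bytes()` excludes the trailing NUL.
        for chunk in self.0.to_bytes().utf8_chunks() {
            // Valid UTF-8 is written through unchanged.
            f.write_str(chunk.valid())?;
            // Each invalid sequence becomes one replacement character.
            if !chunk.invalid().is_empty() {
                f.write_str("\u{FFFD}")?;
            }
        }
        Ok(())
    }
}
```

A call such as `format!("{}", LossyCStr(name))` should then produce the same text as `name.to_string_lossy()`.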

Alternatives

The primary question to consider here is how to deal with bytes that are not valid utf-8.

Add a .display() function

The Path type has a similar problem, but the standard library has chosen to handle it by adding a .display() method. The Path type does not implement the Display trait directly.

I think that we should not repeat this solution for CStr. The Path type carries with it the intent that you are being careful with your paths and that you want to avoid accidentally breaking a path by round-tripping it through the String type. The CStr type does not carry the same intent with it, and implementing Display is more convenient because it lets you print with the "{name}" syntax instead of having to do "{}", name.display().
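For comparison, this is what the Path approach looks like at a call site today. The function name `describe` is invented for the example; `Path::display` is the existing standard library method.

```rust
use std::path::Path;

/// Formats a message that includes a path. `Path` has no `Display` impl,
/// so the explicit `.display()` adapter is required here.
pub fn describe(path: &Path) -> String {
    format!("opening {}", path.display())
}
```

Writing `format!("opening {path}")` instead would not compile, which is the ergonomic cost the proposal wants to avoid for CStr.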

How to escape the string

This proposal says that the default way to print a CStr should be to use replacement characters rather than some other form of escaping. This is because CStr printing is generally used to display the string to a user, and � conveys that the data is invalid much better than \xff does to a non-technical user.
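The two options can be compared side by side with existing standard library calls. The helper name `lossy_and_escaped` is hypothetical; `String::from_utf8_lossy` and `<[u8]>::escape_ascii` are the real APIs being contrasted.

```rust
/// Returns (replacement-character rendering, ASCII-escaped rendering)
/// for the same bytes. Hypothetical helper for illustration only.
pub fn lossy_and_escaped(bytes: &[u8]) -> (String, String) {
    (
        String::from_utf8_lossy(bytes).into_owned(),
        bytes.escape_ascii().to_string(),
    )
}
```

For the bytes `caf` followed by `0xff`, the first rendering ends in � and the second ends in the four characters `\xff`.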

Links and related work

Zulip thread: zulip
Adding OsStr::display: tracking issue
LKML thread: lkml

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@Darksonn Darksonn added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Feb 27, 2025
@BurntSushi
Member

The new ByteStr type currently implements Display in the way you suggest here: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#trait-implementations-1

I'm unsure about the alternate Display impl here.

Is there a semantic difference between ByteStr and CStr? I think the Display impl for ByteStr is fine because the type is explicitly documented to be conventionally UTF-8. So there's an expectation that the data is "string-like," and more specifically, UTF-8. That is in contrast to &[u8] which carries no such connotation. And indeed, the Display impl on ByteStr (along with its Debug impl) is part of the point of opting into using a type like ByteStr.

Does a CStr have a similar broadly applicable connotation? I think so, although I'm not as steeped in C idioms. And I'm unsure about the UTF-8 expectation.

@Darksonn
Author

Darksonn commented Feb 27, 2025

Oh I had not heard of ByteStr. Yes, I think ByteStr has exactly the same semantics as CStr. In fact, perhaps CStr should Deref to ByteStr?

@Darksonn
Author

I updated the solution sketch to say that this shares the impl with ByteStr.

@joshtriplett
Member

I kinda love the idea of a Deref impl here.

And yes, I think the Display impls should match, other than omitting the trailing NUL in the case of CStr/CString.
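The trailing NUL distinction already exists in the current CStr API, as the following sketch shows. The helper name `payload_and_raw` is made up; `to_bytes` and `to_bytes_with_nul` are existing methods.

```rust
use std::ffi::CStr;

/// Returns the bytes without and with the trailing NUL terminator.
/// A Display impl would presumably print only the first of the two.
pub fn payload_and_raw(s: &CStr) -> (&[u8], &[u8]) {
    (s.to_bytes(), s.to_bytes_with_nul())
}
```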

@hanna-kruppe

hanna-kruppe commented Feb 27, 2025

CStr -> ByteStr would be an expensive operation (strlen) if the plan of making &CStr a thin pointer without length metadata is ever carried out. It’s not the first CStr API that would go from O(1) to O(n) if length is no longer stored, but since Deref is often invoked implicitly, it would be by far the hardest to consciously avoid.

Personally I think the probability of the “thin pointer CStr” change ever happening is already low and getting lower every year, but a Deref impl feels like a novel nail in the coffin.
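The cost concern can be sketched with a hypothetical thin-pointer type. `ThinCStr` below is invented for this example and is not a planned std API; it stores only a pointer, so every implicit deref has to rescan for the terminator via `CStr::from_ptr`.

```rust
use std::ffi::{c_char, CStr};
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical thin-pointer C string: a pointer with no stored length.
pub struct ThinCStr<'a> {
    ptr: *const c_char,
    _marker: PhantomData<&'a CStr>,
}

impl<'a> ThinCStr<'a> {
    pub fn new(s: &'a CStr) -> Self {
        ThinCStr { ptr: s.as_ptr(), _marker: PhantomData }
    }
}

impl<'a> Deref for ThinCStr<'a> {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        // SAFETY: `ptr` came from a valid `&'a CStr`, so it is NUL-terminated
        // and stays alive for `'a`.
        // `from_ptr` walks the bytes to find the NUL, so this is O(n) per deref.
        unsafe { CStr::from_ptr(self.ptr) }.to_bytes()
    }
}
```

Something as innocuous as `thin.len()` or `&thin[..]` would then trigger a full scan each time, which is the hard-to-avoid cost described above.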

@tgross35

> Does a CStr have a similar broadly applicable connotation? I think so, although I'm not as steeped in C idioms. And I'm unsure about the UTF-8 expectation.

C's char has an implementation-defined encoding, which can be controlled with -fexec-charset and also pragmas. It looks like Clang and GCC default to UTF-8 for literals, but on Windows the default code page is used. char8_t is the newer type intended to signify UTF-8, corresponding to string literals with the u8"..." prefix.

It's likely best to say that our CStr is just a bag of nonzero bytes terminated by a nul, without any conventional encoding. But we already provide APIs to make life easier when the bytes happen to be UTF-8 or UTF-8-like, with conversions to and from &str / String, and a Display implementation seems like it would simply be an additional case of this. At least, I don't think that having a Display implementation that assumes something similar to UTF-8 punishes other uses of CStr that would need an external conversion routine anyway.
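As a reminder of the existing UTF-8-oriented conversions, the sketch below wraps the strict and lossy paths. The function names `strict` and `lossy` are hypothetical; `CStr::to_str` and `CStr::to_string_lossy` are the existing methods.

```rust
use std::ffi::CStr;

/// Strict conversion: `None` if the bytes are not valid UTF-8.
pub fn strict(s: &CStr) -> Option<&str> {
    s.to_str().ok()
}

/// Lossy conversion: invalid sequences become U+FFFD.
pub fn lossy(s: &CStr) -> String {
    s.to_string_lossy().into_owned()
}
```

A Display impl along the lines proposed here would behave like the lossy path, while callers with a different encoding would keep using their own conversion routine.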

https://internals.rust-lang.org/t/pre-rfc-deprecate-and-replace-cstr-cstring/5016/38 has some more details about C encodings (that entire thread has some good comments).

@tgross35

Also related, regarding Display round trips: rust-lang/rust#136687.
