the nul-terminated string helper desk chair rearrangement, was: Re: [PATCH] nvme-fabrics: replace deprecated strncpy with strscpy

ksummit.lists.linux.dev archive mirror
 help / color / mirror / Atom feed

* the nul-terminated string helper desk chair rearrangement, was: Re: [PATCH] nvme-fabrics: replace deprecated strncpy with strscpy
       [not found] <20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com>
@ 2023-10-19  5:46 ` Christoph Hellwig
  2023-10-19  6:01   ` the nul-terminated string helper desk chair rearrangement Kees Cook
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2023-10-19  5:46 UTC (permalink / raw)
  To: Justin Stitt
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, linux-kernel, linux-hardening, ksummit

On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote:
> strncpy() is deprecated for use on NUL-terminated destination strings
> [1] and as such we should prefer more robust and less ambiguous string
> interfaces.

If we want that we need to stop pretendening direct manipulation of
nul-terminate strings is a good idea.  I suspect the churn of replacing
one helper with another, maybe slightly better, one probably
introduces more bugs than it fixes.

If we want to attack the issue for real we need to use something
better.

lib/seq_buf.c is a good start for a lot of simple cases that just
append to strings including creating complex ones.  Kent had a bunch
of good ideas on how to improve it, but couldn't be convinced to
contribute to it instead of duplicating the functionality which
is a bit sad, but I think we need to switch to something like
seq_buf that actually has a counted string instead of all this messing
around with the null-terminated strings.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-19  5:46 ` the nul-terminated string helper desk chair rearrangement, was: Re: [PATCH] nvme-fabrics: replace deprecated strncpy with strscpy Christoph Hellwig
@ 2023-10-19  6:01   ` Kees Cook
  2023-10-19  7:01     ` Willy Tarreau
  2023-10-20  4:46     ` Christoph Hellwig
  0 siblings, 2 replies; 20+ messages in thread
From: Kees Cook @ 2023-10-19  6:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	linux-kernel, linux-hardening, ksummit

On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote:
> On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote:
> > strncpy() is deprecated for use on NUL-terminated destination strings
> > [1] and as such we should prefer more robust and less ambiguous string
> > interfaces.
> 
> If we want that we need to stop pretendening direct manipulation of
> nul-terminate strings is a good idea.  I suspect the churn of replacing
> one helper with another, maybe slightly better, one probably
> introduces more bugs than it fixes.
> 
> If we want to attack the issue for real we need to use something
> better.
> 
> lib/seq_buf.c is a good start for a lot of simple cases that just
> append to strings including creating complex ones.  Kent had a bunch
> of good ideas on how to improve it, but couldn't be convinced to
> contribute to it instead of duplicating the functionality which
> is a bit sad, but I think we need to switch to something like
> seq_buf that actually has a counted string instead of all this messing
> around with the null-terminated strings.

When doing more complex string creation, I agree. I spent some time
doing this while I was looking at removing strcat() and strlcat(); this
is where seq_buf shines. (And seq_buf is actually both: it maintains its
%NUL termination _and_ does the length counting.) The only thing clunky
about it was initialization, but all the conversions I experimented with
were way cleaner using seq_buf. I even added a comment to strlcat()'s
kern-doc to aim folks at seq_buf. :)

/**
 * strlcat - Append a string to an existing string
...
 * Do not use this function. While FORTIFY_SOURCE tries to avoid
 * read and write overflows, this is only possible when the sizes
 * of @p and @q are known to the compiler. Prefer building the
 * string with formatting, via scnprintf(), seq_buf, or similar.

Almost all of the remaining strncpy() usage is just string to string
copying, but the corner cases that are being spun out that aren't
strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(),
and memcpy(). Each of these are a clear improvement since they remove
the ambiguity of the intended behavior. Using seq_buf ends up being way
more overhead than is needed.

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-19  6:01   ` the nul-terminated string helper desk chair rearrangement Kees Cook
@ 2023-10-19  7:01     ` Willy Tarreau
  2023-10-19 11:40       ` Alexey Dobriyan
  2023-10-20  4:46     ` Christoph Hellwig
  1 sibling, 1 reply; 20+ messages in thread
From: Willy Tarreau @ 2023-10-19  7:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening,
	ksummit

On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote:
> On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote:
> > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote:
> > > strncpy() is deprecated for use on NUL-terminated destination strings
> > > [1] and as such we should prefer more robust and less ambiguous string
> > > interfaces.
> > 
> > If we want that we need to stop pretendening direct manipulation of
> > nul-terminate strings is a good idea.  I suspect the churn of replacing
> > one helper with another, maybe slightly better, one probably
> > introduces more bugs than it fixes.
> > 
> > If we want to attack the issue for real we need to use something
> > better.
> > 
> > lib/seq_buf.c is a good start for a lot of simple cases that just
> > append to strings including creating complex ones.  Kent had a bunch
> > of good ideas on how to improve it, but couldn't be convinced to
> > contribute to it instead of duplicating the functionality which
> > is a bit sad, but I think we need to switch to something like
> > seq_buf that actually has a counted string instead of all this messing
> > around with the null-terminated strings.
> 
> When doing more complex string creation, I agree. I spent some time
> doing this while I was looking at removing strcat() and strlcat(); this
> is where seq_buf shines. (And seq_buf is actually both: it maintains its
> %NUL termination _and_ does the length counting.) The only thing clunky
> about it was initialization, but all the conversions I experimented with
> were way cleaner using seq_buf.
(...)

I also agree. I'm using several other schemes based on pointer+length in
other projects and despite not being complete in terms of API (due to the
slow migration of old working code), over time it proves much easier to
use and requires far less controls.

With NUL-teminated strings you need to perform checks for each and every
operation. When the length is known and controlled, most often you can
get rid of many tests on intermediate operations and perform a check at
the end, thus you end up with less "if" and "goto fail" in the code,
because the checks are no longer for "not crashing nor introducing
vulnerabilities", but just "returning a correct result", which can often
be detected more easily.

Another benefit I found by accident is that when you need to compare some
tokens against multiple ones (say some keywords for example), it becomes
much faster than strcmp()-based if/else series because in this case you
start by comparing lengths instead of comparing contents. And when your
macros allow you to constify string constants, the compiler will replace
long "if" series with checks against constant values, and may even arrange
them as a tree since all are constants, sometimes mixing with the first
char as the discriminator. Typically on the test below I observe a 10x
speedup at -O3 and ~5x at -O2 when I convert this:

	if (!strcmp(name, "host") ||
	    !strcmp(name, "content-length") ||
	    !strcmp(name, "connection") ||
	    !strcmp(name, "proxy-connection") ||
	    !strcmp(name, "keep-alive") ||
	    !strcmp(name, "upgrade") ||
	    !strcmp(name, "te") ||
	    !strcmp(name, "transfer-encoding"))
		return 1;

to this:

	if (isteq(name, ist("host")) ||
	    isteq(name, ist("content-length")) ||
	    isteq(name, ist("connection")) ||
	    isteq(name, ist("proxy-connection")) ||
	    isteq(name, ist("keep-alive")) ||
	    isteq(name, ist("upgrade")) ||
	    isteq(name, ist("te")) ||
	    isteq(name, ist("transfer-encoding")))
		return 1;

The code is larger but when compiled at -Os, it instead becomes smaller.

Another interesting property I'm using in the API above, that might or
might not apply there is that for most archs we care about, functions
can take a struct of two words passed as registers, and can return
such a struct as a pair of registers as well. This allows to chain
functions by passing one function's return as the argument to another
one, which is what users often want to do to avoid intermediate
variables.

All this to say that length-based strings do offer quite a lot of
benefits over the long term.

Willy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-19  7:01     ` Willy Tarreau
@ 2023-10-19 11:40       ` Alexey Dobriyan
  2023-10-19 12:00         ` Willy Tarreau
  0 siblings, 1 reply; 20+ messages in thread
From: Alexey Dobriyan @ 2023-10-19 11:40 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Kees Cook, Christoph Hellwig, Justin Stitt, Keith Busch,
	Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-hardening, ksummit

On Thu, Oct 19, 2023 at 09:01:53AM +0200, Willy Tarreau wrote:
> On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote:
> > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote:
> > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote:
> > > > strncpy() is deprecated for use on NUL-terminated destination strings
> > > > [1] and as such we should prefer more robust and less ambiguous string
> > > > interfaces.
> > > 
> > > If we want that we need to stop pretendening direct manipulation of
> > > nul-terminate strings is a good idea.  I suspect the churn of replacing
> > > one helper with another, maybe slightly better, one probably
> > > introduces more bugs than it fixes.
> > > 
> > > If we want to attack the issue for real we need to use something
> > > better.
> > > 
> > > lib/seq_buf.c is a good start for a lot of simple cases that just
> > > append to strings including creating complex ones.  Kent had a bunch
> > > of good ideas on how to improve it, but couldn't be convinced to
> > > contribute to it instead of duplicating the functionality which
> > > is a bit sad, but I think we need to switch to something like
> > > seq_buf that actually has a counted string instead of all this messing
> > > around with the null-terminated strings.
> > 
> > When doing more complex string creation, I agree. I spent some time
> > doing this while I was looking at removing strcat() and strlcat(); this
> > is where seq_buf shines. (And seq_buf is actually both: it maintains its
> > %NUL termination _and_ does the length counting.) The only thing clunky
> > about it was initialization, but all the conversions I experimented with
> > were way cleaner using seq_buf.
> (...)
> 
> I also agree. I'm using several other schemes based on pointer+length in
> other projects and despite not being complete in terms of API (due to the
> slow migration of old working code), over time it proves much easier to
> use and requires far less controls.
> 
> With NUL-teminated strings you need to perform checks for each and every
> operation. When the length is known and controlled, most often you can
> get rid of many tests on intermediate operations and perform a check at
> the end, thus you end up with less "if" and "goto fail" in the code,
> because the checks are no longer for "not crashing nor introducing
> vulnerabilities", but just "returning a correct result", which can often
> be detected more easily.
> 
> Another benefit I found by accident is that when you need to compare some
> tokens against multiple ones (say some keywords for example), it becomes
> much faster than strcmp()-based if/else series because in this case you
> start by comparing lengths instead of comparing contents. And when your
> macros allow you to constify string constants, the compiler will replace
> long "if" series with checks against constant values, and may even arrange
> them as a tree since all are constants, sometimes mixing with the first
> char as the discriminator. Typically on the test below I observe a 10x
> speedup at -O3 and ~5x at -O2 when I convert this:
> 
> 	if (!strcmp(name, "host") ||
> 	    !strcmp(name, "content-length") ||
> 	    !strcmp(name, "connection") ||
> 	    !strcmp(name, "proxy-connection") ||
> 	    !strcmp(name, "keep-alive") ||
> 	    !strcmp(name, "upgrade") ||
> 	    !strcmp(name, "te") ||
> 	    !strcmp(name, "transfer-encoding"))
> 		return 1;
> 
> to this:
> 
> 	if (isteq(name, ist("host")) ||
> 	    isteq(name, ist("content-length")) ||
> 	    isteq(name, ist("connection")) ||
> 	    isteq(name, ist("proxy-connection")) ||
> 	    isteq(name, ist("keep-alive")) ||
> 	    isteq(name, ist("upgrade")) ||
> 	    isteq(name, ist("te")) ||
> 	    isteq(name, ist("transfer-encoding")))
> 		return 1;
> 
> The code is larger but when compiled at -Os, it instead becomes smaller.
> 
> Another interesting property I'm using in the API above, that might or
> might not apply there is that for most archs we care about, functions
> can take a struct of two words passed as registers, and can return
> such a struct as a pair of registers as well. This allows to chain
> functions by passing one function's return as the argument to another
> one, which is what users often want to do to avoid intermediate
> variables.

Chaining should be nice cherry on top for very specific cases but certainly
not promoted or advertised. Deleting intermediate variables promotes
implementation-defined behaviour because of unspecified order of evaluation
of function arguments. Second, debuggers still operate with lines in mind,
so jumping to the next statement written like this

	f(g(), h())

can be problematic. Intermediate variables are much less of a problem now
that -Wdeclaration-after-statement has been finally abolished! They don't
consume LOC anymore.

> All this to say that length-based strings do offer quite a lot of
> benefits over the long term.

As long as they are named kstring :-) Or std_string, he-he.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-19 11:40       ` Alexey Dobriyan
@ 2023-10-19 12:00         ` Willy Tarreau
  0 siblings, 0 replies; 20+ messages in thread
From: Willy Tarreau @ 2023-10-19 12:00 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Kees Cook, Christoph Hellwig, Justin Stitt, Keith Busch,
	Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-hardening, ksummit

On Thu, Oct 19, 2023 at 02:40:52PM +0300, Alexey Dobriyan wrote:
> On Thu, Oct 19, 2023 at 09:01:53AM +0200, Willy Tarreau wrote:
> > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote:
> > > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote:
> > > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote:
> > > > > strncpy() is deprecated for use on NUL-terminated destination strings
> > > > > [1] and as such we should prefer more robust and less ambiguous string
> > > > > interfaces.
> > > > 
> > > > If we want that we need to stop pretendening direct manipulation of
> > > > nul-terminate strings is a good idea.  I suspect the churn of replacing
> > > > one helper with another, maybe slightly better, one probably
> > > > introduces more bugs than it fixes.
> > > > 
> > > > If we want to attack the issue for real we need to use something
> > > > better.
> > > > 
> > > > lib/seq_buf.c is a good start for a lot of simple cases that just
> > > > append to strings including creating complex ones.  Kent had a bunch
> > > > of good ideas on how to improve it, but couldn't be convinced to
> > > > contribute to it instead of duplicating the functionality which
> > > > is a bit sad, but I think we need to switch to something like
> > > > seq_buf that actually has a counted string instead of all this messing
> > > > around with the null-terminated strings.
> > > 
> > > When doing more complex string creation, I agree. I spent some time
> > > doing this while I was looking at removing strcat() and strlcat(); this
> > > is where seq_buf shines. (And seq_buf is actually both: it maintains its
> > > %NUL termination _and_ does the length counting.) The only thing clunky
> > > about it was initialization, but all the conversions I experimented with
> > > were way cleaner using seq_buf.
> > (...)
> > 
> > I also agree. I'm using several other schemes based on pointer+length in
> > other projects and despite not being complete in terms of API (due to the
> > slow migration of old working code), over time it proves much easier to
> > use and requires far less controls.
> > 
> > With NUL-teminated strings you need to perform checks for each and every
> > operation. When the length is known and controlled, most often you can
> > get rid of many tests on intermediate operations and perform a check at
> > the end, thus you end up with less "if" and "goto fail" in the code,
> > because the checks are no longer for "not crashing nor introducing
> > vulnerabilities", but just "returning a correct result", which can often
> > be detected more easily.
> > 
> > Another benefit I found by accident is that when you need to compare some
> > tokens against multiple ones (say some keywords for example), it becomes
> > much faster than strcmp()-based if/else series because in this case you
> > start by comparing lengths instead of comparing contents. And when your
> > macros allow you to constify string constants, the compiler will replace
> > long "if" series with checks against constant values, and may even arrange
> > them as a tree since all are constants, sometimes mixing with the first
> > char as the discriminator. Typically on the test below I observe a 10x
> > speedup at -O3 and ~5x at -O2 when I convert this:
> > 
> > 	if (!strcmp(name, "host") ||
> > 	    !strcmp(name, "content-length") ||
> > 	    !strcmp(name, "connection") ||
> > 	    !strcmp(name, "proxy-connection") ||
> > 	    !strcmp(name, "keep-alive") ||
> > 	    !strcmp(name, "upgrade") ||
> > 	    !strcmp(name, "te") ||
> > 	    !strcmp(name, "transfer-encoding"))
> > 		return 1;
> > 
> > to this:
> > 
> > 	if (isteq(name, ist("host")) ||
> > 	    isteq(name, ist("content-length")) ||
> > 	    isteq(name, ist("connection")) ||
> > 	    isteq(name, ist("proxy-connection")) ||
> > 	    isteq(name, ist("keep-alive")) ||
> > 	    isteq(name, ist("upgrade")) ||
> > 	    isteq(name, ist("te")) ||
> > 	    isteq(name, ist("transfer-encoding")))
> > 		return 1;
> > 
> > The code is larger but when compiled at -Os, it instead becomes smaller.
> > 
> > Another interesting property I'm using in the API above, that might or
> > might not apply there is that for most archs we care about, functions
> > can take a struct of two words passed as registers, and can return
> > such a struct as a pair of registers as well. This allows to chain
> > functions by passing one function's return as the argument to another
> > one, which is what users often want to do to avoid intermediate
> > variables.
> 
> Chaining should be nice cherry on top for very specific cases but certainly
> not promoted or advertised. Deleting intermediate variables promotes
> implementation-defined behaviour because of unspecified order of evaluation
> of function arguments. Second, debuggers still operate with lines in mind,
> so jumping to the next statement written like this
> 
> 	f(g(), h())
> 
> can be problematic.

It obviously depends what these functions do, but that remains true for
lots of other use cases applying to a shared memory location, if that's
the case. Also it happens that a lot of string functions that are used
as arguments to other ones are in fact lookups, skip, trim etc which
only manipulate the pointer and the length and not the contents.

> Intermediate variables are much less of a problem now
> that -Wdeclaration-after-statement has been finally abolished! They don't
> consume LOC anymore.

Intermediate variables declared after statements remain an abomination
which turn a visual lookup from O(indent_levels) to O(lines) because
normally you only have to quickly glance a the previous opening brace
and if you don't find, you repeat, but with them you have to visually
scan every single line. They're now allowed for macros and iterators
which can make a good use of them but it's not a reason for abusing
them in code supposed to be reviewable by humans.

> > All this to say that length-based strings do offer quite a lot of
> > benefits over the long term.
> 
> As long as they are named kstring :-) Or std_string, he-he.

That point is the last of my concerns ;-)

Willy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-19  6:01   ` the nul-terminated string helper desk chair rearrangement Kees Cook
  2023-10-19  7:01     ` Willy Tarreau
@ 2023-10-20  4:46     ` Christoph Hellwig
  2023-10-20 17:40       ` Justin Stitt
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2023-10-20  4:46 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening,
	ksummit

On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote:
> Almost all of the remaining strncpy() usage is just string to string
> copying, but the corner cases that are being spun out that aren't
> strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(),
> and memcpy(). Each of these are a clear improvement since they remove
> the ambiguity of the intended behavior. Using seq_buf ends up being way
> more overhead than is needed.

I'm really not sure strscpy is much of an improvement.  In this particular
case in most other places we simply use a snprintf for nqns, which seems
useful here to if we don't want the full buf.

But switching to a completely undocumented helper like strscpy seems not
useful at all.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-20  4:46     ` Christoph Hellwig
@ 2023-10-20 17:40       ` Justin Stitt
  2023-10-20 17:56         ` Linus Torvalds
  2023-10-20 18:30         ` Kees Cook
  0 siblings, 2 replies; 20+ messages in thread
From: Justin Stitt @ 2023-10-20 17:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	linux-kernel, linux-hardening, ksummit

On Thu, Oct 19, 2023 at 9:46 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote:
> > Almost all of the remaining strncpy() usage is just string to string
> > copying, but the corner cases that are being spun out that aren't
> > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(),
> > and memcpy(). Each of these are a clear improvement since they remove
> > the ambiguity of the intended behavior. Using seq_buf ends up being way
> > more overhead than is needed.
>
> I'm really not sure strscpy is much of an improvement.  In this particular
> case in most other places we simply use a snprintf for nqns, which seems
> useful here to if we don't want the full buf.
>
> But switching to a completely undocumented helper like strscpy seems not
> useful at all.

There's some docs at [1]. Perhaps there could be more?

[1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-20 17:40       ` Justin Stitt
@ 2023-10-20 17:56         ` Linus Torvalds
  2023-10-20 18:22           ` Kees Cook
  2023-10-20 18:30         ` Kees Cook
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2023-10-20 17:56 UTC (permalink / raw)
  To: Justin Stitt
  Cc: Christoph Hellwig, Kees Cook, Keith Busch, Jens Axboe,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening,
	ksummit

On Fri, 20 Oct 2023 at 10:40, Justin Stitt <justinstitt@google.com> wrote:
>
> There's some docs at [1]. Perhaps there could be more?
>
> [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292

Note that we have so few 'strlcpy()' calls that we really should
remove that horrid horrid interface. It's a buggy piece of sh*t.

'strlcpy()' is fundamentally unsafe BY DESIGN if you don't trust the
source string - which is one of the alleged reasons to use it.

That said, while 'strscpy()' fixes the fundamental conceptual bugs of
strlcpy(), it's worth noting that it has *two* differences wrt
strlcpy:

 - it doesn't have the broken return value

 - it copies things in word-size chunks to be more efficient

And because of that word-size thing it will possibly insert more than
one NUL character at the end of the result.

To give an example, if you have

   dst[64] = "abcdefghijklmnopqrstuvwxyz";
   src[64] = "123\0........";
   strlcpy(dst, src, 64);

then the strlcpy() will return 3 (the size of the copy), but the
destination will end up looking like

   dst[64] = "123\0\0\0\0\0ijklmnopqrstuvwxyz...";

This odd "possibly word padding" is basically never relevant (and it's
obviously always also limited by the size you gave it), but *if* you
are doing something really odd, and you expect the end of the
destination string to not be modified at all past the final NUL of the
copy, you need to be aware of this.

It does mean that if you used to have

    dst[4];
    strlcpy(dst, "abc", 8);

then that *used* to work (because it would copy four bytes: "abc\0"
and that fits in 'dst[]'). But

   dst[4];
   strscpy(dst, "abc", 8);

will overflow dst[], because it will do a word-copy and you told
'strscpy()' that you had a 8-byte buffer, and it will try to write
"abc\0\0\0\0\0" into the destination.

The above is insane code, but it's an example of why a blind
strlcpy->strscpy conversion might change semantics.

            Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-20 17:56         ` Linus Torvalds
@ 2023-10-20 18:22           ` Kees Cook
  0 siblings, 0 replies; 20+ messages in thread
From: Kees Cook @ 2023-10-20 18:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Justin Stitt, Christoph Hellwig, Keith Busch, Jens Axboe,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening,
	ksummit, Azeem Shaikh

On Fri, Oct 20, 2023 at 10:56:31AM -0700, Linus Torvalds wrote:
> On Fri, 20 Oct 2023 at 10:40, Justin Stitt <justinstitt@google.com> wrote:
> >
> > There's some docs at [1]. Perhaps there could be more?
> >
> > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
> 
> Note that we have so few 'strlcpy()' calls that we really should
> remove that horrid horrid interface. It's a buggy piece of sh*t.

Yup, that's on-going. There's just a few left; Azeem has been chipping
away at strlcpy.

> It does mean that if you used to have
> 
>     dst[4];
>     strlcpy(dst, "abc", 8);
> 
> then that *used* to work (because it would copy four bytes: "abc\0"
> and that fits in 'dst[]'). But
> 
>    dst[4];
>    strscpy(dst, "abc", 8);
> 
> will overflow dst[], because it will do a word-copy and you told
> 'strscpy()' that you had a 8-byte buffer, and it will try to write
> "abc\0\0\0\0\0" into the destination.

Luckily, we already have checks for these mismatched sizes at compile
time (i.e. CONFIG_FORTIFY_SOURCE will already check for pathological
cases like above where 8 > sizeof(dst)).

> The above is insane code, but it's an example of why a blind
> strlcpy->strscpy conversion might change semantics.

Totally agreed. All of the recent string conversions have been paying
close attention to the behavioral differences.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-20 17:40       ` Justin Stitt
  2023-10-20 17:56         ` Linus Torvalds
@ 2023-10-20 18:30         ` Kees Cook
  2023-10-26 10:01           ` Christoph Hellwig
  1 sibling, 1 reply; 20+ messages in thread
From: Kees Cook @ 2023-10-20 18:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	linux-kernel, linux-hardening, ksummit

On Fri, Oct 20, 2023 at 10:40:12AM -0700, Justin Stitt wrote:
> On Thu, Oct 19, 2023 at 9:46 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote:
> > > Almost all of the remaining strncpy() usage is just string to string
> > > copying, but the corner cases that are being spun out that aren't
> > > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(),
> > > and memcpy(). Each of these are a clear improvement since they remove
> > > the ambiguity of the intended behavior. Using seq_buf ends up being way
> > > more overhead than is needed.
> >
> > I'm really not sure strscpy is much of an improvement.  In this particular
> > case in most other places we simply use a snprintf for nqns, which seems
> > useful here to if we don't want the full buf.
> >
> > But switching to a completely undocumented helper like strscpy seems not
> > useful at all.

I'm curious where you looked and didn't find documentation -- perhaps
there is an improvement to be made to aim one to where the existing
documentation lives?

> 
> There's some docs at [1]. Perhaps there could be more?
> 
> [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292

Right, And it's even valid kern-doc, which gets rendered in the kernel
API docs, along with all the other string functions:
https://docs.kernel.org/core-api/kernel-api.html#c.strscpy

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-20 18:30         ` Kees Cook
@ 2023-10-26 10:01           ` Christoph Hellwig
  2023-10-26 11:39             ` James Bottomley
  2023-10-26 13:44             ` Andrew Lunn
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2023-10-26 10:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening,
	ksummit

On Fri, Oct 20, 2023 at 11:30:49AM -0700, Kees Cook wrote:
> I'm curious where you looked and didn't find documentation -- perhaps
> there is an improvement to be made to aim one to where the existing
> documentation lives?

My order was the following:

 - look for kernel doc on the main function implementation in
   lib/string.c (as found by a grep for an EXPORT_SYMBOL for it)
 - after not finding it there, but seeing that it has an ifdef for
   an arch override, which turns out to be unused
 - then I grepped the Documentation/ directory for it, and while
   there are quite a few matches for strscpy, they are largely
   in examples, with the only text referring to strscpy being
   Documentation/process/deprecated.rst that tells you to use it
   instead of strcpy, but not how it actually works
 - after that I realized that some people put the kerneldoc on
   the declaration, so I looked at that in string.h, but couldn't
   find it.

> > There's some docs at [1]. Perhaps there could be more?
> > 
> > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
> 
> Right, And it's even valid kern-doc, which gets rendered in the kernel
> API docs, along with all the other string functions:
> https://docs.kernel.org/core-api/kernel-api.html#c.strscpy

Well, I never use the generated kerneldoc because it's much harder than
just grepping the tree, but indeed it exists even if it's hidden in
the most obsfucated way.  But at least I know now!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 10:01           ` Christoph Hellwig
@ 2023-10-26 11:39             ` James Bottomley
  2023-10-26 13:52               ` Steven Rostedt
  2023-10-26 13:44             ` Andrew Lunn
  1 sibling, 1 reply; 20+ messages in thread
From: James Bottomley @ 2023-10-26 11:39 UTC (permalink / raw)
  To: Christoph Hellwig, Kees Cook
  Cc: Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	linux-kernel, linux-hardening, ksummit

On Thu, 2023-10-26 at 12:01 +0200, Christoph Hellwig wrote:
> > > There's some docs at [1]. Perhaps there could be more?
> > > 
> > > [1]:
> > > https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
> > 
> > Right, And it's even valid kern-doc, which gets rendered in the
> > kernel API docs, along with all the other string functions:
> > https://docs.kernel.org/core-api/kernel-api.html#c.strscpy
> 
> Well, I never use the generated kerneldoc because it's much harder
> than just grepping the tree, but indeed it exists even if it's hidden
> in the most obsfucated way.  But at least I know now!

This, honestly, is one of the really annoying problems of kerneldoc. 
When looking for structures or functions

git grep "<function> -"

usually finds it.  However, I recently asked on linux-scsi if we could
point to the doc about system_state and what it meant.  However, either
we all suck or there's no such documentation because no-one could find
it.

While it's nice in theory to have everything documented, it's not much
use if no one can actually find the information ...

James


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 11:39             ` James Bottomley
@ 2023-10-26 13:52               ` Steven Rostedt
  2023-10-26 13:59                 ` Geert Uytterhoeven
  2023-10-26 14:05                 ` Jonathan Corbet
  0 siblings, 2 replies; 20+ messages in thread
From: Steven Rostedt @ 2023-10-26 13:52 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch,
	Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-hardening, ksummit

On Thu, 26 Oct 2023 07:39:44 -0400
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> While it's nice in theory to have everything documented, it's not much
> use if no one can actually find the information ...

Does kerneldoc provide an automated index? That is, if we had a single file
that had every function in the kernel that is documented, with the path to
the file that documents it, it would make finding documentation much
simpler.

Maybe it already does? Which would mean we need a way to find the index too!

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 13:52               ` Steven Rostedt
@ 2023-10-26 13:59                 ` Geert Uytterhoeven
  2023-10-27 18:32                   ` Alexey Dobriyan
  2023-10-26 14:05                 ` Jonathan Corbet
  1 sibling, 1 reply; 20+ messages in thread
From: Geert Uytterhoeven @ 2023-10-26 13:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: James Bottomley, Christoph Hellwig, Kees Cook, Justin Stitt,
	Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-hardening, ksummit

Hi Steven,

On Thu, Oct 26, 2023 at 3:52 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 26 Oct 2023 07:39:44 -0400
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > While it's nice in theory to have everything documented, it's not much
> > use if no one can actually find the information ...
>
> Does kerneldoc provide an automated index? That is, if we had a single file
> that had every function in the kernel that is documented, with the path to
> the file that documents it, it would make finding documentation much
> simpler.
>
> Maybe it already does? Which would mean we need a way to find the index too!

ctags?

Although "git grep" is faster (assumed you use the "correct" search
pattern, which can sometimes be challenging, indeed).

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 13:59                 ` Geert Uytterhoeven
@ 2023-10-27 18:32                   ` Alexey Dobriyan
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Dobriyan @ 2023-10-27 18:32 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Steven Rostedt, James Bottomley, Christoph Hellwig, Kees Cook,
	Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	linux-kernel, linux-hardening, ksummit

On Thu, Oct 26, 2023 at 03:59:28PM +0200, Geert Uytterhoeven wrote:
> Hi Steven,
> 
> On Thu, Oct 26, 2023 at 3:52 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Thu, 26 Oct 2023 07:39:44 -0400
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >
> > > While it's nice in theory to have everything documented, it's not much
> > > use if no one can actually find the information ...
> >
> > Does kerneldoc provide an automated index? That is, if we had a single file
> > that had every function in the kernel that is documented, with the path to
> > the file that documents it, it would make finding documentation much
> > simpler.
> >
> > Maybe it already does? Which would mean we need a way to find the index too!
> 
> ctags?

ctags is a tool from previous century. It doesn't help that "make tags"
is single-threaded. It needs constant babysitting (loop-like macros,
ignore attibute annotations which masquerade as identifiers). I think
"make tags" became much slower because ignore-list is one giant regexp
which only grows bigger.

> Although "git grep" is faster (assumed you use the "correct" search
> pattern, which can sometimes be challenging, indeed).

I tried QT Creator indexing at some point (which is parallel), it needs
to be told that headers are C not C++. I didn't find a way to tell it
that .c files are C too but F2 jumped to definitions quite well.
Also hovering over identifier/name works (being IDE it understands
popular doc styles).

It can be made to work reasonably well provided that you did "make
allmodconfig" and added few header locations. clangd parses like compiler,
not like human and kernel uses a lot of CONFIG defines so some config
must be chosen.

But I need to recheck all this stuff now that new version was propagated
to distros. It should be better (and less segfaulty :-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 13:52               ` Steven Rostedt
  2023-10-26 13:59                 ` Geert Uytterhoeven
@ 2023-10-26 14:05                 ` Jonathan Corbet
  2023-10-27  7:08                   ` Kalle Valo
  1 sibling, 1 reply; 20+ messages in thread
From: Jonathan Corbet @ 2023-10-26 14:05 UTC (permalink / raw)
  To: Steven Rostedt, James Bottomley
  Cc: Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch,
	Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-hardening, ksummit

Steven Rostedt <rostedt@goodmis.org> writes:

> On Thu, 26 Oct 2023 07:39:44 -0400
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
>> While it's nice in theory to have everything documented, it's not much
>> use if no one can actually find the information ...
>
> Does kerneldoc provide an automated index?

If you go to https://www.kernel.org/doc/html/latest/ and type a symbol
into the search box on the left, you'll get directed to the right place
(if it exists).  For James's system_state example, it makes it clear
that there's only one reference - in the coding-style document, of all
places...

I've never looked into that index to see how hard it would be to access
without a browser.

jon

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 14:05                 ` Jonathan Corbet
@ 2023-10-27  7:08                   ` Kalle Valo
  0 siblings, 0 replies; 20+ messages in thread
From: Kalle Valo @ 2023-10-27  7:08 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Steven Rostedt, James Bottomley, Christoph Hellwig, Kees Cook,
	Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	linux-kernel, linux-hardening, ksummit

Jonathan Corbet <corbet@lwn.net> writes:

> Steven Rostedt <rostedt@goodmis.org> writes:
>
>> On Thu, 26 Oct 2023 07:39:44 -0400
>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>
>>> While it's nice in theory to have everything documented, it's not much
>>> use if no one can actually find the information ...
>>
>> Does kerneldoc provide an automated index?
>
> If you go to https://www.kernel.org/doc/html/latest/

BTW there's now a shorter URL for this whic is really nice:

https://docs.kernel.org/

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 10:01           ` Christoph Hellwig
  2023-10-26 11:39             ` James Bottomley
@ 2023-10-26 13:44             ` Andrew Lunn
  2023-10-26 13:51               ` Laurent Pinchart
  2023-10-26 14:27               ` James Bottomley
  1 sibling, 2 replies; 20+ messages in thread
From: Andrew Lunn @ 2023-10-26 13:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg,
	linux-nvme, linux-kernel, linux-hardening, ksummit

> > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292

I found that https://elixir.bootlin.com/linux is the best way to find
Documentation for functions and structures. I would suggest try it
first, and only when what fails to start using grep.

       Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 13:44             ` Andrew Lunn
@ 2023-10-26 13:51               ` Laurent Pinchart
  2023-10-26 14:27               ` James Bottomley
  1 sibling, 0 replies; 20+ messages in thread
From: Laurent Pinchart @ 2023-10-26 13:51 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch,
	Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-hardening, ksummit

On Thu, Oct 26, 2023 at 03:44:22PM +0200, Andrew Lunn wrote:
> > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
> 
> I found that https://elixir.bootlin.com/linux is the best way to find
> Documentation for functions and structures. I would suggest try it
> first, and only when what fails to start using grep.

It's painful to have to open the HTML documentation generated by the
kernel build system when developing, and even more painful to have to
use a particular website. If the documentation is difficult to locate in
the source tree, we're doing something wrong.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: the nul-terminated string helper desk chair rearrangement
  2023-10-26 13:44             ` Andrew Lunn
  2023-10-26 13:51               ` Laurent Pinchart
@ 2023-10-26 14:27               ` James Bottomley
  1 sibling, 0 replies; 20+ messages in thread
From: James Bottomley @ 2023-10-26 14:27 UTC (permalink / raw)
  To: Andrew Lunn, Christoph Hellwig
  Cc: Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg,
	linux-nvme, linux-kernel, linux-hardening, ksummit

On Thu, 2023-10-26 at 15:44 +0200, Andrew Lunn wrote:
> > > > [1]:
> > > > https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
> 
> I found that https://elixir.bootlin.com/linux

That's a 404, I think you mean

https://elixir.bootlin.com/linux/latest/source

>  is the best way to find Documentation for functions and structures.
> I would suggest try it first, and only when what fails to start using
> grep.

I just tried it with system_state and it doesn't even find the
definition.  I think it might be because it has annotations which
confuse the searcher (it's in init/main.c as

 enum system_states system_state __read_mostly;

).  If there's any meaningful doc about it, elixir also doesn't find
it.

James
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-10-27 18:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com>
2023-10-19  5:46 ` the nul-terminated string helper desk chair rearrangement, was: Re: [PATCH] nvme-fabrics: replace deprecated strncpy with strscpy Christoph Hellwig
2023-10-19  6:01   ` the nul-terminated string helper desk chair rearrangement Kees Cook
2023-10-19  7:01     ` Willy Tarreau
2023-10-19 11:40       ` Alexey Dobriyan
2023-10-19 12:00         ` Willy Tarreau
2023-10-20  4:46     ` Christoph Hellwig
2023-10-20 17:40       ` Justin Stitt
2023-10-20 17:56         ` Linus Torvalds
2023-10-20 18:22           ` Kees Cook
2023-10-20 18:30         ` Kees Cook
2023-10-26 10:01           ` Christoph Hellwig
2023-10-26 11:39             ` James Bottomley
2023-10-26 13:52               ` Steven Rostedt
2023-10-26 13:59                 ` Geert Uytterhoeven
2023-10-27 18:32                   ` Alexey Dobriyan
2023-10-26 14:05                 ` Jonathan Corbet
2023-10-27  7:08                   ` Kalle Valo
2023-10-26 13:44             ` Andrew Lunn
2023-10-26 13:51               ` Laurent Pinchart
2023-10-26 14:27               ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox