* the nul-terminated string helper desk chair rearrangement, was: Re: [PATCH] nvme-fabrics: replace deprecated strncpy with strscpy [not found] <20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com> @ 2023-10-19 5:46 ` Christoph Hellwig 2023-10-19 6:01 ` the nul-terminated string helper desk chair rearrangement Kees Cook 0 siblings, 1 reply; 20+ messages in thread From: Christoph Hellwig @ 2023-10-19 5:46 UTC (permalink / raw) To: Justin Stitt Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > strncpy() is deprecated for use on NUL-terminated destination strings > [1] and as such we should prefer more robust and less ambiguous string > interfaces. If we want that we need to stop pretendening direct manipulation of nul-terminate strings is a good idea. I suspect the churn of replacing one helper with another, maybe slightly better, one probably introduces more bugs than it fixes. If we want to attack the issue for real we need to use something better. lib/seq_buf.c is a good start for a lot of simple cases that just append to strings including creating complex ones. Kent had a bunch of good ideas on how to improve it, but couldn't be convinced to contribute to it instead of duplicating the functionality which is a bit sad, but I think we need to switch to something like seq_buf that actually has a counted string instead of all this messing around with the null-terminated strings. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-19 5:46 ` the nul-terminated string helper desk chair rearrangement, was: Re: [PATCH] nvme-fabrics: replace deprecated strncpy with strscpy Christoph Hellwig @ 2023-10-19 6:01 ` Kees Cook 2023-10-19 7:01 ` Willy Tarreau 2023-10-20 4:46 ` Christoph Hellwig 0 siblings, 2 replies; 20+ messages in thread From: Kees Cook @ 2023-10-19 6:01 UTC (permalink / raw) To: Christoph Hellwig Cc: Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > strncpy() is deprecated for use on NUL-terminated destination strings > > [1] and as such we should prefer more robust and less ambiguous string > > interfaces. > > If we want that we need to stop pretendening direct manipulation of > nul-terminate strings is a good idea. I suspect the churn of replacing > one helper with another, maybe slightly better, one probably > introduces more bugs than it fixes. > > If we want to attack the issue for real we need to use something > better. > > lib/seq_buf.c is a good start for a lot of simple cases that just > append to strings including creating complex ones. Kent had a bunch > of good ideas on how to improve it, but couldn't be convinced to > contribute to it instead of duplicating the functionality which > is a bit sad, but I think we need to switch to something like > seq_buf that actually has a counted string instead of all this messing > around with the null-terminated strings. When doing more complex string creation, I agree. I spent some time doing this while I was looking at removing strcat() and strlcat(); this is where seq_buf shines. (And seq_buf is actually both: it maintains its %NUL termination _and_ does the length counting.) The only thing clunky about it was initialization, but all the conversions I experimented with were way cleaner using seq_buf. I even added a comment to strlcat()'s kern-doc to aim folks at seq_buf. :) /** * strlcat - Append a string to an existing string ... * Do not use this function. While FORTIFY_SOURCE tries to avoid * read and write overflows, this is only possible when the sizes * of @p and @q are known to the compiler. Prefer building the * string with formatting, via scnprintf(), seq_buf, or similar. Almost all of the remaining strncpy() usage is just string to string copying, but the corner cases that are being spun out that aren't strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), and memcpy(). Each of these are a clear improvement since they remove the ambiguity of the intended behavior. Using seq_buf ends up being way more overhead than is needed. -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-19 6:01 ` the nul-terminated string helper desk chair rearrangement Kees Cook @ 2023-10-19 7:01 ` Willy Tarreau 2023-10-19 11:40 ` Alexey Dobriyan 2023-10-20 4:46 ` Christoph Hellwig 1 sibling, 1 reply; 20+ messages in thread From: Willy Tarreau @ 2023-10-19 7:01 UTC (permalink / raw) To: Kees Cook Cc: Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > [1] and as such we should prefer more robust and less ambiguous string > > > interfaces. > > > > If we want that we need to stop pretendening direct manipulation of > > nul-terminate strings is a good idea. I suspect the churn of replacing > > one helper with another, maybe slightly better, one probably > > introduces more bugs than it fixes. > > > > If we want to attack the issue for real we need to use something > > better. > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > append to strings including creating complex ones. Kent had a bunch > > of good ideas on how to improve it, but couldn't be convinced to > > contribute to it instead of duplicating the functionality which > > is a bit sad, but I think we need to switch to something like > > seq_buf that actually has a counted string instead of all this messing > > around with the null-terminated strings. > > When doing more complex string creation, I agree. I spent some time > doing this while I was looking at removing strcat() and strlcat(); this > is where seq_buf shines. (And seq_buf is actually both: it maintains its > %NUL termination _and_ does the length counting.) The only thing clunky > about it was initialization, but all the conversions I experimented with > were way cleaner using seq_buf. (...) I also agree. I'm using several other schemes based on pointer+length in other projects and despite not being complete in terms of API (due to the slow migration of old working code), over time it proves much easier to use and requires far less controls. With NUL-teminated strings you need to perform checks for each and every operation. When the length is known and controlled, most often you can get rid of many tests on intermediate operations and perform a check at the end, thus you end up with less "if" and "goto fail" in the code, because the checks are no longer for "not crashing nor introducing vulnerabilities", but just "returning a correct result", which can often be detected more easily. Another benefit I found by accident is that when you need to compare some tokens against multiple ones (say some keywords for example), it becomes much faster than strcmp()-based if/else series because in this case you start by comparing lengths instead of comparing contents. And when your macros allow you to constify string constants, the compiler will replace long "if" series with checks against constant values, and may even arrange them as a tree since all are constants, sometimes mixing with the first char as the discriminator. Typically on the test below I observe a 10x speedup at -O3 and ~5x at -O2 when I convert this: if (!strcmp(name, "host") || !strcmp(name, "content-length") || !strcmp(name, "connection") || !strcmp(name, "proxy-connection") || !strcmp(name, "keep-alive") || !strcmp(name, "upgrade") || !strcmp(name, "te") || !strcmp(name, "transfer-encoding")) return 1; to this: if (isteq(name, ist("host")) || isteq(name, ist("content-length")) || isteq(name, ist("connection")) || isteq(name, ist("proxy-connection")) || isteq(name, ist("keep-alive")) || isteq(name, ist("upgrade")) || isteq(name, ist("te")) || isteq(name, ist("transfer-encoding"))) return 1; The code is larger but when compiled at -Os, it instead becomes smaller. Another interesting property I'm using in the API above, that might or might not apply there is that for most archs we care about, functions can take a struct of two words passed as registers, and can return such a struct as a pair of registers as well. This allows to chain functions by passing one function's return as the argument to another one, which is what users often want to do to avoid intermediate variables. All this to say that length-based strings do offer quite a lot of benefits over the long term. Willy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-19 7:01 ` Willy Tarreau @ 2023-10-19 11:40 ` Alexey Dobriyan 2023-10-19 12:00 ` Willy Tarreau 0 siblings, 1 reply; 20+ messages in thread From: Alexey Dobriyan @ 2023-10-19 11:40 UTC (permalink / raw) To: Willy Tarreau Cc: Kees Cook, Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, Oct 19, 2023 at 09:01:53AM +0200, Willy Tarreau wrote: > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > > [1] and as such we should prefer more robust and less ambiguous string > > > > interfaces. > > > > > > If we want that we need to stop pretendening direct manipulation of > > > nul-terminate strings is a good idea. I suspect the churn of replacing > > > one helper with another, maybe slightly better, one probably > > > introduces more bugs than it fixes. > > > > > > If we want to attack the issue for real we need to use something > > > better. > > > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > > append to strings including creating complex ones. Kent had a bunch > > > of good ideas on how to improve it, but couldn't be convinced to > > > contribute to it instead of duplicating the functionality which > > > is a bit sad, but I think we need to switch to something like > > > seq_buf that actually has a counted string instead of all this messing > > > around with the null-terminated strings. > > > > When doing more complex string creation, I agree. I spent some time > > doing this while I was looking at removing strcat() and strlcat(); this > > is where seq_buf shines. (And seq_buf is actually both: it maintains its > > %NUL termination _and_ does the length counting.) The only thing clunky > > about it was initialization, but all the conversions I experimented with > > were way cleaner using seq_buf. > (...) > > I also agree. I'm using several other schemes based on pointer+length in > other projects and despite not being complete in terms of API (due to the > slow migration of old working code), over time it proves much easier to > use and requires far less controls. > > With NUL-teminated strings you need to perform checks for each and every > operation. When the length is known and controlled, most often you can > get rid of many tests on intermediate operations and perform a check at > the end, thus you end up with less "if" and "goto fail" in the code, > because the checks are no longer for "not crashing nor introducing > vulnerabilities", but just "returning a correct result", which can often > be detected more easily. > > Another benefit I found by accident is that when you need to compare some > tokens against multiple ones (say some keywords for example), it becomes > much faster than strcmp()-based if/else series because in this case you > start by comparing lengths instead of comparing contents. And when your > macros allow you to constify string constants, the compiler will replace > long "if" series with checks against constant values, and may even arrange > them as a tree since all are constants, sometimes mixing with the first > char as the discriminator. Typically on the test below I observe a 10x > speedup at -O3 and ~5x at -O2 when I convert this: > > if (!strcmp(name, "host") || > !strcmp(name, "content-length") || > !strcmp(name, "connection") || > !strcmp(name, "proxy-connection") || > !strcmp(name, "keep-alive") || > !strcmp(name, "upgrade") || > !strcmp(name, "te") || > !strcmp(name, "transfer-encoding")) > return 1; > > to this: > > if (isteq(name, ist("host")) || > isteq(name, ist("content-length")) || > isteq(name, ist("connection")) || > isteq(name, ist("proxy-connection")) || > isteq(name, ist("keep-alive")) || > isteq(name, ist("upgrade")) || > isteq(name, ist("te")) || > isteq(name, ist("transfer-encoding"))) > return 1; > > The code is larger but when compiled at -Os, it instead becomes smaller. > > Another interesting property I'm using in the API above, that might or > might not apply there is that for most archs we care about, functions > can take a struct of two words passed as registers, and can return > such a struct as a pair of registers as well. This allows to chain > functions by passing one function's return as the argument to another > one, which is what users often want to do to avoid intermediate > variables. Chaining should be nice cherry on top for very specific cases but certainly not promoted or advertised. Deleting intermediate variables promotes implementation-defined behaviour because of unspecified order of evaluation of function arguments. Second, debuggers still operate with lines in mind, so jumping to the next statement written like this f(g(), h()) can be problematic. Intermediate variables are much less of a problem now that -Wdeclaration-after-statement has been finally abolished! They don't consume LOC anymore. > All this to say that length-based strings do offer quite a lot of > benefits over the long term. As long as they are named kstring :-) Or std_string, he-he. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-19 11:40 ` Alexey Dobriyan @ 2023-10-19 12:00 ` Willy Tarreau 0 siblings, 0 replies; 20+ messages in thread From: Willy Tarreau @ 2023-10-19 12:00 UTC (permalink / raw) To: Alexey Dobriyan Cc: Kees Cook, Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, Oct 19, 2023 at 02:40:52PM +0300, Alexey Dobriyan wrote: > On Thu, Oct 19, 2023 at 09:01:53AM +0200, Willy Tarreau wrote: > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > > > [1] and as such we should prefer more robust and less ambiguous string > > > > > interfaces. > > > > > > > > If we want that we need to stop pretendening direct manipulation of > > > > nul-terminate strings is a good idea. I suspect the churn of replacing > > > > one helper with another, maybe slightly better, one probably > > > > introduces more bugs than it fixes. > > > > > > > > If we want to attack the issue for real we need to use something > > > > better. > > > > > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > > > append to strings including creating complex ones. Kent had a bunch > > > > of good ideas on how to improve it, but couldn't be convinced to > > > > contribute to it instead of duplicating the functionality which > > > > is a bit sad, but I think we need to switch to something like > > > > seq_buf that actually has a counted string instead of all this messing > > > > around with the null-terminated strings. > > > > > > When doing more complex string creation, I agree. I spent some time > > > doing this while I was looking at removing strcat() and strlcat(); this > > > is where seq_buf shines. (And seq_buf is actually both: it maintains its > > > %NUL termination _and_ does the length counting.) The only thing clunky > > > about it was initialization, but all the conversions I experimented with > > > were way cleaner using seq_buf. > > (...) > > > > I also agree. I'm using several other schemes based on pointer+length in > > other projects and despite not being complete in terms of API (due to the > > slow migration of old working code), over time it proves much easier to > > use and requires far less controls. > > > > With NUL-teminated strings you need to perform checks for each and every > > operation. When the length is known and controlled, most often you can > > get rid of many tests on intermediate operations and perform a check at > > the end, thus you end up with less "if" and "goto fail" in the code, > > because the checks are no longer for "not crashing nor introducing > > vulnerabilities", but just "returning a correct result", which can often > > be detected more easily. > > > > Another benefit I found by accident is that when you need to compare some > > tokens against multiple ones (say some keywords for example), it becomes > > much faster than strcmp()-based if/else series because in this case you > > start by comparing lengths instead of comparing contents. And when your > > macros allow you to constify string constants, the compiler will replace > > long "if" series with checks against constant values, and may even arrange > > them as a tree since all are constants, sometimes mixing with the first > > char as the discriminator. Typically on the test below I observe a 10x > > speedup at -O3 and ~5x at -O2 when I convert this: > > > > if (!strcmp(name, "host") || > > !strcmp(name, "content-length") || > > !strcmp(name, "connection") || > > !strcmp(name, "proxy-connection") || > > !strcmp(name, "keep-alive") || > > !strcmp(name, "upgrade") || > > !strcmp(name, "te") || > > !strcmp(name, "transfer-encoding")) > > return 1; > > > > to this: > > > > if (isteq(name, ist("host")) || > > isteq(name, ist("content-length")) || > > isteq(name, ist("connection")) || > > isteq(name, ist("proxy-connection")) || > > isteq(name, ist("keep-alive")) || > > isteq(name, ist("upgrade")) || > > isteq(name, ist("te")) || > > isteq(name, ist("transfer-encoding"))) > > return 1; > > > > The code is larger but when compiled at -Os, it instead becomes smaller. > > > > Another interesting property I'm using in the API above, that might or > > might not apply there is that for most archs we care about, functions > > can take a struct of two words passed as registers, and can return > > such a struct as a pair of registers as well. This allows to chain > > functions by passing one function's return as the argument to another > > one, which is what users often want to do to avoid intermediate > > variables. > > Chaining should be nice cherry on top for very specific cases but certainly > not promoted or advertised. Deleting intermediate variables promotes > implementation-defined behaviour because of unspecified order of evaluation > of function arguments. Second, debuggers still operate with lines in mind, > so jumping to the next statement written like this > > f(g(), h()) > > can be problematic. It obviously depends what these functions do, but that remains true for lots of other use cases applying to a shared memory location, if that's the case. Also it happens that a lot of string functions that are used as arguments to other ones are in fact lookups, skip, trim etc which only manipulate the pointer and the length and not the contents. > Intermediate variables are much less of a problem now > that -Wdeclaration-after-statement has been finally abolished! They don't > consume LOC anymore. Intermediate variables declared after statements remain an abomination which turn a visual lookup from O(indent_levels) to O(lines) because normally you only have to quickly glance a the previous opening brace and if you don't find, you repeat, but with them you have to visually scan every single line. They're now allowed for macros and iterators which can make a good use of them but it's not a reason for abusing them in code supposed to be reviewable by humans. > > All this to say that length-based strings do offer quite a lot of > > benefits over the long term. > > As long as they are named kstring :-) Or std_string, he-he. That point is the last of my concerns ;-) Willy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-19 6:01 ` the nul-terminated string helper desk chair rearrangement Kees Cook 2023-10-19 7:01 ` Willy Tarreau @ 2023-10-20 4:46 ` Christoph Hellwig 2023-10-20 17:40 ` Justin Stitt 1 sibling, 1 reply; 20+ messages in thread From: Christoph Hellwig @ 2023-10-20 4:46 UTC (permalink / raw) To: Kees Cook Cc: Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > Almost all of the remaining strncpy() usage is just string to string > copying, but the corner cases that are being spun out that aren't > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > and memcpy(). Each of these are a clear improvement since they remove > the ambiguity of the intended behavior. Using seq_buf ends up being way > more overhead than is needed. I'm really not sure strscpy is much of an improvement. In this particular case in most other places we simply use a snprintf for nqns, which seems useful here to if we don't want the full buf. But switching to a completely undocumented helper like strscpy seems not useful at all. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-20 4:46 ` Christoph Hellwig @ 2023-10-20 17:40 ` Justin Stitt 2023-10-20 17:56 ` Linus Torvalds 2023-10-20 18:30 ` Kees Cook 0 siblings, 2 replies; 20+ messages in thread From: Justin Stitt @ 2023-10-20 17:40 UTC (permalink / raw) To: Christoph Hellwig Cc: Kees Cook, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, Oct 19, 2023 at 9:46 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > Almost all of the remaining strncpy() usage is just string to string > > copying, but the corner cases that are being spun out that aren't > > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > > and memcpy(). Each of these are a clear improvement since they remove > > the ambiguity of the intended behavior. Using seq_buf ends up being way > > more overhead than is needed. > > I'm really not sure strscpy is much of an improvement. In this particular > case in most other places we simply use a snprintf for nqns, which seems > useful here to if we don't want the full buf. > > But switching to a completely undocumented helper like strscpy seems not > useful at all. There's some docs at [1]. Perhaps there could be more? [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-20 17:40 ` Justin Stitt @ 2023-10-20 17:56 ` Linus Torvalds 2023-10-20 18:22 ` Kees Cook 2023-10-20 18:30 ` Kees Cook 1 sibling, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2023-10-20 17:56 UTC (permalink / raw) To: Justin Stitt Cc: Christoph Hellwig, Kees Cook, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Fri, 20 Oct 2023 at 10:40, Justin Stitt <justinstitt@google.com> wrote: > > There's some docs at [1]. Perhaps there could be more? > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 Note that we have so few 'strlcpy()' calls that we really should remove that horrid horrid interface. It's a buggy piece of sh*t. 'strlcpy()' is fundamentally unsafe BY DESIGN if you don't trust the source string - which is one of the alleged reasons to use it. That said, while 'strscpy()' fixes the fundamental conceptual bugs of strlcpy(), it's worth noting that it has *two* differences wrt strlcpy: - it doesn't have the broken return value - it copies things in word-size chunks to be more efficient And because of that word-size thing it will possibly insert more than one NUL character at the end of the result. To give an example, if you have dst[64] = "abcdefghijklmnopqrstuvwxyz"; src[64] = "123\0........"; strlcpy(dst, src, 64); then the strlcpy() will return 3 (the size of the copy), but the destination will end up looking like dst[64] = "123\0\0\0\0\0ijklmnopqrstuvwxyz..."; This odd "possibly word padding" is basically never relevant (and it's obviously always also limited by the size you gave it), but *if* you are doing something really odd, and you expect the end of the destination string to not be modified at all past the final NUL of the copy, you need to be aware of this. It does mean that if you used to have dst[4]; strlcpy(dst, "abc", 8); then that *used* to work (because it would copy four bytes: "abc\0" and that fits in 'dst[]'). But dst[4]; strscpy(dst, "abc", 8); will overflow dst[], because it will do a word-copy and you told 'strscpy()' that you had a 8-byte buffer, and it will try to write "abc\0\0\0\0\0" into the destination. The above is insane code, but it's an example of why a blind strlcpy->strscpy conversion might change semantics. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-20 17:56 ` Linus Torvalds @ 2023-10-20 18:22 ` Kees Cook 0 siblings, 0 replies; 20+ messages in thread From: Kees Cook @ 2023-10-20 18:22 UTC (permalink / raw) To: Linus Torvalds Cc: Justin Stitt, Christoph Hellwig, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit, Azeem Shaikh On Fri, Oct 20, 2023 at 10:56:31AM -0700, Linus Torvalds wrote: > On Fri, 20 Oct 2023 at 10:40, Justin Stitt <justinstitt@google.com> wrote: > > > > There's some docs at [1]. Perhaps there could be more? > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > Note that we have so few 'strlcpy()' calls that we really should > remove that horrid horrid interface. It's a buggy piece of sh*t. Yup, that's on-going. There's just a few left; Azeem has been chipping away at strlcpy. > It does mean that if you used to have > > dst[4]; > strlcpy(dst, "abc", 8); > > then that *used* to work (because it would copy four bytes: "abc\0" > and that fits in 'dst[]'). But > > dst[4]; > strscpy(dst, "abc", 8); > > will overflow dst[], because it will do a word-copy and you told > 'strscpy()' that you had a 8-byte buffer, and it will try to write > "abc\0\0\0\0\0" into the destination. Luckily, we already have checks for these mismatched sizes at compile time (i.e. CONFIG_FORTIFY_SOURCE will already check for pathological cases like above where 8 > sizeof(dst)). > The above is insane code, but it's an example of why a blind > strlcpy->strscpy conversion might change semantics. Totally agreed. All of the recent string conversions have been paying close attention to the behavioral differences. -- Kees Cook ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-20 17:40 ` Justin Stitt 2023-10-20 17:56 ` Linus Torvalds @ 2023-10-20 18:30 ` Kees Cook 2023-10-26 10:01 ` Christoph Hellwig 1 sibling, 1 reply; 20+ messages in thread From: Kees Cook @ 2023-10-20 18:30 UTC (permalink / raw) To: Christoph Hellwig Cc: Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Fri, Oct 20, 2023 at 10:40:12AM -0700, Justin Stitt wrote: > On Thu, Oct 19, 2023 at 9:46 PM Christoph Hellwig <hch@lst.de> wrote: > > > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > > Almost all of the remaining strncpy() usage is just string to string > > > copying, but the corner cases that are being spun out that aren't > > > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > > > and memcpy(). Each of these are a clear improvement since they remove > > > the ambiguity of the intended behavior. Using seq_buf ends up being way > > > more overhead than is needed. > > > > I'm really not sure strscpy is much of an improvement. In this particular > > case in most other places we simply use a snprintf for nqns, which seems > > useful here to if we don't want the full buf. > > > > But switching to a completely undocumented helper like strscpy seems not > > useful at all. I'm curious where you looked and didn't find documentation -- perhaps there is an improvement to be made to aim one to where the existing documentation lives? > > There's some docs at [1]. Perhaps there could be more? > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 Right, And it's even valid kern-doc, which gets rendered in the kernel API docs, along with all the other string functions: https://docs.kernel.org/core-api/kernel-api.html#c.strscpy -- Kees Cook ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-20 18:30 ` Kees Cook @ 2023-10-26 10:01 ` Christoph Hellwig 2023-10-26 11:39 ` James Bottomley 2023-10-26 13:44 ` Andrew Lunn 0 siblings, 2 replies; 20+ messages in thread From: Christoph Hellwig @ 2023-10-26 10:01 UTC (permalink / raw) To: Kees Cook Cc: Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Fri, Oct 20, 2023 at 11:30:49AM -0700, Kees Cook wrote: > I'm curious where you looked and didn't find documentation -- perhaps > there is an improvement to be made to aim one to where the existing > documentation lives? My order was the following: - look for kernel doc on the main function implementation in lib/string.c (as found by a grep for an EXPORT_SYMBOL for it) - after not finding it there, but seeing that it has an ifdef for an arch override, which turns out to be unused - then I grepped the Documentation/ directory for it, and while there are quite a few matches for strscpy, they are largely in examples, with the only text referring to strscpy being Documentation/process/deprecated.rst that tells you to use it instead of strcpy, but not how it actually works - after that I realized that some people put the kerneldoc on the declaration, so I looked at that in string.h, but couldn't find it. > > There's some docs at [1]. Perhaps there could be more? > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > Right, And it's even valid kern-doc, which gets rendered in the kernel > API docs, along with all the other string functions: > https://docs.kernel.org/core-api/kernel-api.html#c.strscpy Well, I never use the generated kerneldoc because it's much harder than just grepping the tree, but indeed it exists even if it's hidden in the most obsfucated way. But at least I know now! ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 10:01 ` Christoph Hellwig @ 2023-10-26 11:39 ` James Bottomley 2023-10-26 13:52 ` Steven Rostedt 2023-10-26 13:44 ` Andrew Lunn 1 sibling, 1 reply; 20+ messages in thread From: James Bottomley @ 2023-10-26 11:39 UTC (permalink / raw) To: Christoph Hellwig, Kees Cook Cc: Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, 2023-10-26 at 12:01 +0200, Christoph Hellwig wrote: > > > There's some docs at [1]. Perhaps there could be more? > > > > > > [1]: > > > https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > > > Right, And it's even valid kern-doc, which gets rendered in the > > kernel API docs, along with all the other string functions: > > https://docs.kernel.org/core-api/kernel-api.html#c.strscpy > > Well, I never use the generated kerneldoc because it's much harder > than just grepping the tree, but indeed it exists even if it's hidden > in the most obsfucated way. But at least I know now! This, honestly, is one of the really annoying problems of kerneldoc. When looking for structures or functions git grep "<function> -" usually finds it. However, I recently asked on linux-scsi if we could point to the doc about system_state and what it meant. However, either we all suck or there's no such documentation because no-one could find it. While it's nice in theory to have everything documented, it's not much use if no one can actually find the information ... James ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 11:39 ` James Bottomley @ 2023-10-26 13:52 ` Steven Rostedt 2023-10-26 13:59 ` Geert Uytterhoeven 2023-10-26 14:05 ` Jonathan Corbet 0 siblings, 2 replies; 20+ messages in thread From: Steven Rostedt @ 2023-10-26 13:52 UTC (permalink / raw) To: James Bottomley Cc: Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, 26 Oct 2023 07:39:44 -0400 James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > While it's nice in theory to have everything documented, it's not much > use if no one can actually find the information ... Does kerneldoc provide an automated index? That is, if we had a single file that had every function in the kernel that is documented, with the path to the file that documents it, it would make finding documentation much simpler. Maybe it already does? Which would mean we need a way to find the index too! -- Steve ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 13:52 ` Steven Rostedt @ 2023-10-26 13:59 ` Geert Uytterhoeven 2023-10-27 18:32 ` Alexey Dobriyan 2023-10-26 14:05 ` Jonathan Corbet 1 sibling, 1 reply; 20+ messages in thread From: Geert Uytterhoeven @ 2023-10-26 13:59 UTC (permalink / raw) To: Steven Rostedt Cc: James Bottomley, Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit Hi Steven, On Thu, Oct 26, 2023 at 3:52 PM Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, 26 Oct 2023 07:39:44 -0400 > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > > While it's nice in theory to have everything documented, it's not much > > use if no one can actually find the information ... > > Does kerneldoc provide an automated index? That is, if we had a single file > that had every function in the kernel that is documented, with the path to > the file that documents it, it would make finding documentation much > simpler. > > Maybe it already does? Which would mean we need a way to find the index too! ctags? Although "git grep" is faster (assumed you use the "correct" search pattern, which can sometimes be challenging, indeed). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 13:59 ` Geert Uytterhoeven @ 2023-10-27 18:32 ` Alexey Dobriyan 0 siblings, 0 replies; 20+ messages in thread From: Alexey Dobriyan @ 2023-10-27 18:32 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Steven Rostedt, James Bottomley, Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, Oct 26, 2023 at 03:59:28PM +0200, Geert Uytterhoeven wrote: > Hi Steven, > > On Thu, Oct 26, 2023 at 3:52 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 26 Oct 2023 07:39:44 -0400 > > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > > > > While it's nice in theory to have everything documented, it's not much > > > use if no one can actually find the information ... > > > > Does kerneldoc provide an automated index? That is, if we had a single file > > that had every function in the kernel that is documented, with the path to > > the file that documents it, it would make finding documentation much > > simpler. > > > > Maybe it already does? Which would mean we need a way to find the index too! > > ctags? ctags is a tool from previous century. It doesn't help that "make tags" is single-threaded. It needs constant babysitting (loop-like macros, ignore attibute annotations which masquerade as identifiers). I think "make tags" became much slower because ignore-list is one giant regexp which only grows bigger. > Although "git grep" is faster (assumed you use the "correct" search > pattern, which can sometimes be challenging, indeed). I tried QT Creator indexing at some point (which is parallel), it needs to be told that headers are C not C++. I didn't find a way to tell it that .c files are C too but F2 jumped to definitions quite well. Also hovering over identifier/name works (being IDE it understands popular doc styles). It can be made to work reasonably well provided that you did "make allmodconfig" and added few header locations. clangd parses like compiler, not like human and kernel uses a lot of CONFIG defines so some config must be chosen. But I need to recheck all this stuff now that new version was propagated to distros. It should be better (and less segfaulty :-) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 13:52 ` Steven Rostedt 2023-10-26 13:59 ` Geert Uytterhoeven @ 2023-10-26 14:05 ` Jonathan Corbet 2023-10-27 7:08 ` Kalle Valo 1 sibling, 1 reply; 20+ messages in thread From: Jonathan Corbet @ 2023-10-26 14:05 UTC (permalink / raw) To: Steven Rostedt, James Bottomley Cc: Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit Steven Rostedt <rostedt@goodmis.org> writes: > On Thu, 26 Oct 2023 07:39:44 -0400 > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > >> While it's nice in theory to have everything documented, it's not much >> use if no one can actually find the information ... > > Does kerneldoc provide an automated index? If you go to https://www.kernel.org/doc/html/latest/ and type a symbol into the search box on the left, you'll get directed to the right place (if it exists). For James's system_state example, it makes it clear that there's only one reference - in the coding-style document, of all places... I've never looked into that index to see how hard it would be to access without a browser. jon ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 14:05 ` Jonathan Corbet @ 2023-10-27 7:08 ` Kalle Valo 0 siblings, 0 replies; 20+ messages in thread From: Kalle Valo @ 2023-10-27 7:08 UTC (permalink / raw) To: Jonathan Corbet Cc: Steven Rostedt, James Bottomley, Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit Jonathan Corbet <corbet@lwn.net> writes: > Steven Rostedt <rostedt@goodmis.org> writes: > >> On Thu, 26 Oct 2023 07:39:44 -0400 >> James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >> >>> While it's nice in theory to have everything documented, it's not much >>> use if no one can actually find the information ... >> >> Does kerneldoc provide an automated index? > > If you go to https://www.kernel.org/doc/html/latest/ BTW there's now a shorter URL for this whic is really nice: https://docs.kernel.org/ -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 10:01 ` Christoph Hellwig 2023-10-26 11:39 ` James Bottomley @ 2023-10-26 13:44 ` Andrew Lunn 2023-10-26 13:51 ` Laurent Pinchart 2023-10-26 14:27 ` James Bottomley 1 sibling, 2 replies; 20+ messages in thread From: Andrew Lunn @ 2023-10-26 13:44 UTC (permalink / raw) To: Christoph Hellwig Cc: Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 I found that https://elixir.bootlin.com/linux is the best way to find Documentation for functions and structures. I would suggest try it first, and only when what fails to start using grep. Andrew ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 13:44 ` Andrew Lunn @ 2023-10-26 13:51 ` Laurent Pinchart 2023-10-26 14:27 ` James Bottomley 1 sibling, 0 replies; 20+ messages in thread From: Laurent Pinchart @ 2023-10-26 13:51 UTC (permalink / raw) To: Andrew Lunn Cc: Christoph Hellwig, Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, Oct 26, 2023 at 03:44:22PM +0200, Andrew Lunn wrote: > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > I found that https://elixir.bootlin.com/linux is the best way to find > Documentation for functions and structures. I would suggest try it > first, and only when what fails to start using grep. It's painful to have to open the HTML documentation generated by the kernel build system when developing, and even more painful to have to use a particular website. If the documentation is difficult to locate in the source tree, we're doing something wrong. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the nul-terminated string helper desk chair rearrangement 2023-10-26 13:44 ` Andrew Lunn 2023-10-26 13:51 ` Laurent Pinchart @ 2023-10-26 14:27 ` James Bottomley 1 sibling, 0 replies; 20+ messages in thread From: James Bottomley @ 2023-10-26 14:27 UTC (permalink / raw) To: Andrew Lunn, Christoph Hellwig Cc: Kees Cook, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel, linux-hardening, ksummit On Thu, 2023-10-26 at 15:44 +0200, Andrew Lunn wrote: > > > > [1]: > > > > https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > I found that https://elixir.bootlin.com/linux That's a 404, I think you mean https://elixir.bootlin.com/linux/latest/source > is the best way to find Documentation for functions and structures. > I would suggest try it first, and only when what fails to start using > grep. I just tried it with system_state and it doesn't even find the definition. I think it might be because it has annotations which confuse the searcher (it's in init/main.c as enum system_states system_state __read_mostly; ). If there's any meaningful doc about it, elixir also doesn't find it. James ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2023-10-27 18:32 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com>
2023-10-19 5:46 ` the nul-terminated string helper desk chair rearrangement, was: Re: [PATCH] nvme-fabrics: replace deprecated strncpy with strscpy Christoph Hellwig
2023-10-19 6:01 ` the nul-terminated string helper desk chair rearrangement Kees Cook
2023-10-19 7:01 ` Willy Tarreau
2023-10-19 11:40 ` Alexey Dobriyan
2023-10-19 12:00 ` Willy Tarreau
2023-10-20 4:46 ` Christoph Hellwig
2023-10-20 17:40 ` Justin Stitt
2023-10-20 17:56 ` Linus Torvalds
2023-10-20 18:22 ` Kees Cook
2023-10-20 18:30 ` Kees Cook
2023-10-26 10:01 ` Christoph Hellwig
2023-10-26 11:39 ` James Bottomley
2023-10-26 13:52 ` Steven Rostedt
2023-10-26 13:59 ` Geert Uytterhoeven
2023-10-27 18:32 ` Alexey Dobriyan
2023-10-26 14:05 ` Jonathan Corbet
2023-10-27 7:08 ` Kalle Valo
2023-10-26 13:44 ` Andrew Lunn
2023-10-26 13:51 ` Laurent Pinchart
2023-10-26 14:27 ` James Bottomley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox