Date: Mon, 7 Jul 2025 07:11:37 +0200
From: Alejandro Colomar
To: linux-mm@kvack.org, linux-hardening@vger.kernel.org
Cc: Kees Cook, Christopher Bazley, shadow <~hallyn/shadow@lists.sr.ht>,
    linux-kernel@vger.kernel.org, Andrew Morton, kasan-dev@googlegroups.com,
    Dmitry Vyukov, Alexander Potapenko, Marco Elver, Christoph Lameter,
    David Rientjes, Vlastimil Babka, Roman Gushchin, Harry Yoo,
    Andrew Clayton
Subject: Re: [RFC v3 0/7] Add and use seprintf() instead of less ergonomic APIs

On Mon, Jul 07, 2025 at 07:06:06AM +0200, Alejandro Colomar wrote:
> I've written an analysis of snprintf(3), why it's dangerous, and how
> these APIs address that, and will present it as a proposal for
> standardization of these APIs in ISO C2y. I'll send that as a reply to
> this message in a moment, as I believe it will be interesting for
> linux-hardening@.

Hi,

Here is the proposal for ISO C2y (see below). I'll also send it to the
C Committee for discussion.

Have a lovely night!
Alex

---

Name
    alx-0049r1 - add seprintf()

Principles
    - Codify existing practice to address evident deficiencies.
    - Enable secure programming

Category
    Standardize existing libc APIs

Author
    Alejandro Colomar

    Cc: Christopher Bazley

History
    r0 (2025-07-06):
    - Initial draft.

    r1 (2025-07-06):
    - wfix.
    - tfix.
    - Expand on the off-by-one bugs.
    - Note that ignoring truncation is not valid most of the time.

Rationale
    snprintf(3) is very difficult to chain for writing parts of a
    string in separate calls, such as in a loop.

    Let's start from the obvious sprintf(3) code (sprintf(3) will not
    prevent overflow, but let's take it as a baseline from which
    programmers start thinking):

        p = buf;
        for (...)
            p += sprintf(p, ...);

    Then, programmers will start thinking about preventing buffer
    overflows. Programmers sometimes will naively add some buffer size
    information and use snprintf(3):

        p = buf;
        size = countof(buf);
        for (...)
            p += snprintf(p, size - (p - buf), ...);
        if (p >= buf + size)  // or worse, (p > buf + size - 1)
            goto fail;

    (Except for minor differences, this kind of code can be found
    everywhere. Here are a couple of examples: .)

    This has several issues, starting with the difficulty of getting
    the second argument right. Sometimes programmers get confused and
    slap a -1 there just to be safe:

        p = buf;
        size = countof(buf);
        for (...)
            p += snprintf(p, size - (p - buf) - 1, ...);
        if (p >= buf + size - 1)
            goto fail;

    (Except for minor differences, this kind of code can be found
    everywhere. Here are a couple of examples: .)

    Programmers will sometimes hold a pointer to one past the last
    element in the array. This is a wise choice, as that pointer is
    constant throughout the lifetime of the object. Then, programmers
    might end up with something like this:

        p = buf;
        e = buf + countof(buf);
        for (...)
            p += snprintf(p, e - p, ...);
        if (p >= e)
            goto fail;

    This is certainly much cleaner. Now a programmer might focus on the
    fact that this can overflow the pointer: on truncation, snprintf(3)
    returns the length that the complete output would have had, not the
    number of characters actually written, so 'p' can be advanced past
    'e' (which is already undefined behavior), and the next call would
    pass a negative 'e - p', which converts to a huge size_t.
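To make the overflow concrete, here is a small self-contained check. It performs one step of the naive loop, but compares the return value as an integer instead of adding it to 'p', because forming a pointer beyond 'e' would itself be undefined behavior. The helper name naive_step_overruns() is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: one step of the naive
 * "p += snprintf(p, e - p, ...)" loop, checked as an integer before
 * p is touched.  Returns true if the naive code would move p past e.
 * Assumes e > p.
 */
static bool naive_step_overruns(char *p, const char *e, const char *s)
{
    /* snprintf() returns the length the complete output would have
     * had (or a negative value on error), not the number of
     * characters it actually stored in p. */
    int len = snprintf(p, e - p, "%s", s);

    return len >= 0 && len > e - p;
}
```

With an 8-byte buffer and the 10-character string "0123456789", snprintf() stores "0123456" plus the null terminator and returns 10, so the naive loop would move 'p' two elements beyond 'e'.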
    An easy approach would be to make sure that the function never
    returns more than the remaining size. That is, one could implement
    something like the following scnprintf() (named after the Linux
    kernel API of the same name). For the sake of simplicity, let's
    ignore multiple evaluation of arguments.

        #define scnprintf(s, size, ...)                   \
        ({                                                \
            int len_;                                     \
            len_ = snprintf(s, size, __VA_ARGS__);        \
            if (len_ < 0)                                 \
                len_ = 0;                                 \
            if (len_ >= size)                             \
                len_ = size - 1;                          \
                                                          \
            len_;                                         \
        })

        p = buf;
        e = buf + countof(buf);
        for (...)
            p += scnprintf(p, e - p, ...);

    (Except for minor differences, this kind of code can be found
    everywhere. Here's an example: .)

    Now the programmer got rid of pointer overflow. However, they now
    have silent truncation that cannot be detected. In some cases this
    may seem good enough; often, however, it is not. And anyway, some
    code keeps using snprintf(3) precisely to be able to detect
    truncation.

    Moreover, this kind of code ignores the fact that vsnprintf(3) can
    fail internally, in which case there's not even a truncated string.
    The kernel gets away with this because its internal vsnprintf()
    never seems to fail, so it can always rely on the truncated string.
    This is not reliable in projects that rely on the libc vsnprintf(3).

    For the code that needs to detect truncation, a programmer might
    choose a different path: keep using snprintf(3), but use a
    temporary length variable instead of the pointer.

        p = buf;
        e = buf + countof(buf);
        for (...) {
            len = snprintf(p, e - p, ...);
            if (len < 0)
                goto fail;
            if (len >= e - p)
                goto fail;
            p += len;
        }

    This is naturally error-prone. A colleague of mine (who is an
    excellent programmer, to be clear) had a bug even after knowing
    about it and having tried to fix it.
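Going back to the scnprintf() approach for a moment, the silent truncation is easy to observe. The following is a minimal sketch, assuming a plain function built on the standard vsnprintf() instead of the statement-expression macro. The names scnprintf_sketch() and chain_two() are hypothetical, and this is not the Linux kernel's implementation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical function version of the scnprintf() macro above.
 * Returns the number of characters actually stored in s, clamped to
 * size - 1, so p can never move past the end sentinel.
 * Assumes size > 0.
 */
static int scnprintf_sketch(char *s, size_t size, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(s, size, fmt, ap);
    va_end(ap);

    if (len < 0)
        return 0;                 /* hard error: s may not hold a valid string */
    if ((size_t) len >= size)
        return (int) (size - 1);  /* truncated: clamped, the caller cannot tell */
    return len;
}

/* Hypothetical caller: chains two writes, like the loop above, and
 * returns how many characters ended up in buf. */
static size_t chain_two(char *buf, size_t size)
{
    char *p = buf;
    char *e = buf + size;

    p += scnprintf_sketch(p, e - p, "%s", "hello");
    p += scnprintf_sketch(p, e - p, "%s", ", world");
    return (size_t) (p - buf);
}
```

With an 8-byte buffer, chain_two() stores "hello, " (7 characters) and both calls look successful; nothing tells the caller that ", world" was cut short. The temporary-length loop above avoids that silence, at the cost of checks that are easy to get wrong.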
    My colleague's bug shows how hard it is to write this correctly:

    In a similar fashion, the strlcpy(3) manual page from OpenBSD
    documents a similar issue when chaining calls to strlcpy(3), which
    was designed with semantics equivalent to those of snprintf(3),
    except for not formatting the string:

        | char *dir, *file, pname[MAXPATHLEN];
        | size_t n;
        |
        | ...
        |
        | n = strlcpy(pname, dir, sizeof(pname));
        | if (n >= sizeof(pname))
        |     goto toolong;
        | if (strlcpy(pname + n, file, sizeof(pname) - n) >= sizeof(pname) - n)
        |     goto toolong;
        |
        | However, one may question the validity of such optimizations,
        | as they defeat the whole purpose of strlcpy() and strlcat().
        | As a matter of fact, the first version of this manual page
        | got it wrong.

    Finally, a programmer might realize that while this is error-prone,
    it is indeed the right thing to do. There's no way to avoid it. One
    could then think of encapsulating it into an API that would at
    least make it easy to write. Then, one might wonder what the right
    parameters are for such an API. The only immutable thing in the
    loop is 'e'. Apart from that, one needs to know where to write,
    which is 'p'. Let's start with those, and try to keep all the other
    information (size, len) from escaping the API. Again, let's ignore
    multiple-evaluation issues in this macro for the sake of simplicity.

        #define foo(p, e, ...)                            \
        ({                                                \
            int len_ = snprintf(p, e - p, __VA_ARGS__);   \
            if (len_ < 0)                                 \
                p = NULL;                                 \
            else if (len_ >= e - p)                       \
                p = NULL;                                 \
            else                                          \
                p += len_;                                \
            p;                                            \
        })

        p = buf;
        e = buf + countof(buf);
        for (...) {
            p = foo(p, e, ...);
            if (p == NULL)
                goto fail;
        }

    We've advanced a lot. We got rid of the buffer overflow; we also got
    rid of the error-prone code at the call site. However, one might
    think that checking for truncation after every call is cumbersome.
    Indeed, it is possible to slightly tweak the internals of foo() to
    propagate errors from previous calls.

        #define seprintf(p, e, ...)                       \
        ({                                                \
            if (p != NULL) {                              \
                int len_;                                 \
                                                          \
                len_ = snprintf(p, e - p, __VA_ARGS__);   \
                if (len_ < 0)                             \
                    p = NULL;                             \
                else if (len_ >= e - p)                   \
                    p = NULL;                             \
                else                                      \
                    p += len_;                            \
            }                                             \
            p;                                            \
        })

        p = buf;
        e = buf + countof(buf);
        for (...)
            p = seprintf(p, e, ...);
        if (p == NULL)
            goto fail;

    By propagating an input null pointer directly to the output of the
    API, which I've called seprintf() (the 'e' refers to the 'end'
    pointer, which is the key in this API), we've allowed ignoring null
    pointers until after the very last call. If we compare our
    resulting code to the sprintf(3)-based baseline, we got (perhaps
    unsurprisingly) something quite close to it:

        p = buf;
        for (...)
            p += sprintf(p, ...);

    vs

        p = buf;
        e = buf + countof(buf);
        for (...)
            p = seprintf(p, e, ...);
        if (p == NULL)
            goto fail;

    And the seprintf() version both detects truncation and prevents
    buffer overflow.

    Some important details of the API are:

    - When 'p' is NULL, the API must preserve errno. This is important
      to be able to determine the cause of the error after all the
      chained calls, even when the error occurred in some call in the
      middle of the chain.

    - When truncation occurs, a distinct errno value must be used, to
      signal to the programmer that the (truncated) string is at least
      a valid null-terminated string that can be used as such. The
      error code chosen is E2BIG, for compatibility with strscpy(), a
      Linux kernel internal API with which this API has many features
      in common.

    - When a hard error (an internal snprintf(3) error) occurs, an error
      code different from E2BIG must be used. It is important to set
      errno, because if an implementation were to return NULL without
      setting errno, a stale value of E2BIG could lead the programmer to
      believe that the string was successfully written (and truncated),
      and to read it, with harmful consequences.

Prior art
    This API is implemented in the shadow-utils project.

    Plan9 designed something quite close, which they call seprint(2).
    The parameters are the same (the right choice), but they got the
    semantics for corner cases wrong. Ironically, the existing Plan9
    code I've seen seems to expect the semantics that I chose,
    regardless of the actual semantics of the Plan9 API. This is, I
    suspect, because my semantics are the intuitive semantics that one
    would naively guess for an API with these parameters and return
    value.

    I've implemented this API for the Linux kernel, and found and fixed
    a surprising number of bugs, as well as other questionable code, in
    just the first handful of files that I inspected.

Future directions
    The 'e = buf + _Countof(buf)' construct is something I've found to
    be quite common. It would be interesting to have an _Endof operator
    that would return a pointer to one past the last element of an
    array. It would require an array operand, just like _Countof. If an
    _Endof operator is deemed too cumbersome for implementation, an
    endof() standard macro that expands to the obvious implementation
    with _Countof could be okay.

    This operator (or operator-like macro) would prevent off-by-one bugs
    when calculating the end sentinel value, such as those shown above
    (with links to real Linux kernel bugs).

Proposed wording
    Based on N3550.

  7.24.6 Input/output :: Formatted input/output functions

    ## New section after 7.24.6.6 ("The snprintf function"):

    +7.24.6.6+1 The seprintf function
    +
    +Synopsis
    +1   #include
    +    char *seprintf(char *restrict p, const char end[0], const char *restrict format, ...);
    +
    +Description
    +2   The $0 function
    +    is equivalent to fprintf,
    +    except that the output is written into an array
    +    (specified by argument p)
    +    rather than to a stream.
    +    If p is a null pointer,
    +    nothing is written,
    +    and the function returns a null pointer.
    +    Otherwise,
    +    end shall compare greater than p;
    +    the function writes at most
    +    end - p - 1 non-null characters,
    +    the remaining output characters are discarded,
    +    and a null character is written
    +    at the end of the characters
    +    actually written to the array.
    +    If copying takes place between objects that overlap,
    +    the behavior is undefined.
    +
    +Returns
    +3   The $0 function returns
    +    a pointer to the terminating null character
    +    if the output was written
    +    without discarding any characters.
    +
    +4
    +    If p is a null pointer,
    +    a null pointer is returned,
    +    and errno is not modified.
    +
    +5
    +    If any characters are discarded,
    +    a null pointer is returned,
    +    and the value of the macro E2BIG
    +    is stored in errno.
    +
    +6
    +    If an error occurred,
    +    a null pointer is returned,
    +    and an implementation-defined non-zero value
    +    is stored in errno.

    ## New section after 7.24.6.13 ("The vsnprintf function"):

    +7.24.6.13+1 The vseprintf function
    +
    +Synopsis
    +1   #include
    +    char *vseprintf(char *restrict p, const char end[0], const char *restrict format, va_list arg);
    +
    +Description
    +2   The $0 function
    +    is equivalent to
    +    seprintf,
    +    with the varying argument list replaced by arg.
    +
    +3
    +    The va_list argument to this function
    +    shall have been initialized by the va_start macro
    +    (and possibly subsequent va_arg invocations).
    +    This function does not invoke the va_end macro.343)

  7.33.2 Formatted wide character input/output functions

    ## New section after 7.33.2.4 ("The swprintf function"):

    +7.33.2.4+1 The sewprintf function
    +
    +Synopsis
    +1   #include
    +    wchar_t *sewprintf(wchar_t *restrict p, const wchar_t end[0], const wchar_t *restrict format, ...);
    +
    +Description
    +2   The $0 function
    +    is equivalent to
    +    seprintf,
    +    except that it handles wide strings.

    ## New section after 7.33.2.8 ("The vswprintf function"):

    +7.33.2.8+1 The vsewprintf function
    +
    +Synopsis
    +1   #include
    +    wchar_t *vsewprintf(wchar_t *restrict p, const wchar_t end[0], const wchar_t *restrict format, va_list arg);
    +
    +Description
    +2   The $0 function
    +    is equivalent to
    +    sewprintf,
    +    with the varying argument list replaced by arg.
    +
    +3
    +    The va_list argument to this function
    +    shall have been initialized by the va_start macro
    +    (and possibly subsequent va_arg invocations).
    +    This function does not invoke the va_end macro.407)
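As a non-normative companion to the Returns paragraphs of the proposed seprintf() wording, here is a minimal sketch of those semantics as a plain function over the standard vsnprintf(). It assumes a POSIX <errno.h> for E2BIG, which ISO C does not currently define. The names seprintf_sketch() and join_words() are hypothetical, and this is not the shadow-utils or Linux kernel implementation.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch of the proposed seprintf() semantics.
 * Precondition (as in the proposed wording): if p is not null, end > p.
 */
static char *seprintf_sketch(char *p, const char *end, const char *fmt, ...)
{
    va_list ap;
    int len;

    if (p == NULL)
        return NULL;              /* earlier error: propagate, errno untouched */

    va_start(ap, fmt);
    len = vsnprintf(p, end - p, fmt, ap);
    va_end(ap);

    if (len < 0) {
        /* Hard error.  ISO C does not require vsnprintf() to set errno,
         * so make sure errno is non-zero and cannot be mistaken for
         * truncation (EILSEQ is an arbitrary choice for this sketch). */
        if (errno == 0 || errno == E2BIG)
            errno = EILSEQ;
        return NULL;
    }
    if (len >= end - p) {
        errno = E2BIG;            /* truncated, but still null-terminated */
        return NULL;
    }
    return p + len;               /* points to the terminating null */
}

/*
 * Hypothetical usage: joins n > 0 words with spaces into buf (size > 0).
 * Returns 0 on success, or -1 with errno set by seprintf_sketch().
 */
static int join_words(char *buf, size_t size,
                      const char *const words[], size_t n)
{
    char *p = buf;
    const char *e = buf + size;

    for (size_t i = 0; i < n; i++)
        p = seprintf_sketch(p, e, i == 0 ? "%s" : " %s", words[i]);
    if (p == NULL)
        return -1;
    return 0;
}
```

Because a null 'p' passes straight through, join_words() checks for failure only once, after the last call, mirroring the sprintf(3)-based baseline. errno then distinguishes truncation (E2BIG) from a hard error, even when the failure happened in the middle of the chain.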