From: Axel Rasmussen
Date: Tue, 28 Mar 2023 13:01:26 -0700
Subject: Re: [PATCH] userfaultfd: don't fail on unrecognized features
To: Peter Xu
Cc: Alexander Viro, Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20220722201513.1624158-1-axelrasmussen@google.com>

On Tue, Mar 28, 2023 at 12:45 PM Peter Xu wrote:
>
> On Tue, Mar 28, 2023 at 12:28:59PM -0700, Axel Rasmussen wrote:
> > On Mon, Mar 27, 2023 at 2:01 PM Peter Xu wrote:
> > >
> > > I think I overlooked this patch..
> > >
> > > Axel, could you explain why this patch is correct?  Comments inline.
> > >
> > > On Fri, Jul 22, 2022 at 01:15:13PM -0700, Axel Rasmussen wrote:
> > > > The basic interaction for setting up a userfaultfd is, userspace issues
> > > > a UFFDIO_API ioctl, and passes in a set of zero or more feature flags,
> > > > indicating the features they would prefer to use.
> > > >
> > > > Of course, different kernels may support different sets of features
> > > > (depending on kernel version, kconfig options, architecture, etc).
> > > > Userspace's expectations may also not match: perhaps it was built
> > > > against newer kernel headers, which defined some features the kernel
> > > > it's running on doesn't support.
> > > >
> > > > Currently, if userspace passes in a flag we don't recognize, the
> > > > initialization fails and we return -EINVAL. This isn't great, though.
> > >
> > > Why?  IIUC that's the major way for user app to detect any misconfig of
> > > feature list so it can bail out early.
> > >
> > > Quoting from man page (ioctl_userfaultfd(2)):
> > >
> > >     UFFDIO_API
> > >            (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API handshake.
> > >
> > >            ...
> > >
> > >            struct uffdio_api {
> > >                __u64 api;        /* Requested API version (input) */
> > >                __u64 features;   /* Requested features (input/output) */
> > >                __u64 ioctls;     /* Available ioctl() operations (output) */
> > >            };
> > >
> > >            ...
> > >
> > >            For Linux kernel versions before 4.11, the features field must be
> > >            initialized to zero before the call to UFFDIO_API, and zero (i.e.,
> > >            no feature bits) is placed in the features field by the kernel upon
> > >            return from ioctl(2).
> > >
> > >            ...
> > >
> > >            To enable userfaultfd features the application should set a bit
> > >            corresponding to each feature it wants to enable in the features
> > >            field.  If the kernel supports all the requested features it will
> > >            enable them.  Otherwise it will zero out the returned uffdio_api
> > >            structure and return EINVAL.
> > >
> > > IIUC the right way to use this API is first probe with features==0, then
> > > the kernel will return all the supported features, then the user app should
> > > enable only a subset (or all, but not a superset) of supported ones in the
> > > next UFFDIO_API with a new uffd.
> >
> > Hmm, I think doing a two-step handshake just overcomplicates things.
> >
> > Isn't it simpler to just have userspace ask for the features it wants
> > up front, and then the kernel responds with the subset of features it
> > actually supports? In the common case (all features were supported),
> > there is nothing more to do. Userspace is free to detect the uncommon
> > case where some features it asked for are missing, and handle that
> > however it likes.
> >
> > I think this patch is backwards compatible with the two-step approach, too.
> >
> > I do agree the man page could use some work. I don't think it
> > describes the two-step handshake process correctly, either. It just
> > says, "ask for the features you want, and the kernel will either give
> > them to you or fail". If we really did want to keep the two-step
> > process, it should describe it (set features == 0 first, then ask only
> > for the ones you want which are supported), and the example program
> > should demonstrate it.
> >
> > But, I think it's simpler to just have the kernel do what the man page
> > describes. Userspace asks for the features up front, kernel responds
> > with the subset that are actually supported. No need to return EINVAL
> > if unsupported features were requested.
>
> The uffdio_api.features passed into the ioctl(UFFDIO_API) should be such
> request to enable features specified in the kernel.  If the kernel doesn't
> support any of the features in the list, IMHO it's very natural to fail it
> as described in the man page.  That's also most of the kernel apis do
> afaik, by failing any enablement of features if not supported.
>
> >
> > >
> > > > Userspace doesn't have an obvious way to react to this; sure, one of the
> > > > features I asked for was unavailable, but which one? The only option it
> > > > has is to turn off things "at random" and hope something works.
> > > >
> > > > Instead, modify UFFDIO_API to just ignore any unrecognized feature
> > > > flags. The interaction is now that the initialization will succeed, and
> > > > as always we return the *subset* of feature flags that can actually be
> > > > used back to userspace.
> > > >
> > > > Now userspace has an obvious way to react: it checks if any flags it
> > > > asked for are missing. If so, it can conclude this kernel doesn't
> > > > support those, and it can either resign itself to not using them, or
> > > > fail with an error on its own, or whatever else.
> > > >
> > > > Signed-off-by: Axel Rasmussen
> > > > ---
> > > >  fs/userfaultfd.c | 6 ++----
> > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > index e943370107d0..4974da1f620c 100644
> > > > --- a/fs/userfaultfd.c
> > > > +++ b/fs/userfaultfd.c
> > > > @@ -1923,10 +1923,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> > > >       ret = -EFAULT;
> > > >       if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
> > > >               goto out;
> > > > -     features = uffdio_api.features;
> > > > -     ret = -EINVAL;
> > > > -     if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
> > > > -             goto err_out;
> > >
> > > What's worse is that I think you removed the only UFFD_API check.
> > > Although I'm not sure whether it'll be extended in the future or not at
> > > all (very possible we keep using 0xaa forever..), but removing this means
> > > we won't be able to extend it to a new api version in the future, and
> > > misconfig of uffdio_api will wrongly succeed I think:
> > >
> > >         /* Test wrong UFFD_API */
> > >         uffdio_api.api = 0xab;
> > >         uffdio_api.features = 0;
> > >         if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0)
> > >                 err("UFFDIO_API should fail but didn't");
> >
> > Agreed, we should add back the UFFD_API check - I am happy to send a
> > patch for this.
>
> Do you plan to just revert the patch?  If so, please go ahead.  IMHO we
> should just follow the man page.
>
> What I agree here is the api isn't that perfect, in that we need to create
> a separate userfault file descriptor just to probe.  Currently the features
> will be returned in the initial test with features=0 passed in, but it also
> initializes the uffd handle even if it'll never be used but for probe only.

Oh, I thought you could UFFDIO_API the same FD twice. Having to create
a whole separate FD just to probe features makes me dislike that
design even more.

>
> However since that existed in the 1st day I guess we'd better keep it
> as-is.  And it's not so bad either: user app does open/close one more time,
> but only once for each app's lifecycle.

I don't think just reverting would be enough. We'd also need to update
the man page to describe the two-step initialization, and we'd need to
update the man page's example program to demonstrate it. Our own
selftest also doesn't use that approach, so it would need to be
updated as well.

It also seems not unlikely that there exists some userspace code which
simply copied the example program from the man page, and as such
doesn't do the two-step handshake today. Hard to know for certain.

Once we've dealt with that, what we'll have accomplished is just
making the API harder to use.
I don't see any downside to the current state of things: it allows a
much simpler way of configuring userfaultfds, and it's backwards
compatible with the more complicated way.

I think we can set things right by just adding in the UFFD_API version
check by itself, and then updating the man page to describe the
current state of things?

>
> Thanks,
>
> --
> Peter Xu
>