From: James Houghton
Date: Wed, 21 Dec 2022 15:21:18 -0500
Subject: Re: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
To: Peter Xu
Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
    Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra,
    Naoya Horiguchi, "Dr . David Alan Gilbert", "Matthew Wilcox (Oracle)",
    Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20221021163703.3218176-1-jthoughton@google.com>
    <20221021163703.3218176-34-jthoughton@google.com>

On Wed, Dec 21, 2022 at 2:23 PM Peter Xu wrote:
>
> James,
>
> On Wed, Nov 16, 2022 at 03:30:00PM -0800, James Houghton wrote:
> > On Wed, Nov 16, 2022 at 2:28 PM Peter Xu wrote:
> > >
> > > On Fri, Oct 21, 2022 at 04:36:49PM +0000, James Houghton wrote:
> > > > Userspace must provide this new feature when it calls UFFDIO_API to
> > > > enable HGM.
> > > > Userspace can check if the feature exists in
> > > > uffdio_api.features, and if it does not exist, the kernel does not
> > > > support and therefore did not enable HGM.
> > > >
> > > > Signed-off-by: James Houghton
> > >
> > > It's still slightly a pity that this can only be enabled by an uffd context
> > > plus a minor fault, so generic hugetlb users cannot directly leverage this.
> >
> > The idea here is that, for applications that can conceivably benefit
> > from HGM, we have a mechanism for enabling it for that application. So
> > this patch creates that mechanism for userfaultfd/UFFDIO_CONTINUE. I
> > prefer this approach over something more general like MADV_ENABLE_HGM
> > or something.
>
> Sorry to get back to this very late - I know this has been discussed since
> the very early stage of the feature, but is there any reasoning behind?
>
> When I start to think seriously on applying this to process snapshot with
> uffd-wp I found that the minor mode trick won't easily play - normally
> that's a case where all the pages were there mapped huge, but when the app
> wants UFFDIO_WRITEPROTECT it may want to remap the huge pages into smaller
> pages, probably some size that the user can specify. It'll be non-trivial
> to enable HGM during that phase using MINOR mode because in that case the
> pages are all mapped.
>
> For the long term, I am just still worried the current interface is still
> not as flexible.

Thanks for bringing this up, Peter. I think the main reason was:
having separate UFFD_FEATUREs clearly indicates to userspace what is
and is not supported.

For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
allowed as of this patch series, but it could be allowed in the
future. To add support in the same way as this series, we would add
another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM.
I agree that having to add another feature isn't great; is this what
you're concerned about?

Considering MADV_ENABLE_HGM...
1. If a user provides this, then the contract becomes: "the kernel may
allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at high
granularity, provided the support exists", but it becomes unclear to
userspace what's supported and what isn't.
2. We would then need to keep track of whether a user explicitly
enabled it, or whether it got enabled automatically in response to
memory poison, for example. Not a big problem, just a complication.
(Otherwise, if HGM got enabled for poison, suddenly userspace would be
allowed to do things it wasn't allowed to do before.)
3. This API makes sense for enabling HGM for something outside of
userfaultfd, like MADV_DONTNEED.

Maybe (1) is solvable if we provide a bit field that describes what's
supported, or maybe (1) isn't even a problem.

Another possibility is to have a feature like UFFD_FEATURE_HUGETLB_HGM,
which would enable the possibility of HGM for all relevant userfaultfd
ioctls, but then we have the same problem: it's unclear what's
supported and what isn't.

I'm happy to change the API to whatever you think makes the most sense.

Thanks!

- James

>
> --
> Peter Xu
>
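
For illustration only, the bit-field idea mentioned under (1) could look
roughly like the sketch below. Everything in it is hypothetical: enum
hgm_op, hgm_op_bit(), hgm_op_supported() and hgm_missing() are made-up
names, not part of this series or of any existing kernel API, and how the
kernel would actually report the field to userspace is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: operations that might gain high-granularity support. */
enum hgm_op {
	HGM_OP_CONTINUE,	/* UFFDIO_CONTINUE */
	HGM_OP_WRITEPROTECT,	/* UFFDIO_WRITEPROTECT */
	HGM_OP_DONTNEED,	/* MADV_DONTNEED */
	HGM_OP_COUNT
};

/* Hypothetical: the bit that stands for one operation in the bit field. */
uint64_t hgm_op_bit(enum hgm_op op)
{
	assert((unsigned int)op < HGM_OP_COUNT);
	return UINT64_C(1) << op;
}

/* True if the reported bit field says HGM is supported for this operation. */
bool hgm_op_supported(uint64_t supported, enum hgm_op op)
{
	return (supported & hgm_op_bit(op)) != 0;
}

/*
 * Bits that userspace wanted but the kernel did not report.
 * A result of 0 means everything that was asked for is supported.
 */
uint64_t hgm_missing(uint64_t wanted, uint64_t supported)
{
	return wanted & ~supported;
}
```

With something along these lines, userspace could ask for the operations
it cares about and look at hgm_missing() to see exactly which ones the
kernel lacks, rather than inferring that from a single enable flag.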