From: James Houghton
Date: Wed, 21 Dec 2022 20:24:45 -0500
Subject: Re: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
To: Mike Kravetz
Cc: Peter Xu, Muchun Song, David Hildenbrand, David Rientjes,
 Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra,
 Naoya Horiguchi, "Dr . David Alan Gilbert", "Matthew Wilcox (Oracle)",
 Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20221021163703.3218176-1-jthoughton@google.com>
 <20221021163703.3218176-34-jthoughton@google.com>

> > So considering two API choices:
> >
> > 1. What we have now: UFFD_FEATURE_MINOR_HUGETLBFS_HGM for
> > UFFDIO_CONTINUE, and later UFFD_FEATURE_WP_HUGETLBFS_HGM for
> > UFFDIO_WRITEPROTECT. For MADV_DONTNEED, we could just suddenly start
> > allowing high-granularity choices (not sure if this is bad; we started
> > allowing it for HugeTLB recently with no other API change, AFAIA).
>
> I don't think we can just start allowing HGM for MADV_DONTNEED without
> some type of user interaction/request. Otherwise, a user that passes
> in non-hugetlb page size requests may get unexpected results. And, one
> of the threads about MADV_DONTNEED points out a valid use cases where
> the caller may not know the mapping is hugetlb or not and is likely to
> pass in non-hugetlb page size requests.
>
> > 2. MADV_ENABLE_HGM or something similar. The changes to
> > UFFDIO_CONTINUE/UFFDIO_WRITEPROTECT/MADV_DONTNEED come automatically,
> > provided they are implemented.
> >
> > I don't mind one way or the other. Peter, I assume you prefer #2.
> > Mike, what about you? If we decide on something other than #1, I'll
> > make the change before sending v1 out.
>
> Since I do not believe 1) is an option, MADV_ENABLE_HGM might be the way
> to go. Any thoughts about MADV_ENABLE_HGM? I'm thinking:
> - Make it have same restrictions as other madvise hugetlb calls,
>   . addr must be huge page aligned
>   . length is rounded down to a multiple of huge page size
> - We split the vma as required

I agree with these.

> - Flags carrying HGM state reside in the hugetlb_shared_vma_data struct

I actually changed this in v1 to storing HGM state as a VMA flag to
avoid problems with splitting VMAs (like, when we split a VMA, it's
possible the VMA data/lock struct doesn't get allocated). It seems
better to me; I can change it back if you disagree.

Not sure what the best name for this flag is either. MADV_ENABLE_HGM
sounds ok. MADV_HUGETLB_HGM or MADV_HUGETLB_SMALL_PAGES could work
too. No need to figure it out now.

Thanks Mike and Peter :) I'll make this change for v1 and send it out
sometime soon.

- James
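
For illustration only, the range restrictions proposed above for
MADV_ENABLE_HGM (addr must be huge page aligned, length rounded down to
a multiple of the huge page size) could be sketched in plain userspace C
roughly as below. This is not code from the patch series: the struct
hgm_range type and the hgm_normalize_range() helper are hypothetical
names, and the sketch assumes the huge page size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the range an MADV_ENABLE_HGM-style call would act on. */
struct hgm_range {
	uintptr_t start;
	size_t len;
};

/*
 * Hypothetical helper, not taken from the series.
 * Rejects an addr that is not huge page aligned and rounds len down
 * to a multiple of hpage_size. Returns true and fills *out on success.
 */
static bool hgm_normalize_range(uintptr_t addr, size_t len,
				size_t hpage_size, struct hgm_range *out)
{
	/* Assumption: huge page sizes are non-zero powers of two. */
	assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);

	/* addr must be huge page aligned. */
	if (addr & (hpage_size - 1))
		return false;

	out->start = addr;
	/* length is rounded down to a multiple of the huge page size. */
	out->len = len & ~(hpage_size - 1);
	return true;
}
```

With a 2 MiB huge page size, a 3 MiB request at an aligned address would
be trimmed to 2 MiB, while a request starting 4 KiB into a huge page
would be rejected. How the real implementation reports the error, and
where the VMA split happens, is left to the series itself.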