Date: Tue, 21 Jun 2022 13:56:46 -0400
From: Peter Xu
To: Nadav Amit
Cc: Linux MM, David Hildenbrand, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen, Mike Rapoport
Subject: Re: [RFC PATCH v2 4/5] userfaultfd: zero access/write hints
References: <20220619233449.181323-1-namit@vmware.com> <20220619233449.181323-5-namit@vmware.com>

On Tue, Jun 21, 2022 at 05:17:05PM +0000, Nadav Amit wrote:
> On Jun 21, 2022, at 10:04 AM, Peter Xu wrote:
>
> > On Sun, Jun 19, 2022 at 04:34:48PM -0700, Nadav Amit wrote:
> >> From: Nadav Amit
> >>
> >> When userfaultfd provides a zeropage in response to ioctl, it provides a
> >> readonly alias to the zero page. If the page is later written (which is
> >> the likely scenario), page-fault occurs and the page-fault allocator
> >> allocates a page and rewires the page-tables.
> >>
> >> This is an expensive flow for cases in which a page is likely be written
> >> to. Users can use the copy ioctl to initialize zero page (by copying
> >> zeros), but this is also wasteful.
> >>
> >> Allow userfaultfd users to efficiently map initialized zero-pages that
> >> are writable. Introduce UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY, which, when
> >> provided would map a clear page instead of an alias to the zero page.
> >>
> >> For consistency, introduce also UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY.
> >>
> >> Suggested-by: David Hildenbrand
> >> Cc: Mike Kravetz
> >> Cc: Hugh Dickins
> >> Cc: Andrew Morton
> >> Cc: Axel Rasmussen
> >> Cc: Peter Xu
> >> Cc: Mike Rapoport
> >> Signed-off-by: Nadav Amit
> >> ---
> >>  fs/userfaultfd.c                 | 14 +++++++++++--
> >>  include/uapi/linux/userfaultfd.h |  2 ++
> >>  mm/userfaultfd.c                 | 36 ++++++++++++++++++++++++++++++++
> >>  3 files changed, 50 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> >> index a56983b594d5..ff073de78ea8 100644
> >> --- a/fs/userfaultfd.c
> >> +++ b/fs/userfaultfd.c
> >> @@ -1770,6 +1770,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
> >>         struct uffdio_zeropage uffdio_zeropage;
> >>         struct uffdio_zeropage __user *user_uffdio_zeropage;
> >>         struct userfaultfd_wake_range range;
> >> +       bool mode_dontwake, mode_access_likely, mode_write_likely;
> >> +       uffd_flags_t uffd_flags;
> >>
> >>         user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;
> >>
> >> @@ -1788,8 +1790,16 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
> >>         if (ret)
> >>                 goto out;
> >>         ret = -EINVAL;
> >> -       if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
> >> -               goto out;
> >> +
> >> +       mode_dontwake = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_DONTWAKE;
> >> +       mode_access_likely = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY;
> >> +       mode_write_likely = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY;
> >> +
> >> +       if (mode_dontwake)
> >> +               return -EINVAL;
> >
> > Hmm.. Why?
> >
> > Note that the above uffdio_zeropage.mode check was for invalid mode flags
> > only, and I think that should be kept, but still I don't see why we want to
> > fail UFFDIO_ZEROPAGE_MODE_DONTWAKE users. [1]
> >
> >> +
> >> +       uffd_flags = (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0) |
> >> +                    (mode_write_likely ? UFFD_FLAGS_WRITE_LIKELY : 0);
> >>
> >>         if (mmget_not_zero(ctx->mm)) {
> >>                 ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
> >> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> >> index 6ad93a13282e..b586b7c1e265 100644
> >> --- a/include/uapi/linux/userfaultfd.h
> >> +++ b/include/uapi/linux/userfaultfd.h
> >> @@ -286,6 +286,8 @@ struct uffdio_copy {
> >>  struct uffdio_zeropage {
> >>         struct uffdio_range range;
> >>  #define UFFDIO_ZEROPAGE_MODE_DONTWAKE           ((__u64)1<<0)
> >> +#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY      ((__u64)1<<2)
> >> +#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY       ((__u64)1<<3)
> >>         __u64 mode;
> >>
> >>         /*
> >> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> >> index 3172158d8faa..5dfbb1e80369 100644
> >> --- a/mm/userfaultfd.c
> >> +++ b/mm/userfaultfd.c
> >> @@ -249,6 +249,38 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
> >>         return ret;
> >>  }
> >>
> >> +static int mfill_clearpage_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> >> +                               struct vm_area_struct *dst_vma,
> >> +                               unsigned long dst_addr,
> >> +                               uffd_flags_t uffd_flags)
> >> +{
> >> +       struct page *page;
> >> +       int ret;
> >> +
> >> +       ret = -ENOMEM;
> >> +       page = alloc_zeroed_user_highpage_movable(dst_vma, dst_addr);
> >> +       if (!page)
> >> +               goto out;
> >> +
> >> +       /* The PTE is not marked as dirty unconditionally */
> >> +       SetPageDirty(page);
> >> +       __SetPageUptodate(page);
> >> +
> >> +       ret = -ENOMEM;
> >
> > Nit: can drop this since ret will always be -ENOMEM here..
>
> I noticed. Just thought it is clearer this way, and more robust against
> future changes.

I'd rather leave that for future, but if you really prefer that no problem
on my side too.

Please just still check [1] above and that's the major real comment, just
to make sure it's not overlooked..

--
Peter Xu
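
For reference, the point raised at [1] could be sketched as below. This is only a user-space illustration of one way the mode check might look, not code from the patch or from any posted revision. The flag values are copied from the quoted uapi hunk; ZEROPAGE_VALID_MODES, zeropage_mode_check() and zeropage_mode_selftest() are hypothetical names made up for the example. The idea is to keep rejecting unknown bits, as the original check did, while still accepting UFFDIO_ZEROPAGE_MODE_DONTWAKE.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as proposed in the quoted uapi hunk. */
#define UFFDIO_ZEROPAGE_MODE_DONTWAKE       ((uint64_t)1 << 0)
#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY  ((uint64_t)1 << 2)
#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY   ((uint64_t)1 << 3)

/* Hypothetical mask of every mode bit the ioctl would accept. */
#define ZEROPAGE_VALID_MODES (UFFDIO_ZEROPAGE_MODE_DONTWAKE | \
                              UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY | \
                              UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY)

/*
 * Hypothetical helper: return -EINVAL only when an unknown bit is set.
 * DONTWAKE stays a valid mode, alone or combined with the hints.
 */
static int zeropage_mode_check(uint64_t mode)
{
	if (mode & ~ZEROPAGE_VALID_MODES)
		return -EINVAL;
	return 0;
}

/* Small self-check showing the intended behaviour. */
static void zeropage_mode_selftest(void)
{
	assert(zeropage_mode_check(0) == 0);
	assert(zeropage_mode_check(UFFDIO_ZEROPAGE_MODE_DONTWAKE) == 0);
	assert(zeropage_mode_check(UFFDIO_ZEROPAGE_MODE_DONTWAKE |
				   UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY) == 0);
	/* Bit 1 is not defined for this ioctl in the quoted hunk. */
	assert(zeropage_mode_check((uint64_t)1 << 1) == -EINVAL);
}
```

In the kernel function the same test would presumably sit where the removed "mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE" check was, using "goto out" rather than a direct return so that it follows the surrounding error path.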