Date: Tue, 21 Jun 2022 13:04:35 -0400
From: Peter Xu
To: Nadav Amit
Cc: linux-mm@kvack.org, Nadav Amit, David Hildenbrand, Mike Kravetz,
    Hugh Dickins, Andrew Morton, Axel Rasmussen, Mike Rapoport
Subject: Re: [RFC PATCH v2 4/5] userfaultfd: zero access/write hints
References: <20220619233449.181323-1-namit@vmware.com>
    <20220619233449.181323-5-namit@vmware.com>
In-Reply-To: <20220619233449.181323-5-namit@vmware.com>

On Sun, Jun 19, 2022 at 04:34:48PM -0700, Nadav Amit wrote:
> From: Nadav Amit
>
> When userfaultfd provides a zeropage in response to ioctl, it provides a
> readonly alias to the zero page. If the page is later written (which is
> the likely scenario), page-fault occurs and the page-fault allocator
> allocates a page and rewires the page-tables.
>
> This is an expensive flow for cases in which a page is likely be written
> to. Users can use the copy ioctl to initialize zero page (by copying
> zeros), but this is also wasteful.
>
> Allow userfaultfd users to efficiently map initialized zero-pages that
> are writable. Introduce UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY, which, when
> provided would map a clear page instead of an alias to the zero page.
>
> For consistency, introduce also UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY.
>
> Suggested-by: David Hildenbrand
> Cc: Mike Kravetz
> Cc: Hugh Dickins
> Cc: Andrew Morton
> Cc: Axel Rasmussen
> Cc: Peter Xu
> Cc: Mike Rapoport
> Signed-off-by: Nadav Amit
> ---
>  fs/userfaultfd.c                 | 14 +++++++++++--
>  include/uapi/linux/userfaultfd.h |  2 ++
>  mm/userfaultfd.c                 | 36 ++++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index a56983b594d5..ff073de78ea8 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1770,6 +1770,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
>  	struct uffdio_zeropage uffdio_zeropage;
>  	struct uffdio_zeropage __user *user_uffdio_zeropage;
>  	struct userfaultfd_wake_range range;
> +	bool mode_dontwake, mode_access_likely, mode_write_likely;
> +	uffd_flags_t uffd_flags;
>
>  	user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;
>
> @@ -1788,8 +1790,16 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
>  	if (ret)
>  		goto out;
>  	ret = -EINVAL;
> -	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
> -		goto out;
> +
> +	mode_dontwake = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_DONTWAKE;
> +	mode_access_likely = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY;
> +	mode_write_likely = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY;
> +
> +	if (mode_dontwake)
> +		return -EINVAL;

Hmm.. Why?

Note that the above uffdio_zeropage.mode check was for invalid mode flags
only, and I think that should be kept, but still I don't see why we want to
fail UFFDIO_ZEROPAGE_MODE_DONTWAKE users.

> +
> +	uffd_flags = (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0) |
> +		     (mode_write_likely ? UFFD_FLAGS_WRITE_LIKELY : 0);
>
>  	if (mmget_not_zero(ctx->mm)) {
>  		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> index 6ad93a13282e..b586b7c1e265 100644
> --- a/include/uapi/linux/userfaultfd.h
> +++ b/include/uapi/linux/userfaultfd.h
> @@ -286,6 +286,8 @@ struct uffdio_copy {
>  struct uffdio_zeropage {
>  	struct uffdio_range range;
>  #define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((__u64)1<<0)
> +#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<2)
> +#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY	((__u64)1<<3)
>  	__u64 mode;
>
>  	/*
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 3172158d8faa..5dfbb1e80369 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -249,6 +249,38 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
>  	return ret;
>  }
>
> +static int mfill_clearpage_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> +			       struct vm_area_struct *dst_vma,
> +			       unsigned long dst_addr,
> +			       uffd_flags_t uffd_flags)
> +{
> +	struct page *page;
> +	int ret;
> +
> +	ret = -ENOMEM;
> +	page = alloc_zeroed_user_highpage_movable(dst_vma, dst_addr);
> +	if (!page)
> +		goto out;
> +
> +	/* The PTE is not marked as dirty unconditionally */
> +	SetPageDirty(page);
> +	__SetPageUptodate(page);
> +
> +	ret = -ENOMEM;

Nit: can drop this since ret will always be -ENOMEM here..

Thanks,

> +	if (mem_cgroup_charge(page_folio(page), dst_vma->vm_mm, GFP_KERNEL))
> +		goto out_release;
> +
> +	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> +				       page, true, uffd_flags);
> +	if (ret)
> +		goto out_release;
> +out:
> +	return ret;
> +out_release:
> +	put_page(page);
> +	goto out;
> +}
> +
>  /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
>  static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
>  				pmd_t *dst_pmd,
> @@ -511,6 +543,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
>  			err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
>  					       dst_addr, src_addr, page,
>  					       uffd_flags);
> +		else if (!(uffd_flags & UFFD_FLAGS_WP) &&
> +			 (uffd_flags & UFFD_FLAGS_WRITE_LIKELY))
> +			err = mfill_clearpage_pte(dst_mm, dst_pmd, dst_vma,
> +						  dst_addr, uffd_flags);
>  		else
>  			err = mfill_zeropage_pte(dst_mm, dst_pmd,
>  						 dst_vma, dst_addr, uffd_flags);
> --
> 2.25.1
>

--
Peter Xu
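
For illustration only, the mode handling asked for in the first comment could be sketched as a standalone userspace C function rather than kernel code: unknown mode bits are still rejected, UFFDIO_ZEROPAGE_MODE_DONTWAKE is accepted, and the two hint bits are translated. The three UFFDIO_ZEROPAGE_MODE_* values are copied from the quoted patch; the SKETCH_FLAGS_* constants and zeropage_mode_to_flags() are hypothetical names invented for this sketch, and it is not necessarily what a later revision of the series does.

```c
#include <errno.h>
#include <stdint.h>

/* Mode bits as they appear in the quoted uapi hunk. */
#define UFFDIO_ZEROPAGE_MODE_DONTWAKE      ((uint64_t)1 << 0)
#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY ((uint64_t)1 << 2)
#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY  ((uint64_t)1 << 3)

/* Hypothetical stand-ins for the kernel-internal UFFD_FLAGS_* values. */
#define SKETCH_FLAGS_ACCESS_LIKELY (1u << 0)
#define SKETCH_FLAGS_WRITE_LIKELY  (1u << 1)

/*
 * Hypothetical helper: validate a zeropage mode word and split it into
 * a "dontwake" boolean plus hint flags. Returns 0 on success or -EINVAL
 * if any bit outside the known set is present.
 */
static int zeropage_mode_to_flags(uint64_t mode, unsigned int *flags,
                                  int *dontwake)
{
    const uint64_t valid = UFFDIO_ZEROPAGE_MODE_DONTWAKE |
                           UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY |
                           UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY;

    /* Keep the invalid-flag check; only the mask of known bits grows. */
    if (mode & ~valid)
        return -EINVAL;

    /* DONTWAKE stays a legal request instead of being failed. */
    *dontwake = !!(mode & UFFDIO_ZEROPAGE_MODE_DONTWAKE);

    *flags = ((mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY) ?
                  SKETCH_FLAGS_ACCESS_LIKELY : 0) |
             ((mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY) ?
                  SKETCH_FLAGS_WRITE_LIKELY : 0);
    return 0;
}
```

With this shape, a caller passing only UFFDIO_ZEROPAGE_MODE_DONTWAKE succeeds with no hint flags set, while a caller passing bit 1, which the quoted hunk does not define for uffdio_zeropage, gets -EINVAL.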