From mboxrd@z Thu Jan  1 00:00:00 1970
From: Suren Baghdasaryan
Date: Tue, 27 Jun 2023 09:10:36 -0700
Subject: Re: [PATCH v3 8/8] mm: handle userfaults under VMA lock
To: Peter Xu
Cc: akpm@linux-foundation.org, willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com, ying.huang@intel.com, david@redhat.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20230627042321.1763765-1-surenb@google.com> <20230627042321.1763765-9-surenb@google.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Jun 27, 2023 at 8:54 AM Peter Xu wrote:
>
> On Mon, Jun 26, 2023 at 09:23:21PM -0700, Suren Baghdasaryan wrote:
> > Enable handle_userfault to operate under VMA lock by releasing VMA lock
> > instead of mmap_lock and retrying.
>
> This is mostly good to me (besides the new DROP flag.. of course), thanks.
> Still some nitpicks below.
>
> >
> > Signed-off-by: Suren Baghdasaryan
> > ---
> >  fs/userfaultfd.c | 42 ++++++++++++++++++++++--------------------
> >  mm/memory.c      |  9 ---------
> >  2 files changed, 22 insertions(+), 29 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 4e800bb7d2ab..b88632c404b6 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -277,17 +277,17 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
> >   * hugepmd ranges.
> >   */
> >  static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> > -                                             struct vm_area_struct *vma,
> > -                                             unsigned long address,
> > -                                             unsigned long flags,
> > -                                             unsigned long reason)
> > +                                             struct vm_fault *vmf,
> > +                                             unsigned long reason)
> >  {
> > +       struct vm_area_struct *vma = vmf->vma;
> >         pte_t *ptep, pte;
> >         bool ret = true;
> >
> > -       mmap_assert_locked(ctx->mm);
> > +       if (!(vmf->flags & FAULT_FLAG_VMA_LOCK))
> > +               mmap_assert_locked(ctx->mm);
>
> Maybe we can have a helper asserting proper vma protector locks (mmap for
> !VMA_LOCK and vma read lock for VMA_LOCK)?  It basically tells the context
> the vma is still safe to access.
>
> >
> > -       ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma));
> > +       ptep = hugetlb_walk(vma, vmf->address, vma_mmu_pagesize(vma));
> >         if (!ptep)
> >                 goto out;
> >
> > @@ -308,10 +308,8 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> >  }
> >  #else
> >  static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> > -                                             struct vm_area_struct *vma,
> > -                                             unsigned long address,
> > -                                             unsigned long flags,
> > -                                             unsigned long reason)
> > +                                             struct vm_fault *vmf,
> > +                                             unsigned long reason)
> >  {
> >         return false;   /* should never get here */
> >  }
> > @@ -325,11 +323,11 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> >   * threads.
> >   */
> >  static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
> > -                                        unsigned long address,
> > -                                        unsigned long flags,
> > +                                        struct vm_fault *vmf,
> >                                          unsigned long reason)
> >  {
> >         struct mm_struct *mm = ctx->mm;
> > +       unsigned long address = vmf->address;
> >         pgd_t *pgd;
> >         p4d_t *p4d;
> >         pud_t *pud;
> > @@ -337,7 +335,8 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
> >         pte_t *pte;
> >         bool ret = true;
> >
> > -       mmap_assert_locked(mm);
> > +       if (!(vmf->flags & FAULT_FLAG_VMA_LOCK))
> > +               mmap_assert_locked(mm);
>
> (the assert helper can also be used here)
>
> >
> >         pgd = pgd_offset(mm, address);
> >         if (!pgd_present(*pgd))
> > @@ -445,7 +444,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
> >          * Coredumping runs without mmap_lock so we can only check that
> >          * the mmap_lock is held, if PF_DUMPCORE was not set.
> >          */
> > -       mmap_assert_locked(mm);
> > +       if (!(vmf->flags & FAULT_FLAG_VMA_LOCK))
> > +               mmap_assert_locked(mm);
> >
> >         ctx = vma->vm_userfaultfd_ctx.ctx;
> >         if (!ctx)
> > @@ -561,15 +561,17 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
> >         spin_unlock_irq(&ctx->fault_pending_wqh.lock);
> >
> >         if (!is_vm_hugetlb_page(vma))
> > -               must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
> > -                                                 reason);
> > +               must_wait = userfaultfd_must_wait(ctx, vmf, reason);
> >         else
> > -               must_wait = userfaultfd_huge_must_wait(ctx, vma,
> > -                                                      vmf->address,
> > -                                                      vmf->flags, reason);
> > +               must_wait = userfaultfd_huge_must_wait(ctx, vmf, reason);
> >         if (is_vm_hugetlb_page(vma))
> >                 hugetlb_vma_unlock_read(vma);
> > -       mmap_read_unlock(mm);
> > +       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > +               /* WARNING: VMA can't be used after this */
> > +               vma_end_read(vma);
> > +       } else
> > +               mmap_read_unlock(mm);
>
> I also think maybe we should have a helper mm_release_fault_lock() that
> just releases the right lock for with/without VMA_LOCK.
> It can also be used in the other patch of folio_lock_or_retry().

These all seem like good suggestions. I'll try implementing them in the next
version. Thanks!

>
> > +       vmf->flags |= FAULT_FLAG_LOCK_DROPPED;
> >
> >         if (likely(must_wait && !READ_ONCE(ctx->released))) {
> >                 wake_up_poll(&ctx->fd_wqh, EPOLLIN);
> > diff --git a/mm/memory.c b/mm/memory.c
> > index bdf46fdc58d6..923c1576bd14 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5316,15 +5316,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> >         if (!vma_start_read(vma))
> >                 goto inval;
> >
> > -       /*
> > -        * Due to the possibility of userfault handler dropping mmap_lock, avoid
> > -        * it for now and fall back to page fault handling under mmap_lock.
> > -        */
> > -       if (userfaultfd_armed(vma)) {
> > -               vma_end_read(vma);
> > -               goto inval;
> > -       }
> > -
> >         /* Check since vm_start/vm_end might change before we lock the VMA */
> >         if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
> >                 vma_end_read(vma);
> > --
> > 2.41.0.178.g377b9f9a00-goog
> >
>
> --
> Peter Xu
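[Editor's note: for concreteness, the two helpers Peter suggests in the review might look roughly like the sketch below. This is a hypothetical illustration, not code from the posted series: the names assert_fault_locked() and mm_release_fault_lock(), and the vma_assert_locked() primitive, are assumptions; both helpers key off the FAULT_FLAG_VMA_LOCK fault flag that this series already uses.]

```c
/*
 * Hypothetical sketch of the two review suggestions, e.g. as inline
 * helpers in a shared mm header. Assumes FAULT_FLAG_VMA_LOCK from this
 * series and a vma_assert_locked() check for the per-VMA read lock.
 */

/* Assert whichever lock protects this fault's VMA is actually held. */
static inline void assert_fault_locked(struct vm_fault *vmf)
{
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		vma_assert_locked(vmf->vma);		/* per-VMA read lock */
	else
		mmap_assert_locked(vmf->vma->vm_mm);	/* mmap_lock */
}

/* Drop whichever lock the fault path took before sleeping/retrying. */
static inline void mm_release_fault_lock(struct vm_fault *vmf)
{
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		vma_end_read(vmf->vma);			/* VMA unusable after this */
	else
		mmap_read_unlock(vmf->vma->vm_mm);
}
```

With helpers along these lines, the open-coded `if (vmf->flags & FAULT_FLAG_VMA_LOCK)` checks in userfaultfd_must_wait(), userfaultfd_huge_must_wait(), handle_userfault(), and the folio_lock_or_retry() patch could each collapse into a single call.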