From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 29 Jun 2023 18:30:41 -0700
Subject: Re: [PATCH v5 5/6] mm: handle swap page faults under per-VMA lock
To: Alistair Popple
Cc: akpm@linux-foundation.org, willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, peterx@redhat.com, ying.huang@intel.com, david@redhat.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
In-Reply-To: <877crm246q.fsf@nvdebian.thelocal>
References: <20230628172529.744839-1-surenb@google.com> <20230628172529.744839-6-surenb@google.com> <877crm246q.fsf@nvdebian.thelocal>
On Wed, Jun 28, 2023 at 11:06 PM Alistair Popple wrote:
>
>
> Looks good and passed the HMM selftests. So:
>
> Tested-by: Alistair Popple
> Reviewed-by: Alistair Popple

Thanks!

>
> Suren Baghdasaryan writes:
>
> > When page fault is handled under per-VMA lock protection, all swap page
> > faults are retried with mmap_lock because folio_lock_or_retry has to drop
> > and reacquire mmap_lock if folio could not be immediately locked.
> > Follow the same pattern as mmap_lock to drop per-VMA lock when waiting
> > for folio and retrying once folio is available.
> > With this obstacle removed, enable do_swap_page to operate under
> > per-VMA lock protection. Drivers implementing ops->migrate_to_ram might
> > still rely on mmap_lock, therefore we have to fall back to mmap_lock in
> > that particular case.
> > Note that the only time do_swap_page calls synchronous swap_readpage
> > is when SWP_SYNCHRONOUS_IO is set, which is only set for
> > QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and
> > pmem). Therefore we don't sleep in this path, and there's no need to
> > drop the mmap or per-VMA lock.
> >
> > Signed-off-by: Suren Baghdasaryan
> > Acked-by: Peter Xu
> > ---
> >  include/linux/mm.h | 13 +++++++++++++
> >  mm/filemap.c       | 17 ++++++++---------
> >  mm/memory.c        | 16 ++++++++++------
> >  3 files changed, 31 insertions(+), 15 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index fec149585985..bbaec479bf98 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -723,6 +723,14 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
> >  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> >  					  unsigned long address);
> >
> > +static inline void release_fault_lock(struct vm_fault *vmf)
> > +{
> > +	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
> > +		vma_end_read(vmf->vma);
> > +	else
> > +		mmap_read_unlock(vmf->vma->vm_mm);
> > +}
> > +
> >  #else /* CONFIG_PER_VMA_LOCK */
> >
> >  static inline void vma_init_lock(struct vm_area_struct *vma) {}
> > @@ -736,6 +744,11 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
> >  static inline void vma_mark_detached(struct vm_area_struct *vma,
> >  				     bool detached) {}
> >
> > +static inline void release_fault_lock(struct vm_fault *vmf)
> > +{
> > +	mmap_read_unlock(vmf->vma->vm_mm);
> > +}
> > +
> >  #endif /* CONFIG_PER_VMA_LOCK */
> >
> >  /*
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 52bcf12dcdbf..d4d8f474e0c5 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1703,27 +1703,26 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
> >   * Return values:
> >   * 0 - folio is locked.
> >   * VM_FAULT_RETRY - folio is not locked.
> > - *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
> > - *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
> > - *     which case mmap_lock is still held.
> > + *     mmap_lock or per-VMA lock has been released (mmap_read_unlock() or
> > + *     vma_end_read()), unless flags had both FAULT_FLAG_ALLOW_RETRY and
> > + *     FAULT_FLAG_RETRY_NOWAIT set, in which case the lock is still held.
> >   *
> >   * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
> > - * with the folio locked and the mmap_lock unperturbed.
> > + * with the folio locked and the mmap_lock/per-VMA lock is left unperturbed.
> >   */
> >  vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
> >  {
> > -	struct mm_struct *mm = vmf->vma->vm_mm;
> >  	unsigned int flags = vmf->flags;
> >
> >  	if (fault_flag_allow_retry_first(flags)) {
> >  		/*
> > -		 * CAUTION! In this case, mmap_lock is not released
> > -		 * even though return VM_FAULT_RETRY.
> > +		 * CAUTION! In this case, mmap_lock/per-VMA lock is not
> > +		 * released even though returning VM_FAULT_RETRY.
> >  		 */
> >  		if (flags & FAULT_FLAG_RETRY_NOWAIT)
> >  			return VM_FAULT_RETRY;
> >
> > -		mmap_read_unlock(mm);
> > +		release_fault_lock(vmf);
> >  		if (flags & FAULT_FLAG_KILLABLE)
> >  			folio_wait_locked_killable(folio);
> >  		else
> > @@ -1735,7 +1734,7 @@ vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
> >
> >  		ret = __folio_lock_killable(folio);
> >  		if (ret) {
> > -			mmap_read_unlock(mm);
> > +			release_fault_lock(vmf);
> >  			return VM_FAULT_RETRY;
> >  		}
> >  	} else {
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 345080052003..4fb8ecfc6d13 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3712,12 +3712,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  	if (!pte_unmap_same(vmf))
> >  		goto out;
> >
> > -	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > -		ret = VM_FAULT_RETRY;
> > -		vma_end_read(vma);
> > -		goto out;
> > -	}
> > -
> >  	entry = pte_to_swp_entry(vmf->orig_pte);
> >  	if (unlikely(non_swap_entry(entry))) {
> >  		if (is_migration_entry(entry)) {
> > @@ -3727,6 +3721,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  			vmf->page = pfn_swap_entry_to_page(entry);
> >  			ret = remove_device_exclusive_entry(vmf);
> >  		} else if (is_device_private_entry(entry)) {
> > +			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > +				/*
> > +				 * migrate_to_ram is not yet ready to operate
> > +				 * under VMA lock.
> > +				 */
> > +				vma_end_read(vma);
> > +				ret = VM_FAULT_RETRY;
> > +				goto out;
> > +			}
> > +
> >  			vmf->page = pfn_swap_entry_to_page(entry);
> >  			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> >  					vmf->address, &vmf->ptl);
>