From: Suren Baghdasaryan <surenb@google.com>
Date: Fri, 14 Apr 2023 14:51:59 -0700
Subject: Re: [PATCH 1/1] mm: handle swap page faults if the faulting page can be locked
To: Matthew Wilcox
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.com,
    josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
    laurent.dufour@fr.ibm.com, michel@lespinasse.org,
    liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
    minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com,
    lstoakes@gmail.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20230414180043.1839745-1-surenb@google.com>

On Fri, Apr 14, 2023 at 1:32 PM Matthew Wilcox wrote:
>
> On Fri, Apr 14, 2023 at 12:48:54PM -0700, Suren Baghdasaryan wrote:
> > > - We can call migration_entry_wait(). This will wait for PG_locked to
> > > become clear (in migration_entry_wait_on_locked()). As previously
> > > discussed offline, I think this is safe to do while holding the VMA
> > > locked.
>
> Just to be clear, this particular use of PG_locked is not during I/O,
> it's during page migration. This is a few orders of magnitude
> different.
>
> > > - We can call swap_readpage() if we allocate a new folio. I haven't
> > > traced through all this code to tell if it's OK.
>
> ... whereas this will wait for I/O. If we decide that's not OK, we'll
> need to test for FAULT_FLAG_VMA_LOCK and bail out of this path.
>
> > > So ... I believe this is all OK, but we're definitely now willing to
> > > wait for I/O from the swap device while holding the VMA lock when we
> > > weren't before. And maybe we should make a bigger deal of it in the
> > > changelog.
> > >
> > > And maybe we shouldn't just be failing the folio_lock_or_retry(),
> > > maybe we should be waiting for the folio lock with the VMA locked.
> >
> > Wouldn't that cause holding the VMA lock for the duration of swap I/O
> > (something you said we want to avoid in the previous paragraph) and
> > effectively undo d065bd810b6d ("mm: retry page fault when blocking on
> > disk transfer") for VMA locks?
>
> I'm not certain we want to avoid holding the VMA lock for the duration
> of an I/O. Here's how I understand the rationale for avoiding holding
> the mmap_lock while we perform I/O (before the existence of the VMA lock):
>
>  - If everybody is doing page faults, there is no specific problem;
>    we all hold the lock for read and multiple page faults can be handled
>    in parallel.
>  - As soon as one thread attempts to manipulate the tree (eg calls
>    mmap()), all new readers must wait (as the rwsem is fair), and the
>    writer must wait for all existing readers to finish. That's
>    potentially milliseconds for an I/O during which time all page faults
>    stop.
>
> Now we have the per-VMA lock, faults which can be handled without taking
> the mmap_lock can still be satisfied, as long as that VMA is not being
> modified. It is rare for a real application to take a page fault on a
> VMA which is being modified.
>
> So modifications to the tree will generally not take VMA locks on VMAs
> which are currently handling faults, and new faults will generally not
> find a VMA which is write-locked.
>
> When we find a locked folio (presumably for I/O, although folios are
> locked for other reasons), if we fall back to taking the mmap_lock
> for read, we increase contention on the mmap_lock and make the page
> fault wait on any mmap() operation.

Do you mean we increase mmap_lock contention by holding the mmap_lock
from the start of the page fault retry until the point where we drop it
in __folio_lock_or_retry()?

> If we simply sleep waiting for the
> I/O, we make any mmap() operation _which touches this VMA_ wait for
> the I/O to complete. But I think that's OK, because new page faults
> can continue to be serviced ... as long as they don't need to take
> the mmap_lock.

Ok, so we will potentially block VMA writers for the duration of the
I/O... Stupid question: why was this a bigger problem for mmap_lock?
Potentially our address space can consist of only one anon VMA, so
locking that VMA vs mmap_lock should be the same from a swap page
fault's POV. Maybe mmap_lock is taken for write in some other
important cases where the VMA lock is not needed?
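For context, what d065bd810b6d buys us under mmap_lock is, very roughly,
the following shape in the fault path (a simplified sketch, not the exact
code; the real logic lives in folio_lock_or_retry()/__folio_lock_or_retry()):

	/*
	 * Simplified sketch of the d065bd810b6d behaviour: if the folio
	 * is locked (typically for I/O), do not sleep with mmap_lock
	 * held. Drop the lock, wait for the I/O without blocking mmap()
	 * writers, and let the arch fault handler retry the fault.
	 */
	if (!folio_trylock(folio)) {
		mmap_read_unlock(vma->vm_mm);
		folio_wait_locked(folio);	/* wait for I/O, lock not held */
		return VM_FAULT_RETRY;		/* re-fault once it is done */
	}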
>
> So ... I think what we _really_ want here is ...
>
> +++ b/mm/filemap.c
> @@ -1690,7 +1690,8 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>  			   unsigned int flags)
>  {
> -	if (fault_flag_allow_retry_first(flags)) {
> +	if (!(flags & FAULT_FLAG_VMA_LOCK) &&
> +	    fault_flag_allow_retry_first(flags)) {
>  		/*
>  		 * CAUTION! In this case, mmap_lock is not released
>  		 * even though return 0.
> @@ -1710,7 +1711,8 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>
>  		ret = __folio_lock_killable(folio);
>  		if (ret) {
> -			mmap_read_unlock(mm);
> +			if (!(flags & FAULT_FLAG_VMA_LOCK))
> +				mmap_read_unlock(mm);
>  			return false;
>  		}
>  	} else {
>
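One more thing we would have to sort out with the above: the
do_swap_page() caller currently assumes a false return means mmap_lock
was already dropped for it. With FAULT_FLAG_VMA_LOCK it would have to
drop the VMA lock itself, something like this (a sketch on top of the
current caller; the vma_end_read() placement is an assumption, not part
of your diff):

	locked = folio_lock_or_retry(folio, vma->vm_mm, vmf->flags);
	if (!locked) {
		/*
		 * Sketch: with FAULT_FLAG_VMA_LOCK nothing was unlocked
		 * on our behalf (we only get here if the killable folio
		 * lock was interrupted), so release the VMA read lock
		 * before asking for a retry.
		 */
		if (vmf->flags & FAULT_FLAG_VMA_LOCK)
			vma_end_read(vma);
		ret |= VM_FAULT_RETRY;
		goto out_release;
	}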