From: Barry Song <21cnbao@gmail.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: David Hildenbrand <david@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
chrisl@kernel.org, yuzhao@google.com, hanchuanhua@oppo.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
xiang@kernel.org, mhocko@suse.com, shy828301@gmail.com,
wangkefeng.wang@huawei.com, Barry Song <v-songbaohua@oppo.com>,
Hugh Dickins <hughd@google.com>
Subject: Re: [RFC PATCH] mm: hold PTL from the first PTE while reclaiming a large folio
Date: Tue, 5 Mar 2024 23:28:29 +1300
Message-ID: <CAGsJ_4w49DOycVmCuhJ2jcD+XNP6epT4rZ1YK6DU20DHNEyOdg@mail.gmail.com>
In-Reply-To: <CAGsJ_4wYVogJD=ROfX195MPZrqK+=ibuycPBeFjrD1i9SvOqrw@mail.gmail.com>
On Tue, Mar 5, 2024 at 10:21 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Mar 5, 2024 at 10:12 PM Huang, Ying <ying.huang@intel.com> wrote:
> >
> > Barry Song <21cnbao@gmail.com> writes:
> >
> > > On Tue, Mar 5, 2024 at 8:55 PM Huang, Ying <ying.huang@intel.com> wrote:
> > >>
> > >> Barry Song <21cnbao@gmail.com> writes:
> > >>
> > >> > On Tue, Mar 5, 2024 at 10:15 AM David Hildenbrand <david@redhat.com> wrote:
> > >> >> > But we did "resolve" those bugs by entirely untouching all PTEs if we
> > >> >> > found some PTEs were skipped in try_to_unmap_one [1].
> > >> >> >
> > >> >> > While we find we only get the PTL from 2nd, 3rd but not
> > >> >> > 1st PTE, we entirely give up on try_to_unmap_one, and leave
> > >> >> > all PTEs untouched.
> > >> >> >
> > > >> >> > /* we are not starting from head */
> > > >> >> > if (!IS_ALIGNED((unsigned long)pvmw.pte, CONT_PTES * sizeof(*pvmw.pte))) {
> > > >> >> >         ret = false;
> > > >> >> >         atomic64_inc(&perf_stat.mapped_walk_start_from_non_head);
> > > >> >> >         set_pte_at(mm, address, pvmw.pte, pteval);
> > > >> >> >         page_vma_mapped_walk_done(&pvmw);
> > > >> >> >         break;
> > > >> >> > }
> > >> >> > This will ensure all PTEs still have a unified state such as CONT-PTE
> > >> >> > after try_to_unmap fails.
> > > >> >> > I feel this could have some false positive because when racing
> > >> >> > with unmap, 1st PTE might really become pte_none. So explicitly
> > >> >> > holding PTL from 1st PTE seems a better way.
> > >> >>
> > >> >> Can we estimate the "cost" of holding the PTL?
> > >> >>
> > >> >
> > >> > This is just moving PTL acquisition one or two PTE earlier in those corner
> > >> > cases. In normal cases, it doesn't affect when PTL is held.
> > >>
> > >> The mTHP may be mapped at the end of page table. In that case, the PTL
> > >> will be held longer. Or am I missing something?
> > >
> > > no. this patch doesn't change when we release PTL but change when we
> > > get PTL.
> > >
> > > when the original code iterates nr_pages PTEs in a large folio, it will skip
> > > invalid PTEs, when it meets a valid one, it will acquire PTL. so if it gets
> > > intermediate PTE values some other threads are modifying, it might
> > > skip PTE0, or sometimes PTE0 and PTE1 according to my test. but
> > > arriving at PTE2, likely other threads have written a new value, so we
> > > will begin to hold PTL and iterate till the end of the large folio.
> >
> > Is there any guarantee that the mTHP will always be mapped at the
> > beginning of the page table (PTE0)? IIUC, mTHP can be mapped at PTE496.
> > If so, with your patch, PTL will be held from PTE0 instead of PTE496 in
> > some cases.
>
> I agree. but in another discussion[1], the plan is if we find a large folio has
> been deferred split, we split it before try_to_unmap and pageout. otherwise,
> we may result in lots of redundant I/O, because PTE0-495 will still be
> pageout()-ed.
>
> [1] https://lore.kernel.org/linux-mm/a4a9054f-2040-4f70-8d10-a5af4972e5aa@arm.com/
I thought about this again. It seems we can cope with it even without the
above plan, by:

+       if (folio_test_large(folio) && list_empty(&folio->_deferred_list))
+               flags |= TTU_SYNC;

If a folio is already on the deferred split list, it makes no sense to apply
the optimization this patch provides for those corner cases. We apply it only
while we know the folio is still entirely mapped. This should reduce the
chance of the case you describe to something quite small, though it does not
remove it entirely.
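
To spell out the intent of that check, here is a minimal userspace sketch of
the decision. It is not kernel code: struct mock_folio, the MOCK_TTU_* values
and mock_reclaim_ttu_flags() are hypothetical stand-ins made up for
illustration, and the on_deferred_list field only models
!list_empty(&folio->_deferred_list).

```c
#include <stdbool.h>

/* Illustrative flag bits only; these are NOT the kernel's ttu_flags values. */
enum mock_ttu_flags {
	MOCK_TTU_BATCH_FLUSH = 1u << 0,
	MOCK_TTU_SYNC        = 1u << 1,
};

/* Hypothetical stand-in for the two folio properties the check looks at. */
struct mock_folio {
	unsigned int nr_pages;   /* more than 1 models folio_test_large() */
	bool on_deferred_list;   /* models !list_empty(&folio->_deferred_list) */
};

/*
 * Decide the unmap flags for reclaim. The synchronous walk is requested
 * only for a large folio that is not queued for deferred split, i.e. one
 * we still believe to be entirely mapped.
 */
static unsigned int mock_reclaim_ttu_flags(const struct mock_folio *folio)
{
	unsigned int flags = MOCK_TTU_BATCH_FLUSH;

	if (folio->nr_pages > 1 && !folio->on_deferred_list)
		flags |= MOCK_TTU_SYNC;

	return flags;
}
```

With this model, a 16-page folio that is not on the deferred list gets the
sync flag, while a small folio or a large folio already queued for deferred
split keeps the existing behaviour.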
>
> >
> > --
> > Best Regards,
> > Huang, Ying
> >
> > > The proposal is that we directly get PTL from PTE0, thus we don't get
> > > intermediate values for the head of nr_pages PTEs. this will ensure
> > > a large folio is either completely unmapped or completely mapped.
> > > but not partially mapped and partially unmapped.
> > >
> > >>
Thanks
Barry