From: Ryan Roberts <ryan.roberts@arm.com>
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, david@redhat.com,
chrisl@kernel.org, yuzhao@google.com, hanchuanhua@oppo.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
ying.huang@intel.com, xiang@kernel.org, mhocko@suse.com,
shy828301@gmail.com, wangkefeng.wang@huawei.com,
Barry Song <v-songbaohua@oppo.com>,
Hugh Dickins <hughd@google.com>
Subject: Re: [RFC PATCH] mm: hold PTL from the first PTE while reclaiming a large folio
Date: Tue, 5 Mar 2024 08:54:41 +0000
Message-ID: <e5e14ef9-1bd2-45a8-818d-e92910e97f8f@arm.com>
In-Reply-To: <CAGsJ_4wx7oSzt4vn6B+LRoZetMhH-fDXRFrCFRyoqVOakLidjg@mail.gmail.com>

On 04/03/2024 21:57, Barry Song wrote:
> On Tue, Mar 5, 2024 at 1:21 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Hi Barry,
>>
>> On 04/03/2024 10:37, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> page_vma_mapped_walk() within try_to_unmap_one() races with other
>>> PTEs modification such as break-before-make, while iterating PTEs
>>> of a large folio, it will only begin to acquire PTL after it gets
>>> a valid(present) PTE. break-before-make intermediately sets PTEs
>>> to pte_none. Thus, a large folio's PTEs might be partially skipped
>>> in try_to_unmap_one().
>>
>> I just want to check my understanding here - I think the problem occurs for
>> PTE-mapped, PMD-sized folios as well as smaller-than-PMD-size large folios? Now
>> that I've had a look at the code and have a better understanding, I think that
>> must be the case? And therefore this problem exists independently of my work to
>> support swap-out of mTHP? (From your previous report I was under the impression
>> that it only affected mTHP).
>
> I think this affects all large folios with PTEs entries more than 1. but hugeTLB
> is handled as a whole in try_to_unmap_one and its rmap is removed all
> together, i feel hugeTLB doesn't have this problem.
>
>>
>> Its just that the problem is becoming more pronounced because with mTHP,
>> PTE-mapped large folios are much more common?
>
> right. as now large folios become a more common case, and it is my case
> running in millions of phones.
>
> BTW, I feel we can somehow learn from hugeTLB, for example, we can reclaim
> all PTEs all together rather than iterating PTEs one by one. This will improve
> performance. for example, a batched
> set_ptes_to_swap_entries()
> {
> }
> then we only need to loop once for a large folio, right now we are looping
> nr_pages times.

You still need a per-pte loop somewhere. In hugetlb's case it's in the arch
implementation. HugeTLB ptes are all a fixed size for a given VMA, which makes
things a bit easier too, whereas in the regular mm, they are now a variable size.

David and I introduced folio_pte_batch() to help gather batches of ptes, and it
uses the contpte bit to avoid iterating over intermediate ptes. And I'm adding
swap_pte_batch() which does a similar thing for swap entry batching in v4 of my
swap-out series.
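
To illustrate the batching idea, here is a minimal user-space sketch. It is a
toy model, not the kernel's folio_pte_batch(): struct toy_pte and
toy_pte_batch() are made-up names, the "pte" is just a present flag plus a pfn,
and the sketch does not model the contpte shortcut at all. It only shows the
core step of counting how many consecutive entries map consecutive pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy PTE for illustration only; this is NOT the kernel's pte_t. */
struct toy_pte {
	bool present;
	uint64_t pfn;
};

/*
 * Count how many entries, starting at ptes[0], are present and map
 * consecutive pfns. Returns 0 if the first entry is not present, and
 * never returns more than max_nr.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr)
{
	size_t nr;

	assert(max_nr > 0);

	if (!ptes[0].present)
		return 0;

	for (nr = 1; nr < max_nr; nr++) {
		/* Stop at the first hole or non-contiguous pfn. */
		if (!ptes[nr].present || ptes[nr].pfn != ptes[0].pfn + nr)
			break;
	}

	return nr;
}
```

A caller would then handle the whole batch in one step and advance by the
returned count, rather than revisiting each entry individually.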

For your set_ptes_to_swap_entries() example, I'm not sure what it would do other
than loop over the PTEs setting an incremented swap entry to each one? How is
that more performant?
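
To make the question concrete, this is roughly what I would expect such a
helper to boil down to. It is a toy user-space sketch under my own assumptions,
not a proposed implementation: struct toy_swp_entry and toy_set_swap_entries()
are hypothetical names, and the real thing would write ptes rather than fill an
array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy swap entry for illustration only; NOT the kernel's swp_entry_t. */
struct toy_swp_entry {
	unsigned int type;
	uint64_t offset;
};

/*
 * Hypothetical batched helper: fill slots[0..nr-1] with swap entries of
 * the same type and consecutive offsets, starting from 'first'. The
 * caller makes a single call per folio, but the per-entry loop has only
 * moved inside the helper.
 */
static void toy_set_swap_entries(struct toy_swp_entry *slots,
				 struct toy_swp_entry first, size_t nr)
{
	size_t i;

	assert(nr > 0);

	for (i = 0; i < nr; i++) {
		slots[i].type = first.type;
		slots[i].offset = first.offset + i;
	}
}
```

The caller's loop goes away, but the same nr stores still happen inside the
helper.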

Thanks,
Ryan