From: David Hildenbrand <david@redhat.com>
To: Barry Song <21cnbao@gmail.com>, Ryan Roberts <ryan.roberts@arm.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
baolin.wang@linux.alibaba.com, chrisl@kernel.org,
hanchuanhua@oppo.com, hannes@cmpxchg.org, hughd@google.com,
kasong@tencent.com, linux-kernel@vger.kernel.org,
surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org,
xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com,
yuzhao@google.com, ziy@nvidia.com
Subject: Re: [PATCH v3 3/6] mm: introduce pte_move_swp_offset() helper which can move offset bidirectionally
Date: Mon, 6 May 2024 10:06:02 +0200
Message-ID: <0d20d8af-e480-4eb8-8606-1e486b13fd7e@redhat.com>
In-Reply-To: <CAGsJ_4wx60GoB1erTQ7v3GTXLb_140bOJ_+z=kqY39eOd3P23g@mail.gmail.com>
On 04.05.24 01:40, Barry Song wrote:
> On Fri, May 3, 2024 at 5:41 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 03/05/2024 01:50, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> There could arise a necessity to obtain the first pte_t from a swap
>>> pte_t located in the middle. For instance, this may occur within the
>>> context of do_swap_page(), where a page fault can potentially occur in
>>> any PTE of a large folio. To address this, the following patch introduces
>>> pte_move_swp_offset(), a function capable of bidirectional movement by
>>> a specified delta argument. Consequently, pte_increment_swp_offset()
>>
>> You mean pte_next_swp_offset()?
>
> yes.
>
>>
>>> will directly invoke it with delta = 1.
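The behaviour described in the commit message can be illustrated with a minimal user-space sketch. It is not the kernel's code. It assumes a made-up swap-PTE layout (3 flag bits, 5 type bits, the remaining bits holding the offset), because real architectures each use their own encoding. The helpers mk_swp_pte() and swp_flags() are hypothetical, and swp_offset()/swp_type() only mirror the kernel names: here they read the modelled pte directly, whereas the kernel versions take a swp_entry_t. The sketch shows a signed delta moving the offset back to a large folio's first entry while the type and flag bits are kept, and pte_next_swp_offset() reducing to a delta of 1.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified swap-PTE layout used only for this sketch:
 *   bits 0..2  : swp pte flags (e.g. soft-dirty, exclusive, uffd-wp)
 *   bits 3..7  : swap type
 *   bits 8..63 : swap offset
 * Real architectures use their own encodings.
 */
typedef struct { uint64_t val; } pte_t;

#define SWP_FLAG_BITS  3
#define SWP_TYPE_BITS  5
#define SWP_FLAG_MASK  ((1ULL << SWP_FLAG_BITS) - 1)
#define SWP_TYPE_MASK  ((1ULL << SWP_TYPE_BITS) - 1)
#define SWP_TYPE_SHIFT SWP_FLAG_BITS
#define SWP_OFF_SHIFT  (SWP_FLAG_BITS + SWP_TYPE_BITS)

/* Hypothetical constructor: pack type, offset and flag bits into a pte. */
static inline pte_t mk_swp_pte(unsigned int type, uint64_t offset,
			       unsigned int flags)
{
	pte_t pte = { (offset << SWP_OFF_SHIFT) |
		      ((uint64_t)(type & SWP_TYPE_MASK) << SWP_TYPE_SHIFT) |
		      (flags & SWP_FLAG_MASK) };
	return pte;
}

/* Field accessors for the modelled layout. */
static inline uint64_t swp_offset(pte_t pte)
{
	return pte.val >> SWP_OFF_SHIFT;
}

static inline unsigned int swp_type(pte_t pte)
{
	return (pte.val >> SWP_TYPE_SHIFT) & SWP_TYPE_MASK;
}

static inline unsigned int swp_flags(pte_t pte)
{
	return pte.val & SWP_FLAG_MASK;
}

/*
 * Move the swap offset forward (delta > 0) or backward (delta < 0),
 * keeping the swap type and all flag bits unchanged.
 */
static inline pte_t pte_move_swp_offset(pte_t pte, long delta)
{
	/* Moving backwards must not step below offset 0. */
	assert(delta >= 0 || (uint64_t)(-delta) <= swp_offset(pte));
	/* Unsigned wrap-around makes a negative delta subtract correctly. */
	return mk_swp_pte(swp_type(pte), swp_offset(pte) + (uint64_t)delta,
			  swp_flags(pte));
}

/* The old single-step helper becomes a special case. */
static inline pte_t pte_next_swp_offset(pte_t pte)
{
	return pte_move_swp_offset(pte, 1);
}
```

For example, suppose a fault hits the PTE at index 5 of a large folio whose first page was swapped out to offset 0x100. Then pte_move_swp_offset(pte, -5) yields a pte with offset 0x100 and the same type and flag bits, and pte_next_swp_offset() on that result gives offset 0x101.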
>>>
>>> Suggested-by: "Huang, Ying" <ying.huang@intel.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>> mm/internal.h | 25 +++++++++++++++++++++----
>>> 1 file changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index c5552d35d995..cfe4aed66a5c 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -211,18 +211,21 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>>> }
>>>
>>> /**
>>> - * pte_next_swp_offset - Increment the swap entry offset field of a swap pte.
>>> + * pte_move_swp_offset - Move the swap entry offset field of a swap pte
>>> + * forward or backward by delta
>>> * @pte: The initial pte state; is_swap_pte(pte) must be true and
>>> * non_swap_entry() must be false.
>>> + * @delta: The direction and the offset we are moving; forward if delta
>>> + * is positive; backward if delta is negative
>>> *
>>> - * Increments the swap offset, while maintaining all other fields, including
>>> + * Moves the swap offset, while maintaining all other fields, including
>>> * swap type, and any swp pte bits. The resulting pte is returned.
>>> */
>>> -static inline pte_t pte_next_swp_offset(pte_t pte)
>>> +static inline pte_t pte_move_swp_offset(pte_t pte, long delta)
>>
>> We have equivalent functions for pfn:
>>
>> pte_next_pfn()
>> pte_advance_pfn()
>>
>> Although the latter takes an unsigned long and only moves forward currently. I
>> wonder if it makes sense to have their naming and semantics match? i.e. change
>> pte_advance_pfn() to pte_move_pfn() and let it move backwards too.
>>
>> I guess we don't have a need for that and it adds more churn.
>
> We might have a need in the following case.
> A forks B, then A and B share large folios. B unmaps/exits, and the large
> folios of process A become single-mapped.
> Right now, while writing to A's folios, we are CoWing A's large folios
> into many small folios. I believe we can reuse the entire large folios
> instead of doing nr_pages CoWs and page faults.
> In this case, we might want to get the first PTE from vmf->pte.
Once we have COW reuse for large folios in place (I think you know that
I am working on that), it might make sense to "COW-reuse around",
meaning we check whether some neighboring PTEs map the same large folio
and map them writable as well. But whether that is really worth it,
given the increased page fault latency, is to be decided separately.
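Here is a minimal user-space sketch of "COW-reuse around". It assumes a toy page table: an array of pfns in which 0 means pte_none. It also assumes a large folio described only by its first pfn and page count. The names reuse_around(), maps_folio() and struct pte_range are hypothetical. Starting from the faulting PTE, the sketch extends the range left and right while neighbouring PTEs map the adjacent pages of the same folio. It deliberately ignores what a real implementation would also have to check before mapping those PTEs writable, such as exclusivity, reference counts, VMA bounds and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: PTE indices [start, start + nr). */
struct pte_range {
	size_t start;
	size_t nr;
};

/* Does pfn belong to the folio [first_pfn, first_pfn + nr_pages)? */
static bool maps_folio(unsigned long pfn, unsigned long first_pfn,
		       unsigned long nr_pages)
{
	return pfn >= first_pfn && pfn < first_pfn + nr_pages;
}

/*
 * ptes[i] is the pfn mapped by PTE i (0 = none). Grow the range around
 * fault_idx while neighbours map consecutive pages of the same folio.
 */
static struct pte_range reuse_around(const unsigned long *ptes, size_t nr_ptes,
				     size_t fault_idx, unsigned long first_pfn,
				     unsigned long nr_pages)
{
	struct pte_range r = { fault_idx, 1 };
	size_t end = fault_idx + 1;

	assert(fault_idx < nr_ptes);
	assert(maps_folio(ptes[fault_idx], first_pfn, nr_pages));

	/* Walk left while the previous PTE maps the previous page. */
	while (r.start > 0 &&
	       ptes[r.start - 1] == ptes[r.start] - 1 &&
	       maps_folio(ptes[r.start - 1], first_pfn, nr_pages))
		r.start--;

	/* Walk right while the next PTE maps the next page. */
	while (end < nr_ptes &&
	       ptes[end] == ptes[end - 1] + 1 &&
	       maps_folio(ptes[end], first_pfn, nr_pages))
		end++;

	r.nr = end - r.start;
	return r;
}
```

Each extra PTE examined adds work to the fault path. That is the latency cost that, as noted above, has to be weighed separately.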
>
> Another case might be:
> A forks B, and we write to either A or B; we might CoW an entire large
> folio instead of CoWing nr_pages small folios.
>
> Case 1 seems more useful; I might have a go in a few days. Then we might
> see pte_move_pfn().
pte_move_pfn() does sound odd to me. It might not be required to
implement the optimization described above. (It's easier to simply read
another PTE, check whether it maps the same large folio, and batch from there.)
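One way to read the suggestion above is sketched below in user space, under toy assumptions. The PTE table is an array of pfns, 0 means pte_none, and a folio is described by its first pfn and page count. The names batch_from_folio_start() and struct folio_model are hypothetical. From the faulting PTE's pfn, the sketch works out which PTE would map the folio's first page. It reads that PTE directly and checks that it maps the first page, then batches forward in the spirit of folio_pte_batch(). Only the PTE index moves backwards, so no pte_move_pfn()-style helper that rewrites a pte value is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical folio description: pages [first_pfn, first_pfn + nr_pages). */
struct folio_model {
	unsigned long first_pfn;
	unsigned long nr_pages;
};

/*
 * ptes[i] is the pfn mapped by PTE i (0 = none). Returns the number of
 * consecutive PTEs, starting at *start, that map the folio from its first
 * page onwards; returns 0 (leaving *start untouched) if no such batch
 * exists, in which case a caller would handle only the faulting PTE.
 */
static size_t batch_from_folio_start(const unsigned long *ptes, size_t nr_ptes,
				     size_t fault_idx,
				     const struct folio_model *folio,
				     size_t *start)
{
	unsigned long pfn;
	size_t idx_in_folio, first, nr = 0;

	assert(fault_idx < nr_ptes);
	pfn = ptes[fault_idx];
	if (pfn < folio->first_pfn ||
	    pfn >= folio->first_pfn + folio->nr_pages)
		return 0;

	/* Index of the faulting page inside the folio. */
	idx_in_folio = pfn - folio->first_pfn;
	if (idx_in_folio > fault_idx)
		return 0;	/* folio start would lie before this table */
	first = fault_idx - idx_in_folio;

	/* Read the other PTE and check that it maps the folio's first page. */
	if (ptes[first] != folio->first_pfn)
		return 0;

	/* Batch forward over PTEs mapping consecutive pages of the folio. */
	while (first + nr < nr_ptes && nr < folio->nr_pages &&
	       ptes[first + nr] == folio->first_pfn + nr)
		nr++;

	*start = first;
	return nr;
}
```

In the kernel the "read another PTE" step would roughly correspond to ptep_get() on an earlier PTE pointer under the page table lock. The sketch leaves out locking and all permission checks.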
--
Cheers,
David / dhildenb