From: Barry Song <21cnbao@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
baolin.wang@linux.alibaba.com, chrisl@kernel.org,
hanchuanhua@oppo.com, hannes@cmpxchg.org, hughd@google.com,
kasong@tencent.com, linux-kernel@vger.kernel.org,
surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org,
xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com,
yuzhao@google.com, ziy@nvidia.com
Subject: Re: [PATCH v3 6/6] mm: swap: entirely map large folios found in swapcache
Date: Tue, 7 May 2024 18:48:03 +0800 [thread overview]
Message-ID: <CAGsJ_4whPN91m0v_=TtnnQN8y1CQgau+4F40q1vkrRUFK5t0wA@mail.gmail.com> (raw)
In-Reply-To: <41c1bd1f-b1d7-4faf-a422-1eff7425b35c@redhat.com>
On Tue, May 7, 2024 at 6:39 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.05.24 11:24, Barry Song wrote:
> > On Tue, May 7, 2024 at 8:59 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >>>> Let's assume a single subpage of a large folio is no longer mapped.
> >>>> Then, we'd have:
> >>>>
> >>>> nr_pages == folio_nr_pages(folio) - 1.
> >>>>
> >>>> You could simply map+reuse most of the folio without COWing.
> >>>
> >>> yes. This is good but the pte which is no longer mapped could be
> >>> anyone within the nr_pages PTEs. so it could be quite tricky for
> >>> set_ptes.
> >>
> >> The swap batching logic should take care of that, otherwise it would be
> >> buggy.
> >
> > When you mention "it would be buggy," are you also referring to the current
> > fallback approach? or only refer to the future patch which might be able
> > to map/reuse "nr_pages - 1" pages?
>
> swap_pte_batch() should not skip any holes. So consequently, set_ptes()
> should do the right thing. (regarding your comment "could be quite ricky
> for set_ptes")
>
> So I think that should be working as expected.
maybe not. take a look at my current code, I am goto check_folio with
nr_pages = 1
if swap_pte_batch(folio_ptep, nr, folio_pte) != folio_nr_pages(folio).
+ nr_pages = 1;
+ ...
+ if (folio_test_large(folio) && folio_test_swapcache(folio)) {
+ int nr = folio_nr_pages(folio);
+ ...
+ if (!pte_same(folio_pte,
pte_move_swp_offset(vmf->orig_pte, -idx)) ||
+ swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
+ goto check_folio; /* read here, i am falling
back nr_pages = 1 */
+
+
+ ...
+ nr_pages = nr;
The fallback(=1) works. but it seems you are proposing set nr_pages =
swap_pte_batch(folio_ptep, nr, folio_pte)
if (swap_pte_batch(folio_ptep, nr, folio_pte) > 1 &&
swap_pte_batch(folio_ptep, nr, folio_pte) <
nr_pages) ?
>
> >
> > The current patch falls back to setting nr_pages = 1 without mapping or
> > reusing nr_pages - 1. I feel your concern doesn't refer to this fallback?
> >
> >>
> >>>
> >>>>
> >>>> Once we support COW reuse of PTE-mapped THP we'd do the same. Here, it's
> >>>> just easy to detect that the folio is exclusive (folio_ref_count(folio)
> >>>> == 1 before mapping anything).
> >>>>
> >>>> If you really want to mimic what do_wp_page() currently does, you should
> >>>> have:
> >>>>
> >>>> exclusive || (folio_ref_count(folio) == 1 && !folio_test_large(folio))
> >>>
> >>> I actually dislike the part that do_wp_page() handles the reuse of a large
> >>> folio which is entirely mapped. For example, A forks B, B exit, we write
> >>> A's large folio, we get nr_pages CoW of small folios. Ideally, we can
> >>> reuse the whole folios for writing.
> >>
> >> Yes, see the link I shared to what I am working on. There isn't really a
> >> question if what we do right now needs to be improved and all these
> >> scenarios are pretty obvious clear.
> >
> > Great! I plan to dedicate more time to reviewing your work.
>
> Nice! And there will be a lot of follow-up optimization work I won't
> tackle immediately regarding COW (COW-reuse around, maybe sometimes we
> want to COW bigger chunks).
>
> I still have making PageAnonExclusive a per-folio flag on my TODO list,
> that will help the COW-reuse around case a lot.
>
> >
> >>
> >>>
> >>>>
> >>>> Personally, I think we should keep it simple here and use:
> >>>>
> >>>> exclusive || folio_ref_count(folio) == 1
> >>>
> >>> I feel this is still better than
> >>> exclusive || (folio_ref_count(folio) == 1 && !folio_test_large(folio))
> >>> as we reuse the whole large folio. the do_wp_page() behaviour
> >>> doesn't have this.
> >>
> >> Yes, but there is the comment "Same logic as in do_wp_page();". We
> >> already ran into issues having different COW reuse logic all over the
> >> place. For this case here, I don't care if we leave it as
> >>
> >> "exclusive || folio_ref_count(folio) == 1"
> >
> > I'm perfectly fine with using the code for this patchset and maybe
> > looking for other
> > opportunities for potential optimization as an incremental patchset,
> > for example,
> > reusing the remaining PTEs as suggested by you - "simply map+reuse most of
> > the folio without COWing."
> >
> >>
> >> But let's not try inventing new stuff here.
> >
> > It seems you ignored and snipped my "16 + 14" pages and "15" pages
> > example though. but once we support "simply map+reuse most of the
> > folio without COWing", the "16+14" problem can be resolved, instead,
> > we consume 16 pages.
>
>
> Oh, sorry for skipping that, for me it was rather clear: the partially
> mapped folios will be on the deferred split list and the excess memory
> can (and will be) reclaimed when there is need. So this temporary memory
> consumption is usually not a problem in practice. But yes, something to
> optimize (just like COW reuse in general).
>
> --
> Cheers,
>
> David / dhildenb
>
next prev parent reply other threads:[~2024-05-07 10:48 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-03 0:50 [PATCH v3 0/6] large folios swap-in: handle refault cases first Barry Song
2024-05-03 0:50 ` [PATCH v3 1/6] mm: swap: introduce swap_free_nr() for batched swap_free() Barry Song
2024-05-03 9:26 ` Ryan Roberts
2024-05-03 20:25 ` Chris Li
2024-05-08 7:35 ` Huang, Ying
2024-05-03 0:50 ` [PATCH v3 2/6] mm: remove swap_free() and always use swap_free_nr() Barry Song
2024-05-03 9:31 ` Ryan Roberts
2024-05-03 20:37 ` Chris Li
2024-05-04 4:03 ` Christoph Hellwig
2024-05-04 4:27 ` Barry Song
2024-05-04 4:28 ` Christoph Hellwig
2024-05-04 4:47 ` Barry Song
2024-05-08 7:56 ` Huang, Ying
2024-05-08 8:30 ` Barry Song
2024-05-08 9:10 ` Ryan Roberts
2024-05-03 0:50 ` [PATCH v3 3/6] mm: introduce pte_move_swp_offset() helper which can move offset bidirectionally Barry Song
2024-05-03 9:41 ` Ryan Roberts
2024-05-03 23:40 ` Barry Song
2024-05-06 8:06 ` David Hildenbrand
2024-05-06 8:20 ` Barry Song
2024-05-06 8:31 ` David Hildenbrand
2024-05-07 8:14 ` Ryan Roberts
2024-05-07 8:24 ` Barry Song
2024-05-07 9:39 ` Ryan Roberts
2024-05-03 20:51 ` Chris Li
2024-05-03 23:07 ` Barry Song
2024-05-08 8:08 ` Huang, Ying
2024-05-03 0:50 ` [PATCH v3 4/6] mm: introduce arch_do_swap_page_nr() which allows restore metadata for nr pages Barry Song
2024-05-03 10:02 ` Ryan Roberts
2024-05-06 16:51 ` Khalid Aziz
2024-05-03 0:50 ` [PATCH v3 5/6] mm: swap: make should_try_to_free_swap() support large-folio Barry Song
2024-05-03 0:50 ` [PATCH v3 6/6] mm: swap: entirely map large folios found in swapcache Barry Song
2024-05-03 10:50 ` Ryan Roberts
2024-05-03 23:23 ` Barry Song
2024-05-06 12:07 ` David Hildenbrand
2024-05-06 12:38 ` Barry Song
2024-05-06 12:58 ` Barry Song
2024-05-06 13:16 ` David Hildenbrand
2024-05-06 22:58 ` Barry Song
2024-05-07 8:24 ` David Hildenbrand
2024-05-07 8:43 ` Barry Song
2024-05-07 8:59 ` David Hildenbrand
2024-05-07 9:24 ` Barry Song
2024-05-07 10:39 ` David Hildenbrand
2024-05-07 10:48 ` Barry Song [this message]
2024-05-07 8:17 ` Ryan Roberts
2024-05-06 12:05 ` David Hildenbrand
2024-05-06 12:27 ` Barry Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGsJ_4whPN91m0v_=TtnnQN8y1CQgau+4F40q1vkrRUFK5t0wA@mail.gmail.com' \
--to=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=hanchuanhua@oppo.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=kasong@tencent.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ryan.roberts@arm.com \
--cc=surenb@google.com \
--cc=v-songbaohua@oppo.com \
--cc=willy@infradead.org \
--cc=xiang@kernel.org \
--cc=ying.huang@intel.com \
--cc=yosryahmed@google.com \
--cc=yuzhao@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox