From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Huang Ying <ying.huang@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Nadav Amit <nadav.amit@gmail.com>,
	Hugh Dickins <hughd@google.com>, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH RFC 0/4] mm: Remember young bit for migration entries
Date: Tue, 2 Aug 2022 14:06:37 +0200
Message-ID: <49434bea-3862-1052-2993-8ccad985708b@redhat.com>
In-Reply-To: <YuhVJmSsgs4Q1bYJ@xz-m1.local>

On 02.08.22 00:35, Peter Xu wrote:
> On Mon, Aug 01, 2022 at 10:21:32AM +0200, David Hildenbrand wrote:
>> On 29.07.22 03:40, Peter Xu wrote:
>>> [Marking as RFC; only x86 is supported for now, with plans to add a
>>>  few more archs in a formal version]
>>>
>>> Problem
>>> =======
>>>
>>> When we migrate a page, we currently always mark the migrated page as
>>> old.  The reason is presumably that we don't really know whether the
>>> page is hot or cold, so we default to the negative on the assumption
>>> that it's safer.
>>>
>>> However, that can lead to at least two problems:
>>>
>>>   (1) We lose the real hot/cold information when we could have
>>>       preserved it.  That information shouldn't change even if the
>>>       backing page changes after the migration.
>>>
>>>   (2) There is always extra overhead on the immediate next access to
>>>       any migrated page, because the hardware MMU needs extra cycles
>>>       to set the young bit again (on MMUs that support it).
>>>
>>> Much recent upstream work has shown that (2) is not trivial but in
>>> fact very measurable.  In my test case, reading a 1G chunk of memory
>>> - jumping in page-size intervals - could take 99ms on a generic
>>> x86_64 system just because of the extra work of setting the young
>>> bit, compared to 4ms when the young bit was already set.
>>>
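As an illustrative sketch of that access pattern (not the original test
program; it only isolates the young-bit cost if the buffer was already
populated and migrated beforehand):

	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>
	#include <unistd.h>

	int main(void)
	{
		size_t size = 1UL << 30;	/* 1 GiB */
		size_t page = (size_t)sysconf(_SC_PAGESIZE);
		volatile char *buf = malloc(size);
		struct timespec t0, t1;

		if (!buf)
			return 1;
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (size_t off = 0; off < size; off += page)
			(void)buf[off];		/* one read per page */
		clock_gettime(CLOCK_MONOTONIC, &t1);
		printf("%.1f ms\n", (t1.tv_sec - t0.tv_sec) * 1e3 +
		       (t1.tv_nsec - t0.tv_nsec) / 1e6);
		return 0;
	}
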
>>> This issue was originally reported by Andrea Arcangeli.
>>>
>>> Solution
>>> ========
>>>
>>> To solve this problem, this patchset tries to remember the young bit in the
>>> migration entries and carry it over when recovering the ptes.
>>>
>>> We have the chance to do so because on many systems the swap offset
>>> is not fully used.  Migration entries use the swp offset to store the
>>> PFN only, and the PFN normally needs fewer bits than the swp offset
>>> provides.  That means there are free bits in the swp offset that we
>>> can use to store things like the young bit, and that is how this
>>> series approaches the problem.
>>>
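A minimal sketch of what such an encoding could look like (the helper
names and the exact bit position are assumptions for illustration, not
taken from the patchset):

	/*
	 * The PFN needs at most MAX_PHYSMEM_BITS - PAGE_SHIFT bits, so
	 * the next-higher offset bit is the first spare one.
	 */
	#define SWP_MIG_YOUNG_BIT  (1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT))

	static inline swp_entry_t make_migration_entry_young(swp_entry_t entry)
	{
		/* only legal when the arch really has a spare offset bit */
		return swp_entry(swp_type(entry),
				 swp_offset(entry) | SWP_MIG_YOUNG_BIT);
	}

	static inline bool is_migration_entry_young(swp_entry_t entry)
	{
		return swp_offset(entry) & SWP_MIG_YOUNG_BIT;
	}
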
>>> One tricky thing here is that even though we're embedding the
>>> information into the swap entry, which looks like a very generic data
>>> structure, the number of free bits is still arch-dependent: not only
>>> does the size of swp_entry_t differ, but the layouts of swap ptes
>>> also differ across archs.
>>>
>>> This series therefore requires each arch to define an extra macro,
>>> __ARCH_SWP_OFFSET_BITS, representing the width of the swp offset.
>>> With this information, the swap logic knows whether there are spare
>>> bits to use and remembers the young bit when possible.  By default,
>>> it keeps the old behavior of leaving all migrated pages cold.
>>>
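And a rough sketch of how that gating could work
(SWP_PFN_OFFSET_FREE_BITS is the name patch #2 uses; the rest is
illustrative and assumes the offset field is at least as wide as the
PFN):

	#ifdef __ARCH_SWP_OFFSET_BITS
	/* spare bits = offset width minus what the PFN itself needs */
	#define SWP_PFN_OFFSET_FREE_BITS \
		(__ARCH_SWP_OFFSET_BITS - (MAX_PHYSMEM_BITS - PAGE_SHIFT))
	#else
	/* unknown layout: assume no spare bits, keep migrated pages cold */
	#define SWP_PFN_OFFSET_FREE_BITS  0
	#endif

	static inline bool migration_entry_supports_young(void)
	{
		return SWP_PFN_OFFSET_FREE_BITS > 0;
	}
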
>>
>>
>> I played with a similar idea when working on pte_swp_exclusive() but
>> gave up, because it ended up looking too hacky. Looking at patch #2, I
>> get the same feeling again. Kind of hacky.
> 
> Could you explain which part you find "hacky"?

SWP_PFN_OFFSET_FREE_BITS :)

It's a PFN offset and we're mangling in random other bits. That's hacky
IMHO.

I played with the idea of converting all code to store bits in addition
to the type + offset. But that requires digging through a lot of arch
code to teach that code about additional flags, so I discarded that idea
when working on the COW fixes.

> 
> I used the swap entry to avoid per-arch operations.  I failed to figure
> out a common way to learn the swp offset length myself, so unfortunately
> in this RFC I still needed one macro per arch.  Ying's suggestion seems
> like a good fit here to remove the last arch-specific dependency.

Instead of mangling this into the PFN offset and letting the arch tell
you which bits of the offset are unused ... rather remove the bits from
the offset and define them manually to have a specific meaning. That's
exactly how pte_swp_mkexclusive()/pte_swp_exclusive() is supposed to be
handled on architectures that want to support it.
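
Concretely, a rough sketch of what I mean on x86 (the bit name
_PAGE_SWP_YOUNG is hypothetical; pte_set_flags()/pte_flags() are the
existing x86 helpers):

	static inline pte_t pte_swp_mkyoung(pte_t pte)
	{
		/* dedicate one spare software bit of the swap pte format */
		return pte_set_flags(pte, _PAGE_SWP_YOUNG);
	}

	static inline int pte_swp_young(pte_t pte)
	{
		return !!(pte_flags(pte) & _PAGE_SWP_YOUNG);
	}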

I hope that makes it clearer what the hacky part is, IMHO :)

> 
>>
>>
>> If we mostly only care about x86_64, and it's a performance improvement
>> after all, why not simply do it like
>> pte_swp_mkexclusive()/pte_swp_exclusive() ... and reuse a spare PTE bit?
> 
> Page migration works on most archs; I want this to work on all archs
> that can easily benefit from it.

Yet IIUC we only care about x86-64 regarding performance, just like the
way the dirty bit is handled?

> 
> Besides, I actually have a question on the anon-exclusive bit in the
> swap pte: since we have that anyway, why do we need a specific migration
> type for anon-exclusive pages?  Couldn't it simply be a read migration
> entry with the anon-exclusive bit set?

Not before all archs support pte_swp_mkexclusive()/pte_swp_exclusive().

As pte_swp_mkexclusive()/pte_swp_exclusive() only applies to actual swap
PTEs, you could even reuse that bit for migration entries and easily get
at least the most relevant 64-bit architectures supported.
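
Something like this hypothetical sketch, which is only valid because
migration entries never carry the exclusive marker themselves:

	static inline pte_t migration_pte_mkyoung(pte_t pte)
	{
		VM_WARN_ON(!is_migration_entry(pte_to_swp_entry(pte)));
		/* the swp-exclusive bit is free here, reuse it as "young" */
		return pte_swp_mkexclusive(pte);
	}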

-- 
Thanks,

David / dhildenb




Thread overview: 21+ messages
2022-07-29  1:40 Peter Xu
2022-07-29  1:40 ` [PATCH RFC 1/4] mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry Peter Xu
2022-08-01  3:13   ` Huang, Ying
2022-08-01 22:29     ` Peter Xu
2022-08-02  1:22       ` Huang, Ying
2022-07-29  1:40 ` [PATCH RFC 2/4] mm: Remember young bit for page migrations Peter Xu
2022-07-29  1:40 ` [PATCH RFC 3/4] mm/x86: Use SWP_TYPE_BITS in 3-level swap macros Peter Xu
2022-07-29  1:40 ` [PATCH RFC 4/4] mm/x86: Define __ARCH_SWP_OFFSET_BITS Peter Xu
2022-07-29 17:07 ` [PATCH RFC 0/4] mm: Remember young bit for migration entries Nadav Amit
2022-07-29 22:43   ` Peter Xu
2022-08-01  3:20 ` Huang, Ying
2022-08-01  5:33 ` Huang, Ying
2022-08-01 22:25   ` Peter Xu
2022-08-01  8:21 ` David Hildenbrand
2022-08-01 22:35   ` Peter Xu
2022-08-02 12:06     ` David Hildenbrand [this message]
2022-08-02 20:14       ` Peter Xu
2022-08-02 20:23         ` David Hildenbrand
2022-08-02 20:35           ` Peter Xu
2022-08-02 20:59             ` David Hildenbrand
2022-08-02 22:15               ` Peter Xu
