From: Lance Yang <ioworker0@gmail.com>
To: "Zach O'Keefe" <zokeefe@google.com>,
Yang Shi <shy828301@gmail.com>, Michal Hocko <mhocko@suse.com>,
David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, songmuchun@bytedance.com,
peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] mm/khugepaged: skip copying lazyfree pages on collapse
Date: Tue, 20 Feb 2024 18:15:48 +0800
Message-ID: <CAK1f24nhchLX98so=jmbdm4jF21FfvNbtxNaXxx059e_Da_uOg@mail.gmail.com>
In-Reply-To: <CAAa6QmRjqob=HQ1K4c+vP5iydM_VA-wd5NcoDLVuX=13NwedSQ@mail.gmail.com>

Hey Zach, Yang, Michal, and David,

Please accept my sincerest apologies for the delayed
response.

Thanks for the replies; they’ve been very helpful to me! I also
appreciate the valuable information you’ve shared!

I agree that it’s not a good idea to let khugepaged avoid
any pages marked with MADV_FREE.

Thanks again for your time!

Best,
Lance

On Tue, Feb 6, 2024 at 4:27 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Mon, Feb 5, 2024 at 11:43 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Mon, Feb 5, 2024 at 1:45 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Fri 02-02-24 09:42:27, Yang Shi wrote:
> > > > But if the partial range is MADV_FREE, khugepaged won't skip them.
> > > > This is what your second test case does.
> > > >
> > > > Secondly, I think it depends on the semantics of MADV_FREE,
> > > > particularly how to treat the redirtied pages. TBH I'm always confused
> > > > by the semantics. For example, the page contained "abcd", then it was
> > > > MADV_FREE'ed, then it was written again with "1234" after "abcd". So
> > > > the user should expect to see "abcd1234" or "00001234".
> > >
> > > Correct. You cannot assume the content of the first page as it could
> > > have been reclaimed at any time.
> > >
> > > > > I suppose it should be "abcd1234" since MADV_FREE pages are still
> > > > valid and available, if I'm wrong please feel free to correct me. If
> > > > so we should always copy MADV_FREE pages in khugepaged regardless of
> > > > whether it is redirtied or not otherwise it may incur data corruption.
> > > > If we don't copy, then the follow up redirty after collapse to the
> > > > hugepage may return "00001234", right?
> > >
> > > > Right. As pointed out above, this is a valid outcome if the page has been
> > > > dropped. The user has means to tell that from /proc/vmstat though. Not with
> > > > great precision, but I think it would be really surprising to not see any
> > > > pglazyfreed yet the content is gone. I think it would be legit to call
> > > it a bug. One could argue the bug would be in the accounting rather than
> > > the khugepaged implementation because madvised pages could be dropped at
> > > any time. But I think it makes more sense to copy the existing content.
>
> +1. I agree that the content should be dropped iff pglazyfreed is
> incremented. Of course, we could do that here, but I don't think there
> is a good reason to, and we should just copy the contents.
>
> > Yeah, as long as khugepaged sees the MADV_FREE pages, it means they
> > have "NOT" been dropped yet. It may be dropped later if memory
> > pressure occurs, but anyway khugepaged wins the race and khugepaged
> > can't assume the pages will be dropped before they get redirtied. So
> > copying the content does make sense.
>
> Per Lance, I kinda get that this "undermines" MADV_FREE, insofar as,
> from the user's perspective, the memory that was intended as a
> buffer against OOM kill scenarios is no longer there to reclaim trivially. I
> don't have a real world example where this is an issue, however. Also,
> not copying the contents doesn't change that fact.
>
> The proper alternative, if you want to make the "undermining"
> argument, is for khugepaged to stay away from hugepage regions with
> any MADV_FREE pages. I think it's fair to assume MADV_FREE'd memory is
> likely cold memory, and therefore not a good hugepage target anyways.
> However, it'd be unfortunate if there were a couple MADV_FREE pages in
> the middle of an otherwise hot / highly-utilized hugepage region that
> would prevent it from being pmd-mapped via khugepaged. But... this is
> exactly-ish what you get when hugepage-aware system/runtime allocators
> split THPs to free up internal caches.
>
> Best,
> Zach
>
>
> > > --
> > > Michal Hocko
> > > SUSE Labs
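
For reference, the "abcd" / "1234" scenario and the pglazyfreed check discussed in the quoted thread can be sketched in userspace C. This is only an illustration, assuming Linux 4.5 or later (where MADV_FREE exists) and a readable /proc/vmstat; it is not part of the patch. The helper names vmstat_counter, read_pglazyfreed and redirty_after_madv_free are hypothetical and chosen just for this sketch.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8 /* assumption: generic Linux value, for older libc headers */
#endif

/* Parse one counter from /proc/vmstat-style text ("name value" per line).
 * Returns -1 if the counter is not present. */
static long long vmstat_counter(const char *text, const char *name)
{
    size_t len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Require a space after the name so "pglazyfree" != "pglazyfreed". */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return strtoll(line + len + 1, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Read pglazyfreed from the running kernel; -1 if unavailable. */
static long long read_pglazyfreed(void)
{
    static char buf[65536];
    FILE *f = fopen("/proc/vmstat", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return vmstat_counter(buf, "pglazyfreed");
}

/* Write "abcd", MADV_FREE the page, then write "1234" right after it.
 * Copies the first 8 bytes into out, showing zero bytes as '0'.
 * Returns 0 on success, -1 if mmap or madvise failed. */
static int redirty_after_madv_free(char out[9])
{
    long page = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int i;

    if (p == MAP_FAILED)
        return -1;
    memcpy(p, "abcd", 4);
    if (madvise(p, page, MADV_FREE) != 0) {
        munmap(p, page);
        return -1;
    }
    /* Redirty the page. If it was reclaimed in between, it reads back
     * zero-filled before this write lands. */
    memcpy(p + 4, "1234", 4);
    for (i = 0; i < 8; i++)
        out[i] = p[i] ? p[i] : '0';
    out[8] = '\0';
    munmap(p, page);
    return 0;
}
```

Calling read_pglazyfreed() before and after redirty_after_madv_free() gives a rough view of the point made in the thread: either "abcd1234" or "00001234" is a valid result, but the latter without any increase in pglazyfreed would be surprising. The counter is system-wide, so other workloads can also move it.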