From: "Yin, Fengwei" <fengwei.yin@intel.com>
To: David Hildenbrand <david@redhat.com>,
"Yin, Fengwei" <fengwei.yin@intel.com>,
Yu Zhao <yuzhao@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <stable@vger.kernel.org>,
<akpm@linux-foundation.org>, <willy@infradead.org>,
<vishal.moola@gmail.com>, <wangkefeng.wang@huawei.com>,
<minchan@kernel.org>, <shy828301@gmail.com>
Subject: Re: [PATCH 0/2] don't use mapcount() to check large folio sharing
Date: Fri, 4 Aug 2023 15:36:05 +0800
Message-ID: <959095f6-8574-f5fc-812c-b0b9b9a3c101@intel.com>
In-Reply-To: <75996f6b-63fe-4878-c19d-bf35ee2ad20b@redhat.com>
Hi David,
On 8/4/2023 3:31 PM, David Hildenbrand wrote:
> On 04.08.23 02:17, Yin, Fengwei wrote:
>>
>>
>> On 8/4/2023 7:38 AM, Yu Zhao wrote:
>>> On Thu, Aug 3, 2023 at 5:27 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 8/4/2023 4:46 AM, Yu Zhao wrote:
>>>>> On Wed, Aug 2, 2023 at 6:56 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/2/2023 8:49 PM, Ryan Roberts wrote:
>>>>>>> On 02/08/2023 13:42, Yin, Fengwei wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/2/2023 8:40 PM, Ryan Roberts wrote:
>>>>>>>>> On 02/08/2023 13:35, Yin, Fengwei wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/2/2023 6:27 PM, Ryan Roberts wrote:
>>>>>>>>>>> On 28/07/2023 17:13, Yin Fengwei wrote:
>>>>>>>>>>>> In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
>>>>>>>>>>>> folio_mapcount() is used to check whether the folio is shared. But it's
>>>>>>>>>>>> not correct as folio_mapcount() returns total mapcount of large folio.
>>>>>>>>>>>>
>>>>>>>>>>>> Use folio_estimated_sharers() here instead, as an estimate is sufficient for these checks.
>>>>>>>>>>>>
>>>>>>>>>>>> Yin Fengwei (2):
>>>>>>>>>>>> madvise: don't use mapcount() against large folio for sharing check
>>>>>>>>>>>> madvise: don't use mapcount() against large folio for sharing check
>>>>>>>>>>>>
>>>>>>>>>>>> mm/huge_memory.c | 2 +-
>>>>>>>>>>>> mm/madvise.c | 6 +++---
>>>>>>>>>>>> 2 files changed, 4 insertions(+), 4 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As a set of fixes, I agree this is definitely an improvement, so:
>>>>>>>>>>>
>>>>>>>>>>> Reviewed-By: Ryan Roberts
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> But I have a couple of comments about further improvements:
>>>>>>>>>>>
>>>>>>>>>>> Once we have the scheme that David is working on to be able to provide precise
>>>>>>>>>>> exclusive vs shared info, we will probably want to move to that. Although that
>>>>>>>>>>> scheme will need access to the mm_struct of a process known to be mapping the
>>>>>>>>>>> folio. We have that info, but it's not passed to folio_estimated_sharers(), so we
>>>>>>>>>>> can't just reimplement folio_estimated_sharers() - we will need to rework these
>>>>>>>>>>> call sites again.
>>>>>>>>>> Yes. This could be extra work. Maybe we should delay until David's work is done.
>>>>>>>>>
>>>>>>>>> What you have is definitely an improvement over what was there before. And is
>>>>>>>>> probably the best we can do without David's scheme. So I wouldn't delay this.
>>>>>>>>> Just pointing out that we will be able to make it even better later on (if
>>>>>>>>> David's stuff goes in).
>>>>>>>> Yes. I agree that we should wait for David's work to be ready and base the fix on that.
>>>>>>>
>>>>>>> I was suggesting the opposite - not waiting. Then we can do separate improvement
>>>>>>> later.
>>>>>> Let's wait for David's work to be ready.
>>>>>
>>>>> Waiting is fine as long as we don't miss the next merge window -- we
>>>>> don't want these two bugs to get into another release. Also I think we
>>>>> should cc stable, since as David mentioned, they have been causing
>>>>> selftest failures.
>>>>
>>>> Stable was CCed.
>>>
>>> You need to add the "Cc: stable@vger.kernel.org" tag to the patches; see
>>> Documentation/process/stable-kernel-rules.rst
>> OK. Thanks for the clarification. I totally misunderstood this. :)
>>
>> I'd like to wait for Andrew's answer on whether these patches are suitable
>> for the stable branch (I suppose you think so).
>
> Note that the COW test does not fail -- it skips -- but the behavior changed:
>
> $ ./cow
> # [INFO] detected THP size: 2048 KiB
> # [INFO] detected hugetlb page size: 2048 KiB
> # [INFO] detected hugetlb page size: 1048576 KiB
> # [INFO] huge zeropage is enabled
> TAP version 13
> 1..190
> # [INFO] Anonymous memory tests in private mappings
> # [RUN] Basic COW after fork() ... with base page
> ok 1 No leak from parent into child
> # [RUN] Basic COW after fork() ... with swapped out base page
> ok 2 No leak from parent into child
> # [RUN] Basic COW after fork() ... with THP
> ok 3 No leak from parent into child
> # [RUN] Basic COW after fork() ... with swapped-out THP
> ok 4 No leak from parent into child
> # [RUN] Basic COW after fork() ... with PTE-mapped THP
> ok 5 No leak from parent into child
> # [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP
> ok 6 # SKIP MADV_PAGEOUT did not work, is swap enabled?
> # [RUN] Basic COW after fork() ... with single PTE of THP
> ok 7 No leak from parent into child
> # [RUN] Basic COW after fork() ... with single PTE of swapped-out THP
> ok 8 No leak from parent into child
> # [RUN] Basic COW after fork() ... with partially mremap()'ed THP
> ok 9 No leak from parent into child
> # [RUN] Basic COW after fork() ... with partially shared THP
> ok 10 No leak from parent into child
> ...
>
> Observe how test #6 skips because MADV_PAGEOUT was not effective (which could also happen for other reasons, hence a skip rather than a failure).
>
> The code that broke it is
>
> commit 07e8c82b5eff8ef34b74210eacb8d9c4a2886b82
> Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Date: Wed Dec 21 10:08:46 2022 -0800
>
> madvise: convert madvise_cold_or_pageout_pte_range() to use folios
> This change removes a number of calls to compound_head(), and saves
> 1729 bytes of kernel text.
> Link: https://lkml.kernel.org/r/20221221180848.20774-3-vishal.moola@gmail.com
> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: SeongJae Park <sj@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
>
> This has been the case ever since v6.3.
>
> The simplest way to fix it would be to revert the page_mapcount() -> folio_mapcount()
> conversion.
>
>
> All of that is probably information worth having in the patch description.
Thanks a lot for checking this. I will test this patchset to see whether
it restores the behavior (I expect it does, given the breaking commit
you identified, but want to confirm).
Regards
Yin, Fengwei
Thread overview: 29+ messages
2023-07-28 16:13 Yin Fengwei
2023-07-28 16:13 ` [PATCH 1/2] madvise: don't use mapcount() against large folio for sharing check Yin Fengwei
2023-07-28 16:13 ` [PATCH 2/2] " Yin Fengwei
2023-07-28 17:41 ` Andrew Morton
2023-07-29 13:53 ` Yin, Fengwei
2023-07-28 17:24 ` [PATCH 0/2] don't use mapcount() to check large folio sharing Andrew Morton
2023-08-02 12:39 ` Yin, Fengwei
2023-08-04 7:14 ` Yin, Fengwei
2023-08-07 16:43 ` Andrew Morton
2023-08-08 0:02 ` Yin, Fengwei
2023-08-02 10:27 ` Ryan Roberts
2023-08-02 10:48 ` David Hildenbrand
2023-08-02 11:20 ` Ryan Roberts
2023-08-02 11:36 ` David Hildenbrand
2023-08-02 11:51 ` Ryan Roberts
2023-08-02 11:52 ` David Hildenbrand
2023-08-02 12:35 ` Yin, Fengwei
2023-08-02 12:40 ` Ryan Roberts
2023-08-02 12:42 ` Yin, Fengwei
2023-08-02 12:49 ` Ryan Roberts
2023-08-02 12:55 ` Yin, Fengwei
2023-08-03 20:46 ` Yu Zhao
2023-08-03 23:27 ` Yin, Fengwei
2023-08-03 23:38 ` Yu Zhao
2023-08-04 0:17 ` Yin, Fengwei
2023-08-04 7:31 ` David Hildenbrand
2023-08-04 7:36 ` Yin, Fengwei [this message]
2023-08-04 8:11 ` Yin, Fengwei
2023-08-02 12:43 ` David Hildenbrand