linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: Minchan Kim <minchan@kernel.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] thp: use is_zero_pfn after pte_present check
Date: Mon, 12 Oct 2015 17:15:06 +0200
Message-ID: <561BCE7A.1080403@suse.cz>
In-Reply-To: <20151012145746.GA11396@bbox>

On 10/12/2015 04:57 PM, Minchan Kim wrote:
> Hello,
>
> On Mon, Oct 12, 2015 at 01:13:20PM +0300, Kirill A. Shutemov wrote:
>> On Mon, Oct 12, 2015 at 10:54:16AM +0900, Minchan Kim wrote:
>>> Use is_zero_pfn on pteval only after a pte_present check on pteval
>>> (it might be a better idea to introduce an is_zero_pte that
>>> checks pte_present first). Otherwise, it could operate on a swap or
>>> migration entry, and if pte_pfn's result happens to equal zero_pfn,
>>> we lose the user's data in __collapse_huge_page_copy.
>>> If you're lucky, the application segfaults, and you may eventually
>>> see the message below when the application exits.
>>>
>>> BUG: Bad rss-counter state mm:ffff88007f099300 idx:2 val:3
>>
>> Did you actually step on the bug?
>> If so, it's material for stable@, I think.
>
> Yes, I did, with my test program, which does heavy swap-in/swap-out/
> swapoff with MADV_DONTNEED in a memcg.
> Actually, I had marked this patch for -stable but removed the tag right
> before sending, because my test program is artificial and I hadn't seen
> any report of bad rss counting with MM_SWAPENTS on linux-mm (of course,
> I might have missed it).
> In addition, I have sometimes seen people insist that "it's not stable
> material if it's not a bug with a real workload". I didn't want to get
> involved in such non-technical arguments, so I waited for someone to
> nudge me to mark it for -stable, and finally, you did. ;-)

I'd also think this should go to -stable, and I haven't heard the "real
workload" argument before.

> If other reviewers don't object, I will Cc -stable in the next spin.
>
>>
>>> Signed-off-by: Minchan Kim <minchan@kernel.org>
>>> ---
>>>
>>> I found this bug with an MADV_FREE stress test. Sometimes I saw the
>>> "Bad rss-counter" message with MM_SWAPENTS, but it's really rare:
>>> once a day if I was lucky, or once in five days if I was unlucky.
>>> So I am still testing and only a few days have passed so far, but
>>> I hope this fixes the issue.
>>>
>>>   mm/huge_memory.c | 12 +++++++++++-
>>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 4b06b8db9df2..349590aa4533 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2665,15 +2665,25 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>>>   	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
>>>   	     _pte++, _address += PAGE_SIZE) {
>>>   		pte_t pteval = *_pte;
>>> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>>> +		if (pte_none(pteval)) {
>>
>> In the -mm tree we have an is_swap_pte() check before this point in
>> khugepaged_scan_pmd().
>
> Actually, I tested this patch on a v4.2 kernel, so it doesn't have
> the check.
> I've now looked through the optimistic check from the swapin readahead
> patch in current mmotm.
> It seems the check can't prevent this problem, because the pte lock
> and anon_vma lock are released before the page is isolated in
> __collapse_huge_page_isolate, so the page could be swapped out again.
>
>>
>> Also, what about the similar pattern in __collapse_huge_page_isolate() and
>> __collapse_huge_page_copy()? Shouldn't they be fixed as well?
>
> I see what went wrong here.
> /me slaps self.
> The line I meant to change was in __collapse_huge_page_isolate,
> but I changed khugepaged_scan_pmd by mistake in the last revision,
> since that code is almost the same. :(
> Fortunately, my test kernel has the right version.
> Here it goes.
>
>  From 2a2e4b247e132d823af30655dbc0b57738e9d6ee Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Mon, 12 Oct 2015 09:52:46 +0900
> Subject: [PATCH] thp: use is_zero_pfn only after pte_present check
>
> Use is_zero_pfn on pteval only after a pte_present check on pteval
> (it might be a better idea to introduce an is_zero_pte that
> checks pte_present first). Otherwise, it could operate on a swap or
> migration entry, and if pte_pfn's result happens to equal zero_pfn,
> we lose the user's data in __collapse_huge_page_copy.
> If you're lucky, the application segfaults, and you may eventually
> see the message below when the application exits.
>
> BUG: Bad rss-counter state mm:ffff88007f099300 idx:2 val:3
>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

So this patch should go to stable for 4.1+. Does it apply to both -next
and 4.3-rcX?

> ---
>   mm/huge_memory.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4b06b8db9df2..bbac913f96bc 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2206,7 +2206,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>   	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
>   	     _pte++, address += PAGE_SIZE) {
>   		pte_t pteval = *_pte;
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +		if (pte_none(pteval) || (pte_present(pteval) &&
> +				is_zero_pfn(pte_pfn(pteval)))) {
>   			if (!userfaultfd_armed(vma) &&
>   			    ++none_or_zero <= khugepaged_max_ptes_none)
>   				continue;
>



Thread overview: 6+ messages
2015-10-12  1:54 Minchan Kim
2015-10-12 10:13 ` Kirill A. Shutemov
2015-10-12 14:57   ` Minchan Kim
2015-10-12 15:15     ` Vlastimil Babka [this message]
2015-10-12 20:20       ` Andrea Arcangeli
2015-10-12 15:27     ` Kirill A. Shutemov
