linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Hugh Dickins <hughd@google.com>,
	"Liam R. Howlett" <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Sanan Hasanov <sanan.hasanov@knights.ucf.edu>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"contact@pgazz.com" <contact@pgazz.com>,
	"syzkaller@googlegroups.com" <syzkaller@googlegroups.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: kernel BUG in page_add_anon_rmap
Date: Mon, 30 Jan 2023 10:26:16 +0100
Message-ID: <92076c0e-1eee-66a4-6342-202989c32955@redhat.com>
In-Reply-To: <67dfd817-073e-9abb-316f-689ba8193965@redhat.com>

On 30.01.23 10:03, David Hildenbrand wrote:
>>>>
>>>> I reproduced on next-20230127 (did not try upstream yet).
>>
>> Upstream's fine; on next-20230127 (with David's repro) it bisects to
>> 5ddaec50023e ("mm/mmap: remove __vma_adjust()").  I think I'd better
>> hand on to Liam, rather than delay you by puzzling over it further myself.
>>
> 
> Thanks for identifying the problematic commit! ...
> 
>>>>
>>>> I think two key things are that a) THP are set to "always" and b) we have a
>>>> NUMA setup [I assume].
>>>>
>>>> The relevant bits:
>>>>
>>>> [  439.886738] page:00000000c4de9000 refcount:513 mapcount:2
>>>> mapping:0000000000000000 index:0x20003 pfn:0x14ee03
>>>> [  439.893758] head:000000003d5b75a4 order:9 entire_mapcount:0
>>>> nr_pages_mapped:511 pincount:0
>>>> [  439.899611] memcg:ffff986dc4689000
>>>> [  439.902207] anon flags:
>>>> 0x17ffffc009003f(locked|referenced|uptodate|dirty|lru|active|head|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
>>>> [  439.910737] raw: 0017ffffc0020000 ffffe952c53b8001 ffffe952c53b80c8
>>>> dead000000000400
>>>> [  439.916268] raw: 0000000000000000 0000000000000000 0000000000000001
>>>> 0000000000000000
>>>> [  439.921773] head: 0017ffffc009003f ffffe952c538b108 ffff986de35a0010
>>>> ffff98714338a001
>>>> [  439.927360] head: 0000000000020000 0000000000000000 00000201ffffffff
>>>> ffff986dc4689000
>>>> [  439.932341] page dumped because: VM_BUG_ON_PAGE(!first && (flags & ((
>>>> rmap_t)((((1UL))) << (0)))))
>>>>
>>>>
>>>> Indeed, the mapcount of the subpage is 2 instead of 1. The subpage is only
>>>> mapped into a single
>>>> page table (no fork() or similar).
>>
>> Yes, that mapcount:2 is weird; and what's also weird is the index:0x20003:
>> what is remove_migration_pte(), in an mbind(0x20002000,...), doing with
>> index:0x20003?
> 
> I was assuming the whole folio would get migrated. As you raise below,
> it's all a bit unclear once THP get involved and dealing with mbind()
> and page migration.
> 
>>>>
>>>> I created this reduced reproducer that triggers 100%:
>>
>> Very helpful, thank you.
>>
>>>>
>>>>
>>>> #include <stdint.h>
>>>> #include <unistd.h>
>>>> #include <sys/mman.h>
>>>> #include <numaif.h>
>>>>
>>>> int main(void)
>>>> {
>>>> 	mmap((void*)0x20000000ul, 0x1000000ul, PROT_READ|PROT_WRITE|PROT_EXEC,
>>>> 	     MAP_ANONYMOUS|MAP_FIXED|MAP_PRIVATE, -1, 0ul);
>>>> 	madvise((void*)0x20000000ul, 0x1000000ul, MADV_HUGEPAGE);
>>>>
>>>> 	*(uint32_t*)0x20000080 = 0x80000;
>>>> 	mlock((void*)0x20001000ul, 0x2000ul);
>>>> 	mlock((void*)0x20000000ul, 0x3000ul);
>>
>> It's not an mlock() issue in particular: quickly established by
>> substituting madvise(,, MADV_NOHUGEPAGE) for those mlock() calls.
>> Looks like a vma splitting issue now.
> 
> Gah, should have tried something like that first before suspecting it's
> mlock related. :)
> 
>>
>>>> 	mbind((void*)0x20002000ul, 0x1000ul, MPOL_LOCAL, NULL, 0x7fful,
>>>> 	MPOL_MF_MOVE);
>>
>> I guess it will turn out not to be relevant to this particular syzbug,
>> but what do we expect an mbind() of just 0x1000 of a THP to do?
>>
>> It's a subject I've wrestled with unsuccessfully in the past: I found
>> myself arriving at one conclusion (split THP) in one place, and a contrary
>> conclusion (widen range) in another place, and never had time to work out
>> one unified answer.
> 
> I'm aware of a similar issue with long-term page pinning: we might want
> to pin a 4k portion of a THP, but will end up blocking the whole THP
> from getting migrated/swapped/split/freed/ ... until we unpin (ever?). I
> wrote a reproducer [1] a while ago to show how you can effectively steal
> most THP in the system using comparatively small memlock limit using
> io_uring ...
> 

Correction: my reproducer already triggers a compound page split so that
it really only pins a single 4k page, and then frees the remaining 4k
pages of the former THP. Because that single 4k page stays allocated and
pinned, we cannot get a THP at these physical memory locations until the
page is unpinned.
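
To make "these physical memory locations" concrete, here is a minimal
sketch of the arithmetic, assuming x86-64 with 4k base pages and order-9
(2 MiB) THP as in the dump quoted above. The helper names are made up for
illustration and are not kernel functions.

```c
#include <stdint.h>

/*
 * Assumption: 4 KiB base pages and order-9 THP (512 base pages, 2 MiB),
 * matching "order:9" in the page dump quoted above.
 */
#define THP_ORDER    9
#define THP_NR_PAGES (1UL << THP_ORDER)

/*
 * First pfn of the naturally aligned 2 MiB block that contains @pfn.
 * While any 4k page in this block stays pinned, the block as a whole
 * cannot be handed out as a THP. (Hypothetical helper.)
 */
static inline uint64_t thp_block_start_pfn(uint64_t pfn)
{
	return pfn & ~(uint64_t)(THP_NR_PAGES - 1);
}

/* Offset of @pfn, in 4k pages, within that block. (Hypothetical helper.) */
static inline unsigned int thp_subpage_idx(uint64_t pfn)
{
	return (unsigned int)(pfn & (THP_NR_PAGES - 1));
}
```

For the page in the dump (pfn:0x14ee03), thp_block_start_pfn() gives
0x14ee00 and thp_subpage_idx() gives 3, which lines up with index:0x20003
being three pages past the start of the mapping at 0x20000000.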

-- 
Thanks,

David / dhildenb


