From: David Hildenbrand <david@redhat.com>
To: Hugh Dickins <hughd@google.com>,
"Liam R. Howlett" <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>,
Sanan Hasanov <sanan.hasanov@knights.ucf.edu>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"contact@pgazz.com" <contact@pgazz.com>,
"syzkaller@googlegroups.com" <syzkaller@googlegroups.com>,
Huang Ying <ying.huang@intel.com>
Subject: Re: kernel BUG in page_add_anon_rmap
Date: Mon, 30 Jan 2023 10:26:16 +0100
Message-ID: <92076c0e-1eee-66a4-6342-202989c32955@redhat.com>
In-Reply-To: <67dfd817-073e-9abb-316f-689ba8193965@redhat.com>
On 30.01.23 10:03, David Hildenbrand wrote:
>>>>
>>>> I reproduced on next-20230127 (did not try upstream yet).
>>
>> Upstream's fine; on next-20230127 (with David's repro) it bisects to
>> 5ddaec50023e ("mm/mmap: remove __vma_adjust()"). I think I'd better
>> hand on to Liam, rather than delay you by puzzling over it further myself.
>>
>
> Thanks for identifying the problematic commit! ...
>
>>>>
>>>> I think two key things are that a) THP are set to "always" and b) we have a
>>>> NUMA setup [I assume].
>>>>
>>>> The relevant bits:
>>>>
>>>> [ 439.886738] page:00000000c4de9000 refcount:513 mapcount:2
>>>> mapping:0000000000000000 index:0x20003 pfn:0x14ee03
>>>> [ 439.893758] head:000000003d5b75a4 order:9 entire_mapcount:0
>>>> nr_pages_mapped:511 pincount:0
>>>> [ 439.899611] memcg:ffff986dc4689000
>>>> [ 439.902207] anon flags:
>>>> 0x17ffffc009003f(locked|referenced|uptodate|dirty|lru|active|head|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
>>>> [ 439.910737] raw: 0017ffffc0020000 ffffe952c53b8001 ffffe952c53b80c8
>>>> dead000000000400
>>>> [ 439.916268] raw: 0000000000000000 0000000000000000 0000000000000001
>>>> 0000000000000000
>>>> [ 439.921773] head: 0017ffffc009003f ffffe952c538b108 ffff986de35a0010
>>>> ffff98714338a001
>>>> [ 439.927360] head: 0000000000020000 0000000000000000 00000201ffffffff
>>>> ffff986dc4689000
>>>> [ 439.932341] page dumped because: VM_BUG_ON_PAGE(!first && (flags & ((
>>>> rmap_t)((((1UL))) << (0)))))
>>>>
>>>>
>>>> Indeed, the mapcount of the subpage is 2 instead of 1. The subpage is only
>>>> mapped into a single
>>>> page table (no fork() or similar).
>>
>> Yes, that mapcount:2 is weird; and what's also weird is the index:0x20003:
>> what is remove_migration_pte(), in an mbind(0x20002000,...), doing with
>> index:0x20003?
>
> I was assuming the whole folio would get migrated. As you raise below,
> it's all a bit unclear once THP get involved and dealing with mbind()
> and page migration.
>
>>>>
>>>> I created this reduced reproducer that triggers 100%:
>>
>> Very helpful, thank you.
>>
>>>>
>>>>
>>>> #include <stdint.h>
>>>> #include <unistd.h>
>>>> #include <sys/mman.h>
>>>> #include <numaif.h>
>>>>
>>>> int main(void)
>>>> {
>>>> mmap((void*)0x20000000ul, 0x1000000ul, PROT_READ|PROT_WRITE|PROT_EXEC,
>>>> MAP_ANONYMOUS|MAP_FIXED|MAP_PRIVATE, -1, 0ul);
>>>> madvise((void*)0x20000000ul, 0x1000000ul, MADV_HUGEPAGE);
>>>>
>>>> *(uint32_t*)0x20000080 = 0x80000;
>>>> mlock((void*)0x20001000ul, 0x2000ul);
>>>> mlock((void*)0x20000000ul, 0x3000ul);
>>
>> It's not an mlock() issue in particular: quickly established by
>> substituting madvise(,, MADV_NOHUGEPAGE) for those mlock() calls.
>> Looks like a vma splitting issue now.
>
> Gah, should have tried something like that first before suspecting it's
> mlock related. :)
>
>>
>>>> mbind((void*)0x20002000ul, 0x1000ul, MPOL_LOCAL, NULL, 0x7fful,
>>>> MPOL_MF_MOVE);
>>
>> I guess it will turn out not to be relevant to this particular syzbug,
>> but what do we expect an mbind() of just 0x1000 of a THP to do?
>>
>> It's a subject I've wrestled with unsuccessfully in the past: I found
>> myself arriving at one conclusion (split THP) in one place, and a contrary
>> conclusion (widen range) in another place, and never had time to work out
>> one unified answer.
>
> I'm aware of a similar issue with long-term page pinning: we might want
> to pin a 4k portion of a THP, but will end up blocking the whole THP
> from getting migrated/swapped/split/freed/ ... until we unpin (ever?). I
> wrote a reproducer [1] a while ago to show how you can effectively steal
> most THP in the system using comparatively small memlock limit using
> io_uring ...
>
Correction: my reproducer already triggers a compound page split so that
it really only pins a single 4k page, and then frees the remaining 4k
pages of the previous THP. Because that single 4k page stays allocated
and pinned, we cannot get a THP at these physical memory locations until
the page is unpinned.
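
To put rough numbers on that, here is a minimal sketch, assuming 4k base
pages and order-9 THP (as in the "order:9" dump quoted above). The helper
names are made up for illustration and are not kernel API; the code only
does the alignment arithmetic for which pfn range a single pinned 4k page
keeps from forming a THP again.

```c
#include <stdint.h>

#define PAGE_SHIFT    12u                  /* assumption: 4k base pages */
#define THP_ORDER     9u                   /* "order:9" in the quoted dump */
#define THP_NR_PAGES  (1ul << THP_ORDER)   /* 512 base pages per THP */

/* Hypothetical helper: first pfn of the naturally aligned THP-sized
 * block that contains @pfn. */
static uint64_t thp_block_start(uint64_t pfn)
{
	return pfn & ~(uint64_t)(THP_NR_PAGES - 1);
}

/* Hypothetical helper: last pfn (inclusive) of that same block. */
static uint64_t thp_block_end(uint64_t pfn)
{
	return thp_block_start(pfn) + THP_NR_PAGES - 1;
}

/* Hypothetical helper: size in bytes of the physical range that cannot
 * be handed out as one THP while a single base page in it stays pinned. */
static uint64_t thp_block_bytes(void)
{
	return (uint64_t)THP_NR_PAGES << PAGE_SHIFT;
}
```

Taking pfn 0x14ee03 from the dump purely as an example input, the block
starts at pfn 0x14ee00 and ends at pfn 0x14efff, so one pinned 4k page
ties up a 2 MiB aligned range as far as THP allocation is concerned.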
--
Thanks,
David / dhildenb