From: mawupeng <mawupeng1@huawei.com>
To: <hughd@google.com>
Cc: <mawupeng1@huawei.com>, <akpm@linux-foundation.org>,
<vbabka@suse.cz>, <linux-mm@kvack.org>
Subject: Re: [5.10/5.15 LTS] Question on mlock race between ksm and cow
Date: Wed, 18 Oct 2023 11:16:37 +0800 [thread overview]
Message-ID: <200c9c35-bc59-4b0e-98e7-be1d937f3bbf@huawei.com> (raw)
In-Reply-To: <ab26eb5b-8338-439d-8c65-48a59da3ca00@huawei.com>
This problem can be easily reproduced with syz in LTS 5.10/5.15.
Kindly ping.
On 2023/9/18 9:07, mawupeng wrote:
> Kindly ping.
>
> On 2023/8/16 11:52, mawupeng wrote:
>> Since page_remove_rmap in wp_page_copy only clear this mlocked page iff
>> page's mapcount is -1. which can be seen as follow.
>>
>> wp_page_copy
>> page_remove_rmap
>> if (!atomic_add_negative(-1, &page->_mapcount))
>> goto out;
>> clear_page_mlock(page); // clear mlocked flag
>>
>> During out test, we can test this mapcount before mlock the kpage, this
>> can close this race.
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 62feb478a367..347f4c0339c2 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1295,7 +1295,8 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
>> if (!PageMlocked(kpage)) {
>> unlock_page(page);
>> lock_page(kpage);
>> - mlock_vma_page(kpage);
>> + if (page_mapcount(kpage) > 0)
>> + mlock_vma_page(kpage);
>> page = kpage; /* for final unlock */
>> }
>> }
>>
>>
>> On 2023/8/15 15:07, mawupeng wrote:
>>> Our syzbot reports a warning on bad page state. The mlocked flag is not
>>> cleared during page free.
>>>
>>> During try_to_merge_one_page in ksm, kpage will be remlocked if vma
>>> contains flag VM_LOCKED, however this flag is just cleared in wp_page_copy.
>>> Since the mapcount of this kpage is -1, no one can remove its mlocked flag
>>> before free, this lead to the bad page report.
>>>
>>> Since mlock changes a lot in v5.18-rc1[1], the latest linux do not have
>>> this problem. The 5.10/5.15 LTS do have this issue.
>>>
>>> Here is the simplified calltrace:
>>> try_to_merge_one_page wp_page_copy
>>>
>>> try_to_merge_one_page
>>> // clear page mlocked during rmap removal
>>> replace_page
>>> page_remove_rmap
>>> if (unlikely(PageMlocked(page)))
>>> clear_page_mlock(compound_head(page));
>>>
>>> if ((vma->vm_flags & VM_LOCKED)
>>> lock_page(old_page);
>>> if (vma->vm_flags & VM_LOCKED)
>>> if (PageMlocked(old_page))
>>> munlock_vma_page(old_page);
>>> if (!PageMlocked(kpage))
>>> lock_page(kpage);
>>> mlock_vma_page(kpage);
>>> unlock_page(kpage);
>>> -------------------------------------------------
>>>
>>> This problem can be easily reproduced with the following modifies:
>>> 1. enable the following CONFIG
>>> a) CONFIG_DEBUG_VM
>>> b) CONFIG_KSM
>>> c) CONFIG_MEMORY_FAIALURE
>>>
>>> 2. add delay in try_to_merge_one_page
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index a5716fdec1aa..f9ee2ec615ac 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -1248,8 +1248,10 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
>>>
>>> if ((vma->vm_flags & VM_LOCKED) && kpage && !err) {
>>> munlock_vma_page(page);
>>> + mdelay(10);
>>> if (!PageMlocked(kpage)) {
>>> unlock_page(page);
>>> + mdelay(100);
>>> lock_page(kpage);
>>> mlock_vma_page(kpage);
>>> page = kpage; /* for final unlock */
>>>
>>> 3. run syzbot with the following content:
>>>
>>> madvise(&(0x7f0000ff3000/0xc000)=nil, 0xc000, 0xc)
>>> mlockall(0x1)
>>> mlockall(0x5)
>>> madvise(&(0x7f0000ff3000/0xc000)=nil, 0xc04c, 0x65)
>>>
>>> madvise(&(0x7f0000ff5000/0x4000)=nil, 0x4000, 0xc)
>>> mlockall(0x1)
>>> mlockall(0xa5)
>>> mlockall(0x0)
>>> munlock(&(0x7f0000ff7000/0x4000)=nil, 0x4000)
>>>
>>> -------------------------------------------------
>>> The detail bug report can be seen as follow:
>>>
>>> BUG: Bad page state in process rs:main Q:Reg pfn:11406a
>>> page:fffff7b004501a80 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20ff4 pfn:0x11406a
>>> flags: 0x30000000028000e(referenced|uptodate|dirty|swapbacked|mlocked|node=0|zone=3)
>>> raw: 030000000028000e fffff7b00456aec8 fffff7b011439908 0000000000000000
>>> Soft offlining pfn 0x455e8f at process virtual address 0x20ff6000
>>> raw: 0000000000020ff4 0000000000000000 00000000ffffffff 0000000000000000
>>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> Modules linked in:
>>> CPU: 1 PID: 239 Comm: rs:main Q:Reg Not tainted 5.15.126+ #580
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>>> Call Trace:
>>> <TASK>
>>> dump_stack_lvl+0x33/0x46
>>> bad_page+0x9e/0xe0
>>> free_pcp_prepare+0x14b/0x1f0
>>> free_unref_page_list+0x7c/0x210
>>> release_pages+0x2fe/0x3c0
>>> __pagevec_lru_add+0x21a/0x360
>>> lru_cache_add+0x80/0xe0
>>> add_to_page_cache_lru+0x71/0xd0
>>> pagecache_get_page+0x245/0x460
>>> grab_cache_page_write_begin+0x1a/0x40
>>> ext4_da_write_begin+0xb7/0x280
>>> generic_perform_write+0xb4/0x1e0
>>> ext4_buffered_write_iter+0x9c/0x140
>>> ext4_file_write_iter+0x5b/0x840
>>> ? do_futex+0x1af/0xb60
>>> ? check_preempt_curr+0x21/0x60
>>> ? ttwu_do_wakeup.isra.140+0xd/0xf0
>>> new_sync_write+0x117/0x1b0
>>> vfs_write+0x1ff/0x260
>>> ksys_write+0xa0/0xe0
>>> do_syscall_64+0x37/0x90
>>> entry_SYSCALL_64_after_hwframe+0x67/0xd1
>>> RIP: 0033:0x7fb815cef32f
>>> Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48
>>> RSP: 002b:00007fb814b2b860 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
>>> RAX: ffffffffffffffda RBX: 00007fb808004f20 RCX: 00007fb815cef32f
>>> RDX: 000000000000006e RSI: 00007fb808004f20 RDI: 0000000000000007
>>> RBP: 00007fb808004c40 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000293 R12: 00007fb808009550
>>> R13: 000000000000006e R14: 0000000000000000 R15: 0000000000000000
>>> </TASK>
>>>
>>> [1]: https://lore.kernel.org/linux-mm/e7fbbdca-6590-7e45-3efd-279fba7f8376@suse.cz/T/#m0cb6e42b2a5ad634e1ec16e59f0f98f2e9382460
next prev parent reply other threads:[~2023-10-18 3:16 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-15 7:07 mawupeng
2023-08-16 3:52 ` mawupeng
2023-09-18 1:07 ` mawupeng
2023-10-18 3:16 ` mawupeng [this message]
2023-10-20 3:57 ` Hugh Dickins
2023-10-21 6:59 ` mawupeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200c9c35-bc59-4b0e-98e7-be1d937f3bbf@huawei.com \
--to=mawupeng1@huawei.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox