From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <200c9c35-bc59-4b0e-98e7-be1d937f3bbf@huawei.com>
Date: Wed, 18 Oct 2023 11:16:37 +0800
From: mawupeng
Subject: Re: [5.10/5.15 LTS] Question on mlock race between ksm and cow
References: <4263470f-77f8-47e2-be03-e1f8d790999e@huawei.com> <26614f10-5003-44f8-82c1-7ea204eec612@huawei.com>
Sender: owner-linux-mm@kvack.org
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit

This problem can be easily reproduced with syz in LTS 5.10/5.15.

Kindly ping.

On 2023/9/18 9:07, mawupeng wrote:
> Kindly ping.
>
> On 2023/8/16 11:52, mawupeng wrote:
>> page_remove_rmap in wp_page_copy clears the mlocked flag of this page
>> only if the page's mapcount drops to -1, as shown below:
>>
>> wp_page_copy
>>   page_remove_rmap
>>     if (!atomic_add_negative(-1, &page->_mapcount))
>>       goto out;
>>     clear_page_mlock(page); // clear mlocked flag
>>
>> During our test, we found that checking this mapcount before mlocking
>> the kpage closes this race.
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 62feb478a367..347f4c0339c2 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1295,7 +1295,8 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
>>  		if (!PageMlocked(kpage)) {
>>  			unlock_page(page);
>>  			lock_page(kpage);
>> -			mlock_vma_page(kpage);
>> +			if (page_mapcount(kpage) > 0)
>> +				mlock_vma_page(kpage);
>>  			page = kpage;		/* for final unlock */
>>  		}
>>  	}
>>
>>
>> On 2023/8/15 15:07, mawupeng wrote:
>>> Our syzbot reports a warning on bad page state. The mlocked flag is not
>>> cleared during page free.
>>>
>>> During try_to_merge_one_page in ksm, kpage will be re-mlocked if the vma
>>> has the VM_LOCKED flag set, even though the mlocked flag was just cleared
>>> in wp_page_copy. Since the mapcount of this kpage is -1, no one can remove
>>> its mlocked flag before it is freed, which leads to the bad page report.
>>>
>>> Since mlock changed a lot in v5.18-rc1[1], the latest Linux does not have
>>> this problem. The 5.10/5.15 LTS kernels do have this issue.
>>>
>>> Here is the simplified calltrace:
>>> try_to_merge_one_page                     wp_page_copy
>>>
>>> try_to_merge_one_page
>>>   // clear page mlocked during rmap removal
>>>   replace_page
>>>     page_remove_rmap
>>>       if (unlikely(PageMlocked(page)))
>>>         clear_page_mlock(compound_head(page));
>>>
>>>   if (vma->vm_flags & VM_LOCKED)
>>>                                           lock_page(old_page);
>>>                                           if (vma->vm_flags & VM_LOCKED)
>>>                                             if (PageMlocked(old_page))
>>>                                               munlock_vma_page(old_page);
>>>     if (!PageMlocked(kpage))
>>>       lock_page(kpage);
>>>       mlock_vma_page(kpage);
>>>       unlock_page(kpage);
>>> -------------------------------------------------
>>>
>>> This problem can be easily reproduced with the following modifications:
>>> 1. enable the following CONFIG options
>>> a) CONFIG_DEBUG_VM
>>> b) CONFIG_KSM
>>> c) CONFIG_MEMORY_FAILURE
>>>
>>> 2. add delay in try_to_merge_one_page
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index a5716fdec1aa..f9ee2ec615ac 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -1248,8 +1248,10 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
>>>  
>>>  	if ((vma->vm_flags & VM_LOCKED) && kpage && !err) {
>>>  		munlock_vma_page(page);
>>> +		mdelay(10);
>>>  		if (!PageMlocked(kpage)) {
>>>  			unlock_page(page);
>>> +			mdelay(100);
>>>  			lock_page(kpage);
>>>  			mlock_vma_page(kpage);
>>>  			page = kpage;		/* for final unlock */
>>>
>>> 3. run syzbot with the following content:
>>>
>>> madvise(&(0x7f0000ff3000/0xc000)=nil, 0xc000, 0xc)
>>> mlockall(0x1)
>>> mlockall(0x5)
>>> madvise(&(0x7f0000ff3000/0xc000)=nil, 0xc04c, 0x65)
>>>
>>> madvise(&(0x7f0000ff5000/0x4000)=nil, 0x4000, 0xc)
>>> mlockall(0x1)
>>> mlockall(0xa5)
>>> mlockall(0x0)
>>> munlock(&(0x7f0000ff7000/0x4000)=nil, 0x4000)
>>>
>>> -------------------------------------------------
>>> The detailed bug report is as follows:
>>>
>>> BUG: Bad page state in process rs:main Q:Reg pfn:11406a
>>> page:fffff7b004501a80 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20ff4 pfn:0x11406a
>>> flags: 0x30000000028000e(referenced|uptodate|dirty|swapbacked|mlocked|node=0|zone=3)
>>> raw: 030000000028000e fffff7b00456aec8 fffff7b011439908 0000000000000000
>>> Soft offlining pfn 0x455e8f at process virtual address 0x20ff6000
>>> raw: 0000000000020ff4 0000000000000000 00000000ffffffff 0000000000000000
>>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> Modules linked in:
>>> CPU: 1 PID: 239 Comm: rs:main Q:Reg Not tainted 5.15.126+ #580
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>>> Call Trace:
>>>
>>> dump_stack_lvl+0x33/0x46
>>> bad_page+0x9e/0xe0
>>> free_pcp_prepare+0x14b/0x1f0
>>> free_unref_page_list+0x7c/0x210
>>> release_pages+0x2fe/0x3c0
>>> __pagevec_lru_add+0x21a/0x360
>>> lru_cache_add+0x80/0xe0
>>> add_to_page_cache_lru+0x71/0xd0
>>> pagecache_get_page+0x245/0x460
>>> grab_cache_page_write_begin+0x1a/0x40
>>> ext4_da_write_begin+0xb7/0x280
>>> generic_perform_write+0xb4/0x1e0
>>> ext4_buffered_write_iter+0x9c/0x140
>>> ext4_file_write_iter+0x5b/0x840
>>> ? do_futex+0x1af/0xb60
>>> ? check_preempt_curr+0x21/0x60
>>> ? ttwu_do_wakeup.isra.140+0xd/0xf0
>>> new_sync_write+0x117/0x1b0
>>> vfs_write+0x1ff/0x260
>>> ksys_write+0xa0/0xe0
>>> do_syscall_64+0x37/0x90
>>> entry_SYSCALL_64_after_hwframe+0x67/0xd1
>>> RIP: 0033:0x7fb815cef32f
>>> Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48
>>> RSP: 002b:00007fb814b2b860 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
>>> RAX: ffffffffffffffda RBX: 00007fb808004f20 RCX: 00007fb815cef32f
>>> RDX: 000000000000006e RSI: 00007fb808004f20 RDI: 0000000000000007
>>> RBP: 00007fb808004c40 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000293 R12: 00007fb808009550
>>> R13: 000000000000006e R14: 0000000000000000 R15: 0000000000000000
>>>
>>>
>>> [1]: https://lore.kernel.org/linux-mm/e7fbbdca-6590-7e45-3efd-279fba7f8376@suse.cz/T/#m0cb6e42b2a5ad634e1ec16e59f0f98f2e9382460
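
For readers following the thread, the interleaving described above can be
replayed in a small single-threaded userspace model. This is only a sketch
under stated assumptions, not kernel code: struct toy_page and every toy_*
function are hypothetical names invented for illustration, and the model keeps
just the two pieces of state the race depends on (the raw _mapcount, where -1
means unmapped, and the mlocked flag). It replays one fixed ordering: the COW
side drops the last mapping first, then the delayed KSM side re-mlocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel structure. */
struct toy_page {
	int raw_mapcount;	/* models page->_mapcount: -1 means unmapped */
	bool mlocked;		/* models PG_mlocked */
};

/* Models page_mapcount(): the raw counter is biased by one. */
int toy_page_mapcount(const struct toy_page *p)
{
	return p->raw_mapcount + 1;
}

/*
 * Models the 5.10/5.15 page_remove_rmap() path quoted in the thread:
 * the mlocked flag is cleared only when the last mapping goes away.
 */
void toy_page_remove_rmap(struct toy_page *p)
{
	assert(p->raw_mapcount >= 0);	/* cannot unmap an unmapped page */
	p->raw_mapcount--;
	if (p->raw_mapcount >= 0)
		return;			/* still mapped elsewhere */
	p->mlocked = false;
}

/*
 * Models the VM_LOCKED tail of try_to_merge_one_page(). With
 * check_mapcount set, it applies the test from the proposed patch.
 */
void toy_ksm_remlock(struct toy_page *kpage, bool check_mapcount)
{
	if (kpage->mlocked)
		return;
	if (check_mapcount && toy_page_mapcount(kpage) <= 0)
		return;			/* nobody maps kpage any more */
	kpage->mlocked = true;
}

/* Returns true if the page would reach the free path still mlocked. */
bool toy_race_leaves_stale_mlock(bool check_mapcount)
{
	struct toy_page kpage = { .raw_mapcount = 0, .mlocked = false };

	toy_page_remove_rmap(&kpage);		/* COW side runs first */
	toy_ksm_remlock(&kpage, check_mapcount);	/* delayed KSM side */
	return kpage.mlocked;
}
```

In this model, toy_race_leaves_stale_mlock(false) returns true, which
corresponds to the "Bad page state" report, and toy_race_leaves_stale_mlock(true)
returns false. The model says nothing about whether the mapcount test is safe
against an unmap that lands between the test and mlock_vma_page() in the real
kernel; that question is left to the thread.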