Date: Wed, 26 Mar 2025 10:40:17 +0800
Subject: Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
From: Jinjiang Tu
To: David Hildenbrand
References: <20250324131750.1551884-1-tujinjiang@huawei.com> <899807c3-931f-43e6-bf3e-188787a4205a@redhat.com> <68ab727b-dc3d-327f-33b6-25bbfce8530e@huawei.com>

On 2025/3/26 3:05, David Hildenbrand wrote:
> On 25.03.25 04:02, Jinjiang Tu wrote:
>>
>> On 2025/3/24 21:44, David Hildenbrand wrote:
>>> On 24.03.25 14:17, Jinjiang Tu wrote:
>>>> We triggered the below BUG:
>>>>
>>>>    page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x240402
>>>>    head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>>>>    flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
>>>>    page_type: f4(hugetlb)
>>>>    page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
>>>>    ------------[ cut here ]------------
>>>>    kernel BUG at ./include/linux/page-flags.h:310!
>>>>    Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>>>    Modules linked in:
>>>>    CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
>>>>    Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>    pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>    pc : const_folio_flags+0x3c/0x58
>>>>    lr : const_folio_flags+0x3c/0x58
>>>>    Call trace:
>>>>     const_folio_flags+0x3c/0x58 (P)
>>>>     do_migrate_range+0x164/0x720
>>>>     offline_pages+0x63c/0x6fc
>>>>     memory_subsys_offline+0x190/0x1f4
>>>>     device_offline+0xc0/0x13c
>>>>     state_store+0x90/0xd8
>>>>     dev_attr_store+0x18/0x2c
>>>>     sysfs_kf_write+0x44/0x54
>>>>     kernfs_fop_write_iter+0x120/0x1cc
>>>>     vfs_write+0x240/0x378
>>>>     ksys_write+0x70/0x108
>>>>     __arm64_sys_write+0x1c/0x28
>>>>     invoke_syscall+0x48/0x10c
>>>>     el0_svc_common.constprop.0+0x40/0xe0
>>>>
>>>> When allocating a hugetlb folio, between the folio is taken from buddy
>>>> and prep_compound_page() is called, start_isolate_page_range() and
>>>> do_migrate_range() is called. When do_migrate_range() scans the head
>>>> page of the hugetlb folio, the compound_head field isn't set, so scans
>>>> the tail page next. And at this time, the compound_head field of tail
>>>> page is set, folio_test_large() is called by tail page, thus triggers
>>>> VM_BUG_ON().
>>>>
>>>> To fix it, get folio refcount before calling folio_test_large().
>>>>
>>>> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
>>>> Signed-off-by: Jinjiang Tu
>>>> ---
>>>>  mm/memory_hotplug.c | 12 +++---------
>>>>  1 file changed, 3 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index 16cf9e17077e..f600c26ce5de 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>          page = pfn_to_page(pfn);
>>>>          folio = page_folio(page);
>>>>
>>>> -        /*
>>>> -         * No reference or lock is held on the folio, so it might
>>>> -         * be modified concurrently (e.g. split).  As such,
>>>> -         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>> -         * loop will revisit the split folio later.
>>>> -         */
>>>> -        if (folio_test_large(folio))
>>>> -            pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>> -
>>>>          if (!folio_try_get(folio))
>>>>              continue;
>>>>
>>>>          if (unlikely(page_folio(page) != folio))
>>>>              goto put_folio;
>>>>
>>>> +        if (folio_test_large(folio))
>>>> +            pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>
>>> Moving that will not make it able to skip the large frozen
>>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
>>> above. Hmmmm ..
>> For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
>> free hugetlb slower.
>
> Yes. But now I realize that we have the same issue with free buddy
> pages already (folio_try_get of each individual page :( ).
>
>>>
>>> We could similarly to dumping folios, snapshot them, so we can read
>>> stable data.
>> extract the code in __dump_page()? But snapshot may lead to
>> do_migrate_range() slower too.
>
> There is a patch series on the list to do that, but it might take a
> while to clean that up. Ideally, we'd also jump over free buddy pages.
> In the future we might have better ways to do that.
>
> I don't consider this change here really important, but if all it
> affects is free hugetlb folios, it's not really worth it to have this
> code around.
>
> Acked-by: David Hildenbrand
>
> But: I suspect 8135d8926c08 is not the introducing commit. Please
> re-verify.

Commit 8135d8926c08 added a PageTransHuge() call, which can trigger
VM_BUG_ON() if the page is a tail page.

Before that commit, for hugetlb, PageHuge(), compound_head() and
compound_order() were called before taking a page reference. PageHuge()
and compound_head() accept a tail page, and compound_order() does not
trigger VM_BUG_ON() for a tail page. It only reads garbage data, which
can make the offline fail but never triggers a VM_BUG_ON().

So I think 8135d8926c08 is the introducing commit.

Thanks.

>
> We should not CC stable.
>
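
For reference, the ordering problem discussed in this thread can be sketched in user space. This is only a rough model under stated assumptions, not kernel code: struct model_page and every model_* / scan_* name are hypothetical, and the concurrent allocator is simulated by calling model_prep_compound() inline at the point where the race would hit. The only facts taken from the thread are that bit 0 of compound_head marks a tail page, that the flags check BUGs on such a page, and that the fix takes the reference and rechecks page_folio() before testing for a large folio.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct page. */
struct model_page {
    uintptr_t compound_head; /* bit 0 set => tail page, rest => head pointer */
    unsigned long flags;
    int refcount;
};

#define MODEL_PG_HEAD 0x40UL /* illustrative "head" flag bit */

/* Resolve a page to its folio (head page), like page_folio() in spirit. */
static struct model_page *model_page_folio(struct model_page *p)
{
    uintptr_t head = p->compound_head;

    if (head & 1)
        return (struct model_page *)(head - 1);
    return p;
}

/* Fails on a frozen (refcount == 0) folio, like folio_try_get() in spirit. */
static bool model_folio_try_get(struct model_page *f)
{
    if (f->refcount == 0)
        return false;
    f->refcount++;
    return true;
}

/* Returns -1 where the kernel would hit VM_BUG_ON_PAGE(), else 0 or 1. */
static int model_folio_test_large(const struct model_page *f)
{
    if (f->compound_head & 1)
        return -1;
    return (f->flags & MODEL_PG_HEAD) != 0;
}

/* Simulated allocator side: turn pages[0..n) into one compound page. */
static void model_prep_compound(struct model_page *pages, size_t n)
{
    pages[0].flags |= MODEL_PG_HEAD;
    for (size_t i = 1; i < n; i++)
        pages[i].compound_head = (uintptr_t)&pages[0] | 1;
}

/* Old order: test for a large folio before holding a reference. */
static int scan_old(struct model_page *page, struct model_page *pages, size_t n)
{
    struct model_page *folio = model_page_folio(page);

    model_prep_compound(pages, n); /* the race window */
    return model_folio_test_large(folio);
}

/* New order: take a reference, recheck the folio, then test. */
static int scan_new(struct model_page *page, struct model_page *pages, size_t n)
{
    struct model_page *folio = model_page_folio(page);
    int ret = 0;

    model_prep_compound(pages, n); /* the race window */
    if (!model_folio_try_get(folio))
        return 0; /* frozen: skip this pfn */
    if (model_page_folio(page) == folio)
        ret = model_folio_test_large(folio);
    folio->refcount--; /* put_folio */
    return ret;
}
```

With a zero-initialized array of four pages, scan_old() on the second page returns -1 (the modelled BUG), while scan_new() returns 0 because either the try-get fails or the recheck sees that the page now belongs to a different folio.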