From: Jinjiang Tu
To: David Hildenbrand
Subject: Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
Date: Thu, 27 Mar 2025 19:19:37 +0800
Message-ID: <38de9073-47fd-4e8c-9940-cdeb98703d4b@huawei.com>
References: <20250324131750.1551884-1-tujinjiang@huawei.com> <899807c3-931f-43e6-bf3e-188787a4205a@redhat.com> <68ab727b-dc3d-327f-33b6-25bbfce8530e@huawei.com>

On 2025/3/26 20:53, David Hildenbrand wrote:
> On 26.03.25 03:40, Jinjiang Tu wrote:
>>
>> On 2025/3/26 3:05, David Hildenbrand wrote:
>>> On 25.03.25 04:02, Jinjiang Tu wrote:
>>>>
>>>> On 2025/3/24 21:44, David Hildenbrand wrote:
>>>>> On 24.03.25 14:17, Jinjiang Tu wrote:
>>>>>> We triggered the BUG below:
>>>>>>
>>>>>>     page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x240402
>>>>>>     head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>>>>>>     flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
>>>>>>     page_type: f4(hugetlb)
>>>>>>     page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
>>>>>>     ------------[ cut here ]------------
>>>>>>     kernel BUG at ./include/linux/page-flags.h:310!
>>>>>>     Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>>>>>     Modules linked in:
>>>>>>     CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
>>>>>>     Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>>>     pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>     pc : const_folio_flags+0x3c/0x58
>>>>>>     lr : const_folio_flags+0x3c/0x58
>>>>>>     Call trace:
>>>>>>      const_folio_flags+0x3c/0x58 (P)
>>>>>>      do_migrate_range+0x164/0x720
>>>>>>      offline_pages+0x63c/0x6fc
>>>>>>      memory_subsys_offline+0x190/0x1f4
>>>>>>      device_offline+0xc0/0x13c
>>>>>>      state_store+0x90/0xd8
>>>>>>      dev_attr_store+0x18/0x2c
>>>>>>      sysfs_kf_write+0x44/0x54
>>>>>>      kernfs_fop_write_iter+0x120/0x1cc
>>>>>>      vfs_write+0x240/0x378
>>>>>>      ksys_write+0x70/0x108
>>>>>>      __arm64_sys_write+0x1c/0x28
>>>>>>      invoke_syscall+0x48/0x10c
>>>>>>      el0_svc_common.constprop.0+0x40/0xe0
>>>>>>
>>>>>> When allocating a hugetlb folio, start_isolate_page_range() and
>>>>>> do_migrate_range() may be called between the point where the folio is
>>>>>> taken from the buddy allocator and the call to prep_compound_page().
>>>>>> When do_migrate_range() scans the head page of the hugetlb folio, the
>>>>>> compound_head field isn't set yet, so it moves on to scan the tail page.
>>>>>> By then, the compound_head field of the tail page has been set, so
>>>>>> folio_test_large() is called on a tail page, which triggers VM_BUG_ON().
>>>>>>
>>>>>> To fix it, take a folio reference before calling folio_test_large().
>>>>>>
>>>>>> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
>>>>>> Signed-off-by: Jinjiang Tu
>>>>>> ---
>>>>>>     mm/memory_hotplug.c | 12 +++---------
>>>>>>     1 file changed, 3 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>>> index 16cf9e17077e..f600c26ce5de 100644
>>>>>> --- a/mm/memory_hotplug.c
>>>>>> +++ b/mm/memory_hotplug.c
>>>>>> @@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>>>  		page = pfn_to_page(pfn);
>>>>>>  		folio = page_folio(page);
>>>>>>  
>>>>>> -		/*
>>>>>> -		 * No reference or lock is held on the folio, so it might
>>>>>> -		 * be modified concurrently (e.g. split).  As such,
>>>>>> -		 * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>>> -		 * loop will revisit the split folio later.
>>>>>> -		 */
>>>>>> -		if (folio_test_large(folio))
>>>>>> -			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>>>> -
>>>>>>  		if (!folio_try_get(folio))
>>>>>>  			continue;
>>>>>>  
>>>>>>  		if (unlikely(page_folio(page) != folio))
>>>>>>  			goto put_folio;
>>>>>>  
>>>>>> +		if (folio_test_large(folio))
>>>>>> +			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>>>
>>>>> Moving that will not make it possible to skip the large frozen
>>>>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
>>>>> above. Hmmmm ..
>>>> For free hugetlb, pfn is increased by 1 in each loop iteration. This
>>>> makes skipping free hugetlb folios slower.
>>>
>>> Yes.
>>> But now I realize that we have the same issue with free buddy
>>> pages already (folio_try_get of each individual page :( ).
>>>
>>>>>
>>>>> Similar to dumping folios, we could snapshot them, so we can read
>>>>> stable data.
>>>> Extract the code in __dump_page()? But snapshotting may make
>>>> do_migrate_range() slower too.
>>>
>>> There is a patch series on the list to do that, but it might take a
>>> while to clean that up. Ideally, we'd also jump over free buddy pages.
>>> In the future we might have better ways to do that.
>>>
>>> I don't consider this change here really important, but if all it
>>> affects is free hugetlb folios, it's not really worth it to have this
>>> code around.
>>>
>>> Acked-by: David Hildenbrand
>>>
>>> But: I suspect 8135d8926c08 is not the introducing commit. Please
>>> re-verify.
>>
>> Commit 8135d8926c08 adds a PageTransHuge() call, which may trigger
>> VM_BUG_ON if the page is a tail page.
>
> Right; we check for PageHuge before that.
>
>> Before it, for hugetlb, PageHuge(), compound_head() and
>> compound_order() are called before taking a page reference.
>> PageHuge() and compound_head() accept a tail page, and
>> compound_order() does not trigger VM_BUG_ON for a tail page; it only
>> reads garbage data, which can make offlining fail, but
>
> We will retry as documented, so offlining should not fail.
>
>> it will not trigger any VM_BUG_ON. So I think 8135d8926c08 is the
>> introducing commit.
>
> Not for the hugetlb issue you describe, I assume. That should be caused by
>
> commit b62b51d2d1593e6590669e781768c3c5a7394eb5
> Author: Kefeng Wang
> Date:   Tue Aug 27 19:47:24 2024 +0800
>
>     mm: memory_hotplug: remove head variable in do_migrate_range()
>
>     Patch series "mm: memory_hotplug: improve do_migrate_range()", v3.
>
> So probably we should have both commits as Fixes (one for the hugetlb
> path, one for the THP path)
>

Yes, b62b51d2d159 ("mm: memory_hotplug: remove head variable in
do_migrate_range()") introduced this for hugetlb.

Thanks.
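A minimal user-space sketch may help show why the order of the checks matters. It is a hypothetical, heavily simplified model and not kernel code: struct page keeps only the tail encoding (compound_head holds the head page address with bit 0 set), a head flag, an order and a refcount, and a folio is represented by its head page. The helpers borrow kernel names (page_folio(), folio_test_large(), folio_try_get(), folio_put()) only to mark which step they stand for. prep_compound(), reset(), scan(), race_pfn and vm_bug_hits are invented for the illustration, and the concurrent hugetlb allocation is simulated by calling prep_compound() inside the loop, right after page_folio(), instead of on another CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, NOT kernel code. A "folio" is represented
 * by its head struct page; only the fields needed for the race are kept.
 */
#define NR_PFNS 16

struct page {
	uintptr_t compound_head; /* tail page: address of head page | 1 */
	bool head;               /* models PG_head */
	unsigned int order;      /* meaningful on head pages only */
	int refcount;            /* 0 == free or frozen */
};

struct page mem_map[NR_PFNS];
int vm_bug_hits;   /* hypothetical: counts would-be VM_BUG_ON_PAGE() hits */
int race_pfn = -1; /* hypothetical: pfn at which the simulated allocator runs */

static bool page_is_tail(const struct page *p) { return p->compound_head & 1; }
static unsigned long folio_pfn(const struct page *f) { return (unsigned long)(f - mem_map); }
static unsigned long folio_nr_pages(const struct page *f) { return 1UL << f->order; }
static void folio_put(struct page *f) { f->refcount--; }

static struct page *page_folio(struct page *p)
{
	return page_is_tail(p) ? (struct page *)(p->compound_head - 1) : p;
}

/* Must never be handed a tail page; the real kernel would BUG() instead. */
static bool folio_test_large(const struct page *f)
{
	if (page_is_tail(f))
		vm_bug_hits++;
	return f->head;
}

static bool folio_try_get(struct page *f)
{
	if (f->refcount == 0) /* free or frozen: no reference can be taken */
		return false;
	f->refcount++;
	return true;
}

/* Hypothetical stand-in for prep_compound_page() on a frozen folio. */
void prep_compound(unsigned long head_pfn, unsigned int order)
{
	struct page *head = &mem_map[head_pfn];

	head->head = true;
	head->order = order;
	for (unsigned long i = 1; i < (1UL << order); i++)
		mem_map[head_pfn + i].compound_head = (uintptr_t)head | 1;
}

/* pfns 0-7 start free (refcount 0), pfns 8-15 are in-use order-0 pages. */
void reset(void)
{
	for (int i = 0; i < NR_PFNS; i++)
		mem_map[i] = (struct page){ .refcount = i >= 8 ? 1 : 0 };
	vm_bug_hits = 0;
	race_pfn = -1;
}

/* Models the do_migrate_range() loop; returns the number of iterations. */
int scan(unsigned long start_pfn, unsigned long end_pfn, bool fixed)
{
	int iters = 0;

	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = &mem_map[pfn];
		struct page *folio = page_folio(page);

		iters++;
		/* Simulated concurrent allocation (assumes 1 <= race_pfn <= NR_PFNS - 3):
		 * an order-2 folio is prepared at race_pfn - 1, so "page" is now a tail. */
		if ((long)pfn == race_pfn)
			prep_compound(pfn - 1, 2);

		if (!fixed && folio_test_large(folio)) /* original order */
			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
		if (!folio_try_get(folio))
			continue;
		if (page_folio(page) != folio)
			goto put_folio;
		if (fixed && folio_test_large(folio)) /* patched order */
			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
		/* ... isolation for migration would happen here ... */
put_folio:
		folio_put(folio);
	}
	return iters;
}
```

In this model, after reset() and race_pfn = 5 (so the folio at pfns 4-7 is prepared while pfn 5 is being scanned), scan(0, NR_PFNS, false) records one would-be VM_BUG_ON hit and scan(0, NR_PFNS, true) records none, because folio_try_get() fails on the refcount-0 page before folio_test_large() is reached. With an already prepared free (frozen) folio at pfns 4-7 and no race, the original order needs 13 iterations for the 16 pfns while the patched order needs 16, which mirrors the slower skipping of free hugetlb folios noted in the thread.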