From: Jiaqi Yan
Date: Tue, 30 May 2023 15:11:20 -0700
Subject: Re: [RFC PATCH v1 4/7] mm/memory_failure: unmap raw HWPoison PTEs when possible
To: HORIGUCHI NAOYA(堀口 直也)
Cc: mike.kravetz@oracle.com, peterx@redhat.com, songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com, jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com, shy828301@gmail.com, baolin.wang@linux.alibaba.com, wangkefeng.wang@huawei.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230428004139.2899856-1-jiaqiyan@google.com> <20230428004139.2899856-5-jiaqiyan@google.com> <20230530022456.GA1434147@hori.linux.bs1.fc.nec.co.jp>

On Tue, May 30, 2023 at 2:31 PM Jiaqi Yan wrote:
>
> On Mon, May 29, 2023 at 7:25 PM HORIGUCHI NAOYA(堀口　直也)
> wrote:
> >
> > On Fri, Apr 28, 2023 at 12:41:36AM +0000, Jiaqi Yan wrote:
> > > When a folio's VMA is HGM eligible, try_to_unmap_one now only unmaps
> > > the raw HWPOISON page (previously split and mapped at PTE size).
> > > If HGM failed to be enabled on eligible VMA or splitting failed,
> > > try_to_unmap_one fails.
> > >
> > > For a VMA that is not HGM eligible, try_to_unmap_one still unmaps
> > > the whole P*D.
> > >
> > > When only the raw HWPOISON subpage is unmapped but others keep mapped,
> > > the old way in memory_failure to check if unmapping successful doesn't
> > > work. So introduce is_unmapping_successful() to cover both existing and
> > > new unmapping behavior.
> > >
> > > For the new unmapping behavior, store how many times a raw HWPOISON page
> > > is expected to be unmapped, and how many times it is actually unmapped
> > > in try_to_unmap_one(). A HWPOISON raw page is expected to be unmapped
> > > from a VMA if splitting succeeded in try_to_split_huge_mapping(), so
> > > unmap_success = (nr_expected_unamps == nr_actual_unmaps).
> > >
> > > Old folio_set_hugetlb_hwpoison returns -EHWPOISON if a folio has any
> > > raw HWPOISON subpage, and try_memory_failure_hugetlb won't attempt
> > > recovery actions again because recovery used to be done on the entire
> > > hugepage. With the new unmapping behavior, this doesn't hold. More
> > > subpages in the hugepage can become corrupted, and need to be recovered
> > > (i.e. unmapped) individually. New folio_set_hugetlb_hwpoison returns
> > > 0 after adding a new raw subpage to raw_hwp_list.
> > >
> > > Unmapping raw HWPOISON page requires allocating raw_hwp_page
> > > successfully in folio_set_hugetlb_hwpoison, so try_memory_failure_hugetlb
> > > now may fail due to OOM.
> > >
> > > Signed-off-by: Jiaqi Yan
> > > ---
> > ...
> >
> > > @@ -1827,6 +1879,31 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
> > >
> > >  #ifdef CONFIG_HUGETLB_PAGE
> > >
> > > +/*
> > > + * Given a HWPOISON @subpage as raw page, find its location in @folio's
> > > + * _hugetlb_hwpoison. Return NULL if @subpage is not in the list.
> > > + */
> > > +struct raw_hwp_page *find_in_raw_hwp_list(struct folio *folio,

BTW, per our discussion here[1], this routine will probably reuse what
comes out of the refactored routine. It should be safe for
try_to_unmap_one to hold a raw_hwp_page returned from
find_in_raw_hwp_list as long as raw_hwp_list is protected by mf_mutex.

[1] https://lore.kernel.org/linux-mm/CACw3F53+Hg4CgFoPj3LLSiURzWfa2egWLO-=12GzfhsNC3XTvQ@mail.gmail.com/T/#m9966de1007b80eb8bd2c2ce0a9db13624cd2652e

> > > +					struct page *subpage)
> > > +{
> > > +	struct llist_node *t, *tnode;
> > > +	struct llist_head *raw_hwp_head = raw_hwp_list_head(folio);
> > > +	struct raw_hwp_page *hwp_page = NULL;
> > > +	struct raw_hwp_page *p;
> > > +
> > > +	VM_BUG_ON_PAGE(PageHWPoison(subpage), subpage);
> >
> > I'm testing the series (on top of v6.2-rc4 + HGM v2 patchset) and found the
> > following error triggered by this VM_BUG_ON_PAGE(). The testcase is just to
> > inject hwpoison on an anonymous page (it's not hugetlb-related one).
>
> Thanks for reporting this problem, Naoya!
>
> My mistake, this assertion was meant to be "if !PageHWPoison(subpage)", to
> make sure the caller of find_in_raw_hwp_list is sure that subpage is
> hw corrupted.
> > > > > [ 790.610985] =3D=3D=3D> testcase 'mm/hwpoison/base/backend-anonymou= s_error-hard-offline_access-avoid.auto3' start > > [ 793.304927] page:000000006743177b refcount:1 mapcount:0 mapping:00= 00000000000000 index:0x700000000 pfn:0x14d739 > > [ 793.309322] memcg:ffff8a30c50b6000 > > [ 793.310934] anon flags: 0x57ffffe08a001d(locked|uptodate|dirty|lru= |mappedtodisk|swapbacked|hwpoison|node=3D1|zone=3D2|lastcpupid=3D0x1fffff) > > [ 793.316665] raw: 0057ffffe08a001d ffffe93cc5353c88 ffffe93cc5685fc= 8 ffff8a30c91878f1 > > [ 793.320211] raw: 0000000700000000 0000000000000000 00000001fffffff= f ffff8a30c50b6000 > > [ 793.323665] page dumped because: VM_BUG_ON_PAGE(PageHWPoison(subpa= ge)) > > [ 793.326764] ------------[ cut here ]------------ > > [ 793.329080] kernel BUG at mm/memory-failure.c:1894! > > [ 793.331895] invalid opcode: 0000 [#1] PREEMPT SMP PTI > > [ 793.334854] CPU: 4 PID: 2644 Comm: mceinj.sh Tainted: G = E N 6.2.0-rc4-v6.2-rc2-230529-1404+ #63 > > [ 793.340710] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),= BIOS 1.16.1-2.fc37 04/01/2014 > > [ 793.345875] RIP: 0010:hwpoison_user_mappings+0x654/0x780 > > [ 793.349066] Code: ef 89 de e8 6e bc f8 ff 48 8b 7c 24 20 48 83 c7 = 58 e8 10 bb d9 ff e9 5f fb ff ff 48 c7 c6 80 ce 4c b1 4c 89 ef e8 1c 38 f6 = ff <0f> 0b 48 c7 c6 7b c8 4c b1 4c 89 ef e8 0b 38 f6 ff 0f 0b 8b 45 58 > > [ 793.359732] RSP: 0018:ffffa3ff85ed3d28 EFLAGS: 00010296 > > [ 793.362367] RAX: 000000000000003a RBX: 0000000000000018 RCX: 00000= 00000000000 > > [ 793.365763] RDX: 0000000000000001 RSI: ffffffffb14ac451 RDI: 00000= 000ffffffff > > [ 793.368698] RBP: ffffe93cc535ce40 R08: 0000000000000000 R09: ffffa= 3ff85ed3ba0 > > [ 793.370837] R10: 0000000000000003 R11: ffffffffb1d3ed28 R12: 00000= 0000014d739 > > [ 793.372903] R13: ffffe93cc535ce40 R14: ffffe93cc535ce40 R15: ffffe= 93cc535ce40 > > [ 793.374931] FS: 00007f6ccc42a740(0000) GS:ffff8a31bbc00000(0000) = knlGS:0000000000000000 > > [ 793.377136] CS: 0010 DS: 0000 ES: 
0000 CR0: 0000000080050033 > > [ 793.378656] CR2: 0000561aad6474b2 CR3: 00000001492d4005 CR4: 00000= 00000170ee0 > > [ 793.380514] DR0: ffffffffb28ed7d0 DR1: ffffffffb28ed7d1 DR2: fffff= fffb28ed7d2 > > [ 793.382296] DR3: ffffffffb28ed7d3 DR6: 00000000ffff0ff0 DR7: 00000= 00000000600 > > [ 793.384028] Call Trace: > > [ 793.384655] > > [ 793.385210] ? __lru_add_drain_all+0x164/0x1f0 > > [ 793.386316] memory_failure+0x352/0xaa0 > > [ 793.387249] ? __pfx_bpf_lsm_capable+0x10/0x10 > > [ 793.388323] ? __pfx_security_capable+0x10/0x10 > > [ 793.389350] hard_offline_page_store+0x46/0x80 > > [ 793.390397] kernfs_fop_write_iter+0x11e/0x200 > > [ 793.391441] vfs_write+0x1e4/0x3a0 > > [ 793.392221] ksys_write+0x53/0xd0 > > [ 793.392976] do_syscall_64+0x3a/0x90 > > [ 793.393790] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > > > I'm wondering how this code path is called, one possible path is like t= his: > > > > hwpoison_user_mappings > > if PageHuge(hpage) && !PageAnon(hpage) > > try_to_split_huge_mapping() > > find_in_raw_hwp_list > > VM_BUG_ON_PAGE(PageHWPoison(subpage), subpage) > > > > but this looks unlikely because the precheck "PageHuge(hpage) && !PageA= non(hpage)" is > > false for anonymous pages. > > > > Another possible code path is: > > > > hwpoison_user_mappings > > if PageHuge(hpage) && !PageAnon(hpage) > > ... > > else > > try_to_unmap > > rmap_walk > > rmap_walk_anon > > try_to_unmap_one > > if folio_test_hugetlb > > if hgm_eligible > > find_in_raw_hwp_list > > VM_BUG_ON_PAGE(PageHWPoison(subpage), subpage) > > > > but this looks also unlikely because of checking folio_test_hugetlb and= hgm_eligible > > (I think both are false in this testcase.) > > Maybe I miss something (and I'll dig this more), but let me share the i= ssue. > > I bet it is in "is_unmapping_successful". 
> So another problem with this
> patch is, "is_unmapping_successful" should only call
> find_in_raw_hwp_list after it handles non hugetlb and non shared
> mapping, i.e.:
>
>     struct raw_hwp_page *hwp_page = NULL;
>
>     if (!folio_test_hugetlb(folio) ||
>         folio_test_anon(folio) ||
>         !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING)) {
>         ...
>     }
>
>     hwp_page = find_in_raw_hwp_list(folio, poisoned_page);
>     VM_BUG_ON_PAGE(!hwp_page, poisoned_page);
>
> I will make sure these two issues get fixed up in follow-up revisions.
>
> >
> > Thanks,
> > Naoya Horiguchi
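
For anyone following the thread, the two fixes discussed above (the inverted assertion in find_in_raw_hwp_list, and the ordering inside is_unmapping_successful) can be sketched in plain userspace C. This is only a rough model, not the patch: every mock_* type and function is hypothetical, assert() stands in for VM_BUG_ON_PAGE(), a plain singly linked list stands in for the llist, and the hgm_enabled parameter stands in for IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING). The early-return branch, which is elided as "..." in the thread, is assumed here to check that the whole folio is no longer mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's struct page/folio. */
struct mock_page {
	bool hwpoison;			/* models PageHWPoison() */
};

struct mock_raw_hwp {
	const struct mock_page *page;	/* the raw poisoned subpage */
	int nr_expected_unmaps;		/* VMAs where splitting succeeded */
	int nr_actual_unmaps;		/* unmaps actually performed */
	struct mock_raw_hwp *next;
};

struct mock_folio {
	bool hugetlb;			/* models folio_test_hugetlb() */
	bool anon;			/* models folio_test_anon() */
	bool mapped;			/* models "folio still mapped" */
	struct mock_raw_hwp *raw_hwp_list;
};

/* Return the list entry for a poisoned subpage, or NULL if it is absent. */
static struct mock_raw_hwp *
mock_find_in_raw_hwp_list(const struct mock_folio *folio,
			  const struct mock_page *subpage)
{
	struct mock_raw_hwp *p;

	/* Corrected polarity: it is a bug if the subpage is NOT poisoned. */
	assert(subpage->hwpoison);

	for (p = folio->raw_hwp_list; p; p = p->next)
		if (p->page == subpage)
			return p;
	return NULL;
}

static bool
mock_is_unmapping_successful(const struct mock_folio *folio,
			     const struct mock_page *poisoned_page,
			     bool hgm_enabled)
{
	struct mock_raw_hwp *hwp_page;

	/*
	 * Non-hugetlb, anonymous, or HGM-disabled cases never consult
	 * the raw list. ASSUMPTION: success here means the whole folio
	 * is unmapped.
	 */
	if (!folio->hugetlb || folio->anon || !hgm_enabled)
		return !folio->mapped;

	hwp_page = mock_find_in_raw_hwp_list(folio, poisoned_page);
	assert(hwp_page != NULL);

	/* Other subpages may stay mapped; only the counters matter. */
	return hwp_page->nr_expected_unmaps == hwp_page->nr_actual_unmaps;
}
```

With this ordering, a poisoned anonymous page (the case in Naoya's testcase) returns from the first branch and never reaches the list lookup, while a shared hugetlb folio is judged only by whether its expected and actual unmap counts match.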