From: Yang Shi
Date: Fri, 22 Apr 2022 10:58:46 -0700
Subject: Re: [PATCH v2] mm/memory-failure.c: skip huge_zero_page in memory_failure()
To: Yu Xu
Cc: David Hildenbrand, Linux MM, HORIGUCHI NAOYA(堀口 直也), Andrew Morton, Miaohe Lin, Oscar Salvador
In-Reply-To: <83e32a86-6e36-15ed-ef7d-5b00b8d7db59@linux.alibaba.com>

On Fri, Apr 22, 2022 at 10:37 AM Yu Xu wrote:
>
> On 4/23/22 12:03 AM, Yang Shi wrote:
> > On Thu, Apr 21, 2022 at 8:27 PM Yu Xu wrote:
> >>
> >> On 4/22/22 1:53 AM, Yang Shi wrote:
> >>> On Thu, Apr 21, 2022 at 2:03 AM David Hildenbrand wrote:
> >>>>
> >>>> On 21.04.22 01:38, Yang Shi wrote:
> >>>>> On Tue, Apr 12, 2022 at 8:11 AM Xu Yu wrote:
> >>>>>>
> >>>>>> The kernel panics when injecting memory_failure for the global
> >>>>>> huge_zero_page with CONFIG_DEBUG_VM enabled, as follows.
> >>>>>>
> >>>>>> [ 5.582720] Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
> >>>>>> [ 5.583786] page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
> >>>>>> [ 5.584900] head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
> >>>>>> [ 5.585796] flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
> >>>>>> [ 5.586712] raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
> >>>>>> [ 5.587640] raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
> >>>>>> [ 5.588565] page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
> >>>>>> [ 5.589398] ------------[ cut here ]------------
> >>>>>> [ 5.589952] kernel BUG at mm/huge_memory.c:2499!
> >>>>>> [ 5.590516] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> >>>>>> [ 5.591120] CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
> >>>>>> [ 5.591904] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
> >>>>>> [ 5.592817] RIP: 0010:split_huge_page_to_list+0x66a/0x880
> >>>>>> [ 5.593469] Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
> >>>>>> [ 5.595806] RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
> >>>>>> [ 5.596434] RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
> >>>>>> [ 5.597322] RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
> >>>>>> [ 5.598162] RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
> >>>>>> [ 5.598999] R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
> >>>>>> [ 5.599849] R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
> >>>>>> [ 5.600693] FS: 00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
> >>>>>> [ 5.601640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>> [ 5.602304] CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
> >>>>>> [ 5.603139] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>> [ 5.603977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>> [ 5.604806] Call Trace:
> >>>>>> [ 5.605101]
> >>>>>> [ 5.605357] ? __irq_work_queue_local+0x39/0x70
> >>>>>> [ 5.605904] try_to_split_thp_page+0x3a/0x130
> >>>>>> [ 5.606430] memory_failure+0x128/0x800
> >>>>>> [ 5.606888] madvise_inject_error.cold+0x8b/0xa1
> >>>>>> [ 5.607444] __x64_sys_madvise+0x54/0x60
> >>>>>> [ 5.607915] do_syscall_64+0x35/0x80
> >>>>>> [ 5.608347] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>> [ 5.608949] RIP: 0033:0x7fc3754f8bf9
> >>>>>> [ 5.609374] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
> >>>>>> [ 5.611554] RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
> >>>>>> [ 5.612441] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
> >>>>>> [ 5.613269] RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
> >>>>>> [ 5.614108] RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
> >>>>>> [ 5.614946] R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
> >>>>>> [ 5.615787] R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
> >>>>>> [ 5.616626]
> >>>>>>
> >>>>>> This makes huge_zero_page bail out explicitly before the split in
> >>>>>> memory_failure(), so the panic above won't happen again.
> >>>>>
> >>>>> Skipping huge_zero_page in error injection is fine with me, but I'm
> >>>>> actually wondering whether raising a BUG when splitting
> >>>>> huge_zero_page is overkill. Returning -EBUSY should be totally fine.
> >>>>
> >>>> I tend to agree. Just failing with -EBUSY might be cleaner. Most
> >>>> probably we want to catch any bogus code here that does something we
> >>>> don't really expect -- splitting the huge zeropage makes 0 sense.
> >>>
> >>> Yeah, the huge zero page can't be met from normal paths other than
> >>> memory failure, but memory failure is a valid caller. So I tend to
> >>> replace the BUG with WARN + returning -EBUSY. If we don't care about the
> >>> reason code in memory failure, we don't have to touch memory failure.
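For illustration only, a minimal userspace model of the "WARN + return -EBUSY" idea above. Everything in it is hypothetical: struct page_model, warn_once_model() and split_huge_page_model() are stand-ins, not kernel APIs. The condition is evaluated once into a local and the early return tests that local, rather than the return value of a warning macro; the assumption (worth checking against include/linux/mmdebug.h) is that VM_WARN_ON_ONCE_PAGE() only yields a usable value when CONFIG_DEBUG_VM=y.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for struct page. */
struct page_model {
	bool huge_zero;  /* true for the shared huge zero page */
	bool compound;   /* true while the page is still a THP */
};

/* Counts how many warnings were actually emitted. */
static int warn_count;

/* Userspace stand-in for a warn-once check: only the first hit is reported. */
static void warn_once_model(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
}

/*
 * Hypothetical model of the proposed split path: the huge zero page gets a
 * one-time warning and -EBUSY instead of a BUG, so a caller such as
 * memory_failure() can treat it as "could not split".
 */
static int split_huge_page_model(struct page_model *head)
{
	/* Mirrors the VM_BUG_ON_PAGE(!PageCompound(head)) precondition. */
	assert(head->compound);

	/* Evaluate the condition once and branch on the local. */
	bool is_hzp = head->huge_zero;

	warn_once_model(is_hzp, "split_huge_page on huge zero page");
	if (is_hzp)
		return -EBUSY;

	/* The real split work would happen here. */
	head->compound = false;
	return 0;
}
```

Calling split_huge_page_model() twice on a huge-zero-page model returns -EBUSY both times but warns only once and leaves the page intact; an ordinary THP model is "split" and the call returns 0.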
> >>
> >> To make sure I understand this correctly, do you mean the following?
> >
> > Yes. And you should be able to drop your old patch as well.
>
> Thanks!
>
> One more question: if we do so, PG_has_hwpoisoned will be set on the
> huge_zero_page in memory_failure(), although PG_has_hwpoisoned is not
> checked in set_huge_zero_page(). It works, but isn't it a bit strange?

Thanks for noticing this. IMHO this is an existing bug. The anonymous
page fault doesn't check whether the page is poisoned, since it
typically gets a freshly allocated page and assumes that a poisoned
page (once successfully isolated) can't be reallocated. But the huge
zero page and the base zero page are reused every time. So whichever
fix we pick, the issue remains.

>
> Given this, and since you mentioned that "the huge zero page can't be
> met from normal paths other than memory failure", I prefer to skip
> huge_zero_page in memory_failure().
>
> What do you think?
>
> >
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index c468fee595ff..3bb464509518 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -2496,10 +2496,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> >>  	int extra_pins, ret;
> >>  	pgoff_t end;
> >>  
> >> -	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
> >>  	VM_BUG_ON_PAGE(!PageLocked(head), head);
> >>  	VM_BUG_ON_PAGE(!PageCompound(head), head);
> >>  
> >> +	if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
> >> +		return -EBUSY;
> >> +
> >>  	if (PageWriteback(head))
> >>  		return -EBUSY;
> >>  
> >>
> >>>
> >>>>
> >>>> --
> >>>> Thanks,
> >>>>
> >>>> David / dhildenb
> >>>>
> >>>>
> >>
> >> --
> >> Thanks,
> >> Yu
>
> --
> Thanks,
> Yu
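A toy model of the asymmetry described above, for illustration only. All names are hypothetical and the real allocator and fault paths are far more involved: a poisoned page that was isolated never comes back out of the free pool, so a path that always gets a fresh page has no reason to check for poison, whereas the single shared zero page is handed out on every read fault and nothing on that path looks at its poisoned state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy page: only the state this model needs. */
struct toy_page {
	bool hwpoison;  /* stands in for the page's poisoned state */
	bool in_use;
};

#define TOY_POOL_SIZE 4

static struct toy_page toy_pool[TOY_POOL_SIZE];  /* toy free pool */
static struct toy_page toy_huge_zero;             /* one shared zero page */

/*
 * Fresh allocation: poisoned pages are treated as isolated and are never
 * handed out again, so callers of this path need no poison check.
 */
static struct toy_page *toy_alloc_fresh(void)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		struct toy_page *p = &toy_pool[i];

		if (!p->in_use && !p->hwpoison) {
			p->in_use = true;
			return p;
		}
	}
	return NULL;
}

/*
 * Read fault served by the shared zero page: the same page is returned
 * every time and its poisoned state is never consulted.
 */
static struct toy_page *toy_read_fault_zero(void)
{
	return &toy_huge_zero;
}
```

With toy_pool[0] poisoned, toy_alloc_fresh() skips it and returns toy_pool[1]; once toy_huge_zero is poisoned, toy_read_fault_zero() keeps returning it anyway, which is the case neither proposed fix closes.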