Subject: Re: [RFC PATCH v1 1/2] mm/memory-failure: introduce global MFR policy
To: Jiaqi Yan
References: <20240924043924.3562257-1-jiaqiyan@google.com> <20240924043924.3562257-2-jiaqiyan@google.com>
From: Miaohe Lin
Date: Fri, 11 Oct 2024 15:04:50 +0800

On 2024/10/4 7:51, Jiaqi Yan wrote:
> Hi Jane,
>
> On Wed, Oct 2, 2024 at 4:50 PM wrote:
>>
>> Hi,
>>
>> On 9/23/2024 9:39 PM, Jiaqi Yan wrote:
>>>
>>> +	/*
>>> +	 * On ARM64, if APEI failed to claim SEA, (e.g. GHES driver doesn't
>>> +	 * register to SEA notifications from firmware), memory_failure will
>>> +	 * never be synchronous to the error consumption thread. Notifying
>>> +	 * it via SIGBUS synchronously has to be done by either core kernel in
>>> +	 * do_mem_abort, or KVM in kvm_handle_guest_abort.
>>> +	 */
>>> +	if (!sysctl_enable_hard_offline) {
>>> +		pr_info_once("%#lx: disabled by /proc/sys/vm/enable_hard_offline\n", pfn);
>>> +		kill_procs_now(p, pfn, flags, page_folio(p));
>>> +		res = -EOPNOTSUPP;
>>> +		goto unlock_mutex;
>>> +	}
>>> +
>>
>> I am curious why the SIGBUS is sent without setting PG_hwpoison in the
>> page. In 0/2 there seems to be indication about threads coordinate
>> with each other such that clean subpages in a poisoned hugetlb page
>> continue to be accessible, and at some point, (or perhaps I misread),
>> the poisoned page (sub- or huge-) will eventually be isolated, because,
>
> The code here is "global policy". The "per-VMA policy", proposed in
> 0/2 but code not sent, should be able to support isolation + offline
> at some point (all VMAs are gone and page becomes free).
>
>> it's unthinkable to let a poisoned page laying around and kernel treats
>> it like a clean page ? But I'm not sure how do you plan to handle it
>> without PG_hwpoison while hard_offline is disabled globally.
>
> It will become the responsibility of a control plan running in
> userspace. For example, the control plan immediately prevents starting
> of any new workload/VM, but chooses to wait until memory errors exceed
> a certain threshold, or hold on to the hosts until all workloads/VMs
> are migrated and then repair the machine. Not setting PG_hwpoison is
> indeed a big difference and risk, so it needs to be carefully handled
> by userspace.
>

Could you explain why PG_hwpoison cannot be set in this case? It seems a
control plan running in userspace can still work with PG_hwpoison set.
PG_hwpoison makes sure hwpoisoned pages won't be re-used by the kernel,
while the control plan prevents them from being re-accessed from
userspace. Or am I missing something?

Thanks.
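
For readers following the thread, the two behaviours being compared can be sketched as a small userspace model. This is not kernel code and not taken from the patch: `struct model_page`, `MODEL_PG_HWPOISON`, `model_signal_only()`, `model_signal_and_mark()` and `model_may_reuse()` are all hypothetical names invented for illustration, and the "reuse" check only stands in for whatever the kernel does when it decides whether a page may be handed out again.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_PG_HWPOISON (1u << 0)

struct model_page {
	unsigned long pfn;	/* page frame number of the faulty page */
	unsigned int flags;	/* stand-in for the page flags word */
	int sigbus_sent;	/* how many SIGBUS notifications were "sent" */
};

/*
 * Assumed reading of the quoted hunk: with hard offline disabled,
 * the consumer is notified but the page is not marked poisoned.
 */
static inline void model_signal_only(struct model_page *page)
{
	page->sigbus_sent++;
}

/*
 * Alternative raised in the reply: notify the consumer and also
 * mark the page, without doing anything else to it.
 */
static inline void model_signal_and_mark(struct model_page *page)
{
	page->flags |= MODEL_PG_HWPOISON;
	page->sigbus_sent++;
}

/* Stand-in for a reuse check: a marked page must not be handed out. */
static inline bool model_may_reuse(const struct model_page *page)
{
	return !(page->flags & MODEL_PG_HWPOISON);
}
```

In this model a page handled by `model_signal_only()` still passes `model_may_reuse()`, so keeping it out of circulation is left entirely to userspace, whereas a page handled by `model_signal_and_mark()` fails the check regardless of what userspace does. Whether the real code can set the flag without triggering the rest of the hard-offline path is exactly the open question in the thread.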