Subject: Re: [RFC PATCH] mm: memory-failure: add soft-offline stat in mf_stats
To: "Tomohiro Misono (Fujitsu)", 'Jiaqi Yan'
CC: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", Andrew Morton, Naoya Horiguchi
References: <20241121045504.2233544-1-misono.tomohiro@fujitsu.com> <098640ac-f1c1-95b6-e367-a2673c3ceaae@huawei.com>
From: Miaohe Lin
Date: Fri, 29 Nov 2024 17:07:10 +0800

On 2024/11/29 16:26, Tomohiro Misono (Fujitsu) wrote:
>> On 2024/11/28 13:46, Tomohiro Misono (Fujitsu) wrote:
>>>>>> On 2024/11/21 12:55, Tomohiro Misono wrote:
>>>>>>> commit 44b8f8bf2438 ("mm: memory-failure: add memory failure stats
>>>>>>
>>>>>> Sorry for late, I've been swamped recently.
>>>>>
>>>>> Hi,
>>>>> Thanks for your comments.
>>>>>
>>>>>>
>>>>>>> to sysfs") introduces per NUMA memory error stats which show
>>>>>>> breakdown of HardwareCorrupted of /proc/meminfo in
>>>>>>> /sys/devices/system/node/nodeX/memory_failure.
>>>>>>
>>>>>> Thanks for your patch.
>>>>>>
>>>>>>>
>>>>>>> However, HardwareCorrupted also counts soft-offline pages. So, add
>>>>>>> soft-offline stats in mf_stats too to represent more accurate status.
>>>>>>
>>>>>> Adding soft-offline stats makes sense to me.
>>>>>
>>>>> Thanks for confirming.
>>>>
>>>> Agreed with Miaohe.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> This updates total count as:
>>>>>>>   total = recovered + ignored + failed + delayed + soft_offline
>>>>>>>
>>>>>>> Test example:
>>>>>>> 1) # grep HardwareCorrupted /proc/meminfo
>>>>>>>      HardwareCorrupted: 0 kB
>>>>>>> 2) soft-offline 1 page by madvise(MADV_SOFT_OFFLINE)
>>>>>>> 3) # grep HardwareCorrupted /proc/meminfo
>>>>>>>      HardwareCorrupted: 4 kB
>>>>>>>    # grep -r "" /sys/devices/system/node/node0/memory_failure
>>>>>>>    /sys/devices/system/node/node0/memory_failure/total:1
>>>>>>>    /sys/devices/system/node/node0/memory_failure/soft_offline:1
>>>>>>>    /sys/devices/system/node/node0/memory_failure/recovered:0
>>>>>>>    /sys/devices/system/node/node0/memory_failure/ignored:0
>>>>>>>    /sys/devices/system/node/node0/memory_failure/failed:0
>>>>>>>    /sys/devices/system/node/node0/memory_failure/delayed:0
>>>>>>>
>>>>>>> Signed-off-by: Tomohiro Misono
>>>>>>> ---
>>>>>>> Hello
>>>>>>>
>>>>>>> This is RFC because I'm not sure adding SOFT_OFFLINE in enum
>>>>>>> mf_result is a right approach. Also, maybe is it better to move
>>>>>>> update_per_node_mf_stats() into num_poisoned_pages_inc()?
>>>>>>>
>>>>>>> I omitted some cleanups and sysfs doc update in this version to
>>>>>>> highlight changes. I'd appreciate any suggestions.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Tomohiro Misono
>>>>>>>
>>>>>>>  include/linux/mm.h     | 2 ++
>>>>>>>  include/linux/mmzone.h | 4 +++-
>>>>>>>  mm/memory-failure.c    | 9 +++++++++
>>>>>>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>>> index 5d6cd523c7c0..7f93f6883760 100644
>>>>>>> --- a/include/linux/mm.h
>>>>>>> +++ b/include/linux/mm.h
>>>>>>> @@ -3991,6 +3991,8 @@ enum mf_result {
>>>>>>>  	MF_FAILED,	/* Error: handling failed */
>>>>>>>  	MF_DELAYED,	/* Will be handled later */
>>>>>>>  	MF_RECOVERED,	/* Successfully recovered */
>>>>>>> +
>>>>>>> +	MF_RES_SOFT_OFFLINE, /* Soft-offline */
>>>>>>
>>>>>> It might not be a good idea to add MF_RES_SOFT_OFFLINE here. 'mf_result' is used to record
>>>>>> the result of memory failure handler. So it might be inappropriate to add MF_RES_SOFT_OFFLINE
>>>>>> here.
>>>>>
>>>>> Understood. As I don't see other suitable place to put ENUM value, how about changing like below?
>>>>> Or, do you prefer adding another ENUM type instead of this?
>>>>
>>>> I think SOFT_OFFLINE-ed is one of the results of successfully
>>>> recovered, and the other one is HARD_OFFLINE-ed. So how about make a
>>>> separate sub-ENUM for MF_RECOVERED? Something like:
>>>
>>> Thanks for the suggestion.
>>>
>>>>
>>>> enum mf_recovered_result {
>>>>   MF_RECOVERED_SOFT_OFFLINE,
>>>>   MF_RECOVERED_HARD_OFFLINE,
>>>> };
>>>
>>> Ok.
>>>
>>>>
>>>> And
>>>> 1. total = recovered + ignored + failed + delayed
>>>> 2. recovered = soft_offline + hard_offline
>>>
>>> Do you mean mf_stats now have 7 entries in sysfs?
>>> (total, ignored, failed, delayed, recovered, hard_offline, soft_offline,
>>> then recovered = hard_offline + soft_offline)
>>> Or 6 entries ? (in that case, hard_offline = recovered - soft_offline)
>>> It might be simpler to understand for user if total is just the sum of other entries like this RFC,
>>> but I'd like to know other opinions.
>>
>> Will it be better to have below items?
>> "
>> total
>> ignored
>> failed
>> delayed
>> hard_offline
>> soft_offline
>> "
>>
>> though this will break the previous interface.
>> Any thoughts?
>
> That would be great, but these files are under stable ABI and
> I don't think we can change them, right?
>
> https://docs.kernel.org/admin-guide/abi-stable.html
>   Userspace programs are free to use these interfaces with no restrictions, and backward
>   compatibility for them will be guaranteed for at least 2 years.
>   Most interfaces (like syscalls) are expected to never change and always be available.

Thanks for your information. So we need to propose a better solution. Looking forward to hearing more suggestions.

Thanks.
.

>
> Regards,
> Tomohiro Misono
>
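
For reference while alternatives are discussed, the relation the RFC proposes
(total = recovered + ignored + failed + delayed + soft_offline) could be
checked from userspace with something like the sketch below. It is only an
illustration: check_mf_total is a hypothetical helper name, the directory is
passed as an argument, and a soft_offline file is assumed to exist only on a
kernel with the RFC applied (a missing file is counted as 0).

```shell
#!/bin/sh
# Sketch only: check_mf_total is a hypothetical helper, not part of the patch.
# It sums the per-node counters found in a memory_failure directory and
# compares the sum with the "total" file.
check_mf_total() {
    dir=$1
    sum=0
    for f in recovered ignored failed delayed soft_offline; do
        # Assumption: a missing file (e.g. soft_offline on a kernel without
        # the RFC applied) is counted as 0.
        if [ -r "$dir/$f" ]; then
            v=$(cat "$dir/$f")
        else
            v=0
        fi
        sum=$((sum + v))
    done
    total=$(cat "$dir/total")
    echo "total=$total sum=$sum"
    # Return success only when total matches the sum of the other entries.
    [ "$total" -eq "$sum" ]
}
```

It would be called as
check_mf_total /sys/devices/system/node/node0/memory_failure
and returns non-zero when total does not match the sum of the other entries.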