From: Lance Yang
Date: Fri, 14 Jun 2024 16:35:09 +0800
Subject: Re: [PATCH v2 1/3] mm/memory-failure: userspace controls soft-offlining pages
To: Jiaqi Yan
Cc: nao.horiguchi@gmail.com, linmiaohe@huawei.com, jane.chu@oracle.com,
	muchun.song@linux.dev, akpm@linux-foundation.org, shuah@kernel.org,
	corbet@lwn.net, osalvador@suse.de, rientjes@google.com,
	duenwen@google.com, fvdl@google.com, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org
References: <20240611215544.2105970-1-jiaqiyan@google.com>
	<20240611215544.2105970-2-jiaqiyan@google.com>
In-Reply-To: <20240611215544.2105970-2-jiaqiyan@google.com>

Hi Jiaqi,

On Wed, Jun 12, 2024 at 5:56 AM Jiaqi Yan wrote:
>
> Correctable memory errors are very common on servers with large
> amount of memory, and are corrected by ECC.
> Soft offline is kernel's additional recovery handling for memory pages
> having (excessive) corrected memory errors. Impacted page is migrated
> to a healthy page if inuse; the original page is discarded for any
> future use.
>
> The actual policy on whether (and when) to soft offline should be
> maintained by userspace, especially in case of an 1G HugeTLB page.
> Soft-offline dissolves the HugeTLB page, either in-use or free, into
> chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.
> If userspace has not acknowledged such behavior, it may be surprised
> when later mmap hugepages MAP_FAILED due to lack of hugepages.
> In case of a transparent hugepage, it will be split into 4K pages
> as well; userspace will stop enjoying the transparent performance.
>
> In addition, discarding the entire 1G HugeTLB page only because of
> corrected memory errors sounds very costly and kernel better not
> doing under the hood. But today there are at least 2 such cases:
> 1. GHES driver sees both GHES_SEV_CORRECTED and
>    CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
> 2. RAS Correctable Errors Collector counts correctable errors per
>    PFN and when the counter for a PFN reaches threshold
> In both cases, userspace has no control of the soft offline performed
> by kernel's memory failure recovery.
>
> This commit gives userspace the control of softofflining any page:
> kernel only soft offlines raw page / transparent hugepage / HugeTLB
> hugepage if userspace has agreed to. The interface to userspace is a
> new sysctl called enable_soft_offline under /proc/sys/vm. By default
> enable_soft_line is 1 to preserve existing behavior in kernel.

s/enable_soft_line/enable_soft_offline

>
> Signed-off-by: Jiaqi Yan
> ---
>  mm/memory-failure.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d3c830e817e3..23415fe03318 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -68,6 +68,8 @@ static int sysctl_memory_failure_early_kill __read_mostly;
>
>  static int sysctl_memory_failure_recovery __read_mostly = 1;
>
> +static int sysctl_enable_soft_offline __read_mostly = 1;
> +
>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>
>  static bool hw_memory_failure __read_mostly = false;
> @@ -141,6 +143,15 @@ static struct ctl_table memory_failure_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> +	{
> +		.procname	= "enable_soft_offline",
> +		.data		= &sysctl_enable_soft_offline,
> +		.maxlen		= sizeof(sysctl_enable_soft_offline),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	}
>  };
>
>  /*
> @@ -2771,6 +2782,11 @@ int soft_offline_page(unsigned long pfn, int flags)
>  	bool try_again = true;
>  	struct page *page;
>
> +	if (!sysctl_enable_soft_offline) {
> +		pr_info("soft offline: %#lx: OS-wide disabled\n", pfn);
> +		return -EINVAL;

IMO, "-EPERM" might sound better ;)

Using "-EPERM" indicates that the operation is not permitted due to the
OS-wide configuration.

Thanks,
Lance

> +	}
> +
>  	if (!pfn_valid(pfn)) {
>  		WARN_ON_ONCE(flags & MF_COUNT_INCREASED);
>  		return -ENXIO;
> --
> 2.45.2.505.gda0bf45e8d-goog
>
>
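
To illustrate why the errno choice matters to a caller, here is a rough
userspace sketch. It is not part of the patch. The helper name
soft_offline_strerror() and its message strings are hypothetical, and it
assumes the value returned by soft_offline_page() reaches userspace
unchanged as errno. With -EPERM, "disabled by policy" gets its own case;
with -EINVAL as posted, it cannot be told apart from a malformed request.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: turn the errno of a failed
 * soft-offline request into a short diagnosis.
 */
static const char *soft_offline_strerror(int err)
{
	switch (err) {
	case EPERM:
		/* Suggested in this review: the OS-wide policy forbids it. */
		return "soft offline disabled by vm.enable_soft_offline";
	case EINVAL:
		/* v2 as posted: also returned when the sysctl is 0. */
		return "invalid request, or soft offline disabled (v2)";
	case ENXIO:
		/* Existing behavior in the quoted hunk: pfn_valid() failed. */
		return "invalid pfn";
	default:
		/* Anything else: fall back to the generic libc message. */
		return strerror(err);
	}
}
```

A caller would pass the errno saved right after the failed request, for
example soft_offline_strerror(errno), and log the result.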