Date: Mon, 29 Jul 2024 08:13:33 +0200
From: Michal Hocko
To: "Zhijian Li (Fujitsu)"
Cc: "linux-mm@kvack.org", Andrew Morton, David Hildenbrand, Oscar Salvador, "linux-kernel@vger.kernel.org", "Yasunori Gotou (Fujitsu)"
Subject: Re: [PATCH RFC] mm: Avoid triggering oom-killer during memory hot-remove operations
References: <20240726084456.1309928-1-lizhijian@fujitsu.com> <2ab277af-06ed-41a9-a2b4-91dd1ffce733@fujitsu.com>
In-Reply-To: <2ab277af-06ed-41a9-a2b4-91dd1ffce733@fujitsu.com>

On Mon 29-07-24 02:14:13, Zhijian Li (Fujitsu) wrote:
>
>
> On 29/07/2024 08:37, Li Zhijian wrote:
> > Michal,
> >
> > Sorry to the late reply.
> >
> >
> > On 26/07/2024 17:17, Michal Hocko wrote:
> >> On Fri 26-07-24 16:44:56, Li Zhijian wrote:
> >>> When a process is bound to a node that is being hot-removed, any memory
> >>> allocation attempts from that node should fail gracefully without
> >>> triggering the OOM-killer. However, the current behavior can cause the
> >>> oom-killer to be invoked, leading to the termination of processes on other
> >>> nodes, even when there is sufficient memory available in the system.
> >>
> >> But you said they are bound to the node that is offlined.
> >>> Prevent the oom-killer from being triggered by processes bound to a
> >>> node undergoing hot-remove operations. Instead, the allocation attempts
> >>> from the offlining node will simply fail, allowing the process to handle
> >>> the failure appropriately without causing disruption to the system.
> >>
> >> NAK.
> >>
> >> Also it is not really clear why process of offlining should behave any
> >> different from after the node is offlined. Could you describe an actual
> >> problem you are facing with much more details please?
> >
> > We encountered that some processes(including some system critical services, for example sshd, rsyslogd, login)
> > were killed during our memory hot-remove testing. Our test program are described previous mail[1]
> >
> > In short, we have 3 memory nodes, node0 and node1 are DRAM, while node2 is CXL volatile memory that is onlined
> > to ZONE_MOVABLE. When we attempted to remove the node2, oom-killed was invoked to kill other processes
> > (sshd, rsyslogd, login) even though there is enough memory on node0+node1.

What are the sizes of those nodes, how much memory does the testing
program consume, and do you have an OOM report without the patch applied?
-- 
Michal Hocko
SUSE Labs