Message-ID: <1e3507ab-eee0-4812-9acc-33e3499299a1@suse.cz>
Date: Wed, 14 May 2025 09:15:53 +0200
Subject: Re: [PATCH] mm/page_alloc.c: Avoid infinite retries caused by cpuset race
To: Suren Baghdasaryan, Tianyang Zhang
Cc: Harry Yoo, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Michal Hocko, Brendan Jackman,
 Johannes Weiner, Zi Yan
References: <20250416082405.20988-1-zhangtianyang@loongson.cn>
From: Vlastimil Babka

On 4/23/25 17:35, Suren Baghdasaryan wrote:
>> >> There's a new 'MEMORY MANAGEMENT - PAGE ALLOCATOR' entry (only in
>> >> Andrew's mm.git repository now).
>> >>
>> >> Let's Cc the page allocator folks here!
>> >>
>> >> --
>> >> Cheers,
>> >> Harry / Hyeonggon
>> >>
>> >>> mm/page_alloc.c | 8 ++++++++
>> >>> 1 file changed, 8 insertions(+)
>> >>>
>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> >>> index fd6b865cb1ab..1e82f5214a42 100644
>> >>> --- a/mm/page_alloc.c
>> >>> +++ b/mm/page_alloc.c
>> >>> @@ -4530,6 +4530,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>> >>>  	}
>> >>>
>> >>>  retry:
>> >>> +	/*
>> >>> +	 * Deal with possible cpuset update races or zonelist updates to avoid
>> >>> +	 * infinite retries.
>> >>> +	 */
>> >>> +	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
>> >>> +	    check_retry_zonelist(zonelist_iter_cookie))
>> >>> +		goto restart;
>> >>> +
>> > We have this check later in this block:
>> > https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4652,
>> > so IIUC you effectively are moving it to be called before
>> > should_reclaim_retry(). If so, I think you should remove the old one
>> > (the one I linked earlier) as it seems to be unnecessary duplication
>> > at this point.
>> In my understanding, the code in
>>
>> https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4652
>>
>> was introduced to prevent unnecessary OOM (Out-of-Memory) conditions
>> in __alloc_pages_may_oom.
>>
>> If the old code is removed, the newly added code (on retry loop entry)
>> cannot guarantee that the cpuset remains valid when the flow reaches
>> __alloc_pages_may_oom, especially if scheduling occurs during this
>> section.
>
> Well, rescheduling can happen even between
> https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4652
> and https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4657
> but I see your point. Also should_reclaim_retry() does not include

I think the rescheduling isn't a problem because what we're testing is
"we are about to oom, could it have been because we raced?" and the race
would have affected the code before #L4652. If we didn't race and yet
determined it's time for oom, a race between #L4652 and #L4657 shouldn't
matter. The get_page_from_freelist() in __alloc_pages_may_oom() isn't
that important for preventing premature oom AFAICS, given it uses the
high wmark.

That said, I think the newly added check could be more logically placed
above the call to should_reclaim_retry() instead of right after the
retry: label, but it's not critical.

> zonelist change detection, so keeping the checks at
> https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4652
> sounds like a good idea.
>
>>
>> Therefore, I think retaining the original code logic is necessary to
>> ensure correctness under concurrency.
>>
>> >
>> >
>> >>>  	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
>> >>>  	if (alloc_flags & ALLOC_KSWAPD)
>> >>>  		wake_all_kswapds(order, gfp_mask, ac);
>> >>> --
>> >>> 2.20.1
>> >>>
>> >>>
>> Thanks
>>
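
For illustration only, here is a toy userspace model of the two check
placements discussed in this thread. It is a rough sketch, not kernel
code: struct toy_ctx, toy_check_retry(), toy_try_alloc() and
toy_slowpath() are all made-up names, the "race" is simulated by bumping
a counter at a chosen pass, and max_passes is just the model's stand-in
for "would loop forever". The one assumption baked in is that an
allocation attempt made with a stale snapshot always fails while the
modelled should_reclaim_retry() keeps saying "retry".

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one modelled slowpath run. */
enum outcome { ALLOC_OK, ALLOC_OOM, ALLOC_STUCK };

/* Toy stand-in for the allocation context; not a kernel structure. */
struct toy_ctx {
	unsigned int live_seq;	/* bumped by a simulated concurrent cpuset update */
	unsigned int cookie;	/* snapshot of live_seq taken at "restart" */
	bool reclaimable;	/* what the modelled should_reclaim_retry() reports */
	bool have_memory;	/* whether an up-to-date attempt can succeed at all */
};

/* True when the snapshot is stale, i.e. we raced with an update. */
static bool toy_check_retry(const struct toy_ctx *c)
{
	return c->cookie != c->live_seq;
}

/* The modelled allocation only succeeds with a current snapshot. */
static bool toy_try_alloc(const struct toy_ctx *c)
{
	return c->have_memory && c->cookie == c->live_seq;
}

/*
 * check_before_reclaim_retry selects whether the staleness check is also
 * done above the modelled should_reclaim_retry() decision. race_at_pass
 * is the pass on which the simulated update happens (0 means never).
 */
enum outcome toy_slowpath(struct toy_ctx *c, bool check_before_reclaim_retry,
			  unsigned int race_at_pass, unsigned int max_passes)
{
	unsigned int pass = 0;

	assert(max_passes > 0);
restart:
	c->cookie = c->live_seq;	/* take a fresh snapshot */
retry:
	if (++pass > max_passes)
		return ALLOC_STUCK;	/* model's stand-in for an endless loop */
	if (pass == race_at_pass)
		c->live_seq++;		/* simulated update invalidates the snapshot */
	if (toy_try_alloc(c))
		return ALLOC_OK;
	if (check_before_reclaim_retry && toy_check_retry(c))
		goto restart;
	if (c->reclaimable)
		goto retry;
	/* About to oom: could it have been because we raced? */
	if (toy_check_retry(c))
		goto restart;
	return ALLOC_OOM;
}
```

In this model, a race with reclaimable set gets stuck unless the check
sits above the retry decision, while the pre-oom check alone still
catches a race when reclaimable is false. That matches the reasoning
for having the early check and keeping the existing one, within the
limits of the assumptions stated above.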