From: Suren Baghdasaryan
Date: Wed, 23 Apr 2025 08:35:11 -0700
Subject: Re: [PATCH] mm/page_alloc.c: Avoid infinite retries caused by cpuset race
To: Tianyang Zhang
Cc: Harry Yoo, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan
References: <20250416082405.20988-1-zhangtianyang@loongson.cn>

On Tue, Apr 22, 2025 at 7:39 PM Tianyang Zhang wrote:
>
> Hi, Suren
>
> On 2025/4/22 at 4:28 AM, Suren Baghdasaryan wrote:
> > On Mon, Apr 21, 2025 at 3:00 AM Harry Yoo wrote:
> >> On Wed, Apr 16, 2025 at 04:24:05PM +0800, Tianyang Zhang wrote:
> >>> __alloc_pages_slowpath has no change detection for ac->nodemask
> >>> in the part of retry path, while cpuset can modify it in parallel.
> >>> For some processes that set mempolicy as MPOL_BIND, this results
> >>> in ac->nodemask changing, and then should_reclaim_retry will
> >>> judge based on the latest nodemask and jump to retry, while
> >>> get_page_from_freelist only traverses the zonelist from
> >>> ac->preferred_zoneref, which was selected by an expired nodemask,
> >>> and may cause infinite retries in some cases
> >>>
> >>> cpu 64:
> >>> __alloc_pages_slowpath {
> >>>     /* ..... */
> >>> retry:
> >>>     /* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */
> >>>     if (alloc_flags & ALLOC_KSWAPD)
> >>>         wake_all_kswapds(order, gfp_mask, ac);
> >>>     /* cpu 1:
> >>>     cpuset_write_resmask
> >>>       update_nodemask
> >>>         update_nodemasks_hier
> >>>           update_tasks_nodemask
> >>>             mpol_rebind_task
> >>>               mpol_rebind_policy
> >>>                 mpol_rebind_nodemask
> >>>     // mempolicy->nodes has been modified,
> >>>     // which ac->nodemask points to
> >>>
> >>>     */
> >>>     /* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */
> >>>     if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> >>>                              did_some_progress > 0, &no_progress_loops))
> >>>         goto retry;
> >>> }
> >>>
> >>> Simultaneously starting multiple cpuset01 from LTP can quickly
> >>> reproduce this issue on a multi node server when the maximum
> >>> memory pressure is reached and the swap is enabled
> >>>
> >>> Signed-off-by: Tianyang Zhang
> >>> ---
> >> What commit does it fix and should it be backported to -stable?
> > I think it fixes 902b62810a57 ("mm, page_alloc: fix more premature OOM
> > due to race with cpuset update").
>
> I think this issue is unlikely to have been introduced by patch
> 902b62810a57, as the infinite-retries section from
>
> https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4568
> to
> https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4628
>
> where the cpuset race condition occurs remains unmodified in the logic
> of patch 902b62810a57.

Yeah, you are right.
After looking into it some more, 902b62810a57 is the wrong patch
to blame for this infinite loop.

>
> >> There's a new 'MEMORY MANAGEMENT - PAGE ALLOCATOR' entry (only in
> >> Andrew's mm.git repository now).
> >>
> >> Let's Cc the page allocator folks here!
> >>
> >> --
> >> Cheers,
> >> Harry / Hyeonggon
> >>
> >>> mm/page_alloc.c | 8 ++++++++
> >>> 1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index fd6b865cb1ab..1e82f5214a42 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -4530,6 +4530,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >>>  	}
> >>>
> >>>  retry:
> >>> +	/*
> >>> +	 * Deal with possible cpuset update races or zonelist updates to avoid
> >>> +	 * infinite retries.
> >>> +	 */
> >>> +	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
> >>> +	    check_retry_zonelist(zonelist_iter_cookie))
> >>> +		goto restart;
> >>> +
> > We have this check later in this block:
> > https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4652,
> > so IIUC you effectively are moving it to be called before
> > should_reclaim_retry(). If so, I think you should remove the old one
> > (the one I linked earlier) as it seems to be unnecessary duplication
> > at this point.
> In my understanding, the code in
>
> https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4652
>
> was introduced to prevent unnecessary OOM (Out-of-Memory) conditions
> in __alloc_pages_may_oom.
>
> If the old code is removed, the newly added code (on retry loop entry)
> cannot guarantee that the cpuset remains valid when the flow reaches
> __alloc_pages_may_oom, especially if scheduling occurs during this
> section.

Well, rescheduling can happen even between
https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4652
and
https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4657
but I see your point.
Also should_reclaim_retry() does not include zonelist change detection,
so keeping the checks at
https://elixir.bootlin.com/linux/v6.15-rc3/source/mm/page_alloc.c#L4652
sounds like a good idea.

>
> Therefore, I think retaining the original code logic is necessary to
> ensure correctness under concurrency.
>
> >
> >
> >>>  	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
> >>>  	if (alloc_flags & ALLOC_KSWAPD)
> >>>  		wake_all_kswapds(order, gfp_mask, ac);
> >>> --
> >>> 2.20.1
> >>>
> >>>
> Thanks
>
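
To make the ordering discussed above concrete, here is a minimal userspace sketch of the retry loop. It is not kernel code and only models the idea: struct alloc_ctx, node_free, mems_seq, cpuset_update() and slowpath() are made-up stand-ins, the "freelist walk" looks only at the preferred node (a simplification of starting at ac->preferred_zoneref), and the concurrent cpuset write is simulated inline on the first pass.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the allocation context; not the kernel's struct. */
struct alloc_ctx {
	unsigned long nodemask;	/* bit n set => node n is allowed */
	int preferred_nid;	/* chosen from nodemask at "restart" */
};

static int node_free[2] = { 4, 0 };	/* node 0 has pages, node 1 has none */
static unsigned int mems_seq;		/* bumped on each simulated cpuset update */

static unsigned int read_cookie(void) { return mems_seq; }
static bool cookie_stale(unsigned int cookie) { return cookie != mems_seq; }

/* Models the nodemask being rewritten underneath the allocator. */
static void cpuset_update(struct alloc_ctx *ac, unsigned long new_mask)
{
	ac->nodemask = new_mask;
	mems_seq++;
}

static int pick_preferred(unsigned long mask)
{
	for (int nid = 0; nid < 2; nid++)
		if (mask & (1UL << nid))
			return nid;
	return -1;
}

/* Simplified freelist walk: only the (possibly stale) preferred node. */
static bool try_alloc(const struct alloc_ctx *ac)
{
	return node_free[ac->preferred_nid] > 0;
}

/* Judges by the *current* nodemask, like the retry heuristic in the thread. */
static bool should_retry(const struct alloc_ctx *ac)
{
	for (int nid = 0; nid < 2; nid++)
		if ((ac->nodemask & (1UL << nid)) && node_free[nid] > 0)
			return true;
	return false;
}

/*
 * Returns the number of loop passes on success, 0 if the loop gives up,
 * or -1 if max_loops is exceeded (standing in for "retries forever").
 */
static int slowpath(bool check_at_retry, int max_loops)
{
	struct alloc_ctx ac = { .nodemask = 0x2 };	/* only node 1 allowed */
	bool raced = false;
	int loops = 0;
	unsigned int cookie;

restart:
	cookie = read_cookie();
	ac.preferred_nid = pick_preferred(ac.nodemask);
retry:
	/* The proposed early check: notice the update before looping again. */
	if (check_at_retry && cookie_stale(cookie))
		goto restart;
	if (++loops > max_loops)
		return -1;
	if (try_alloc(&ac))
		return loops;
	if (!raced) {
		raced = true;
		cpuset_update(&ac, 0x3);	/* simulated concurrent cpuset write */
	}
	if (should_retry(&ac))
		goto retry;
	return 0;
}
```

In this model, slowpath(false, 1000) hits the cap because the stale preferred node never has pages while the retry heuristic keeps seeing node 0, whereas slowpath(true, 1000) restarts once and succeeds on the second pass. It says nothing about the later check before the OOM path, which is a separate concern.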