Date: Wed, 27 Nov 2024 16:49:48 -0800
From: Andrew Morton
To: Seiji Nishikawa
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mel Gorman
Subject: Re: [PATCH] mm: vmscan: ensure kswapd is woken up if the wait queue is active
Message-Id: <20241127164948.74659f9400fd076760c2a670@linux-foundation.org>
In-Reply-To: <20241126150612.114561-1-snishika@redhat.com>
References: <20241126150612.114561-1-snishika@redhat.com>

On Wed, 27 Nov 2024 00:06:12 +0900 Seiji Nishikawa wrote:

> Even after commit 501b26510ae3 ("vmstat: allow_direct_reclaim should use
> zone_page_state_snapshot"), a task may remain indefinitely stuck in
> throttle_direct_reclaim() while holding mm->rwsem.
>
> __alloc_pages_nodemask
> try_to_free_pages
> throttle_direct_reclaim
>
> This can cause numerous other tasks to wait on the same rwsem, leading
> to severe system hangups:
>
> [1088963.358712] INFO: task python3:1670971 blocked for more than 120 seconds.
> [1088963.365653] Tainted: G OE -------- - - 4.18.0-553.el8_10.aarch64 #1
> [1088963.373887] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [1088963.381862] task:python3 state:D stack:0 pid:1670971 ppid:1667117 flags:0x00800080
> [1088963.381869] Call trace:
> [1088963.381872] __switch_to+0xd0/0x120
> [1088963.381877] __schedule+0x340/0xac8
> [1088963.381881] schedule+0x68/0x118
> [1088963.381886] rwsem_down_read_slowpath+0x2d4/0x4b8
>
> The issue arises when allow_direct_reclaim(pgdat) returns false,
> preventing progress even when the pgdat->pfmemalloc_wait wait queue is
> empty. Despite the wait queue being empty, the condition,
> allow_direct_reclaim(pgdat), may still be returning false, causing it to
> continue looping.
>
> In some cases, reclaimable pages exist (zone_reclaimable_pages() returns
> > 0), but calculations of pfmemalloc_reserve and free_pages result in
> wmark_ok being false.
>
> And then, despite the pgdat->kswapd_wait queue being non-empty, kswapd
> is not woken up, further exacerbating the problem:
>
> crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_highest_zoneidx
> $775 = __MAX_NR_ZONES
>
> This patch modifies allow_direct_reclaim() to wake kswapd if the
> pgdat->kswapd_wait queue is active, regardless of whether wmark_ok is
> true or false. This change ensures kswapd does not miss wake-ups under
> high memory pressure, reducing the risk of task stalls in the throttled
> reclaim path.

The code which is being altered is over 10 years old.

Is this misbehavior more recent? If so, are we able to identify which
commit caused this?

Otherwise, can you suggest why it took so long for this to be discovered?
Your test case must be doing something unusual?

Thanks.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6389,8 +6389,8 @@ static bool allow_direct_reclaim(pg_data_t *pgdat)
>  
>  	wmark_ok = free_pages > pfmemalloc_reserve / 2;
>  
> -	/* kswapd must be awake if processes are being throttled */
> -	if (!wmark_ok && waitqueue_active(&pgdat->kswapd_wait)) {
> +	/* Always wake up kswapd if the wait queue is not empty */
> +	if (waitqueue_active(&pgdat->kswapd_wait)) {
>  		if (READ_ONCE(pgdat->kswapd_highest_zoneidx) > ZONE_NORMAL)
>  			WRITE_ONCE(pgdat->kswapd_highest_zoneidx, ZONE_NORMAL);
>  
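
For anyone reading the hunk in isolation, here is a minimal userspace
sketch of the condition before and after the change. It is not kernel
code and is only meant as an illustration: struct model_pgdat, the
MODEL_ZONE_* values, the kswapd_waiting flag (standing in for
waitqueue_active(&pgdat->kswapd_wait)) and the wakeups counter are all
hypothetical names. The wake-up call itself lies outside the quoted
hunk, so it is modelled here as a counter increment, which is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's zone indices. */
enum model_zone {
	MODEL_ZONE_DMA,
	MODEL_ZONE_NORMAL,
	MODEL_ZONE_MOVABLE,
	MODEL_MAX_NR_ZONES
};

/* Hypothetical, heavily reduced model of the node state used in the hunk. */
struct model_pgdat {
	unsigned long free_pages;
	unsigned long pfmemalloc_reserve;
	bool kswapd_waiting;	/* models waitqueue_active(&pgdat->kswapd_wait) */
	enum model_zone kswapd_highest_zoneidx;
	int wakeups;		/* assumption: counts modelled kswapd wake-ups */
};

/*
 * Returns wmark_ok, as the hunk computes it. With patched == false the
 * wake-up condition is the removed line; with patched == true it is the
 * added line.
 */
static bool model_allow_direct_reclaim(struct model_pgdat *p, bool patched)
{
	assert(p != NULL);

	bool wmark_ok = p->free_pages > p->pfmemalloc_reserve / 2;
	bool wake = patched ? p->kswapd_waiting
			    : (!wmark_ok && p->kswapd_waiting);

	if (wake) {
		/* Clamp the requested zone index, as in the quoted context. */
		if (p->kswapd_highest_zoneidx > MODEL_ZONE_NORMAL)
			p->kswapd_highest_zoneidx = MODEL_ZONE_NORMAL;
		p->wakeups++;
	}
	return wmark_ok;
}
```

Calling model_allow_direct_reclaim() with kswapd_waiting set and
free_pages at or below half the reserve wakes kswapd in both modes.
In this model the two modes differ only when wmark_ok is true.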