From: Shakeel Butt
To: Joanne Koong
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: Re: [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block
Date: Fri, 8 Nov 2024 16:16:22 -0800
In-Reply-To: <20241107235614.3637221-3-joannelkoong@gmail.com>
References: <20241107235614.3637221-1-joannelkoong@gmail.com> <20241107235614.3637221-3-joannelkoong@gmail.com>

On Thu, Nov 07, 2024 at 03:56:10PM -0800, Joanne Koong wrote:
> Currently in shrink_folio_list(), reclaim for folios under writeback
> falls into 3 different cases:
> 1) Reclaim is encountering an excessive number of folios under
>    writeback and this folio has both the writeback and reclaim flags
>    set
> 2) Dirty throttling is enabled (this happens if reclaim through cgroup
>    is not enabled, if reclaim through cgroupv2 memcg is enabled, or
>    if reclaim is on the root cgroup), or if the folio is not marked for
>    immediate reclaim, or if the caller does not have __GFP_FS (or
>    __GFP_IO if it's going to swap) set
> 3) Legacy cgroupv1 encounters a folio that already has the reclaim flag
>    set and the caller has __GFP_FS (or __GFP_IO if swap) set
>
> In cases 1) and 2), we activate the folio and skip reclaiming it, while
> in case 3), we wait for writeback to finish on the folio and then try
> to reclaim the folio again. In case 3, we wait on writeback because
> cgroupv1 does not have dirty folio throttling; as such, this is a
> mitigation against the case where there are too many folios in writeback
> with nothing else to reclaim.
>
> The issue is that for filesystems where writeback may block, suboptimal
> workarounds may need to be put in place to avoid a potential deadlock
> that may arise from reclaim waiting on writeback. (Even though case 3
> above is rare given that legacy cgroupv1 is on its way to being
> deprecated, this case still needs to be accounted for.)
> For example, for FUSE filesystems, a temp page gets allocated per dirty
> page and the contents of the dirty page are copied over to the temp page
> so that writeback can be immediately cleared on the dirty page in order
> to avoid the following deadlock:
> * single-threaded FUSE server is in the middle of handling a request that
>   needs a memory allocation
> * memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback (i.e. falls into case 3
>   above) that needs to be written back to the FUSE server
> * the FUSE server can't write back the folio since it's stuck in direct
>   reclaim
>
> In this commit, if legacy memcg encounters a folio with the reclaim flag
> set (i.e. case 3) and the folio belongs to a mapping that has the
> AS_WRITEBACK_MAY_BLOCK flag set, the folio will be activated and will
> skip reclaim (i.e. default to the behavior in case 2) instead.
>
> This allows the suboptimal workarounds added to address the
> "reclaim wait on writeback" deadlock scenario to be removed.
>
> Signed-off-by: Joanne Koong

This looks good, just one nit below.

Reviewed-by: Shakeel Butt

> ---
>  mm/vmscan.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 749cdc110c74..e9755cb7211b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1110,6 +1110,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  		if (writeback && folio_test_reclaim(folio))
>  			stat->nr_congested += nr_pages;
>  
> +		mapping = folio_mapping(folio);

Move the above line inside the folio_test_writeback() check block.

> +
>  		/*
>  		 * If a folio at the tail of the LRU is under writeback, there
>  		 * are three cases to consider.
> @@ -1129,8 +1131,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  		 * 2) Global or new memcg reclaim encounters a folio that is
>  		 *    not marked for immediate reclaim, or the caller does not
>  		 *    have __GFP_FS (or __GFP_IO if it's simply going to swap,
> -		 *    not to fs). In this case mark the folio for immediate
> -		 *    reclaim and continue scanning.
> +		 *    not to fs), or writebacks in the mapping may block.
> +		 *    In this case mark the folio for immediate reclaim and
> +		 *    continue scanning.
>  		 *
>  		 *    Require may_enter_fs() because we would wait on fs, which
>  		 *    may not have submitted I/O yet. And the loop driver might
> @@ -1165,7 +1168,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  			/* Case 2 above */
>  			} else if (writeback_throttling_sane(sc) ||
>  			    !folio_test_reclaim(folio) ||
> -			    !may_enter_fs(folio, sc->gfp_mask)) {
> +			    !may_enter_fs(folio, sc->gfp_mask) ||
> +			    (mapping && mapping_writeback_may_block(mapping))) {
>  				/*
>  				 * This is slightly racy -
>  				 * folio_end_writeback() might have
> -- 
> 2.43.5
>
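For anyone following the case analysis, the decision the patched code makes for a folio that is already under writeback can be modeled on its own. This is only a simplified userspace sketch under stated assumptions, not kernel code: struct wb_ctx, enum wb_action and classify() are hypothetical names, and each boolean field stands in for the kernel predicate named in its comment (mapping_writeback_may_block() and AS_WRITEBACK_MAY_BLOCK come from this series). The model assumes folio_test_writeback() is already true, which is also the only place the mapping needs to be looked up, in line with the nit above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified inputs to the writeback branch of
 * shrink_folio_list(). Each field models the kernel predicate in its
 * comment; the folio is assumed to be under writeback already.
 */
struct wb_ctx {
	bool is_kswapd;         /* current_is_kswapd() */
	bool pgdat_writeback;   /* PGDAT_WRITEBACK set on the node */
	bool folio_reclaim;     /* folio_test_reclaim(folio) */
	bool throttling_sane;   /* writeback_throttling_sane(sc) */
	bool may_enter_fs;      /* may_enter_fs(folio, sc->gfp_mask) */
	bool has_mapping;       /* folio_mapping(folio) != NULL */
	bool mapping_may_block; /* mapping_writeback_may_block(mapping) */
};

enum wb_action {
	WB_CASE1_ACTIVATE, /* activate the folio and skip it */
	WB_CASE2_ACTIVATE, /* mark for immediate reclaim, activate, skip */
	WB_CASE3_WAIT,     /* wait for writeback, then retry the folio */
};

/* Models the if / else if / else chain as it reads after this patch. */
static enum wb_action classify(const struct wb_ctx *c)
{
	assert(c != NULL);

	/* Case 1: kswapd sees too many folios under writeback. */
	if (c->is_kswapd && c->folio_reclaim && c->pgdat_writeback)
		return WB_CASE1_ACTIVATE;

	/* Case 2, now also taken when writeback on the mapping may block. */
	if (c->throttling_sane || !c->folio_reclaim || !c->may_enter_fs ||
	    (c->has_mapping && c->mapping_may_block))
		return WB_CASE2_ACTIVATE;

	/* Case 3: legacy memcg reclaim waits on writeback. */
	return WB_CASE3_WAIT;
}
```

With only folio_reclaim and may_enter_fs set (legacy cgroupv1 reclaim by a caller with __GFP_FS), classify() returns WB_CASE3_WAIT; additionally setting has_mapping and mapping_may_block turns that into WB_CASE2_ACTIVATE, which is how the patch keeps a FUSE server's direct reclaim from waiting on its own writeback.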