From: Wei Xu
Date: Mon, 16 Sep 2024 09:11:08 -0700
Subject: Re: [PATCH V2] mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
To: Jingxiang Zeng
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, kasong@tencent.com, linuszeng@tencent.com, linux-kernel@vger.kernel.org, tjmercier@google.com, yuzhao@google.com
In-Reply-To: <20240913084506.3606292-1-jingxiangzeng.cas@gmail.com>

On Fri, Sep 13, 2024 at 1:45 AM Jingxiang Zeng wrote:
>
> From: Zeng Jingxiang
>
> Commit 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle")
> removed the opportunity to wake up flushers during the MGLRU page
> reclamation process can lead to an increased likelihood of triggering OOM
> when encountering many dirty pages during reclamation on MGLRU.
>
> This leads to premature OOM if there are too many dirty pages in cgroup:
> Killed
>
> dd invoked oom-killer: gfp_mask=0x101cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE),
> order=0, oom_score_adj=0
>
> Call Trace:
>
> dump_stack_lvl+0x5f/0x80
> dump_stack+0x14/0x20
> dump_header+0x46/0x1b0
> oom_kill_process+0x104/0x220
> out_of_memory+0x112/0x5a0
> mem_cgroup_out_of_memory+0x13b/0x150
> try_charge_memcg+0x44f/0x5c0
> charge_memcg+0x34/0x50
> __mem_cgroup_charge+0x31/0x90
> filemap_add_folio+0x4b/0xf0
> __filemap_get_folio+0x1a4/0x5b0
> ? srso_return_thunk+0x5/0x5f
> ? __block_commit_write+0x82/0xb0
> ext4_da_write_begin+0xe5/0x270
> generic_perform_write+0x134/0x2b0
> ext4_buffered_write_iter+0x57/0xd0
> ext4_file_write_iter+0x76/0x7d0
> ? selinux_file_permission+0x119/0x150
> ? srso_return_thunk+0x5/0x5f
> ? srso_return_thunk+0x5/0x5f
> vfs_write+0x30c/0x440
> ksys_write+0x65/0xe0
> __x64_sys_write+0x1e/0x30
> x64_sys_call+0x11c2/0x1d50
> do_syscall_64+0x47/0x110
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> memory: usage 308224kB, limit 308224kB, failcnt 2589
> swap: usage 0kB, limit 9007199254740988kB, failcnt 0
>
> ...
> file_dirty 303247360
> file_writeback 0
> ...
>
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=test,
> mems_allowed=0,oom_memcg=/test,task_memcg=/test,task=dd,pid=4404,uid=0
> Memory cgroup out of memory: Killed process 4404 (dd) total-vm:10512kB,
> anon-rss:1152kB, file-rss:1824kB, shmem-rss:0kB, UID:0 pgtables:76kB
> oom_score_adj:0
>
> The flusher wake up was removed to decrease SSD wearing, but if we are
> seeing all dirty folios at the tail of an LRU, not waking up the flusher
> could lead to thrashing easily. So wake it up when a mem cgroups is
> about to OOM due to dirty caches.
>
> Link: https://lkml.kernel.org/r/20240829102543.189453-1-jingxiangzeng.cas@gmail.com
> Fixes: 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle")
> Signed-off-by: Zeng Jingxiang
> Signed-off-by: Kairui Song
> Cc: T.J. Mercier
> Cc: Wei Xu
> Cc: Yu Zhao
> Signed-off-by: Andrew Morton
> ---
>  mm/vmscan.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 749cdc110c74..ce471d686a88 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4284,6 +4284,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
>                         int tier_idx)
>  {
>         bool success;
> +       bool dirty, writeback;
>         int gen = folio_lru_gen(folio);
>         int type = folio_is_file_lru(folio);
>         int zone = folio_zonenum(folio);
> @@ -4332,6 +4333,9 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
>         /* waiting for writeback */
>         if (folio_test_locked(folio) || folio_test_writeback(folio) ||
>             (type == LRU_GEN_FILE && folio_test_dirty(folio))) {
> +               folio_check_dirty_writeback(folio, &dirty, &writeback);

We cannot simply call folio_check_dirty_writeback() here, because it
assumes that the folio is locked (e.g. see
buffer_check_dirty_writeback()).

> +               if (dirty && !writeback)
> +                       sc->nr.unqueued_dirty += delta;
>                 gen = folio_inc_gen(lruvec, folio, true);
>                 list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
>                 return true;
> @@ -4448,6 +4452,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>                                 scanned, skipped, isolated,
>                                 type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>
> +       sc->nr.taken += isolated;
>         /*
>          * There might not be eligible folios due to reclaim_idx. Check the
>          * remaining to prevent livelock if it's not making progress.
> @@ -4920,6 +4925,14 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
>         if (try_to_shrink_lruvec(lruvec, sc))
>                 lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
>
> +       /*
> +        * If too many pages failed to evict due to page being dirty,
> +        * memory pressure have pushed dirty pages to oldest gen,
> +        * wake up flusher.
> +        */
> +       if (sc->nr.unqueued_dirty > sc->nr.taken)
> +               wakeup_flusher_threads(WB_REASON_VMSCAN);
> +

The wakeups will be much more frequent than intended, because
sc->nr.unqueued_dirty now includes the dirty file pages, but
sc->nr.taken doesn't.

>         clear_mm_walk();
>
>         blk_finish_plug(&plug);
> --
> 2.43.5
>
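
To illustrate the counter mismatch described above, here is a minimal
user-space sketch (not kernel code). struct folio_model, struct nr_model,
scan_model(), scan_mixed_batch() and would_wake_flusher() are hypothetical
stand-ins that only mirror the accounting in the quoted hunks: folios
caught by the "waiting for writeback" branch are added to unqueued_dirty
and never isolated, while taken counts only isolated folios. Under those
assumptions the two counters cover disjoint sets of folios, so the check
fires as soon as dirty file pages outnumber the isolated ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BATCH 64

/* Hypothetical stand-in for a folio; not the kernel's struct folio. */
struct folio_model {
	bool file;
	bool dirty;
	bool writeback;
	bool locked;
	int nr_pages;
};

/* Hypothetical stand-in for the two sc->nr counters used by the patch. */
struct nr_model {
	int unqueued_dirty;
	int taken;
};

/*
 * Mirror the accounting in the quoted hunks: a folio that is locked,
 * under writeback, or a dirty file folio is sorted to a younger
 * generation and never isolated; every other folio is isolated.
 */
static struct nr_model scan_model(const struct folio_model *f, size_t n)
{
	struct nr_model nr = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		assert(f[i].nr_pages > 0);
		if (f[i].locked || f[i].writeback ||
		    (f[i].file && f[i].dirty)) {
			if (f[i].dirty && !f[i].writeback)
				nr.unqueued_dirty += f[i].nr_pages;
			continue;	/* sorted, not isolated */
		}
		nr.taken += f[i].nr_pages;	/* isolated */
	}
	return nr;
}

/* Build a batch of single-page file folios and run the model over it. */
static struct nr_model scan_mixed_batch(int nr_dirty, int nr_clean)
{
	struct folio_model batch[MAX_BATCH];
	size_t n = 0;

	assert(nr_dirty >= 0 && nr_clean >= 0);
	assert(nr_dirty + nr_clean <= MAX_BATCH);

	for (int i = 0; i < nr_dirty; i++)
		batch[n++] = (struct folio_model){
			.file = true, .dirty = true, .nr_pages = 1 };
	for (int i = 0; i < nr_clean; i++)
		batch[n++] = (struct folio_model){
			.file = true, .nr_pages = 1 };

	return scan_model(batch, n);
}

/* The wakeup condition as written in the patch. */
static bool would_wake_flusher(struct nr_model nr)
{
	return nr.unqueued_dirty > nr.taken;
}
```

In this model, a batch of 6 dirty and 5 clean file folios makes
would_wake_flusher() return true even though 5 folios were isolated for
reclaim, whereas 5 dirty and 6 clean does not.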