From: Chris Li <chrisl@kernel.org>
Date: Tue, 24 Sep 2024 19:07:30 -0700
Subject: Re: [PATCH V3] mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
To: Jingxiang Zeng
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, kasong@tencent.com, linuszeng@tencent.com, linux-kernel@vger.kernel.org, tjmercier@google.com, weixugc@google.com, yuzhao@google.com
In-Reply-To: <20240924121358.30685-1-jingxiangzeng.cas@gmail.com>
References: <20240924121358.30685-1-jingxiangzeng.cas@gmail.com>
Hi Jingxiang,

I just tested your v3 patch, applied on top of
d675c821b65f0c496df1d33150619b7635827e89 ("mm-memcontrol-add-per-memcg-pgpgin-pswpin-counter-v2"),
with 684826f8271ad97580b138b9ffd462005e470b99 ("zram: free secondary
algorithms names") reverted.

Without your v3 patch it can pass the swap stress test in less than 5 minutes.
With your v3 patch it has been running for over 30 minutes and still can't
complete. It does not produce a kernel panic, it is just extremely slow in
the linking phase. Here is what top shows:

  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
33895 ...  20  0  8872 1780 1780 R 99.3  0.0 33:18.70 as
34115 ...  20  0 10568 4692 2964 R  1.0  0.0  0:00.97 top

So v3 also has a regression on my swap stress test.

Chris

On Tue, Sep 24, 2024 at 5:14 AM Jingxiang Zeng wrote:
>
> From: Zeng Jingxiang
>
> Commit 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle")
> removed the opportunity to wake up flushers during the MGLRU page
> reclamation process, which can lead to an increased likelihood of
> triggering OOM when many dirty pages are encountered during reclamation
> on MGLRU.
>
> This leads to premature OOM if there are too many dirty pages in the
> cgroup:
> Killed
>
> dd invoked oom-killer: gfp_mask=0x101cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE),
> order=0, oom_score_adj=0
>
> Call Trace:
>  dump_stack_lvl+0x5f/0x80
>  dump_stack+0x14/0x20
>  dump_header+0x46/0x1b0
>  oom_kill_process+0x104/0x220
>  out_of_memory+0x112/0x5a0
>  mem_cgroup_out_of_memory+0x13b/0x150
>  try_charge_memcg+0x44f/0x5c0
>  charge_memcg+0x34/0x50
>  __mem_cgroup_charge+0x31/0x90
>  filemap_add_folio+0x4b/0xf0
>  __filemap_get_folio+0x1a4/0x5b0
>  ? srso_return_thunk+0x5/0x5f
>  ? __block_commit_write+0x82/0xb0
>  ext4_da_write_begin+0xe5/0x270
>  generic_perform_write+0x134/0x2b0
>  ext4_buffered_write_iter+0x57/0xd0
>  ext4_file_write_iter+0x76/0x7d0
>  ? selinux_file_permission+0x119/0x150
>  ? srso_return_thunk+0x5/0x5f
>  ? srso_return_thunk+0x5/0x5f
>  vfs_write+0x30c/0x440
>  ksys_write+0x65/0xe0
>  __x64_sys_write+0x1e/0x30
>  x64_sys_call+0x11c2/0x1d50
>  do_syscall_64+0x47/0x110
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> memory: usage 308224kB, limit 308224kB, failcnt 2589
> swap: usage 0kB, limit 9007199254740988kB, failcnt 0
>
> ...
> file_dirty 303247360
> file_writeback 0
> ...
>
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=test,
> mems_allowed=0,oom_memcg=/test,task_memcg=/test,task=dd,pid=4404,uid=0
> Memory cgroup out of memory: Killed process 4404 (dd) total-vm:10512kB,
> anon-rss:1152kB, file-rss:1824kB, shmem-rss:0kB, UID:0 pgtables:76kB
> oom_score_adj:0
>
> The flusher wake-up was removed to decrease SSD wearing, but if we are
> seeing all dirty folios at the tail of an LRU, not waking up the flusher
> could easily lead to thrashing. So wake it up when a memory cgroup is
> about to OOM due to dirty caches.
>
> Link: https://lkml.kernel.org/r/20240829102543.189453-1-jingxiangzeng.cas@gmail.com
> Link: https://lkml.kernel.org/r/20240913084506.3606292-1-jingxiangzeng.cas@gmail.com
> Fixes: 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle")
> Signed-off-by: Zeng Jingxiang
> Signed-off-by: Kairui Song
> Cc: T.J. Mercier
> Cc: Wei Xu
> Cc: Yu Zhao
> ---
> Changes from v2:
> - Acquire the lock before calling the folio_check_dirty_writeback
>   function.
> - Link to v2: https://lore.kernel.org/all/20240913084506.3606292-1-jingxiangzeng.cas@gmail.com/
> Changes from v1:
> - Add code to count the number of unqueued_dirty in the sort_folio
>   function.
> - Link to v1: https://lore.kernel.org/all/20240829102543.189453-1-jingxiangzeng.cas@gmail.com/
> ---
>  mm/vmscan.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 749cdc110c74..12c285a96353 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4290,6 +4290,8 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
>  	int delta = folio_nr_pages(folio);
>  	int refs = folio_lru_refs(folio);
>  	int tier = lru_tier_from_refs(refs);
> +	bool dirty = folio_test_dirty(folio);
> +	bool writeback = folio_test_writeback(folio);
>  	struct lru_gen_folio *lrugen = &lruvec->lrugen;
>
>  	VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
> @@ -4330,8 +4332,10 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
>  	}
>
>  	/* waiting for writeback */
> -	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
> -	    (type == LRU_GEN_FILE && folio_test_dirty(folio))) {
> +	if (folio_test_locked(folio) || dirty ||
> +	    (type == LRU_GEN_FILE && writeback)) {
> +		if (type == LRU_GEN_FILE && dirty && !writeback)
> +			sc->nr.unqueued_dirty += delta;
>  		gen = folio_inc_gen(lruvec, folio, true);
>  		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
>  		return true;
> @@ -4448,6 +4452,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>  			scanned, skipped, isolated,
>  			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>
> +	sc->nr.taken += isolated;
>  	/*
>  	 * There might not be eligible folios due to reclaim_idx. Check the
>  	 * remaining to prevent livelock if it's not making progress.
> @@ -4920,6 +4925,13 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
>  	if (try_to_shrink_lruvec(lruvec, sc))
>  		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
>
> +	/*
> +	 * If too many pages in the coldest generation that cannot
> +	 * be isolated, wake up flusher.
> +	 */
> +	if (sc->nr.unqueued_dirty > sc->nr.taken)
> +		wakeup_flusher_threads(WB_REASON_VMSCAN);
> +
>  	clear_mm_walk();
>
>  	blk_finish_plug(&plug);
> --
> 2.43.5
>
>