From: Wei Xu
Date: Thu, 5 Sep 2024 17:00:40 -0700
Subject: Re: [PATCH] mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
In-Reply-To: <20240829102543.189453-1-jingxiangzeng.cas@gmail.com>
To: Jingxiang Zeng
Cc: linux-mm@kvack.org, Andrew Morton, Yu Zhao, T.J. Mercier, Kairui Song, linux-kernel@vger.kernel.org

On Thu, Aug 29, 2024 at 3:25 AM Jingxiang Zeng wrote:
>
> From: Zeng Jingxiang
>
> Commit 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle")
> removed the opportunity to wake up flushers during the MGLRU page
> reclamation process, which can lead to an increased likelihood of triggering
> OOM when encountering many dirty pages during reclamation on MGLRU.
>
> This leads to premature OOM if there are too many dirty pages in a cgroup:
> Killed

Thanks for the patch. We have encountered a similar problem.

>
> dd invoked oom-killer: gfp_mask=0x101cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE),
> order=0, oom_score_adj=0
>
> Call Trace:
>
> dump_stack_lvl+0x5f/0x80
> dump_stack+0x14/0x20
> dump_header+0x46/0x1b0
> oom_kill_process+0x104/0x220
> out_of_memory+0x112/0x5a0
> mem_cgroup_out_of_memory+0x13b/0x150
> try_charge_memcg+0x44f/0x5c0
> charge_memcg+0x34/0x50
> __mem_cgroup_charge+0x31/0x90
> filemap_add_folio+0x4b/0xf0
> __filemap_get_folio+0x1a4/0x5b0
> ? srso_return_thunk+0x5/0x5f
> ? __block_commit_write+0x82/0xb0
> ext4_da_write_begin+0xe5/0x270
> generic_perform_write+0x134/0x2b0
> ext4_buffered_write_iter+0x57/0xd0
> ext4_file_write_iter+0x76/0x7d0
> ? selinux_file_permission+0x119/0x150
> ? srso_return_thunk+0x5/0x5f
> ? srso_return_thunk+0x5/0x5f
> vfs_write+0x30c/0x440
> ksys_write+0x65/0xe0
> __x64_sys_write+0x1e/0x30
> x64_sys_call+0x11c2/0x1d50
> do_syscall_64+0x47/0x110
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> memory: usage 308224kB, limit 308224kB, failcnt 2589
> swap: usage 0kB, limit 9007199254740988kB, failcnt 0
>
> ...
> file_dirty 303247360
> file_writeback 0
> ...
>
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=test,
> mems_allowed=0,oom_memcg=/test,task_memcg=/test,task=dd,pid=4404,uid=0
> Memory cgroup out of memory: Killed process 4404 (dd) total-vm:10512kB,
> anon-rss:1152kB, file-rss:1824kB, shmem-rss:0kB, UID:0 pgtables:76kB
> oom_score_adj:0
>
> The flusher wakeup was removed to reduce SSD wear, but if we are
> seeing all dirty folios at the tail of an LRU, not waking up the flusher
> could easily lead to thrashing. So wake it up when a memcg is
> about to OOM due to dirty caches.
>
> MGLRU still suffers from an OOM issue on the latest mm tree, so the test
> was done with another fix [1] merged.
>
> Link: https://lore.kernel.org/linux-mm/CAOUHufYi9h0kz5uW3LHHS3ZrVwEq-kKp8S6N-MZUmErNAXoXmw@mail.gmail.com/ [1]
>
> Fixes: 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle")
> Signed-off-by: Zeng Jingxiang
> Signed-off-by: Kairui Song
> ---
>  mm/vmscan.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f27792e77a0f..9cd8c42f67cb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4447,6 +4447,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>                                   scanned, skipped, isolated,
>                                   type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>
> +        sc->nr.taken += isolated;
>          /*
>           * There might not be eligible folios due to reclaim_idx. Check the
>           * remaining to prevent livelock if it's not making progress.
> @@ -4919,6 +4920,14 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
>          if (try_to_shrink_lruvec(lruvec, sc))
>                  lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
>
> +        /*
> +         * If too many pages failed to evict due to page being dirty,
> +         * memory pressure have pushed dirty pages to oldest gen,
> +         * wake up flusher.
> +         */
> +        if (sc->nr.unqueued_dirty >= sc->nr.taken)

Any reason not to use a strict == check as in shrink_inactive_list()?

Also, this check allows the flusher threads to be woken up when both
sc->nr.unqueued_dirty and sc->nr.taken are 0, which is undesirable.

If we skip the wakeup for the cases where both counters are 0, then I
think we need to handle the situation where only dirty file pages are
left for reclaim in the oldest gen. This means that sc->nr.unqueued_dirty
also needs to be updated in sort_folios(), not just in shrink_folio_list(),
because sort_folios() doesn't send dirty file pages to shrink_folio_list()
for eviction.
> +                wakeup_flusher_threads(WB_REASON_VMSCAN);
> +
>          clear_mm_walk();
>
>          blk_finish_plug(&plug);
> --
> 2.43.5
>