From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Fri, 26 Jul 2024 11:26:23 +0800
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
To: "zhaoyang.huang"
Cc: bharata@amd.com, Neeraj.Upadhyay@amd.com, akpm@linux-foundation.org, david@redhat.com, kinseyho@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mgorman@suse.de, mjguzik@gmail.com, nikunj@amd.com, vbabka@suse.cz, willy@infradead.org, yuzhao@google.com, steve.kang@unisoc.com
In-Reply-To: <20240725095933.2640875-1-zhaoyang.huang@unisoc.com>
References: <893a263a-0038-4b4b-9031-72567b966f73@amd.com> <20240725095933.2640875-1-zhaoyang.huang@unisoc.com>
On Thu, Jul 25, 2024 at 6:00 PM zhaoyang.huang wrote:
>
> >However during the weekend mglru-enabled run (with above fix to
> >isolate_lru_folios() and also the previous two patches: truncate.patch
> >and mglru.patch and the inode fix provided by Mateusz), another hard
> >lockup related to lruvec spinlock was observed.
> >
> >Here is the hard lockup:
> >
> >watchdog: Watchdog detected hard LOCKUP on cpu 466
> >CPU: 466 PID: 3103929 Comm: fio Not tainted 6.10.0-rc3-trnct_nvme_lruvecresched_sirq_inode_mglru #32
> >RIP: 0010:native_queued_spin_lock_slowpath+0x2b4/0x300
> >Call Trace:
> > ? show_regs+0x69/0x80
> > ? watchdog_hardlockup_check+0x1b4/0x3a0
> > ? native_queued_spin_lock_slowpath+0x2b4/0x300
> >
> > _raw_spin_lock_irqsave+0x5b/0x70
> > folio_lruvec_lock_irqsave+0x62/0x90
> > folio_batch_move_lru+0x9d/0x160
> > folio_rotate_reclaimable+0xab/0xf0
> > folio_end_writeback+0x60/0x90
> > end_buffer_async_write+0xaa/0xe0
> > end_bio_bh_io_sync+0x2c/0x50
> > bio_endio+0x108/0x180
> > blk_mq_end_request_batch+0x11f/0x5e0
> > nvme_pci_complete_batch+0xb5/0xd0 [nvme]
> > nvme_irq+0x92/0xe0 [nvme]
> > __handle_irq_event_percpu+0x6e/0x1e0
> > handle_irq_event+0x39/0x80
> > handle_edge_irq+0x8c/0x240
> > __common_interrupt+0x4e/0xf0
> > common_interrupt+0x49/0xc0
> > asm_common_interrupt+0x27/0x40
> >
> >Here is the lock holder details captured by all-cpu-backtrace:
> >
> >NMI backtrace for cpu 75
> >CPU: 75 PID: 3095650 Comm: fio Not tainted 6.10.0-rc3-trnct_nvme_lruvecresched_sirq_inode_mglru #32
> >RIP: 0010:folio_inc_gen+0x142/0x430
> >Call Trace:
> > ? show_regs+0x69/0x80
> > ? nmi_cpu_backtrace+0xc5/0x130
> > ? nmi_cpu_backtrace_handler+0x11/0x20
> > ? nmi_handle+0x64/0x180
> > ? default_do_nmi+0x45/0x130
> > ? exc_nmi+0x128/0x1a0
> > ? end_repeat_nmi+0xf/0x53
> > ? folio_inc_gen+0x142/0x430
> > ? folio_inc_gen+0x142/0x430
> > ? folio_inc_gen+0x142/0x430
> >
> > isolate_folios+0x954/0x1630
> > evict_folios+0xa5/0x8c0
> > try_to_shrink_lruvec+0x1be/0x320
> > shrink_one+0x10f/0x1d0
> > shrink_node+0xa4c/0xc90
> > do_try_to_free_pages+0xc0/0x590
> > try_to_free_pages+0xde/0x210
> > __alloc_pages_noprof+0x6ae/0x12c0
> > alloc_pages_mpol_noprof+0xd9/0x220
> > folio_alloc_noprof+0x63/0xe0
> > filemap_alloc_folio_noprof+0xf4/0x100
> > page_cache_ra_unbounded+0xb9/0x1a0
> > page_cache_ra_order+0x26e/0x310
> > ondemand_readahead+0x1a3/0x360
> > page_cache_sync_ra+0x83/0x90
> > filemap_get_pages+0xf0/0x6a0
> > filemap_read+0xe7/0x3d0
> > blkdev_read_iter+0x6f/0x140
> > vfs_read+0x25b/0x340
> > ksys_read+0x67/0xf0
> > __x64_sys_read+0x19/0x20
> > x64_sys_call+0x1771/0x20d0
> > do_syscall_64+0x7e/0x130
>
> From the call stack of the lock holder, this looks like a scalability issue rather than a deadlock.
> Unlike legacy LRU management, MGLRU has no throttling mechanism for global reclaim so far. Could we apply a similar method to throttle reclaim when it becomes too aggressive? I am wondering whether this patch, which is a rough version, could help here.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2e34de9cd0d4..827036e21f24 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4520,6 +4520,50 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
>  	return scanned;
>  }
>
> +static void lru_gen_throttle(pg_data_t *pgdat, struct scan_control *sc)
> +{
> +	struct lruvec *target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
> +
> +	if (current_is_kswapd()) {
> +		if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
> +			set_bit(PGDAT_WRITEBACK, &pgdat->flags);
> +
> +		/* Allow kswapd to start writing pages during reclaim. */
> +		if (sc->nr.unqueued_dirty == sc->nr.file_taken)
> +			set_bit(PGDAT_DIRTY, &pgdat->flags);
> +
> +		if (sc->nr.immediate)
> +			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
> +	}
> +
> +	/*
> +	 * Tag a node/memcg as congested if all the dirty pages were marked
> +	 * for writeback and immediate reclaim (counted in nr.congested).
> +	 *
> +	 * Legacy memcg will stall in page writeback so avoid forcibly
> +	 * stalling in reclaim_throttle().
> +	 */
> +	if (sc->nr.dirty && (sc->nr.dirty / 2 < sc->nr.congested)) {
> +		if (cgroup_reclaim(sc) && writeback_throttling_sane(sc))
> +			set_bit(LRUVEC_CGROUP_CONGESTED, &target_lruvec->flags);
> +
> +		if (current_is_kswapd())
> +			set_bit(LRUVEC_NODE_CONGESTED, &target_lruvec->flags);
> +	}
> +
> +	/*
> +	 * Stall direct reclaim for IO completions if the lruvec or
> +	 * node is congested. Allow kswapd to continue until it
> +	 * starts encountering unqueued dirty pages or cycling through
> +	 * the LRU too quickly.
> +	 */
> +	if (!current_is_kswapd() && current_may_throttle() &&
> +	    !sc->hibernation_mode &&
> +	    (test_bit(LRUVEC_CGROUP_CONGESTED, &target_lruvec->flags) ||
> +	     test_bit(LRUVEC_NODE_CONGESTED, &target_lruvec->flags)))
> +		reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
> +}
> +
>  static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
>  {
>  	int type;
> @@ -4552,6 +4596,16 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>  retry:
>  	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
>  	sc->nr_reclaimed += reclaimed;
> +	sc->nr.dirty += stat.nr_dirty;
> +	sc->nr.congested += stat.nr_congested;
> +	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
> +	sc->nr.writeback += stat.nr_writeback;
> +	sc->nr.immediate += stat.nr_immediate;
> +	sc->nr.taken += scanned;
> +
> +	if (type)
> +		sc->nr.file_taken += scanned;
> +
>  	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
>  			scanned, reclaimed, &stat, sc->priority,
>  			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
> @@ -5908,6 +5962,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>
>  	if (lru_gen_enabled() && root_reclaim(sc)) {
>  		lru_gen_shrink_node(pgdat, sc);
> +		lru_gen_throttle(pgdat, sc);
>  		return;
>  	}

Hi Bharata,

This patch arose from a regression in an Android test case that allocates 1GB of virtual memory in each of 8 threads on a 5.5GB RAM system. The test passes under legacy LRU management but fails under MGLRU, where a watchdog monitor detects abnormal system-wide scheduling (the watchdog cannot be scheduled within 60 seconds).
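For context, here is a rough userspace sketch of that allocation pattern. This is only my own illustration, not the actual Android/LTP test; the thread count and mapping size come from the description above, while the touch pattern and everything else are assumptions. The only intent is to drive several threads into direct reclaim at the same time, which is where the lru_lock contention shows up.

/*
 * Rough reproducer sketch (not the real test): 8 threads each map and
 * touch 1GB of anonymous memory on a ~5.5GB RAM system, so that the
 * threads hit global reclaim concurrently while faulting pages in.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define NR_THREADS	8
#define ALLOC_SIZE	(1UL << 30)	/* 1GB per thread */

static void *alloc_and_touch(void *arg)
{
	char *buf = mmap(NULL, ALLOC_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return NULL;
	}
	/* Touch every page so the kernel actually has to find memory. */
	memset(buf, 1, ALLOC_SIZE);
	return buf;
}

int main(void)
{
	pthread_t threads[NR_THREADS];
	int i;

	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&threads[i], NULL, alloc_and_touch, NULL);
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	puts("all threads done");
	return 0;
}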
With the slight change shown below, the patch passed the test, although I have not investigated deeply how it does so. In theory, it introduces the same kind of reclaim throttling that legacy reclaim uses, which should reduce contention on lruvec->lru_lock. The patch is quite naive for now, but I hope it can help you, since your case also looks like a scalability issue under memory pressure rather than a deadlock. Thank you!

The change in the applied version (throttle the reclaim before lru_gen_shrink_node() instead of after):

 	if (lru_gen_enabled() && root_reclaim(sc)) {
+		lru_gen_throttle(pgdat, sc);
 		lru_gen_shrink_node(pgdat, sc);
-		lru_gen_throttle(pgdat, sc);
 		return;
 	}

>
> --
> 2.25.1
>