From: Hillf Danton <hdanton@sina.com>
To: Cong Wang
Cc: Mel Gorman, LKML, Andrew Morton, Hillf Danton, Michal Hocko, linux-mm
Subject: Re: [PATCH] mm: avoid blocking lock_page() in kcompactd
Date: Tue, 21 Jan 2020 16:26:24 +0800
Message-Id: <20200121082624.12608-1-hdanton@sina.com>
In-Reply-To: <20200110092256.GN3466@techsingularity.net>
References: <20200109225646.22983-1-xiyou.wangcong@gmail.com> <20200110092256.GN3466@techsingularity.net>

On Mon, 20 Jan 2020 14:41:50 -0800 Cong Wang wrote:
> On Fri, Jan 10, 2020 at 1:22 AM Mel Gorman wrote:
> >
> > On Thu, Jan 09, 2020 at 02:56:46PM -0800, Cong Wang wrote:
> > > We observed kcompactd hung at __lock_page():
> > >
> > >  INFO: task kcompactd0:57 blocked for more than 120 seconds.
> > >        Not tainted 4.19.56.x86_64 #1
> > >  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > >  kcompactd0      D    0    57      2 0x80000000
> > >  Call Trace:
> > >   ? __schedule+0x236/0x860
> > >   schedule+0x28/0x80
> > >   io_schedule+0x12/0x40
> > >   __lock_page+0xf9/0x120
> > >   ? page_cache_tree_insert+0xb0/0xb0
> > >   ? update_pageblock_skip+0xb0/0xb0
> > >   migrate_pages+0x88c/0xb90
> > >   ? isolate_freepages_block+0x3b0/0x3b0
> > >   compact_zone+0x5f1/0x870
> > >   kcompactd_do_work+0x130/0x2c0
> > >   ? __switch_to_asm+0x35/0x70
> > >   ? __switch_to_asm+0x41/0x70
> > >   ? kcompactd_do_work+0x2c0/0x2c0
> > >   ? kcompactd+0x73/0x180
> > >   kcompactd+0x73/0x180
> > >   ? finish_wait+0x80/0x80
> > >   kthread+0x113/0x130
> > >   ? kthread_create_worker_on_cpu+0x50/0x50
> > >   ret_from_fork+0x35/0x40
> > >
> > > which faddr2line maps to:
> > >
> > >   migrate_pages+0x88c/0xb90:
> > >   lock_page at include/linux/pagemap.h:483
> > >   (inlined by) __unmap_and_move at mm/migrate.c:1024
> > >   (inlined by) unmap_and_move at mm/migrate.c:1189
> > >   (inlined by) migrate_pages at mm/migrate.c:1419
> > >
> > > Sometimes kcompactd eventually got out of this situation, sometimes not.
> > >
> > > I think for memory compaction, it is a best effort to migrate the pages,
> > > so it doesn't have to wait for I/O to complete. It is fine to call
> > > trylock_page() here, which is pretty much similar to
> > > buffer_migrate_lock_buffers().
> > >
> > > Given MIGRATE_SYNC_LIGHT is used on compaction path, just relax the
> > > check for it.
> > >
> >
> > Is this a single page being locked for a long time or multiple pages
> > being locked without reaching a reschedule point?
>
> Not sure whether it is single page or multiple pages, but I successfully
> located the process locking the page (or pages), and I used perf to
> capture its stack trace:
>
>  ffffffffa722aa06 shrink_inactive_list
>  ffffffffa722b3d7 shrink_node_memcg
>  ffffffffa722b85f shrink_node
>  ffffffffa722bc89 do_try_to_free_pages
>  ffffffffa722c179 try_to_free_mem_cgroup_pages
>  ffffffffa7298703 try_charge
>  ffffffffa729a886 mem_cgroup_try_charge
>  ffffffffa720ec03 __add_to_page_cache_locked
>  ffffffffa720ee3a add_to_page_cache_lru
>  ffffffffa7312ddb iomap_readpages_actor
>  ffffffffa73133f7 iomap_apply
>  ffffffffa73135da iomap_readpages
>  ffffffffa722062e read_pages
>  ffffffffa7220b3f __do_page_cache_readahead
>  ffffffffa7210554 filemap_fault
>  ffffffffc039e41f __xfs_filemap_fault
>  ffffffffa724f5e7 __do_fault
>  ffffffffa724c5f2 __handle_mm_fault
>  ffffffffa724cbc6 handle_mm_fault
>  ffffffffa70a313e __do_page_fault
>  ffffffffa7a00dfe page_fault
>
> This process got stuck in this situation for a long time (since I sent
> out this patch)
> without making any progress. It behaves like it is stuck in an infinite
> loop, although the EIP still moves around within mem_cgroup_try_charge().

Make page reclaim in try_charge() async, on the assumption that sync
reclaim is unnecessary in the absence of memory pressure and does not
help much under heavy pressure. Skipping sync reclaim is confined to
page fault context to avoid changing too much at a time.

--- linux-5.5-rc3/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2525,6 +2525,12 @@ force:
 	if (do_memsw_account())
 		page_counter_charge(&memcg->memsw, nr_pages);
 	css_get_many(&memcg->css, nr_pages);
+	/*
+	 * reclaim high limit pages soon without holding resources like
+	 * page lock e.g in page fault context
+	 */
+	if (unlikely(current->flags & PF_MEMALLOC))
+		schedule_work(&memcg->high_work);
 
 	return 0;
 
--- linux-5.5-rc3/mm/filemap.c
+++ b/mm/filemap.c
@@ -863,8 +863,14 @@ static int __add_to_page_cache_locked(st
 	mapping_set_update(&xas, mapping);
 
 	if (!huge) {
+		bool was_set = current->flags & PF_MEMALLOC;
+
+		if (!was_set)
+			current->flags |= PF_MEMALLOC;
 		error = mem_cgroup_try_charge(page, current->mm, gfp_mask,
 					      &memcg, false);
+		if (!was_set)
+			current->flags &= ~PF_MEMALLOC;
 		if (error)
 			return error;
 	}
--

> I also enabled trace event mm_vmscan_lru_shrink_inactive(), here is what
> I collected:
>
> <...>-455459 [003] .... 2691911.664706: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664711: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] ....
> 2691911.664714: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=2 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=3 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664717: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=5 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664720: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=5 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664725: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=7 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664730: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664732: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664736: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664739: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=2 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=3 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664744: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=5 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664747: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=4 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664752: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=12 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664755: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664761: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664762: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664764: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664770: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=4 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664777: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=21 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] .... 2691911.664780: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> <...>-455459 [003] ....
> 2691911.664783: mm_vmscan_lru_shrink_inactive:
>     nid=0 nr_scanned=2 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>     nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0
>     nr_unmap_fail=0 priority=3 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
>
>
> >
> > If it's a single page being locked, it's important to identify what held
> > the page lock for 2 minutes because that is potentially a missing
> > unlock_page. The kernel in question is old -- 4.19.56. Are there any
> > other modifications to that kernel?
>
> We only backported some networking and hardware driver patches,
> not any MM change.
>
> Please let me know if I can collect any other information you need,
> before it gets rebooted.
>
> Thanks.