From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Gao Xiang
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/migrate: fix deadlock in migrate_pages_batch() on large folios
In-Reply-To: <20240728154913.4023977-1-hsiangkao@linux.alibaba.com> (Gao Xiang's message of "Sun, 28 Jul 2024 23:49:13 +0800")
References: <20240728154913.4023977-1-hsiangkao@linux.alibaba.com>
Date: Mon, 29 Jul 2024 09:38:01 +0800
Message-ID: <87plqx0yh2.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Hi, Xiang,

Gao Xiang writes:

> Currently, migrate_pages_batch() can lock multiple locked folios
> in an arbitrary order.  Although folio_trylock() is used to avoid
> deadlock as commit 2ef7dbb26990 ("migrate_pages: try migrate in batch
> asynchronously firstly") mentioned, it seems try_split_folio() is
> still missing.
>
> It was found by a compaction stress test when I explicitly enabled
> EROFS compressed files to use large folios; I cannot reproduce the
> case with the same workload if large folio support is off (current
> mainline).  Typically, filesystem reads (with locked file-backed
> folios) could use another bdev/meta inode to load some other I/Os
> (e.g. inode extent metadata or caching compressed data), so the
> locking order will be:
>
>   file-backed folios (A)
>      bdev/meta folios (B)
>
> The following calltrace shows the deadlock:
>   Thread 1 takes (B) lock and tries to take folio (A) lock
>   Thread 2 takes (A) lock and tries to take folio (B) lock
>
> [Thread 1]
> INFO: task stress:1824 blocked for more than 30 seconds.
>       Tainted: G           OE      6.10.0-rc7+ #6
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:stress  state:D  stack:0  pid:1824  tgid:1824  ppid:1822  flags:0x0000000c
> Call trace:
>  __switch_to+0xec/0x138
>  __schedule+0x43c/0xcb0
>  schedule+0x54/0x198
>  io_schedule+0x44/0x70
>  folio_wait_bit_common+0x184/0x3f8
>    <-- folio mapping ffff00036d69cb18 index 996 (**)
>  __folio_lock+0x24/0x38
>  migrate_pages_batch+0x77c/0xea0   // try_split_folio (mm/migrate.c:1486:2)
>                                    // migrate_pages_batch (mm/migrate.c:1734:16)
>    <--- LIST_HEAD(unmap_folios) has
>      ..
>      folio mapping 0xffff0000d184f1d8 index 1711; (*)
>      folio mapping 0xffff0000d184f1d8 index 1712;
>      ..
>  migrate_pages+0xb28/0xe90
>  compact_zone+0xa08/0x10f0
>  compact_node+0x9c/0x180
>  sysctl_compaction_handler+0x8c/0x118
>  proc_sys_call_handler+0x1a8/0x280
>  proc_sys_write+0x1c/0x30
>  vfs_write+0x240/0x380
>  ksys_write+0x78/0x118
>  __arm64_sys_write+0x24/0x38
>  invoke_syscall+0x78/0x108
>  el0_svc_common.constprop.0+0x48/0xf0
>  do_el0_svc+0x24/0x38
>  el0_svc+0x3c/0x148
>  el0t_64_sync_handler+0x100/0x130
>  el0t_64_sync+0x190/0x198
>
> [Thread 2]
> INFO: task stress:1825 blocked for more than 30 seconds.
>       Tainted: G           OE      6.10.0-rc7+ #6
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:stress  state:D  stack:0  pid:1825  tgid:1825  ppid:1822  flags:0x0000000c
> Call trace:
>  __switch_to+0xec/0x138
>  __schedule+0x43c/0xcb0
>  schedule+0x54/0x198
>  io_schedule+0x44/0x70
>  folio_wait_bit_common+0x184/0x3f8
>    <-- folio = 0xfffffdffc6b503c0 (mapping == 0xffff0000d184f1d8 index == 1711) (*)
>  __folio_lock+0x24/0x38
>  z_erofs_runqueue+0x384/0x9c0 [erofs]
>  z_erofs_readahead+0x21c/0x350 [erofs]
>    <-- folio mapping 0xffff00036d69cb18 range from [992, 1024] (**)
>  read_pages+0x74/0x328
>  page_cache_ra_order+0x26c/0x348
>  ondemand_readahead+0x1c0/0x3a0
>  page_cache_sync_ra+0x9c/0xc0
>  filemap_get_pages+0xc4/0x708
>  filemap_read+0x104/0x3a8
>  generic_file_read_iter+0x4c/0x150
>  vfs_read+0x27c/0x330
>  ksys_pread64+0x84/0xd0
>  __arm64_sys_pread64+0x28/0x40
>  invoke_syscall+0x78/0x108
>  el0_svc_common.constprop.0+0x48/0xf0
>  do_el0_svc+0x24/0x38
>  el0_svc+0x3c/0x148
>  el0t_64_sync_handler+0x100/0x130
>  el0t_64_sync+0x190/0x198
>
> Fixes: 5dfab109d519 ("migrate_pages: batch _unmap and _move")
> Cc: "Huang, Ying" <ying.huang@intel.com>
> Signed-off-by: Gao Xiang
> ---
>  mm/migrate.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 20cb9f5f7446..a912e4b83228 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1483,7 +1483,8 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
>  {
>  	int rc;
>  
> -	folio_lock(folio);
> +	if (!folio_trylock(folio))
> +		return -EAGAIN;
>  	rc = split_folio_to_list(folio, split_folios);
>  	folio_unlock(folio);
>  	if (!rc)

Good catch!  Thanks for the fix!

The deadlock is similar to the one we fixed in commit fb3592c41a44
("migrate_pages: fix deadlock in batched migration").  But apparently,
we missed this case.

For the fix, I think that we should still respect migrate_mode, because
synchronous users may prefer migration success (blocking on the lock)
over failing with -EAGAIN.
@@ -1492,11 +1492,17 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	return rc;
 }
 
-static inline int try_split_folio(struct folio *folio, struct list_head *split_folios)
+static inline int try_split_folio(struct folio *folio, struct list_head *split_folios,
+				  enum migrate_mode mode)
 {
 	int rc;
 
-	folio_lock(folio);
+	if (mode == MIGRATE_ASYNC) {
+		if (!folio_trylock(folio))
+			return -EAGAIN;
+	} else {
+		folio_lock(folio);
+	}
 	rc = split_folio_to_list(folio, split_folios);
 	folio_unlock(folio);
 	if (!rc)

--
Best Regards,
Huang, Ying