From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FF9BC3DA49 for ; Mon, 29 Jul 2024 01:58:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B01856B008A; Sun, 28 Jul 2024 21:58:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AB0E76B008C; Sun, 28 Jul 2024 21:58:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 978D96B0096; Sun, 28 Jul 2024 21:58:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 79B2A6B008A for ; Sun, 28 Jul 2024 21:58:10 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 2696CA02CC for ; Mon, 29 Jul 2024 01:58:10 +0000 (UTC) X-FDA: 82391129940.06.BEF226F Received: from out30-100.freemail.mail.aliyun.com (out30-100.freemail.mail.aliyun.com [115.124.30.100]) by imf02.hostedemail.com (Postfix) with ESMTP id 092B08000F for ; Mon, 29 Jul 2024 01:58:07 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=ln3teacH; spf=pass (imf02.hostedemail.com: domain of hsiangkao@linux.alibaba.com designates 115.124.30.100 as permitted sender) smtp.mailfrom=hsiangkao@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722218261; a=rsa-sha256; cv=none; b=y4lYGFHhROzRhS/iK3dvtJ58ZaHL0XlxKof4z1lvC6NHAC7zV1kUkwp0k8ZDcIfJGOnGe2 a59Zrfd7rEbMgayfaMPhBYBvjM+J/hPkaw+AFREmz3ObnU7safcxFwd4VYVnkEILSxnGxr XqpVOf78iXz2GXx4CaP7UqHm8lrHkyU= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=ln3teacH; spf=pass (imf02.hostedemail.com: domain of hsiangkao@linux.alibaba.com designates 115.124.30.100 as permitted sender) smtp.mailfrom=hsiangkao@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1722218261; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NPs32fx1nWtLVVkhAFV5Sj0qrMlLQyRD5CLhhDczwDI=; b=A/Txq7tVM328xosvPjMBEZM+QwWpqRZMGsrTC/eulKs+vV5bQvRhOyfGOYi7LrCDFB0+dn fQ/O3XJeNfakOkk7dYRJDgkZ4v/6/pipSF6drHNyI8VBe49/GLQleEv+A751epoxc7R0A9 2lqB0MJEjjJ/6s3rNpS2fthkUsJ5FOQ= DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1722218285; h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type; bh=NPs32fx1nWtLVVkhAFV5Sj0qrMlLQyRD5CLhhDczwDI=; b=ln3teacHeVKMGlsx3B4hDVFColBvJSDrxVSPVihBu+oHbLPyb9EVmIHh189WBnrvdwz/U3fMCTlcgxg/DjSGksloEFHoUZKsOWGbdIhvl29M7CByFP87aqDFTS+yz4FBWEYYyfdijtibi8LYMeY9f91mBL9YNHe043NuE0ppgAs= X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R401e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam033032014031;MF=hsiangkao@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0WBSYqdD_1722218282; Received: from 30.97.48.242(mailfrom:hsiangkao@linux.alibaba.com fp:SMTPD_---0WBSYqdD_1722218282) by smtp.aliyun-inc.com; Mon, 29 Jul 2024 09:58:04 +0800 Message-ID: Date: Mon, 29 Jul 2024 09:58:02 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] mm/migrate: fix deadlock in migrate_pages_batch() on large folios To: "Huang, Ying" Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20240728154913.4023977-1-hsiangkao@linux.alibaba.com> <87plqx0yh2.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Gao Xiang In-Reply-To: <87plqx0yh2.fsf@yhuang6-desk2.ccr.corp.intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: 8mo8o1xg5f963m9idjeeaky8k365gr1k X-Rspamd-Queue-Id: 092B08000F X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1722218287-170492 X-HE-Meta: U2FsdGVkX18fYoGUTFU6SGphmteOhLwqQa9/bWv2fjVoQ+ET3NK6+1rw6iOyRpB4GKcOhK+ni8TTOznv0qdIgf6rSj0gZFjAcjSEK/cqQUpK8Fea64vLn/5ahujQZGditf9M4aytJ2+8BEpH5Q2rKNoMthS+vTLZWQqcl9uQM9IvSHVnDPLxbdYbalf4E7ejM3VFotQXEXXAKwrY5Qu4ZKveJDauHjiKsv3BSw97wCQM1TOPKcoenD7AFRh3+9lLq7ZZ1UYYe3yS6nR4WgTgxsroS7Z2oTJwbibrfXp1jFijdEu/9I3cX5cQq0qMNFkLNsxyb/PIViYnyTAabo0nNmUWramf4k5i0Qy/ofMKNkfStjaG7rkW5N8ATeUKhhn9FT4Ua/U2bI9fjsIz9rH2oSblGn5A+E/ZhaIGh6ZS8WDSODk/NPS+fLgkvvgQbTiZRq5QYwPBeI1RDmTL7W+JMTJKHt6fojUuFVqOzogpT1bcrPxzhgr5ovrZpJg8tqLz3zGGZNdmQ3PiIdQOy2VW0yJg/cv5CsXI2reyLsSVTqpsOgthTWT7M1Me/y/rQDmAbR3IRKWIiINTIwF4BEG7oV9YLNRRNpj0g+KgsBNCr0BrtNXCWQWoXJraR8nBpmwZQJNA+eCo+B51AqALq9wd7IzwQTHm8mgWaHHbAAY5j1BgLLmWv/TmuooQVlNhF63ZnqGj36CkeT4wY/yVP70JQ7Ow3j5sBdHksfDYhg9f14SJiOXUHe9e5ZwT620XCPZ6CKluxx2vpa/aePnL+buZdOR8Ab2/Rw0jU4BIMLsQ0TWuebb0OfvlVaIwl44QcrV6vMgAWaH+7Or7pg9YMcQXUXyn7nWcNPEhUM2P+991qibHuvZpo/bSdFDi4LOVcRMcCkT58t5rcObrIMV3hwy/q2dXbLe2sGW4mdRLIv0Ph94HI9fKR5c7aVnKq7TbyYzqTwtme4ngrJt7DJDeZsM A6b6tIS4 jv6g9DaVkQxAH1R1fGJApiAoqZ0YBNMMOcil9I640MBAYNjf1zDr+kVqJ792/fTMQefXOZVwE5x7fisFrSmm2ptbA6uRvNm5+tE97J7jbkD5R+iIslykqtwAo4zugyjVz2iJxi/g+T1YJ6Ac5Wofx9JLgUHyROgAF8y2FBxSw7jtEgxcNxGjlnpw4HYcdfAQ5T4ywo6xixs4sMfS6JW4Q+bwN+NJa+1/QeddVIcsPbucfOKtJAnssECdoKRELTqBrdXprXbauq+eLJ4BHW78mnymB9fRvH9EeCYZwPyJrynbmBhZwB5JAR33/b3LtJS8AZzES X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Ying, On 2024/7/29 09:38, Huang, Ying wrote: > Hi, Xiang, > > Gao Xiang writes: > >> Currently, migrate_pages_batch() can lock multiple locked folios >> with an arbitrary order. Although folio_trylock() is used to avoid >> deadlock as commit 2ef7dbb26990 ("migrate_pages: try migrate in batch >> asynchronously firstly") mentioned, it seems try_split_folio() is >> still missing. >> >> It was found by compaction stress test when I explicitly enable EROFS >> compressed files to use large folios, which case I cannot reproduce with >> the same workload if large folio support is off (current mainline). >> Typically, filesystem reads (with locked file-backed folios) could use >> another bdev/meta inode to load some other I/Os (e.g. inode extent >> metadata or caching compressed data), so the locking order will be: >> >> file-backed folios (A) >> bdev/meta folios (B) >> >> The following calltrace shows the deadlock: >> Thread 1 takes (B) lock and tries to take folio (A) lock >> Thread 2 takes (A) lock and tries to take folio (B) lock >> >> [Thread 1] >> INFO: task stress:1824 blocked for more than 30 seconds. >> Tainted: G OE 6.10.0-rc7+ #6 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> task:stress state:D stack:0 pid:1824 tgid:1824 ppid:1822 flags:0x0000000c >> Call trace: >> __switch_to+0xec/0x138 >> __schedule+0x43c/0xcb0 >> schedule+0x54/0x198 >> io_schedule+0x44/0x70 >> folio_wait_bit_common+0x184/0x3f8 >> <-- folio mapping ffff00036d69cb18 index 996 (**) >> __folio_lock+0x24/0x38 >> migrate_pages_batch+0x77c/0xea0 // try_split_folio (mm/migrate.c:1486:2) >> // migrate_pages_batch (mm/migrate.c:1734:16) >> <--- LIST_HEAD(unmap_folios) has >> .. >> folio mapping 0xffff0000d184f1d8 index 1711; (*) >> folio mapping 0xffff0000d184f1d8 index 1712; >> .. >> migrate_pages+0xb28/0xe90 >> compact_zone+0xa08/0x10f0 >> compact_node+0x9c/0x180 >> sysctl_compaction_handler+0x8c/0x118 >> proc_sys_call_handler+0x1a8/0x280 >> proc_sys_write+0x1c/0x30 >> vfs_write+0x240/0x380 >> ksys_write+0x78/0x118 >> __arm64_sys_write+0x24/0x38 >> invoke_syscall+0x78/0x108 >> el0_svc_common.constprop.0+0x48/0xf0 >> do_el0_svc+0x24/0x38 >> el0_svc+0x3c/0x148 >> el0t_64_sync_handler+0x100/0x130 >> el0t_64_sync+0x190/0x198 >> >> [Thread 2] >> INFO: task stress:1825 blocked for more than 30 seconds. >> Tainted: G OE 6.10.0-rc7+ #6 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> task:stress state:D stack:0 pid:1825 tgid:1825 ppid:1822 flags:0x0000000c >> Call trace: >> __switch_to+0xec/0x138 >> __schedule+0x43c/0xcb0 >> schedule+0x54/0x198 >> io_schedule+0x44/0x70 >> folio_wait_bit_common+0x184/0x3f8 >> <-- folio = 0xfffffdffc6b503c0 (mapping == 0xffff0000d184f1d8 index == 1711) (*) >> __folio_lock+0x24/0x38 >> z_erofs_runqueue+0x384/0x9c0 [erofs] >> z_erofs_readahead+0x21c/0x350 [erofs] <-- folio mapping 0xffff00036d69cb18 range from [992, 1024] (**) >> read_pages+0x74/0x328 >> page_cache_ra_order+0x26c/0x348 >> ondemand_readahead+0x1c0/0x3a0 >> page_cache_sync_ra+0x9c/0xc0 >> filemap_get_pages+0xc4/0x708 >> filemap_read+0x104/0x3a8 >> generic_file_read_iter+0x4c/0x150 >> vfs_read+0x27c/0x330 >> ksys_pread64+0x84/0xd0 >> __arm64_sys_pread64+0x28/0x40 >> invoke_syscall+0x78/0x108 >> el0_svc_common.constprop.0+0x48/0xf0 >> do_el0_svc+0x24/0x38 >> el0_svc+0x3c/0x148 >> el0t_64_sync_handler+0x100/0x130 >> el0t_64_sync+0x190/0x198 >> >> Fixes: 5dfab109d519 ("migrate_pages: batch _unmap and _move") >> Cc: "Huang, Ying" >> Signed-off-by: Gao Xiang >> --- >> mm/migrate.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> index 20cb9f5f7446..a912e4b83228 100644 >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -1483,7 +1483,8 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f >> { >> int rc; >> >> - folio_lock(folio); >> + if (!folio_trylock(folio)) >> + return -EAGAIN; >> rc = split_folio_to_list(folio, split_folios); >> folio_unlock(folio); >> if (!rc) > > Good catch! Thanks for the fixing! > > The deadlock is similar as the one we fixed in commit fb3592c41a44 > ("migrate_pages: fix deadlock in batched migration"). But apparently, > we missed this case. > > For the fix, I think that we should still respect migrate_mode because > users may prefer migration success over blocking. > > @@ -1492,11 +1492,17 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio, > return rc; > } > > -static inline int try_split_folio(struct folio *folio, struct list_head *split_folios) > +static inline int try_split_folio(struct folio *folio, struct list_head *split_folios, > + enum migrate_mode mode) > { > int rc; > > - folio_lock(folio); > + if (mode == MIGRATE_ASYNC) { > + if (!folio_trylock(folio)) > + return -EAGAIN; > + } else { > + folio_lock(folio); > + } > rc = split_folio_to_list(folio, split_folios); > folio_unlock(folio); > if (!rc) Okay, yeah it looks better since it seems I missed the fallback part in migrate_pages_sync(). Let me send the next version to follow your advice, thanks. Thanks, Gao Xiang > > > -- > Best Regards, > Huang, Ying