From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Fri, 2 Aug 2024 17:01:40 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] mm/migrate: fix deadlock in migrate_pages_batch() on large folios
From: Gao Xiang
To: Matthew Wilcox
Cc: Andrew Morton, Huang Ying, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jan Kara
References:
<20240728154913.4023977-1-hsiangkao@linux.alibaba.com> <04bbfcd0-6eb1-4a5b-ac21-b3cdf1acdc77@linux.alibaba.com>
In-Reply-To: <04bbfcd0-6eb1-4a5b-ac21-b3cdf1acdc77@linux.alibaba.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Hi Matthew,

On 2024/7/29 06:11, Gao Xiang wrote:
> Hi,
>
> On 2024/7/29 05:46,
Matthew Wilcox wrote:
>> On Sun, Jul 28, 2024 at 11:49:13PM +0800, Gao Xiang wrote:
>>> It was found by compaction stress test when I explicitly enable EROFS
>>> compressed files to use large folios, which case I cannot reproduce with
>>> the same workload if large folio support is off (current mainline).
>>> Typically, filesystem reads (with locked file-backed folios) could use
>>> another bdev/meta inode to load some other I/Os (e.g. inode extent
>>> metadata or caching compressed data), so the locking order will be:
>>
>> Umm.  That is a new constraint to me.  We have two other places which
>> take the folio lock in a particular order.  Writeback takes locks on
>> folios belonging to the same inode in ascending ->index order.  It
>> submits all the folios for write before moving on to lock other inodes,
>> so it does not conflict with this new constraint you're proposing.
>
> BTW, I don't believe it's a new order out of EROFS, if you consider
> ext4 or ext2 for example, it will also use sb_bread() (buffer heads
> on bdev inode to trigger some meta I/Os),
>
> e.g. take ext2 for simplicity:
>   ext2_readahead
>     mpage_readahead
>      ext2_get_block
>        ext2_get_blocks
>          ext2_get_branch
>             sb_bread     <-- get some metadata using for this data I/O

I guess I need to say more about this:

Although sb_bread() currently mainly takes buffer locks to do meta I/Os,
the following path has a similar dependency:

  ...
  sb_bread
    __bread_gfp
      bdev_getblk
        __getblk_slow
          grow_dev_folio  // bdev->bd_mapping
            __filemap_get_folio(FGP_LOCK | .. | FGP_CREAT)

So the order has already been there for decades.  Although EROFS has not
used buffer heads since its initial version, it needs a different
address_space to cache metadata in the page cache for best performance.

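The nesting in the path above can be modelled in userspace.  The following
is only a minimal sketch, not kernel code: struct toy_folio,
toy_folio_init() and toy_read_folio() are hypothetical names, and pthread
mutexes stand in for folio locks.  It shows a data folio that stays locked
while a folio belonging to a different mapping is locked for the metadata
read.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; not the kernel's struct folio. */
struct toy_folio {
	pthread_mutex_t lock;	/* models the folio lock */
	bool uptodate;
};

static void toy_folio_init(struct toy_folio *folio)
{
	pthread_mutex_init(&folio->lock, NULL);
	folio->uptodate = false;
}

/*
 * Models a read of a file-backed folio.  The caller must hold data->lock,
 * as callers of .read_folio()/.readahead() do.  Returns true if the meta
 * folio lock was taken while the data folio lock was still held.
 */
static bool toy_read_folio(struct toy_folio *data, struct toy_folio *meta)
{
	bool nested;

	assert(data && meta);

	/* Stands in for __filemap_get_folio(FGP_LOCK | ...) on the bdev mapping. */
	pthread_mutex_lock(&meta->lock);

	/* A default mutex that is already locked makes trylock return EBUSY. */
	nested = pthread_mutex_trylock(&data->lock) == EBUSY;
	if (!nested)
		pthread_mutex_unlock(&data->lock);

	meta->uptodate = true;			/* pretend the metadata I/O completed */
	pthread_mutex_unlock(&meta->lock);

	data->uptodate = meta->uptodate;	/* data can only be filled afterwards */
	return nested;
}
```

Calling toy_read_folio() with data->lock held returns true, i.e. the two
locks nest in the order data folio first, meta folio second.
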
In .read_folio() and .readahead() context, the order has to be
file-backed folios first and then bdev/meta folios, since it's hard to
use any other order and the file-backed folios cannot be filled without
up-to-date bdev/meta folios.

>
>>
>> The other place is remap_file_range().  Both inodes in that case must be
>> regular files,
>>          if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
>>                  return -EINVAL;
>> so this new rule is fine.

Referring to vfs_dedupe_file_range_compare() and vfs_lock_two_folios(), it
seems they only consider folio->index, regardless of address_spaces, too.

>>
>> Does anybody know of any _other_ ordering constraints on folio locks?  I'm
>> willing to write them down ...
>
> Personally I don't think out any particular order between two folio
> locks acrossing different inodes, so I think folio batching locking
> always needs to be taken care.

I think the folio_lock() comment about different address_spaces, added in
commit cd125eeab2de ("filemap: Update the folio_lock documentation"),
should be refined:

...
 * in the same address_space.  If they are in different address_spaces,
 * acquire the lock of the folio which belongs to the address_space which
 * has the lowest address in memory first.
 */
static inline void folio_lock(struct folio *folio)
{
...

since there are several cases where we cannot follow the comment above,
due to .read_folio(), .readahead() and more contexts.

I'm not sure how to document the order between different address_spaces,
so I think it's just "no particular order between different
address_spaces".

Thanks,
Gao Xiang
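
To illustrate why the rule in that comment cannot always be followed, here
is a minimal userspace sketch.  All names in it (toy_mapping, toy_folio,
toy_first_by_rule(), toy_read_path_follows_rule()) are hypothetical and
none of it is kernel API.  It compares the address-based order from the
comment with the order that the read path is forced to use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's address_space or folio. */
struct toy_mapping {
	int id;
};

struct toy_folio {
	struct toy_mapping *mapping;
};

/*
 * Rule from the folio_lock() comment: of two folios in different
 * address_spaces, lock the one whose address_space has the lower
 * address in memory first.
 */
static struct toy_folio *toy_first_by_rule(struct toy_folio *a,
					   struct toy_folio *b)
{
	return (uintptr_t)a->mapping < (uintptr_t)b->mapping ? a : b;
}

/*
 * The read path always locks the file-backed (data) folio first and
 * the bdev/meta folio second.  This checks whether that fixed order
 * happens to agree with the address-based rule for a given pair.
 */
static bool toy_read_path_follows_rule(struct toy_folio *data,
				       struct toy_folio *meta)
{
	return toy_first_by_rule(data, meta) == data;
}
```

Whenever the bdev/meta mapping sits at a lower address than the file's
mapping, the check returns false, although the read path has no other
order to choose from.
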