Date: Mon, 20 Feb 2023 18:48:38 -0800 (PST)
From: Hugh Dickins
To: "Huang, Ying"
cc: Hugh Dickins, Andrew Morton, Jan Kara, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Zi Yan,
    Yang Shi, Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
    Alistair Popple, Xin Hao, Minchan Kim, Mike Kravetz,
    Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Xu, Pengfei", Christoph Hellwig,
    Stefan Roesch, Tejun Heo
Subject: Re: [PATCH -v5 0/9] migrate_pages(): batch TLB flushing
In-Reply-To: <874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com>
Message-ID: <2ab4b33e-f570-a6ff-6315-7d5a4614a7bd@google.com>
References: <20230213123444.155149-1-ying.huang@intel.com>
    <87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com>
    <874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Mon, 20 Feb 2023, Huang, Ying wrote:
> Hi, Hugh,
>
> Hugh Dickins writes:
>
> > On Mon, 13 Feb 2023, Huang Ying wrote:
> >
> >> From: "Huang, Ying"
> >>
> >> Now, migrate_pages() migrate folios one by one, like the fake code as
> >> follows,
> >>
> >>   for each folio
> >>     unmap
> >>     flush TLB
> >>     copy
> >>     restore map
> >>
> >> If multiple folios are passed to migrate_pages(), there are
> >> opportunities to batch the TLB flushing and copying. That is, we can
> >> change the code to something as follows,
> >>
> >>   for each folio
> >>     unmap
> >>   for each folio
> >>     flush TLB
> >>   for each folio
> >>     copy
> >>   for each folio
> >>     restore map
> >>
> >> The total number of TLB flushing IPI can be reduced considerably. And
> >> we may use some hardware accelerator such as DSA to accelerate the
> >> folio copying.
> >>
> >> So in this patch, we refactor the migrate_pages() implementation and
> >> implement the TLB flushing batching. Base on this, hardware
> >> accelerated folio copying can be implemented.
> >>
> >> If too many folios are passed to migrate_pages(), in the naive batched
> >> implementation, we may unmap too many folios at the same time. The
> >> possibility for a task to wait for the migrated folios to be mapped
> >> again increases. So the latency may be hurt. To deal with this
> >> issue, the max number of folios be unmapped in batch is restricted to
> >> no more than HPAGE_PMD_NR in the unit of page. That is, the influence
> >> is at the same level of THP migration.
> >>
> >> We use the following test to measure the performance impact of the
> >> patchset,
> >>
> >> On a 2-socket Intel server,
> >>
> >>  - Run pmbench memory accessing benchmark
> >>
> >>  - Run `migratepages` to migrate pages of pmbench between node 0 and
> >>    node 1 back and forth.
> >>
> >> With the patch, the TLB flushing IPI reduces 99.1% during the test and
> >> the number of pages migrated successfully per second increases 291.7%.
> >>
> >> Xin Hao helped to test the patchset on an ARM64 server with 128 cores,
> >> 2 NUMA nodes. Test results show that the page migration performance
> >> increases up to 78%.
> >>
> >> This patchset is based on mm-unstable 2023-02-10.
> >
> > And back in linux-next this week: I tried next-20230217 overnight.
> >
> > There is a deadlock in this patchset (and in previous versions: sorry
> > it's taken me so long to report), but I think one that's easily solved.
> >
> > I've not bisected to precisely which patch (load can take several hours
> > to hit the deadlock), but it doesn't really matter, and I expect that
> > you can guess.
> >
> > My root and home filesystems are ext4 (4kB blocks with 4kB PAGE_SIZE),
> > and so is the filesystem I'm testing, ext4 on /dev/loop0 on tmpfs.
> > So, plenty of ext4 page cache and buffer_heads.
> >
> > Again and again, the deadlock is seen with buffer_migrate_folio_norefs(),
> > either in kcompactd0 or in khugepaged trying to compact, or in both:
> > it ends up calling __lock_buffer(), and that schedules away, waiting
> > forever to get BH_lock. I have not identified who is holding BH_lock,
> > but I imagine a jbd2 journalling thread, and presume that it wants one
> > of the folio locks which migrate_pages_batch() is already holding; or
> > maybe it's all more convoluted than that. Other tasks then back up
> > waiting on those folio locks held in the batch.
> >
> > Never a problem with buffer_migrate_folio(), always with the "more
> > careful" buffer_migrate_folio_norefs(). And the patch below fixes
> > it for me: I've had enough hours with it now, on enough occasions,
> > to be confident of that.
> >
> > Cc'ing Jan Kara, who knows buffer_migrate_folio_norefs() and jbd2
> > very well, and I hope can assure us that there is an understandable
> > deadlock here, from holding several random folio locks, then trying
> > to lock buffers. Cc'ing fsdevel, because there's a risk that mm
> > folk think something is safe, when it's not sufficient to cope with
> > the diversity of filesystems. I hope nothing more than the below is
> > needed (and I've had no other problems with the patchset: good job),
> > but cannot be sure.
> >
> > [PATCH next] migrate_pages: fix deadlock on buffer heads
> >
> > When __buffer_migrate_folio() is called from buffer_migrate_folio_norefs(),
> > force MIGRATE_ASYNC mode so that buffer_migrate_lock_buffers() will only
> > trylock_buffer(), failing with -EAGAIN as usual if that does not succeed.
> >
> > Signed-off-by: Hugh Dickins
> >
> > --- next-20230217/mm/migrate.c
> > +++ fixed/mm/migrate.c
> > @@ -748,7 +748,8 @@ static int __buffer_migrate_folio(struct
> >  	if (folio_ref_count(src) != expected_count)
> >  		return -EAGAIN;
> >
> > -	if (!buffer_migrate_lock_buffers(head, mode))
> > +	if (!buffer_migrate_lock_buffers(head,
> > +				check_refs ? MIGRATE_ASYNC : mode))
> >  		return -EAGAIN;
> >
> >  	if (check_refs) {
>
> Thank you very much for pointing this out and the fix patch. Today, my
> colleague Pengfei reported a deadlock bug to me. It seems that we
> cannot wait the writeback to complete when we have locked some folios.
> Below patch can fix that deadlock. I don't know whether this is related
> to the deadlock you run into. It appears that we should avoid to
> lock/wait synchronously if we have locked more than one folios.

Thanks, I've checked now, on next-20230217 without my patch but
with your patch below: it took a few hours, but still deadlocks
as I described above, so it's not the same issue.

Yes, that's a good principle, that we should avoid to lock/wait
synchronously once we have locked one folio (hmm, above you say
"more than one": I think we mean the same thing, we're just
stating it differently, given how the code runs at present).

I'm not a great fan of migrate_folio_unmap()'s arguments,
"force" followed by "oh, but don't force" (but applaud the recent
"avoid_force_lock" as much better than the original "force_lock").

I haven't tried, but I wonder if you can avoid both those arguments,
and both of these patches, by passing down an adjusted mode (perhaps
MIGRATE_ASYNC, or perhaps a new mode) to all callees, once the first
folio of a batch has been acquired (then restore to the original mode
when starting a new batch).

(My patch is weak in that it trylocks for buffer_head even on the
first folio of a MIGRATE_SYNC norefs batch, although that has never
given a problem historically: adjusting the mode after acquiring
the first folio would correct that weakness.)

Hugh

>
> Best Regards,
> Huang, Ying
>
> ------------------------------------8<------------------------------------
> From 0699fa2f80a67e863107d49a25909c92b900a9be Mon Sep 17 00:00:00 2001
> From: Huang Ying
> Date: Mon, 20 Feb 2023 14:56:34 +0800
> Subject: [PATCH] migrate_pages: fix deadlock on waiting writeback
>
> Pengfei reported a system soft lockup issue with Syzkaller. The stack
> traces are as follows,
>
> ...
> [ 300.124933] INFO: task kworker/u4:3:73 blocked for more than 147 seconds.
> [ 300.125214] Not tainted 6.2.0-rc4-kvm+ #1314
> [ 300.125408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 300.125736] task:kworker/u4:3 state:D stack:0 pid:73 ppid:2 flags:0x00004000
> [ 300.126059] Workqueue: writeback wb_workfn (flush-7:3)
> [ 300.126282] Call Trace:
> [ 300.126378]
> [ 300.126464] __schedule+0x43b/0xd00
> [ 300.126601] ? __blk_flush_plug+0x142/0x180
> [ 300.126765] schedule+0x6a/0xf0
> [ 300.126912] io_schedule+0x4a/0x80
> [ 300.127051] folio_wait_bit_common+0x1b5/0x4e0
> [ 300.127227] ? __pfx_wake_page_function+0x10/0x10
> [ 300.127403] __folio_lock+0x27/0x40
> [ 300.127541] write_cache_pages+0x350/0x870
> [ 300.127699] ? __pfx_iomap_do_writepage+0x10/0x10
> [ 300.127889] iomap_writepages+0x3f/0x80
> [ 300.128037] xfs_vm_writepages+0x94/0xd0
> [ 300.128192] ? __pfx_xfs_vm_writepages+0x10/0x10
> [ 300.128370] do_writepages+0x10a/0x240
> [ 300.128514] ? lock_is_held_type+0xe6/0x140
> [ 300.128675] __writeback_single_inode+0x9f/0xa90
> [ 300.128854] writeback_sb_inodes+0x2fb/0x8d0
> [ 300.129030] __writeback_inodes_wb+0x68/0x150
> [ 300.129212] wb_writeback+0x49c/0x770
> [ 300.129357] wb_workfn+0x6fb/0x9d0
> [ 300.129500] process_one_work+0x3cc/0x8d0
> [ 300.129669] worker_thread+0x66/0x630
> [ 300.129824] ? __pfx_worker_thread+0x10/0x10
> [ 300.129989] kthread+0x153/0x190
> [ 300.130116] ? __pfx_kthread+0x10/0x10
> [ 300.130264] ret_from_fork+0x29/0x50
> [ 300.130409]
> [ 300.179347] INFO: task repro:1023 blocked for more than 147 seconds.
> [ 300.179905] Not tainted 6.2.0-rc4-kvm+ #1314
> [ 300.180317] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 300.180955] task:repro state:D stack:0 pid:1023 ppid:360 flags:0x00004004
> [ 300.181660] Call Trace:
> [ 300.181879]
> [ 300.182085] __schedule+0x43b/0xd00
> [ 300.182407] schedule+0x6a/0xf0
> [ 300.182694] io_schedule+0x4a/0x80
> [ 300.183020] folio_wait_bit_common+0x1b5/0x4e0
> [ 300.183506] ? compaction_alloc+0x77/0x1150
> [ 300.183892] ? __pfx_wake_page_function+0x10/0x10
> [ 300.184304] folio_wait_bit+0x30/0x40
> [ 300.184640] folio_wait_writeback+0x2e/0x1e0
> [ 300.185034] migrate_pages_batch+0x555/0x1ac0
> [ 300.185462] ? __pfx_compaction_alloc+0x10/0x10
> [ 300.185808] ? __pfx_compaction_free+0x10/0x10
> [ 300.186022] ? __this_cpu_preempt_check+0x17/0x20
> [ 300.186234] ? lock_is_held_type+0xe6/0x140
> [ 300.186423] migrate_pages+0x100e/0x1180
> [ 300.186603] ? __pfx_compaction_free+0x10/0x10
> [ 300.186800] ? __pfx_compaction_alloc+0x10/0x10
> [ 300.187011] compact_zone+0xe10/0x1b50
> [ 300.187182] ? lock_is_held_type+0xe6/0x140
> [ 300.187374] ? check_preemption_disabled+0x80/0xf0
> [ 300.187588] compact_node+0xa3/0x100
> [ 300.187755] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [ 300.187993] ? _find_first_bit+0x7b/0x90
> [ 300.188171] sysctl_compaction_handler+0x5d/0xb0
> [ 300.188376] proc_sys_call_handler+0x29d/0x420
> [ 300.188583] proc_sys_write+0x2b/0x40
> [ 300.188749] vfs_write+0x3a3/0x780
> [ 300.188912] ksys_write+0xb7/0x180
> [ 300.189070] __x64_sys_write+0x26/0x30
> [ 300.189260] do_syscall_64+0x3b/0x90
> [ 300.189424] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 300.189654] RIP: 0033:0x7f3a2471f59d
> [ 300.189815] RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
> [ 300.190137] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
> [ 300.190397] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
> [ 300.190653] RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
> [ 300.190910] R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
> [ 300.191172] R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
> [ 300.191440]
> ...
>
> To migrate a folio, we may wait the writeback of a folio to complete
> when we already have held the lock of some folios. But the writeback
> code may wait to lock some folio we held lock. This causes the
> deadlock. To fix the issue, we will avoid to wait the writeback to
> complete if we have locked some folios. After moving the locked
> folios and unlocked, we will retry.
>
> Signed-off-by: "Huang, Ying"
> Reported-by: "Xu, Pengfei"
> Cc: Hugh Dickins
> Cc: Christoph Hellwig
> Cc: Stefan Roesch
> Cc: Tejun Heo
> Cc: Xin Hao
> Cc: Zi Yan
> Cc: Yang Shi
> Cc: Baolin Wang
> Cc: Matthew Wilcox
> Cc: Mike Kravetz
> ---
>  mm/migrate.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 28b435cdeac8..bc9a8050f1b0 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1205,6 +1205,18 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>  		}
>  		if (!force)
>  			goto out;
> +		/*
> +		 * We have locked some folios and are going to wait the
> +		 * writeback of this folio to complete. But it's possible for
> +		 * the writeback to wait to lock the folios we have locked. To
> +		 * avoid a potential deadlock, let's bail out and not do that.
> +		 * The locked folios will be moved and unlocked, then we
> +		 * can wait the writeback of this folio.
> +		 */
> +		if (avoid_force_lock) {
> +			rc = -EDEADLOCK;
> +			goto out;
> +		}
>  		folio_wait_writeback(src);
>  	}
>
> --
> 2.39.1
>
>
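
For illustration only, here is a minimal userspace sketch of the
"adjusted mode" idea floated in the reply: a batch starts in the caller's
mode, and once the first lock of the batch is held, every later
acquisition is downgraded to a trylock. This is not kernel code and not a
proposed patch. The names lock_mode, struct item, item_lock(),
lock_batch() and unlock_batch() are all hypothetical, and a spinning
atomic_flag merely stands in for sleeping on a folio lock or BH_lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for MIGRATE_ASYNC / MIGRATE_SYNC; not kernel types. */
enum lock_mode { MODE_ASYNC, MODE_SYNC };

/* Hypothetical lockable object standing in for a folio or buffer_head. */
struct item {
	atomic_flag locked;
};

/* Try once; returns true if the lock was taken. */
static bool item_trylock(struct item *it)
{
	return !atomic_flag_test_and_set_explicit(&it->locked,
						  memory_order_acquire);
}

static void item_unlock(struct item *it)
{
	atomic_flag_clear_explicit(&it->locked, memory_order_release);
}

/* MODE_SYNC waits for the lock; MODE_ASYNC gives up if it is contended. */
static bool item_lock(struct item *it, enum lock_mode mode)
{
	if (mode == MODE_ASYNC)
		return item_trylock(it);
	while (!item_trylock(it))
		;	/* spin: stands in for sleeping on the lock */
	return true;
}

/*
 * Lock as many of the n items as possible for one batch, recording the
 * result in held[]. Returns the number of locks acquired.
 */
static size_t lock_batch(struct item *items, bool *held, size_t n,
			 enum lock_mode mode)
{
	enum lock_mode cur = mode;	/* each batch starts in the caller's mode */
	size_t nr_held = 0;

	for (size_t i = 0; i < n; i++) {
		held[i] = item_lock(&items[i], cur);
		if (held[i]) {
			nr_held++;
			cur = MODE_ASYNC;	/* never wait while holding a lock */
		}
	}
	return nr_held;
}

/* Release every lock recorded in held[] and clear the record. */
static void unlock_batch(struct item *items, bool *held, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (held[i]) {
			item_unlock(&items[i]);
			held[i] = false;
		}
	}
}
```

In this sketch an item that could not be locked is simply skipped, which
corresponds to the -EAGAIN outcome: the caller would retry it in a later
batch, which again starts in the original mode.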