From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <70c9baf5-767e-b9ac-c27e-c51b44dc2472@fujitsu.com>
Date: Sat, 29 Jul 2023 18:01:00 +0800
Subject: Re: [PATCH v12 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
From: Shiyang Ruan
To: "Darrick J. Wong"
Cc: linux-mm@kvack.org, linux-xfs@vger.kernel.org, nvdimm@lists.linux.dev,
 linux-fsdevel@vger.kernel.org, dan.j.williams@intel.com, willy@infradead.org,
 jack@suse.cz, akpm@linux-foundation.org, mcgrof@kernel.org
References: <20230629081651.253626-1-ruansy.fnst@fujitsu.com>
 <20230629081651.253626-3-ruansy.fnst@fujitsu.com>
 <2840406d-0b7d-9897-87f6-ef3627e9ed5d@fujitsu.com>
 <20230714141834.GV108251@frogsfrogsfrogs>
 <191fbccb-173b-64d3-df6b-ec98973bddc3@fujitsu.com>
In-Reply-To: <191fbccb-173b-64d3-df6b-ec98973bddc3@fujitsu.com>

On 2023/7/20 9:50, Shiyang Ruan wrote:
>
>
> On 2023/7/14 22:18, Darrick J. Wong wrote:
>> On Fri, Jul 14, 2023 at 05:07:58PM +0800, Shiyang Ruan wrote:
>>> Hi Darrick,
>>>
>>> Thanks for applying the 1st patch.
>>>
>>> Now, since this patch is based on the new freeze_super()/thaw_super()
>>> api[1], I'd like to ask what's the plan for this api?  It seems to have
>>> missed the v6.5-rc1.
>>>
>>> [1]
>>> https://lore.kernel.org/linux-xfs/168688010689.860947.1788875898367401950.stgit@frogsfrogsfrogs/
>>
>> 6.6.  I intend to push the XFS UBSAN fixes to the list today for review.
>> Early next week I'll resend the 6.5 rebase of the kernelfreeze series
>> and push it to vfs-for-next.  Some time after that will come large folio
>> writes.
>
> Got it.  Thanks for your information!

A small request: if you have time to give some comments, I would
appreciate it, because I hope we can make the most of this period
(before the freeze API is merged in 6.6).

--
Thanks,
Ruan.

>
>
> --
> Ruan.
>
>>
>> --D
>>
>>>
>>> --
>>> Thanks,
>>> Ruan.
>>>
>>>
>>> On 2023/6/29 16:16, Shiyang Ruan wrote:
>>>> This patch is inspired by Dan's "mm, dax, pmem: Introduce
>>>> dev_pagemap_failure()"[1].  With the help of dax_holder and
>>>> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
>>>> on it to unmap all files in use, and notify processes who are using
>>>> those files.
>>>>
>>>> Call trace:
>>>> trigger unbind
>>>>    -> unbind_store()
>>>>     -> ... (skip)
>>>>      -> devres_release_all()
>>>>       -> kill_dax()
>>>>        -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>>>>         -> xfs_dax_notify_failure()
>>>>         `-> freeze_super()             // freeze (kernel call)
>>>>         `-> do xfs rmap
>>>>         ` -> mf_dax_kill_procs()
>>>>         `  -> collect_procs_fsdax()    // all associated processes
>>>>         `  -> unmap_and_kill()
>>>>         ` -> invalidate_inode_pages2_range() // drop file's cache
>>>>         `-> thaw_super()               // thaw (both kernel & user call)
>>>>
>>>> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
>>>> event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
>>>> new dax mapping from being created.  Do not shut down the filesystem
>>>> directly if configuration is not supported, or if failure range includes
>>>> metadata area.  Make sure all files and processes (not only the current
>>>> process) are handled correctly.  Also drop the cache of associated files
>>>> before pmem is removed.
>>>>
>>>> [1]:
>>>> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>>>> [2]:
>>>> https://lore.kernel.org/linux-xfs/168688010689.860947.1788875898367401950.stgit@frogsfrogsfrogs/
>>>>
>>>> Signed-off-by: Shiyang Ruan
>>>> ---
>>>>    drivers/dax/super.c         |  3 +-
>>>>    fs/xfs/xfs_notify_failure.c | 86 ++++++++++++++++++++++++++++++++++---
>>>>    include/linux/mm.h          |  1 +
>>>>    mm/memory-failure.c         | 17 ++++++--
>>>>    4 files changed, 96 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
>>>> index c4c4728a36e4..2e1a35e82fce 100644
>>>> --- a/drivers/dax/super.c
>>>> +++ b/drivers/dax/super.c
>>>> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>>>>            return;
>>>>        if (dax_dev->holder_data != NULL)
>>>> -        dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
>>>> +        dax_holder_notify_failure(dax_dev, 0, U64_MAX,
>>>> +                MF_MEM_PRE_REMOVE);
>>>>        clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
>>>>        synchronize_srcu(&dax_srcu);
>>>> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
>>>> index 4a9bbd3fe120..f6ec56b76db6 100644
>>>> --- a/fs/xfs/xfs_notify_failure.c
>>>> +++ b/fs/xfs/xfs_notify_failure.c
>>>> @@ -22,6 +22,7 @@
>>>>    #include
>>>>    #include
>>>> +#include
>>>>    struct xfs_failure_info {
>>>>        xfs_agblock_t        startblock;
>>>> @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
>>>>        struct xfs_mount        *mp = cur->bc_mp;
>>>>        struct xfs_inode        *ip;
>>>>        struct xfs_failure_info        *notify = data;
>>>> +    struct address_space        *mapping;
>>>> +    pgoff_t                pgoff;
>>>> +    unsigned long            pgcnt;
>>>>        int                error = 0;
>>>>        if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>>>>            (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
>>>> +        /* Continue the query because this isn't a failure. */
>>>> +        if (notify->mf_flags & MF_MEM_PRE_REMOVE)
>>>> +            return 0;
>>>>            notify->want_shutdown = true;
>>>>            return 0;
>>>>        }
>>>> @@ -92,14 +99,55 @@ xfs_dax_failure_fn(
>>>>            return 0;
>>>>        }
>>>> -    error = mf_dax_kill_procs(VFS_I(ip)->i_mapping,
>>>> -                  xfs_failure_pgoff(mp, rec, notify),
>>>> -                  xfs_failure_pgcnt(mp, rec, notify),
>>>> -                  notify->mf_flags);
>>>> +    mapping = VFS_I(ip)->i_mapping;
>>>> +    pgoff = xfs_failure_pgoff(mp, rec, notify);
>>>> +    pgcnt = xfs_failure_pgcnt(mp, rec, notify);
>>>> +
>>>> +    /* Continue the rmap query if the inode isn't a dax file. */
>>>> +    if (dax_mapping(mapping))
>>>> +        error = mf_dax_kill_procs(mapping, pgoff, pgcnt,
>>>> +                      notify->mf_flags);
>>>> +
>>>> +    /* Invalidate the cache in dax pages. */
>>>> +    if (notify->mf_flags & MF_MEM_PRE_REMOVE)
>>>> +        invalidate_inode_pages2_range(mapping, pgoff,
>>>> +                          pgoff + pgcnt - 1);
>>>> +
>>>>        xfs_irele(ip);
>>>>        return error;
>>>>    }
>>>> +static void
>>>> +xfs_dax_notify_failure_freeze(
>>>> +    struct xfs_mount    *mp)
>>>> +{
>>>> +    struct super_block     *sb = mp->m_super;
>>>> +
>>>> +    /* Wait until no one is holding the FREEZE_HOLDER_KERNEL. */
>>>> +    while (freeze_super(sb, FREEZE_HOLDER_KERNEL) != 0) {
>>>> +        // Shall we just wait, or print warning then return -EBUSY?
>>>> +        delay(HZ / 10);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void
>>>> +xfs_dax_notify_failure_thaw(
>>>> +    struct xfs_mount    *mp)
>>>> +{
>>>> +    struct super_block    *sb = mp->m_super;
>>>> +    int            error;
>>>> +
>>>> +    error = thaw_super(sb, FREEZE_HOLDER_KERNEL);
>>>> +    if (error)
>>>> +        xfs_emerg(mp, "still frozen after notify failure, err=%d",
>>>> +              error);
>>>> +    /*
>>>> +     * Also thaw the userspace freeze anyway because the device is
>>>> +     * about to be removed immediately.
>>>> +     */
>>>> +    thaw_super(sb, FREEZE_HOLDER_USERSPACE);
>>>> +}
>>>> +
>>>>    static int
>>>>    xfs_dax_notify_ddev_failure(
>>>>        struct xfs_mount    *mp,
>>>> @@ -120,7 +168,7 @@ xfs_dax_notify_ddev_failure(
>>>>        error = xfs_trans_alloc_empty(mp, &tp);
>>>>        if (error)
>>>> -        return error;
>>>> +        goto out;
>>>>        for (; agno <= end_agno; agno++) {
>>>>            struct xfs_rmap_irec    ri_low = { };
>>>> @@ -165,11 +213,23 @@ xfs_dax_notify_ddev_failure(
>>>>        }
>>>>        xfs_trans_cancel(tp);
>>>> +
>>>> +    /*
>>>> +     * Determine how to shut down the filesystem according to the
>>>> +     * error code and flags.
>>>> +     */
>>>>        if (error || notify.want_shutdown) {
>>>>            xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
>>>>            if (!error)
>>>>                error = -EFSCORRUPTED;
>>>> -    }
>>>> +    } else if (mf_flags & MF_MEM_PRE_REMOVE)
>>>> +        xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
>>>> +
>>>> +out:
>>>> +    /* Thaw the fs if it was frozen before. */
>>>> +    if (mf_flags & MF_MEM_PRE_REMOVE)
>>>> +        xfs_dax_notify_failure_thaw(mp);
>>>> +
>>>>        return error;
>>>>    }
>>>> @@ -197,6 +257,8 @@ xfs_dax_notify_failure(
>>>>        if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
>>>>            mp->m_logdev_targp != mp->m_ddev_targp) {
>>>> +        if (mf_flags & MF_MEM_PRE_REMOVE)
>>>> +            return 0;
>>>>            xfs_err(mp, "ondisk log corrupt, shutting down fs!");
>>>>            xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
>>>>            return -EFSCORRUPTED;
>>>> @@ -210,6 +272,12 @@ xfs_dax_notify_failure(
>>>>        ddev_start = mp->m_ddev_targp->bt_dax_part_off;
>>>>        ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
>>>> +    /* Notify failure on the whole device. */
>>>> +    if (offset == 0 && len == U64_MAX) {
>>>> +        offset = ddev_start;
>>>> +        len = bdev_nr_bytes(mp->m_ddev_targp->bt_bdev);
>>>> +    }
>>>> +
>>>>        /* Ignore the range out of filesystem area */
>>>>        if (offset + len - 1 < ddev_start)
>>>>            return -ENXIO;
>>>> @@ -226,6 +294,12 @@ xfs_dax_notify_failure(
>>>>        if (offset + len - 1 > ddev_end)
>>>>            len = ddev_end - offset + 1;
>>>> +    if (mf_flags & MF_MEM_PRE_REMOVE) {
>>>> +        xfs_info(mp, "device is about to be removed!");
>>>> +        /* Freeze fs to prevent new mappings from being created. */
>>>> +        xfs_dax_notify_failure_freeze(mp);
>>>> +    }
>>>> +
>>>>        return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len),
>>>>                mf_flags);
>>>>    }
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 27ce77080c79..a80c255b88d2 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -3576,6 +3576,7 @@ enum mf_flags {
>>>>        MF_UNPOISON = 1 << 4,
>>>>        MF_SW_SIMULATED = 1 << 5,
>>>>        MF_NO_RETRY = 1 << 6,
>>>> +    MF_MEM_PRE_REMOVE = 1 << 7,
>>>>    };
>>>>    int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>>>>                  unsigned long count, int mf_flags);
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 5b663eca1f29..483b75f2fcfb 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -688,7 +688,7 @@ static void add_to_kill_fsdax(struct task_struct *tsk, struct page *p,
>>>>     */
>>>>    static void collect_procs_fsdax(struct page *page,
>>>>            struct address_space *mapping, pgoff_t pgoff,
>>>> -        struct list_head *to_kill)
>>>> +        struct list_head *to_kill, bool pre_remove)
>>>>    {
>>>>        struct vm_area_struct *vma;
>>>>        struct task_struct *tsk;
>>>> @@ -696,8 +696,15 @@ static void collect_procs_fsdax(struct page *page,
>>>>        i_mmap_lock_read(mapping);
>>>>        read_lock(&tasklist_lock);
>>>>        for_each_process(tsk) {
>>>> -        struct task_struct *t = task_early_kill(tsk, true);
>>>> +        struct task_struct *t = tsk;
>>>> +        /*
>>>> +         * Search for all tasks when MF_MEM_PRE_REMOVE is set, because
>>>> +         * the current task may not be the one accessing the fsdax page.
>>>> +         * Otherwise, search for the current task.
>>>> +         */
>>>> +        if (!pre_remove)
>>>> +            t = task_early_kill(tsk, true);
>>>>            if (!t)
>>>>                continue;
>>>>            vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>>>> @@ -1793,6 +1800,7 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>>>>        dax_entry_t cookie;
>>>>        struct page *page;
>>>>        size_t end = index + count;
>>>> +    bool pre_remove = mf_flags & MF_MEM_PRE_REMOVE;
>>>>        mf_flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
>>>> @@ -1804,9 +1812,10 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>>>>            if (!page)
>>>>                goto unlock;
>>>> -        SetPageHWPoison(page);
>>>> +        if (!pre_remove)
>>>> +            SetPageHWPoison(page);
>>>> -        collect_procs_fsdax(page, mapping, index, &to_kill);
>>>> +        collect_procs_fsdax(page, mapping, index, &to_kill, pre_remove);
>>>>            unmap_and_kill(&to_kill, page_to_pfn(page), mapping,
>>>>                    index, mf_flags);
>>>>    unlock:
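
For anyone following the freeze/thaw part of the quoted patch, here is a
small userspace sketch of why xfs_dax_notify_failure_thaw() releases both
holders.  It is only a toy model, not kernel code: the toy_* names are
hypothetical, and the rules it encodes (each holder may freeze once, a
repeated freeze by the same holder returns -EBUSY, and the filesystem stays
frozen until every holder has thawed) are my assumptions about the
freeze_super()/thaw_super() series referenced as [2], not something taken
from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FREEZE_HOLDER_KERNEL / FREEZE_HOLDER_USERSPACE. */
enum toy_holder {
	TOY_HOLDER_KERNEL    = 1 << 0,
	TOY_HOLDER_USERSPACE = 1 << 1,
};

/* Toy superblock: one bit per holder that currently keeps the fs frozen. */
struct toy_sb {
	unsigned int holders;
};

/* The fs counts as frozen while at least one holder has not thawed. */
static bool toy_is_frozen(const struct toy_sb *sb)
{
	return sb->holders != 0;
}

/* Assumed rule: a holder that already froze the fs gets -EBUSY. */
static int toy_freeze(struct toy_sb *sb, enum toy_holder who)
{
	assert(who == TOY_HOLDER_KERNEL || who == TOY_HOLDER_USERSPACE);
	if (sb->holders & who)
		return -EBUSY;
	sb->holders |= who;
	return 0;
}

/* Assumed rule: thawing on behalf of a holder that never froze fails. */
static int toy_thaw(struct toy_sb *sb, enum toy_holder who)
{
	assert(who == TOY_HOLDER_KERNEL || who == TOY_HOLDER_USERSPACE);
	if (!(sb->holders & who))
		return -EINVAL;
	sb->holders &= ~who;
	return 0;
}

/*
 * Same order as the quoted xfs_dax_notify_failure_thaw(): drop the kernel
 * freeze first, then drop any userspace freeze as well.  The second call
 * may fail if userspace never froze the fs; that result is ignored here.
 */
static void toy_pre_remove_thaw(struct toy_sb *sb)
{
	(void)toy_thaw(sb, TOY_HOLDER_KERNEL);
	(void)toy_thaw(sb, TOY_HOLDER_USERSPACE);
}
```

Under these assumptions, thawing only the kernel holder would leave the
filesystem frozen whenever userspace had frozen it before the unbind, which
seems to be the case the extra thaw_super(sb, FREEZE_HOLDER_USERSPACE) call
in the patch is meant to cover.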