From: Shiyang Ruan
Date: Mon, 23 Oct 2023 15:26:52 +0800
Subject: Re: [PATCH v15] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind
To: Chandan Babu R
Cc: akpm@linux-foundation.org, "Darrick J. Wong", linux-fsdevel@vger.kernel.org,
 nvdimm@lists.linux.dev, linux-xfs@vger.kernel.org, linux-mm@kvack.org,
 dan.j.williams@intel.com, willy@infradead.org, jack@suse.cz, mcgrof@kernel.org
Message-ID: <834497bc-0876-43bb-bd67-154ad7f26af3@fujitsu.com>
In-Reply-To: <87msw9zvpk.fsf@debian-BULLSEYE-live-builder-AMD64>
References: <20230828065744.1446462-1-ruansy.fnst@fujitsu.com>
 <20230928103227.250550-1-ruansy.fnst@fujitsu.com>
 <875y31wr2d.fsf@debian-BULLSEYE-live-builder-AMD64>
 <20231020154009.GS3195650@frogsfrogsfrogs>
 <87msw9zvpk.fsf@debian-BULLSEYE-live-builder-AMD64>

On 2023/10/23 14:40, Chandan Babu R wrote:
>
> On Fri, Oct 20, 2023 at 08:40:09 AM -0700, Darrick J. Wong wrote:
>> On Fri, Oct 20, 2023 at 03:26:32PM +0530, Chandan Babu R wrote:
>>> On Thu, Sep 28, 2023 at 06:32:27 PM +0800, Shiyang Ruan wrote:
>>>> ====
>>>> Changes since v14:
>>>> 1.
>>>>    added/fixed code comments per Dan's comments
>>>> ====
>>>>
>>>> Now, if we suddenly remove a PMEM device(by calling unbind) which
>>>> contains FSDAX while programs are still accessing data in this device,
>>>> e.g.:
>>>> ```
>>>> $FSSTRESS_PROG -d $SCRATCH_MNT -n 99999 -p 4 &
>>>> # $FSX_PROG -N 1000000 -o 8192 -l 500000 $SCRATCH_MNT/t001 &
>>>> echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
>>>> ```
>>>> it could come into an unacceptable state:
>>>>   1. device has gone but mount point still exists, and umount will fail
>>>>      with "target is busy"
>>>>   2. programs will hang and cannot be killed
>>>>   3. may crash with NULL pointer dereference
>>>>
>>>> To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we
>>>> are going to remove the whole device, and make sure all related processes
>>>> could be notified so that they could end up gracefully.
>>>>
>>>> This patch is inspired by Dan's "mm, dax, pmem: Introduce
>>>> dev_pagemap_failure()"[1]. With the help of dax_holder and
>>>> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
>>>> on it to unmap all files in use, and notify processes who are using
>>>> those files.
>>>>
>>>> Call trace:
>>>> trigger unbind
>>>>  -> unbind_store()
>>>>   -> ... (skip)
>>>>    -> devres_release_all()
>>>>     -> kill_dax()
>>>>      -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>>>>       -> xfs_dax_notify_failure()
>>>>       `-> freeze_super()             // freeze (kernel call)
>>>>       `-> do xfs rmap
>>>>       ` -> mf_dax_kill_procs()
>>>>       `  -> collect_procs_fsdax()    // all associated processes
>>>>       `  -> unmap_and_kill()
>>>>       ` -> invalidate_inode_pages2_range() // drop file's cache
>>>>       `-> thaw_super()               // thaw (both kernel & user call)
>>>>
>>>> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
>>>> event. Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
>>>> new dax mapping from being created.
>>>> Do not shutdown filesystem directly
>>>> if configuration is not supported, or if failure range includes metadata
>>>> area. Make sure all files and processes(not only the current progress)
>>>> are handled correctly. Also drop the cache of associated files before
>>>> pmem is removed.
>>>>
>>>> [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>>>> [2]: https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/
>>>>
>>>> Signed-off-by: Shiyang Ruan
>>>> Reviewed-by: Darrick J. Wong
>>>> Acked-by: Dan Williams
>>>
>>> Hi Andrew,
>>>
>>> Shiyang had indicated that this patch has been added to
>>> akpm/mm-hotfixes-unstable branch. However, I don't see the patch listed in
>>> that branch.
>>>
>>> I am about to start collecting XFS patches for v6.7 cycle. Please let me know
>>> if you have any objections with me taking this patch via the XFS tree.
>>
>> V15 was dropped from his tree on 28 Sept., you might as well pull it
>> into your own tree for 6.7. It's been testing fine on my trees for the
>> past 3 weeks.
>>
>> https://lore.kernel.org/mm-commits/20230928172815.EE6AFC433C8@smtp.kernel.org/
>
> Shiyang, this patch does not apply cleanly on v6.6-rc7. Can you please rebase
> the patch on v6.6-rc7 and send it to the mailing list?

Sure. I have rebased it and sent a v15.1. Please check it:
https://lore.kernel.org/linux-xfs/20231023072046.1626474-1-ruansy.fnst@fujitsu.com/

--
Thanks,
Ruan.
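
For readers following the quoted call trace, the ordering it describes (freeze, notify every process that maps a file, drop the file's cache, thaw) can be sketched as a small user-space model. This is only an illustration and not kernel code: `ToyFilesystem`, `map_file`, `notify_failure`, the log strings, and the string value given to `MF_MEM_PRE_REMOVE` are all hypothetical names made up for the sketch; only the flag name and the order of steps come from the patch description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Flag name taken from the patch subject; the string value is illustrative only.
MF_MEM_PRE_REMOVE = "MF_MEM_PRE_REMOVE"


@dataclass
class ToyFilesystem:
    """Hypothetical user-space stand-in for a filesystem on a pmem device."""

    mappings: Dict[str, Set[int]]  # file name -> pids that map the file
    frozen: bool = False
    log: List[str] = field(default_factory=list)

    def map_file(self, name: str, pid: int) -> None:
        # While the filesystem is frozen, no new mapping may be created.
        if self.frozen:
            raise RuntimeError("filesystem is frozen")
        self.mappings.setdefault(name, set()).add(pid)

    def notify_failure(self, flags: str) -> Set[int]:
        """Freeze, notify every process mapping a file, drop caches, thaw."""
        if flags != MF_MEM_PRE_REMOVE:
            raise ValueError("this toy only models the pre-remove event")
        notified: Set[int] = set()
        self.frozen = True
        self.log.append("freeze")
        try:
            for name, pids in self.mappings.items():
                notified |= pids  # collect all associated processes
                self.log.append(f"kill_procs:{name}")
                pids.clear()  # their mappings are torn down
                self.log.append(f"invalidate:{name}")  # drop the file's cache
        finally:
            # Always thaw, even if handling a file raised an error.
            self.frozen = False
            self.log.append("thaw")
        return notified


fs = ToyFilesystem({"t001": {101, 102}, "t002": {103}})
print(sorted(fs.notify_failure(MF_MEM_PRE_REMOVE)))
print(fs.log)
```

The point of the model is the bracketing: every file is handled between the freeze and the thaw, so a racing `map_file()` cannot add a mapping that the walk would miss, and all mappers are collected rather than only the current process.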