From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBC08C433FE for ; Wed, 16 Mar 2022 13:46:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EF0C38D0002; Wed, 16 Mar 2022 09:46:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EA0328D0001; Wed, 16 Mar 2022 09:46:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D68728D0002; Wed, 16 Mar 2022 09:46:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id C83A68D0001 for ; Wed, 16 Mar 2022 09:46:14 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 921E221CA7 for ; Wed, 16 Mar 2022 13:46:14 +0000 (UTC) X-FDA: 79250373468.06.1BEDA7D Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf25.hostedemail.com (Postfix) with ESMTP id 2FB71A0017 for ; Wed, 16 Mar 2022 13:46:12 +0000 (UTC) IronPort-Data: =?us-ascii?q?A9a23=3ARC8MSKm+Gx15ewHA4EoSTyro5gzqJ0RdPkR7XQ2?= =?us-ascii?q?eYbTBsI5bp2MEmmMcWWyPaa2CZ2D1KthxbYm/8EgDv5LTm9Y3HlRl+CA2RRqmi?= =?us-ascii?q?+KfW43BcR2Y0wB+jyH7ZBs+qZ1YM7EsFehsJpPnjkrrYuiJQUVUj/nSHOKmULe?= =?us-ascii?q?cY0ideCc/IMsfoUM68wIGqt4w6TSJK1vlVeLa+6UzCnf8s9JHGj58B5a4lf9al?= =?us-ascii?q?K+aVAX0EbAJTasjUFf2zxH5BX+ETE27ByOQroJ8RoZWSwtfpYxV8F81/z91Yj+?= =?us-ascii?q?kur39NEMXQL/OJhXIgX1TM0SgqkEa4HVsjeBgb7xBAatUo2zhc9RZ0shEs4ehD?= =?us-ascii?q?wkvJbHklvkfUgVDDmd1OqguFLrveCLl7pfIkhSZG5fr67A0ZK0sBqUU8/h2DUl?= =?us-ascii?q?A7/sdLyoHbwzFjOWzqJq7QelEh8ItNsDnMYoT/HZ6wlnxAf8gB5KFXKTO4d5R2?= =?us-ascii?q?SwYh8ZSEPKYbM0cARJjbgvHZRJnOVoNDp862uCyiRHXdzxetULQoK8f4Hbaxw8?= =?us-ascii?q?316LiWPLTZNCLQMB9mkeDunmA+2X/HwFcONGBoRKH+3ShwOTPgAv8QosZELD+/?= =?us-ascii?q?flv6HWXx2oOGFgYTle2v/S9olCxVsgZKEEO/Ccq668o+ySDStj7Qg39o3OeuBM?= =?us-ascii?q?Yc8RfHvd86wyXzKfQpQGDCQAsSj9HdcxjpMEtbSIl20XPnN7zAzFr9rqPRhqgG?= =?us-ascii?q?h28xd+pEXFNazZcOmlfFk1Yi+QPabob1nrnJuuP2obs5jEtJQzN/g=3D=3D?= IronPort-HdrOrdr: =?us-ascii?q?A9a23=3AVUg6hKlM/KmWHn+NaQObFCc54BzpDfIQ3DAb?= =?us-ascii?q?v31ZSRFFG/Fw9vre+MjzsCWYtN9/Yh8dcK+7UpVoLUm8yXcX2/h1AV7BZniEhI?= =?us-ascii?q?LAFugLgrcKqAeQeREWmNQ86Y5QN4B6CPDVSWNxlNvG5mCDeOoI8Z2q97+JiI7l?= =?us-ascii?q?o0tQcQ=3D=3D?= X-IronPort-AV: E=Sophos;i="5.88,333,1635177600"; d="scan'208";a="122733517" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 16 Mar 2022 21:46:09 +0800 Received: from G08CNEXMBPEKD06.g08.fujitsu.local (unknown [10.167.33.206]) by cn.fujitsu.com (Postfix) with ESMTP id 3582E4D16FC7; Wed, 16 Mar 2022 21:46:09 +0800 (CST) Received: from G08CNEXCHPEKD07.g08.fujitsu.local (10.167.33.80) by G08CNEXMBPEKD06.g08.fujitsu.local (10.167.33.206) with Microsoft SMTP Server (TLS) id 15.0.1497.23; Wed, 16 Mar 2022 21:46:08 +0800 Received: from [10.167.201.9] (10.167.201.9) by G08CNEXCHPEKD07.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.23 via Frontend Transport; Wed, 16 Mar 2022 21:46:08 +0800 Message-ID: <4fd95f0b-106f-6933-7bc6-9f0890012b53@fujitsu.com> Date: Wed, 16 Mar 2022 21:46:07 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.6.0 Subject: Re: [PATCH v11 1/8] dax: Introduce holder for dax_device To: Dan Williams CC: Linux Kernel Mailing List , linux-xfs , Linux NVDIMM , Linux MM , linux-fsdevel , "Darrick J. Wong" , david , Christoph Hellwig , Jane Chu References: <20220227120747.711169-1-ruansy.fnst@fujitsu.com> <20220227120747.711169-2-ruansy.fnst@fujitsu.com> From: Shiyang Ruan In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-yoursite-MailScanner-ID: 3582E4D16FC7.A3D86 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com X-Stat-Signature: hjgt14ac1nbm9x7aj3pos57em49yr9jj Authentication-Results: imf25.hostedemail.com; dkim=none; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=fujitsu.com (policy=none); spf=none (imf25.hostedemail.com: domain of ruansy.fnst@fujitsu.com has no SPF policy when checking 183.91.158.132) smtp.mailfrom=ruansy.fnst@fujitsu.com X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 2FB71A0017 X-HE-Tag: 1647438372-229511 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: 在 2022/3/12 7:35, Dan Williams 写道: > On Sun, Feb 27, 2022 at 4:08 AM Shiyang Ruan wrote: >> >> To easily track filesystem from a pmem device, we introduce a holder for >> dax_device structure, and also its operation. This holder is used to >> remember who is using this dax_device: >> - When it is the backend of a filesystem, the holder will be the >> instance of this filesystem. >> - When this pmem device is one of the targets in a mapped device, the >> holder will be this mapped device. In this case, the mapped device >> has its own dax_device and it will follow the first rule. So that we >> can finally track to the filesystem we needed. >> >> The holder and holder_ops will be set when filesystem is being mounted, >> or an target device is being activated. >> >> Signed-off-by: Shiyang Ruan >> --- >> drivers/dax/super.c | 89 +++++++++++++++++++++++++++++++++++++++++++++ >> include/linux/dax.h | 32 ++++++++++++++++ >> 2 files changed, 121 insertions(+) >> >> diff --git a/drivers/dax/super.c b/drivers/dax/super.c >> index e3029389d809..da5798e19d57 100644 >> --- a/drivers/dax/super.c >> +++ b/drivers/dax/super.c >> @@ -21,6 +21,9 @@ >> * @cdev: optional character interface for "device dax" >> * @private: dax driver private data >> * @flags: state and boolean properties >> + * @ops: operations for dax_device >> + * @holder_data: holder of a dax_device: could be filesystem or mapped device >> + * @holder_ops: operations for the inner holder >> */ >> struct dax_device { >> struct inode inode; >> @@ -28,6 +31,8 @@ struct dax_device { >> void *private; >> unsigned long flags; >> const struct dax_operations *ops; >> + void *holder_data; >> + const struct dax_holder_operations *holder_ops; >> }; >> >> static dev_t dax_devt; >> @@ -193,6 +198,29 @@ int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff, >> } >> EXPORT_SYMBOL_GPL(dax_zero_page_range); >> >> +int dax_holder_notify_failure(struct dax_device *dax_dev, u64 off, >> + u64 len, int mf_flags) >> +{ >> + int rc, id; >> + >> + id = dax_read_lock(); >> + if (!dax_alive(dax_dev)) { >> + rc = -ENXIO; >> + goto out; >> + } >> + >> + if (!dax_dev->holder_ops) { >> + rc = -EOPNOTSUPP; > > I think it is ok to return success (0) for this case. All the caller > of dax_holder_notify_failure() wants to know is if the notification > was successfully delivered to the holder. If there is no holder > present then there is nothing to report. This is minor enough for me > to fix up locally if nothing else needs to be changed. I thought it could fall back to generic memory failure handler: mf_generic_kill_procs(), if holder_ops not exists. > >> + goto out; >> + } >> + >> + rc = dax_dev->holder_ops->notify_failure(dax_dev, off, len, mf_flags); >> +out: >> + dax_read_unlock(id); >> + return rc; >> +} >> +EXPORT_SYMBOL_GPL(dax_holder_notify_failure); >> + >> #ifdef CONFIG_ARCH_HAS_PMEM_API >> void arch_wb_cache_pmem(void *addr, size_t size); >> void dax_flush(struct dax_device *dax_dev, void *addr, size_t size) >> @@ -268,6 +296,10 @@ void kill_dax(struct dax_device *dax_dev) >> >> clear_bit(DAXDEV_ALIVE, &dax_dev->flags); >> synchronize_srcu(&dax_srcu); >> + >> + /* clear holder data */ >> + dax_dev->holder_ops = NULL; >> + dax_dev->holder_data = NULL; > > Isn't this another failure scenario? If kill_dax() is called while a > holder is still holding the dax_device that seems to be another > ->notify_failure scenario to tell the holder that the device is going > away and the holder has not released the device yet. Yes. I should call dax_holder_notify_failure() and then unregister the holder. > >> } >> EXPORT_SYMBOL_GPL(kill_dax); >> >> @@ -409,6 +441,63 @@ void put_dax(struct dax_device *dax_dev) >> } >> EXPORT_SYMBOL_GPL(put_dax); >> >> +/** >> + * dax_holder() - obtain the holder of a dax device >> + * @dax_dev: a dax_device instance >> + >> + * Return: the holder's data which represents the holder if registered, >> + * otherwize NULL. >> + */ >> +void *dax_holder(struct dax_device *dax_dev) >> +{ >> + if (!dax_alive(dax_dev)) >> + return NULL; > > It's safe for the holder to assume that it can de-reference > ->holder_data freely in its notify_handler callback because > dax_holder_notify_failure() arranges for the callback to run in > dax_read_lock() context. > > This is another minor detail that I can fixup locally. > >> + >> + return dax_dev->holder_data; >> +} >> +EXPORT_SYMBOL_GPL(dax_holder); >> + >> +/** >> + * dax_register_holder() - register a holder to a dax device >> + * @dax_dev: a dax_device instance >> + * @holder: a pointer to a holder's data which represents the holder >> + * @ops: operations of this holder >> + >> + * Return: negative errno if an error occurs, otherwise 0. >> + */ >> +int dax_register_holder(struct dax_device *dax_dev, void *holder, >> + const struct dax_holder_operations *ops) >> +{ >> + if (!dax_alive(dax_dev)) >> + return -ENXIO; >> + >> + if (cmpxchg(&dax_dev->holder_data, NULL, holder)) >> + return -EBUSY; >> + >> + dax_dev->holder_ops = ops; >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(dax_register_holder); >> + >> +/** >> + * dax_unregister_holder() - unregister the holder for a dax device >> + * @dax_dev: a dax_device instance >> + * @holder: the holder to be unregistered >> + * >> + * Return: negative errno if an error occurs, otherwise 0. >> + */ >> +int dax_unregister_holder(struct dax_device *dax_dev, void *holder) >> +{ >> + if (!dax_alive(dax_dev)) >> + return -ENXIO; >> + >> + if (cmpxchg(&dax_dev->holder_data, holder, NULL) != holder) >> + return -EBUSY; >> + dax_dev->holder_ops = NULL; >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(dax_unregister_holder); >> + >> /** >> * inode_dax: convert a public inode into its dax_dev >> * @inode: An inode with i_cdev pointing to a dax_dev >> diff --git a/include/linux/dax.h b/include/linux/dax.h >> index 9fc5f99a0ae2..262d7bad131a 100644 >> --- a/include/linux/dax.h >> +++ b/include/linux/dax.h >> @@ -32,8 +32,24 @@ struct dax_operations { >> int (*zero_page_range)(struct dax_device *, pgoff_t, size_t); >> }; >> >> +struct dax_holder_operations { >> + /* >> + * notify_failure - notify memory failure into inner holder device >> + * @dax_dev: the dax device which contains the holder >> + * @offset: offset on this dax device where memory failure occurs >> + * @len: length of this memory failure event > > Forgive me if this has been discussed before, but since dax_operations > are in terms of pgoff and nr pages and memory_failure() is in terms of > pfns what was the rationale for making the function signature byte > based? Maybe I didn't describe it clearly... The @offset and @len here are byte-based. And so is ->memory_failure(). You can find the implementation of ->memory_failure() in 3rd patch: +static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap, + phys_addr_t addr, u64 len, int mf_flags) +{ + struct pmem_device *pmem = + container_of(pgmap, struct pmem_device, pgmap); + u64 offset = addr - pmem->phys_addr - pmem->data_offset; + + return dax_holder_notify_failure(pmem->dax_dev, offset, len, mf_flags); +} > > I want to get this series merged into linux-next shortly after > v5.18-rc1. Then we can start working on incremental fixups rather > resending the full series with these long reply cycles. Thanks. That really helps. -- Ruan.