Message-ID: <6bdf6085-101d-47ef-86f4-87936622345a@efficios.com>
Date: Fri, 2 Feb 2024 14:29:05 -0500
Subject: Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime
From: Mathieu Desnoyers
To: Dan Williams, Arnd Bergmann, Dave Chinner
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Linus Torvalds,
 linux-mm@kvack.org, linux-arch@vger.kernel.org, Vishal Verma, Dave Jiang,
 Matthew Wilcox, Russell King, nvdimm@lists.linux.dev,
 linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 dm-devel@lists.linux.dev
In-Reply-To: <65bd284165177_2d43c29443@dwillia2-mobl3.amr.corp.intel.com.notmuch>
References: <20240131162533.247710-1-mathieu.desnoyers@efficios.com>
 <20240131162533.247710-3-mathieu.desnoyers@efficios.com>
 <65bab567665f3_37ad2943c@dwillia2-xfh.jf.intel.com.notmuch>
 <0a38176b-c453-4be0-be83-f3e1bb897973@efficios.com>
 <65bac71a9659b_37ad29428@dwillia2-xfh.jf.intel.com.notmuch>
 <65bd284165177_2d43c29443@dwillia2-mobl3.amr.corp.intel.com.notmuch>

On 2024-02-02 12:37, Dan Williams wrote:
> Mathieu Desnoyers wrote:
[...]
>>
>
>> The alternative route I intend to take is to audit all callers
>> of alloc_dax() and make sure they all save the alloc_dax() return
>> value in a struct dax_device * local variable first for the sake
>> of checking for IS_ERR(). This will leave the xyz->dax_dev pointer
>> initialized to NULL in the error case and simplify the rest of
>> error checking.
>
> I could maybe get on board with that, but it needs a comment somewhere
> about the asymmetric subtlety.

Is this "somewhere" at every alloc_dax() call site, or do you have
something else in mind ?

>
>>
>>
>>>  		return;
>>>
>>>  	if (dax_dev->holder_data != NULL)
>>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>>> index 4e8fdcb3f1c8..b69c9e442cf4 100644
>>> --- a/drivers/nvdimm/pmem.c
>>> +++ b/drivers/nvdimm/pmem.c
>>> @@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev,
>>>  	dax_dev = alloc_dax(pmem, &pmem_dax_ops);
>>>  	if (IS_ERR(dax_dev)) {
>>>  		rc = PTR_ERR(dax_dev);
>>> -		goto out;
>>> +		if (rc != -EOPNOTSUPP)
>>> +			goto out;
>>
>> If I compare the before / after this change, if previously
>> pmem_attach_disk() was called in a configuration with FS_DAX=n, it would
>> result in a NULL pointer dereference.
>
> No, alloc_dax() only returns NULL CONFIG_DAX=n case, not the
> CONFIG_FS_DAX=n case.

Indeed, I was wrong there.

> So that means that pmem devices on ARM have been
> possible without FS_DAX.
> So, in order for alloc_dax() returning
> ERR_PTR(-EOPNOTSUPP) to not regress pmem device availability this error
> path needs to be changed.

Good point. We're moving the depends on !(ARM || MIPS || SPARC) from FS_DAX
Kconfig to a runtime check in alloc_dax(), which is used whenever DAX=y,
which includes configurations that had FS_DAX=n previously.

I'll change the error path in pmem_attach_disk() to treat an -EOPNOTSUPP
alloc_dax() return value as non-fatal.

>
>> This would be an error handling fix all by itself. Do we really want
>> to return successfully if dax is unsupported, or should we return
>> an error here ?
>
> Per above, there is no error handling fix, and pmem block device
> available should not depend on alloc_dax() succeeding.

I agree on treating alloc_dax() failure as non-fatal. There is
however one error handling fix to nvdimm/pmem which I plan to
introduce as an initial patch before this change:

nvdimm/pmem: Fix leak on dax_add_host() failure

Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done
before setting pmem->dax_dev, which therefore issues the two following
calls on NULL pointers:

out_cleanup_dax:
	kill_dax(pmem->dax_dev);
	put_dax(pmem->dax_dev);

Signed-off-by: Mathieu Desnoyers

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4e8fdcb3f1c8..9fe358090720 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -566,12 +566,11 @@ static int pmem_attach_disk(struct device *dev,
 	set_dax_nomc(dax_dev);
 	if (is_nvdimm_sync(nd_region))
 		set_dax_synchronous(dax_dev);
+	pmem->dax_dev = dax_dev;
 	rc = dax_add_host(dax_dev, disk);
 	if (rc)
 		goto out_cleanup_dax;
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
-	pmem->dax_dev = dax_dev;
-
 	rc = device_add_disk(dev, disk, pmem_attribute_groups);
 	if (rc)
 		goto out_remove_host;

>
> The real question is what to do about device-dax. I *think* it is not
> affected by cpu_dcache aliasing because it never accesses user mappings
> through a kernel alias.
> I doubt device-dax is in use on these platforms,
> but we might need another fixup for that if someone screams about the
> alloc_dax() behavior change making them lose device-dax access.

By "device-dax", I understand you mean drivers/dax/Kconfig:DEV_DAX.

Based on your analysis, is alloc_dax() still the right spot where
to place this runtime check ? Which call sites are responsible
for invoking alloc_dax() for device-dax ?

If we know which call sites do not intend to use the kernel linear
mapping, we could introduce a flag (or a new variant of the alloc_dax()
API) that would either enforce or skip the check.

[...]

>>
>> Here what I'm seeing so far:
>>
>> - devm_release_mem_region() is never called after devm_request_mem_region(). Not
>>   on error, neither on teardown,
>
> devm_release_mem_region() is called from virtio_fs_probe() context. That

I guess you mean "devm_request_mem_region()" here.

> means that when virtio_fs_probe() returns an error the driver core will
> automatically call devm_request_mem_region().

And "devm_release_mem_region()" here.

>
>> - pgmap is never freed on error after devm_kzalloc.
>
> That is what the "devm_" in devm_kzalloc() does, free the memory on
> driver-probe failure, or after the driver remove callback is invoked.

Got it.

>
>>
>>>  {
>>> +	struct dax_device *dax_dev __free(cleanup_dax) = NULL;
>>>  	struct virtio_shm_region cache_reg;
>>>  	struct dev_pagemap *pgmap;
>>>  	bool have_cache;
>>> @@ -804,6 +808,15 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
>>>  	if (!IS_ENABLED(CONFIG_FUSE_DAX))
>>>  		return 0;
>>>
>>> +	dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
>>> +	if (IS_ERR(dax_dev)) {
>>> +		int rc = PTR_ERR(dax_dev);
>>> +
>>> +		if (rc == -EOPNOTSUPP)
>>> +			return 0;
>>> +		return rc;
>>> +	}
>>
>> What is gained by moving this allocation here ?
>
> The gain is to fail early in virtio_fs_setup_dax() since the fundamental
> dependency of alloc_dax() success is not met.
> For example why let the
> setup progress to devm_memremap_pages() when alloc_dax() is going to
> return ERR_PTR(-EOPNOTSUPP).

What I don't know is whether there is a dependency requiring to do
devm_request_mem_region(), devm_kzalloc(), devm_memremap_pages()
before calling alloc_dax() ?

Those 3 calls are used to populate:

	fs->window_phys_addr = (phys_addr_t) cache_reg.addr;
	fs->window_len = (phys_addr_t) cache_reg.len;

and then alloc_dax() takes "fs" as private data parameter. So it's
unclear to me whether we can swap the invocation order. I suspect
that it is not an issue because it is only used to populate
dax_dev->private, but I prefer to confirm this with you just to be
on the safe side.

[...]

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com