From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 1 Oct 2021 14:45:11 -0300
From: Jason Gunthorpe
To: Logan Gunthorpe
Cc: Alistair Popple, Felix Kuehling, Christoph Hellwig, Dan Williams,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-mm@kvack.org, iommu@lists.linux-foundation.org,
	Stephen Bates, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Jakowski Andrzej, Minturn Dave B,
	Jason Ekstrand, Dave Hansen, Xiong Jianxin, Bjorn Helgaas,
	Ira Weiny, Robin Murphy, Martin Oliveira, Chaitanya Kulkarni
Subject: Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
Message-ID: <20211001174511.GQ3544071@ziepe.ca>
References: <20210916234100.122368-20-logang@deltatee.com>
	<20210928195518.GV3544071@ziepe.ca>
	<8d386273-c721-c919-9749-fc0a7dc1ed8b@deltatee.com>
	<20210929230543.GB3544071@ziepe.ca>
	<32ce26d7-86e9-f8d5-f0cf-40497946efe9@deltatee.com>
	<20210929233540.GF3544071@ziepe.ca>
	<20210930003652.GH3544071@ziepe.ca>
	<20211001134856.GN3544071@ziepe.ca>
	<4fdd337b-fa35-a909-5eee-823bfd1e9dc4@deltatee.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4fdd337b-fa35-a909-5eee-823bfd1e9dc4@deltatee.com>

On Fri, Oct 01, 2021 at 11:01:49AM -0600, Logan Gunthorpe wrote:

> In device-dax, the refcount is only used to prevent the device, and
> therefore the pages, from going away on device unbind. Pages cannot be
> recycled, as you say, as they are mapped linearly within the device. The
> address space invalidation is done only when the device is unbound.

By address space invalidation I mean invalidation of the VMA that is
pointing to those pages.

device-dax may not have an issue with use-after-VMA-invalidation by
its very nature, since every PFN always points to the same
thing. fsdax and this p2p stuff are different though.

> Before the invalidation, an active flag is cleared to ensure no new
> mappings can be created while the unmap is proceeding.
> unmap_mapping_range() should sequence itself with the TLB flush and

AFAIK unmap_mapping_range() kicks off the TLB flush and then
returns. It doesn't always wait for the flush to fully finish. I.e.
some cases use RCU to lock the page table against GUP fast, and so the
put_page() doesn't happen until the call_rcu completes - after a
grace period. unmap_mapping_range() does not wait for grace
periods.

This is why for normal memory the put_page() is done after the TLB
flush completes, not when unmap_mapping_range() finishes. This ensures
that before the refcount reaches 0 no concurrent GUP fast can still
observe the old PTEs.

> GUP-fast using the same mechanism it does for regular pages. As far as I
> can see, by the time unmap_mapping_range() returns, we should be
> confident that there are no pages left in any mapping (seeing no new
> pages could be added since before the call).

When viewed under the page table locks this is true, but the 'fast'
walkers like gup_fast and hmm_range_fault can continue to be working
on old data in the PTEs because they don't take the page table locks.
They interact with unmap_mapping_range() via the IPI/RCU (gup fast) or
the mmu notifier sequence count (hmm_range_fault).

> P2PDMA follows this pattern, except pages are not mapped linearly and
> are returned to the genalloc when their refcount falls to 1. This only
> happens after a VMA is closed which should imply the PTEs have already
> been unlinked from the pages.

And here is the problem: since the genalloc is being used, we now care
that a page should not continue to be accessed by userspace after it
has been placed back into the genalloc. I suppose fsdax has the same
basic issue too.

> Not to say that all this couldn't use a big conceptual cleanup. A
> similar question exists with the single find_special_page() user
> (xen/gntdev) and it's definitely not clear what the differences are
> between the find_special_page() and vmf_insert_mixed() techniques and
> when one should be used over the other. Or could they both be merged to
> use the same technique?

Oh, that gntdev stuff is just nonsense. IIRC it is trying to delegate
control over a PTE entry itself to the hypervisor.

	/*
	 * gntdev takes the address of the PTE in find_grant_ptes() and
	 * passes it to the hypervisor in gntdev_map_grant_pages(). The
	 * purpose of the notifier is to prevent the hypervisor pointer
	 * to the PTE from going stale.
	 *
	 * Since this vma's mappings can't be touched without the
	 * mmap_lock, and we are holding it now, there is no need for
	 * the notifier_range locking pattern.

I vaguely recall it stuffs in a normal page and then has the hypervisor
overwrite the PTE. When it comes time to free the PTE it recovers the
normal page via the 'find_special_page' hack and frees it. Somehow the
hypervisor is also using the normal page for something.

It is all very strange and one shouldn't think about it :|

Jason