From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 27 Jul 2023 17:43:13 -0400
From: Peter Xu
To: "Kasireddy, Vivek"
Cc: Hugh Dickins, Jason Gunthorpe, Alistair Popple, Gerd Hoffmann,
	"Kim, Dongwon", David Hildenbrand, "Chang, Junxiao",
	"linux-mm@kvack.org", "dri-devel@lists.freedesktop.org",
	Mike Kravetz
Subject: Re: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
References: <20230718082858.1570809-1-vivek.kasireddy@intel.com>
	<20230718082858.1570809-2-vivek.kasireddy@intel.com>
	<87jzuwlkae.fsf@nvdebian.thelocal>
	<87pm4nj6s5.fsf@nvdebian.thelocal>
	<75e3a74a-68f5-df-9a49-a0553c04320@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

Hi, Vivek,

On Tue, Jul 25, 2023 at 10:24:21PM +0000, Kasireddy, Vivek wrote:
> Hi Hugh,
>
> >
> > On Mon, 24 Jul 2023, Kasireddy, Vivek wrote:
> > > Hi Jason,
> > > > On Mon, Jul 24, 2023 at 07:54:38AM +0000, Kasireddy, Vivek wrote:
> > > >
> > > > > > I'm not at all familiar with the udmabuf use case but that sounds
> > > > > > brittle and effectively makes this notifier udmabuf
> > > > > > specific right?
> > > > > Oh, Qemu uses the udmabuf driver to provide Host Graphics components
> > > > > (such as Spice, Gstreamer, UI, etc) zero-copy access to Guest created
> > > > > buffers. In other words, from a core mm standpoint, udmabuf just
> > > > > collects a bunch of pages (associated with buffers) scattered inside
> > > > > the memfd (Guest ram backed by shmem or hugetlbfs) and wraps
> > > > > them in a dmabuf fd. And, since we provide zero-copy access, we
> > > > > use DMA fences to ensure that the components on the Host and
> > > > > Guest do not access the buffer simultaneously.
> > > >
> > > > So why do you need to track updates proactively like this?
> > > As David noted in the earlier series, if Qemu punches a hole in its memfd
> > > that goes through pages that are registered against a udmabuf fd, then
> > > udmabuf needs to update its list with new pages when the hole gets
> > > filled after (guest) writes. Otherwise, we'd run into the coherency
> > > problem (between udmabuf and memfd) as demonstrated in the selftest
> > > (patch #3 in this series).
> >
> > Wouldn't this all be very much better if Qemu stopped punching holes there?
> I think holes can be punched anywhere in the memfd for various reasons. Some

I've only just started reading this thread and haven't finished all of it
yet, but so far I'm not sure whether this is right at all.

udmabuf is a file, which means it should follow file semantics.  An mmu
notifier is per-mm, otoh.

Imagine that for some reason QEMU mapped the guest pages twice.  The
udmabuf is created with vma1, so udmabuf registers for the mm changes over
vma1 only.  However, the shmem/hugetlb page cache can be populated through
either vma1 or vma2.  That means when it is populated through vma2, udmabuf
won't get any update notification at all, and the udmabuf pages can still
be stale.

The same applies when multi-process QEMU is used, where we can have vma1 in
QEMU and vma2 in another process such as vhost-user.
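
To make the two-vma case concrete, here is a rough userspace sketch (not
taken from this series; the function name is made up for illustration, and
it assumes Linux with a glibc that provides memfd_create()).  It maps one
memfd twice, punches a hole, refills the hole through the second mapping
and reads the result back through the first.  Both mappings go through the
same page cache, so the first one sees the new page even though it was
populated "elsewhere", which is exactly the event a notifier registered on
vma1 alone would not be told about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one memfd twice, punch a hole, refill it
 * through the second mapping, and return the byte that the first
 * mapping reads back (or -1 on error).
 */
int refill_via_second_mapping(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *vma1 = MAP_FAILED, *vma2 = MAP_FAILED;
	int fd, seen = -1;

	assert(psz > 0);

	fd = memfd_create("guest-ram", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, psz) < 0)
		goto out;

	vma1 = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	vma2 = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (vma1 == MAP_FAILED || vma2 == MAP_FAILED)
		goto out;

	/* Populate the page cache through vma1. */
	vma1[0] = 'A';

	/* Drop that page: both mappings lose it, the file keeps its size. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, psz) < 0)
		goto out;

	/* A brand-new page is allocated by a write through vma2 ... */
	vma2[0] = 'B';

	/* ... and vma1 faults the same page in from the page cache. */
	seen = vma1[0];
out:
	if (vma1 != MAP_FAILED)
		munmap(vma1, psz);
	if (vma2 != MAP_FAILED)
		munmap(vma2, psz);
	close(fd);
	return seen;
}
```

The sketch only shows the file side of the story; it cannot show a pinned
page going stale, since that needs a kernel-side user such as udmabuf.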

I think the trick here is that we tried to "hide" the fact that these are
actually normal file pages by doing PFNMAP on them, and now we want the
file features back, like hole punching.

If we used normal file operations, everything would just work:  TRUNCATE
will unmap the host-mapped frame buffers when needed, and on the next
access they'll fault in on demand from the page cache.  We seem to be
trying to reinvent "truncation" for pfnmap, but an mmu notifier doesn't
sound right for this, at least.

> of the use-cases where this would be done were identified by David. Here is what
> he said in an earlier discussion:
> "There are *probably* more issues on the QEMU side when udmabuf is paired
> with things like MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE used for
> virtio-balloon, virtio-mem, postcopy live migration, ... for example, in"

Now after seeing this, I'm truly wondering whether we can still simply use
the file semantics we already have (for shmem/hugetlb/...), or whether we
really must use a single fd to represent everything.

Say, can we just use a tuple (fd, page_array) rather than the udmabuf
itself to do host zero-copy mapping?  The page_array can be e.g. a list of
file offsets that point to the pages (rather than pinning the pages using
FOLL_GET).  The good thing is that the fd can then be the guest memory file
itself.  With that, we can mmap() over the shmem/hugetlb in whatever vma
and whatever process.  Truncation (and actually everything else, e.g. page
migration, swapping, ..., which would be disabled if we use PFNMAP pins)
will all just start to work, afaiu.

Thanks,

-- 
Peter Xu
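
For illustration only, the (fd, page_array) idea above could look roughly
like the userspace sketch below.  Everything named here (struct buf_desc,
buf_desc_map_page(), buf_desc_peek(), buf_desc_demo()) is hypothetical and
not an existing API; it assumes Linux with a glibc that provides
memfd_create().  The descriptor holds only the file and a list of page
offsets, so a consumer resolves the offsets at access time and picks up
whatever the page cache currently holds, including a page that was punched
out and refilled after the descriptor was built.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor: backing file plus page-aligned offsets. */
struct buf_desc {
	int fd;
	size_t nr_pages;
	const off_t *offsets;
};

/* Map page @idx of the buffer read-only; MAP_FAILED on error. */
void *buf_desc_map_page(const struct buf_desc *d, size_t idx, size_t psz)
{
	assert(idx < d->nr_pages);
	return mmap(NULL, psz, PROT_READ, MAP_SHARED, d->fd, d->offsets[idx]);
}

/* Store the first byte of every page in out[]; 0 on success, -1 on error. */
int buf_desc_peek(const struct buf_desc *d, size_t psz, unsigned char *out)
{
	for (size_t i = 0; i < d->nr_pages; i++) {
		unsigned char *p = buf_desc_map_page(d, i, psz);

		if (p == MAP_FAILED)
			return -1;
		out[i] = p[0];
		munmap(p, psz);
	}
	return 0;
}

/*
 * Demo: the buffer is pages 2 and 0 of a 4-page memfd.  Page 2 is punched
 * out and rewritten after the descriptor exists.  Returns 0 if the
 * descriptor sees the current contents, 1 if not, -1 on error.
 */
int buf_desc_demo(void)
{
	size_t psz = (size_t)sysconf(_SC_PAGESIZE);
	off_t offs[2] = { (off_t)(2 * psz), 0 };
	unsigned char seen[2] = { 0, 0 };
	struct buf_desc d = { .nr_pages = 2, .offsets = offs };
	int ret = -1;
	char b;

	d.fd = memfd_create("guest-ram", 0);
	if (d.fd < 0)
		return -1;
	if (ftruncate(d.fd, (off_t)(4 * psz)) < 0)
		goto out;

	b = 'x';
	if (pwrite(d.fd, &b, 1, offs[0]) != 1)
		goto out;
	b = 'y';
	if (pwrite(d.fd, &b, 1, offs[1]) != 1)
		goto out;

	/* Punch out the first buffer page, then refill it with new data. */
	if (fallocate(d.fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offs[0], (off_t)psz) < 0)
		goto out;
	b = 'z';
	if (pwrite(d.fd, &b, 1, offs[0]) != 1)
		goto out;

	if (buf_desc_peek(&d, psz, seen) < 0)
		goto out;
	ret = (seen[0] == 'z' && seen[1] == 'y') ? 0 : 1;
out:
	close(d.fd);
	return ret;
}
```

Calling buf_desc_demo() is enough to try it.  Because nothing is pinned,
the kernel stays free to migrate, swap or truncate the pages; the cost is
that the consumer maps pages on access instead of holding a ready-made
page list.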