Date: Fri, 21 Jun 2024 07:32:40 +0000
From: Quentin Perret
To: Jason Gunthorpe
Cc: Elliot Berman, David Hildenbrand, Fuad Tabba, Christoph Hellwig, John Hubbard, Andrew Morton, Shuah Khan, Matthew Wilcox, maz@kernel.org, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, pbonzini@redhat.com
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning
In-Reply-To: <20240620231814.GO2494510@nvidia.com>

On Thursday 20 Jun 2024 at 20:18:14 (-0300), Jason Gunthorpe wrote:
> On Thu, Jun 20, 2024 at 03:47:23PM -0700, Elliot Berman wrote:
> > On Thu, Jun 20, 2024 at 11:29:56AM -0300, Jason Gunthorpe wrote:
> > > On Thu, Jun 20, 2024 at 04:01:08PM +0200, David Hildenbrand wrote:
> > > > Regarding huge pages: assume the huge page (e.g., 1 GiB hugetlb) is shared,
> > > > now the VM requests to make one subpage private.
> > >
> > > I think the general CC model has the shared/private setup earlier on
> > > the VM lifecycle with large runs of contiguous pages. It would only
> > > become a problem if you intend to do high-rate, fine-grained
> > > shared/private switching. Which is why I am asking what the actual
> > > "why" is here.
> >
> > I'd let Fuad comment if he's aware of any specific/concrete Android
> > usecases about converting between shared and private. One usecase I can
> > think about is host providing large multimedia blobs (e.g. video) to the
> > guest. Rather than using swiotlb, the CC guest can share pages back with
> > the host so host can copy the blob in, possibly using H/W accel. I
> > mention this example because we may not need to support shared/private
> > conversions at granularity finer than huge pages.
>
> I suspect the more useful thing would be to be able to allocate actual
> shared memory and use that to shuffle data without a copy, setup much
> less frequently. Ie you could allocate a large shared buffer for video
> sharing and stream the video frames through that memory without copy.
>
> This is slightly different from converting arbitrary memory in-place
> into shared memory. The VM may be able to do a better job at
> clustering the shared memory allocation requests, ie locate them all
> within a 1GB region to further optimize the host side.
>
> > Jason, do you have scenario in mind? I couldn't tell if we now had a
> > usecase or are brainstorming a solution to have a solution.
>
> No, I'm interested in what pKVM is doing that needs this to be so much
> different than the CC case..

The underlying technology for implementing CC is obviously very different
(MMU-based for pKVM, encryption-based for the others, plus some extra
bits, but let's keep it simple).
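To make the MMU-based model concrete, here is a toy, userspace-only C sketch; it is not pKVM code. It assumes a single pool of physical pages and represents each stage-2 page table as a hypothetical array of valid bits (`struct toy_stage2`, `toy_share_with_host()`, `toy_unshare_from_host()` and `toy_access()` are all invented names). A private->shared conversion is modeled as setting one entry in the host's stage-2, so data the guest wrote before sharing becomes visible to the host in place, without any copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES  4
#define TOY_PAGE_SIZE 4096

/* Hypothetical toy: one physical memory pool seen by both guest and host. */
static unsigned char toy_mem[TOY_NR_PAGES][TOY_PAGE_SIZE];

/* Hypothetical stage-2 view: valid[pfn] says whether a PTE maps that page. */
struct toy_stage2 {
	bool valid[TOY_NR_PAGES];
};

/* The protected guest starts with every page mapped; the host with none. */
static struct toy_stage2 guest_s2 = { .valid = { true, true, true, true } };
static struct toy_stage2 host_s2;

/*
 * Private -> shared: set one PTE in the recipient's (host's) stage-2.
 * The page contents are neither copied nor scrubbed, so the conversion
 * is in-place and non-destructive in this model.
 */
static void toy_share_with_host(size_t pfn)
{
	assert(pfn < TOY_NR_PAGES);
	host_s2.valid[pfn] = true;
}

/* Shared -> private: clear the host's PTE again (guest mapping untouched). */
static void toy_unshare_from_host(size_t pfn)
{
	assert(pfn < TOY_NR_PAGES);
	host_s2.valid[pfn] = false;
}

/*
 * Read or write one byte through a given stage-2 view. Returns false
 * where real hardware would raise a stage-2 fault instead.
 */
static bool toy_access(const struct toy_stage2 *s2, size_t pfn, size_t off,
		       unsigned char *val, bool write)
{
	assert(pfn < TOY_NR_PAGES && off < TOY_PAGE_SIZE);
	if (!s2->valid[pfn])
		return false;
	if (write)
		toy_mem[pfn][off] = *val;
	else
		*val = toy_mem[pfn][off];
	return true;
}
```

For example, a guest write to page 1 followed by `toy_share_with_host(1)` lets `toy_access(&host_s2, 1, 0, &v, false)` read back the same byte, and `toy_unshare_from_host(1)` makes the host access fail again, standing in for a stage-2 fault. Real stage-2 tables also carry permissions, ownership state and TLB maintenance, all of which this sketch leaves out.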
In-place conversion is inherently painful with encryption-based schemes,
so it's not a surprise that the approach taken in those cases is built
around destructive conversions as a core construct. But as Elliot
highlighted, the MMU-based approach allows for flexible and efficient
zero-copy, which we're not ready to sacrifice purely to shoehorn pKVM
into a model that was designed for a technology with a very different
set of constraints. A private->shared conversion in the pKVM case is
nothing more than setting a PTE in the recipient's stage-2 page table.

I'm not at all against starting with something simple and bouncing via
swiotlb; that is totally fine. What is _not_ fine, however, would be to
bake into the userspace API that conversions are not in-place and are
destructive (which in my mind equates to 'you can't mmap guest_memfd
pages'). But I think that isn't really a point of disagreement these
days, so hopefully we're aligned.

And to clarify some things I've also read in the thread: pKVM can handle
the vast majority of faults caused by accesses to protected memory just
fine. Userspace accesses protected guest memory? Fine, we'll SEGV the
userspace process. The kernel accesses it via uaccess macros? Also fine,
we'll fail the syscall (or whatever it is we're doing) cleanly; the whole
extable machinery works OK, which also means that things like
load_unaligned_zeropad() keep working as-is. All pKVM does is re-inject
the fault into the kernel with some extra syndrome information so that
the kernel can figure out what to do by itself. It's really only accesses
via, e.g., the linear map that are problematic, hence the exclusive GUP
approach proposed in the series, which tries to avoid those by
construction.

That approach also has the benefit of leaving guest_memfd to the other
CC solutions, which have more in common with each other. I think it's
good for that discussion to happen, no matter what we end up doing.

I hope that helps!

Thanks,
Quentin
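The fault-handling cases described above can be sketched as a small decision function. This is a hypothetical userspace model, not the kernel's or pKVM's implementation: `handle_reinjected_fault()`, `find_fixup()`, the enums and the addresses in `toy_extable` are invented for illustration, and every fixup is treated as a plain resume address, whereas real exception-table entries can carry type-specific handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of where a faulting access came from. */
enum access_origin {
	ACCESS_USER,   /* userspace touched protected guest memory */
	ACCESS_KERNEL, /* kernel access at some instruction address */
};

/* Hypothetical outcomes once the fault has been re-injected into the host. */
enum fault_action {
	ACTION_SEGV,  /* deliver SIGSEGV to the user process */
	ACTION_FIXUP, /* resume at the extable fixup; the syscall fails cleanly */
	ACTION_OOPS,  /* no fixup registered, e.g. a linear-map access */
};

/* Toy exception table: (faulting instruction address, fixup address). */
struct extable_entry {
	uintptr_t insn;
	uintptr_t fixup;
};

static const struct extable_entry toy_extable[] = {
	{ 0x1000, 0x1f00 }, /* hypothetical uaccess copy instruction */
	{ 0x2000, 0x2f00 }, /* another hypothetical annotated load */
};

/* Linear search for a fixup; returns 0 when the address has no entry. */
static uintptr_t find_fixup(uintptr_t ip)
{
	for (size_t i = 0; i < sizeof(toy_extable) / sizeof(toy_extable[0]); i++)
		if (toy_extable[i].insn == ip)
			return toy_extable[i].fixup;
	return 0;
}

/* Decide what the host does with a fault on protected memory. */
static enum fault_action handle_reinjected_fault(enum access_origin origin,
						 uintptr_t ip,
						 uintptr_t *resume_ip)
{
	assert(resume_ip != NULL);
	if (origin == ACCESS_USER)
		return ACTION_SEGV;

	uintptr_t fixup = find_fixup(ip);
	if (fixup) {
		*resume_ip = fixup;
		return ACTION_FIXUP;
	}
	return ACTION_OOPS;
}
```

In this model a user access yields `ACTION_SEGV`, a kernel access at a registered uaccess address resumes at its fixup, and a kernel access with no entry (such as a linear-map access) falls through to `ACTION_OOPS`, which is the case the exclusive GUP approach aims to rule out by construction.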