From: Fuad Tabba
Date: Wed, 19 Jun 2024 13:01:14 +0100
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning
To: Jason Gunthorpe
Cc: David Hildenbrand, John Hubbard, Elliot Berman, Andrew Morton, Shuah Khan, Matthew Wilcox, maz@kernel.org, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, pbonzini@redhat.com
In-Reply-To: <20240619115135.GE2494510@nvidia.com>
References: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com> <7fb8cc2c-916a-43e1-9edf-23ed35e42f51@nvidia.com> <14bd145a-039f-4fb9-8598-384d6a051737@redhat.com> <20240619115135.GE2494510@nvidia.com>

Hi Jason,

On Wed, Jun 19, 2024 at 12:51 PM Jason Gunthorpe wrote:
>
> On Wed, Jun 19, 2024 at 10:11:35AM +0100, Fuad Tabba wrote:
>
> > To be honest, personally (speaking only for myself, not necessarily
> > for Elliot and not for anyone else in the pKVM team), I still would
> > prefer to use guest_memfd(). I think that having one solution for
> > confidential computing that rules them all would be best. But we do
> > need to be able to share memory in place, have a plan for supporting
> > huge pages in the near future, and migration in the not-too-distant
> > future.
>
> I think using a FD to control this special lifetime stuff is
> dramatically better than trying to force the MM to do it with struct
> page hacks.
>
> If you can't agree with the guest_memfd people on how to get there
> then maybe you need a guest_memfd2 for this slightly different special
> stuff instead of intruding on the core mm so much. (though that would
> be sad)
>
> We really need to be thinking more about containing these special
> things and not just sprinkling them everywhere.

I agree that we need to agree :) This discussion has been going on
since before LPC last year, and the consensus from the guest_memfd()
folks (if I understood it correctly) is that guest_memfd() is what it
is: designed for a specific type of confidential computing, in the
style of TDX and CCA perhaps, and that it cannot (or will not) perform
the role of being a general solution for all confidential computing.

> > The approach we're taking with this proposal is to instead restrict
> > the pinning of protected memory. If the host kernel can't pin the
> > memory, then a misbehaving process can't trick the host into accessing
> > it.
>
> If the memory can't be accessed by the CPU then it shouldn't be mapped
> into a PTE in the first place. The fact you made userspace faults
> (only) work is nifty but still an ugly hack to get around the fact you
> shouldn't be mapping in the first place.
>
> We already have ZONE_DEVICE/DEVICE_PRIVATE to handle exactly this
> scenario. "memory" that cannot be touched by the CPU but can still be
> specially accessed by enlightened components.
>
> guest_memfd, and more broadly memfd based instead of VMA based, memory
> mapping in KVM is a similar outcome to DEVICE_PRIVATE.
>
> I think you need to stay in the world of not mapping the memory, one
> way or another.

As I mentioned earlier, that's my personal preferred option.

> > > 3) How can we be sure we don't need other long-term pins (IOMMUs?) in
> > > the future?
> >
> > I can't :)
>
> AFAICT in the pKVM model the IOMMU has to be managed by the
> hypervisor..

I realized that I misunderstood this. At least speaking for pKVM, we
don't need other long term pins as long as the memory is private. The
exclusive pin is dropped when the memory is shared.

> > We are gating it behind a CONFIG flag :)
> >
> > Also, since pin is already overloading the refcount, having the
> > exclusive pin there helps in ensuring atomic accesses and avoiding
> > races.
>
> Yeah, but every time someone does this and then links it to a uAPI it
> becomes utterly baked in concrete for the MM forever.

I agree. But if we can't modify guest_memfd() to fit our needs (pKVM,
Gunyah), then we don't really have that many other options.

Thanks!
/fuad

> Jason
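
To make the refcount point above a bit more concrete, here is a rough
userspace toy model of keeping an exclusive pin in the same atomic word
as the ordinary pin count. It is only a sketch and not the code from the
RFC series: struct toy_page, the toy_* helpers, PIN_BIAS and EXCL_FLAG
are all hypothetical names made up for illustration (PIN_BIAS only
loosely mirrors the idea of a per-pin bias on the refcount). Because the
"is it pinned?" check and the update happen in one compare-and-swap, a
regular pin and an exclusive pin cannot both succeed in a race.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical encoding: plain references count by 1, each pin adds
 * PIN_BIAS, and an exclusive pin additionally sets EXCL_FLAG. */
#define PIN_BIAS  1024u
#define EXCL_FLAG (1u << 30)

struct toy_page {
	atomic_uint refcount;
};

/* Take an ordinary pin; refuse if the page is exclusively pinned. */
static bool toy_pin(struct toy_page *p)
{
	unsigned int old = atomic_load(&p->refcount);

	do {
		if (old & EXCL_FLAG)
			return false;
	} while (!atomic_compare_exchange_weak(&p->refcount, &old,
					       old + PIN_BIAS));
	return true;
}

static void toy_unpin(struct toy_page *p)
{
	atomic_fetch_sub(&p->refcount, PIN_BIAS);
}

/* Take the exclusive pin; refuse if any pin (ordinary or exclusive)
 * already exists. The check and the update are one atomic step. */
static bool toy_pin_exclusive(struct toy_page *p)
{
	unsigned int old = atomic_load(&p->refcount);

	do {
		if (old >= PIN_BIAS)
			return false;
	} while (!atomic_compare_exchange_weak(&p->refcount, &old,
					       old + PIN_BIAS + EXCL_FLAG));
	return true;
}

/* Drop the exclusive pin, e.g. when the memory becomes shared again. */
static void toy_unpin_exclusive(struct toy_page *p)
{
	atomic_fetch_sub(&p->refcount, PIN_BIAS + EXCL_FLAG);
}
```

In this model, "private" memory would hold the exclusive pin so that
toy_pin() fails for everyone else, and converting the memory to "shared"
would call toy_unpin_exclusive(), after which ordinary pins succeed
again.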