Date: Wed, 1 Nov 2023 09:36:00 -0700
References: <20231027182217.3615211-1-seanjc@google.com>
 <20231027182217.3615211-18-seanjc@google.com>
 <7c0844d8-6f97-4904-a140-abeabeb552c1@intel.com>
 <92ba7ddd-2bc8-4a8d-bd67-d6614b21914f@intel.com>
Subject: Re: [PATCH v13 17/35] KVM: Add transparent hugepage support for dedicated guest memory
From: Sean Christopherson
To: Paolo Bonzini
Cc: Xiaoyao Li, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman,
 Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexander Viro,
 Christian Brauner, "Matthew Wilcox (Oracle)", Andrew Morton,
 kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xu Yilun, Chao Peng,
 Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, David Matlack, Yu Zhang,
 Isaku Yamahata, Mickaël Salaün, Vlastimil Babka, Vishal Annapurve,
 Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret,
 Michael Roth, Wang, Liam Merwick, Isaku Yamahata, "Kirill A . Shutemov"

On Wed, Nov 01, 2023, Paolo Bonzini wrote:
> On Wed, Nov 1, 2023 at 2:41 PM Sean Christopherson wrote:
> >
> > On Wed, Nov 01, 2023, Xiaoyao Li wrote:
> > > On 10/31/2023 10:16 PM, Sean Christopherson wrote:
> > > > On Tue, Oct 31, 2023, Xiaoyao Li wrote:
> > > > > On 10/28/2023 2:21 AM, Sean Christopherson wrote:
> > > But it's different than MADV_HUGEPAGE, in a way. Per my understanding, the
> > > failure of MADV_HUGEPAGE is not fatal, user space can ignore it and
> > > continue.
> > >
> > > However, the failure of KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is fatal, which leads
> > > to failure of guest memfd creation.
> >
> > Failing KVM_CREATE_GUEST_MEMFD isn't truly fatal, it just requires different
> > action from userspace, i.e. instead of ignoring the error, userspace could redo
> > KVM_CREATE_GUEST_MEMFD with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE=0.
> >
> > We could make the behavior more like MADV_HUGEPAGE, e.g. theoretically we could
> > extend fadvise() with FADV_HUGEPAGE, or add a guest_memfd knob/ioctl() to let
> > userspace provide advice/hints after creating a guest_memfd.  But I suspect that
> > guest_memfd would be the only user of FADV_HUGEPAGE, and IMO a post-creation hint
> > is actually less desirable.
> >
> > KVM_GUEST_MEMFD_ALLOW_HUGEPAGE will fail only if userspace didn't provide a
> > compatible size or the kernel doesn't support THP.  An incompatible size is likely
> > a userspace bug, and for most setups that want to utilize guest_memfd, lack of THP
> > support is likely a configuration bug.  I.e. many/most uses *want* failures due to
> > KVM_GUEST_MEMFD_ALLOW_HUGEPAGE to be fatal.
> >
> > > For current implementation, I think maybe KVM_GUEST_MEMFD_DESIRE_HUGEPAGE
> > > fits better than KVM_GUEST_MEMFD_ALLOW_HUGEPAGE? or maybe *PREFER*?
> >
> > Why?  Verbs like "prefer" and "desire" aren't a good fit IMO because they suggest
> > the flag is a hint, and hints are usually best effort only, i.e. are ignored if
> > there is a fundamental incompatibility.
> >
> > "Allow" isn't perfect, e.g. I would much prefer a straight KVM_GUEST_MEMFD_USE_HUGEPAGES
> > or KVM_GUEST_MEMFD_HUGEPAGES flag, but I wanted the name to convey that KVM doesn't
> > (yet) guarantee hugepages.  I.e. KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is stronger than
> > a hint, but weaker than a requirement.  And if/when KVM supports a dedicated memory
> > pool of some kind, then we can add KVM_GUEST_MEMFD_REQUIRE_HUGEPAGE.
>
> I think that the current patch is fine, but I will adjust it to always
> allow the flag, and to make the size check even if !CONFIG_TRANSPARENT_HUGEPAGE.
> If hugepages are not guaranteed, and (theoretically) you could have no
> hugepage at all in the result, it's okay to get this result even if THP is not
> available in the kernel.

Can you post a fixup patch?  It's not clear to me exactly what behavior you intend
to end up with.
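
For readers following the thread, the "redo the ioctl with the flag cleared"
option discussed above can be sketched as a small, self-contained model.  This
is only an illustration under stated assumptions, not code from the patch: it
does not call the real KVM_CREATE_GUEST_MEMFD ioctl, and every demo_* / DEMO_*
name, the 2 MiB hugepage size, the -EINVAL error code, and the fake fd value
are hypothetical stand-ins.  The model only encodes the two failure conditions
named in the thread (incompatible size, no THP support).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real uapi flag value or hugepage size. */
#define DEMO_ALLOW_HUGEPAGE (1ULL << 0)
#define DEMO_HUGEPAGE_SIZE  (2ULL << 20)	/* assumed 2 MiB */

/*
 * Model of the creation step as described in the thread: with the
 * hugepage flag set, creation fails if THP is unsupported or if the
 * size is not a multiple of the (assumed) hugepage size.
 * Returns a fake fd (>= 0) on success, a negative errno on failure.
 */
static int demo_create_guest_memfd(uint64_t size, uint64_t flags,
				   bool thp_supported)
{
	if (flags & ~DEMO_ALLOW_HUGEPAGE)
		return -EINVAL;		/* unknown flag bits */

	if (flags & DEMO_ALLOW_HUGEPAGE) {
		if (!thp_supported)
			return -EINVAL;	/* likely a configuration bug */
		if (size % DEMO_HUGEPAGE_SIZE)
			return -EINVAL;	/* likely a userspace bug */
	}
	return 3;			/* pretend file descriptor */
}

/*
 * Userspace policy that treats the flag failure as non-fatal: try with
 * the flag first, then retry with the flag cleared.  *used_flag reports
 * which attempt produced the returned value.
 */
static int demo_create_with_fallback(uint64_t size, bool thp_supported,
				     bool *used_flag)
{
	int fd = demo_create_guest_memfd(size, DEMO_ALLOW_HUGEPAGE,
					 thp_supported);

	if (fd >= 0) {
		*used_flag = true;
		return fd;
	}

	*used_flag = false;
	return demo_create_guest_memfd(size, 0, thp_supported);
}
```

A VMM that wants the failure to be fatal, which the thread argues is the
common case, would call the creation step once with the flag set and
propagate the error instead of using the fallback wrapper.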