From: Vishal Annapurve
Date: Mon, 28 Apr 2025 12:02:00 -0700
Subject: Re: [RFC PATCH 39/39] KVM: guest_memfd: Dynamically split/reconstruct HugeTLB page
To: Yan Zhao
Cc: Ackerley Tng, Chenyi Qiang, tabba@google.com, quic_eberman@quicinc.com,
 roypat@amazon.co.uk, jgg@nvidia.com, peterx@redhat.com, david@redhat.com,
 rientjes@google.com, fvdl@google.com, jthoughton@google.com,
 seanjc@google.com, pbonzini@redhat.com, zhiquan1.li@intel.com,
 fan.du@intel.com, jun.miao@intel.com, isaku.yamahata@intel.com,
 muchun.song@linux.dev, erdemaktas@google.com, qperret@google.com,
 jhubbard@nvidia.com, willy@infradead.org, shuah@kernel.org,
 brauner@kernel.org, bfoster@redhat.com, kent.overstreet@linux.dev,
 pvorel@suse.cz, rppt@kernel.org, richard.weiyang@gmail.com,
 anup@brainfault.org, haibo1.xu@intel.com, ajones@ventanamicro.com,
 vkuznets@redhat.com, maciej.wieczor-retman@intel.com, pgonda@google.com,
 oliver.upton@linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, linux-kselftest@vger.kernel.org

On Sun, Apr 27, 2025 at 6:08 PM Yan Zhao wrote:
>
> On Fri, Apr 25, 2025 at 03:45:20PM -0700, Ackerley Tng wrote:
> > Yan Zhao writes:
> > ...
> > >
> > > For some memory region, e.g., "pc.ram", it's divided into 2 parts:
> > > - one with offset 0, size 0x80000000(2G),
> > >   positioned at GPA 0, which is below GPA 4G;
> > > - one with offset 0x80000000(2G), size 0x80000000(2G),
> > >   positioned at GPA 0x100000000(4G), which is above GPA 4G.
> > >
> > > For the second part, its slot->base_gfn is 0x100000000, while slot->gmem.pgoff
> > > is 0x80000000.
> > >
> >
> > Nope I don't mean to enforce that they are equal, we just need the
> > offsets within the page to be equal.
> >
> > I edited Vishal's code snippet, perhaps it would help explain better:
> >
> > page_size is the size of the hugepage, so in our example,
> >
> >   page_size = SZ_2M;
> >   page_mask = ~(page_size - 1);
> page_mask = page_size - 1  ?
>
> >   offset_within_page = slot->gmem.pgoff & page_mask;
> >   gfn_within_page = (slot->base_gfn << PAGE_SHIFT) & page_mask;
> >
> > We will enforce that
> >
> >   offset_within_page == gfn_within_page;
> For "pc.ram", if it has 2.5G below 4G, it would be configured as follows
> - slot 1: slot->gmem.pgoff=0, base GPA 0, size=2.5G
> - slot 2: slot->gmem.pgoff=2.5G, base GPA 4G, size=1.5G
>
> When binding these two slots to the same guest_memfd created with flag
> KVM_GUEST_MEMFD_HUGE_1GB:
> - binding the 1st slot will succeed;
> - binding the 2nd slot will fail.
>
> What options does userspace have in this scenario?

Userspace can create new gmem files that have aligned offsets. But I
see your point, enforcing alignment at binding time will lead to
wastage of memory. i.e. Your example above could be reworked to have:
- slot 1: slot->gmem.pgoff=0, base GPA 0, size=2.5G, gmem_fd = x, gmem_size = 3G
- slot 2: slot->gmem.pgoff=0, base GPA 4G, size=1.5G, gmem_fd = y, gmem_size = 2G

This will waste 1G of memory as gmem files will have to be hugepage aligned.

> It can't reduce the flag to KVM_GUEST_MEMFD_HUGE_2MB. Adjusting the gmem.pgoff
> isn't ideal either.
>
> What about something similar as below?
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index d2feacd14786..87c33704a748 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -1842,8 +1842,16 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
>         }
>
>         *pfn = folio_file_pfn(folio, index);
> -       if (max_order)
> -               *max_order = folio_order(folio);
> +       if (max_order) {
> +               int order;
> +
> +               order = folio_order(folio);
> +
> +               while (order > 0 && ((slot->base_gfn ^ slot->gmem.pgoff) & ((1 << order) - 1)))

This sounds better. Userspace will need to avoid this in general or
keep such ranges short so that most of the guest memory ranges can be
mapped at hugepage granularity. So maybe a pr_warn could be spewed
during binding that the alignment is not optimal.

> +                       order--;
> +
> +               *max_order = order;
> +       }
>
>         *is_prepared = folio_test_uptodate(folio);
>         return folio;
>
> >
> > >> Adding checks at binding time will allow hugepage-unaligned offsets (to
> > >> be at parity with non-guest_memfd backing memory) but still fix this
> > >> issue.
> > >>
> > >> lpage_info will make sure that ranges near the bounds will be
> > >> fragmented, but the hugepages in the middle will still be mappable as
> > >> hugepages.
> > >>
> > >> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3706/binding-must-have-same-alignment.svg
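
For anyone following the arithmetic, here is a minimal userspace sketch of the two approaches discussed above: the strict binding-time check, and the relaxed per-fault clamp from the quoted diff. It is not kernel code. struct gmem_slot, binding_aligned(), clamp_max_order() and the PAGE_SHIFT_4K/ORDER_* macros are made-up names for illustration; only the (base_gfn ^ pgoff) & ((1 << order) - 1) test is taken from the thread, and both fields are assumed to be in units of 4K pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12                    /* assumption: 4K base pages */
#define ORDER_2M      (21 - PAGE_SHIFT_4K)  /* 9  */
#define ORDER_1G      (30 - PAGE_SHIFT_4K)  /* 18 */

/* Hypothetical stand-in for the two kvm_memory_slot fields discussed. */
struct gmem_slot {
	uint64_t base_gfn;   /* first guest frame number covered by the slot */
	uint64_t gmem_pgoff; /* page offset of the slot in the guest_memfd */
};

/*
 * Strict rule: GFN and file offset must have the same offset within a
 * hugepage of the given order, otherwise the binding is rejected.
 */
static bool binding_aligned(const struct gmem_slot *s, unsigned int order)
{
	uint64_t mask = (1ULL << order) - 1;

	return ((s->base_gfn ^ s->gmem_pgoff) & mask) == 0;
}

/*
 * Relaxed rule, mirroring the loop in the quoted diff: accept the binding
 * and lower the reported order until the two offsets agree.
 */
static unsigned int clamp_max_order(const struct gmem_slot *s,
				    unsigned int order)
{
	while (order > 0 &&
	       ((s->base_gfn ^ s->gmem_pgoff) & ((1ULL << order) - 1)))
		order--;

	return order;
}
```

With slot 2 of the "pc.ram" example (pgoff = 2.5G, base GPA = 4G), binding_aligned() fails for ORDER_1G but passes for ORDER_2M, and clamp_max_order() starting from ORDER_1G stops at order 17 (512M). How that order is then rounded to a mapping level KVM actually supports is outside this sketch.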