Date: Thu, 6 Jun 2024 22:21:13 +0100
From: Matthew Wilcox
To: James Houghton
Cc: Khalid Aziz, Peter Xu, Vishal Moola, Jane Chu, Muchun Song,
	linux-mm@kvack.org
Subject: Re: Unifying page table walkers

On Thu, Jun 06, 2024 at 01:23:08PM -0700, James Houghton wrote:
> On Thu, Jun 6, 2024 at 1:04 PM Matthew Wilcox wrote:
> > Right, so we ignore hugetlb_fault() and call into __handle_mm_fault().
> > Once there, we'll do:
> >
> > 	vmf.pud = pud_alloc(mm, p4d, address);
> > 	if (pud_none(*vmf.pud) &&
> > 	    thp_vma_allowable_order(vma, vm_flags,
> > 				    TVA_IN_PF | TVA_ENFORCE_SYSFS, PUD_ORDER)) {
> > 		ret = create_huge_pud(&vmf);
> >
> > which will call vma->vm_ops->huge_fault(vmf, PUD_ORDER);
> >
> > So all we need to do is implement huge_fault in hugetlb_vm_ops.  I
> > don't think that's the same as creating a hugetlbfs2 because it's
> > just another entry point.  You can mmap() the same file both ways
> > and it's all cache coherent.
>
> That makes a lot of sense. FWIW, this sounds good to me (though I'm
> curious what Peter thinks :)).
>
> But I think you'll need to be careful to ensure that, for now anyway,
> huge_fault() is always called with the exact same ptep/pmdp/pudp that
> hugetlb_walk() would have returned (ignoring sharing). If you allow
> PMD mapping of what would otherwise be PUD-mapped hugetlb pages right
> now, you'll break the vmemmap optimization (and probably other
> things).

Why is that?  This sounds like you know something I don't ;-)  Is it
the mapcount issue?

> Also I'm not sure how this will interact with arm64's hugetlb pages
> implemented with contiguous PTEs/PMDs. You might have to round
> `address` down to make sure you've picked the first PTE/PMD in the
> group.

I hadn't thought about the sub-PMD size hugetlb issue either.  We can
certainly limit the support to require alignment to the appropriate
size.