Date: Thu, 6 Jun 2024 21:04:53 +0100
From: Matthew Wilcox
To: James Houghton
Cc: Khalid Aziz, Peter Xu, Vishal Moola, Jane Chu, Muchun Song, linux-mm@kvack.org
Subject: Re: Unifying page table walkers

On Thu, Jun 06, 2024 at 12:30:44PM -0700, James Houghton wrote:
> Today the VM_HUGETLB flag tells the fault handler to call into
> hugetlb_fault() (there are many other special cases, but this one is
> probably the most important). How should faults on VMAs without
> VM_HUGETLB that map HugeTLB folios be handled? If you handle faults
> with the main mm fault handler without getting rid of hugetlb_fault(),
> I think you're basically implementing a second, more tmpfs-like
> hugetlbfs... right?
>
> I don't really have anything against this approach, but I think the
> decision was to reduce the number of special cases as much as we can
> first before attempting to rewrite hugetlbfs.
>
> Or maybe I've got something wrong and what you're asking doesn't
> logically end up at a hugetlbfs v2.

Right, so we ignore hugetlb_fault() and call into __handle_mm_fault().
Once there, we'll do:

	vmf.pud = pud_alloc(mm, p4d, address);
	...
	if (pud_none(*vmf.pud) &&
	    thp_vma_allowable_order(vma, vm_flags,
				TVA_IN_PF | TVA_ENFORCE_SYSFS, PUD_ORDER)) {
		ret = create_huge_pud(&vmf);
		...
	}

and create_huge_pud() will call vma->vm_ops->huge_fault(vmf, PUD_ORDER);

So all we need to do is implement huge_fault in hugetlb_vm_ops.  I don't
think that's the same as creating a hugetlbfs2, because it's just another
entry point.  You can mmap() the same file both ways and it's all cache
coherent.
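The dispatch described above can be modelled in userspace C. This is a sketch under stated assumptions, not kernel code: the `model_*` functions, the `fault_path` enum and the flag and return-code values are hypothetical stand-ins for handle_mm_fault(), hugetlb_fault(), __handle_mm_fault() and create_huge_pud(), and thp_vma_allowable_order() is assumed to return true. It shows a VMA with VM_HUGETLB still taking the hugetlb_fault() special case, while a VMA without it, using the same (hypothetical) hugetlb vm_ops, reaches a huge_fault hook through the generic PUD path.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. Names mirror mm/memory.c, but the flag and
 * return-code values are hypothetical and these are not kernel functions.
 */
typedef unsigned int vm_fault_t;

#define VM_HUGETLB        0x1UL  /* hypothetical bit value */
#define VM_FAULT_FALLBACK 0x1u   /* hypothetical: "try a smaller mapping" */
#define PUD_ORDER         18     /* x86-64: 1 GiB = 2^18 pages of 4 KiB */

/* Records which path handled the fault so the model can be inspected. */
enum fault_path {
	PATH_HUGETLB_FAULT,  /* legacy hugetlb_fault() special case */
	PATH_HUGE_FAULT,     /* vm_ops->huge_fault() via create_huge_pud() */
	PATH_SMALLER,        /* fell back to PMD/PTE handling (not modelled) */
};

struct vm_fault;

struct vm_operations_struct {
	vm_fault_t (*huge_fault)(struct vm_fault *vmf, unsigned int order);
};

struct vm_area_struct {
	unsigned long vm_flags;
	const struct vm_operations_struct *vm_ops;
};

struct vm_fault {
	struct vm_area_struct *vma;
	int pud_none;          /* stands in for pud_none(*vmf->pud) */
	enum fault_path path;
};

/* Stand-in for hugetlb_fault(): the VM_HUGETLB special case. */
static vm_fault_t model_hugetlb_fault(struct vm_fault *vmf)
{
	vmf->path = PATH_HUGETLB_FAULT;
	return 0;
}

/* Hypothetical huge_fault hook that hugetlb_vm_ops would gain. */
static vm_fault_t model_hugetlb_huge_fault(struct vm_fault *vmf, unsigned int order)
{
	if (order != PUD_ORDER)
		return VM_FAULT_FALLBACK;
	vmf->path = PATH_HUGE_FAULT;
	return 0;
}

/* Like create_huge_pud(): no huge_fault hook means fall back. */
static vm_fault_t model_create_huge_pud(struct vm_fault *vmf)
{
	const struct vm_operations_struct *ops = vmf->vma->vm_ops;

	if (ops && ops->huge_fault)
		return ops->huge_fault(vmf, PUD_ORDER);
	return VM_FAULT_FALLBACK;
}

/* PUD step of __handle_mm_fault(); thp_vma_allowable_order() assumed true. */
static vm_fault_t model_generic_fault(struct vm_fault *vmf)
{
	if (vmf->pud_none) {
		vm_fault_t ret = model_create_huge_pud(vmf);

		if (!(ret & VM_FAULT_FALLBACK))
			return ret;
	}
	vmf->path = PATH_SMALLER;
	return 0;
}

/* Top-level dispatch: VM_HUGETLB still selects the legacy handler. */
static vm_fault_t model_handle_mm_fault(struct vm_fault *vmf)
{
	assert(vmf->vma != NULL);
	if (vmf->vma->vm_flags & VM_HUGETLB)
		return model_hugetlb_fault(vmf);
	return model_generic_fault(vmf);
}

/* Hypothetical vm_ops for a hugetlbfs file, with the proposed hook added. */
static const struct vm_operations_struct model_hugetlb_vm_ops = {
	.huge_fault = model_hugetlb_huge_fault,
};
```

For example, two VMAs sharing `model_hugetlb_vm_ops`, one with `VM_HUGETLB` and one without, are routed to different entry points on a `pud_none` fault: the first to the hugetlb_fault() stand-in, the second to the huge_fault hook. A VMA whose ops have no `huge_fault` falls back to smaller mappings, mirroring the VM_FAULT_FALLBACK check in the excerpt above.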