Date: Thu, 6 Jun 2024 19:29:22 +0100
From: Matthew Wilcox
To: Khalid Aziz, Peter Xu, Vishal Moola, Jane Chu, Muchun Song
Cc: linux-mm@kvack.org
Subject: Unifying page table walkers

One of the things we discussed at LSFMM was unifying the hugetlb and THP
page table walkers. I've been looking into it some more recently; I've
found a problem and I think a solution.

The reason we have a separate hugetlb_entry from pmd_entry and pud_entry
is that it has a different locking context. It is called with the
hugetlb_vma_lock held for read (nb: this is not the same as the vma
lock; see walk_hugetlb_range()). Why do we need this? Because of page
table sharing.
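
To make the locking difference concrete, here is a toy user-space model
of the dispatch described above. It is not kernel code: every toy_* name
is hypothetical, the "lock" is only a reader counter, and the real walker
does far more (page table lookup, hole handling, error paths). It shows
just one point, that the hugetlb callback runs inside a read-locked
section while the regular callback does not.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's VM_HUGETLB flag (hypothetical value). */
#define TOY_VM_HUGETLB 0x1UL

struct toy_vma {
	unsigned long flags;
	int hugetlb_readers;	/* models the hugetlb vma lock held for read */
};

struct toy_walk {
	struct toy_vma *vma;
	int locked_calls;	/* callbacks that saw the lock held */
	int unlocked_calls;	/* callbacks that saw it not held */
};

struct toy_walk_ops {
	int (*pmd_entry)(unsigned long addr, struct toy_walk *walk);
	int (*hugetlb_entry)(unsigned long addr, struct toy_walk *walk);
};

/* Records the locking context the callback was invoked in. */
static int toy_count_entry(unsigned long addr, struct toy_walk *walk)
{
	(void)addr;
	if (walk->vma->hugetlb_readers > 0)
		walk->locked_calls++;
	else
		walk->unlocked_calls++;
	return 0;
}

/*
 * Flagged VMAs take the hugetlb path, with the (modelled) lock held for
 * read around every callback. Unflagged VMAs take the regular path.
 */
static int toy_walk_range(struct toy_vma *vma, unsigned long start,
			  unsigned long end, unsigned long step,
			  const struct toy_walk_ops *ops,
			  struct toy_walk *walk)
{
	unsigned long addr;
	int err = 0;

	assert(step > 0);
	walk->vma = vma;

	if (vma->flags & TOY_VM_HUGETLB) {
		vma->hugetlb_readers++;		/* "lock for read" */
		for (addr = start; addr < end && !err; addr += step)
			err = ops->hugetlb_entry(addr, walk);
		vma->hugetlb_readers--;		/* "unlock" */
		return err;
	}

	for (addr = start; addr < end && !err; addr += step)
		err = ops->pmd_entry(addr, walk);
	return err;
}

static const struct toy_walk_ops toy_ops = {
	.pmd_entry = toy_count_entry,
	.hugetlb_entry = toy_count_entry,
};
```

In this model, a VMA without the flag never enters the locked section,
so the same callback can serve both paths only if it does not depend on
that lock being held.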

In a completely separate discussion, I was talking with Khalid about
mshare() support for hugetlbfs, and I suggested that we permit hugetlbfs
pages to be mapped by a VMA which does not have the VM_HUGETLB flag set.
If we do that, the page tables would not be permitted to be shared with
other users of that hugetlbfs file. But we want to eliminate support for
that anyway, so that's more of a feature than a bug.

Once we don't use the VM_HUGETLB flag on these VMAs, that opens the door
to the other features we want, like mapping individual pages from a
hugetlb folio. And we can use the regular page table walkers for these
VMAs.

Is this a reasonable path forward, or have I overlooked something?