Date: Thu, 4 Apr 2024 17:48:03 -0400
From: Peter Xu <peterx@redhat.com>
To: Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Michael Ellerman, Christophe Leroy,
	Matthew Wilcox, Rik van Riel, Lorenzo Stoakes, Axel Rasmussen,
	Yang Shi, John Hubbard, linux-arm-kernel@lists.infradead.org,
	"Kirill A . Shutemov", Andrew Jones, Vlastimil Babka, Mike Rapoport,
	Andrew Morton, Muchun Song, Christoph Hellwig,
	linux-riscv@lists.infradead.org, James Houghton, David Hildenbrand,
	Andrea Arcangeli, "Aneesh Kumar K . V", Mike Kravetz
Subject: Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2
References: <20240321220802.679544-1-peterx@redhat.com>
	<20240322161000.GJ159172@nvidia.com>
	<20240326140252.GH6245@nvidia.com>
In-Reply-To: <20240326140252.GH6245@nvidia.com>

On Tue, Mar 26, 2024 at 11:02:52AM -0300, Jason Gunthorpe wrote:
> The more I look at this the more I think we need to get to Matthew's
> idea of having some kind of generic page table API that is not tightly
> tied to level. Replacing the hugetlb trick of 'everything is a PTE'
> with 5 special cases in every place seems just horrible.
>
>    struct mm_walk_ops {
>        int (*leaf_entry)(struct mm_walk_state *state, struct mm_walk *walk);
>    }
>
> And many cases really want something like:
>    struct mm_walk_state state;
>
>    if (!mm_walk_seek_leaf(state, mm, address))
>           goto no_present
>    if (mm_walk_is_write(state)) ..
>
> And detailed walking:
>    for_each_pt_leaf(state, mm, address) {
>        if (mm_walk_is_write(state)) ..
>    }
>
> Replacing it with a mm_walk_state that retains the level or otherwise
> to allow decoding any entry composes a lot better. Forced Loop
> unrolling can get back to the current code gen in alot of places.
>
> It also makes the power stuff a bit nicer as the mm_walk_state could
> automatically retain back pointers to the higher levels in the state
> struct too...
>
> The puzzle is how to do it and still get reasonable efficient codegen,
> many operations are going to end up switching on some state->level to
> know how to decode the entry.

These discussions are definitely constructive, thanks Jason.  Very
helpful.

I thought about this last week but got interrupted.  It does make sense
to me: it looks pretty generic, and it is flexible enough as a top-level
design.  At least that's what I thought.

However, now that I rethink it and have had the chance to look more into
the code, it turns out this would be a major rewrite of almost every
walker.  That doesn't mean it is a bad idea, but I'll need to compare it
against the other approach, because there can be a huge difference in
when we can get that code ready, I think.
:)

Consider that what we (or.. I) want to teach the pXd layers right now
are two things: (1) hugetlb mappings, and (2) MMIO (PFN) mappings.  Both
mostly share the same generic concept when working on the mm walkers, no
matter which way we go; they just need different treatment for different
types of memory.  (2) is new work on top of the current code, while (1)
is a refactoring whose goal is to drop the hugetlb_entry() hook point.

Taking the simplest mm walker (smaps) as an example, I think most of the
code is ready thanks to THP's existence, including helpers like
vm_normal_page[_pmd]() which should already work even for pfnmaps; the
pud layer is missing, but that should be trivial to add.  It means we
may have a chance to drop hugetlb_entry() without a huge overhaul just
yet.

Now the important question I'm asking myself is: do we really need huge
p4d or even bigger?  It's 512 GiB on x86, and we have said "No 512 GiB
pages yet" (commit fe1e8c3e963) since 2017; that is 7 years without that
fact changing.  On non-x86 architectures, p4d_leaf() is never defined.
It's also interesting to see how much code is "ready" to handle huge p4d
entries (by looking at the p4d_leaf() callers; much easier to see after
the removal of the rest of the huge APIs..) even though none exist.

So, would we be over-engineering if we go the generic route now,
considering that we already have most of the pmd/pud entry hooks around
in the mm walker ops?  So far it sounds better to leave it for later,
until it is further justified to be useful.  And this won't block the
generic route if it is ever justified as needed; I'd say it can even be
seen as a step forward if I manage to remove hugetlb_entry() first.

Comments welcome (before I start to work on anything..).

Thanks,

-- 
Peter Xu
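
For reference, the 512 GiB figure for a huge p4d follows from the x86-64
layout with 4 KiB base pages, where each page table level resolves 9
address bits on top of a 12-bit page offset.  Below is a minimal sketch
of that arithmetic, written around a walker state that records the level
of the current entry, loosely in the spirit of the mm_walk_state idea
quoted above.  The enum, the struct, the macros, and both helpers are
hypothetical names made up for illustration; none of them are kernel
APIs, and the 12/9 bit split is an assumption that only holds for that
particular configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical level tags for illustration; not the kernel's definitions. */
enum pt_level { PT_PTE = 0, PT_PMD, PT_PUD, PT_P4D };

/* Assumed x86-64 layout with 4 KiB base pages. */
#define ASSUMED_PAGE_SHIFT      12  /* bits of in-page offset */
#define ASSUMED_BITS_PER_LEVEL  9   /* index bits resolved per level */

/* Hypothetical walker state that remembers the level of the current entry. */
struct walk_state {
	enum pt_level level;
	uint64_t addr;
};

/* Size in bytes of a leaf mapping installed at the given level. */
static inline uint64_t leaf_size(enum pt_level level)
{
	assert(level >= PT_PTE && level <= PT_P4D);
	return 1ULL << (ASSUMED_PAGE_SHIFT +
			ASSUMED_BITS_PER_LEVEL * (unsigned int)level);
}

/* Start address of the leaf mapping that covers state->addr. */
static inline uint64_t leaf_start(const struct walk_state *state)
{
	return state->addr & ~(leaf_size(state->level) - 1);
}
```

With these assumptions, leaf_size() gives 4 KiB for a pte, 2 MiB for a
pmd, 1 GiB for a pud and 512 GiB for a p4d.  Keeping the level in the
state lets one helper serve every level instead of one per pXd type,
which is the composability argument made in the quoted proposal; whether
the compiler can turn that back into per-level code is the open codegen
question raised there.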