Date: Fri, 9 Aug 2024 09:53:28 -0400
From: Peter Xu
To: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 "Aneesh Kumar K . V", Michael Ellerman, Oscar Salvador, Dan Williams,
 James Houghton, Matthew Wilcox, Nicholas Piggin, Rik van Riel,
 Dave Jiang, Andrew Morton, x86@kernel.org, Ingo Molnar,
 Rick P Edgecombe, "Kirill A . Shutemov", linuxppc-dev@lists.ozlabs.org,
 Mel Gorman, Hugh Dickins, Borislav Petkov, David Hildenbrand,
 Vlastimil Babka, Dave Hansen, Christophe Leroy, Huang Ying
Subject: Re: [PATCH v4 4/7] mm/x86: Make pud_leaf() only care about PSE bit
References: <20240807194812.819412-1-peterx@redhat.com>
 <20240807194812.819412-5-peterx@redhat.com> <87bk240y8h.ffs@tglx>
 <87mslluceb.ffs@tglx>
In-Reply-To: <87mslluceb.ffs@tglx>

On Fri, Aug 09, 2024 at 02:08:28PM +0200, Thomas Gleixner wrote:
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
>
> A good structure is to explain the context, the problem and the
> solution in separate paragraphs and this order

I'll try to follow, thanks.

[...]

> > And IMHO it's mostly fine before because mprotect() is broken with 1g
> > anyway, and I guess nobody managed to populate any pud entry with PROT_NONE
> > on dax 1g before, and that's what this whole series is trying to fix.
>
> Again your humble opinion matters, but technical facts and analysis
> matter way more.

All the remaining comments in the reply were about "why it's a PUD leaf".
So let me reply in one shot.

Referring to pXd_leaf() documentation in linux/pgtable.h:

/*
 * pXd_leaf() is the API to check whether a pgtable entry is a huge page
 * mapping.  It should work globally across all archs, without any
 * dependency on CONFIG_* options.  For architectures that do not support
 * huge mappings on specific levels, below fallbacks will be used.
 *
 * A leaf pgtable entry should always imply the following:
 *
 * - It is a "present" entry.  IOW, before using this API, please check it
 *   with pXd_present() first. NOTE: it may not always mean the "present
 *   bit" is set.  For example, PROT_NONE entries are always "present".
 *
 * - It should _never_ be a swap entry of any type.  Above "present" check
 *   should have guarded this, but let's be crystal clear on this.
 *
 * - It should contain a huge PFN, which points to a huge page larger than
 *   PAGE_SIZE of the platform.  The PFN format isn't important here.
 *
 * - It should cover all kinds of huge mappings (e.g., pXd_trans_huge(),
 *   pXd_devmap(), or hugetlb mappings).
 */

It explicitly states that a PROT_NONE entry should be treated as a present
entry, and also as a leaf.  The document is for pXd_leaf(), so it should
cover puds too.  In this specific case of the zapping path, the entry can
only be a DAX 1G thp.  But pud_leaf() should work for hugetlb too, for
example, when PROT_NONE is applied on top of a 1G hugetlb with PSE set.

Unfortunately, I wrote this document in 64078b3d57.. so that's also another
way of saying "my humble opinion".. it's just that nobody has disagreed so
far, and please shoot if you see any issue with it.

IOW, I don't think we must define pXd_leaf() like this - we used to define
pXd_leaf() to cover migration entries at least on x86, for example.  But
per my own past mm experience, the current way is the right thing to do to
make everything much easier and less error prone.  Sorry, I can't get rid
of "IMHO" here.

Another example of "we can define pXd_leaf() in other ways" is, I believe,
the PPC 8XX series, where it's possible to make special use of pmd_leaf()
by allowing it to return true even for two contiguous pte pgtables
covering 8MB of memory.  But that would be an extremely special use of
pmd_leaf() even if it comes, maybe worth an update to the comment above
when it happens, and it would only be used by powerpc, not any other arch.
It won't happen if we want to drop 8MB support, though.

So in short, I don't think there's a 100% correct "technical" answer to
"how to define pxx_leaf()"; things just keep evolving, and "humble
opinions" keep coming with some good reasons.  Hope that answers the
question to some extent.
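
To make the PROT_NONE case concrete, here is a small user-space sketch of
the two leaf checks.  It is a simplified model and not the arch/x86 code:
the MODEL_* macros and model_*() helpers are hypothetical names made up
for illustration, and the bit positions are assumed to mirror the x86
layout (PRESENT in bit 0, PSE in bit 7, PROT_NONE reusing the Global bit
8).  A PROT_NONE huge pud has PSE set and PRESENT clear, so a check that
requires both bits does not report it as a leaf, while a PSE-only check
does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model bits; assumed to mirror the x86 layout. */
#define MODEL_PAGE_PRESENT  (1ULL << 0)
#define MODEL_PAGE_PSE      (1ULL << 7)
#define MODEL_PAGE_PROTNONE (1ULL << 8)  /* assumed: reuses the Global bit */

/* Leaf check that requires both PRESENT and PSE. */
static bool model_pud_leaf_old(uint64_t pud)
{
	const uint64_t mask = MODEL_PAGE_PRESENT | MODEL_PAGE_PSE;

	return (pud & mask) == mask;
}

/* Leaf check that only looks at PSE. */
static bool model_pud_leaf_new(uint64_t pud)
{
	return (pud & MODEL_PAGE_PSE) != 0;
}

/* Walk through the two interesting entries. */
static void model_demo(void)
{
	uint64_t normal_huge   = MODEL_PAGE_PRESENT | MODEL_PAGE_PSE;
	uint64_t protnone_huge = MODEL_PAGE_PROTNONE | MODEL_PAGE_PSE;

	/* A regular huge pud is a leaf under both checks. */
	assert(model_pud_leaf_old(normal_huge));
	assert(model_pud_leaf_new(normal_huge));

	/* A PROT_NONE huge pud is only a leaf under the PSE-only check. */
	assert(!model_pud_leaf_old(protnone_huge));
	assert(model_pud_leaf_new(protnone_huge));
}
```

The model ignores everything else a real entry carries (PFN, other
flags, swap/migration encodings); it only shows why the combination of
bits matters for the leaf decision.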

Taking all things into account, I wonder whether the enriched commit
message below would get me closer to your ACK on this.  It tries to follow
the rule you referenced on the order in which context/problem/solution
should be addressed:

  When working on mprotect() on 1G dax entries, I hit a zap bad pud
  error when zapping a huge pud that has PROT_NONE permission.

  Here the problem is x86's pud_leaf() requires both PRESENT and PSE bits
  set to report a pud entry as a leaf, but that doesn't look right, as
  it's not following the pXd_leaf() definition that we stick with so far,
  where PROT_NONE entries should be reported as leaves.

  To fix it, change x86's pud_leaf() implementation to only check against
  PSE bit to report a leaf, regardless of whether PRESENT bit is set.

Thanks,

-- 
Peter Xu