Date: Wed, 3 Jul 2024 14:57:29 +0100
From: Catalin Marinas
To: David Hildenbrand
Cc: Yang Shi, muchun.song@linux.dev, will@kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hugetlbfs: add MTE support
References: <20240625233717.2769975-1-yang@os.amperecomputing.com> <9dd065aa-f377-4b4c-893a-df69c9f67360@os.amperecomputing.com>

On Wed, Jul 03, 2024 at 12:24:40PM +0200, David Hildenbrand wrote:
> On 03.07.24 02:20, Yang Shi wrote:
> > On 7/2/24 6:09 AM, David Hildenbrand wrote:
> > > On 02.07.24 14:34, Catalin Marinas wrote:
> > > > On Tue, Jun 25, 2024 at 04:37:17PM -0700, Yang Shi wrote:
> > > > > MTE can be supported on ram based filesystem. It is supported on tmpfs.
> > > > > There is use case to use MTE on hugetlbfs as well, adding MTE support.
> > > > >
> > > > > Signed-off-by: Yang Shi
> > > > > ---
> > > > >   fs/hugetlbfs/inode.c | 2 +-
> > > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > > > > index ecad73a4f713..c34faef62daf 100644
> > > > > --- a/fs/hugetlbfs/inode.c
> > > > > +++ b/fs/hugetlbfs/inode.c
> > > > > @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> > > > >        * way when do_mmap unwinds (may be important on powerpc
> > > > >        * and ia64).
> > > > >        */
> > > > > -    vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND);
> > > > > +    vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND | VM_MTE_ALLOWED);
> > > > >       vma->vm_ops = &hugetlb_vm_ops;
> > > >
> > > > Last time I checked, about a year ago, this was not sufficient. One
> > > > issue is that there's no arch_clear_hugetlb_flags() implemented by your
> > > > patch, leaving PG_arch_{2,3} set on a page. The other issue was that I
> > > > initially tried to do this only on the head page but this did not go
> > > > well with the folio_copy() -> copy_highpage() which expects the
> > > > PG_arch_* flags on each individual page. The alternative was for
> > > > arch_clear_hugetlb_flags() to iterate over all the pages in a folio.
> > >
> > > This would likely also add a blocker for
> > > ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP on arm64 (no idea if there are now
> > > ways to move forward with that now, or if we are still not sure if we
> > > can actually add support), correct?
> >
> > IIUC, it is not. We just need to guarantee each subpage has
> > PG_mte_tagged flag and allocated tags. The HVO just maps the 7 vmemmap
> > pages for sub pages to the first page, they still see the flag and the
> > space for tag is not impacted, right? Did I miss something?
>
> In the R/O vmemmap optimization we won't be able to modify the flags of the
> double-mapped vmemmap pages via the double mappings.
>
> Of course, we could find HVO-specific ways to only modify the flags of the
> first vmemmap page, but it does sound wrong ...
>
> Really, the question is if we can have a per-folio flag for hugetlb instead
> and avoid all that?

I think it is possible and I have some half-baked changes but got
distracted and never completed them. The only issue I came across was
folio_copy() calling copy_highpage() on individual pages that did not
have the original PG_mte_tagged (PG_arch_2) flag. To avoid some races,
we also use PG_mte_lock (PG_arch_3) as some form of locking but, as an
optimisation, we don't clear this flag after copying the tags and
setting PG_mte_tagged. So doing the checks on the head page only
confuses the tail page copying.

Even if we use PG_arch_3 as a proper lock bit and clear it after tag
copying, I'll need to check whether this can race with any
mprotect(PROT_MTE) that could cause losing tags or leaking tags (not
initialising the pages). set_pte_at() relies on the PG_mte_tagged flag
to decide whether to initialise the tags. The arm64 hugetlbfs supports
contiguous ptes, so we'd get multiple set_pte_at() calls.

Anyway, I think with some care it is doable, I just did not have the
time, nor did I see anyone asking for such a feature until now.

-- 
Catalin
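
To illustrate the folio_copy() -> copy_highpage() problem described
above, here is a toy userspace model. It is only a sketch, not kernel
code: every toy_* name, NR_PAGES and TAGS_PER_PAGE are made up for the
illustration, and the model assumes nothing more than "the per-page copy
transfers tags only when the source page carries the tagged flag".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only; all names and sizes here are hypothetical. */
#define NR_PAGES      4   /* pages in the toy folio, pages[0] is the head */
#define TAGS_PER_PAGE 8   /* arbitrary size of the toy tag storage */

struct toy_page {
	bool mte_tagged;                    /* stands in for PG_mte_tagged */
	unsigned char tags[TAGS_PER_PAGE];  /* stands in for the MTE tags */
};

struct toy_folio {
	struct toy_page pages[NR_PAGES];
};

/* Models a per-page copy that only copies tags for flagged source pages. */
static void toy_copy_page(struct toy_page *dst, const struct toy_page *src)
{
	assert(dst && src);
	if (src->mte_tagged) {
		memcpy(dst->tags, src->tags, sizeof(dst->tags));
		dst->mte_tagged = true;
	}
}

/* Models a folio copy implemented as a loop over the individual pages. */
static void toy_folio_copy(struct toy_folio *dst, const struct toy_folio *src)
{
	for (int i = 0; i < NR_PAGES; i++)
		toy_copy_page(&dst->pages[i], &src->pages[i]);
}

/* Give every page tags, but set the flag on the head only or on all pages. */
static void toy_tag_folio(struct toy_folio *f, bool head_only)
{
	for (int i = 0; i < NR_PAGES; i++) {
		memset(f->pages[i].tags, 0xA, TAGS_PER_PAGE);
		f->pages[i].mte_tagged = !head_only || i == 0;
	}
}

/* Returns how many destination pages ended up with the source tags. */
static int toy_count_copied(bool head_only)
{
	struct toy_folio src = {0}, dst = {0};
	int copied = 0;

	toy_tag_folio(&src, head_only);
	toy_folio_copy(&dst, &src);
	for (int i = 0; i < NR_PAGES; i++)
		if (dst.pages[i].tags[0] == 0xA)
			copied++;
	return copied;
}
```

In this model, toy_count_copied(true) reports that only the head page
keeps its tags, while toy_count_copied(false) reports all NR_PAGES
pages. A per-folio flag would therefore need the per-page copy path to
consult the head page (or the folio) rather than each tail page.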