From: Barry Song <21cnbao@gmail.com>
Date: Thu, 5 Jun 2025 22:27:27 +1200
Subject: Re: [PATCH RFC v2] mm: use per_vma lock for MADV_DONTNEED
To: Jann Horn
Cc: Lorenzo Stoakes, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Barry Song, "Liam R. Howlett",
 David Hildenbrand, Vlastimil Babka, Suren Baghdasaryan,
 Lokesh Gidra, Tangquan Zheng
References: <20250530104439.64841-1-21cnbao@gmail.com> <002aa917-d952-491d-800c-88a0476ac02f@lucifer.local>
Howlett" , David Hildenbrand , Vlastimil Babka , Suren Baghdasaryan , Lokesh Gidra , Tangquan Zheng Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 5D9401C0005 X-Stat-Signature: 9e6bqq4thdogpk3fhgscmsi693iz3fmm X-Rspam-User: X-Rspamd-Server: rspam04 X-HE-Tag: 1749119259-127983 X-HE-Meta: U2FsdGVkX1+RHRzKgXX1586+7EgAYsG6XwpvLPEqDSb1UFiIwlMp9vadZhBBhuMBcETZ3cH0WoWbIFlWQWSTPBFg8EG0WlW4bjfe9q12ajQwM9IVjG/p2KzJB8iDb7Bv2mrz9aUIHRQEfUki91dR1EUgLBfw2/HPJd+W5hGBViSY8VVn9QOrQA6ZPIv5Fb3VBymwJyjF03vd8rpVX8HWLHR8PcLWPW2fHzW08+Nsu/HKOsWn+3PsKnbZKWcbsrpVJjeQI2vS+rED7TSW1q6rxERM6BSN+fj5pepwaBLTNat0oEPPpZ1EfGPaUKYZfS9pCh9dbsf1jxUvezze+178HXJHiZQ/FVl3YVu13mO1xzGjSgx4xBrENqMg7EQR5zQEYoi7KArRBCUg4dSKfdEVzXQVB5qa8iXGzjj0sATq908AffqWdlv9VzNbkp/rU2rbHu5Wmz5S9GCM3rCRItMjzufeGEZzUINPwVKPmi7n2QpR7e+xNN7iobznY3W4rmjQyj/jI4S2NqGTpmQmMJSFWCSarnQvJlP2YIesiistK9CA9XP1gKTwWe/m9HO+a8DEmKg97wv3ZGttXZ6f9rLocmpmO3ekT/0BjWPKJmdumSvLCM5kJPedfYd7TrOBIV96j+oB5M03V2fGvc1pf0/kZ37GeFWOSUf5loXhKEmYnRdRNFlY5eznk0BWtJzkllqvUfuze26bmk1Cwmhifp8tq9uNlx2HkZBqMiS0fksFuLcGJYz4HpK0Xk6O4f9Gf4qz7VZks83pIqxYiYnmjF+io3h9OvUo4oDvZhyqMf+9lIe0IpuNSIBdkbL/45Hv5BzbrvlztCnUOCCrzacAcS/Zlox6h6IlGRfk4Yk8fQsTxnQmvzK4y5TOqnUY0KPdLpL3Ysje0XPJyOYESiWT0MdFkDyNm7hxLWlqlSTk2VEgvOYCDNSF6Z6hS7P2M8obxUGe+tbM9QkzsQxuvBgm0zE EVrduzgt IKTLvVKy0tcydFJYxjOBAe5/yag880EKuAVxpRNe8oafFtqspH6Z4SzFqsR4bYzVwogHq59a9ekc07TGe8wQxJuYC6sL10GXWSQ34eBPXnMyl1fEtVdkr/lERr+bhlbR+rKFBcCag9DbQPqQjjvIksQVaBRoHvci5zflLW+8SeAgMzFGWcAppKCP1XVn/D6aIDlOJFYSPbAuYhEEk1eiHOfyxno4fLtyswEmnOCB+pDVizRnaGl2Kwe4Cq2jZQk3AZq2rlQF2PsOsrM5csCS0zAO19ENXHMACadHjECC9PFB+PT7WhjL48Ziuz6IZsGJ/k1mPGHvE866tp23Gr0rw2Y7Lll0y7zryb71YPs87M1ltrZVIMZwbhMewjoKa8aZAWrv+wmtnkGV0mLqOIjKvO87b0JUlSgZ8IC1x37dd4OinvW+4UwqlI+8dMwAe75hBvWlXQpFJYSfQ2Kys0twspuDCzpBjmvJZtOvO8anFjQ9nJ19SGKoaZyyprkYuNxZvvdwqAtdQddzFLk0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jun 4, 2025 at 4:53=E2=80=AFAM Jann Horn wrote: > > On Tue, Jun 3, 2025 at 9:06=E2=80=AFAM Barry Song <21cnbao@gmail.com> wro= te: > > On Sat, May 31, 2025 at 8:41=E2=80=AFAM Jann Horn wr= ote: > > > On Fri, May 30, 2025 at 4:34=E2=80=AFPM Lorenzo Stoakes > > > wrote: > > > > On Fri, May 30, 2025 at 04:06:30PM +0200, Jann Horn wrote: > > > > > On Fri, May 30, 2025 at 12:44=E2=80=AFPM Barry Song <21cnbao@gmai= l.com> wrote: > > > > > One important quirk of this is that it can, from what I can see, = cause > > > > > freeing of page tables (through pt_reclaim) without holding the m= map > > > > > lock at all: > > > > > > > > > > do_madvise [behavior=3DMADV_DONTNEED] > > > > > madvise_lock > > > > > lock_vma_under_rcu > > > > > madvise_do_behavior > > > > > madvise_single_locked_vma > > > > > madvise_vma_behavior > > > > > madvise_dontneed_free > > > > > madvise_dontneed_single_vma > > > > > zap_page_range_single_batched [.reclaim_pt =3D true] > > > > > unmap_single_vma > > > > > unmap_page_range > > > > > zap_p4d_range > > > > > zap_pud_range > > > > > zap_pmd_range > > > > > zap_pte_range > > > > > try_get_and_clear_pmd > > > > > free_pte > > > > > > > > > > This clashes with the assumption in walk_page_range_novma() that > > > > > holding the mmap lock in write mode is sufficient to prevent > > > > > concurrent page table freeing, so it can probably lead to page ta= ble > > > > > UAF through the ptdump interface (see ptdump_walk_pgd()). 
> > > >
> > > > Hmmmmmm is this because of the series that allows page table freeing on
> > > > zap... I think Zi's?
> > >
> > > Yeah, that was Qi Zheng's
> > > https://lore.kernel.org/all/92aba2b319a734913f18ba41e7d86a265f0b84e2.1733305182.git.zhengqi.arch@bytedance.com/
> > > .
> > >
> > > > We need to update the documentation on this then... which currently states
> > > > the VMA need only be stable.
> > > >
> > > > I guess this is still the case except for the novma walker you mention.
> > > >
> > > > Relatedly, it's worth looking at Dev's series which introduces a concerning
> > > > new 'no lock at all' mode to the page table walker explicitly for novma. I
> > > > cc'd you :) See [0].
> > > >
> > > > [0]: https://lore.kernel.org/linux-mm/6a60c052-9935-489e-a38e-1b03a1a79155@lucifer.local/
> > >
> > > Yeah, I saw that you CC'ed me; at a first glance that seems relatively
> > > innocuous to me as long as it's only done for kernel mappings where
> > > all the rules are different.
> > >
> > > > >
> > > > > I think before this patch can land, you'll have to introduce some new
> > > > > helper like:
> > > > >
> > > > > void mmap_write_lock_with_all_vmas(struct mm_struct *mm)
> > > > > {
> > > > >         struct vm_area_struct *vma;
> > > > >         VMA_ITERATOR(vmi, mm, 0);
> > > > >
> > > > >         mmap_write_lock(mm);
> > > > >         for_each_vma(vmi, vma)
> > > > >                 vma_start_write(vma);
> > > > > }
> > > > >
> > > > > and use that in walk_page_range_novma() for user virtual address space
> > > > > walks, and update the comment in there.
> > > >
> > > > What dude? No, what? Marking literally all VMAs write locked? :/
> > > >
> > > > I think this could have unexpected impact no? We're basically disabling VMA
> > > > locking when we're in novma, that seems... really silly?
> > >
> > > I mean, walk_page_range_novma() being used on user virtual address
> > > space is pretty much a debug-only thing, I don't think it matters if
> > > it has to spend time poking flags in a few thousand VMAs. I guess the
> > > alternative would be to say "ptdump just doesn't show entries between
> > > VMAs, which shouldn't exist in the first place", and change ptdump to
> > > do a normal walk that skips over userspace areas not covered by a VMA.
> > > Maybe that's cleaner.
> > >
> > > But FWIW, we already do worse than what I proposed here when
> > > installing MMU notifiers, with mm_take_all_locks().
> > >
> > > > > > +       else
> > > > > > +               __madvise_unlock(mm, madv_behavior->behavior);
> > > > > > +}
> > > > > > +
> > > > > >  static bool madvise_batch_tlb_flush(int behavior)
> > > > > >  {
> > > > > >         switch (behavior) {
> > > > > > @@ -1714,19 +1770,24 @@ static int madvise_do_behavior(struct mm_struct *mm,
> > > > > >                 unsigned long start, size_t len_in,
> > > > > >                 struct madvise_behavior *madv_behavior)
> > > > > >  {
> > > > > > +       struct vm_area_struct *vma = madv_behavior->vma;
> > > > > >         int behavior = madv_behavior->behavior;
> > > > > > +
> > > > > >         struct blk_plug plug;
> > > > > >         unsigned long end;
> > > > > >         int error;
> > > > > >
> > > > > >         if (is_memory_failure(behavior))
> > > > > >                 return madvise_inject_error(behavior, start, start + len_in);
> > > > > > -       start = untagged_addr_remote(mm, start);
> > > > > > +       start = untagged_addr(start);
> > > > >
> > > > > Why is this okay? I see that X86's untagged_addr_remote() asserts that
> > > > > the mmap lock is held, which is no longer the case here with your
> > > > > patch, but untagged_addr() seems wrong here, since we can be operating
> > > > > on another process. I think especially on X86 with 5-level paging and
> > > > > LAM, there can probably be cases where address bits are used for part
> > > > > of the virtual address in one task while they need to be masked off in
> > > > > another task?
> > > > >
> > > > > I wonder if you'll have to refactor X86 and RISC-V first to make this
> > > > > work... ideally by making sure that their address tagging state
> > > > > updates are atomic and untagged_addr_remote() works locklessly.
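For reference, a simplified sketch of the x86 helpers (paraphrased from
arch/x86/include/asm/uaccess_64.h) shows why the remote variant insists
on the mmap lock: the untag mask is per-mm state, written when LAM is
enabled, so a remote mm's mask must be read under that lock, while
plain untagged_addr() uses the current task's mask:

#define __untagged_addr(untag_mask, addr)	({			\
	u64 __addr = (__force u64)(addr);				\
	s64 sign = (s64)__addr >> 63;	/* keep kernel addresses intact */ \
	__addr &= (untag_mask) | sign;					\
	(__force __typeof__(addr))__addr;				\
})

/* Uses the *current* task's mask - wrong for another process's mm. */
#define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)

#define untagged_addr_remote(mm, addr)	({				\
	mmap_assert_locked(mm);	/* mask is updated under mmap lock */	\
	__untagged_addr((mm)->context.untag_mask, addr);		\
})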
> > > >
> > > > Yeah I don't know why we're doing this at all? This seems new unless I
> > > > missed it?
> > >
> > > Because untagged_addr_remote() has a mmap_assert_locked(mm) on x86 and
> > > reads data that is updated under the mmap lock, I think? So without
> > > this change you should get a lockdep splat on x86.
> > >
> > > > > (Or you could try to use something like the
> > > > > mmap_write_lock_with_all_vmas() I proposed above for synchronizing
> > > > > against untagged_addr(), first write-lock the MM and then write-lock
> > > > > all VMAs in it...)
> > > >
> > > > This would completely eliminate the point of this patch no? The whole point
> > > > is not taking these locks... And I'm very much not in favour of
> > > > write-locking literally every single VMA, under any circumstances.
> > >
> > > I'm talking about doing this heavyweight locking in places like
> > > arch_prctl(ARCH_ENABLE_TAGGED_ADDR, ...) that can, if I understand
> > > correctly, essentially reconfigure the size of the virtual address
> > > space of a running process from 56-bit to 47-bit at the hardware level
> > > and cause address bits that were previously part of the virtual
> > > address to be ignored. READ_ONCE()/WRITE_ONCE() might do the job too,
> > > but then we'll have to keep in mind that two subsequent invocations of
> > > untagged_addr() can translate a userspace-specified virtual address
> > > into two different virtual addresses at the page table level.
> >
> > I'm confused about how arch_prctl(ARCH_ENABLE_TAGGED_ADDR, ...) can
> > reconfigure a running process from using 56-bit addresses to 47-bit.
> > I read the code and see the x86 kernel only supports LAM U57, and not
> > LAM U48 at all:
> >
> > static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
> > {
> >         ...
> >
> >         if (!nr_bits || nr_bits > LAM_U57_BITS) {
> >                 mmap_write_unlock(mm);
> >                 return -EINVAL;
> >         }
> >
> >         mm_enable_lam(mm);
> >         mmap_write_unlock(mm);
> >
> >         return 0;
> > }
>
> Oh, you're right, currently only LAM U57 is supported by Linux, I was
> making bad assumptions.
>
> commit 2f8794bd087e, which introduced that code, also mentions that
> "For now only LAM_U57 is supported, with 6 tag bits".
>
> > I still don't fully understand why x86 differs from ARM64,
> > where the same bit mask is always applied unconditionally.
> >
> > On ARM64, we can even enable or disable PROT_MTE on a per-VMA basis
> > using mmap or mprotect. However, the same bitmask operation is
> > always executed regardless of whether memory tags are present for a
> > given VMA.
>
> Hmm, true, that does look like a weird difference.
>
> > I mean, on arm64, if a process or a VMA doesn't have tag access
> > enabled, and we pass an address with high bits to madvise,
> > untagged_addr() will still strip the tag. But wouldn't that address
> > be invalid for a process or VMA that doesn't have TBI enabled?
>
> Yeah, I guess in that regard it might be fine to always strip the bits...
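For comparison, a simplified sketch of arm64's untagged_addr(): the
mask is applied unconditionally via sign-extension from bit 55, with no
per-mm or per-VMA state consulted, which is why it is safe to use
locklessly on a remote mm (kernel addresses, which have bit 55 set,
keep their high bits):

#include <linux/bitops.h>	/* sign_extend64() */

static inline unsigned long arm64_untag_addr(unsigned long addr)
{
	/* Strip bits 63:56 of user addresses, unconditionally. */
	return (unsigned long)sign_extend64(addr, 55);
}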
>
> Maybe the situation on arm64 is simpler, in that these bits are always
> either tag bits or unused on arm64, while my understanding is that on
> X86, the CPU supports changing the meaning of address bits between
> "part of virtual address" and "ignored tag bits". So I think stripping
> the bits might work fine for LAM U57, but not for LAM U48, and maybe
> the code is trying to be future-proof in case someone wants to add
> support for LAM U48?

Perhaps supporting two different tag lengths is a bug rather than a
feature?

> It is arguably also a bit more robust to reject garbage addresses
> instead of ignoring bits that would cause the CPU to treat the address
> as noncanonical, but I guess doing that just in MM code is not a big
> problem. (Unless someone relies on seccomp filters that block
> manipulation of specific virtual address ranges via munmap() and such,
> but I think almost nobody does that. I think I've only seen that once
> in some rarely-used component of the Tor project.)
>

Unless we perform a strict check in the kernel to ensure the allocated
tag matches the logical tag, random high-bit values are still
acceptable. For example, x86 only checks whether the mask is -1UL or
covers bits above bit 57; there's no mechanism to exclude arbitrary
high-bit values.

Otherwise, there's no point in checking whether tagging is enabled for
the VMA or the mm. Thus, we end up applying a mask unconditionally
anyway.

> But I am out of my depth here and might be severely misunderstanding
> what's going on.

Same here :-) Especially since I have no idea what's going on with
RISC-V and its two different tagging lengths.

Thanks
Barry