From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Peter Xu <peterx@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Kees Cook <kees@kernel.org>, Matthew Wilcox <willy@infradead.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Xu Xin <xu.xin16@zte.com.cn>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Jann Horn <jannh@google.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Joshua Hahn <joshua.hahnjy@gmail.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Gregory Price <gourry@gourry.net>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	David Rientjes <rientjes@google.com>,
	Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Kairui Song <kasong@tencent.com>, Nhat Pham <nphamcs@gmail.com>,
	Baoquan He <bhe@redhat.com>, Chris Li <chrisl@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH 1/4] mm: declare VMA flags by bit
Date: Thu, 30 Oct 2025 09:07:19 +0000
Message-ID: <f1d67c7b-5e08-43b3-b98c-8a35a5095052@lucifer.local>
In-Reply-To: <20251029190228.GS760669@ziepe.ca>

On Wed, Oct 29, 2025 at 04:02:28PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 29, 2025 at 05:49:35PM +0000, Lorenzo Stoakes wrote:
> > We declare a sparse-bitwise type vma_flag_t which ensures that users can't
> > pass around invalid VMA flags by accident and prepares for future work
> > towards VMA flags being a bitmap where we want to ensure bit values are
> > type safe.
>
> Does sparse attach the type to the enum item? Normal C says the enum
> item's type is always 'int' if the value fits in int..

It does, I have tested this. I'm not sure whether that's due to sparse doing
extra work to make that happen or GNU C doing more there.

You can see an anon enum being used for this in the examples in the sparse
docs for instance (see [0]) so it's kind of a 'thing' it seems.

I also tested this to make sure, by intentionally passing some non-flag
value to the functions which accept vma_flag_t, and it got picked up right
away, checked via:

make C=2 -j $(nproc) 2>&1 | grep vma_flag_t

[0]:https://docs.kernel.org/dev-tools/sparse.html
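
For illustration, here is a minimal sketch of the pattern being discussed: a
bitwise typedef plus an anonymous enum whose items are force-cast to it. The
helper flag_test() and the demo function are hypothetical names, not part of
the series, and the __bitwise/__force stubs are only there so the sketch also
builds with a plain C compiler, where the annotations mean nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Under sparse (__CHECKER__) these are real attributes; otherwise no-ops. */
#ifdef __CHECKER__
#define __bitwise	__attribute__((bitwise))
#define __force		__attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef int __bitwise vma_flag_t;

enum {
	VMA_READ_BIT = (__force vma_flag_t)0,
	VMA_WRITE_BIT = (__force vma_flag_t)1,
};

/* Hypothetical helper: tests one flag, identified by bit number. */
static bool flag_test(unsigned long flags, vma_flag_t bit)
{
	return flags & (1UL << (__force int)bit);
}

static void flag_test_demo(void)
{
	unsigned long flags = 1UL << 1;	/* only the write bit is set */

	assert(flag_test(flags, VMA_WRITE_BIT));
	assert(!flag_test(flags, VMA_READ_BIT));
	/* flag_test(flags, 1) is the kind of call sparse should flag. */
}
```

A plain compiler accepts all of this silently; the type checking only happens
when the code is run through sparse, e.g. via the make C=2 invocation above.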

>
> And I'm not sure bitwise rules work quite the way you'd like for this
> enum, it was meant for things that are |'d..
>
> I have seen an aggressively abuse-resistant technique before, I don't
> really recommend it, but FYI:
>
> struct vma_bits {
>   u8 VMA_READ_BIT;
>   u8 VMA_WRITE_BIT;
>   ..
> };
> #define VMA_BIT(bit_name) BIT(offsetof(struct vma_bits, bit_name))

Oh my eyes! :P I mean kinda clever but also lord above :)
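
For anyone following along, a standalone sketch of how that trick works: each
one-byte struct member's offset doubles as its bit number, so only names that
are members of the struct can be turned into a mask. This assumes userspace C
with 1UL << n standing in for the kernel's BIT(), and the demo function is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte per flag, so offsetof() of a member equals its bit number. */
struct vma_bits {
	uint8_t VMA_READ_BIT;	/* offset 0 */
	uint8_t VMA_WRITE_BIT;	/* offset 1 */
	uint8_t VMA_EXEC_BIT;	/* offset 2 */
};

/* Only identifiers that name a member of struct vma_bits compile here. */
#define VMA_BIT(bit_name) (1UL << offsetof(struct vma_bits, bit_name))

static void vma_bits_demo(void)
{
	assert(VMA_BIT(VMA_READ_BIT) == 0x1UL);
	assert(VMA_BIT(VMA_WRITE_BIT) == 0x2UL);
	assert(VMA_BIT(VMA_EXEC_BIT) == 0x4UL);
	/* VMA_BIT(3) would fail to compile: 3 is not a member name. */
}
```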

I don't think we need this afaict. The idea is to catch accidental
instances of e.g.:

	vma_test(vma, VM_WRITE);

Rather than abuse. Doing the above is _very easy_ and so I wanted to
explicitly have the bots moan if people make this mistake.

If only C had a stronger type system...
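
To make the failure mode concrete, a small sketch of why this mistake goes
unnoticed without sparse. flags_test_bit() is a hypothetical stand-in for a
helper taking a bit number, operating on a bare unsigned long rather than a
real VMA, and the constants are redefined locally for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define VMA_WRITE_BIT	1
#define VMA_EXEC_BIT	2

#define VM_WRITE	(1UL << VMA_WRITE_BIT)	/* mask 0x2 */
#define VM_EXEC		(1UL << VMA_EXEC_BIT)	/* mask 0x4 */

/* Hypothetical stand-in for vma_test(): takes a bit number, not a mask. */
static bool flags_test_bit(unsigned long flags, int bit)
{
	return flags & (1UL << bit);
}

static void misuse_demo(void)
{
	unsigned long flags = VM_WRITE;	/* writable, not executable */

	/* Correct usage: ask about the bit number. */
	assert(flags_test_bit(flags, VMA_WRITE_BIT));

	/*
	 * Mistake: VM_WRITE has the value 2, so it is read as bit number 2,
	 * which is the exec bit. It compiles cleanly and gives the wrong answer.
	 */
	assert(!flags_test_bit(flags, VM_WRITE));
}
```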

>
> > Finally, we have to update some rather silly if-deffery found in
> > fs/proc/task_mmu.c which would otherwise break.
> >
> > Additionally, update the VMA userland testing vma_internal.h header to
> > include these changes.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  fs/proc/task_mmu.c               |   4 +-
> >  include/linux/mm.h               | 286 +++++++++++++++++---------
> >  tools/testing/vma/vma_internal.h | 341 +++++++++++++++++++++++++++----
>
> Maybe take the moment to put them in some vma_flags.h and then can
> that be included from tools/testing to avoid this copying??

It sucks to have this copy/paste yeah. The problem is to make the VMA
userland testing work, we intentionally isolate vma.h/vma.c dependencies
into vma_internal.h in mm/ and also do the same in the userland component,
so we can #include vma.c/h in the userland code.

So we'd have to have a strict requirement that vma_flags.h doesn't import
any other headers or at least none which aren't substituted somehow in the
tools/include directory.

The issue is people might quite reasonably update include/linux/vma_flags.h
to do more later and then break all of the VMA userland testing...

It's a bit of a delicate thing to keep it all working.

>
> > +/**
> > + * vma_flag_t - specifies an individual VMA flag by bit number.
> > + *
> > + * This value is made type safe by sparse to avoid passing invalid flag values
> > + * around.
> > + */
> > +typedef int __bitwise vma_flag_t;
> > +
> > +enum {
> > +	/* currently active flags */
> > +	VMA_READ_BIT = (__force vma_flag_t)0,
> > +	VMA_WRITE_BIT = (__force vma_flag_t)1,
> > +	VMA_EXEC_BIT = (__force vma_flag_t)2,
> > +	VMA_SHARED_BIT = (__force vma_flag_t)3,
> > +
> > +	/* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
> > +	VMA_MAYREAD_BIT = (__force vma_flag_t)4, /* limits for mprotect() etc */
> > +	VMA_MAYWRITE_BIT = (__force vma_flag_t)5,
> > +	VMA_MAYEXEC_BIT = (__force vma_flag_t)6,
> > +	VMA_MAYSHARE_BIT = (__force vma_flag_t)7,
> > +
> > +	VMA_GROWSDOWN_BIT = (__force vma_flag_t)8, /* general info on the segment */
> > +#ifdef CONFIG_MMU
> > +	VMA_UFFD_MISSING_BIT = (__force vma_flag_t)9, /* missing pages tracking */
> > +#else
> > +	/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
> > +	VMA_MAYOVERLAY_BIT = (__force vma_flag_t)9,
> > +#endif
> > +	/* Page-ranges managed without "struct page", just pure PFN */
> > +	VMA_PFNMAP_BIT = (__force vma_flag_t)10,
> > +
> > +	VMA_MAYBE_GUARD_BIT = (__force vma_flag_t)11,
> > +
> > +	VMA_UFFD_WP_BIT = (__force vma_flag_t)12, /* wrprotect pages tracking */
> > +
> > +	VMA_LOCKED_BIT = (__force vma_flag_t)13,
> > +	VMA_IO_BIT = (__force vma_flag_t)14, /* Memory mapped I/O or similar */
> > +
> > +	/* Used by madvise() */
> > +	VMA_SEQ_READ_BIT = (__force vma_flag_t)15, /* App will access data sequentially */
> > +	VMA_RAND_READ_BIT = (__force vma_flag_t)16, /* App will not benefit from clustered reads */
> > +
> > +	VMA_DONTCOPY_BIT = (__force vma_flag_t)17, /* Do not copy this vma on fork */
> > +	VMA_DONTEXPAND_BIT = (__force vma_flag_t)18, /* Cannot expand with mremap() */
> > +	VMA_LOCKONFAULT_BIT = (__force vma_flag_t)19, /* Lock pages covered when faulted in */
> > +	VMA_ACCOUNT_BIT = (__force vma_flag_t)20, /* Is a VM accounted object */
> > +	VMA_NORESERVE_BIT = (__force vma_flag_t)21, /* should the VM suppress accounting */
> > +	VMA_HUGETLB_BIT = (__force vma_flag_t)22, /* Huge TLB Page VM */
> > +	VMA_SYNC_BIT = (__force vma_flag_t)23, /* Synchronous page faults */
> > +	VMA_ARCH_1_BIT = (__force vma_flag_t)24, /* Architecture-specific flag */
> > +	VMA_WIPEONFORK_BIT = (__force vma_flag_t)25, /* Wipe VMA contents in child. */
> > +	VMA_DONTDUMP_BIT = (__force vma_flag_t)26, /* Do not include in the core dump */
> > +
> > +#ifdef CONFIG_MEM_SOFT_DIRTY
> > +	VMA_SOFTDIRTY_BIT = (__force vma_flag_t)27, /* Not soft dirty clean area */
> > +#endif
> > +
> > +	VMA_MIXEDMAP_BIT = (__force vma_flag_t)28, /* Can contain struct page and pure PFN pages */
> > +	VMA_HUGEPAGE_BIT = (__force vma_flag_t)29, /* MADV_HUGEPAGE marked this vma */
> > +	VMA_NOHUGEPAGE_BIT = (__force vma_flag_t)30, /* MADV_NOHUGEPAGE marked this vma */
> > +	VMA_MERGEABLE_BIT = (__force vma_flag_t)31, /* KSM may merge identical pages */
> > +
> > +#ifdef CONFIG_64BIT
> > +	/* These bits are reused, we define specific uses below. */
> > +#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> > +	VMA_HIGH_ARCH_0_BIT = (__force vma_flag_t)32,
> > +	VMA_HIGH_ARCH_1_BIT = (__force vma_flag_t)33,
> > +	VMA_HIGH_ARCH_2_BIT = (__force vma_flag_t)34,
> > +	VMA_HIGH_ARCH_3_BIT = (__force vma_flag_t)35,
> > +	VMA_HIGH_ARCH_4_BIT = (__force vma_flag_t)36,
> > +	VMA_HIGH_ARCH_5_BIT = (__force vma_flag_t)37,
> > +	VMA_HIGH_ARCH_6_BIT = (__force vma_flag_t)38,
> > +#endif
> > +
> > +	VMA_ALLOW_ANY_UNCACHED_BIT = (__force vma_flag_t)39,
> > +	VMA_DROPPABLE_BIT = (__force vma_flag_t)40,
> > +
> > +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
> > +	VMA_UFFD_MINOR_BIT = (__force vma_flag_t)41,
> > +#endif
> > +
> > +	VMA_SEALED_BIT = (__force vma_flag_t)42,
> > +#endif /* CONFIG_64BIT */
> > +};
> > +
> > +#define VMA_BIT(bit)	BIT((__force int)(bit))
>
> > -/* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
> > -#define VM_MAYREAD	0x00000010	/* limits for mprotect() etc */
> > -#define VM_MAYWRITE	0x00000020
> > -#define VM_MAYEXEC	0x00000040
> > -#define VM_MAYSHARE	0x00000080
> > +#define VM_MAYREAD	VMA_BIT(VMA_MAYREAD_BIT)
> > +#define VM_MAYWRITE	VMA_BIT(VMA_MAYWRITE_BIT)
> > +#define VM_MAYEXEC	VMA_BIT(VMA_MAYEXEC_BIT)
> > +#define VM_MAYSHARE	VMA_BIT(VMA_MAYSHARE_BIT)
>
> I suggest removing some of this duplication..
>
> #define DECLARE_VMA_BIT(name, bitno) \
>     name ## _BIT = (__force vma_flag_t)bitno, \
>     name = BIT(bitno)
>
> enum {
>    DECLARE_VMA_BIT(VMA_READ, 0),
> };
>
> Especially since the #defines and enum need to have matching #ifdefs.
>
> It is OK to abuse the enum like the above, C won't get mad and works
> better in gdb/clangd.

I think having the enum anon avoids issues I've been concerned about with
named enums containing flags when used as parameters, yes.
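
As a rough sketch of what such a combined declaration could look like in
practice: one macro entry emits both the bit number and the matching mask
into the same anonymous enum. This is an assumption-laden userspace version,
with __bitwise/__force stubbed out, 1U << n standing in for BIT(), only low
bit numbers used so every value fits in an int, and a hypothetical demo
function.

```c
#include <assert.h>

/* Stubs so the sketch builds with a plain C compiler (no sparse). */
#ifndef __bitwise
#define __bitwise
#endif
#ifndef __force
#define __force
#endif

typedef int __bitwise vma_flag_t;

/* One entry declares both <name>_BIT (bit number) and <name> (mask). */
#define DECLARE_VMA_BIT(name, bitno)			\
	name ## _BIT = (__force vma_flag_t)(bitno),	\
	name = 1U << (bitno)

enum {
	DECLARE_VMA_BIT(VMA_READ, 0),
	DECLARE_VMA_BIT(VMA_WRITE, 1),
	DECLARE_VMA_BIT(VMA_EXEC, 2),
};

static void declare_vma_bit_demo(void)
{
	assert(VMA_READ_BIT == 0 && VMA_READ == 0x1);
	assert(VMA_WRITE_BIT == 1 && VMA_WRITE == 0x2);
	assert(VMA_EXEC == (1 << VMA_EXEC_BIT));
}
```

With this shape, any #ifdef only has to wrap a single DECLARE_VMA_BIT() line,
so the bit number and its mask cannot drift apart.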

>
> Later you can have a variation of the macro for your first system
> word/second system word idea.

Well I think we'd probably want to name the macro accordingly.

DECLARE_VMA_BIT_AND_FLAG() maybe? And mention in the comment that it's for
system word size.

>
> Otherwise I think this is a great thing to do, thanks!

Thanks :)

To give due credit - Matthew suggested this a while ago, I've been working
towards it with the mm flags first as an easier case to tackle.

It came out of my assuming that the VM_MAYBE_GUARD stuff didn't have a flag
free to do this in the 32-bit space. As part of this work it became
apparent I was wrong, so I implemented + sent that series yesterday (doh!)
but this change is still useful as it's beyond silly that we're constrained
like this.

I should actually probably put a Suggested-by for this, didn't even think
to, sorry Matthew! :)

>
> Jason

Cheers, Lorenzo

