* arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA
From: Alexandru Elisei @ 2024-02-20 11:26 UTC
To: catalin.marinas, will, oliver.upton, maz, james.morse, suzuki.poulose,
    yuzenghui, pcc, steven.price, anshuman.khandual, david, eugenis, kcc,
    hyesoo.yu, rppt, akpm, peterz, konrad.wilk, willy, jgross, hch, geert,
    vitaly.wool, ddstreet, sjenning, hughd, linux-arm-kernel, linux-kernel,
    linux-arch, linux-mm

Hello,

This is a request to discuss alternatives to the current approach for reusing
the MTE tag storage memory for data allocations [1]. Each iteration of the
series uncovered new issues, the latest being that memory allocation is being
performed in atomic contexts [2]; I would like to start a discussion regarding
possible alternatives which would integrate better with the memory management
code.

This is a high level overview of the current approach:

* Tag storage pages are put on the MIGRATE_CMA lists, meaning they can be
  used for data allocations like (almost) any other page in the system.

* When a page is allocated as tagged, the corresponding tag storage is also
  allocated.

* There's a static relationship between a page and the location in memory
  where its tags are stored. Because of this, if the corresponding tag
  storage is used for data, the tag storage page is migrated.

Although this is the most generic approach, because tag storage pages are
treated like normal pages, it has some disadvantages:

* HW KASAN (MTE in the kernel) cannot be used. The kernel allocates memory
  in atomic context, where migration is not possible.

* Tag storage pages cannot themselves be tagged, which means that all CMA
  pages, even those which aren't tag storage, cannot be used for tagged
  allocations.

* Page migration is costly, and a process that uses MTE can experience
  measurable slowdowns if the tag storage it requires is in use for data.
  There might be ways to reduce this cost (by reducing the likelihood that
  tag storage pages are allocated), but it cannot be completely eliminated.

* Worse yet, a userspace process can use a tag storage page in such a way
  that migration is effectively impossible [3],[4]. A malicious process can
  make use of this to prevent the allocation of tag storage for other
  processes in the system, leading to a degraded experience for the affected
  processes. In the worst case, progress becomes impossible for those
  processes.

One alternative approach I'm looking at right now is cleancache. Cleancache
was removed in v5.17 (commit 0a4ee518185e) because the only backend, the tmem
driver, had been removed earlier (in v5.3, commit 814bbf49dcd0).

With this approach, MTE tag storage would be implemented as a driver backend
for cleancache. When a tag storage page is needed for storing tags, the page
would simply be dropped from the cache (cleancache_get_page() returns -1).

I believe this is a very good fit for tag storage reuse, because it allows
tag storage to be allocated even in atomic contexts, which enables MTE in the
kernel. As a bonus, none of the changes to MM from the current approach would
be needed, as tag storage allocation can be handled entirely in
set_ptes_at(), copy_*highpage() or arch_swap_restore().

Is this a viable approach that would be upstreamable? Are there other
solutions that I haven't considered? I'm very much open to any alternatives
that would make tag storage reuse viable.
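To make the shape of this more concrete, below is a minimal sketch of what
such a backend could look like, assuming the cleancache_ops interface is
resurrected roughly as it was before commit 0a4ee518185e. The mte_ts_*()
helpers are hypothetical placeholders for the actual tag storage bookkeeping
(find a free granule, copy data in/out, drop the copy when the granule is
reclaimed for tags), so please treat this as an illustration rather than a
design:

/*
 * Sketch of an MTE tag storage cleancache backend. The cleancache_ops
 * layout follows the interface as it existed before its removal in v5.17.
 */
#include <linux/cleancache.h>
#include <linux/module.h>

/* Hypothetical tag storage bookkeeping helpers, not real kernel API. */
int mte_ts_create_pool(size_t pagesize);
void mte_ts_store(int pool, struct cleancache_filekey key, pgoff_t index,
		  struct page *page);
int mte_ts_load(int pool, struct cleancache_filekey key, pgoff_t index,
		struct page *page);
void mte_ts_drop(int pool, struct cleancache_filekey key, pgoff_t index);

static int mte_ts_init_fs(size_t pagesize)
{
	/* hand out a pool id per filesystem */
	return mte_ts_create_pool(pagesize);
}

static void mte_ts_put_page(int pool, struct cleancache_filekey key,
			    pgoff_t index, struct page *page)
{
	/* best effort: keep a copy of the clean page in unused tag storage */
	mte_ts_store(pool, key, index, page);
}

static int mte_ts_get_page(int pool, struct cleancache_filekey key,
			   pgoff_t index, struct page *page)
{
	/*
	 * -1 if the copy is gone, e.g. because the backing tag storage was
	 * reclaimed for storing tags in the meantime; 0 on a hit.
	 */
	return mte_ts_load(pool, key, index, page) ? -1 : 0;
}

static void mte_ts_invalidate_page(int pool, struct cleancache_filekey key,
				   pgoff_t index)
{
	mte_ts_drop(pool, key, index);
}

static const struct cleancache_ops mte_ts_cleancache_ops = {
	.init_fs	 = mte_ts_init_fs,
	.put_page	 = mte_ts_put_page,
	.get_page	 = mte_ts_get_page,
	.invalidate_page = mte_ts_invalidate_page,
	/* .init_shared_fs, .invalidate_inode, .invalidate_fs omitted */
};

static int __init mte_ts_cleancache_init(void)
{
	return cleancache_register_ops(&mte_ts_cleancache_ops);
}
module_init(mte_ts_cleancache_init);

Reclaiming a granule for tags would then amount to dropping the cached copy,
with no page migration and no allocation in the caller's context.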
[1] https://lore.kernel.org/all/20240125164256.4147-1-alexandru.elisei@arm.com/
[2] https://lore.kernel.org/all/CAMn1gO7M51QtxPxkRO3ogH1zasd2-vErWqoPTqGoPiEvr8Pvcw@mail.gmail.com/
[3] https://lore.kernel.org/linux-trace-kernel/4e7a4054-092c-4e34-ae00-0105d7c9343c@redhat.com/
[4] https://lore.kernel.org/linux-trace-kernel/92833873-cd70-44b0-9f34-f4ac11b9e498@redhat.com/

Thanks,
Alex
* Re: arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA 2024-02-20 11:26 arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA Alexandru Elisei @ 2024-02-20 12:05 ` David Hildenbrand 2024-02-20 13:26 ` Alexandru Elisei 0 siblings, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2024-02-20 12:05 UTC (permalink / raw) To: Alexandru Elisei, catalin.marinas, will, oliver.upton, maz, james.morse, suzuki.poulose, yuzenghui, pcc, steven.price, anshuman.khandual, eugenis, kcc, hyesoo.yu, rppt, akpm, peterz, konrad.wilk, willy, jgross, hch, geert, vitaly.wool, ddstreet, sjenning, hughd, linux-arm-kernel, linux-kernel, linux-arch, linux-mm On 20.02.24 12:26, Alexandru Elisei wrote: > Hello, > Hi! > This is a request to discuss alternatives to the current approach for > reusing the MTE tag storage memory for data allocations [1]. Each iteration > of the series uncovered new issues, the latest being that memory allocation > is being performed in atomic contexts [2]; I would like to start a > discussion regarding possible alternative, which would integrate better > with the memory management code. > > This is a high level overview of the current approach: > > * Tag storage pages are put on the MIGRATE_CMA lists, meaning they can be > used for data allocations like (almost) any other page in the system. > > * When a page is allocated as tagged, the corresponding tag storage is > also allocated. > > * There's a static relationship between a page and the location in memory > where its tags are stored. Because of this, if the corresponding tag > storage is used for data, the tag storage page is migrated. > > Although this is the most generic approach because tag storage pages are > treated like normal pages, it has some disadvantages: > > * HW KASAN (MTE in the kernel) cannot be used. The kernel allocates memory > in atomic context, where migration is not possible. > > * Tag storage pages cannot be themselves tagged, and this means that all > CMA pages, even those which aren't tag storage, cannot be used for > tagged allocations. > > * Page migration is costly, and a process that uses MTE can experience > measurable slowdowns if the tag storage it requires is in use for data. > There might be ways to reduce this cost (by reducing the likelihood that > tag storage pages are allocated), but it cannot be completely > eliminated. > > * Worse yet, a userspace process can use a tag storage page in such a way > that migration is effectively impossible [3],[4]. A malicious process > can make use of this to prevent the allocation of tag storage for other > processes in the system, leading to a degraded experience for the > affected processes. Worst case scenario, progress becomes impossible for > those processes. > > One alternative approach I'm looking at right now is cleancache. Cleancache > was removed in v5.17 (commit 0a4ee518185e) because the only backend, the > tmem driver, had been removed earlier (in v5.3, commit 814bbf49dcd0). > > With this approach, MTE tag storage would be implemented as a driver > backend for cleancache. When a tag storage page is needed for storing tags, > the page would simply be dropped from the cache (cleancache_get_page() > returns -1). With large folios in place, we'd likely want to investigate not working on individual pages, but on (possibly large) folios instead. > > I believe this is a very good fit for tag storage reuse, because it allows > tag storage to be allocated even in atomic contexts, which enables MTE in > the kernel. 
As a bonus, all of the changes to MM from the current approach > wouldn't be needed, as tag storage allocation can be handled entirely in > set_ptes_at(), copy_*highpage() or arch_swap_restore(). > > Is this a viable approach that would be upstreamable? Are there other > solutions that I haven't considered? I'm very much open to any alternatives > that would make tag storage reuse viable.

As raised recently, I had similar ideas with something like virtio-mem in the past (wanted to call it virtio-tmem back then), but didn't have time to look into it yet.

I considered both, using special device memory as "cleancache" backend, and using it as backend storage for something similar to zswap. We would not need a memmap/"struct page" for that special device memory, which reduces memory overhead and makes "adding more memory" a more reliable operation.

Using it as "cleancache" backend does make some things a lot easier.

The idea would be to provide a variable amount of additional memory to a VM that can be reclaimed easily and reliably on demand.

The details are a bit more involved, but in essence, imagine a special physical memory region that is provided by the hypervisor via a device to the VM. A virtio device "owns" that region and the driver manages it, based on requests from the hypervisor.

Similar to virtio-mem, there are ways for the hypervisor to request changes to the memory consumption of a device (setting the requested size). So when requested to consume less, clean pagecache pages can be dropped and the memory can be handed back to the hypervisor.

Of course, likely we would want to consider using "slower" memory in the hypervisor to back such a device.

I also thought about better integrating memory reclaim in the hypervisor, in a way similar to "MADV_FREE" semantics. One idea I had was that the memory provided by the device might have "special" semantics (as if the memory is always marked MADV_FREE), whereby the hypervisor could reclaim+discard any memory in that region any time, and the driver would have ways to get notified about that, or detect that reclaim happened.

I learned that there are cases where data that is significantly larger than main memory might be read repeatedly. As long as there is free memory in the hypervisor, it could be used as a cache for clean pagecache pages. In contrast to memory ballooning + virtio-mem, that memory can be easily and reliably reclaimed. And reclaiming that memory cannot really hurt the VM, it would only affect performance.

Long story short: what I had in mind would require similar hooks (again). In contrast to tmem, with arm64 MTE we could get an actual supported cleancache backend fairly easily. I recall that tmem was abandoned in XEN and never really reached production quality.

--
Cheers,

David / dhildenb
* Re: arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA 2024-02-20 12:05 ` David Hildenbrand @ 2024-02-20 13:26 ` Alexandru Elisei 2024-02-20 14:07 ` David Hildenbrand 0 siblings, 1 reply; 7+ messages in thread From: Alexandru Elisei @ 2024-02-20 13:26 UTC (permalink / raw) To: David Hildenbrand Cc: catalin.marinas, will, oliver.upton, maz, james.morse, suzuki.poulose, yuzenghui, pcc, steven.price, anshuman.khandual, eugenis, kcc, hyesoo.yu, rppt, akpm, peterz, konrad.wilk, willy, jgross, hch, geert, vitaly.wool, ddstreet, sjenning, hughd, linux-arm-kernel, linux-kernel, linux-arch, linux-mm Hi David, On Tue, Feb 20, 2024 at 01:05:42PM +0100, David Hildenbrand wrote: > On 20.02.24 12:26, Alexandru Elisei wrote: > > Hello, > > > > Hi! > > > This is a request to discuss alternatives to the current approach for > > reusing the MTE tag storage memory for data allocations [1]. Each iteration > > of the series uncovered new issues, the latest being that memory allocation > > is being performed in atomic contexts [2]; I would like to start a > > discussion regarding possible alternative, which would integrate better > > with the memory management code. > > > > This is a high level overview of the current approach: > > > > * Tag storage pages are put on the MIGRATE_CMA lists, meaning they can be > > used for data allocations like (almost) any other page in the system. > > > > * When a page is allocated as tagged, the corresponding tag storage is > > also allocated. > > > > * There's a static relationship between a page and the location in memory > > where its tags are stored. Because of this, if the corresponding tag > > storage is used for data, the tag storage page is migrated. > > > > Although this is the most generic approach because tag storage pages are > > treated like normal pages, it has some disadvantages: > > > > * HW KASAN (MTE in the kernel) cannot be used. The kernel allocates memory > > in atomic context, where migration is not possible. > > > > * Tag storage pages cannot be themselves tagged, and this means that all > > CMA pages, even those which aren't tag storage, cannot be used for > > tagged allocations. > > > > * Page migration is costly, and a process that uses MTE can experience > > measurable slowdowns if the tag storage it requires is in use for data. > > There might be ways to reduce this cost (by reducing the likelihood that > > tag storage pages are allocated), but it cannot be completely > > eliminated. > > > > * Worse yet, a userspace process can use a tag storage page in such a way > > that migration is effectively impossible [3],[4]. A malicious process > > can make use of this to prevent the allocation of tag storage for other > > processes in the system, leading to a degraded experience for the > > affected processes. Worst case scenario, progress becomes impossible for > > those processes. > > > > One alternative approach I'm looking at right now is cleancache. Cleancache > > was removed in v5.17 (commit 0a4ee518185e) because the only backend, the > > tmem driver, had been removed earlier (in v5.3, commit 814bbf49dcd0). > > > > With this approach, MTE tag storage would be implemented as a driver > > backend for cleancache. When a tag storage page is needed for storing tags, > > the page would simply be dropped from the cache (cleancache_get_page() > > returns -1). > > With large folios in place, we'd likely want to investigate not working on > individual pages, but on (possibly large) folios instead. Yes, that would be interesting. 
Since the backend has no way of controlling what tag storage page will be needed for tags, and subsequently dropped from the cache, we would have to figure out what to do if one of the pages that is part of a large folio is dropped. The easiest solution that I can see is to remove the entire folio from the cleancache, but that would mean also dropping the rest of the pages from the folio unnecessarily. > > > > > I believe this is a very good fit for tag storage reuse, because it allows > > tag storage to be allocated even in atomic contexts, which enables MTE in > > the kernel. As a bonus, all of the changes to MM from the current approach > > wouldn't be needed, as tag storage allocation can be handled entirely in > > set_ptes_at(), copy_*highpage() or arch_swap_restore(). > > > > Is this a viable approach that would be upstreamable? Are there other > > solutions that I haven't considered? I'm very much open to any alternatives > > that would make tag storage reuse viable. > > As raised recently, I had similar ideas with something like virtio-mem in > the past (wanted to call it virtio-tmem back then), but didn't have time to > look into it yet. > > I considered both, using special device memory as "cleancache" backend, and > using it as backend storage for something similar to zswap. We would not > need a memmap/"struct page" for that special device memory, which reduces > memory overhead and makes "adding more memory" a more reliable operation. Hm... this might not work with tag storage memory, the kernel needs to perform cache maintenance on the memory when it transitions to and from storing tags and storing data, so the memory must be mapped by the kernel. > > Using it as "cleancache" backend does make some things a lot easier. > > The idea would be to provide a variable amount of additional memory to a VM, > that can be reclaimed easily and reliably on demand. > > The details are a bit more involved, but in essence, imagine a special > physical memory region that is provided by a the hypervisor via a device to > the VM. A virtio device "owns" that region and the driver manages it, based > on requests from the hypervisor. > > Similar to virtio-mem, there are ways for the hypervisor to request changes > to the memory consumption of a device (setting the requested size). So when > requested to consume less, clean pagecache pages can be dropped and the > memory can be handed back to the hypervisor. > > Of course, likely we would want to consider using "slower" memory in the > hypervisor to back such a device. I'm not sure how useful that will be with tag storage reuse. KVM must assume that **all** the memory that the guest uses is tagged and it needs tag storage allocated (it's a known architectural limitation), so that will leave even less tag storage memory to distribute between the host and the guest(s). Adding to that, at the moment Android is going to be the major (only?) user of tag storage reuse, and as far as I know pKVM is more restrictive with regards to the emulated devices and the memory that is shared between guests and the host. > > I also thought about better integrating memory reclaim in the hypervisor, > similar to "MADV_FREE" semantic way. One idea I had was that the memory > provided by the device might have "special" semantics (as if the memory is > always marked MADV_FREE), whereby the hypervisor could reclaim+discard any > memory in that region any time, and the driver would have ways to get > notified about that, or detect that reclaim happened. 
> > I learned that there are cases where data that is significantly larger than > main memory might be read repeatedly. As long as there is free memory in the > hypervisor, it could be used as a cache for clean pagecache pages. In > contrast to memory ballonning + virtio-mem, that memory can be easily and > reliably reclaimed. And reclaiming that memory cannot really hurt the VM, it > would only affect performance. > > Long story short: what I had in mind would require similar hooks (again). > > In contrast to tmem, with arm64 MTE we could get an actual supported > cleancache backend fairly easily. I recall that tmem was abandoned in XEN > and never really reached production quality. Yes, that was also my impression after reading commit 814bbf49dcd0 ("xen: remove tmem driver"). Thanks, Alex > > -- > Cheers, > > David / dhildenb > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA 2024-02-20 13:26 ` Alexandru Elisei @ 2024-02-20 14:07 ` David Hildenbrand 2024-02-20 16:03 ` Alexandru Elisei 0 siblings, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2024-02-20 14:07 UTC (permalink / raw) To: Alexandru Elisei Cc: catalin.marinas, will, oliver.upton, maz, james.morse, suzuki.poulose, yuzenghui, pcc, steven.price, anshuman.khandual, eugenis, kcc, hyesoo.yu, rppt, akpm, peterz, konrad.wilk, willy, jgross, hch, geert, vitaly.wool, ddstreet, sjenning, hughd, linux-arm-kernel, linux-kernel, linux-arch, linux-mm >> >> With large folios in place, we'd likely want to investigate not working on >> individual pages, but on (possibly large) folios instead. > > Yes, that would be interesting. Since the backend has no way of controlling > what tag storage page will be needed for tags, and subsequently dropped > from the cache, we would have to figure out what to do if one of the pages > that is part of a large folio is dropped. The easiest solution that I can > see is to remove the entire folio from the cleancache, but that would mean > also dropping the rest of the pages from the folio unnecessarily. Right, but likely that won't be an issue. Things get interesting when thinking about an efficient allocation approach. > >> >>> >>> I believe this is a very good fit for tag storage reuse, because it allows >>> tag storage to be allocated even in atomic contexts, which enables MTE in >>> the kernel. As a bonus, all of the changes to MM from the current approach >>> wouldn't be needed, as tag storage allocation can be handled entirely in >>> set_ptes_at(), copy_*highpage() or arch_swap_restore(). >>> >>> Is this a viable approach that would be upstreamable? Are there other >>> solutions that I haven't considered? I'm very much open to any alternatives >>> that would make tag storage reuse viable. >> >> As raised recently, I had similar ideas with something like virtio-mem in >> the past (wanted to call it virtio-tmem back then), but didn't have time to >> look into it yet. >> >> I considered both, using special device memory as "cleancache" backend, and >> using it as backend storage for something similar to zswap. We would not >> need a memmap/"struct page" for that special device memory, which reduces >> memory overhead and makes "adding more memory" a more reliable operation. > > Hm... this might not work with tag storage memory, the kernel needs to > perform cache maintenance on the memory when it transitions to and from > storing tags and storing data, so the memory must be mapped by the kernel. The direct map will definitely be required I think (copy in/out data). But memmap for tag memory will likely not be required. Of course, it depends how to manage tag storage. Likely we have to store some metadata, hopefully we can avoid the full memmap and just use something else. [...] >> Similar to virtio-mem, there are ways for the hypervisor to request changes >> to the memory consumption of a device (setting the requested size). So when >> requested to consume less, clean pagecache pages can be dropped and the >> memory can be handed back to the hypervisor. >> >> Of course, likely we would want to consider using "slower" memory in the >> hypervisor to back such a device. > > I'm not sure how useful that will be with tag storage reuse. 
KVM must > assume that **all** the memory that the guest uses is tagged and it needs > tag storage allocated (it's a known architectural limitation), so that will > leave even less tag storage memory to distribute between the host and the > guest(s). Yes, I don't think this applies to tag storage. > > Adding to that, at the moment Android is going to be the major (only?) user > of tag storage reuse, and as far as I know pKVM is more restrictive with > regards to the emulated devices and the memory that is shared between > guests and the host. Right, what I described here does not have overlap with tag storage besides requiring similar (cleancache) hooks. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA 2024-02-20 14:07 ` David Hildenbrand @ 2024-02-20 16:03 ` Alexandru Elisei 2024-02-20 16:16 ` David Hildenbrand 0 siblings, 1 reply; 7+ messages in thread From: Alexandru Elisei @ 2024-02-20 16:03 UTC (permalink / raw) To: David Hildenbrand Cc: catalin.marinas, will, oliver.upton, maz, james.morse, suzuki.poulose, yuzenghui, pcc, steven.price, anshuman.khandual, eugenis, kcc, hyesoo.yu, rppt, akpm, peterz, konrad.wilk, willy, jgross, hch, geert, vitaly.wool, ddstreet, sjenning, hughd, linux-arm-kernel, linux-kernel, linux-arch, linux-mm, alexandru.elisei Hi, On Tue, Feb 20, 2024 at 03:07:22PM +0100, David Hildenbrand wrote: > > > > > > With large folios in place, we'd likely want to investigate not working on > > > individual pages, but on (possibly large) folios instead. > > > > Yes, that would be interesting. Since the backend has no way of controlling > > what tag storage page will be needed for tags, and subsequently dropped > > from the cache, we would have to figure out what to do if one of the pages > > that is part of a large folio is dropped. The easiest solution that I can > > see is to remove the entire folio from the cleancache, but that would mean > > also dropping the rest of the pages from the folio unnecessarily. > > Right, but likely that won't be an issue. Things get interesting when > thinking about an efficient allocation approach. Indeed. > > > > > > > > > > > > > > I believe this is a very good fit for tag storage reuse, because it allows > > > > tag storage to be allocated even in atomic contexts, which enables MTE in > > > > the kernel. As a bonus, all of the changes to MM from the current approach > > > > wouldn't be needed, as tag storage allocation can be handled entirely in > > > > set_ptes_at(), copy_*highpage() or arch_swap_restore(). > > > > > > > > Is this a viable approach that would be upstreamable? Are there other > > > > solutions that I haven't considered? I'm very much open to any alternatives > > > > that would make tag storage reuse viable. > > > > > > As raised recently, I had similar ideas with something like virtio-mem in > > > the past (wanted to call it virtio-tmem back then), but didn't have time to > > > look into it yet. > > > > > > I considered both, using special device memory as "cleancache" backend, and > > > using it as backend storage for something similar to zswap. We would not > > > need a memmap/"struct page" for that special device memory, which reduces > > > memory overhead and makes "adding more memory" a more reliable operation. > > > > Hm... this might not work with tag storage memory, the kernel needs to > > perform cache maintenance on the memory when it transitions to and from > > storing tags and storing data, so the memory must be mapped by the kernel. > > The direct map will definitely be required I think (copy in/out data). But > memmap for tag memory will likely not be required. Of course, it depends how > to manage tag storage. Likely we have to store some metadata, hopefully we > can avoid the full memmap and just use something else. So I guess instead of ZONE_DEVICE I should try to use arch_add_memory() directly? That has the limitation that it cannot be used by a driver (symbol not exported to modules). Thanks, Alex ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA 2024-02-20 16:03 ` Alexandru Elisei @ 2024-02-20 16:16 ` David Hildenbrand 2024-02-20 16:36 ` Alexandru Elisei 0 siblings, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2024-02-20 16:16 UTC (permalink / raw) To: Alexandru Elisei Cc: catalin.marinas, will, oliver.upton, maz, james.morse, suzuki.poulose, yuzenghui, pcc, steven.price, anshuman.khandual, eugenis, kcc, hyesoo.yu, rppt, akpm, peterz, konrad.wilk, willy, jgross, hch, geert, vitaly.wool, ddstreet, sjenning, hughd, linux-arm-kernel, linux-kernel, linux-arch, linux-mm >>>>> I believe this is a very good fit for tag storage reuse, because it allows >>>>> tag storage to be allocated even in atomic contexts, which enables MTE in >>>>> the kernel. As a bonus, all of the changes to MM from the current approach >>>>> wouldn't be needed, as tag storage allocation can be handled entirely in >>>>> set_ptes_at(), copy_*highpage() or arch_swap_restore(). >>>>> >>>>> Is this a viable approach that would be upstreamable? Are there other >>>>> solutions that I haven't considered? I'm very much open to any alternatives >>>>> that would make tag storage reuse viable. >>>> >>>> As raised recently, I had similar ideas with something like virtio-mem in >>>> the past (wanted to call it virtio-tmem back then), but didn't have time to >>>> look into it yet. >>>> >>>> I considered both, using special device memory as "cleancache" backend, and >>>> using it as backend storage for something similar to zswap. We would not >>>> need a memmap/"struct page" for that special device memory, which reduces >>>> memory overhead and makes "adding more memory" a more reliable operation. >>> >>> Hm... this might not work with tag storage memory, the kernel needs to >>> perform cache maintenance on the memory when it transitions to and from >>> storing tags and storing data, so the memory must be mapped by the kernel. >> >> The direct map will definitely be required I think (copy in/out data). But >> memmap for tag memory will likely not be required. Of course, it depends how >> to manage tag storage. Likely we have to store some metadata, hopefully we >> can avoid the full memmap and just use something else. > > So I guess instead of ZONE_DEVICE I should try to use arch_add_memory() > directly? That has the limitation that it cannot be used by a driver > (symbol not exported to modules). You can certainly start with something simple, and we can work on removing that memmap allocation later. Maybe we have to expose new primitives in the context of such drivers. arch_add_memory() likely also doesn't do what you need. I recall that we had a way of only messing with the direct map. Last time I worked with that was in the context of memtrace (arch/powerpc/platforms/powernv/memtrace.c) There, we call arch_create_linear_mapping()/arch_remove_linear_mapping(). ... and now my memory comes back: we never finished factoring out arch_create_linear_mapping/arch_remove_linear_mapping so they would be available on all architectures. Your driver will be very arm64 specific, so doing it in an arm64-special way might be good enough initially. For example, the arm64-core could detect that special memory region and just statically prepare the direct map and not expose the memory to the buddy/allocate a memmap. Similar to how we handle the crashkernel/kexec IIRC (we likely do not have a direct map for that, though; ). 
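To illustrate the pattern (purely a sketch from memory: the arch_*_linear_mapping() helpers only exist on powerpc today and the exact signatures may differ; the tag_storage_*() wrappers are made up):

/*
 * Sketch of the memtrace-style pattern for adding/removing a region
 * to/from the kernel's linear map. arch_create_linear_mapping() and
 * arch_remove_linear_mapping() are currently powerpc-internal
 * (arch/powerpc/mm/mem.c); an arm64 equivalent would have to be factored
 * out, or added as an arm64-private helper to start with.
 */
#include <linux/memory_hotplug.h>

static int tag_storage_map_linear(int nid, u64 start, u64 size)
{
	struct mhp_params params = { .pgprot = PAGE_KERNEL };

	/* make the region accessible through the direct map */
	return arch_create_linear_mapping(nid, start, size, &params);
}

static void tag_storage_unmap_linear(u64 start, u64 size)
{
	/* tear the mapping down again, e.g. when handing the region back */
	arch_remove_linear_mapping(start, size);
}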
[I was also wondering if we could simply dynamically map/unmap when required so you can just avoid creating the entire direct map; might not be the best approach performance-wise, though]

There are a bunch of details to be sorted out, but I don't consider the directmap/memmap side of things a big problem.

--
Cheers,

David / dhildenb
* Re: arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA 2024-02-20 16:16 ` David Hildenbrand @ 2024-02-20 16:36 ` Alexandru Elisei 0 siblings, 0 replies; 7+ messages in thread From: Alexandru Elisei @ 2024-02-20 16:36 UTC (permalink / raw) To: David Hildenbrand Cc: catalin.marinas, will, oliver.upton, maz, james.morse, suzuki.poulose, yuzenghui, pcc, steven.price, anshuman.khandual, eugenis, kcc, hyesoo.yu, rppt, akpm, peterz, konrad.wilk, willy, jgross, hch, geert, vitaly.wool, ddstreet, sjenning, hughd, linux-arm-kernel, linux-kernel, linux-arch, linux-mm, alexandru.elisei Hi, On Tue, Feb 20, 2024 at 05:16:26PM +0100, David Hildenbrand wrote: > > > > > > I believe this is a very good fit for tag storage reuse, because it allows > > > > > > tag storage to be allocated even in atomic contexts, which enables MTE in > > > > > > the kernel. As a bonus, all of the changes to MM from the current approach > > > > > > wouldn't be needed, as tag storage allocation can be handled entirely in > > > > > > set_ptes_at(), copy_*highpage() or arch_swap_restore(). > > > > > > > > > > > > Is this a viable approach that would be upstreamable? Are there other > > > > > > solutions that I haven't considered? I'm very much open to any alternatives > > > > > > that would make tag storage reuse viable. > > > > > > > > > > As raised recently, I had similar ideas with something like virtio-mem in > > > > > the past (wanted to call it virtio-tmem back then), but didn't have time to > > > > > look into it yet. > > > > > > > > > > I considered both, using special device memory as "cleancache" backend, and > > > > > using it as backend storage for something similar to zswap. We would not > > > > > need a memmap/"struct page" for that special device memory, which reduces > > > > > memory overhead and makes "adding more memory" a more reliable operation. > > > > > > > > Hm... this might not work with tag storage memory, the kernel needs to > > > > perform cache maintenance on the memory when it transitions to and from > > > > storing tags and storing data, so the memory must be mapped by the kernel. > > > > > > The direct map will definitely be required I think (copy in/out data). But > > > memmap for tag memory will likely not be required. Of course, it depends how > > > to manage tag storage. Likely we have to store some metadata, hopefully we > > > can avoid the full memmap and just use something else. > > > > So I guess instead of ZONE_DEVICE I should try to use arch_add_memory() > > directly? That has the limitation that it cannot be used by a driver > > (symbol not exported to modules). > You can certainly start with something simple, and we can work on removing > that memmap allocation later. > > Maybe we have to expose new primitives in the context of such drivers. > arch_add_memory() likely also doesn't do what you need. > > I recall that we had a way of only messing with the direct map. > > Last time I worked with that was in the context of memtrace > (arch/powerpc/platforms/powernv/memtrace.c) > > There, we call arch_create_linear_mapping()/arch_remove_linear_mapping(). > > ... and now my memory comes back: we never finished factoring out > arch_create_linear_mapping/arch_remove_linear_mapping so they would be > available on all architectures. > > > Your driver will be very arm64 specific, so doing it in an arm64-special way > might be good enough initially. 
For example, the arm64-core could detect > that special memory region and just statically prepare the direct map and > not expose the memory to the buddy/allocate a memmap. Similar to how we > handle the crashkernel/kexec IIRC (we likely do not have a direct map for > that, though; ). > > [I was also wondering if we could simply dynamically map/unmap when required > so you can just avoid creating the entire direct map; might bot be the best > approach performance-wise, though] > > There are a bunch of details to be sorted out, but I don't consider the > directmap/memmap side of things a big problem. Sounds reasonable, thank you for the feedback! Thanks, Alex > > -- > Cheers, > > David / dhildenb > ^ permalink raw reply [flat|nested] 7+ messages in thread