* untagged_addr_remote() in do_madvise()
@ 2025-01-14 19:43 Liam R. Howlett
2025-01-14 20:15 ` Dave Hansen
2025-01-14 20:41 ` Lorenzo Stoakes
0 siblings, 2 replies; 5+ messages in thread
From: Liam R. Howlett @ 2025-01-14 19:43 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov
Cc: Shakeel Butt, SeongJae Park, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Andrew Morton, Jens Axboe, Pavel Begunkov,
linux-kernel, linux-mm, Ryan Roberts
Hello,
I noticed that mm/madivse.c:do_madvise() calls untagged_addr_remote()
after validating start.
Looking through git blame shows that this line was moved in
428e106ae1ad4 ("mm: Introduce untagged_addr_remote()") [1], with the
reason being:
The new helper untagged_addr_remote() has to be used when the address
targets remote process. It requires the mmap lock for target mm to be
taken.
Although this may be needed, we cannot move the untagging below
validating the start/end because we have not validated the start/end
that will be used for the operation, or at least, isn't clear why it's
okay?
Can anyone tell me why the code today is correct? That is, how can we
trust the validation of start/end is still okay after we change the
start/end by untagging the start?
I think we have to move the locking and the untagging above the
validation for this to work as expected?
[1] https://lore.kernel.org/all/20230312112612.31869-6-kirill.shutemov@linux.intel.com/
Thanks,
Liam
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: untagged_addr_remote() in do_madvise()
2025-01-14 19:43 untagged_addr_remote() in do_madvise() Liam R. Howlett
@ 2025-01-14 20:15 ` Dave Hansen
2025-01-14 20:41 ` Lorenzo Stoakes
1 sibling, 0 replies; 5+ messages in thread
From: Dave Hansen @ 2025-01-14 20:15 UTC (permalink / raw)
To: Liam R. Howlett, dave.hansen, kirill.shutemov, Shakeel Butt,
SeongJae Park, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Andrew Morton, Jens Axboe, Pavel Begunkov,
linux-kernel, linux-mm, Ryan Roberts
On 1/14/25 11:43, Liam R. Howlett wrote:
> Can anyone tell me why the code today is correct? That is, how can we
> trust the validation of start/end is still okay after we change the
> start/end by untagging the start?
Well, let's walk through the start/end validation. First:
if (!PAGE_ALIGNED(start))
return -EINVAL;
That should stay valid as long as the tag/untag doesn't affect the lower
bits. All the tagging is upper bits that are far away from the
<PAGE_SHIFT bits. I can hardly imagine an implementation that would tag
in the lower bits.
end = start + len;
if (end < start)
return -EINVAL;
This one is a test for negative 'len' and for start+len overflows. It's
certainly possible that a tagged 'start' would overflow when an untagged
'start' would not. But something that overflows that positive/negative
boundary would also cross a tag boundary so it would probably be a bug
anyway.
The last check is:
if (end == start)
return 0;
But since 'end' is derived from 'start':
end = start + len;
I can't think of a way that changing 'start' (via untagging) will end
up changing the result 'end==start' comparison.
So, is the code "correct"? The overflow detection can certainly be
triggered with tagged addresses, but it's arguably doing a service in
that case since the input in nonsensical crossing tags.
I'd say it's correct, but far from *obviously* correct. It could
definitely use some clarity.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: untagged_addr_remote() in do_madvise()
2025-01-14 19:43 untagged_addr_remote() in do_madvise() Liam R. Howlett
2025-01-14 20:15 ` Dave Hansen
@ 2025-01-14 20:41 ` Lorenzo Stoakes
2025-01-14 21:13 ` Dave Hansen
1 sibling, 1 reply; 5+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 20:41 UTC (permalink / raw)
To: Liam R. Howlett, dave.hansen, kirill.shutemov, Shakeel Butt,
SeongJae Park, David Hildenbrand, Vlastimil Babka, Andrew Morton,
Jens Axboe, Pavel Begunkov, linux-kernel, linux-mm, Ryan Roberts
+cc Kirill for commit
On Tue, Jan 14, 2025 at 02:43:17PM -0500, Liam R. Howlett wrote:
> Hello,
>
> I noticed that mm/madivse.c:do_madvise() calls untagged_addr_remote()
> after validating start.
>
> Looking through git blame shows that this line was moved in
> 428e106ae1ad4 ("mm: Introduce untagged_addr_remote()") [1], with the
> reason being:
>
> The new helper untagged_addr_remote() has to be used when the address
> targets remote process. It requires the mmap lock for target mm to be
> taken.
>
> Although this may be needed, we cannot move the untagging below
> validating the start/end because we have not validated the start/end
> that will be used for the operation, or at least, isn't clear why it's
> okay?
>
> Can anyone tell me why the code today is correct? That is, how can we
> trust the validation of start/end is still okay after we change the
> start/end by untagging the start?
>
> I think we have to move the locking and the untagging above the
> validation for this to work as expected?
>
> [1] https://lore.kernel.org/all/20230312112612.31869-6-kirill.shutemov@linux.intel.com/
>
> Thanks,
> Liam
To avoid losing context from IRC discussion, seems to me the only check that
needs to be potentially moved is:
end = start + len;
if (end < start)
return -EINVAL;
However, MADV_HWPOISON, MADV_SOFT_OFFLINE seems fundamentally broken for tagged
addresses:
#ifdef CONFIG_MEMORY_FAILURE
if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
return madvise_inject_error(behavior, start, start + len_in);
#endif
^ this is invoked before untagged_addr_remote() is called (as no mmap lock is
acquired) and so no attempt at untagging happens at all...!
We do need to fix this... unless CONFIG_MEMORY_FAILURE somehow automagically
disallows address tagging...
Perhaps need in that case to detect if the address is tagged and do some
horror-show hack, maybe acquire lock and untag and drop lock in that case... Or
maybe make it arch-dependent since it seems only x86 needs to actually hold the
lock for untagging?
Other than this case I think we are good to just put:
end = start + len;
if (end < start)
return -EINVAL;
Below the untagged_addr_remote() invocation?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: untagged_addr_remote() in do_madvise()
2025-01-14 20:41 ` Lorenzo Stoakes
@ 2025-01-14 21:13 ` Dave Hansen
2025-01-15 11:55 ` Lorenzo Stoakes
0 siblings, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2025-01-14 21:13 UTC (permalink / raw)
To: Lorenzo Stoakes, Liam R. Howlett, dave.hansen, kirill.shutemov,
Shakeel Butt, SeongJae Park, David Hildenbrand, Vlastimil Babka,
Andrew Morton, Jens Axboe, Pavel Begunkov, linux-kernel,
linux-mm, Ryan Roberts
On 1/14/25 12:41, Lorenzo Stoakes wrote:
...
> However, MADV_HWPOISON, MADV_SOFT_OFFLINE seems fundamentally broken for tagged
> addresses:
>
> #ifdef CONFIG_MEMORY_FAILURE
> if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
> return madvise_inject_error(behavior, start, start + len_in);
> #endif
>
> ^ this is invoked before untagged_addr_remote() is called (as no mmap lock is
> acquired) and so no attempt at untagging happens at all...!
Except this call path:
madvise_inject_error() ->
get_user_pages_fast() ->
gup_fast_fallback()
does its own untagging:
start = untagged_addr(start) & PAGE_MASK;
It might also have some funky behavior if start+len_in overflows. But,
just as in the other case, it's invalid to begin with so I think
userspace kinda gets to keep the pieces.
But I do 100% agree that this is non-obvious. In a perfect world, tagged
addresses would get untagged at the user/kernel boundary in _one_ choke
point. But the world is hard and that would make things too easy and
then we wouldn't get paid the big bucks. ;)
To clarify things, I don't think it'd be the worst thing to just move
the madvise_inject_error() down and have that case acquire
mmap_read_lock(). Sure, it's not required, but it's basically debugging
code and I can't imagine it's avoiding the lock for performance reasons.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: untagged_addr_remote() in do_madvise()
2025-01-14 21:13 ` Dave Hansen
@ 2025-01-15 11:55 ` Lorenzo Stoakes
0 siblings, 0 replies; 5+ messages in thread
From: Lorenzo Stoakes @ 2025-01-15 11:55 UTC (permalink / raw)
To: Dave Hansen
Cc: Liam R. Howlett, dave.hansen, kirill.shutemov, Shakeel Butt,
SeongJae Park, David Hildenbrand, Vlastimil Babka, Andrew Morton,
Jens Axboe, Pavel Begunkov, linux-kernel, linux-mm, Ryan Roberts
On Tue, Jan 14, 2025 at 01:13:36PM -0800, Dave Hansen wrote:
> On 1/14/25 12:41, Lorenzo Stoakes wrote:
> ...
> > However, MADV_HWPOISON, MADV_SOFT_OFFLINE seems fundamentally broken for tagged
> > addresses:
> >
> > #ifdef CONFIG_MEMORY_FAILURE
> > if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
> > return madvise_inject_error(behavior, start, start + len_in);
> > #endif
> >
> > ^ this is invoked before untagged_addr_remote() is called (as no mmap lock is
> > acquired) and so no attempt at untagging happens at all...!
>
> Except this call path:
>
> madvise_inject_error() ->
> get_user_pages_fast() ->
> gup_fast_fallback()
>
> does its own untagging:
>
> start = untagged_addr(start) & PAGE_MASK;
>
Yeah you're right! Good spot.
> It might also have some funky behavior if start+len_in overflows. But,
> just as in the other case, it's invalid to begin with so I think
> userspace kinda gets to keep the pieces.
Right yeah.
>
> But I do 100% agree that this is non-obvious. In a perfect world, tagged
> addresses would get untagged at the user/kernel boundary in _one_ choke
> point. But the world is hard and that would make things too easy and
> then we wouldn't get paid the big bucks. ;)
Yeah agreed especially on that last bit ;)
I think it'd be good to have a comment there, I will stick on my todo to
add one.
Or Liam - if you're doing some changes here - maybe you could add? Just
something highlighting that gup_fast does the untagging?
>
> To clarify things, I don't think it'd be the worst thing to just move
> the madvise_inject_error() down and have that case acquire
> mmap_read_lock(). Sure, it's not required, but it's basically debugging
> code and I can't imagine it's avoiding the lock for performance reasons.
Yeah it's odd that, but that code is going to the lengths of using gup_fast
so I have to assume that maybe some debug user really does care about perf?
It'd need some more digging to really feel confident to use a lock there I
think.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-01-15 11:55 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-14 19:43 untagged_addr_remote() in do_madvise() Liam R. Howlett
2025-01-14 20:15 ` Dave Hansen
2025-01-14 20:41 ` Lorenzo Stoakes
2025-01-14 21:13 ` Dave Hansen
2025-01-15 11:55 ` Lorenzo Stoakes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox