* [PATCH] mm/fadvise: validate offset in generic_fadvise
@ 2025-12-22 14:18 klourencodev
2025-12-22 17:52 ` Andrew Morton
2025-12-23 9:45 ` David Hildenbrand (Red Hat)
0 siblings, 2 replies; 6+ messages in thread
From: klourencodev @ 2025-12-22 14:18 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, david, Kevin Lourenco, Kevin Lourenco
From: Kevin Lourenco <klourencodev@gmail.com>
When converted to (u64) for page calculations, a negative offset
can produce extremely large page indices. This may lead to issues in certain advice modes (excessive readahead or
cache invalidation)
offsets are normally non-negative, but the API does not guarantee this. Since 'len' is already
validated, checking 'offset' here is reasonable to prevent potential system instability.
Signed-off-by: Kevin Lourenco <k.lourenco@criteo.com>
---
mm/fadvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 67028e30aa91..b63fe21416ff 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -43,7 +43,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
return -ESPIPE;
mapping = file->f_mapping;
- if (!mapping || len < 0)
+ if (!mapping || len < 0 || offset < 0)
return -EINVAL;
bdi = inode_to_bdi(mapping->host);
--
2.47.3
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/fadvise: validate offset in generic_fadvise
2025-12-22 14:18 [PATCH] mm/fadvise: validate offset in generic_fadvise klourencodev
@ 2025-12-22 17:52 ` Andrew Morton
2025-12-22 23:38 ` Kevin Lourenco
2025-12-23 9:45 ` David Hildenbrand (Red Hat)
1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2025-12-22 17:52 UTC (permalink / raw)
To: klourencodev; +Cc: linux-mm, david, Kevin Lourenco
On Mon, 22 Dec 2025 15:18:17 +0100 klourencodev@gmail.com wrote:
> From: Kevin Lourenco <klourencodev@gmail.com>
>
> When converted to (u64) for page calculations, a negative offset
> can produce extremely large page indices. This may lead to issues in certain advice modes (excessive readahead or
> cache invalidation)
>
> offsets are normally non-negative, but the API does not guarantee this. Since 'len' is already
> validated, checking 'offset' here is reasonable to prevent potential system instability.
>
Yeah, seems this code really didn't think about negative offsets.
fadvise(fd, -20, 40, advice);
does make some sense I guess - equivalent to
fadvise(fd, 0, 20, advice);
but whether this is how the implementation actually treats -20,40 is
unclear. Probably not, and what will filemap_flush_range() make of
this.
ugh.
I suppose I'll toss this into -next for now, see if it breaks any tests.
But more thought is needed.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/fadvise: validate offset in generic_fadvise
2025-12-22 17:52 ` Andrew Morton
@ 2025-12-22 23:38 ` Kevin Lourenco
0 siblings, 0 replies; 6+ messages in thread
From: Kevin Lourenco @ 2025-12-22 23:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, david, Kevin Lourenco
Thank you, Andrew, for taking a look and for queueing in -next.
I'll keep an eye on it and will adjust the patch if needed.
Kevin
Le lun. 22 déc. 2025 à 18:52, Andrew Morton
<akpm@linux-foundation.org> a écrit :
>
> On Mon, 22 Dec 2025 15:18:17 +0100 klourencodev@gmail.com wrote:
>
> > From: Kevin Lourenco <klourencodev@gmail.com>
> >
> > When converted to (u64) for page calculations, a negative offset
> > can produce extremely large page indices. This may lead to issues in certain advice modes (excessive readahead or
> > cache invalidation)
> >
> > offsets are normally non-negative, but the API does not guarantee this. Since 'len' is already
> > validated, checking 'offset' here is reasonable to prevent potential system instability.
> >
>
> Yeah, seems this code really didn't think about negative offsets.
>
> fadvise(fd, -20, 40, advice);
>
> does make some sense I guess - equivalent to
>
> fadvise(fd, 0, 20, advice);
>
> but whether this is how the implementation actually treats -20,40 is
> unclear. Probably not, and what will filemap_flush_range() make of
> this.
>
> ugh.
>
> I suppose I'll toss this into -next for now, see if it breaks any tests.
> But more thought is needed.
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/fadvise: validate offset in generic_fadvise
2025-12-22 14:18 [PATCH] mm/fadvise: validate offset in generic_fadvise klourencodev
2025-12-22 17:52 ` Andrew Morton
@ 2025-12-23 9:45 ` David Hildenbrand (Red Hat)
2025-12-23 16:16 ` Kevin Lourenco
1 sibling, 1 reply; 6+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-23 9:45 UTC (permalink / raw)
To: klourencodev, linux-mm; +Cc: akpm, Kevin Lourenco
On 12/22/25 15:18, klourencodev@gmail.com wrote:
> From: Kevin Lourenco <klourencodev@gmail.com>
>
> When converted to (u64) for page calculations, a negative offset
> can produce extremely large page indices. This may lead to issues in certain advice modes (excessive readahead or
> cache invalidation)
>
> offsets are normally non-negative, but the API does not guarantee this. Since 'len' is already
> validated, checking 'offset' here is reasonable to prevent potential system instability.
Hi,
we tend to break lines as 72 chars in the patch description. I assume
Andrew will fix that up or already did it :)
>
> Signed-off-by: Kevin Lourenco <k.lourenco@criteo.com>
> ---
> mm/fadvise.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index 67028e30aa91..b63fe21416ff 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -43,7 +43,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> return -ESPIPE;
>
> mapping = file->f_mapping;
> - if (!mapping || len < 0)
> + if (!mapping || len < 0 || offset < 0)
> return -EINVAL;
>
> bdi = inode_to_bdi(mapping->host);
The man page of fadvise64()/posix_madvise() is a bit unclear. It doesn't
really specify what's supposed to happen on negative size or offset.
Staring at test cases in LTP, we seem to have:
* Check the value that posix_fadvise returns for wrong ADVISE value
* Check the value that posix_fadvise returns for wrong file descriptor
* Check the value that posix_fadvise returns for wrong ADVISE value
* Check the value that posix_fadvise returns for pipe descriptor
And we primarily only seem to test what's documented in the man page to
fail.
Which raises the questions:
(1) Could we accidentally break some users out there?
(2) Should we update the man page to document what is supposed to happen
with negative size or offset.
--
Cheers
David
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/fadvise: validate offset in generic_fadvise
2025-12-23 9:45 ` David Hildenbrand (Red Hat)
@ 2025-12-23 16:16 ` Kevin Lourenco
2026-01-06 19:46 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 6+ messages in thread
From: Kevin Lourenco @ 2025-12-23 16:16 UTC (permalink / raw)
To: David Hildenbrand (Red Hat); +Cc: linux-mm, akpm, Kevin Lourenco
Hi David,
Thank you for your thoughts and recommendations. Well noted, I will
now start limiting the commit message to those specifications :)
For point (1):
I see POSIX definition like this: "The byte position in the file where
the next I/O operation begins."
(https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html?utm_source=chatgpt.com),
so what is almost certain is that a negative offset makes no sense /
there is no hidden mechanism or anything. I don't think we can
accidentally break users, my intuition is that nobody ever
intentionally passes a negative offset (probably the same for len).
We could rely exclusively on the user to ensure the offset is positive
and follow POSIX recommendation. But since errors can occur, I think
it can be hard to debug where a simple test, the same for len, could
have made their experience easier. I think this is also the path
FreeBSD chose: https://github.com/freebsd/freebsd-src/blob/main/sys/kern/vfs_syscalls.c#L4918
For point (2):
Since I think nobody intentionally passes negative values and since
that makes no sense semantically speaking, I think we can consider it
unnecessary to update the man page right now and only think about it
in the future if a negative offset gains a new meaning, for example.
Wdyt?
Le mar. 23 déc. 2025 à 10:45, David Hildenbrand (Red Hat)
<david@kernel.org> a écrit :
>
> On 12/22/25 15:18, klourencodev@gmail.com wrote:
> > From: Kevin Lourenco <klourencodev@gmail.com>
> >
> > When converted to (u64) for page calculations, a negative offset
> > can produce extremely large page indices. This may lead to issues in certain advice modes (excessive readahead or
> > cache invalidation)
> >
> > offsets are normally non-negative, but the API does not guarantee this. Since 'len' is already
> > validated, checking 'offset' here is reasonable to prevent potential system instability.
>
> Hi,
>
> we tend to break lines as 72 chars in the patch description. I assume
> Andrew will fix that up or already did it :)
>
> >
> > Signed-off-by: Kevin Lourenco <k.lourenco@criteo.com>
> > ---
> > mm/fadvise.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/fadvise.c b/mm/fadvise.c
> > index 67028e30aa91..b63fe21416ff 100644
> > --- a/mm/fadvise.c
> > +++ b/mm/fadvise.c
> > @@ -43,7 +43,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> > return -ESPIPE;
> >
> > mapping = file->f_mapping;
> > - if (!mapping || len < 0)
> > + if (!mapping || len < 0 || offset < 0)
> > return -EINVAL;
> >
> > bdi = inode_to_bdi(mapping->host);
>
> The man page of fadvise64()/posix_madvise() is a bit unclear. It doesn't
> really specify what's supposed to happen on negative size or offset.
>
> Staring at test cases in LTP, we seem to have:
>
> * Check the value that posix_fadvise returns for wrong ADVISE value
> * Check the value that posix_fadvise returns for wrong file descriptor
> * Check the value that posix_fadvise returns for wrong ADVISE value
> * Check the value that posix_fadvise returns for pipe descriptor
>
> And we primarily only seem to test what's documented in the man page to
> fail.
>
> Which raises the questions:
> (1) Could we accidentally break some users out there?
> (2) Should we update the man page to document what is supposed to happen
> with negative size or offset.
>
> --
> Cheers
>
> David
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/fadvise: validate offset in generic_fadvise
2025-12-23 16:16 ` Kevin Lourenco
@ 2026-01-06 19:46 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-06 19:46 UTC (permalink / raw)
To: Kevin Lourenco; +Cc: linux-mm, akpm, Kevin Lourenco
On 12/23/25 17:16, Kevin Lourenco wrote:
> Hi David,
Hi!
>
> Thank you for your thoughts and recommendations. Well noted, I will
> now start limiting the commit message to those specifications :)
>
> For point (1):
>
> I see POSIX definition like this: "The byte position in the file where
> the next I/O operation begins."
> (https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html?utm_source=chatgpt.com),
> so what is almost certain is that a negative offset makes no sense /
> there is no hidden mechanism or anything.
All I find in that regard is
"... starting at offset and continuing for len bytes. The specified
range need not currently exist in the file. If len is zero, all data
from offset to the largest possible value of the file offset for that
file shall be specified. "
So, something with a negative offset sure does not exist in the file.
Which makes me wonder whether we are here in a scenario where we would
simply ignore everything "negative" (clamp offset to 0), simply return 0
(because it's not documented to create an error), or whether we are in
undefined behavior land and can simply return EINVAL.
> I don't think we can
> accidentally break users, my intuition is that nobody ever
> intentionally passes a negative offset (probably the same for len).
Some programs do stupid things, you would be surprised :)
>
> We could rely exclusively on the user to ensure the offset is positive
> and follow POSIX recommendation. But since errors can occur, I think
> it can be hard to debug where a simple test, the same for len, could
> have made their experience easier. I think this is also the path
> FreeBSD chose: https://github.com/freebsd/freebsd-src/blob/main/sys/kern/vfs_syscalls.c#L4918
That's very good thing to note!
Interestingly, they also check
offset > OFF_MAX - len
So checking that offset+len will not overflow. Do we want something
similar? (check_add_overflow() etc. )
>
> For point (2):
>
> Since I think nobody intentionally passes negative values and since
> that makes no sense semantically speaking, I think we can consider it
Well, then why document negative length? ;) We should be as clear as
possible in the documentation.
If POSIX would have been clearer we wouldn't have this discussion :)
> unnecessary to update the man page right now and only think about it
> in the future if a negative offset gains a new meaning, for example.
>
> Wdyt?
Likely we should really update the man page to reflect reality.
"Starting with Linux v7.0, posix_fadvise() will fail with ...".
Given that FreeBSD rejects negative offsets I guess we are good.
--
Cheers
David
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-01-06 19:46 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-22 14:18 [PATCH] mm/fadvise: validate offset in generic_fadvise klourencodev
2025-12-22 17:52 ` Andrew Morton
2025-12-22 23:38 ` Kevin Lourenco
2025-12-23 9:45 ` David Hildenbrand (Red Hat)
2025-12-23 16:16 ` Kevin Lourenco
2026-01-06 19:46 ` David Hildenbrand (Red Hat)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox