* [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes)
@ 2024-11-15 12:41 Lorenzo Stoakes
2024-11-15 12:41 ` [PATCH 6.6.y 1/5] mm: avoid unsafe VMA hook invocation when error arises on mmap hook Lorenzo Stoakes
` (7 more replies)
0 siblings, 8 replies; 16+ messages in thread
From: Lorenzo Stoakes @ 2024-11-15 12:41 UTC (permalink / raw)
To: stable
Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn,
linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
Critical fixes for mmap_region(), backported to 6.6.y.
Some notes on differences from upstream:
* In this kernel is_shared_maywrite() does not exist and the code uses
VM_SHARED to determine whether mapping_map_writable() /
mapping_unmap_writable() should be invoked. This backport therefore
follows suit.
* Each stable version this series is backported to has a slightly different
mmap_region(), so the change must be adapted for each one. The approach
remains the same throughout, however, and in every case we avoid closing
the VMA part way through any __mmap_region() operation.
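To make the first note concrete, here is a minimal userspace sketch contrasting the two predicates. It assumes the upstream helper tests for both VM_SHARED and VM_MAYWRITE. The SKETCH_* constants mirror the VM_SHARED and VM_MAYWRITE bit values in include/linux/mm.h but should be treated as illustrative, and both functions are hypothetical stand-ins rather than kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values mirroring VM_SHARED / VM_MAYWRITE in include/linux/mm.h. */
#define SKETCH_VM_SHARED   0x00000008UL
#define SKETCH_VM_MAYWRITE 0x00000020UL

/* Upstream-style test: the mapping is shared AND may become writable. */
static bool sketch_is_shared_maywrite(unsigned long vm_flags)
{
	return (vm_flags & (SKETCH_VM_SHARED | SKETCH_VM_MAYWRITE)) ==
	       (SKETCH_VM_SHARED | SKETCH_VM_MAYWRITE);
}

/*
 * Test used by this 6.6.y backport: VM_SHARED alone decides whether
 * mapping_map_writable() / mapping_unmap_writable() are invoked.
 */
static bool sketch_backport_uses_writable_count(unsigned long vm_flags)
{
	return (vm_flags & SKETCH_VM_SHARED) != 0;
}
```

In this sketch the two predicates disagree only when VM_SHARED is set and VM_MAYWRITE is clear.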
Lorenzo Stoakes (5):
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
mm: unconditionally close VMAs on error
mm: refactor map_deny_write_exec()
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
mm: resolve faulty mmap_region() error path behaviour
arch/arm64/include/asm/mman.h | 10 ++-
arch/parisc/include/asm/mman.h | 5 +-
include/linux/mman.h | 28 ++++++--
mm/internal.h | 45 ++++++++++++
mm/mmap.c | 128 ++++++++++++++++++---------------
mm/mprotect.c | 2 +-
mm/nommu.c | 9 ++-
mm/shmem.c | 3 -
8 files changed, 153 insertions(+), 77 deletions(-)
--
2.47.0
* [PATCH 6.6.y 1/5] mm: avoid unsafe VMA hook invocation when error arises on mmap hook
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
@ 2024-11-15 12:41 ` Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: avoid unsafe VMA hook invocation when error arises on mmap hook" has been added to the 6.6-stable tree gregkh
2024-11-15 12:41 ` [PATCH 6.6.y 2/5] mm: unconditionally close VMAs on error Lorenzo Stoakes
` (6 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2024-11-15 12:41 UTC (permalink / raw)
To: stable
Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn,
linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
[ Upstream commit 3dd6ed34ce1f2356a77fb88edafb5ec96784e3cf ]
Patch series "fix error handling in mmap_region() and refactor
(hotfixes)", v4.
mmap_region() is somewhat terrifying, with spaghetti-like control flow and
numerous ways in which incomplete state, memory leaks and other
unpleasantness can arise.
A large amount of the complexity arises from trying to handle errors late
in the process of mapping a VMA, which forms the basis of recently
observed issues with resource leaks and observable inconsistent state.
This series goes to great lengths to simplify how mmap_region() works and
to avoid unwinding errors late on in the process of setting up the VMA for
the new mapping, and equally avoids such operations occurring while the
VMA is in an inconsistent state.
The patches in this series comprise the minimal changes required to
resolve existing issues in mmap_region() error handling, so that they can
be hotfixed and backported. A follow-up series, split out from v1 of this
series and sent and updated separately, goes further.
This patch (of 5):
After an attempted mmap() fails, we are no longer in a situation where we
can safely interact with VMA hooks. This is currently not enforced,
meaning that we need complicated handling to ensure we do not incorrectly
call these hooks.
We can avoid the whole issue by treating the VMA as suspect the moment
the file->f_op->mmap() hook reports an error, replacing whatever VMA
operations were installed with a dummy, empty set of VMA operations.
We do so through a new helper function internal to mm, mmap_file(),
which is both more logically named than the existing call_mmap() function
and correctly confines the vm_ops reassignment to mm.
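The pattern mmap_file() implements can be sketched in userspace as follows. Everything here is hypothetical: sketch_vma and sketch_vm_ops are cut-down stand-ins for struct vm_area_struct and struct vm_operations_struct (keeping only a .close hook), and sketch_dummy_vm_ops plays the role of vma_dummy_vm_ops. The point is only that a failed hook cannot leave its own operations installed.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_vma;

/* Hypothetical stand-in for struct vm_operations_struct: only .close. */
struct sketch_vm_ops {
	void (*close)(struct sketch_vma *vma);
};

/* Hypothetical stand-in for struct vm_area_struct. */
struct sketch_vma {
	const struct sketch_vm_ops *vm_ops;
};

/* Empty operations table, analogous in spirit to vma_dummy_vm_ops. */
static const struct sketch_vm_ops sketch_dummy_vm_ops = { .close = NULL };

static void sketch_driver_close(struct sketch_vma *vma)
{
	(void)vma; /* driver teardown would go here */
}

static const struct sketch_vm_ops sketch_driver_ops = {
	.close = sketch_driver_close,
};

/* Example driver hooks: both install their ops, one then fails. */
static int sketch_ok_hook(struct sketch_vma *vma)
{
	vma->vm_ops = &sketch_driver_ops;
	return 0;
}

static int sketch_failing_hook(struct sketch_vma *vma)
{
	vma->vm_ops = &sketch_driver_ops;
	return -12; /* e.g. -ENOMEM */
}

typedef int (*sketch_mmap_hook)(struct sketch_vma *vma);

/*
 * Same shape as mmap_file(): call the hook and, on failure, replace
 * whatever ops it installed with the empty set so that no later code
 * can invoke them.
 */
static int sketch_mmap_file(sketch_mmap_hook hook, struct sketch_vma *vma)
{
	int err = hook(vma);

	if (err == 0)
		return 0;

	vma->vm_ops = &sketch_dummy_vm_ops;
	return err;
}
```

A successful hook keeps its operations; a failing one is left with the empty table, whose NULL .close makes any later close attempt a no-op.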
All the existing invocations of call_mmap() outside of mm are ultimately
nested within the call_mmap() from mm, which we now replace.
It is therefore safe to leave call_mmap() in place as a convenience
function (and to avoid churn). The invokers are:
ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap()
coda_file_operations -> mmap -> coda_file_mmap()
shm_file_operations -> shm_mmap()
shm_file_operations_huge -> shm_mmap()
dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops
-> i915_gem_dmabuf_mmap()
None of these callers interacts with vm_ops or mappings in a problematic
way on error; each exits promptly.
Link: https://lkml.kernel.org/r/cover.1730224667.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/internal.h | 27 +++++++++++++++++++++++++++
mm/mmap.c | 4 ++--
mm/nommu.c | 4 ++--
3 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index ef8d787a510c..d52d6b57dafb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -83,6 +83,33 @@ static inline void *folio_raw_mapping(struct folio *folio)
return (void *)(mapping & ~PAGE_MAPPING_FLAGS);
}
+/*
+ * This is a file-backed mapping, and is about to be memory mapped - invoke its
+ * mmap hook and safely handle error conditions. On error, VMA hooks will be
+ * mutated.
+ *
+ * @file: File which backs the mapping.
+ * @vma: VMA which we are mapping.
+ *
+ * Returns: 0 if success, error otherwise.
+ */
+static inline int mmap_file(struct file *file, struct vm_area_struct *vma)
+{
+ int err = call_mmap(file, vma);
+
+ if (likely(!err))
+ return 0;
+
+ /*
+ * OK, we tried to call the file hook for mmap(), but an error
+ * arose. The mapping is in an inconsistent state and we must not invoke
+ * any further hooks on it.
+ */
+ vma->vm_ops = &vma_dummy_vm_ops;
+
+ return err;
+}
+
void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio,
int nr_throttled);
static inline void acct_reclaim_writeback(struct folio *folio)
diff --git a/mm/mmap.c b/mm/mmap.c
index 6530e9cac458..8a055bae6bdb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2779,7 +2779,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
}
vma->vm_file = get_file(file);
- error = call_mmap(file, vma);
+ error = mmap_file(file, vma);
if (error)
goto unmap_and_free_vma;
@@ -2793,7 +2793,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_iter_config(&vmi, addr, end);
/*
- * If vm_flags changed after call_mmap(), we should try merge
+ * If vm_flags changed after mmap_file(), we should try merge
* vma again as we may succeed this time.
*/
if (unlikely(vm_flags != vma->vm_flags && prev)) {
diff --git a/mm/nommu.c b/mm/nommu.c
index 7f9e9e5a0e12..e976c62264c9 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -896,7 +896,7 @@ static int do_mmap_shared_file(struct vm_area_struct *vma)
{
int ret;
- ret = call_mmap(vma->vm_file, vma);
+ ret = mmap_file(vma->vm_file, vma);
if (ret == 0) {
vma->vm_region->vm_top = vma->vm_region->vm_end;
return 0;
@@ -929,7 +929,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
* happy.
*/
if (capabilities & NOMMU_MAP_DIRECT) {
- ret = call_mmap(vma->vm_file, vma);
+ ret = mmap_file(vma->vm_file, vma);
/* shouldn't return success if we're not sharing */
if (WARN_ON_ONCE(!is_nommu_shared_mapping(vma->vm_flags)))
ret = -ENOSYS;
--
2.47.0
* [PATCH 6.6.y 2/5] mm: unconditionally close VMAs on error
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
2024-11-15 12:41 ` [PATCH 6.6.y 1/5] mm: avoid unsafe VMA hook invocation when error arises on mmap hook Lorenzo Stoakes
@ 2024-11-15 12:41 ` Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: unconditionally close VMAs on error" has been added to the 6.6-stable tree gregkh
2024-11-15 12:41 ` [PATCH 6.6.y 3/5] mm: refactor map_deny_write_exec() Lorenzo Stoakes
` (5 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2024-11-15 12:41 UTC (permalink / raw)
To: stable
Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn,
linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
[ Upstream commit 4080ef1579b2413435413988d14ac8c68e4d42c8 ]
Invoking VMA callbacks once the VMA is no longer in a consistent state is
bug-prone and risky.
With regard to the important vm_ops->close() callback, we have gone to
great lengths to try to track whether or not we ought to close VMAs.
Rather than doing so and risking a mistake somewhere, we instead
unconditionally close the VMA and reset vma->vm_ops to an empty dummy
operations set with a NULL .close operator.
We introduce a new function to do so - vma_close() - and simplify existing
vms logic which tracked whether we needed to close or not.
This simplifies the logic, avoids incorrect double-calling of the .close()
callback and allows us to update error paths to simply call vma_close()
unconditionally - making VMA closure idempotent.
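The idempotence claim can be checked with a minimal userspace sketch of the same shape as vma_close(). The types are hypothetical cut-down stand-ins for struct vm_area_struct and struct vm_operations_struct, with a close_calls counter added purely so the effect is observable; sketch_dummy_vm_ops plays the role of vma_dummy_vm_ops.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_vma;

/* Hypothetical stand-in for struct vm_operations_struct: only .close. */
struct sketch_vm_ops {
	void (*close)(struct sketch_vma *vma);
};

/* Hypothetical stand-in for struct vm_area_struct. */
struct sketch_vma {
	const struct sketch_vm_ops *vm_ops;
	int close_calls; /* counts how often the driver hook ran */
};

static const struct sketch_vm_ops sketch_dummy_vm_ops = { .close = NULL };

static void sketch_driver_close(struct sketch_vma *vma)
{
	vma->close_calls++;
}

static const struct sketch_vm_ops sketch_driver_ops = {
	.close = sketch_driver_close,
};

/*
 * Run .close if present, then install the empty ops so that any
 * further call finds no hook and does nothing.
 */
static void sketch_vma_close(struct sketch_vma *vma)
{
	if (vma->vm_ops && vma->vm_ops->close) {
		vma->vm_ops->close(vma);
		vma->vm_ops = &sketch_dummy_vm_ops;
	}
}
```

Calling sketch_vma_close() twice on a VMA with driver ops runs the driver hook exactly once, which is what lets error paths call vma_close() without tracking whether a close already happened.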
Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/internal.h | 18 ++++++++++++++++++
mm/mmap.c | 9 +++------
mm/nommu.c | 3 +--
3 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index d52d6b57dafb..36c6693f4ebf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -110,6 +110,24 @@ static inline int mmap_file(struct file *file, struct vm_area_struct *vma)
return err;
}
+/*
+ * If the VMA has a close hook then close it, and since closing it might leave
+ * it in an inconsistent state which makes the use of any hooks suspect, clear
+ * them down by installing dummy empty hooks.
+ */
+static inline void vma_close(struct vm_area_struct *vma)
+{
+ if (vma->vm_ops && vma->vm_ops->close) {
+ vma->vm_ops->close(vma);
+
+ /*
+ * The mapping is in an inconsistent state, and no further hooks
+ * may be invoked upon it.
+ */
+ vma->vm_ops = &vma_dummy_vm_ops;
+ }
+}
+
void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio,
int nr_throttled);
static inline void acct_reclaim_writeback(struct folio *folio)
diff --git a/mm/mmap.c b/mm/mmap.c
index 8a055bae6bdb..9fefd13640d1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -137,8 +137,7 @@ void unlink_file_vma(struct vm_area_struct *vma)
static void remove_vma(struct vm_area_struct *vma, bool unreachable)
{
might_sleep();
- if (vma->vm_ops && vma->vm_ops->close)
- vma->vm_ops->close(vma);
+ vma_close(vma);
if (vma->vm_file)
fput(vma->vm_file);
mpol_put(vma_policy(vma));
@@ -2899,8 +2898,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
return addr;
close_and_free_vma:
- if (file && vma->vm_ops && vma->vm_ops->close)
- vma->vm_ops->close(vma);
+ vma_close(vma);
if (file || vma->vm_file) {
unmap_and_free_vma:
@@ -3392,8 +3390,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
return new_vma;
out_vma_link:
- if (new_vma->vm_ops && new_vma->vm_ops->close)
- new_vma->vm_ops->close(new_vma);
+ vma_close(new_vma);
if (new_vma->vm_file)
fput(new_vma->vm_file);
diff --git a/mm/nommu.c b/mm/nommu.c
index e976c62264c9..8bc339050e6d 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -600,8 +600,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
*/
static void delete_vma(struct mm_struct *mm, struct vm_area_struct *vma)
{
- if (vma->vm_ops && vma->vm_ops->close)
- vma->vm_ops->close(vma);
+ vma_close(vma);
if (vma->vm_file)
fput(vma->vm_file);
put_nommu_region(vma->vm_region);
--
2.47.0
* [PATCH 6.6.y 3/5] mm: refactor map_deny_write_exec()
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
2024-11-15 12:41 ` [PATCH 6.6.y 1/5] mm: avoid unsafe VMA hook invocation when error arises on mmap hook Lorenzo Stoakes
2024-11-15 12:41 ` [PATCH 6.6.y 2/5] mm: unconditionally close VMAs on error Lorenzo Stoakes
@ 2024-11-15 12:41 ` Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: refactor map_deny_write_exec()" has been added to the 6.6-stable tree gregkh
2024-11-15 12:41 ` [PATCH 6.6.y 4/5] mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling Lorenzo Stoakes
` (4 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2024-11-15 12:41 UTC (permalink / raw)
To: stable
Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn,
linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
[ Upstream commit 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf ]
Refactor map_deny_write_exec() so that it no longer needlessly requires a
VMA parameter and instead accepts the old and new VMA flags. This allows us
to use the function early in mmap_region() in a subsequent commit.
While we're here, we refactor the function to be more readable and add
some additional documentation.
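The decision the refactored function makes can be checked in isolation with a minimal userspace sketch. It assumes a plain mdwe_enabled parameter in place of test_bit(MMF_HAS_MDWE, &current->mm->flags); the SKETCH_VM_* values mirror VM_WRITE and VM_EXEC but are illustrative. The control flow follows the new map_deny_write_exec() in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values mirroring VM_WRITE / VM_EXEC in include/linux/mm.h. */
#define SKETCH_VM_WRITE 0x00000002UL
#define SKETCH_VM_EXEC  0x00000004UL

/*
 * Hypothetical userspace version: 'mdwe_enabled' stands in for the
 * per-process MMF_HAS_MDWE bit. Returns true if the change is denied.
 */
static bool sketch_map_deny_write_exec(bool mdwe_enabled,
				       unsigned long old, unsigned long new)
{
	if (!mdwe_enabled)
		return false;          /* MDWE off: nothing to deny */
	if (!(new & SKETCH_VM_EXEC))
		return false;          /* not executable: nothing to deny */
	if (new & SKETCH_VM_WRITE)
		return true;           /* writable and executable */
	if (!(old & SKETCH_VM_EXEC))
		return true;           /* newly becoming executable */
	return false;
}
```

Since mmap_region() passes vma->vm_flags as both arguments, at mmap time only the writable-and-executable rule can deny a mapping, whereas mprotect() passes the VMA's existing flags as old and can therefore also deny a non-executable VMA becoming executable.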
Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mman.h | 21 ++++++++++++++++++---
mm/mmap.c | 2 +-
mm/mprotect.c | 2 +-
3 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/include/linux/mman.h b/include/linux/mman.h
index db4741007bef..651705c2bf47 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -187,16 +187,31 @@ static inline bool arch_memory_deny_write_exec_supported(void)
*
* d) mmap(PROT_READ | PROT_EXEC)
* mmap(PROT_READ | PROT_EXEC | PROT_BTI)
+ *
+ * This is only applicable if the user has set the Memory-Deny-Write-Execute
+ * (MDWE) protection mask for the current process.
+ *
+ * @old specifies the VMA flags the VMA originally possessed, and @new the ones
+ * we propose to set.
+ *
+ * Return: false if proposed change is OK, true if not ok and should be denied.
*/
-static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags)
+static inline bool map_deny_write_exec(unsigned long old, unsigned long new)
{
+ /* If MDWE is disabled, we have nothing to deny. */
if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
return false;
- if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
+ /* If the new VMA is not executable, we have nothing to deny. */
+ if (!(new & VM_EXEC))
+ return false;
+
+ /* Under MDWE we do not accept newly writably executable VMAs... */
+ if (new & VM_WRITE)
return true;
- if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
+ /* ...nor previously non-executable VMAs becoming executable. */
+ if (!(old & VM_EXEC))
return true;
return false;
diff --git a/mm/mmap.c b/mm/mmap.c
index 9fefd13640d1..d71ac65563b2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2826,7 +2826,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_set_anonymous(vma);
}
- if (map_deny_write_exec(vma, vma->vm_flags)) {
+ if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) {
error = -EACCES;
goto close_and_free_vma;
}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b94fbb45d5c7..7e870a8c9402 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -791,7 +791,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
break;
}
- if (map_deny_write_exec(vma, newflags)) {
+ if (map_deny_write_exec(vma->vm_flags, newflags)) {
error = -EACCES;
break;
}
--
2.47.0
* [PATCH 6.6.y 4/5] mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
` (2 preceding siblings ...)
2024-11-15 12:41 ` [PATCH 6.6.y 3/5] mm: refactor map_deny_write_exec() Lorenzo Stoakes
@ 2024-11-15 12:41 ` Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling" has been added to the 6.6-stable tree gregkh
2024-11-15 12:41 ` [PATCH 6.6.y 5/5] mm: resolve faulty mmap_region() error path behaviour Lorenzo Stoakes
` (3 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2024-11-15 12:41 UTC (permalink / raw)
To: stable
Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn,
linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
[ Upstream commit 5baf8b037debf4ec60108ccfeccb8636d1dbad81 ]
Currently MTE (requested via the VM_MTE flag) is permitted in two
circumstances: where MAP_ANONYMOUS is specified, as checked by
arch_calc_vm_flag_bits() and actualised by setting the VM_MTE_ALLOWED flag;
or where the file backing the mapping is shmem, in which case we set
VM_MTE_ALLOWED in shmem_mmap() when the mmap hook is invoked in
mmap_region().
The arm64 implementation of arch_validate_flags() is what checks that, if
VM_MTE is set, VM_MTE_ALLOWED is also set.
Unfortunately, we intend to refactor mmap_region() to perform this check
earlier, meaning that in the case of a shmem backing we will not have
invoked shmem_mmap() yet, causing the mapping to fail spuriously.
It is inappropriate to set this architecture-specific flag in general mm
code anyway, so a sensible resolution of this issue is to instead move the
check somewhere else.
We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
the arch_calc_vm_flag_bits() call.
This is an appropriate place to do this as we already check for the
MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
the same idea - we permit RAM-backed memory.
This requires a modification to the arch_calc_vm_flag_bits() signature to
pass in a pointer to the struct file associated with the mapping. This is
not too egregious, however, as only two architectures use it: arm64 and
parisc.
So this patch performs this adjustment and removes the unnecessary
assignment of VM_MTE_ALLOWED in shmem_mmap().
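The revised MTE logic can be modelled as two small functions: one computing VM_MTE_ALLOWED at do_mmap() time, as the patched arm64 arch_calc_vm_flag_bits() does, and one applying the rule that arch_validate_flags() enforces (VM_MTE requires VM_MTE_ALLOWED). This is a hypothetical userspace sketch: sketch_file stands in for struct file with only an is_shmem flag, supports_mte stands in for system_supports_mte(), SKETCH_MAP_ANONYMOUS mirrors the common MAP_ANONYMOUS value, and the two VM_* bits are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAP_ANONYMOUS  0x20UL     /* mirrors the common MAP_ANONYMOUS value */
#define SKETCH_VM_MTE         (1UL << 0) /* arbitrary bit for the sketch */
#define SKETCH_VM_MTE_ALLOWED (1UL << 1) /* arbitrary bit for the sketch */

/* Hypothetical stand-in for struct file: only records whether it is shmem. */
struct sketch_file {
	bool is_shmem;
};

/* Stand-in for shmem_file(): a NULL file is never shmem. */
static bool sketch_shmem_file(const struct sketch_file *file)
{
	return file != NULL && file->is_shmem;
}

/* Same decision as the patched arm64 arch_calc_vm_flag_bits(). */
static unsigned long sketch_calc_vm_flag_bits(bool supports_mte,
					      const struct sketch_file *file,
					      unsigned long flags)
{
	if (supports_mte &&
	    ((flags & SKETCH_MAP_ANONYMOUS) || sketch_shmem_file(file)))
		return SKETCH_VM_MTE_ALLOWED;
	return 0;
}

/* The validation rule: VM_MTE is only valid alongside VM_MTE_ALLOWED. */
static bool sketch_validate_flags(unsigned long vm_flags)
{
	if (!(vm_flags & SKETCH_VM_MTE))
		return true;
	return (vm_flags & SKETCH_VM_MTE_ALLOWED) != 0;
}
```

Because VM_MTE_ALLOWED is now known before any mmap hook runs, the validation can move ahead of shmem_mmap() without spuriously rejecting shmem-backed mappings.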
[akpm@linux-foundation.org: fix whitespace, per Catalin]
Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
arch/arm64/include/asm/mman.h | 10 +++++++---
arch/parisc/include/asm/mman.h | 5 +++--
include/linux/mman.h | 7 ++++---
mm/mmap.c | 2 +-
mm/nommu.c | 2 +-
mm/shmem.c | 3 ---
6 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..ef35c52aabd6 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -3,6 +3,8 @@
#define __ASM_MMAN_H__
#include <linux/compiler.h>
+#include <linux/fs.h>
+#include <linux/shmem_fs.h>
#include <linux/types.h>
#include <uapi/asm/mman.h>
@@ -21,19 +23,21 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
}
#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
-static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+static inline unsigned long arch_calc_vm_flag_bits(struct file *file,
+ unsigned long flags)
{
/*
* Only allow MTE on anonymous mappings as these are guaranteed to be
* backed by tags-capable memory. The vm_flags may be overridden by a
* filesystem supporting MTE (RAM-based).
*/
- if (system_supports_mte() && (flags & MAP_ANONYMOUS))
+ if (system_supports_mte() &&
+ ((flags & MAP_ANONYMOUS) || shmem_file(file)))
return VM_MTE_ALLOWED;
return 0;
}
-#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
+#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)
static inline bool arch_validate_prot(unsigned long prot,
unsigned long addr __always_unused)
diff --git a/arch/parisc/include/asm/mman.h b/arch/parisc/include/asm/mman.h
index 89b6beeda0b8..663f587dc789 100644
--- a/arch/parisc/include/asm/mman.h
+++ b/arch/parisc/include/asm/mman.h
@@ -2,6 +2,7 @@
#ifndef __ASM_MMAN_H__
#define __ASM_MMAN_H__
+#include <linux/fs.h>
#include <uapi/asm/mman.h>
/* PARISC cannot allow mdwe as it needs writable stacks */
@@ -11,7 +12,7 @@ static inline bool arch_memory_deny_write_exec_supported(void)
}
#define arch_memory_deny_write_exec_supported arch_memory_deny_write_exec_supported
-static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+static inline unsigned long arch_calc_vm_flag_bits(struct file *file, unsigned long flags)
{
/*
* The stack on parisc grows upwards, so if userspace requests memory
@@ -23,6 +24,6 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
return 0;
}
-#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
+#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)
#endif /* __ASM_MMAN_H__ */
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 651705c2bf47..b2e2677ea156 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -2,6 +2,7 @@
#ifndef _LINUX_MMAN_H
#define _LINUX_MMAN_H
+#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/percpu_counter.h>
@@ -94,7 +95,7 @@ static inline void vm_unacct_memory(long pages)
#endif
#ifndef arch_calc_vm_flag_bits
-#define arch_calc_vm_flag_bits(flags) 0
+#define arch_calc_vm_flag_bits(file, flags) 0
#endif
#ifndef arch_validate_prot
@@ -151,12 +152,12 @@ calc_vm_prot_bits(unsigned long prot, unsigned long pkey)
* Combine the mmap "flags" argument into "vm_flags" used internally.
*/
static inline unsigned long
-calc_vm_flag_bits(unsigned long flags)
+calc_vm_flag_bits(struct file *file, unsigned long flags)
{
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
- arch_calc_vm_flag_bits(flags);
+ arch_calc_vm_flag_bits(file, flags);
}
unsigned long vm_commit_limit(void);
diff --git a/mm/mmap.c b/mm/mmap.c
index d71ac65563b2..fca3429da2fe 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1273,7 +1273,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
* to. we assume access permissions have been handled by the open
* of the memory object, so we don't do any here.
*/
- vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
+ vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(file, flags) |
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
if (flags & MAP_LOCKED)
diff --git a/mm/nommu.c b/mm/nommu.c
index 8bc339050e6d..7d37b734e66b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -853,7 +853,7 @@ static unsigned long determine_vm_flags(struct file *file,
{
unsigned long vm_flags;
- vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags);
+ vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(file, flags);
if (!file) {
/*
diff --git a/mm/shmem.c b/mm/shmem.c
index 5d076022da24..78c061517a72 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2402,9 +2402,6 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
if (ret)
return ret;
- /* arm64 - allow memory tagging on RAM-based files */
- vm_flags_set(vma, VM_MTE_ALLOWED);
-
file_accessed(file);
/* This is anonymous shared memory if it is unlinked at the time of mmap */
if (inode->i_nlink)
--
2.47.0
* [PATCH 6.6.y 5/5] mm: resolve faulty mmap_region() error path behaviour
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
` (3 preceding siblings ...)
2024-11-15 12:41 ` [PATCH 6.6.y 4/5] mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling Lorenzo Stoakes
@ 2024-11-15 12:41 ` Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: resolve faulty mmap_region() error path behaviour" has been added to the 6.6-stable tree gregkh
2024-11-15 17:17 ` [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Vlastimil Babka
` (2 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2024-11-15 12:41 UTC (permalink / raw)
To: stable
Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn,
linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
[ Upstream commit 5de195060b2e251a835f622759550e6202167641 ]
The mmap_region() function is somewhat terrifying, with spaghetti-like
control flow and numerous ways in which incomplete state, memory leaks and
other unpleasantness can arise.
A large amount of the complexity arises from trying to handle errors late
in the process of mapping a VMA, which forms the basis of recently
observed issues with resource leaks and observable inconsistent state.
Taking advantage of previous patches in this series we move a number of
checks earlier in the code, simplifying things by moving the core of the
logic into a static internal function __mmap_region().
Doing this allows us to perform a number of checks up front before we do
any real work, and allows us to unwind the writable unmap check
unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE
validation unconditionally also.
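The wrapper structure described above can be sketched as follows: take the writable reference before doing any real work, call the inner function, then drop the reference on every return path. All names are hypothetical. sketch_mapping stands in for struct address_space, sealed models a write-sealed memfd that mapping_map_writable() would refuse, and the sketch does not model the separate reference that a successfully established shared VMA holds for its own lifetime.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the writable-mapping count in struct address_space. */
struct sketch_mapping {
	int i_mmap_writable;
	bool sealed; /* e.g. a write-sealed memfd */
};

/* Like mapping_map_writable(): refuse if writes are sealed. */
static int sketch_map_writable(struct sketch_mapping *m)
{
	if (m->sealed)
		return -1; /* stands in for -EPERM */
	m->i_mmap_writable++;
	return 0;
}

static void sketch_unmap_writable(struct sketch_mapping *m)
{
	m->i_mmap_writable--;
}

/* Hypothetical inner worker; 'fail' simulates any of its error paths. */
static long sketch_inner_mmap_region(bool fail)
{
	return fail ? -12 : 0x1000;
}

/* Wrapper: take the writable reference up front, drop it on every exit. */
static long sketch_mmap_region(struct sketch_mapping *m, bool shared, bool fail)
{
	bool writable = false;
	long ret;

	if (m && shared) {
		if (sketch_map_writable(m))
			return -1; /* seal check failed before any real work */
		writable = true;
	}

	ret = sketch_inner_mmap_region(fail);

	/* Dropped regardless of success or failure of the inner call. */
	if (writable)
		sketch_unmap_writable(m);
	return ret;
}
```

Since the drop happens in exactly one place, the inner function's error paths no longer need their own mapping_unmap_writable() calls.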
We move a number of things here:
1. We preallocate memory for the iterator before we call the file-backed
memory hook, allowing us to exit early and avoid having to perform
complicated and error-prone close/free logic. We carefully free
iterator state on both success and error paths.
2. The enclosing mmap_region() function handles the mapping_map_writable()
logic early. Previously, mapping_map_writable() was called at the
point of mapping a newly allocated file-backed VMA, with a matching
mapping_unmap_writable() on both the success and error paths.
We now do this unconditionally if this is a file-backed, shared writable
mapping. A driver may change the flags to clear VM_MAYWRITE, but doing
so does not invalidate the seal check we just performed, and in any case
the wrapper always decrements the counter.
We perform a debug assert to ensure a driver does not attempt to do the
opposite.
3. We also move arch_validate_flags() up into the mmap_region()
function. This is only relevant on arm64 and sparc64, and the check is
only meaningful for SPARC with ADI enabled. We explicitly add a warning
for this arch if a driver invalidates this check, though the code ought
eventually to be fixed to eliminate the need for this.
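The ordering argument behind these moves can be illustrated with a hypothetical sketch in which each failure point is a switch. Checks that can fail (flag validation, iterator preallocation) run before the driver hook, so any failure they report leaves the hook uncalled, and nothing after a successful hook can fail. The error values merely echo -EINVAL, -ENOMEM and a driver error; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switches selecting which step fails. */
struct sketch_faults {
	bool validate_fails; /* e.g. arch_validate_flags() rejects the flags */
	bool prealloc_fails; /* e.g. iterator preallocation runs out of memory */
	bool hook_fails;     /* the driver's mmap hook returns an error */
};

/* Counts how many times the driver hook was entered. */
static int sketch_hook_calls;

static long sketch_map(const struct sketch_faults *f)
{
	/* Everything that can fail runs before any driver code. */
	if (f->validate_fails)
		return -22; /* echoes -EINVAL */
	if (f->prealloc_fails)
		return -12; /* echoes -ENOMEM */

	/* Only now is the driver hook invoked. */
	sketch_hook_calls++;
	if (f->hook_fails)
		return -5; /* the hook never succeeded, so there is nothing to close */

	/* Nothing below this point can fail, so no error path needs ->close(). */
	return 0x1000; /* mapped address */
}
```

Any early failure returns with the hook counter unchanged, which is the property that removes the need to close the VMA on error.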
With all of these measures in place, we no longer need to explicitly close
the VMA on error paths, as we place all checks which might fail prior to a
call to any driver mmap hook.
This eliminates an entire class of errors and makes the code easier to
reason about and more robust.
Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mmap.c | 115 +++++++++++++++++++++++++++++++-----------------------
1 file changed, 66 insertions(+), 49 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index fca3429da2fe..e4dfeaef668a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2666,14 +2666,14 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
return do_vmi_munmap(&vmi, mm, start, len, uf, false);
}
-unsigned long mmap_region(struct file *file, unsigned long addr,
+static unsigned long __mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
struct vm_area_struct *next, *prev, *merge;
- pgoff_t pglen = len >> PAGE_SHIFT;
+ pgoff_t pglen = PHYS_PFN(len);
unsigned long charged = 0;
unsigned long end = addr + len;
unsigned long merge_start = addr, merge_end = end;
@@ -2770,25 +2770,26 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma->vm_page_prot = vm_get_page_prot(vm_flags);
vma->vm_pgoff = pgoff;
- if (file) {
- if (vm_flags & VM_SHARED) {
- error = mapping_map_writable(file->f_mapping);
- if (error)
- goto free_vma;
- }
+ if (vma_iter_prealloc(&vmi, vma)) {
+ error = -ENOMEM;
+ goto free_vma;
+ }
+ if (file) {
vma->vm_file = get_file(file);
error = mmap_file(file, vma);
if (error)
- goto unmap_and_free_vma;
+ goto unmap_and_free_file_vma;
+ /* Drivers cannot alter the address of the VMA. */
+ WARN_ON_ONCE(addr != vma->vm_start);
/*
- * Expansion is handled above, merging is handled below.
- * Drivers should not alter the address of the VMA.
+ * Drivers should not permit writability when previously it was
+ * disallowed.
*/
- error = -EINVAL;
- if (WARN_ON((addr != vma->vm_start)))
- goto close_and_free_vma;
+ VM_WARN_ON_ONCE(vm_flags != vma->vm_flags &&
+ !(vm_flags & VM_MAYWRITE) &&
+ (vma->vm_flags & VM_MAYWRITE));
vma_iter_config(&vmi, addr, end);
/*
@@ -2800,6 +2801,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma->vm_end, vma->vm_flags, NULL,
vma->vm_file, vma->vm_pgoff, NULL,
NULL_VM_UFFD_CTX, NULL);
+
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -2813,7 +2815,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma = merge;
/* Update vm_flags to pick up the change. */
vm_flags = vma->vm_flags;
- goto unmap_writable;
+ goto file_expanded;
}
}
@@ -2821,24 +2823,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
} else if (vm_flags & VM_SHARED) {
error = shmem_zero_setup(vma);
if (error)
- goto free_vma;
+ goto free_iter_vma;
} else {
vma_set_anonymous(vma);
}
- if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) {
- error = -EACCES;
- goto close_and_free_vma;
- }
-
- /* Allow architectures to sanity-check the vm_flags */
- error = -EINVAL;
- if (!arch_validate_flags(vma->vm_flags))
- goto close_and_free_vma;
-
- error = -ENOMEM;
- if (vma_iter_prealloc(&vmi, vma))
- goto close_and_free_vma;
+#ifdef CONFIG_SPARC64
+ /* TODO: Fix SPARC ADI! */
+ WARN_ON_ONCE(!arch_validate_flags(vm_flags));
+#endif
/* Lock the VMA since it is modified after insertion into VMA tree */
vma_start_write(vma);
@@ -2861,10 +2854,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
*/
khugepaged_enter_vma(vma, vma->vm_flags);
- /* Once vma denies write, undo our temporary denial count */
-unmap_writable:
- if (file && vm_flags & VM_SHARED)
- mapping_unmap_writable(file->f_mapping);
+file_expanded:
file = vma->vm_file;
ksm_add_vma(vma);
expanded:
@@ -2894,33 +2884,60 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_set_page_prot(vma);
- validate_mm(mm);
return addr;
-close_and_free_vma:
- vma_close(vma);
-
- if (file || vma->vm_file) {
-unmap_and_free_vma:
- fput(vma->vm_file);
- vma->vm_file = NULL;
+unmap_and_free_file_vma:
+ fput(vma->vm_file);
+ vma->vm_file = NULL;
- vma_iter_set(&vmi, vma->vm_end);
- /* Undo any partial mapping done by a device driver. */
- unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start,
- vma->vm_end, vma->vm_end, true);
- }
- if (file && (vm_flags & VM_SHARED))
- mapping_unmap_writable(file->f_mapping);
+ vma_iter_set(&vmi, vma->vm_end);
+ /* Undo any partial mapping done by a device driver. */
+ unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start,
+ vma->vm_end, vma->vm_end, true);
+free_iter_vma:
+ vma_iter_free(&vmi);
free_vma:
vm_area_free(vma);
unacct_error:
if (charged)
vm_unacct_memory(charged);
- validate_mm(mm);
return error;
}
+unsigned long mmap_region(struct file *file, unsigned long addr,
+ unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
+ struct list_head *uf)
+{
+ unsigned long ret;
+ bool writable_file_mapping = false;
+
+ /* Check to see if MDWE is applicable. */
+ if (map_deny_write_exec(vm_flags, vm_flags))
+ return -EACCES;
+
+ /* Allow architectures to sanity-check the vm_flags. */
+ if (!arch_validate_flags(vm_flags))
+ return -EINVAL;
+
+ /* Map writable and ensure this isn't a sealed memfd. */
+ if (file && (vm_flags & VM_SHARED)) {
+ int error = mapping_map_writable(file->f_mapping);
+
+ if (error)
+ return error;
+ writable_file_mapping = true;
+ }
+
+ ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf);
+
+ /* Clear our write mapping regardless of error. */
+ if (writable_file_mapping)
+ mapping_unmap_writable(file->f_mapping);
+
+ validate_mm(current->mm);
+ return ret;
+}
+
static int __vm_munmap(unsigned long start, size_t len, bool unlock)
{
int ret;
--
2.47.0
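To illustrate the reordered error path in the patch above, the following hypothetical userspace model records each step in a log so the unwind order can be checked. The step names, the fail_at selector and sk_map() are illustrative assumptions, and the shmem_zero_setup() failure is simplified to any non-file case. The point is that iterator preallocation now precedes the file hook and no label ever calls vma_close().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical step log so the unwind order can be inspected. */
static char sk_log[128];

static void sk_step(const char *s)
{
	/* The log is sized for these examples; overflow would be a bug. */
	assert(strlen(sk_log) + strlen(s) < sizeof(sk_log));
	strcat(sk_log, s);
}

/*
 * Sketch of the reordered __mmap_region() error handling. fail_at picks the
 * (hypothetical) failing step: 1 = iterator preallocation, 2 = file mmap
 * hook, 3 = shmem_zero_setup() stand-in, 0 = no failure.
 */
static int sk_map(int is_file, int fail_at)
{
	int error;

	sk_step("alloc;");
	if (fail_at == 1) {
		error = -ENOMEM;
		goto free_vma;
	}
	sk_step("prealloc;");

	if (is_file) {
		sk_step("hook;");
		if (fail_at == 2) {
			error = -EINVAL;
			goto unmap_and_free_file_vma;
		}
	} else if (fail_at == 3) {
		error = -ENOMEM;
		goto free_iter_vma;
	}
	sk_step("insert;");
	return 0;

unmap_and_free_file_vma:
	sk_step("fput+unmap;");	/* drop the file, undo partial driver work */
free_iter_vma:
	sk_step("iter_free;");	/* release the preallocated iterator nodes */
free_vma:
	sk_step("vma_free;");
	return error;
}
```

For example, a failing file hook yields the log alloc;prealloc;hook;fput+unmap;iter_free;vma_free;, mirroring unmap_and_free_file_vma falling through to free_iter_vma and free_vma.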
* Re: [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes)
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
` (4 preceding siblings ...)
2024-11-15 12:41 ` [PATCH 6.6.y 5/5] mm: resolve faulty mmap_region() error path behaviour Lorenzo Stoakes
@ 2024-11-15 17:17 ` Vlastimil Babka
2024-11-15 20:09 ` Liam R. Howlett
2024-11-19 13:16 ` Greg KH
7 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2024-11-15 17:17 UTC (permalink / raw)
To: Lorenzo Stoakes, stable
Cc: Andrew Morton, Liam R . Howlett, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu, Catalin Marinas, Will Deacon,
Mark Brown, David S . Miller, Andreas Larsson,
James E . J . Bottomley, Helge Deller
On 11/15/24 13:41, Lorenzo Stoakes wrote:
> Critical fixes for mmap_region(), backported to 6.6.y.
>
> Some notes on differences from upstream:
>
> * In this kernel is_shared_maywrite() does not exist and the code uses
> VM_SHARED to determine whether mapping_map_writable() /
> mapping_unmap_writable() should be invoked. This backport therefore
> follows suit.
>
> * Each version of these series is confronted by a slightly different
> mmap_region(), so we must adapt the change for each stable version. The
> approach remains the same throughout, however, and we correctly avoid
> closing the VMA part way through any __mmap_region() operation.
>
> Lorenzo Stoakes (5):
> mm: avoid unsafe VMA hook invocation when error arises on mmap hook
> mm: unconditionally close VMAs on error
> mm: refactor map_deny_write_exec()
> mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
> mm: resolve faulty mmap_region() error path behaviour
I don't know if review tags are actually applied to stable backports on top
of the original reviews, but I've checked so FTR:
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> arch/arm64/include/asm/mman.h | 10 ++-
> arch/parisc/include/asm/mman.h | 5 +-
> include/linux/mman.h | 28 ++++++--
> mm/internal.h | 45 ++++++++++++
> mm/mmap.c | 128 ++++++++++++++++++---------------
> mm/mprotect.c | 2 +-
> mm/nommu.c | 9 ++-
> mm/shmem.c | 3 -
> 8 files changed, 153 insertions(+), 77 deletions(-)
>
> --
> 2.47.0
* Re: [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes)
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
` (5 preceding siblings ...)
2024-11-15 17:17 ` [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Vlastimil Babka
@ 2024-11-15 20:09 ` Liam R. Howlett
2024-11-19 13:16 ` Greg KH
7 siblings, 0 replies; 16+ messages in thread
From: Liam R. Howlett @ 2024-11-15 20:09 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: stable, Andrew Morton, Vlastimil Babka, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu, Catalin Marinas, Will Deacon,
Mark Brown, David S . Miller, Andreas Larsson,
James E . J . Bottomley, Helge Deller
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [241115 07:42]:
> Critical fixes for mmap_region(), backported to 6.6.y.
>
> Some notes on differences from upstream:
>
> * In this kernel is_shared_maywrite() does not exist and the code uses
> VM_SHARED to determine whether mapping_map_writable() /
> mapping_unmap_writable() should be invoked. This backport therefore
> follows suit.
>
> * Each version of these series is confronted by a slightly different
> mmap_region(), so we must adapt the change for each stable version. The
> approach remains the same throughout, however, and we correctly avoid
> closing the VMA part way through any __mmap_region() operation.
>
> Lorenzo Stoakes (5):
> mm: avoid unsafe VMA hook invocation when error arises on mmap hook
> mm: unconditionally close VMAs on error
> mm: refactor map_deny_write_exec()
> mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
> mm: resolve faulty mmap_region() error path behaviour
These backports look good.
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>
> arch/arm64/include/asm/mman.h | 10 ++-
> arch/parisc/include/asm/mman.h | 5 +-
> include/linux/mman.h | 28 ++++++--
> mm/internal.h | 45 ++++++++++++
> mm/mmap.c | 128 ++++++++++++++++++---------------
> mm/mprotect.c | 2 +-
> mm/nommu.c | 9 ++-
> mm/shmem.c | 3 -
> 8 files changed, 153 insertions(+), 77 deletions(-)
>
> --
> 2.47.0
* Re: [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes)
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
` (6 preceding siblings ...)
2024-11-15 20:09 ` Liam R. Howlett
@ 2024-11-19 13:16 ` Greg KH
2024-11-19 13:24 ` Lorenzo Stoakes
7 siblings, 1 reply; 16+ messages in thread
From: Greg KH @ 2024-11-19 13:16 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: stable, Andrew Morton, Liam R . Howlett, Vlastimil Babka,
Jann Horn, linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
On Fri, Nov 15, 2024 at 12:41:53PM +0000, Lorenzo Stoakes wrote:
> Critical fixes for mmap_region(), backported to 6.6.y.
Did I miss the 6.11.y and 6.1.y versions of this series somewhere?
thanks,
greg k-h
* Re: [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes)
2024-11-19 13:16 ` Greg KH
@ 2024-11-19 13:24 ` Lorenzo Stoakes
2024-11-19 14:14 ` Greg KH
0 siblings, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2024-11-19 13:24 UTC (permalink / raw)
To: Greg KH
Cc: stable, Andrew Morton, Liam R . Howlett, Vlastimil Babka,
Jann Horn, linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
On Tue, Nov 19, 2024 at 02:16:52PM +0100, Greg KH wrote:
> On Fri, Nov 15, 2024 at 12:41:53PM +0000, Lorenzo Stoakes wrote:
> > Critical fixes for mmap_region(), backported to 6.6.y.
>
> Did I miss the 6.11.y and 6.1.y versions of this series somewhere?
>
> thanks,
>
> greg k-h
5.10.y - https://lore.kernel.org/linux-mm/cover.1731670097.git.lorenzo.stoakes@oracle.com/
5.15.y - https://lore.kernel.org/linux-mm/cover.1731667436.git.lorenzo.stoakes@oracle.com/
6.1.y - https://lore.kernel.org/linux-mm/cover.1731946386.git.lorenzo.stoakes@oracle.com/
6.6.y - https://lore.kernel.org/linux-mm/cover.1731672733.git.lorenzo.stoakes@oracle.com/
I didn't backport to 6.11.y as we are about to move to 6.12, but I can if
you need that.
* Re: [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes)
2024-11-19 13:24 ` Lorenzo Stoakes
@ 2024-11-19 14:14 ` Greg KH
0 siblings, 0 replies; 16+ messages in thread
From: Greg KH @ 2024-11-19 14:14 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: stable, Andrew Morton, Liam R . Howlett, Vlastimil Babka,
Jann Horn, linux-kernel, linux-mm, Linus Torvalds, Peter Xu,
Catalin Marinas, Will Deacon, Mark Brown, David S . Miller,
Andreas Larsson, James E . J . Bottomley, Helge Deller
On Tue, Nov 19, 2024 at 01:24:33PM +0000, Lorenzo Stoakes wrote:
> On Tue, Nov 19, 2024 at 02:16:52PM +0100, Greg KH wrote:
> > On Fri, Nov 15, 2024 at 12:41:53PM +0000, Lorenzo Stoakes wrote:
> > > Critical fixes for mmap_region(), backported to 6.6.y.
> >
> > Did I miss the 6.11.y and 6.1.y versions of this series somewhere?
> >
> > thanks,
> >
> > greg k-h
>
> 5.10.y - https://lore.kernel.org/linux-mm/cover.1731670097.git.lorenzo.stoakes@oracle.com/
> 5.15.y - https://lore.kernel.org/linux-mm/cover.1731667436.git.lorenzo.stoakes@oracle.com/
> 6.1.y - https://lore.kernel.org/linux-mm/cover.1731946386.git.lorenzo.stoakes@oracle.com/
> 6.6.y - https://lore.kernel.org/linux-mm/cover.1731672733.git.lorenzo.stoakes@oracle.com/
>
> I didn't backport to 6.11.y as we are about to move to 6.12, but I can if
> you need that.
True, 6.11.y is only going to be around for another few weeks, just
wanted to make sure I hadn't missed this. I'll go queue all of these up
now, thanks for the backports!
greg k-h
* Patch "mm: avoid unsafe VMA hook invocation when error arises on mmap hook" has been added to the 6.6-stable tree
2024-11-15 12:41 ` [PATCH 6.6.y 1/5] mm: avoid unsafe VMA hook invocation when error arises on mmap hook Lorenzo Stoakes
@ 2024-11-19 14:25 ` gregkh
0 siblings, 0 replies; 16+ messages in thread
From: gregkh @ 2024-11-19 14:25 UTC (permalink / raw)
To: James.Bottomley, Liam.Howlett, akpm, andreas, broonie,
catalin.marinas, davem, deller, gregkh, jannh, linux-mm,
lorenzo.stoakes, peterx, torvalds, vbabka, will
Cc: stable-commits
This is a note to let you know that I've just added the patch titled
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
to the 6.6-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook.patch
and it can be found in the queue-6.6 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From stable+bounces-93535-greg=kroah.com@vger.kernel.org Fri Nov 15 13:42:51 2024
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 15 Nov 2024 12:41:54 +0000
Subject: mm: avoid unsafe VMA hook invocation when error arises on mmap hook
To: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>, "Liam R . Howlett" <Liam.Howlett@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds <torvalds@linux-foundation.org>, Peter Xu <peterx@redhat.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Mark Brown <broonie@kernel.org>, "David S . Miller" <davem@davemloft.net>, Andreas Larsson <andreas@gaisler.com>, "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>, Helge Deller <deller@gmx.de>
Message-ID: <33d70849ec62ba738ca2f8db58fe24076d5282bf.1731672733.git.lorenzo.stoakes@oracle.com>
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
[ Upstream commit 3dd6ed34ce1f2356a77fb88edafb5ec96784e3cf ]
Patch series "fix error handling in mmap_region() and refactor
(hotfixes)", v4.
mmap_region() is somewhat terrifying, with spaghetti-like control flow and
numerous ways in which incomplete state, memory leaks and other
unpleasantness can arise.
A large amount of the complexity arises from trying to handle errors late
in the process of mapping a VMA, which forms the basis of recently
observed issues with resource leaks and observable inconsistent state.
This series goes to great lengths to simplify how mmap_region() works and
to avoid unwinding errors late on in the process of setting up the VMA for
the new mapping, and equally avoids such operations occurring while the
VMA is in an inconsistent state.
The patches in this series comprise the minimal changes required to
resolve existing issues in mmap_region() error handling, in order that
they can be hotfixed and backported. There is additionally a follow-up
series which goes further; it was split out of the v1 series and is sent
and updated separately.
This patch (of 5):
After an attempted mmap() fails, we are no longer in a situation where we
can safely interact with VMA hooks. This is currently not enforced,
meaning that we need complicated handling to ensure we do not incorrectly
call these hooks.
We can avoid the whole issue by treating the VMA as suspect the moment
that the file->f_ops->mmap() function reports an error by replacing
whatever VMA operations were installed with a dummy empty set of VMA
operations.
We do so through a new helper function internal to mm - mmap_file() -
which is both more logically named than the existing call_mmap() function
and correctly isolates handling of the vm_op reassignment to mm.
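A minimal userspace sketch of the idea behind mmap_file(), assuming a cut-down ops table with only a close hook: once the driver's mmap hook fails, the VMA's operations are swapped for an empty table so generic teardown cannot call back into the driver. All sk_* names are hypothetical; the real helper wraps call_mmap() and installs vma_dummy_vm_ops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_vma;

/* Hypothetical cut-down vm_operations_struct with only a close hook. */
struct sk_vm_ops {
	void (*close)(struct sk_vma *vma);
};

struct sk_vma {
	const struct sk_vm_ops *vm_ops;
	int driver_state;	/* 1 once a driver hook fully set up */
};

/* Empty ops table: calling through it never reaches a driver. */
static const struct sk_vm_ops sk_dummy_vm_ops = { .close = NULL };

static void sk_driver_close(struct sk_vma *vma)
{
	/* Stands in for a driver touching state it never set up. */
	assert(vma->driver_state == 1);
	vma->driver_state = 0;
}

static const struct sk_vm_ops sk_driver_vm_ops = { .close = sk_driver_close };

/* A driver hook that installs its ops, then fails part-way. */
static int sk_driver_mmap_fail(struct sk_vma *vma)
{
	vma->vm_ops = &sk_driver_vm_ops;
	return -ENODEV;
}

static int sk_driver_mmap_ok(struct sk_vma *vma)
{
	vma->vm_ops = &sk_driver_vm_ops;
	vma->driver_state = 1;
	return 0;
}

/* Sketch of mmap_file(): on error, neuter the VMA's hooks. */
static int sk_mmap_file(int (*hook)(struct sk_vma *), struct sk_vma *vma)
{
	int err = hook(vma);

	if (err)
		vma->vm_ops = &sk_dummy_vm_ops;
	return err;
}

/* Generic teardown only calls close() if one is installed. */
static void sk_vma_close(struct sk_vma *vma)
{
	if (vma->vm_ops && vma->vm_ops->close)
		vma->vm_ops->close(vma);
}
```

Without the swap, sk_vma_close() on the failed VMA would reach sk_driver_close() and trip its assertion.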
All the existing invocations of call_mmap() outside of mm are ultimately
nested within the call_mmap() from mm, which we now replace.
It is therefore safe to leave call_mmap() in place as a convenience
function (and to avoid churn). The invokers are:
ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap()
coda_file_operations -> mmap -> coda_file_mmap()
shm_file_operations -> shm_mmap()
shm_file_operations_huge -> shm_mmap()
dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops
-> i915_gem_dmabuf_mmap()
None of these callers interact with vm_ops or mappings in a problematic
way on error, quickly exiting out.
Link: https://lkml.kernel.org/r/cover.1730224667.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/internal.h | 27 +++++++++++++++++++++++++++
mm/mmap.c | 4 ++--
mm/nommu.c | 4 ++--
3 files changed, 31 insertions(+), 4 deletions(-)
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -83,6 +83,33 @@ static inline void *folio_raw_mapping(st
return (void *)(mapping & ~PAGE_MAPPING_FLAGS);
}
+/*
+ * This is a file-backed mapping, and is about to be memory mapped - invoke its
+ * mmap hook and safely handle error conditions. On error, VMA hooks will be
+ * mutated.
+ *
+ * @file: File which backs the mapping.
+ * @vma: VMA which we are mapping.
+ *
+ * Returns: 0 if success, error otherwise.
+ */
+static inline int mmap_file(struct file *file, struct vm_area_struct *vma)
+{
+ int err = call_mmap(file, vma);
+
+ if (likely(!err))
+ return 0;
+
+ /*
+ * OK, we tried to call the file hook for mmap(), but an error
+ * arose. The mapping is in an inconsistent state and we must not invoke
+ * any further hooks on it.
+ */
+ vma->vm_ops = &vma_dummy_vm_ops;
+
+ return err;
+}
+
void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio,
int nr_throttled);
static inline void acct_reclaim_writeback(struct folio *folio)
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2779,7 +2779,7 @@ cannot_expand:
}
vma->vm_file = get_file(file);
- error = call_mmap(file, vma);
+ error = mmap_file(file, vma);
if (error)
goto unmap_and_free_vma;
@@ -2793,7 +2793,7 @@ cannot_expand:
vma_iter_config(&vmi, addr, end);
/*
- * If vm_flags changed after call_mmap(), we should try merge
+ * If vm_flags changed after mmap_file(), we should try merge
* vma again as we may succeed this time.
*/
if (unlikely(vm_flags != vma->vm_flags && prev)) {
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -896,7 +896,7 @@ static int do_mmap_shared_file(struct vm
{
int ret;
- ret = call_mmap(vma->vm_file, vma);
+ ret = mmap_file(vma->vm_file, vma);
if (ret == 0) {
vma->vm_region->vm_top = vma->vm_region->vm_end;
return 0;
@@ -929,7 +929,7 @@ static int do_mmap_private(struct vm_are
* happy.
*/
if (capabilities & NOMMU_MAP_DIRECT) {
- ret = call_mmap(vma->vm_file, vma);
+ ret = mmap_file(vma->vm_file, vma);
/* shouldn't return success if we're not sharing */
if (WARN_ON_ONCE(!is_nommu_shared_mapping(vma->vm_flags)))
ret = -ENOSYS;
Patches currently in stable-queue which might be from lorenzo.stoakes@oracle.com are
queue-6.6/mm-resolve-faulty-mmap_region-error-path-behaviour.patch
queue-6.6/mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling.patch
queue-6.6/mm-unconditionally-close-vmas-on-error.patch
queue-6.6/mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook.patch
queue-6.6/mm-refactor-map_deny_write_exec.patch
* Patch "mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling" has been added to the 6.6-stable tree
2024-11-15 12:41 ` [PATCH 6.6.y 4/5] mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling Lorenzo Stoakes
@ 2024-11-19 14:25 ` gregkh
0 siblings, 0 replies; 16+ messages in thread
From: gregkh @ 2024-11-19 14:25 UTC (permalink / raw)
To: James.Bottomley, Liam.Howlett, akpm, andreas, broonie,
catalin.marinas, davem, deller, gregkh, jannh, linux-mm,
lorenzo.stoakes, peterx, torvalds, vbabka, will
Cc: stable-commits
This is a note to let you know that I've just added the patch titled
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
to the 6.6-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling.patch
and it can be found in the queue-6.6 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From stable+bounces-93538-greg=kroah.com@vger.kernel.org Fri Nov 15 13:43:44 2024
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 15 Nov 2024 12:41:57 +0000
Subject: mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
To: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>, "Liam R . Howlett" <Liam.Howlett@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds <torvalds@linux-foundation.org>, Peter Xu <peterx@redhat.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Mark Brown <broonie@kernel.org>, "David S . Miller" <davem@davemloft.net>, Andreas Larsson <andreas@gaisler.com>, "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>, Helge Deller <deller@gmx.de>
Message-ID: <7c0218d03fd2119025d8cbc1b814639cf09314e0.1731672733.git.lorenzo.stoakes@oracle.com>
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
[ Upstream commit 5baf8b037debf4ec60108ccfeccb8636d1dbad81 ]
Currently MTE is permitted in two circumstances (the desire to use MTE
being expressed by the VM_MTE flag): where MAP_ANONYMOUS is specified, as
checked by arch_calc_vm_flag_bits() and actualised by setting the
VM_MTE_ALLOWED flag; or where the file backing the mapping is shmem, in
which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap hook is
invoked in mmap_region().
The check that VM_MTE_ALLOWED is also set whenever VM_MTE is set is
performed by the arm64 implementation of arch_validate_flags().
Unfortunately, we intend to refactor mmap_region() to perform this check
earlier, meaning that in the case of a shmem backing we will not have
invoked shmem_mmap() yet, causing the mapping to fail spuriously.
It is inappropriate to set this architecture-specific flag in general mm
code anyway, so a sensible resolution of this issue is to instead move the
check somewhere else.
We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
the arch_calc_vm_flag_bits() call.
This is an appropriate place to do this as we already check for the
MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
the same idea - we permit RAM-backed memory.
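A sketch of how the arm64 logic fits together after this change, written as plain userspace C. The MTE-support switch is passed in as a parameter rather than read via system_supports_mte(), struct sk_file only records whether the file is shmem-backed, and the SK_* flag values are illustrative, not the real arm64 constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real ones live in the kernel headers. */
#define SK_MAP_ANONYMOUS  0x20UL
#define SK_VM_MTE         0x1UL
#define SK_VM_MTE_ALLOWED 0x2UL

/* Hypothetical file model: only records whether it is shmem-backed. */
struct sk_file {
	bool is_shmem;
};

static bool sk_shmem_file(const struct sk_file *file)
{
	return file != NULL && file->is_shmem;
}

/* Sketch of arm64 arch_calc_vm_flag_bits(file, flags) after this patch. */
static unsigned long sk_calc_vm_flag_bits(const struct sk_file *file,
					  unsigned long flags, bool mte)
{
	if (mte && ((flags & SK_MAP_ANONYMOUS) || sk_shmem_file(file)))
		return SK_VM_MTE_ALLOWED;
	return 0;
}

/* Sketch of arm64 arch_validate_flags(): VM_MTE needs VM_MTE_ALLOWED. */
static bool sk_validate_flags(unsigned long vm_flags, bool mte)
{
	if (!mte)
		return true;
	return !(vm_flags & SK_VM_MTE) || (vm_flags & SK_VM_MTE_ALLOWED);
}
```

Because VM_MTE_ALLOWED is now computed from the file before any hook runs, an MTE request on a shmem file passes sk_validate_flags() up front, while the same request on any other file is still rejected.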
This requires a modification to the arch_calc_vm_flag_bits() signature to
pass in a pointer to the struct file associated with the mapping; however,
this is not too egregious, as the function is only used by two
architectures anyway: arm64 and parisc.
So this patch performs this adjustment and removes the unnecessary
assignment of VM_MTE_ALLOWED in shmem_mmap().
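The signature change has to be made in arm64, parisc and the generic fallback together because of the override idiom these headers use: an architecture defines a static inline function plus a function-like macro of the same name, and include/linux/mman.h supplies a default only if that macro is not already defined. The userspace sketch below reproduces the idiom in one file; the returned values and sk_calc_vm_flag_bits() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct file;	/* opaque: the headers only pass the pointer through */

/* --- "arch" section: an architecture opts in by defining both --- */
static inline unsigned long arch_calc_vm_flag_bits(struct file *file,
						   unsigned long flags)
{
	(void)file;				/* unused in this toy version */
	return (flags & 0x1UL) ? 0x100UL : 0;	/* illustrative values */
}
/* Self-referential macro: lets generic code detect the override. */
#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)

/* --- generic section: fallback only when no arch definition exists --- */
#ifndef arch_calc_vm_flag_bits
#define arch_calc_vm_flag_bits(file, flags) 0
#endif

/* Hypothetical generic caller; it must use the same two-argument form. */
static inline unsigned long sk_calc_vm_flag_bits(struct file *file,
						 unsigned long flags)
{
	return arch_calc_vm_flag_bits(file, flags);
}
```

The preprocessor does not re-expand a macro inside its own expansion, so calls resolve to the inline function; deleting the "arch" section makes the generic 0 fallback take effect instead.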
[akpm@linux-foundation.org: fix whitespace, per Catalin]
Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/arm64/include/asm/mman.h | 10 +++++++---
arch/parisc/include/asm/mman.h | 5 +++--
include/linux/mman.h | 7 ++++---
mm/mmap.c | 2 +-
mm/nommu.c | 2 +-
mm/shmem.c | 3 ---
6 files changed, 16 insertions(+), 13 deletions(-)
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -3,6 +3,8 @@
#define __ASM_MMAN_H__
#include <linux/compiler.h>
+#include <linux/fs.h>
+#include <linux/shmem_fs.h>
#include <linux/types.h>
#include <uapi/asm/mman.h>
@@ -21,19 +23,21 @@ static inline unsigned long arch_calc_vm
}
#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
-static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+static inline unsigned long arch_calc_vm_flag_bits(struct file *file,
+ unsigned long flags)
{
/*
* Only allow MTE on anonymous mappings as these are guaranteed to be
* backed by tags-capable memory. The vm_flags may be overridden by a
* filesystem supporting MTE (RAM-based).
*/
- if (system_supports_mte() && (flags & MAP_ANONYMOUS))
+ if (system_supports_mte() &&
+ ((flags & MAP_ANONYMOUS) || shmem_file(file)))
return VM_MTE_ALLOWED;
return 0;
}
-#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
+#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)
static inline bool arch_validate_prot(unsigned long prot,
unsigned long addr __always_unused)
--- a/arch/parisc/include/asm/mman.h
+++ b/arch/parisc/include/asm/mman.h
@@ -2,6 +2,7 @@
#ifndef __ASM_MMAN_H__
#define __ASM_MMAN_H__
+#include <linux/fs.h>
#include <uapi/asm/mman.h>
/* PARISC cannot allow mdwe as it needs writable stacks */
@@ -11,7 +12,7 @@ static inline bool arch_memory_deny_writ
}
#define arch_memory_deny_write_exec_supported arch_memory_deny_write_exec_supported
-static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+static inline unsigned long arch_calc_vm_flag_bits(struct file *file, unsigned long flags)
{
/*
* The stack on parisc grows upwards, so if userspace requests memory
@@ -23,6 +24,6 @@ static inline unsigned long arch_calc_vm
return 0;
}
-#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
+#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)
#endif /* __ASM_MMAN_H__ */
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -2,6 +2,7 @@
#ifndef _LINUX_MMAN_H
#define _LINUX_MMAN_H
+#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/percpu_counter.h>
@@ -94,7 +95,7 @@ static inline void vm_unacct_memory(long
#endif
#ifndef arch_calc_vm_flag_bits
-#define arch_calc_vm_flag_bits(flags) 0
+#define arch_calc_vm_flag_bits(file, flags) 0
#endif
#ifndef arch_validate_prot
@@ -151,12 +152,12 @@ calc_vm_prot_bits(unsigned long prot, un
* Combine the mmap "flags" argument into "vm_flags" used internally.
*/
static inline unsigned long
-calc_vm_flag_bits(unsigned long flags)
+calc_vm_flag_bits(struct file *file, unsigned long flags)
{
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
- arch_calc_vm_flag_bits(flags);
+ arch_calc_vm_flag_bits(file, flags);
}
unsigned long vm_commit_limit(void);
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1273,7 +1273,7 @@ unsigned long do_mmap(struct file *file,
* to. we assume access permissions have been handled by the open
* of the memory object, so we don't do any here.
*/
- vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
+ vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(file, flags) |
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
if (flags & MAP_LOCKED)
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -853,7 +853,7 @@ static unsigned long determine_vm_flags(
{
unsigned long vm_flags;
- vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags);
+ vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(file, flags);
if (!file) {
/*
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2400,9 +2400,6 @@ static int shmem_mmap(struct file *file,
if (ret)
return ret;
- /* arm64 - allow memory tagging on RAM-based files */
- vm_flags_set(vma, VM_MTE_ALLOWED);
-
file_accessed(file);
/* This is anonymous shared memory if it is unlinked at the time of mmap */
if (inode->i_nlink)
Patches currently in stable-queue which might be from lorenzo.stoakes@oracle.com are
queue-6.6/mm-resolve-faulty-mmap_region-error-path-behaviour.patch
queue-6.6/mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling.patch
queue-6.6/mm-unconditionally-close-vmas-on-error.patch
queue-6.6/mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook.patch
queue-6.6/mm-refactor-map_deny_write_exec.patch
* Patch "mm: refactor map_deny_write_exec()" has been added to the 6.6-stable tree
2024-11-15 12:41 ` [PATCH 6.6.y 3/5] mm: refactor map_deny_write_exec() Lorenzo Stoakes
@ 2024-11-19 14:25 ` gregkh
0 siblings, 0 replies; 16+ messages in thread
From: gregkh @ 2024-11-19 14:25 UTC (permalink / raw)
To: James.Bottomley, Liam.Howlett, akpm, andreas, broonie,
catalin.marinas, davem, deller, gregkh, jannh, linux-mm,
lorenzo.stoakes, peterx, torvalds, vbabka, will
Cc: stable-commits
This is a note to let you know that I've just added the patch titled
mm: refactor map_deny_write_exec()
to the 6.6-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
mm-refactor-map_deny_write_exec.patch
and it can be found in the queue-6.6 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From stable+bounces-93537-greg=kroah.com@vger.kernel.org Fri Nov 15 13:43:12 2024
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 15 Nov 2024 12:41:56 +0000
Subject: mm: refactor map_deny_write_exec()
To: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>, "Liam R . Howlett" <Liam.Howlett@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds <torvalds@linux-foundation.org>, Peter Xu <peterx@redhat.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Mark Brown <broonie@kernel.org>, "David S . Miller" <davem@davemloft.net>, Andreas Larsson <andreas@gaisler.com>, "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>, Helge Deller <deller@gmx.de>
Message-ID: <a7f0a2f48d376b2c4e2e3adf7ac011abe1eeeead.1731672733.git.lorenzo.stoakes@oracle.com>
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
[ Upstream commit 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf ]
Refactor map_deny_write_exec() so that it no longer unnecessarily requires
a VMA parameter but instead accepts VMA flags parameters, which allows us
to use this function early in mmap_region() in a subsequent commit.
While we're here, we refactor the function to be more readable and add
some additional documentation.
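To illustrate the flags-only interface, here is a userspace model of the refactored predicate from the patch below. It is a sketch, not kernel code: the mdwe_enabled parameter is an assumption standing in for the MMF_HAS_MDWE test on current->mm, the VM_* values are copied locally for illustration, and model_map_deny_write_exec() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values mirrored from include/linux/mm.h for illustration only. */
#define VM_READ  0x00000001UL
#define VM_WRITE 0x00000002UL
#define VM_EXEC  0x00000004UL

/*
 * Userspace model of map_deny_write_exec(old, new).
 * Assumption: mdwe_enabled replaces test_bit(MMF_HAS_MDWE, ...).
 * Returns true if the proposed flags must be denied.
 */
static bool model_map_deny_write_exec(bool mdwe_enabled,
                                      unsigned long old, unsigned long new)
{
	if (!mdwe_enabled)
		return false;   /* MDWE disabled: nothing to deny. */
	if (!(new & VM_EXEC))
		return false;   /* Not executable: nothing to deny. */
	if (new & VM_WRITE)
		return true;    /* Writable and executable: deny. */
	if (!(old & VM_EXEC))
		return true;    /* Becoming executable: deny. */
	return false;
}
```

Since mmap_region() passes the same flags as both arguments, only the write-plus-execute combination is refused for a new mapping, whereas do_mprotect_pkey() passes the existing vma->vm_flags as old, which additionally catches a non-executable VMA becoming executable.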
Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/mman.h | 21 ++++++++++++++++++---
mm/mmap.c | 2 +-
mm/mprotect.c | 2 +-
3 files changed, 20 insertions(+), 5 deletions(-)
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -187,16 +187,31 @@ static inline bool arch_memory_deny_writ
*
* d) mmap(PROT_READ | PROT_EXEC)
* mmap(PROT_READ | PROT_EXEC | PROT_BTI)
+ *
+ * This is only applicable if the user has set the Memory-Deny-Write-Execute
+ * (MDWE) protection mask for the current process.
+ *
+ * @old specifies the VMA flags the VMA originally possessed, and @new the ones
+ * we propose to set.
+ *
+ * Return: false if proposed change is OK, true if not ok and should be denied.
*/
-static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags)
+static inline bool map_deny_write_exec(unsigned long old, unsigned long new)
{
+ /* If MDWE is disabled, we have nothing to deny. */
if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
return false;
- if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
+ /* If the new VMA is not executable, we have nothing to deny. */
+ if (!(new & VM_EXEC))
+ return false;
+
+ /* Under MDWE we do not accept newly writably executable VMAs... */
+ if (new & VM_WRITE)
return true;
- if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
+ /* ...nor previously non-executable VMAs becoming executable. */
+ if (!(old & VM_EXEC))
return true;
return false;
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2826,7 +2826,7 @@ cannot_expand:
vma_set_anonymous(vma);
}
- if (map_deny_write_exec(vma, vma->vm_flags)) {
+ if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) {
error = -EACCES;
goto close_and_free_vma;
}
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -791,7 +791,7 @@ static int do_mprotect_pkey(unsigned lon
break;
}
- if (map_deny_write_exec(vma, newflags)) {
+ if (map_deny_write_exec(vma->vm_flags, newflags)) {
error = -EACCES;
break;
}
* Patch "mm: resolve faulty mmap_region() error path behaviour" has been added to the 6.6-stable tree
2024-11-15 12:41 ` [PATCH 6.6.y 5/5] mm: resolve faulty mmap_region() error path behaviour Lorenzo Stoakes
@ 2024-11-19 14:25 ` gregkh
0 siblings, 0 replies; 16+ messages in thread
From: gregkh @ 2024-11-19 14:25 UTC
To: James.Bottomley, Liam.Howlett, akpm, andreas, broonie,
catalin.marinas, davem, deller, gregkh, jannh, linux-mm,
lorenzo.stoakes, peterx, torvalds, vbabka, will
Cc: stable-commits
This is a note to let you know that I've just added the patch titled
mm: resolve faulty mmap_region() error path behaviour
to the 6.6-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
mm-resolve-faulty-mmap_region-error-path-behaviour.patch
and it can be found in the queue-6.6 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From stable+bounces-93539-greg=kroah.com@vger.kernel.org Fri Nov 15 13:43:45 2024
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 15 Nov 2024 12:41:58 +0000
Subject: mm: resolve faulty mmap_region() error path behaviour
To: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>, "Liam R . Howlett" <Liam.Howlett@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds <torvalds@linux-foundation.org>, Peter Xu <peterx@redhat.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Mark Brown <broonie@kernel.org>, "David S . Miller" <davem@davemloft.net>, Andreas Larsson <andreas@gaisler.com>, "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>, Helge Deller <deller@gmx.de>
Message-ID: <b71c37d3a8b40fe1e07a085101f17b77bf293039.1731672733.git.lorenzo.stoakes@oracle.com>
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
[ Upstream commit 5de195060b2e251a835f622759550e6202167641 ]
The mmap_region() function is somewhat terrifying, with spaghetti-like
control flow and numerous means by which issues can arise and incomplete
state, memory leaks and other unpleasantness can occur.
A large amount of the complexity arises from trying to handle errors late
in the process of mapping a VMA, which forms the basis of recently
observed issues with resource leaks and observable inconsistent state.
Taking advantage of previous patches in this series, we move a number of
checks earlier in the code, simplifying things by moving the core of the
logic into a static internal function __mmap_region().
Doing this allows us to perform a number of checks up front before we do
any real work, to unwind the writable mapping count unconditionally as
required, and to perform CONFIG_DEBUG_VM_MAPLE_TREE validation
unconditionally as well.
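The shape of the new mmap_region() wrapper can be sketched in userspace as follows. Everything here is a hypothetical model: model_mapping reduces struct address_space to its writable count (negative meaning writes are denied, as for a sealed memfd), model_inner_map() stands in for __mmap_region() with an injectable error, the MDWE check is simplified to the new-mapping case, and arch_validate_flags() is omitted. The point is that all fallible checks run first and the temporary writable count is released on both the success and error paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define VM_WRITE  0x00000002UL
#define VM_EXEC   0x00000004UL
#define VM_SHARED 0x00000008UL

/* Hypothetical stand-in for struct address_space: only the writable count. */
struct model_mapping {
	int i_mmap_writable;   /* < 0 means writes are denied (e.g. sealed memfd) */
};

/* Modelled on mapping_map_writable(): increment unless negative. */
static int model_map_writable(struct model_mapping *m)
{
	if (m->i_mmap_writable < 0)
		return -EPERM;
	m->i_mmap_writable++;
	return 0;
}

/* Modelled on mapping_unmap_writable(). */
static void model_unmap_writable(struct model_mapping *m)
{
	m->i_mmap_writable--;
}

/* Hypothetical inner step: returns inner_error if non-zero, else addr. */
static long model_inner_map(long addr, int inner_error)
{
	return inner_error ? inner_error : addr;
}

/*
 * Wrapper in the style of the new mmap_region(). m == NULL models an
 * anonymous mapping (file == NULL). Only the temporary count held across
 * the inner call is modelled; the reference a linked VMA takes is not.
 */
static long model_mmap_region(struct model_mapping *m, unsigned long vm_flags,
                              bool mdwe, long addr, int inner_error)
{
	bool writable_file_mapping = false;
	long ret;

	/* Simplified MDWE check for a brand new mapping. */
	if (mdwe && (vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
		return -EACCES;

	if (m && (vm_flags & VM_SHARED)) {
		int error = model_map_writable(m);

		if (error)
			return error;
		writable_file_mapping = true;
	}

	ret = model_inner_map(addr, inner_error);

	/* Drop the temporary count regardless of success or failure. */
	if (writable_file_mapping)
		model_unmap_writable(m);
	return ret;
}
```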
We move a number of things here:
1. We preallocate memory for the iterator before we call the file-backed
memory hook, allowing us to exit early and avoid having to perform
complicated and error-prone close/free logic. We carefully free
iterator state on both success and error paths.
2. The enclosing mmap_region() function handles the mapping_map_writable()
logic early. Previously the logic had the mapping_map_writable() at the
point of mapping a newly allocated file-backed VMA, and a matching
mapping_unmap_writable() on success and error paths.
We now do this unconditionally if this is a file-backed, shared writable
mapping. If a driver changes the flags to eliminate VM_MAYWRITE, doing so
does not invalidate the seal check we just performed, and in any case we
always decrement the counter in the wrapper.
We perform a debug assert to ensure a driver does not attempt to do the
opposite.
3. We also move arch_validate_flags() up into the mmap_region()
function. This is only relevant on arm64 and sparc64, and the check is
only meaningful for SPARC with ADI enabled. We explicitly add a warning
for this arch if a driver invalidates this check, though the code ought
eventually to be fixed to eliminate the need for this.
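The preallocate-first ordering and the fall-through error labels described in point 1 can be modelled with simple resource counters. This is an illustrative sketch only: model_state, model_map() and the prealloc_fails / hook_error knobs are hypothetical, and the real function also handles merging, accounting and unmap_region(). Note that no error label invokes a close hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical resource counters standing in for kernel objects. */
struct model_state {
	int vmas;          /* allocated VMA structs */
	int iter_prealloc; /* preallocated maple tree nodes */
	int file_refs;     /* references held via get_file() */
};

/*
 * Sketch of the reworked __mmap_region() ordering: preallocate before the
 * driver hook, then unwind in reverse order through fall-through labels.
 */
static int model_map(struct model_state *s, bool is_file,
                     bool prealloc_fails, int hook_error)
{
	int error;

	s->vmas++;                       /* vm_area_alloc() */

	if (prealloc_fails) {            /* vma_iter_prealloc() failed */
		error = -ENOMEM;
		goto free_vma;
	}
	s->iter_prealloc++;

	if (is_file) {
		s->file_refs++;          /* vma->vm_file = get_file(file) */
		if (hook_error) {        /* mmap_file() failed */
			error = hook_error;
			goto unmap_and_free_file_vma;
		}
	} else if (hook_error) {         /* e.g. shmem_zero_setup() failed */
		error = hook_error;
		goto free_iter_vma;
	}

	s->iter_prealloc--;              /* consumed by inserting the VMA */
	return 0;                        /* VMA and file ref now owned by the mm */

unmap_and_free_file_vma:
	s->file_refs--;                  /* fput(vma->vm_file) */
	/* unmap_region() would undo any partial driver mapping here. */
free_iter_vma:
	s->iter_prealloc--;              /* vma_iter_free() */
free_vma:
	s->vmas--;                       /* vm_area_free() */
	return error;
}
```

Because each label releases exactly what was acquired before the corresponding failure point, every error path leaves all counters back at zero.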
With all of these measures in place, we no longer need to explicitly close
the VMA on error paths, as we place all checks which might fail prior to a
call to any driver mmap hook.
This eliminates an entire class of errors, makes the code easier to reason
about and more robust.
Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/mmap.c | 117 +++++++++++++++++++++++++++++++++++---------------------------
1 file changed, 67 insertions(+), 50 deletions(-)
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2666,14 +2666,14 @@ int do_munmap(struct mm_struct *mm, unsi
return do_vmi_munmap(&vmi, mm, start, len, uf, false);
}
-unsigned long mmap_region(struct file *file, unsigned long addr,
+static unsigned long __mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
struct vm_area_struct *next, *prev, *merge;
- pgoff_t pglen = len >> PAGE_SHIFT;
+ pgoff_t pglen = PHYS_PFN(len);
unsigned long charged = 0;
unsigned long end = addr + len;
unsigned long merge_start = addr, merge_end = end;
@@ -2770,25 +2770,26 @@ cannot_expand:
vma->vm_page_prot = vm_get_page_prot(vm_flags);
vma->vm_pgoff = pgoff;
- if (file) {
- if (vm_flags & VM_SHARED) {
- error = mapping_map_writable(file->f_mapping);
- if (error)
- goto free_vma;
- }
+ if (vma_iter_prealloc(&vmi, vma)) {
+ error = -ENOMEM;
+ goto free_vma;
+ }
+ if (file) {
vma->vm_file = get_file(file);
error = mmap_file(file, vma);
if (error)
- goto unmap_and_free_vma;
+ goto unmap_and_free_file_vma;
+ /* Drivers cannot alter the address of the VMA. */
+ WARN_ON_ONCE(addr != vma->vm_start);
/*
- * Expansion is handled above, merging is handled below.
- * Drivers should not alter the address of the VMA.
+ * Drivers should not permit writability when previously it was
+ * disallowed.
*/
- error = -EINVAL;
- if (WARN_ON((addr != vma->vm_start)))
- goto close_and_free_vma;
+ VM_WARN_ON_ONCE(vm_flags != vma->vm_flags &&
+ !(vm_flags & VM_MAYWRITE) &&
+ (vma->vm_flags & VM_MAYWRITE));
vma_iter_config(&vmi, addr, end);
/*
@@ -2800,6 +2801,7 @@ cannot_expand:
vma->vm_end, vma->vm_flags, NULL,
vma->vm_file, vma->vm_pgoff, NULL,
NULL_VM_UFFD_CTX, NULL);
+
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -2813,7 +2815,7 @@ cannot_expand:
vma = merge;
/* Update vm_flags to pick up the change. */
vm_flags = vma->vm_flags;
- goto unmap_writable;
+ goto file_expanded;
}
}
@@ -2821,24 +2823,15 @@ cannot_expand:
} else if (vm_flags & VM_SHARED) {
error = shmem_zero_setup(vma);
if (error)
- goto free_vma;
+ goto free_iter_vma;
} else {
vma_set_anonymous(vma);
}
- if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) {
- error = -EACCES;
- goto close_and_free_vma;
- }
-
- /* Allow architectures to sanity-check the vm_flags */
- error = -EINVAL;
- if (!arch_validate_flags(vma->vm_flags))
- goto close_and_free_vma;
-
- error = -ENOMEM;
- if (vma_iter_prealloc(&vmi, vma))
- goto close_and_free_vma;
+#ifdef CONFIG_SPARC64
+ /* TODO: Fix SPARC ADI! */
+ WARN_ON_ONCE(!arch_validate_flags(vm_flags));
+#endif
/* Lock the VMA since it is modified after insertion into VMA tree */
vma_start_write(vma);
@@ -2861,10 +2854,7 @@ cannot_expand:
*/
khugepaged_enter_vma(vma, vma->vm_flags);
- /* Once vma denies write, undo our temporary denial count */
-unmap_writable:
- if (file && vm_flags & VM_SHARED)
- mapping_unmap_writable(file->f_mapping);
+file_expanded:
file = vma->vm_file;
ksm_add_vma(vma);
expanded:
@@ -2894,33 +2884,60 @@ expanded:
vma_set_page_prot(vma);
- validate_mm(mm);
return addr;
-close_and_free_vma:
- vma_close(vma);
-
- if (file || vma->vm_file) {
-unmap_and_free_vma:
- fput(vma->vm_file);
- vma->vm_file = NULL;
-
- vma_iter_set(&vmi, vma->vm_end);
- /* Undo any partial mapping done by a device driver. */
- unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start,
- vma->vm_end, vma->vm_end, true);
- }
- if (file && (vm_flags & VM_SHARED))
- mapping_unmap_writable(file->f_mapping);
+unmap_and_free_file_vma:
+ fput(vma->vm_file);
+ vma->vm_file = NULL;
+
+ vma_iter_set(&vmi, vma->vm_end);
+ /* Undo any partial mapping done by a device driver. */
+ unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start,
+ vma->vm_end, vma->vm_end, true);
+free_iter_vma:
+ vma_iter_free(&vmi);
free_vma:
vm_area_free(vma);
unacct_error:
if (charged)
vm_unacct_memory(charged);
- validate_mm(mm);
return error;
}
+unsigned long mmap_region(struct file *file, unsigned long addr,
+ unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
+ struct list_head *uf)
+{
+ unsigned long ret;
+ bool writable_file_mapping = false;
+
+ /* Check to see if MDWE is applicable. */
+ if (map_deny_write_exec(vm_flags, vm_flags))
+ return -EACCES;
+
+ /* Allow architectures to sanity-check the vm_flags. */
+ if (!arch_validate_flags(vm_flags))
+ return -EINVAL;
+
+ /* Map writable and ensure this isn't a sealed memfd. */
+ if (file && (vm_flags & VM_SHARED)) {
+ int error = mapping_map_writable(file->f_mapping);
+
+ if (error)
+ return error;
+ writable_file_mapping = true;
+ }
+
+ ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf);
+
+ /* Clear our write mapping regardless of error. */
+ if (writable_file_mapping)
+ mapping_unmap_writable(file->f_mapping);
+
+ validate_mm(current->mm);
+ return ret;
+}
+
static int __vm_munmap(unsigned long start, size_t len, bool unlock)
{
int ret;
* Patch "mm: unconditionally close VMAs on error" has been added to the 6.6-stable tree
2024-11-15 12:41 ` [PATCH 6.6.y 2/5] mm: unconditionally close VMAs on error Lorenzo Stoakes
@ 2024-11-19 14:25 ` gregkh
0 siblings, 0 replies; 16+ messages in thread
From: gregkh @ 2024-11-19 14:25 UTC
To: James.Bottomley, Liam.Howlett, akpm, andreas, broonie,
catalin.marinas, davem, deller, gregkh, jannh, linux-mm,
lorenzo.stoakes, peterx, torvalds, vbabka, will
Cc: stable-commits
This is a note to let you know that I've just added the patch titled
mm: unconditionally close VMAs on error
to the 6.6-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
mm-unconditionally-close-vmas-on-error.patch
and it can be found in the queue-6.6 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From stable+bounces-93536-greg=kroah.com@vger.kernel.org Fri Nov 15 13:42:55 2024
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 15 Nov 2024 12:41:55 +0000
Subject: mm: unconditionally close VMAs on error
To: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>, "Liam R . Howlett" <Liam.Howlett@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds <torvalds@linux-foundation.org>, Peter Xu <peterx@redhat.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Mark Brown <broonie@kernel.org>, "David S . Miller" <davem@davemloft.net>, Andreas Larsson <andreas@gaisler.com>, "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>, Helge Deller <deller@gmx.de>
Message-ID: <cbd9c0b17ccd9898d18f8d6147e0dc6441c63217.1731672733.git.lorenzo.stoakes@oracle.com>
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
[ Upstream commit 4080ef1579b2413435413988d14ac8c68e4d42c8 ]
Incorrect invocation of VMA callbacks when the VMA is no longer in a
consistent state is bug-prone and risky to perform.
With regard to the important vm_ops->close() callback, we have gone to
great lengths to try to track whether or not we ought to close VMAs.
Rather than doing so and risking a mistake somewhere, we instead
unconditionally close the VMA and reset vma->vm_ops to an empty dummy
operations set with a NULL .close operator.
We introduce a new function to do so - vma_close() - and simplify existing
vms logic which tracked whether we needed to close or not.
This simplifies the logic, avoids incorrect double-calling of the .close()
callback and allows us to update error paths to simply call vma_close()
unconditionally - making VMA closure idempotent.
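A userspace model of the vma_close() helper introduced below shows why resetting the operations makes closure idempotent. The model_* types, model_dummy_vm_ops and driver_close() are hypothetical stand-ins for struct vm_area_struct, struct vm_operations_struct, vma_dummy_vm_ops and a driver's close hook.

```c
#include <assert.h>
#include <stddef.h>

struct model_vma;

/* Cut-down stand-in for struct vm_operations_struct. */
struct model_vm_ops {
	void (*close)(struct model_vma *vma);
};

struct model_vma {
	const struct model_vm_ops *vm_ops;
	int close_count;      /* how many times the driver close hook ran */
};

/* Plays the role of vma_dummy_vm_ops: no hooks at all. */
static const struct model_vm_ops model_dummy_vm_ops = { .close = NULL };

/* Hypothetical driver close hook that just counts invocations. */
static void driver_close(struct model_vma *vma)
{
	vma->close_count++;
}

static const struct model_vm_ops driver_vm_ops = { .close = driver_close };

/* Same shape as vma_close(): close once, then install empty ops. */
static void model_vma_close(struct model_vma *vma)
{
	if (vma->vm_ops && vma->vm_ops->close) {
		vma->vm_ops->close(vma);
		/* No further hooks may run on this VMA. */
		vma->vm_ops = &model_dummy_vm_ops;
	}
}
```

A second call finds a NULL .close and does nothing, so error paths can call the helper without tracking whether the VMA was already closed.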
Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/internal.h | 18 ++++++++++++++++++
mm/mmap.c | 9 +++------
mm/nommu.c | 3 +--
3 files changed, 22 insertions(+), 8 deletions(-)
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -110,6 +110,24 @@ static inline int mmap_file(struct file
return err;
}
+/*
+ * If the VMA has a close hook then close it, and since closing it might leave
+ * it in an inconsistent state which makes the use of any hooks suspect, clear
+ * them down by installing dummy empty hooks.
+ */
+static inline void vma_close(struct vm_area_struct *vma)
+{
+ if (vma->vm_ops && vma->vm_ops->close) {
+ vma->vm_ops->close(vma);
+
+ /*
+ * The mapping is in an inconsistent state, and no further hooks
+ * may be invoked upon it.
+ */
+ vma->vm_ops = &vma_dummy_vm_ops;
+ }
+}
+
void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio,
int nr_throttled);
static inline void acct_reclaim_writeback(struct folio *folio)
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -137,8 +137,7 @@ void unlink_file_vma(struct vm_area_stru
static void remove_vma(struct vm_area_struct *vma, bool unreachable)
{
might_sleep();
- if (vma->vm_ops && vma->vm_ops->close)
- vma->vm_ops->close(vma);
+ vma_close(vma);
if (vma->vm_file)
fput(vma->vm_file);
mpol_put(vma_policy(vma));
@@ -2899,8 +2898,7 @@ expanded:
return addr;
close_and_free_vma:
- if (file && vma->vm_ops && vma->vm_ops->close)
- vma->vm_ops->close(vma);
+ vma_close(vma);
if (file || vma->vm_file) {
unmap_and_free_vma:
@@ -3392,8 +3390,7 @@ struct vm_area_struct *copy_vma(struct v
return new_vma;
out_vma_link:
- if (new_vma->vm_ops && new_vma->vm_ops->close)
- new_vma->vm_ops->close(new_vma);
+ vma_close(new_vma);
if (new_vma->vm_file)
fput(new_vma->vm_file);
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -600,8 +600,7 @@ static int delete_vma_from_mm(struct vm_
*/
static void delete_vma(struct mm_struct *mm, struct vm_area_struct *vma)
{
- if (vma->vm_ops && vma->vm_ops->close)
- vma->vm_ops->close(vma);
+ vma_close(vma);
if (vma->vm_file)
fput(vma->vm_file);
put_nommu_region(vma->vm_region);
Thread overview: 16+ messages
2024-11-15 12:41 [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Lorenzo Stoakes
2024-11-15 12:41 ` [PATCH 6.6.y 1/5] mm: avoid unsafe VMA hook invocation when error arises on mmap hook Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: avoid unsafe VMA hook invocation when error arises on mmap hook" has been added to the 6.6-stable tree gregkh
2024-11-15 12:41 ` [PATCH 6.6.y 2/5] mm: unconditionally close VMAs on error Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: unconditionally close VMAs on error" has been added to the 6.6-stable tree gregkh
2024-11-15 12:41 ` [PATCH 6.6.y 3/5] mm: refactor map_deny_write_exec() Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: refactor map_deny_write_exec()" has been added to the 6.6-stable tree gregkh
2024-11-15 12:41 ` [PATCH 6.6.y 4/5] mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling" has been added to the 6.6-stable tree gregkh
2024-11-15 12:41 ` [PATCH 6.6.y 5/5] mm: resolve faulty mmap_region() error path behaviour Lorenzo Stoakes
2024-11-19 14:25 ` Patch "mm: resolve faulty mmap_region() error path behaviour" has been added to the 6.6-stable tree gregkh
2024-11-15 17:17 ` [PATCH 6.6.y 0/5] fix error handling in mmap_region() and refactor (hotfixes) Vlastimil Babka
2024-11-15 20:09 ` Liam R. Howlett
2024-11-19 13:16 ` Greg KH
2024-11-19 13:24 ` Lorenzo Stoakes
2024-11-19 14:14 ` Greg KH