From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: "Ricardo Cañuelo Navarro" <rcn@igalia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>,
	revest@google.com, kernel-dev@igalia.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH v2] mm: fix copy_vma() error handling for hugetlb mappings
Date: Fri, 23 May 2025 13:45:47 -0400
Message-ID: <fs2zd6syrdpyrsqkfmzjjz3y3rricscyr3bnx5trnr6p5nbhmy@l6zyiaccoknt>
In-Reply-To: <20250523-warning_in_page_counter_cancel-v2-1-b6df1a8cfefd@igalia.com>

* Ricardo Cañuelo Navarro <rcn@igalia.com> [250523 08:19]:
> If, during a mremap() operation for a hugetlb-backed memory mapping,
> copy_vma() fails after the source vma has been duplicated and
> opened (ie. vma_link() fails), the error is handled by closing the new
> vma. This updates the hugetlbfs reservation counter of the reservation
> map which at this point is referenced by both the source vma and the new
> copy. As a result, once the new vma has been freed and copy_vma()
> returns, the reservation counter for the source vma will be incorrect.
> 
> This patch addresses this corner case by clearing the hugetlb private
> page reservation reference for the new vma and decrementing the
> reference before closing the vma, so that vma_close() won't update the
> reservation counter. This is also what copy_vma_and_data() does with the
> source vma if copy_vma() succeeds, so a helper function has been added
> to do the fixup in both functions.
> 
> The issue was reported by a private syzbot instance and can be
> reproduced using the C reproducer in [1]. It's also a possible duplicate
> of public syzbot report [2]. The WARNING report is:
> 
> ============================================================
> page_counter underflow: -1024 nr_pages=1024
> WARNING: CPU: 0 PID: 3287 at mm/page_counter.c:61 page_counter_cancel+0xf6/0x120
> Modules linked in:
> CPU: 0 UID: 0 PID: 3287 Comm: repro__WARNING_ Not tainted 6.15.0-rc7+ #54 NONE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
> RIP: 0010:page_counter_cancel+0xf6/0x120
> Code: ff 5b 41 5e 41 5f 5d c3 cc cc cc cc e8 f3 4f 8f ff c6 05 64 01 27 06 01 48 c7 c7 60 15 f8 85 48 89 de 4c 89 fa e8 2a a7 51 ff <0f> 0b e9 66 ff ff ff 44 89 f9 80 e1 07 38 c1 7c 9d 4c 81
> RSP: 0018:ffffc900025df6a0 EFLAGS: 00010246
> RAX: 2edfc409ebb44e00 RBX: fffffffffffffc00 RCX: ffff8880155f0000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> RBP: dffffc0000000000 R08: ffffffff81c4a23c R09: 1ffff1100330482a
> R10: dffffc0000000000 R11: ffffed100330482b R12: 0000000000000000
> R13: ffff888058a882c0 R14: ffff888058a882c0 R15: 0000000000000400
> FS:  0000000000000000(0000) GS:ffff88808fc53000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000004b33e0 CR3: 00000000076d6000 CR4: 00000000000006f0
> Call Trace:
>  <TASK>
>  page_counter_uncharge+0x33/0x80
>  hugetlb_cgroup_uncharge_counter+0xcb/0x120
>  hugetlb_vm_op_close+0x579/0x960
>  ? __pfx_hugetlb_vm_op_close+0x10/0x10
>  remove_vma+0x88/0x130
>  exit_mmap+0x71e/0xe00
>  ? __pfx_exit_mmap+0x10/0x10
>  ? __mutex_unlock_slowpath+0x22e/0x7f0
>  ? __pfx_exit_aio+0x10/0x10
>  ? __up_read+0x256/0x690
>  ? uprobe_clear_state+0x274/0x290
>  ? mm_update_next_owner+0xa9/0x810
>  __mmput+0xc9/0x370
>  exit_mm+0x203/0x2f0
>  ? __pfx_exit_mm+0x10/0x10
>  ? taskstats_exit+0x32b/0xa60
>  do_exit+0x921/0x2740
>  ? do_raw_spin_lock+0x155/0x3b0
>  ? __pfx_do_exit+0x10/0x10
>  ? __pfx_do_raw_spin_lock+0x10/0x10
>  ? _raw_spin_lock_irq+0xc5/0x100
>  do_group_exit+0x20c/0x2c0
>  get_signal+0x168c/0x1720
>  ? __pfx_get_signal+0x10/0x10
>  ? schedule+0x165/0x360
>  arch_do_signal_or_restart+0x8e/0x7d0
>  ? __pfx_arch_do_signal_or_restart+0x10/0x10
>  ? __pfx___se_sys_futex+0x10/0x10
>  syscall_exit_to_user_mode+0xb8/0x2c0
>  do_syscall_64+0x75/0x120
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x422dcd
> Code: Unable to access opcode bytes at 0x422da3.
> RSP: 002b:00007ff266cdb208 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: 0000000000000001 RBX: 00007ff266cdbcdc RCX: 0000000000422dcd
> RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004c7bec
> RBP: 00007ff266cdb220 R08: 203a6362696c6720 R09: 203a6362696c6720
> R10: 0000200000c00000 R11: 0000000000000246 R12: ffffffffffffffd0
> R13: 0000000000000002 R14: 00007ffe1cb5f520 R15: 00007ff266cbb000
>  </TASK>
> ============================================================
> 
> Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Link: https://people.igalia.com/rcn/kernel_logs/20250422__WARNING_in_page_counter_cancel__repro.c [1]
> Link: https://lore.kernel.org/all/67000a50.050a0220.49194.048d.GAE@google.com/ [2]

I don't like the fixup_ names, but not enough to hold this up (or look
at it again..). I also don't love the idea of moving a hugetlb..

This isn't the only call path for vma_link() that doesn't unwind the
hugetlb case correctly, but it is probably the only one that may need
to. I would hope special mappings or __bprm_mm_init() wouldn't result
in hugetlb mappings.

This seems sufficient until syzbot figures out a way into those insane
paths.

One small nit below, but thanks for this.

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>


> ---
> Changes in v2 (suggested by Lorenzo):
> - Move the fix to a separate function and use it on copy_vma() and
>   copy_vma_and_data().
> - Paste the WARNING output in the commit message and remove the link.
> - Update the comment in clear_vma_resv_huge_pages().
> - Link to v1: https://lore.kernel.org/r/20250523-warning_in_page_counter_cancel-v1-1-b221eb61a402@igalia.com
> ---
>  include/linux/hugetlb.h |  5 +++++
>  mm/hugetlb.c            | 16 +++++++++++++++-
>  mm/mremap.c             |  3 +--
>  mm/vma.c                |  1 +
>  4 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8f3ac832ee7f3894e1445fe7cfa657f3844a62b2..4861a7e304bbf4bd588e41754af63b23af037a60 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -275,6 +275,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
>  bool is_hugetlb_entry_migration(pte_t pte);
>  bool is_hugetlb_entry_hwpoisoned(pte_t pte);
>  void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
> +void fixup_hugetlb_reservations(struct vm_area_struct *vma);
>  
>  #else /* !CONFIG_HUGETLB_PAGE */
>  
> @@ -468,6 +469,10 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
>  
>  static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { }
>  
> +static inline void fixup_hugetlb_reservations(struct vm_area_struct *vma)
> +{
> +}
> +
>  #endif /* !CONFIG_HUGETLB_PAGE */
>  
>  #ifndef pgd_write
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7ae38bfb9096c4b610bc2e723c6c08d826c68830..757c5187d65ea3f158cd0c3dc1106ce3890c6dff 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1250,7 +1250,7 @@ void hugetlb_dup_vma_private(struct vm_area_struct *vma)
>  /*
>   * Reset and decrement one ref on hugepage private reservation.
>   * Called with mm->mmap_lock writer semaphore held.
> - * This function should be only used by move_vma() and operate on
> + * This function should be only used by mremap and operate on

nit: mremap(), do you need the ()? (is it literally this function or the
call path?)

>   * same sized vma. It should never come here with last ref on the
>   * reservation.
>   */
> @@ -7931,3 +7931,17 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
>  	hugetlb_unshare_pmds(vma, ALIGN(vma->vm_start, PUD_SIZE),
>  			ALIGN_DOWN(vma->vm_end, PUD_SIZE));
>  }
> +
> +/*
> + * For hugetlb, mremap() is an odd edge case - while the VMA copying is
> + * performed, we permit both the old and new VMAs to reference the same
> + * reservation.
> + *
> + * We fix this up after the operation succeeds, or if a newly allocated VMA
> + * is closed as a result of a failure to allocate memory.
> + */
> +void fixup_hugetlb_reservations(struct vm_area_struct *vma)
> +{
> +	if (is_vm_hugetlb_page(vma))
> +		clear_vma_resv_huge_pages(vma);
> +}
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 7db9da609c84f0a0efe7ee86f7b42b8e0eee6380..0d4948b720e22e6c77123707d5199401aaf5d661 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -1188,8 +1188,7 @@ static int copy_vma_and_data(struct vma_remap_struct *vrm,
>  		mremap_userfaultfd_prep(new_vma, vrm->uf);
>  	}
>  
> -	if (is_vm_hugetlb_page(vma))
> -		clear_vma_resv_huge_pages(vma);
> +	fixup_hugetlb_reservations(vma);
>  
>  	/* Tell pfnmap has moved from this vma */
>  	if (unlikely(vma->vm_flags & VM_PFNMAP))
> diff --git a/mm/vma.c b/mm/vma.c
> index 839d12f02c885d3338d8d233583eb302d82bb80b..a468d4c29c0cd4141657a3b321f5da2871708bdf 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -1834,6 +1834,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>  	return new_vma;
>  
>  out_vma_link:
> +	fixup_hugetlb_reservations(new_vma);
>  	vma_close(new_vma);
>  
>  	if (new_vma->vm_file)
> 
> ---
> base-commit: 94305e83eccb3120c921cd3a015cd74731140bac
> change-id: 20250523-warning_in_page_counter_cancel-e8c71a6b4c88
> 
> 

