linux-mm.kvack.org archive mirror
From: Muchun Song <muchun.song@linux.dev>
To: Thorvald Natvig <thorvald@google.com>, Miaohe Lin <linmiaohe@huawei.com>
Cc: Linux-MM <linux-mm@kvack.org>
Subject: Re: hugetlbfs: WARNING: bad unlock balance detected during MADV_REMOVE
Date: Fri, 26 Jan 2024 15:50:23 +0800
Message-ID: <42788ABD-99AE-4AEF-B543-C0FABAFA0464@linux.dev>
In-Reply-To: <CADOhuP78NVJrE1KpEd5y-xTaUW3su=Bfhzs0xx8zK9BO+POghQ@mail.gmail.com>



> On Jan 26, 2024, at 04:28, Thorvald Natvig <thorvald@google.com> wrote:
> 
> We've found what appears to be a lock issue that results in a blocked
> process somewhere in hugetlbfs for shared maps; seemingly from an
> interaction between hugetlb_vm_op_open and hugetlb_vmdelete_list.
> 
> Based on some added pr_warn, we believe the following is happening:
> When hugetlb_vmdelete_list is entered from the child process,
> vma->vm_private_data is NULL, and hence hugetlb_vma_trylock_write does
> not lock, since neither __vma_shareable_lock nor __vma_private_lock
> are true.
> 
> While hugetlb_vmdelete_list is executing, the parent process does
> fork(), which ends up in hugetlb_vm_op_open, which in turn allocates a
> lock for the same vma.
> 
> Thus, when the hugetlb_vmdelete_list in the child reaches the end of
> the function, vma->vm_private_data is now populated, and hence
> hugetlb_vma_unlock_write tries to unlock the vma_lock, which it does
> not hold.

Thanks for your report. The hugetlb vma_lock kept in ->vm_private_data
was introduced by the series [1], so I suspect the issue comes from
there. I didn't review that series at the time (the pmd sharing case is
actually a little complex). I see Miaohe reviewed many of those patches.

CC Miaohe, maybe he has some ideas on this.

[1] https://lore.kernel.org/all/20220914221810.95771-7-mike.kravetz@oracle.com/T/#m2141e4bc30401a8ce490b1965b9bad74e7f791ff
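
For reference, the asymmetry described above boils down to roughly the
following. This is only a simplified sketch, not the exact mm/hugetlb.c
code (the real helpers use separate branches for the shareable and
private cases), but it shows why re-evaluating ->vm_private_data on
unlock is the problem:

/*
 * Simplified sketch (not the exact mm/hugetlb.c code). Both helpers
 * re-evaluate vma->vm_private_data, so its state can change between
 * the trylock and the matching unlock.
 */
int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
{
	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

	if (__vma_shareable_lock(vma) || __vma_private_lock(vma))
		return down_write_trylock(&vma_lock->rw_sema);

	/* vm_private_data is NULL: report success without taking anything. */
	return 1;
}

void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
{
	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

	if (__vma_shareable_lock(vma) || __vma_private_lock(vma))
		/*
		 * If vm_private_data became non-NULL in the meantime
		 * (e.g. a concurrent hugetlb_vm_op_open() allocated the
		 * lock), this releases an rw_sema we never took, which
		 * is exactly the "bad unlock balance" lockdep reports.
		 */
		up_write(&vma_lock->rw_sema);
}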

Thanks.

> 
> dmesg:
> WARNING: bad unlock balance detected!
> 6.8.0-rc1+ #24 Not tainted
> -------------------------------------
> lock/2613 is trying to release lock (&vma_lock->rw_sema) at:
> [<ffffffffa94c6128>] hugetlb_vma_unlock_write+0x48/0x60
> but there are no more locks to release!
> 
> 
> 3 locks held by lock/2613:
> #0: ffff9b4bc6225450 (sb_writers#16){.+.+}-{0:0}, at: madvise_vma_behavior+0x4cc/0xcf0
> #1: ffff9ba4dc34eca0 (&sb->s_type->i_mutex_key#23){+.+.}-{3:3}, at: hugetlbfs_fallocate+0x3fe/0x620
> #2: ffff9ba4dc34ef38 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{3:3}, at: hugetlbfs_fallocate+0x438/0x620
> 
> 
> CPU: 17 PID: 2613 Comm: lock Not tainted 6.8.0-rc1+ #24
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 12/02/2023
> Call Trace:
> <TASK>
> dump_stack_lvl+0x77/0xe0
> ? hugetlb_vma_unlock_write+0x48/0x60
> dump_stack+0x10/0x20
> print_unlock_imbalance_bug+0x127/0x150
> lock_release+0x21a/0x3f0
> ? hugetlb_vma_unlock_write+0x48/0x60
> up_write+0x1c/0x1d0
> hugetlb_vma_unlock_write+0x48/0x60
> hugetlb_vmdelete_list+0x93/0xd0
> hugetlbfs_fallocate+0x4e1/0x620
> vfs_fallocate+0x153/0x4b0
> madvise_vma_behavior+0x4cc/0xcf0
> ? mas_prev+0x68/0x70
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? find_vma_prev+0x78/0xc0
> ? __pfx_madvise_vma_behavior+0x10/0x10
> madvise_walk_vmas+0xc4/0x140
> do_madvise+0x3df/0x450
> __x64_sys_madvise+0x2c/0x40
> do_syscall_64+0x8e/0x160
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? do_syscall_64+0x9b/0x160
> ? do_syscall_64+0x9b/0x160
> ? do_syscall_64+0x9b/0x160
> entry_SYSCALL_64_after_hwframe+0x6e/0x76
> RIP: 0033:0x7f55e0b23bbb
> 
> Repro:
> 
> #include <signal.h>
> #include <stddef.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
> #include <unistd.h>
> 
> #define PSIZE (2048UL * 1024UL)
> 
> int main(int argc, char **argv) {
>   char *buffer = mmap(NULL, PSIZE, PROT_READ | PROT_WRITE,
>                       MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB, -1, 0);
>   if (buffer == MAP_FAILED) {
>     perror("mmap");
>     exit(1);
>   }
> 
>   /* Child: hammer the shared mapping with MADV_REMOVE, which goes
>    * through hugetlbfs_fallocate() -> hugetlb_vmdelete_list(). */
>   pid_t remover = fork();
> 
>   if (remover == 0) {
>     while (1) {
>       if (madvise(buffer, PSIZE, MADV_REMOVE) == -1) {
>         perror("madvise");
>         exit(1);
>       }
>     }
>   }
> 
>   /* Parent: fork repeatedly so hugetlb_vm_op_open() on the same vma
>    * races with the child's hugetlb_vmdelete_list(). */
>   int wstatus;
> 
>   for (int l = 0; l < 10000; ++l) {
>     pid_t childpid = fork();
>     if (childpid == 0) {
>       exit(0);
>     } else {
>       waitpid(childpid, &wstatus, 0);
>     }
>   }
> 
>   kill(remover, SIGKILL);
>   waitpid(remover, &wstatus, 0);
>   printf("Clean exit\n");
>   return 0;
> }
> 
> - Thorvald



Thread overview: 12+ messages
2024-01-25 20:28 Thorvald Natvig
2024-01-26  7:50 ` Muchun Song [this message]
2024-01-27 10:13   ` Miaohe Lin
2024-01-29 12:56     ` Miaohe Lin
2024-01-29 16:17       ` Liam R. Howlett
2024-01-30  2:14         ` Miaohe Lin
2024-01-30  4:08           ` Liam R. Howlett
2024-01-31  6:51             ` Miaohe Lin
2024-02-02 21:02               ` Jane Chu
2024-02-04  1:54                 ` Miaohe Lin
2024-03-29 15:54                   ` Thorvald Natvig
2024-04-02 11:24                     ` Miaohe Lin
