From: Thorvald Natvig <thorvald@google.com>
To: Muchun Song <muchun.song@linux.dev>
Cc: linux-mm@kvack.org
Subject: hugetlbfs: WARNING: bad unlock balance detected during MADV_REMOVE
Date: Thu, 25 Jan 2024 12:28:40 -0800
Message-ID: <CADOhuP78NVJrE1KpEd5y-xTaUW3su=Bfhzs0xx8zK9BO+POghQ@mail.gmail.com>

We've found what appears to be a locking issue in hugetlbfs that
results in a blocked process when shared maps are in use. It seems to
come from an interaction between hugetlb_vm_op_open and
hugetlb_vmdelete_list.

Based on some added pr_warn calls, we believe the following is
happening:

When hugetlb_vmdelete_list is entered from the child process,
vma->vm_private_data is NULL, so hugetlb_vma_trylock_write does not
take any lock, since neither __vma_shareable_lock nor
__vma_private_lock returns true.

While hugetlb_vmdelete_list is executing, the parent process calls
fork(), which ends up in hugetlb_vm_op_open, which in turn allocates a
lock for the same vma.

Thus, when hugetlb_vmdelete_list in the child reaches the end of the
function, vma->vm_private_data is populated, so
hugetlb_vma_unlock_write tries to unlock the vma_lock, which it does
not hold.
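
The sequence above can be modelled in userspace. What follows is only a
sketch of the suspected interleaving, replayed in a single thread. It is
not kernel code: struct model_vma, struct model_lock and the model_*
functions are hypothetical names invented for illustration, and a plain
counter stands in for the rw_semaphore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VMA lock; held counts write holders. */
struct model_lock {
	int held;
};

/* Hypothetical stand-in for a VMA; lock plays the role of vm_private_data. */
struct model_vma {
	struct model_lock *lock;
};

/* Mirrors the described trylock: succeeds without locking if no lock exists. */
bool model_trylock_write(struct model_vma *vma)
{
	if (vma->lock == NULL)
		return true;	/* nothing to lock, and nothing is remembered */
	if (vma->lock->held != 0)
		return false;
	vma->lock->held = 1;
	return true;
}

/* Mirrors the described unlock: re-reads vma->lock to decide what to release. */
void model_unlock_write(struct model_vma *vma)
{
	if (vma->lock != NULL)
		vma->lock->held--;	/* drops below zero if never locked */
}

/* Replays the suspected interleaving and returns the final hold count. */
int model_race(void)
{
	struct model_lock lock = { .held = 0 };
	struct model_vma vma = { .lock = NULL };
	bool ok;

	assert(vma.lock == NULL);		/* remover enters with no lock attached */
	ok = model_trylock_write(&vma);		/* remover: takes nothing */
	vma.lock = &lock;			/* parent fork(): lock is allocated */
	if (ok)
		model_unlock_write(&vma);	/* remover: releases a lock it never took */
	return lock.held;
}
```

In this model, model_race() returns -1: one release with no matching
acquire, which is the same imbalance lockdep complains about below.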
dmesg:
WARNING: bad unlock balance detected!
6.8.0-rc1+ #24 Not tainted
-------------------------------------
lock/2613 is trying to release lock (&vma_lock->rw_sema) at:
[<ffffffffa94c6128>] hugetlb_vma_unlock_write+0x48/0x60
but there are no more locks to release!
3 locks held by lock/2613:
#0: ffff9b4bc6225450 (sb_writers#16){.+.+}-{0:0}, at: madvise_vma_behavior+0x4cc/0xcf0
#1: ffff9ba4dc34eca0 (&sb->s_type->i_mutex_key#23){+.+.}-{3:3}, at: hugetlbfs_fallocate+0x3fe/0x620
#2: ffff9ba4dc34ef38 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{3:3}, at: hugetlbfs_fallocate+0x438/0x620
CPU: 17 PID: 2613 Comm: lock Not tainted 6.8.0-rc1+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/02/2023
Call Trace:
<TASK>
dump_stack_lvl+0x77/0xe0
? hugetlb_vma_unlock_write+0x48/0x60
dump_stack+0x10/0x20
print_unlock_imbalance_bug+0x127/0x150
lock_release+0x21a/0x3f0
? hugetlb_vma_unlock_write+0x48/0x60
up_write+0x1c/0x1d0
hugetlb_vma_unlock_write+0x48/0x60
hugetlb_vmdelete_list+0x93/0xd0
hugetlbfs_fallocate+0x4e1/0x620
vfs_fallocate+0x153/0x4b0
madvise_vma_behavior+0x4cc/0xcf0
? mas_prev+0x68/0x70
? srso_alias_return_thunk+0x5/0xfbef5
? find_vma_prev+0x78/0xc0
? __pfx_madvise_vma_behavior+0x10/0x10
madvise_walk_vmas+0xc4/0x140
do_madvise+0x3df/0x450
__x64_sys_madvise+0x2c/0x40
do_syscall_64+0x8e/0x160
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x9b/0x160
? do_syscall_64+0x9b/0x160
? do_syscall_64+0x9b/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
RIP: 0033:0x7f55e0b23bbb
Repro:

#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define PSIZE (2048UL * 1024UL)

int main(int argc, char **argv) {
  /* One shared, anonymous huge page mapping, inherited across fork(). */
  char *buffer = mmap(NULL, PSIZE, PROT_READ | PROT_WRITE,
                      MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB, -1, 0);
  if (buffer == MAP_FAILED) {
    perror("mmap");
    exit(1);
  }

  /* Child: call MADV_REMOVE on the mapping in a tight loop. */
  pid_t remover = fork();
  if (remover == 0) {
    while (1) {
      if (madvise(buffer, PSIZE, MADV_REMOVE) == -1) {
        perror("madvise");
        exit(1);
      }
    }
  }

  /* Parent: fork repeatedly while the remover is running. */
  int wstatus;
  for (int l = 0; l < 10000; ++l) {
    pid_t childpid = fork();
    if (childpid == 0) {
      exit(0);
    } else {
      waitpid(childpid, &wstatus, 0);
    }
  }

  kill(remover, SIGKILL);
  waitpid(remover, &wstatus, 0);
  printf("Clean exit\n");
}

- Thorvald