Subject: Re: hugetlbfs: WARNING: bad unlock balance detected during MADV_REMOVE
From: Muchun Song
Date: Fri, 26 Jan 2024 15:50:23 +0800
To: Thorvald Natvig, Miaohe Lin
Cc: Linux-MM
Message-Id: <42788ABD-99AE-4AEF-B543-C0FABAFA0464@linux.dev>

> On Jan 26, 2024, at 04:28, Thorvald Natvig wrote:
>
> We've found what appears to be a lock issue that results in a
> blocked process somewhere in hugetlbfs for shared maps; seemingly from an
> interaction between hugetlb_vm_op_open and hugetlb_vmdelete_list.
>
> Based on some added pr_warn, we believe the following is happening:
> When hugetlb_vmdelete_list is entered from the child process,
> vma->vm_private_data is NULL, and hence hugetlb_vma_trylock_write does
> not lock, since neither __vma_shareable_lock nor __vma_private_lock
> are true.
>
> While hugetlb_vmdelete_list is executing, the parent process does
> fork(), which ends up in hugetlb_vm_op_open, which in turn allocates a
> lock for the same vma.
>
> Thus, when the hugetlb_vmdelete_list in the child reaches the end of
> the function, vma->vm_private_data is now populated, and hence
> hugetlb_vma_unlock_write tries to unlock the vma_lock, which it does
> not hold.

Thanks for your report. ->vm_private_data was introduced by the
series [1], so I suspect the problem was introduced there as well. I
did not review that series at the time (it is a little complex in the
PMD-sharing case), but I saw that Miaohe reviewed many of those
patches. CC Miaohe; maybe he has some ideas on this.

[1] https://lore.kernel.org/all/20220914221810.95771-7-mike.kravetz@oracle.com/T/#m2141e4bc30401a8ce490b1965b9bad74e7f791ff

Thanks.

>
> dmesg:
> WARNING: bad unlock balance detected!
> 6.8.0-rc1+ #24 Not tainted
> -------------------------------------
> lock/2613 is trying to release lock (&vma_lock->rw_sema) at:
> [] hugetlb_vma_unlock_write+0x48/0x60
> but there are no more locks to release!
>
>
> 3 locks held by lock/2613:
> #0: ffff9b4bc6225450 (sb_writers#16){.+.+}-{0:0}, at: madvise_vma_behavior+0x4cc/0xcf0
> #1: ffff9ba4dc34eca0 (&sb->s_type->i_mutex_key#23){+.+.}-{3:3}, at: hugetlbfs_fallocate+0x3fe/0x620
> #2: ffff9ba4dc34ef38 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{3:3}, at: hugetlbfs_fallocate+0x438/0x620
>
>
> CPU: 17 PID: 2613 Comm: lock Not tainted 6.8.0-rc1+ #24
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/02/2023
> Call Trace:
>
> dump_stack_lvl+0x77/0xe0
> ? hugetlb_vma_unlock_write+0x48/0x60
> dump_stack+0x10/0x20
> print_unlock_imbalance_bug+0x127/0x150
> lock_release+0x21a/0x3f0
> ? hugetlb_vma_unlock_write+0x48/0x60
> up_write+0x1c/0x1d0
> hugetlb_vma_unlock_write+0x48/0x60
> hugetlb_vmdelete_list+0x93/0xd0
> hugetlbfs_fallocate+0x4e1/0x620
> vfs_fallocate+0x153/0x4b0
> madvise_vma_behavior+0x4cc/0xcf0
> ? mas_prev+0x68/0x70
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? find_vma_prev+0x78/0xc0
> ? __pfx_madvise_vma_behavior+0x10/0x10
> madvise_walk_vmas+0xc4/0x140
> do_madvise+0x3df/0x450
> __x64_sys_madvise+0x2c/0x40
> do_syscall_64+0x8e/0x160
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? do_syscall_64+0x9b/0x160
> ? do_syscall_64+0x9b/0x160
> ? do_syscall_64+0x9b/0x160
> entry_SYSCALL_64_after_hwframe+0x6e/0x76
> RIP: 0033:0x7f55e0b23bbb
>
> Repro:
>
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> #define PSIZE (2048UL * 1024UL)
>
> int main(int argc, char **argv) {
>   char *buffer = mmap(NULL, PSIZE, PROT_READ | PROT_WRITE,
>                       MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB, -1, 0);
>   if (buffer == MAP_FAILED) {
>     perror("mmap");
>     exit(1);
>   }
>
>   pid_t remover = fork();
>
>   if (remover == 0) {
>     while (1) {
>       if (madvise(buffer, PSIZE, MADV_REMOVE) == -1) {
>         perror("madvise");
>         exit(1);
>       }
>     }
>   }
>
>   int wstatus;
>
>   for (int l = 0; l < 10000; ++l) {
>     pid_t childpid = fork();
>     if (childpid == 0) {
>       exit(0);
>     } else {
>       waitpid(childpid, &wstatus, 0);
>     }
>   }
>
>   kill(remover, SIGKILL);
>   waitpid(remover, &wstatus, 0);
>   printf("Clean exit\n");
> }
>
> - Thorvald