From: Thomas Lamprecht
To: Linus Torvalds, Fiona Ebner, "Eric W. Biederman", Oleg Nesterov
Cc: akpm@linux-foundation.org, Wolfgang Bumiller, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: segfaults of processes while being killed after commit "mm: make the page fault mmap locking killable"
Date: Wed, 26 Jul 2023 08:51:24 +0200
Message-ID: <85876d36-ca1f-4ba4-9065-4e7fc58329c0@proxmox.com>
References: <8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com>

On 25/07/2023 18:38, Linus Torvalds wrote:
> But before we revert it, would you mind trying out the attached
> trivial patch instead?

Not Fiona, but as I was still online yesterday I already got around to
trying that patch out, after adding the missing `tsk` task_struct param
to the fatal_signal_pending call.
With the patched kernel booted, the original case we found in the wild
went from logging a segfault roughly twice per hour to none at all, and
that over a bit more than 10h of uptime. Fiona might have a more
definitive confirmation, as IIRC she has a better (= faster) reproducer
that she used for bisecting.

>
> I'd also still be interested if the symptoms were anything else than
> 'show_unhandled_signals' causing the show_signal_msg() dance, and
> resulting in a message something like
>
>   a.out[1567]: segfault at xyz ip [..] likely on CPU X
>
> in dmesg...

Exactly, it was just like that, with no actual fallout. The messages
were like:

> pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)

And the slightly odd code triggering this was basically a fork, where
the child wrote a message to the parent via a unix socket pair and then
called exit. The parent read that message and then sent a SIGKILL to the
child process, i.e., the child exiting and the parent killing the child
process would be pretty closely aligned, basically racing with each
other.

cheers,
Thomas
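
For illustration only, the fork / socket pair / SIGKILL pattern described
above could be sketched roughly as follows. This is Python rather than
the Perl that pverados runs, the function name and the message contents
are made up, and it is not the actual reproducer; whether it triggers
the dmesg line on an affected kernel is untested.

```python
import os
import signal
import socket


def race_exit_against_kill() -> str:
    """Fork once and race the child's exit against a SIGKILL from the parent.

    Hypothetical helper for illustration; not taken from pverados.
    """
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    pid = os.fork()
    if pid == 0:
        # Child: send a message to the parent, then exit right away.
        parent_sock.close()
        child_sock.sendall(b"done\n")
        os._exit(0)

    # Parent: read the child's message, then immediately SIGKILL the child,
    # which is probably already in the middle of exiting.
    child_sock.close()
    parent_sock.recv(16)
    os.kill(pid, signal.SIGKILL)  # pid stays valid until it is reaped below
    parent_sock.close()

    # Reap the child and report which side won the race.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"exited with status {os.WEXITSTATUS(status)}"


if __name__ == "__main__":
    print(race_exit_against_kill())
```

Either outcome is legitimate here: the child may finish exiting before
the signal lands, or be killed while still tearing down. Running the
function in a loop and watching dmesg would be the obvious way to look
for the message quoted above.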