Date: Mon, 5 Aug 2024 13:34:46 -0700
From: Kees Cook
To: Brian Mak
Cc: "Eric W. Biederman", Oleg Nesterov, Linus Torvalds, Alexander Viro,
 Christian Brauner, Jan Kara, "linux-fsdevel@vger.kernel.org",
 "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
Subject: Re: [RFC PATCH] binfmt_elf: Dump smaller VMAs first in ELF cores
Message-ID: <202408051330.277A639@keescook>
References: <877cd1ymy0.fsf@email.froward.int.ebiederm.org>
 <4B7D9FBE-2657-45DB-9702-F3E056CE6CFD@juniper.net>
 <202408051018.F7BA4C0A6@keescook>
 <230E81B0-A0BD-44B5-B354-3902DB50D3D0@juniper.net>
In-Reply-To: <230E81B0-A0BD-44B5-B354-3902DB50D3D0@juniper.net>

On Mon, Aug 05, 2024 at 06:44:44PM +0000, Brian Mak wrote:
> On Aug 5, 2024, at 10:25 AM, Kees Cook wrote:
>
> > On Thu, Aug 01, 2024 at 05:58:06PM +0000, Brian Mak wrote:
> >> On Jul 31, 2024, at 7:52 PM, Eric W. Biederman wrote:
> >>> One practical concern with this approach is that I think the ELF
> >>> specification says that program headers should be written in memory
> >>> order. So a comment on your testing to see if gdb or rr or any of
> >>> the other debuggers that read core dumps cares would be appreciated.
> >>
> >> I've already tested readelf and gdb on core dumps (truncated and whole)
> >> with this patch and it is able to read/use these core dumps in these
> >> scenarios with a proper backtrace.
> >
> > Can you compare the "rr" selftest before/after the patch? They have been
> > the most sensitive to changes to ELF, ptrace, seccomp, etc, so I've
> > tried to double-check "user visible" changes with their tree. :)
>
> Hi Kees,
>
> Thanks for your reply!
>
> Can you please give me some more information on these self tests?
> What/where are they? I'm not too familiar with rr.

I start from here whenever I go through their tests:
https://github.com/rr-debugger/rr/wiki/Building-And-Installing#tests

> > And those VMAs weren't thread stacks?
>
> Admittedly, I did do all of this exploration months ago, and only have
> my notes to go off of here, but no, they should not have been thread
> stacks since I had pulled all of them in during a "first pass".

Okay, cool. I suspect you'd already explored that, but I wanted to be
sure we didn't have an "easy to explain" solution. ;)

> > It does also feel like part of the overall problem is that systemd
> > doesn't have a way to know the process is crashing, and then creates the
> > truncation problem. (i.e. we're trying to use the kernel to work around
> > a visibility issue in userspace.)
>
> Even if systemd had visibility into the fact that a crash is happening,
> there's not much systemd can do in some circumstances. In applications
> with strict time to recovery limits, the process needs to restart within
> a certain time limit. We run into a similar issue as the issue I raised
> in my last reply on this thread: to keep the core dump intact and
> recover, we either need to start up a new process while the old one is
> core dumping, or wait until core dumping is complete to restart.
>
> If we start up a new process while the old one is core dumping, we risk
> system stability in applications with a large memory footprint since we
> could run out of memory from the duplication of memory consumption. If
> we wait until core dumping is complete to restart, we're in the same
> scenario as before with the core being truncated or we miss recovery
> time objectives by waiting too long.
>
> For this reason, I wouldn't say we're using the kernel to work around a
> visibility issue or that systemd is creating the truncation problem, but
> rather that the issue exists due to limitations in how we're truncating
> cores. That being said, there might be some use in this type of
> visibility for others with less strict recovery time objectives or
> applications with a lower memory footprint.

Yeah, this is interesting.
This effectively makes the coredumping activity rather "critical path":
the replacement process can't start until the dump has finished... hmm.
It feels like there should be a way to move the dumping process aside,
but with all the VMAs still live, I can see how this might go weird.
I'll think some more about this...

-- 
Kees Cook