From: "Linux regression tracking (Thorsten Leemhuis)"
Reply-To: Linux regressions mailing list
To: Kees Cook, "Eric W. Biederman"
Cc: Jan Bujak, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 viro@zeniv.linux.org.uk, brauner@kernel.org, linux-fsdevel@vger.kernel.org,
 Linux kernel regressions list
Subject: Re: Recent-ish changes in binfmt_elf made my program segfault
Date: Thu, 1 Feb 2024 11:47:02 +0100
Message-ID: <95eae92a-ecad-4e0e-b381-5835f370a9e7@leemhuis.info>
In-Reply-To: <202401221339.85DBD3931@keescook>
References: <874jf5co8g.fsf@email.froward.int.ebiederm.org>
 <202401221226.DAFA58B78@keescook>
 <87v87laxrh.fsf@email.froward.int.ebiederm.org>
 <202401221339.85DBD3931@keescook>

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Eric, what's the status wrt. this regression? Things from here look
stalled, but I might be missing something.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 22.01.24 23:12, Kees Cook wrote:
> On Mon, Jan 22, 2024 at 03:01:06PM -0600, Eric W. Biederman wrote:
>> Kees Cook writes:
>>
>>> On Mon, Jan 22, 2024 at 10:43:59AM -0600, Eric W. Biederman wrote:
>>>> Jan Bujak writes:
>>>>
>>>>> Hi.
>>>>>
>>>>> I recently updated my kernel and one of my programs started segfaulting.
>>>>>
>>>>> The issue seems to be related to how the kernel interprets PT_LOAD headers;
>>>>> consider the following program headers (from 'readelf' of my reproduction):
>>>>>
>>>>> Program Headers:
>>>>>   Type  Offset   VirtAddr  PhysAddr  FileSiz  MemSiz   Flg Align
>>>>>   LOAD  0x001000 0x10000   0x10000   0x000010 0x000010 R   0x1000
>>>>>   LOAD  0x002000 0x11000   0x11000   0x000010 0x000010 RW  0x1000
>>>>>   LOAD  0x002010 0x11010   0x11010   0x000000 0x000004 RW  0x1000
>>>>>   LOAD  0x003000 0x12000   0x12000   0x0000d2 0x0000d2 R E 0x1000
>>>>>   LOAD  0x004000 0x20000   0x20000   0x000004 0x000004 RW  0x1000
>>>>>
>>>>> Old kernels load this ELF file in the following way ('/proc/self/maps'):
>>>>>
>>>>> 00010000-00011000 r--p 00001000 00:02 131  ./bug-reproduction
>>>>> 00011000-00012000 rw-p 00002000 00:02 131  ./bug-reproduction
>>>>> 00012000-00013000 r-xp 00003000 00:02 131  ./bug-reproduction
>>>>> 00020000-00021000 rw-p 00004000 00:02 131  ./bug-reproduction
>>>>>
>>>>> And new kernels do it like this:
>>>>>
>>>>> 00010000-00011000 r--p 00001000 00:02 131  ./bug-reproduction
>>>>> 00011000-00012000 rw-p 00000000 00:00 0
>>>>> 00012000-00013000 r-xp 00003000 00:02 131  ./bug-reproduction
>>>>> 00020000-00021000 rw-p 00004000 00:02 131  ./bug-reproduction
>>>>>
>>>>> That map between 0x11000 and 0x12000 is the program's '.data' and '.bss'
>>>>> sections to which it tries to write, and since the kernel doesn't map
>>>>> them anymore it crashes.
>>>>>
>>>>> I bisected the issue to the following commit:
>>>>>
>>>>> commit 585a018627b4d7ed37387211f667916840b5c5ea
>>>>> Author: Eric W. Biederman
>>>>> Date:   Thu Sep 28 20:24:29 2023 -0700
>>>>>
>>>>>     binfmt_elf: Support segments with 0 filesz and misaligned starts
>>>>>
>>>>> I can confirm that with this commit the issue reproduces, and with it
>>>>> reverted it doesn't.
>>>>>
>>>>> I have prepared a minimal reproduction of the problem available here,
>>>>> along with all of the scripts I used for bisecting:
>>>>>
>>>>> https://github.com/koute/linux-elf-loading-bug
>>>>>
>>>>> You can either compile it from source (requires Rust and LLD), or there's
>>>>> a prebuilt binary in 'bin/bug-reproduction' which you can run. (It's tiny,
>>>>> so you can easily check with 'objdump -d' that it isn't malicious).
>>>>>
>>>>> On old kernels this will run fine, and on new kernels it will
>>>>> segfault.
>>>>
>>>> Frankly your ELF binary is buggy, and probably the best fix would be to
>>>> fix the linker script that is used to generate your binary.
>>>>
>>>> The problem is the SYSV ABI defines everything in terms of pages and so
>>>> placing two ELF segments on the same page results in undefined behavior.
>>>>
>>>> The code was fixed to honor your .bss segment and now your .data segment
>>>> is being stomped, because you defined them to overlap.
>>>>
>>>> Ideally your linker script would place both your .data and .bss in
>>>> the same segment. That would both fix the issue and give you a more
>>>> compact ELF binary, while not changing the generated code at all.
>>>>
>>>>
>>>> That said regressions suck and it would be good if we could update the
>>>> code to do something reasonable in this case.
>>>>
>>>> Perhaps we can update the .bss segment to just memset an existing
>>>> page if one has already been mapped. Which would cleanly handle a case
>>>> like yours. I need to think about that for a moment to see what the
>>>> code would look like to do that.
>>>
>>> It's the "if one has already been mapped" part which might
>>> become expensive...
>>
>> I am wondering if perhaps we can add MAP_FIXED_NOREPLACE and take
>> some appropriate action if there is already a mapping there.
>
> Yeah, in the general case we had to back out MAP_FIXED_NOREPLACE usage
> for individual LOADs because there were so many cases of overlapping
> LOADs. :( Currently it's only used during the initial mapping (when
> "total_size" is set), to avoid colliding with the stack.
>
> But, as you suggest, if we only use it for filesz==0, it could work.
>
>> Such as printing a warning and skipping the action entirely for
>> a pure bss segment. That would essentially replicate the previous
>> behavior.
>
> Instead of failing, perhaps we just fall back to not using
> MAP_FIXED_NOREPLACE and do the memset? (And maybe pr_warn_once?)
>
>> At a minimum adding MAP_FIXED_NOREPLACE should allow us to
>> deterministically detect and warn about problems, making it easier
>> for people to understand why their binary won't run.
>
> Yeah, it seems like it's the vm_brk_flags() that is clobbering the mapping,
> so we have to skip that when MAP_FIXED_NOREPLACE fails in the filesz==0
> case?
>
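
For anyone who wants to check a binary for the layout discussed in this
thread, here is a rough userspace sketch in Python. It is not kernel code
and not what binfmt_elf does. It only takes the (VirtAddr, FileSiz, MemSiz)
values from the readelf output quoted above and reports every filesz==0
LOAD that lands on a page an earlier LOAD already touches, which is the
case where a MAP_FIXED_NOREPLACE attempt would presumably find an existing
mapping. The names PAGE_SIZE, SEGMENTS, pages() and find_bss_collisions()
are made up for this illustration, and the 4 KiB page size is an assumption
taken from the 0x1000 alignment in the report.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, matching the 0x1000 Align column

# (vaddr, filesz, memsz) for each PT_LOAD entry quoted in the report
SEGMENTS = [
    (0x10000, 0x10, 0x10),
    (0x11000, 0x10, 0x10),
    (0x11010, 0x00, 0x04),  # pure bss: nothing read from the file
    (0x12000, 0xD2, 0xD2),
    (0x20000, 0x04, 0x04),
]


def pages(vaddr, memsz):
    """Return the set of page base addresses touched by [vaddr, vaddr + memsz)."""
    first = vaddr & ~(PAGE_SIZE - 1)
    last = (vaddr + max(memsz, 1) - 1) & ~(PAGE_SIZE - 1)
    return set(range(first, last + PAGE_SIZE, PAGE_SIZE))


def find_bss_collisions(segments):
    """Return (index, page) for each filesz == 0 segment on an already-claimed page."""
    claimed = set()      # pages touched by the segments seen so far
    collisions = []
    for index, (vaddr, filesz, memsz) in enumerate(segments):
        touched = pages(vaddr, memsz)
        if filesz == 0:
            collisions.extend((index, page) for page in sorted(touched & claimed))
        claimed |= touched
    return collisions


if __name__ == "__main__":
    for index, page in find_bss_collisions(SEGMENTS):
        print(f"LOAD #{index}: pure-bss segment shares page {page:#x} with an earlier LOAD")
```

With the headers from the report this flags only the third LOAD (index 2),
on page 0x11000, which is the mapping that shows up as anonymous in the
new-kernel '/proc/self/maps' output.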