From: "Eric W. Biederman"
To: Kees Cook
Cc: Jan Bujak, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    viro@zeniv.linux.org.uk, brauner@kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: Recent-ish changes in binfmt_elf made my program segfault
Date: Mon, 22 Jan 2024 15:01:06 -0600
Message-ID: <87v87laxrh.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <202401221226.DAFA58B78@keescook> (Kees Cook's message of "Mon, 22 Jan 2024 12:48:06 -0800")
References: <874jf5co8g.fsf@email.froward.int.ebiederm.org> <202401221226.DAFA58B78@keescook>

Kees Cook writes:

> On Mon, Jan 22, 2024 at 10:43:59AM -0600, Eric W. Biederman wrote:
>> Jan Bujak writes:
>>
>> > Hi.
>> >
>> > I recently updated my kernel and one of my programs started segfaulting.
>> >
>> > The issue seems to be related to how the kernel interprets PT_LOAD headers;
>> > consider the following program headers (from 'readelf' of my reproduction):
>> >
>> > Program Headers:
>> >   Type  Offset   VirtAddr  PhysAddr  FileSiz  MemSiz   Flg Align
>> >   LOAD  0x001000 0x10000   0x10000   0x000010 0x000010 R   0x1000
>> >   LOAD  0x002000 0x11000   0x11000   0x000010 0x000010 RW  0x1000
>> >   LOAD  0x002010 0x11010   0x11010   0x000000 0x000004 RW  0x1000
>> >   LOAD  0x003000 0x12000   0x12000   0x0000d2 0x0000d2 R E 0x1000
>> >   LOAD  0x004000 0x20000   0x20000   0x000004 0x000004 RW  0x1000
>> >
>> > Old kernels load this ELF file in the following way ('/proc/self/maps'):
>> >
>> > 00010000-00011000 r--p 00001000 00:02 131  ./bug-reproduction
>> > 00011000-00012000 rw-p 00002000 00:02 131  ./bug-reproduction
>> > 00012000-00013000 r-xp 00003000 00:02 131  ./bug-reproduction
>> > 00020000-00021000 rw-p 00004000 00:02 131  ./bug-reproduction
>> >
>> > And new kernels do it like this:
>> >
>> > 00010000-00011000 r--p 00001000 00:02 131  ./bug-reproduction
>> > 00011000-00012000 rw-p 00000000 00:00 0
>> > 00012000-00013000 r-xp 00003000 00:02 131  ./bug-reproduction
>> > 00020000-00021000 rw-p 00004000 00:02 131  ./bug-reproduction
>> >
>> > That map between 0x11000 and 0x12000 is the program's '.data' and '.bss'
>> > sections, which it tries to write to, and since the kernel doesn't map
>> > them anymore it crashes.
>> >
>> > I bisected the issue to the following commit:
>> >
>> > commit 585a018627b4d7ed37387211f667916840b5c5ea
>> > Author: Eric W. Biederman
>> > Date:   Thu Sep 28 20:24:29 2023 -0700
>> >
>> >     binfmt_elf: Support segments with 0 filesz and misaligned starts
>> >
>> > I can confirm that with this commit the issue reproduces, and with it
>> > reverted it doesn't.
>> >
>> > I have prepared a minimal reproduction of the problem available here,
>> > along with all of the scripts I used for bisecting:
>> >
>> > https://github.com/koute/linux-elf-loading-bug
>> >
>> > You can either compile it from source (requires Rust and LLD), or there's
>> > a prebuilt binary in 'bin/bug-reproduction' which you can run. (It's tiny,
>> > so you can easily check with 'objdump -d' that it isn't malicious).
>> >
>> > On old kernels this will run fine, and on new kernels it will
>> > segfault.
>>
>> Frankly, your ELF binary is buggy, and probably the best fix would be to
>> fix the linker script that is used to generate your binary.
>>
>> The problem is the SYSV ABI defines everything in terms of pages and so
>> placing two ELF segments on the same page results in undefined behavior.
>>
>> The code was fixed to honor your .bss segment and now your .data segment
>> is being stomped, because you defined them to overlap.
>>
>> Ideally your linker script would place both your .data and .bss in
>> the same segment. That would both fix the issue and give you a more
>> compact ELF binary, while not changing the generated code at all.
>>
>>
>> That said, regressions suck and it would be good if we could update the
>> code to do something reasonable in this case.
>>
>> Perhaps we can update the .bss segment to just memset an existing
>> page if one has already been mapped, which would cleanly handle a case
>> like yours. I need to think about that for a moment to see what the
>> code would look like to do that.
>
> It's the "if one has already been mapped" part which might
> become expensive...

I am wondering if perhaps we can add MAP_FIXED_NOREPLACE and take some
appropriate action if there is already a mapping there, such as printing
a warning and skipping the action entirely for a pure bss segment. That
would essentially replicate the previous behavior.

At a minimum, adding MAP_FIXED_NOREPLACE should allow us to
deterministically detect and warn about problems, making it easier for
people to understand why their binary won't run.

Eric