From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C27BC636CD for ; Tue, 7 Feb 2023 08:02:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DBACA6B0093; Tue, 7 Feb 2023 03:02:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D6AA06B009B; Tue, 7 Feb 2023 03:02:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C59076B009D; Tue, 7 Feb 2023 03:02:50 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id B70CB6B0093 for ; Tue, 7 Feb 2023 03:02:50 -0500 (EST) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 550831A0406 for ; Tue, 7 Feb 2023 08:02:50 +0000 (UTC) X-FDA: 80439754500.16.9720333 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf26.hostedemail.com (Postfix) with ESMTP id 7650814001B for ; Tue, 7 Feb 2023 08:02:48 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b="V90u/9Vw"; dmarc=pass (policy=none) header.from=kernel.org; spf=pass (imf26.hostedemail.com: domain of rppt@kernel.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=rppt@kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675756968; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=6UfHl/TwuLu5YPJNdqhlwOaj2iFEZKruxuDxUwBli0U=; b=k5HW9cAsQJksrM3BhZHOIrI+J01iIbxZdQZ2uBTnEn2a2apLpJ3U02a6uwXf25RKXw7//x 8LozHAPTnPMr5EW7JjS+XwQxvOxF1fh/xv9wkOZtanGHFfZGD3+H2zoEjjpyfv77dbbrbt fJPYfwoYZnjVnuFHUWqF+BAMdC9NVvc= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b="V90u/9Vw"; dmarc=pass (policy=none) header.from=kernel.org; spf=pass (imf26.hostedemail.com: domain of rppt@kernel.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=rppt@kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1675756968; a=rsa-sha256; cv=none; b=YQYNY2Abb78rmf8moauPl5Ku4Yz3xH/bTa4ELVDHcAw935SHPCrwAMIqsUcY1BOella0Dj F7NR3o98ARP5zZgIArWwoMWiSSzAdyzuSdFR+L8L8L5wFhbHntW7a02tSIsO5gTgLwcgpZ D6eYDSbfUXXBl7e7n9flMQirrrokNTk= Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id D3D82B815FB; Tue, 7 Feb 2023 08:02:46 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E61B2C433D2; Tue, 7 Feb 2023 08:02:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1675756965; bh=m1mY6LHjMtd4mCY+qvpdh/PwbL2CZe7f6wuaufWjflY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=V90u/9Vw4oNZBCLorHPSMddd8M1Af8iHu9LoqoQo96JJ2vlviIJhKOSymCMVOuOBO EeIQ03d2BeU6yZcJh9wyrveOBk6RbgEB3SCIxAEfgdGnCMuquWJBTM0371e77YgpoK KVOsbVrMgeVtcLietDw7bLqQDuqjDSAEwPibCeVAXbIppOpcNqgNifksdnKzQGaTbc p5LfITvA33HWdMh/BMqQpgIgnP+LPzyhKnQt142JzQtx11FigkfByV2xHfc5NPuPKk aOyIF/u8iiPB13lQUlHvt2tcTXBQJ0IXJBhe6JwvdXGN6t9ppIX+sDZ+IqUeSUWqYL f234q7cSCdeMQ== Date: Tue, 7 Feb 2023 10:02:32 +0200 From: Mike Rapoport To: Aaron Thompson Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux Message-ID: References: <20230206071211.3157-1-dev@aaront.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230206071211.3157-1-dev@aaront.org> X-Rspamd-Queue-Id: 7650814001B X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: qg8awxyp85g9sjbpqbaku5mt79sh94cj X-HE-Tag: 1675756968-791214 X-HE-Meta: U2FsdGVkX18TysmEAS9Mx0eVX2F5P0vpWMNW0pduDywNXeTzjq20qVXUlZdNb5DKBEzgLfpWwSisPmnyJMrybnd2t71dDzR+RK97NGSSXJWk7zuvE/nAwsP873fx4wugH4/oXR9z+DQOTUoDT6yHyfRiM8ONxCOjIwyG7q/VAiM5EeZDREVWvabjuyasoUj3iZLuKeFxgoVvYGgU54yHRv6hKXdo4VxlTktdgQ8JjUyeWNeKOizS1NgdhY8DeYsOgEmlIwNTA7GWy7XP8HEd1WmD+hO8AFyRYmssjF8awYmDdnQKeMC2AqndYyRlGHm0tmGqPi8+CuMeemSN5KrNoerZ/NK8zfWFx6VUTNAEZhe0jgbsnFvVShKCDR+LfkFNF2YwhzoxxwGXsFc7enUF0NUZav8hdWfgQSMMd9gW6/17Qr0zDvWHpjCwm8IjAVJoFo3ck8zav28O0jT8ehCuooo2AWz7P0TD6qJ1Ex5UijniNGW56A9pbjTYE9BQ8yK9Yn1fFltiRm4JDsA8IZUaOZSAELRtmq/LqOk7KdgigZHIDuatMH1+8NezXTpSBxWA+PSQp+XgZyR7XB1ukFJySqdMko8bWoHCgrSMo9cOUtKjJYdlxcdPn/zb29cclxWXdjBh+nCBpb2K1tghrJqT3E5mvaERtskhvQz4mSVoxiHGeY22WYKvY2Ly2FfpA6KZDFCn4Iw/T/AcQq7K3INcLelZJKT0k48N2fkGonQYoYg/kDMeCIjphRiVkdofzZl95vHwbg+tfEJXuz532p5nPIGtsH9XEWyVqS+jv8GoanUYsMxXO/Gz/XqHGrimAJNZSdZdXQU1xBINuBObYnYt1RRa9Jcz021AS+al36CxreeHNb0oYeJKk98e1OHqIaM7WW4Tv9k+issPTl1j8hZ49YiKum/vgONdfugHzUNnPLfhMSFhDUusjHDfTRRKNHENNS42GdJ/xQMphCG01f+ ELABQ1J/ QBkP3j9kCV7lTbbq71ThZ0TiGUqcPFdckwRJNfu3C2zHNfB2pxoPSSx6MZLtJyGeJl2QLLQIOhFiSvpMXHrD8WyRi0NRshNsb4UOWeP7qsGzZDIERgN3tRgorPlCmtgzrhUPlCo/9XHHRw2PwRt+AyDXoYFRbpW3TAtnwewdgAmTSUMse3o3FEl1djAe5M1rUzBU5afzkUIwSsIglD+azNuqeUTpX/R5bIlW/4H2LE7w/z0jXh21QI3v30apJvIhpWzaihTsxh6xNo/E4lplNQIpS/nECMke7yKJvR6QDObKAlxC3gHfVKD6T4QL1F/0cwWYK6nhpXoivcBQUDlWNgjLwEs+SmalMQdEI X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Aaron, On Mon, Feb 06, 2023 at 07:12:09AM +0000, Aaron Thompson wrote: > Hi Mike, > > Unfortunately my attempted bugfix 115d9d77bb0f ("mm: Always release pages to the > buddy allocator in memblock_free_late()") is itself buggy. It's true that all > reserved pages are initialized by the time memblock_free_late() is called, but > if the pages being freed are in the deferred init range, __free_one_page() might > access nearby uninitialized pages when trying to coalesce buddies, in which case > badness ensues :( > > deferred_init_maxorder() handles this by initializing a max-order-sized block of > pages before freeing any of them. We could change memblock_free_late() to do > that, but it seems to me that having memblock_free_late() get involved in > initializing and freeing free pages would be a bit messy. I think it will be > simpler to free the reserved pages later, as part of deferred init or after. > > I can see a few ways to accomplish that: > > 1. Have memblock_free_late() remove the range from memblock.reserved. Deferred > init will then handle that range as just another free range, so the pages > will be initialized and freed by deferred_init_maxorder(). > > This is the simplest fix, but the problem is that the pages will be > initialized twice, first by memmap_init_reserved_pages() and again by > deferred_init_maxorder(). It looks risky to me to blindly zero out an > already-initialized page struct, but if we could say for certain that the > page structs for memblock-reserved ranges aren't actually used, at least > until after deferred init is done, then this could work. I don't know the > usage of page structs well enough to say. > > 2. Take 1 and fix the double-init problem. In addition to removing the range > from memblock.reserved, also set a flag on the range in memblock.memory that > indicates the pages for that range have already been initialized. > deferred_init_maxorder() would skip initializing pages for ranges with the > flag set, but it would still free them. > > This seems like a bit of a conceptual stretch of the memblock region flags > since this is not a property of the memory itself but rather of the page > structs corresponding to that memory. But it gets the job done. > > 3. Defer the freeing of reserved pages until after deferred init is completely > done. Have memblock_free_late() set a flag on the range in memblock.reserved, > and have memblock_discard() call __free_pages_core() on those ranges. > > I think this would be a reasonable use of flags on reserved regions. They are > not currently used. I think 3 is the most straight-forward as a concept. It'll need some care for ARCH_KEEP_MEMBLOCK architectures, e.g. arm64, though I also can think about 4. Extend initialization of the memory map around the reserved regions in memmap_init_reserved_pages()/reserve_bootmem_region(). If these functions initialize the memory map of the entire pageblock surrounding the reserved range, __free_one_page() will certainly access initialized struct pages. > The included patch implements option 1 because it is the simplest, but it should > not be used if the double-init of the page structs is unsafe. In my testing I > verified that the count, mapcount, and lru list head of all pages are at their > defaults when memblock_free_late() is called by efi_free_boot_services(), but > that's obviously not conclusive. I have draft versions of 2 and 3 that I can > finish up quickly if either of those are preferable. At this point of the release cycle I prefer to revert 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late()") and to work on the proper fix for the next release. > Please let me know what you think, and sorry for introducing this bug. > > Thanks, > Aaron > > Aaron Thompson (1): > mm: Defer freeing reserved pages in memblock_free_late() > > mm/internal.h | 2 ++ > mm/memblock.c | 36 ++++++++++++++++++++----------- > mm/page_alloc.c | 17 +++++++++++++++ > tools/testing/memblock/internal.h | 7 +++--- > 4 files changed, 47 insertions(+), 15 deletions(-) > > -- > 2.30.2 > > -- Sincerely yours, Mike.