From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D121CC05027 for ; Mon, 6 Feb 2023 07:12:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0D9186B0072; Mon, 6 Feb 2023 02:12:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 088636B0073; Mon, 6 Feb 2023 02:12:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EB9706B0074; Mon, 6 Feb 2023 02:12:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id DBA7A6B0072 for ; Mon, 6 Feb 2023 02:12:49 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 9E3DF80AF4 for ; Mon, 6 Feb 2023 07:12:49 +0000 (UTC) X-FDA: 80435999658.29.CBDA703 Received: from smtp-out0.aaront.org (smtp-out0.aaront.org [52.10.12.108]) by imf27.hostedemail.com (Postfix) with ESMTP id CB16240009 for ; Mon, 6 Feb 2023 07:12:47 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=aaront.org header.s=elwxqanhxhag6erl header.b="XbO3MM0/"; dmarc=pass (policy=quarantine) header.from=aaront.org; spf=pass (imf27.hostedemail.com: domain of dev+gse@aaront.org designates 52.10.12.108 as permitted sender) smtp.mailfrom=dev+gse@aaront.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675667568; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:references:dkim-signature; bh=xWp3+4U167iIUi9rXm3SNZHaMZRxjxlZetBDYYNY8mE=; b=PKZ3obUAjzTWj5ntijdK7Kb/gsdV+m43Y/Bk6Cmgtxd2ZvmzbjM8FcNkcwBwmn4Idf3Qy8 wi+zb7AoBaL0AqHPpe1UL+h2IgKxDP9Dj7R8g6dKzd80S0sS5iJX86lWKNM5dc8TjQz97Z XnKwGKDuTv9Ed1aFly1FPrkAs9iEE+0= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=aaront.org header.s=elwxqanhxhag6erl header.b="XbO3MM0/"; dmarc=pass (policy=quarantine) header.from=aaront.org; spf=pass (imf27.hostedemail.com: domain of dev+gse@aaront.org designates 52.10.12.108 as permitted sender) smtp.mailfrom=dev+gse@aaront.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1675667568; a=rsa-sha256; cv=none; b=eqCEEpxpyQDLEMWB2e/Dm6eYwwSWmiz4EWI+3sJl2lg3VlULjfbjP79hEV5qlDQBhKY1PW tOS28FaKOEgNwglwUUlp1+Om27IKMS8m2rpFvscsvyk906I9i3OfhEEQ16QAS+fORYZ55l Ep1DhbUDZPG87B3qI/hMMdJZBR0hPRc= Received: by smtp-out0.aaront.org (Postfix) with ESMTP id 4P9HXF6qyvzMp; Mon, 6 Feb 2023 07:12:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=aaront.org; h= from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; s=elwxqanhxhag6erl; bh=7eklGvzQeRbVq GTFubE4B/V1RUz7+h7698FCisfN/as=; b=XbO3MM0/+ALEFVxpBhbm8oJOuFFiF 5L4LLHHpEjqrzFnzFpfDW+Ht2cmV63Ya44IF+Rdz0Q+h/LeAAqdlR2h59PgHJu2v KCsOgSBnjwcx7uyE+2ak0tN5BUZQTlNZoNx9JulvYFRCx0yR0ERafFsaIXI5ZXV8 mBZMuwunOaZ0AEAgRGkum80d0qauDlOYx5RS19tgMjjs4FZWH272b/GIxxCMcyG9 46GaAH/cgqrKRafZnohS2koMgQ95xQjZ+qc1ek9EpzjNEIh435pLyoD6Y2yRSBR3 OXG8Du42i4yWbfe7fhxRIKMeBLq37EeOqRUWeg7RVyjt3+ncr0DQOg1oA== From: Aaron Thompson To: Mike Rapoport Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Aaron Thompson Subject: [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux Date: Mon, 6 Feb 2023 07:12:09 +0000 Message-Id: <20230206071211.3157-1-dev@aaront.org> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: CB16240009 X-Stat-Signature: grgbrhq7kcyieana85yuomfozwxx7zyi X-HE-Tag: 1675667567-12125 X-HE-Meta: U2FsdGVkX183gI+taYjlL10UKbmkNsVxhM8zG+VgMXWED2EPg/uZo2LSNoAunSLYK79OrDSMfCNSECFD9Hea2JZQtkQkmT0mmodR9JQj0Fe52qIzcenrR/lhNy1JJnF9Dpr0l1b0dJI9rpoTmlgw6gUHv43iRW/SlGLlefQA5i8qu80r6SqwdfTcNS6kR4Co0J76bMpxpTd3pV0+My4XUk5bjrxV7ULhxlL6vcBXCqYXaViOfH33xWKq5NndDiL3qwkNZ42lBPfg4t4TGxziostxU2Ndv0kIEAbw86IbSloL+TTpy1DijOC5UPlbkEcrfkqaMoJt7iBfv6M/wra2rqq/F9f4lpvVPRI/cOGEOlpXbBKofohwN1V+vJBTlYfXD4QzsRCJsJeTxAyJu6Q7OGbDeoAdseWzq4/jQfSZAu35au+FC5EMI4Abd9zIgIY52WXrgA9CzS4ou7QJQl/FlczuChLylAIX2YALmH+C6POttdtDennqJ4/GviVK3qSUr/yY8sIkf1nUN40EfEtMDnqCMVy4OLbr9phIpdxiTBIlK0Js4RsMTbc+t1F1diWZYEyOrf0vMhT5PUoCHf6VBwl0ReMWNuWSwlsA4zHFGWoeRRPTXpptzG7DHq1I2upWD9OJeMDcnq/jZoNThomvYALShV9z2v7F9qeYKm4VSoGZw0e0V6bslprN1H3n4cZIcsmUgYIxBCV8mMgiI7Q0/vfHDLM83PFHsNqcFp+5P/s7PEAbPrk23jBEUEgX3LyMydkthiPGdbIkd2KKWOCGBRSMwIGJZMnpOi/uk3aQ0wNY+0I7/2RCMeF3UIMwL0ENo+d6Xqe8ShxdZyhdn/RE4WKBj7/QtegpnXYL/g7k6AECc+xkuCqEuqiBkIxv/+icG0l/yfPQI1RofvduRn9KBszQQbd6nQQ+4mzpfvV59Az6HgWvYNn3vJ0mmlpBLpmYnzSgdhkI0GnvrvcEN1t Cte0yzIy t1Kt83lOabblnMS7Lhls34fFqYKnDJVh3vCF77UdEAaX9H+h4xfK7FyiRdssTQnlMW/PqG9zHxCIHacab+W6mYHGXMnOPHqUS02TsPglAp0AByjkpddcDXgMaRV60ak/io0alAt/PgCQmTyn61iJm4/9wKL5evFGYLWxtIZcUWo1Usp70rXY7J13c3WXtpiXIkUATJkRZ9Saie9FnUuBAtOjrSdX+JJPe5rtcxpIjvUftIeqpmVAKI9/MA7bczQRK/Z3Be3YVMkeOcQ78aNQUvnQQIDxNqG7SFM2ZK4lG+h2TXyGvO5ENgYliJU1wcDTnSoXpTwJn5c2SWnBOTJ/tHofAjQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Mike, Unfortunately my attempted bugfix 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late()") is itself buggy. It's true that all reserved pages are initialized by the time memblock_free_late() is called, but if the pages being freed are in the deferred init range, __free_one_page() might access nearby uninitialized pages when trying to coalesce buddies, in which case badness ensues :( deferred_init_maxorder() handles this by initializing a max-order-sized block of pages before freeing any of them. We could change memblock_free_late() to do that, but it seems to me that having memblock_free_late() get involved in initializing and freeing free pages would be a bit messy. I think it will be simpler to free the reserved pages later, as part of deferred init or after. I can see a few ways to accomplish that: 1. Have memblock_free_late() remove the range from memblock.reserved. Deferred init will then handle that range as just another free range, so the pages will be initialized and freed by deferred_init_maxorder(). This is the simplest fix, but the problem is that the pages will be initialized twice, first by memmap_init_reserved_pages() and again by deferred_init_maxorder(). It looks risky to me to blindly zero out an already-initialized page struct, but if we could say for certain that the page structs for memblock-reserved ranges aren't actually used, at least until after deferred init is done, then this could work. I don't know the usage of page structs well enough to say. 2. Take 1 and fix the double-init problem. In addition to removing the range from memblock.reserved, also set a flag on the range in memblock.memory that indicates the pages for that range have already been initialized. deferred_init_maxorder() would skip initializing pages for ranges with the flag set, but it would still free them. This seems like a bit of a conceptual stretch of the memblock region flags since this is not a property of the memory itself but rather of the page structs corresponding to that memory. But it gets the job done. 3. Defer the freeing of reserved pages until after deferred init is completely done. Have memblock_free_late() set a flag on the range in memblock.reserved, and have memblock_discard() call __free_pages_core() on those ranges. I think this would be a reasonable use of flags on reserved regions. They are not currently used. The included patch implements option 1 because it is the simplest, but it should not be used if the double-init of the page structs is unsafe. In my testing I verified that the count, mapcount, and lru list head of all pages are at their defaults when memblock_free_late() is called by efi_free_boot_services(), but that's obviously not conclusive. I have draft versions of 2 and 3 that I can finish up quickly if either of those are preferable. Please let me know what you think, and sorry for introducing this bug. Thanks, Aaron Aaron Thompson (1): mm: Defer freeing reserved pages in memblock_free_late() mm/internal.h | 2 ++ mm/memblock.c | 36 ++++++++++++++++++++----------- mm/page_alloc.c | 17 +++++++++++++++ tools/testing/memblock/internal.h | 7 +++--- 4 files changed, 47 insertions(+), 15 deletions(-) -- 2.30.2