Date: Wed, 5 Apr 2023 13:45:19 +0100
From: Mel Gorman
To: Vlastimil Babka
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev, Kees Cook, linux-hardening@vger.kernel.org,
	Alexander Halbuer
Subject: Re: [PATCH] mm, page_alloc: reduce page alloc/free sanity checks
Message-ID: <20230405124519.ir7y54aunmyg3tcn@techsingularity.net>
References: <20230216095131.17336-1-vbabka@suse.cz>
In-Reply-To: <20230216095131.17336-1-vbabka@suse.cz>
On Thu, Feb 16, 2023 at 10:51:31AM +0100, Vlastimil Babka wrote:
> Historically, we have performed sanity checks on all struct pages being
> allocated or freed, making sure they have no unexpected page flags or
> certain field values. This can detect insufficient cleanup and some
> cases of use-after-free, although on its own it can't always identify
> the culprit. The result is a warning and the "bad page" being leaked.
>
> The checks do need some cpu cycles, so in 4.7 with commits 479f854a207c
> ("mm, page_alloc: defer debugging checks of pages allocated from the
> PCP") and 4db7548ccbd9 ("mm, page_alloc: defer debugging checks of freed
> pages until a PCP drain") they were no longer performed in the hot paths
> when allocating and freeing from pcplists, but only when pcplists are
> bypassed, refilled or drained. For debugging purposes, with
> CONFIG_DEBUG_VM enabled the checks were instead still done in the
> hot paths and not when refilling or draining pcplists.
>
> With 4462b32c9285 ("mm, page_alloc: more extensive free page checking
> with debug_pagealloc"), enabling debug_pagealloc also moved the sanity
> checks back to hot paths. When both debug_pagealloc and CONFIG_DEBUG_VM
> are enabled, the checks are done both in hot paths and pcplist
> refill/drain.
>
> Even though the non-debug default today might seem to be a sensible
> tradeoff between overhead and ability to detect bad pages, on closer
> look it's arguably not. As most allocations go through the pcplists,
> catching any bad pages when refilling or draining pcplists has only a
> small chance, insufficient for debugging or serious hardening purposes.
> On the other hand the cost of the checks is concentrated in the already
> expensive drain/refill batching operations, and those are done under the
> often contended zone lock. That was recently identified as an issue for
> page allocation and the zone lock contention reduced by moving the
> checks outside of the locked section with a patch "mm: reduce lock
> contention of pcp buffer refill", but the cost of the checks is still
> visible compared to their removal [1]. In the pcplist draining path
> free_pcppages_bulk() the checks are still done under zone->lock.
>
> Thus, remove the checks from pcplist refill and drain paths completely.
> Introduce a static key check_pages_enabled to control checks during page
> allocation and freeing (whether pcplist is used or bypassed). The static
> key is enabled if either is true:
> - kernel is built with CONFIG_DEBUG_VM=y (debugging)
> - debug_pagealloc or page poisoning is boot-time enabled (debugging)
> - init_on_alloc or init_on_free is boot-time enabled (hardening)
>
> The resulting user visible changes:
> - no checks when draining/refilling pcplists - less overhead, with
>   likely no practical reduction of ability to catch bad pages
> - no checks when bypassing pcplists in default config (no
>   debugging/hardening) - less overhead etc. as above
> - on typical hardened kernels [2], checks are now performed on each page
>   allocation/free (previously only when bypassing/draining/refilling
>   pcplists) - init_on_alloc/init_on_free being enabled should be a
>   sufficient indication of preferring more costly alloc/free operations
>   for hardening purposes and we shouldn't need to introduce another toggle
> - code (various wrappers) removal and simplification
>
> [1] https://lore.kernel.org/all/68ba44d8-6899-c018-dcb3-36f3a96e6bea@sra.uni-hannover.de/
> [2] https://lore.kernel.org/all/63ebc499.a70a0220.9ac51.29ea@mx.google.com/
>
> Reported-by: Alexander Halbuer
> Reported-by: Andrew Morton
> Signed-off-by: Vlastimil Babka

Acked-by: Mel Gorman

Some minor comments below

> @@ -1432,9 +1448,11 @@ static __always_inline bool free_pages_prepare(struct page *page,
> 	for (i = 1; i < (1 << order); i++) {
> 		if (compound)
> 			bad += free_tail_pages_check(page, page + i);

free_tail_pages_check is also a function that only does something useful
when CONFIG_DEBUG_VM is set. While it might be outside the scope of the
patch, it might also benefit from check_pages_enabled checks?

> -		if (unlikely(free_page_is_bad(page + i))) {
> -			bad++;
> -			continue;
> +		if (static_branch_unlikely(&check_pages_enabled)) {
> +			if (unlikely(free_page_is_bad(page + i))) {
> +				bad++;
> +				continue;
> +			}
> 		}
> 		(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> 	}

The unlikely() within a static_branch_unlikely probably adds very little
given the block is so tiny.

> @@ -2392,56 +2369,20 @@ static inline int check_new_page(struct page *page)
> 	return 1;
> }
>
> -static bool check_new_pages(struct page *page, unsigned int order)
> +static inline bool check_new_pages(struct page *page, unsigned int order)
> {
> -	int i;
> -	for (i = 0; i < (1 << order); i++) {
> -		struct page *p = page + i;
> +	if (static_branch_unlikely(&check_pages_enabled)) {
> +		for (int i = 0; i < (1 << order); i++) {
> +			struct page *p = page + i;
>
> -		if (unlikely(check_new_page(p)))
> -			return true;
> +			if (unlikely(check_new_page(p)))
> +				return true;
> +		}
> 	}
>

unlikely() within static_branch_unlikely probably adds very little.

Otherwise, looks good.
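For illustration only, here is a rough sketch of the pattern in question,
with made-up names rather than the actual mm/page_alloc.c code: the checking
block sits behind a static key that is patched to a no-op jump by default,
so the inner unlikely() only affects the layout of the small block that
remains once the key is enabled.

#include <linux/jump_label.h>
#include <linux/mm.h>

/* Sketch only; names are illustrative, not the real kernel symbols. */
DEFINE_STATIC_KEY_FALSE(sketch_checks_enabled);

static bool sketch_page_is_bad(struct page *page)
{
	/* Stand-in for the real free_page_is_bad()/check_new_page() checks. */
	return page->flags & PAGE_FLAGS_CHECK_AT_FREE;
}

static int sketch_prep_pages(struct page *page, unsigned int order)
{
	int bad = 0;

	for (int i = 0; i < (1 << order); i++) {
		/*
		 * With the key disabled (the default), the static branch
		 * compiles to a patched no-op jump and the whole block
		 * below is skipped, so the checks cost almost nothing.
		 */
		if (static_branch_unlikely(&sketch_checks_enabled)) {
			/*
			 * Inside the rarely-taken block, the extra
			 * unlikely() hint only changes the placement of a
			 * couple of instructions, hence it adds very little.
			 */
			if (unlikely(sketch_page_is_bad(page + i)))
				bad++;
		}
	}
	return bad;
}

In the patch itself the key is enabled at boot for CONFIG_DEBUG_VM,
debug_pagealloc/page poisoning, or init_on_alloc/init_on_free, as described
in the changelog above.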
I agree that, with changes over time, the ability of the checks to detect
anything has been reduced, and it's probably at the point where they can
only detect a very specific bit corruption rather than broken code. Commit
44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the
per-cpu lists") also likely reduced the ability of the checks to find
anything.

-- 
Mel Gorman
SUSE Labs