Date: Wed, 5 Apr 2023 16:18:58 +0200
Subject: Re: [PATCH] mm, page_alloc: reduce page alloc/free sanity checks
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Kees Cook, linux-hardening@vger.kernel.org, Alexander Halbuer
References: <20230216095131.17336-1-vbabka@suse.cz> <20230405124519.ir7y54aunmyg3tcn@techsingularity.net>
In-Reply-To: <20230405124519.ir7y54aunmyg3tcn@techsingularity.net>

On 4/5/23 14:45, Mel Gorman wrote:
> On Thu, Feb 16, 2023 at 10:51:31AM +0100, Vlastimil Babka wrote:
>> Historically, we have performed sanity checks on all struct pages being
>> allocated or freed, making sure they have no unexpected page flags or
>> certain field values. This can detect insufficient cleanup and some
>> cases of use-after-free, although on its own it can't always identify
>> the culprit. The result is a warning and the "bad page" being leaked.
>>
>> The checks do need some cpu cycles, so in 4.7 with commits 479f854a207c
>> ("mm, page_alloc: defer debugging checks of pages allocated from the
>> PCP") and 4db7548ccbd9 ("mm, page_alloc: defer debugging checks of freed
>> pages until a PCP drain") they were no longer performed in the hot paths
>> when allocating and freeing from pcplists, but only when pcplists are
>> bypassed, refilled or drained. For debugging purposes, with
>> CONFIG_DEBUG_VM enabled the checks were instead still done in the
>> hot paths and not when refilling or draining pcplists.
>>
>> With 4462b32c9285 ("mm, page_alloc: more extensive free page checking
>> with debug_pagealloc"), enabling debug_pagealloc also moved the sanity
>> checks back to hot paths. When both debug_pagealloc and CONFIG_DEBUG_VM
>> are enabled, the checks are done both in hot paths and pcplist
>> refill/drain.
>>
>> Even though the non-debug default today might seem to be a sensible
>> tradeoff between overhead and ability to detect bad pages, on closer
>> look it's arguably not. As most allocations go through the pcplists,
>> catching any bad pages when refilling or draining pcplists has only a
>> small chance, insufficient for debugging or serious hardening purposes.
>> On the other hand the cost of the checks is concentrated in the already
>> expensive drain/refill batching operations, and those are done under the
>> often contended zone lock. That was recently identified as an issue for
>> page allocation and the zone lock contention reduced by moving the
>> checks outside of the locked section with a patch "mm: reduce lock
>> contention of pcp buffer refill", but the cost of the checks is still
>> visible compared to their removal [1]. In the pcplist draining path
>> free_pcppages_bulk() the checks are still done under zone->lock.
>>
>> Thus, remove the checks from pcplist refill and drain paths completely.
>> Introduce a static key check_pages_enabled to control checks during page
>> allocation and freeing (whether pcplist is used or bypassed). The static
>> key is enabled if any of the following is true:
>> - kernel is built with CONFIG_DEBUG_VM=y (debugging)
>> - debug_pagealloc or page poisoning is boot-time enabled (debugging)
>> - init_on_alloc or init_on_free is boot-time enabled (hardening)
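
[Editor's note: for illustration, the static-key arrangement described
above might look roughly like the sketch below. This is a sketch against
the generic jump-label API, not the patch's actual hunks;
setup_check_pages() is a hypothetical init hook, and the real patch may
wire the enabling into existing init code instead.]

#include <linux/jump_label.h>
#include <linux/mm.h>

/*
 * Default-off key: with the key disabled, tests of it in the alloc/free
 * paths are patched out to a no-op branch, so the default configuration
 * pays essentially nothing for the (skipped) checks.
 */
static DEFINE_STATIC_KEY_FALSE(check_pages_enabled);

/* Hypothetical init hook enabling the checks for the configurations
 * listed in the commit message. */
static void __init setup_check_pages(void)
{
	/* debugging configurations... */
	if (IS_ENABLED(CONFIG_DEBUG_VM) ||
	    debug_pagealloc_enabled() ||
	    page_poisoning_enabled() ||
	    /* ...and hardening ones (flags=0 just queries init_on_alloc) */
	    want_init_on_alloc(0) ||
	    want_init_on_free())
		static_branch_enable(&check_pages_enabled);
}

[The hot paths then test static_branch_unlikely(&check_pages_enabled), as
the quoted hunks further down show.]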
>>
>> The resulting user visible changes:
>> - no checks when draining/refilling pcplists - less overhead, with
>>   likely no practical reduction of ability to catch bad pages
>> - no checks when bypassing pcplists in default config (no
>>   debugging/hardening) - less overhead etc. as above
>> - on typical hardened kernels [2], checks are now performed on each page
>>   allocation/free (previously only when bypassing/draining/refilling
>>   pcplists) - having init_on_alloc/init_on_free enabled should be a
>>   sufficient indication for preferring more costly alloc/free operations
>>   for hardening purposes and we shouldn't need to introduce another
>>   toggle
>> - code (various wrappers) removal and simplification
>>
>> [1] https://lore.kernel.org/all/68ba44d8-6899-c018-dcb3-36f3a96e6bea@sra.uni-hannover.de/
>> [2] https://lore.kernel.org/all/63ebc499.a70a0220.9ac51.29ea@mx.google.com/
>>
>> Reported-by: Alexander Halbuer
>> Reported-by: Andrew Morton
>> Signed-off-by: Vlastimil Babka
>
> Acked-by: Mel Gorman

Thanks.

> Some minor comments below
>
>> @@ -1432,9 +1448,11 @@ static __always_inline bool free_pages_prepare(struct page *page,
>>  	for (i = 1; i < (1 << order); i++) {
>>  		if (compound)
>>  			bad += free_tail_pages_check(page, page + i);
>
> free_tail_pages_check is also a function that only does something useful
> when CONFIG_DEBUG_VM is set. While it might be outside the scope of the
> patch, it might also benefit from check_pages_enabled checks?

True, will send a followup. Will also rename it to free_tail_page_prepare()
as it in fact combines a preparation component with an optional checks
component. Will remove the unlikely()s you pointed out as well.

>> -		if (unlikely(free_page_is_bad(page + i))) {
>> -			bad++;
>> -			continue;
>> +		if (static_branch_unlikely(&check_pages_enabled)) {
>> +			if (unlikely(free_page_is_bad(page + i))) {
>> +				bad++;
>> +				continue;
>> +			}
>>  		}
>>  		(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>>  	}
>
> The unlikely() within a static_branch_unlikely probably adds very little
> given the block is so tiny.
>
>> @@ -2392,56 +2369,20 @@ static inline int check_new_page(struct page *page)
>>  	return 1;
>>  }
>>
>> -static bool check_new_pages(struct page *page, unsigned int order)
>> +static inline bool check_new_pages(struct page *page, unsigned int order)
>>  {
>> -	int i;
>> -	for (i = 0; i < (1 << order); i++) {
>> -		struct page *p = page + i;
>> +	if (static_branch_unlikely(&check_pages_enabled)) {
>> +		for (int i = 0; i < (1 << order); i++) {
>> +			struct page *p = page + i;
>>
>> -		if (unlikely(check_new_page(p)))
>> -			return true;
>> +			if (unlikely(check_new_page(p)))
>> +				return true;
>> +		}
>>  	}
>>
>
> unlikely() within static_branch_unlikely probably adds very little.
>
> Otherwise, looks good. I agree that with changes over time the ability
> of the checks to detect anything is reduced and it's probably at the
> point where it can only detect a very specific bit corruption instead
> of broken code. Commit 44042b449872 ("mm/page_alloc: allow high-order
> pages to be stored on the per-cpu lists") also likely reduced the
> ability of the checks to find anything.
>
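
[Editor's note: below is a minimal sketch of the follow-up Vlastimil
describes above, assuming the rename to free_tail_page_prepare() and the
check_pages_enabled gating; the function body is illustrative only, the
real checks validate several fields (refcount, mapcount, mapping) per
tail page.]

/*
 * Tail page preparation with optional checks: the preparation component
 * always runs, while the checks component sits behind the static key.
 * The inner unlikely() is dropped, per the review comments above.
 */
static int free_tail_page_prepare(struct page *head_page, struct page *page)
{
	int ret = 0;

	if (static_branch_unlikely(&check_pages_enabled)) {
		/* illustrative check; the real function checks more */
		if (compound_head(page) != head_page) {
			bad_page(page, "compound_head incorrect");
			ret = 1;
		}
	}

	/* the preparation component, done regardless of the checks */
	page->mapping = NULL;
	clear_compound_head(page);
	return ret;
}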