From: Vlastimil Babka
Date: Mon, 21 Oct 2024 16:54:06 +0200
Subject: Re: [PATCH v2 2/5] mm: add PTE_MARKER_GUARD PTE marker
To: Lorenzo Stoakes
Cc: Andrew Morton, Suren Baghdasaryan, "Liam R . Howlett", Matthew Wilcox,
 "Paul E . McKenney", Jann Horn, David Hildenbrand, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Muchun Song, Richard Henderson,
 Ivan Kokshaysky, Matt Turner, Thomas Bogendoerfer,
 "James E . J . Bottomley", Helge Deller, Chris Zankel, Max Filippov,
 Arnd Bergmann, linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org,
 linux-parisc@vger.kernel.org, linux-arch@vger.kernel.org, Shuah Khan,
 Christian Brauner, linux-kselftest@vger.kernel.org, Sidhartha Kumar,
 Jeff Xu, Christoph Hellwig, linux-api@vger.kernel.org, John Hubbard
References: <081837b697a98c7fa5832542b20f603d49e0b557.1729440856.git.lorenzo.stoakes@oracle.com>
 <470886d2-9f6f-4486-a935-daea4c5bea09@suse.cz>
 <434a440a-d6a4-4144-b4fb-8e0d8535f03f@lucifer.local>
In-Reply-To: <434a440a-d6a4-4144-b4fb-8e0d8535f03f@lucifer.local>

On 10/21/24 16:33, Lorenzo Stoakes wrote:
> On Mon, Oct 21, 2024 at 04:13:34PM +0200, Vlastimil Babka wrote:
>> On 10/20/24 18:20, Lorenzo Stoakes wrote:
>> > Add a new PTE marker that results in any access causing the accessing
>> > process to segfault.
>> >
>> > This is preferable to PTE_MARKER_POISONED, which results in the same
>> > handling as hardware poisoned memory, and is thus undesirable for cases
>> > where we simply wish to 'soft' poison a range.
>> >
>> > This is in preparation for implementing the ability to specify guard pages
>> > at the page table level, i.e. ranges that, when accessed, should cause
>> > process termination.
>> >
>> > Additionally, rename zap_drop_file_uffd_wp() to zap_drop_markers() - the
>> > function checks the ZAP_FLAG_DROP_MARKER flag so naming it for this single
>> > purpose was simply incorrect.
>> >
>> > We then reuse the same logic to determine whether a zap should clear a
>> > guard entry - this should only be performed on teardown and never on
>> > MADV_DONTNEED or the like.
>>
>> Since I would have personally put MADV_FREE among "or the like" here, it's
>> surprising to me that it in fact it's tearing down the guard entries now. Is
>> that intentional? It should be at least mentioned very explicitly. But I'd
>> really argue against it, as MADV_FREE is to me a weaker form of
>> MADV_DONTNEED - the existing pages are not zapped immediately but
>> prioritized for reclaim. If MADV_DONTNEED leaves guard PTEs in place, why
>> shouldn't MADV_FREE too?
>
> That is not, as I understand it, what MADV_FREE is, semantically. From the
> man pages:
>
>        MADV_FREE (since Linux 4.5)
>
>               The application no longer requires the pages in the range
>               specified by addr and len. The kernel can thus free these
>               pages, but the freeing could be delayed until memory pressure
>               occurs.
>
>        MADV_DONTNEED
>
>               Do not expect access in the near future. (For the time
>               being, the application is finished with the given range, so
>               the kernel can free resources associated with it.)
>
> MADV_FREE is 'we are completely done with this range'. MADV_DONTNEED is 'we
> don't expect to use it in the near future'.

I think the description gives a wrong impression. What I think matters is
what happens (limited to the anon private case, as MADV_FREE doesn't support
any other):

MADV_DONTNEED - pages are discarded immediately; further access gives new
zero-filled pages.

MADV_FREE - pages are prioritized for discarding. If that happens before the
next write, the next access gets a zero-filled page, but a write done soon
enough can cancel the upcoming discard.

In that sense, MADV_FREE is a weaker form of DONTNEED, no?

>>
>> Seems to me rather currently an artifact of MADV_FREE implementation - if it
>> encounters hwpoison entries it will tear them down because why not, we have
>> detected a hw memory error and are lucky the program wants to discard the
>> pages and not access them, so best use the opportunity and get rid of the
>> PTE entries immediately (if MADV_DONTNEED doesn't do that too, it certainly
>> could).
>
> Right, but we explicitly do not tear them down in the case of MADV_DONTNEED
> which matches the description in the manpages that the user _might_ come
> back to the range, whereas MADV_FREE means they are truly done but just
> don't want the overhead of actually unmapping at this point.

But it's also defined what happens if the user comes back to the range after
a MADV_FREE. I think the overhead is saved in the case of actually coming
back soon enough to prevent the discard. With MADV_DONTNEED it's immediate
and unconditional.

> Seems to be this is moreso that MADV_FREE is a not-really-as-efficient
> version of what Rik wants to do with his MADV_LAZYFREE thing.

I think that further optimizes MADV_FREE, which is already more optimized
than MADV_DONTNEED.

>>
>> But to extend this to guard PTEs which are result of an explicit userspace
>> action feels wrong, unless the semantics is the same for MADV_DONTEED.
>> The semantics chosen for MADV_DONTNEED makes sense, so MADV_FREE should behave
>> the same?
>
> My understanding from the above is that MADV_FREE is a softer version of
> munmap(), i.e. 'totally done with this range', whereas MADV_DONTNEED is a
> 'revert state to when I first mapped this stuff because I'm done with it
> for now but might use it later'.

From the implementation I get the opposite understanding. Neither tears down
the vma like a proper munmap(). MADV_DONTNEED zaps page tables immediately;
MADV_FREE effectively does too, but with a delay that depends on memory
pressure.
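
For anyone who wants to poke at the userspace-visible side of this, here is a
minimal sketch, assuming Linux and Python 3.8+ (for mmap.madvise). The helper
names are made up for illustration and are not from the patch series. It only
exercises ordinary private anonymous pages and says nothing about how guard
markers are handled. It shows that the mapping stays usable after either
advice, that MADV_DONTNEED gives zero-filled pages on the next access, and
that a write after MADV_FREE is preserved.

```python
import mmap

PAGE = mmap.PAGESIZE


def new_private_anon(npages=1):
    """Map private anonymous memory, the only kind MADV_FREE supports."""
    return mmap.mmap(-1, npages * PAGE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE)


def after_dontneed():
    """Dirty a page, MADV_DONTNEED it, and return what a later read sees."""
    m = new_private_anon()
    m[0:4] = b"data"
    m.madvise(mmap.MADV_DONTNEED)
    # The vma is still there, so this read does not fault fatally; the old
    # page was discarded and a zero-filled one is faulted in instead.
    seen = m[0:4]
    m.close()
    return seen


def write_after_free():
    """Dirty a page, MADV_FREE it, write again, and return what a read sees.

    Returns None where MADV_FREE is unavailable or rejected (assumption:
    older kernels or restricted sandboxes may not support it).
    """
    advice = getattr(mmap, "MADV_FREE", None)
    if advice is None:
        return None
    m = new_private_anon()
    m[0:4] = b"data"
    try:
        m.madvise(advice)
    except OSError:
        m.close()
        return None
    # If the page has not been reclaimed yet, this write cancels the pending
    # discard; if it has, the write lands on a fresh zero-filled page. Either
    # way the bytes written here are what a subsequent read returns.
    m[0:4] = b"more"
    seen = m[0:4]
    m.close()
    return seen
```

Reading the range after MADV_FREE without writing first is deliberately not
checked: whether the old contents or zeroes come back depends on whether
reclaim got to the page in the meantime.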