Date: Mon, 11 Mar 2024 16:41:28 -0700
From: "Andy Lutomirski" <luto@kernel.org>
To: "Dave Hansen", "Pasha Tatashin", "Linux Kernel Mailing List",
	linux-mm@kvack.org, "Andrew Morton", "the arch/x86 maintainers",
	"Borislav Petkov", "Christian Brauner", bristot@redhat.com,
	"Ben Segall", "Dave Hansen", dianders@chromium.org,
	dietmar.eggemann@arm.com, eric.devolder@oracle.com,
	hca@linux.ibm.com, "hch@infradead.org", "H. Peter Anvin",
	"Jacob Pan", "Jason Gunthorpe", jpoimboe@kernel.org,
	"Joerg Roedel", juri.lelli@redhat.com, "Kent Overstreet",
	kinseyho@google.com, "Kirill A. Shutemov", lstoakes@gmail.com,
	mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com,
	"Ingo Molnar", mjguzik@gmail.com, "Michael S. Tsirkin",
	"Nicholas Piggin", "Peter Zijlstra (Intel)", "Petr Mladek",
	"Rick P Edgecombe", "Steven Rostedt", "Suren Baghdasaryan",
	"Thomas Gleixner", "Uladzislau Rezki", vincent.guittot@linaro.org,
	vschneid@redhat.com
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
	<20240311164638.2015063-12-pasha.tatashin@soleen.com>
	<3e180c07-53db-4acb-a75c-1a33447d81af@app.fastmail.com>
Content-Type: text/plain

On Mon, Mar 11, 2024, at 4:34 PM, Dave Hansen wrote:
> On 3/11/24 15:17, Andy Lutomirski wrote:
>> I *think* that all x86 implementations won't fill the TLB for a
>> non-accessed page without also setting the accessed bit,
>
> That's my understanding as well.
> The SDM is a little more obtuse about it:
>
>> Whenever the processor uses a paging-structure entry as part of
>> linear-address translation, it sets the accessed flag in that entry
>> (if it is not already set).
>
> but it's there.
>
> But if we start needing Accessed=1 to be accurate, clearing those PTEs
> gets more expensive because it needs to be atomic to lock out the page
> walker.  It basically needs to start getting treated similarly to what
> is done for Dirty=1 on userspace PTEs.  Not the end of the world, of
> course, but one more source of overhead.

In my fantasy land where I understand the x86 paging machinery, suppose
we're in finish_task_switch(), and suppose prev is Not Horribly Buggy
(TM).  In particular, suppose that no other CPU is concurrently
(non-speculatively!) accessing prev's stack.  Prev can't be running,
because whatever magic lock prevents it from being migrated hasn't been
released yet.  (I have no idea what lock this is, but it had darned
well better exist so prev isn't migrated before switch_to() even
returns.)

So the current CPU is not accessing the memory, no other CPU is
accessing the memory, BPF doesn't exist so no one is being utterly daft
and running a kernel read probe, perf isn't up to any funny business,
etc.  And a CPU will never *speculatively* set the accessed bit (I told
you it's fantasy land), so we just do it unlocked:

    if (!pte->accessed) {
        *pte = 0;
        /* reuse the memory */
    }

What could possibly go wrong?

I admit this is not the best idea I've ever had, and I will not waste
anyone's time by trying very hard to defend it :)