Date: Fri, 14 Jun 2024 11:45:18 +0900
From: Byungchul Park
To: Michal Hocko
Cc: Matthew Wilcox, Dave Hansen, David Hildenbrand, Byungchul Park,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel_team@skhynix.com, akpm@linux-foundation.org,
	ying.huang@intel.com, vernhao@tencent.com,
	mgorman@techsingularity.net, hughd@google.com, peterz@infradead.org,
	luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped
Message-ID: <20240614024518.GB47085@system.software.com>

On Tue, Jun 11, 2024 at 01:55:05PM +0200, Michal Hocko wrote:
> On Tue 11-06-24 09:55:23, Byungchul Park wrote:
> > On Mon, Jun 10, 2024 at 03:23:49PM +0200, Michal Hocko wrote:
> > > On Tue 04-06-24 09:34:48, Byungchul Park wrote:
> > > > On Mon, Jun 03, 2024 at 06:01:05PM +0100, Matthew Wilcox wrote:
> > > > > On Mon, Jun 03, 2024 at 09:37:46AM -0700, Dave Hansen wrote:
> > > > > > Yeah, we'd need some equivalent of a PTE marker, but for the page cache.
> > > > > > Presumably some xa_value() that means a reader has to go do a
> > > > > > luf_flush() before going any farther.
> > > > >
> > > > > I can allocate one for that.  We've got something like 1000 currently
> > > > > unused values which can't be mistaken for anything else.
> > > > >
> > > > > > That would actually have a chance at fixing two issues: One where a new
> > > > > > page cache insertion is attempted.  The other where someone goes to look
> > > > > > in the page cache and takes some action _because_ it is empty (I think
> > > > > > NFS is doing some of this for file locks).
> > > > > >
> > > > > > LUF is also pretty fundamentally built on the idea that files can't
> > > > > > change without LUF being aware.  That model seems to work decently for
> > > > > > normal old filesystems on normal old local block devices.  I'm worried
> > > > > > about NFS, and I don't know how seriously folks take FUSE, but it
> > > > > > obviously can't work well for FUSE.
> > > > >
> > > > > I'm more concerned with:
> > > > >
> > > > >  - page goes back to buddy
> > > > >  - page is allocated to slab
> > > >
> > > > At this point, tlb flush needed will be performed in prep_new_page().
> > >
> > > But that does mean that an unaware caller would get an additional
> > > overhead of the flushing, right? I think it would be just a matter of
> >
> > pcp for locality is already a better source of side channel attack.  FYI,
> > tlb flush gets barely performed only if pending tlb flush exists.
>
> Right but rare and hard to predict latencies are much worse than
> consistent ones.

No doubt it would be best to keep things as consistent as possible.
What matters is how consistent *we require* it to be.  Let me know the
criteria for that, if any, and I will check luf against them.

> > > time before somebody can turn that into a side channel attack, not to
> > > mention unexpected latencies introduced.
> >
> > Nope.  The pending tlb flush performed in prep_new_page() is the one
> > that would've done already with the vanilla kernel.  It's not additional
> > tlb flushes but it's subset of all the skipped ones.
>
> But those skipped ones could have happened in a completely different
> context (e.g. a different process or even a different security domain),
> right?

Right.

> > It's worth noting all the existing mm reclaim mechanisms have already
> > introduced worse unexpected latencies.
>
> Right, but a reclaim, especially direct reclaim, are expected to be
> slow. It is much different to see spike latencies on system with a lot
> of memory.

Are you talking about RT systems?  IMHO, an RT system should prevent its
memory from being reclaimed in the first place, since reclaim adds
unexpected latencies.

Reclaim and migration already introduce unexpected latencies themselves.
Why do only the latencies introduced by luf matter?  I'm asking to
understand what you mean, so that I can fix luf if there is anything to
fix.

   vanilla
   -------

   alloc_page() {
      ...
      preempted by kswapd or direct reclaim {
         ...
         reclaim
            unmap file pages
            tlb shootdown
         ...
         migration
            unmap pages
            tlb shootdown
         ...
      }
      ...
      interrupted by tlb shootdown from other CPUs {
         ...
      }
      ...
      prep_new_page() {
         ...
      }
   }

   with luf
   --------

   alloc_page() {
      ...
      preempted by kswapd or direct reclaim {
         ...
         reclaim
            unmap file pages
            (skip tlb shootdown)
         ...
         migration
            unmap pages
            (skip tlb shootdown)
         ...
      }
      ...
      interrupted by tlb shootdown from other CPUs {
         ...
      }
      ...
      prep_new_page() {
         ...
         /*
          * This can be a tlb shootdown that was skipped in this
          * context or in others.
          */
         tlb shootdown with much smaller cpumask
         ...
      }
   }

I really want to understand why only the latencies introduced by luf
matter.  Why don't the latencies already introduced in vanilla matter?

	Byungchul

> --
> Michal Hocko
> SUSE Labs
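
To make the vanilla vs. luf comparison above a bit more concrete, here is
a minimal user-space toy model in Python.  It is only a sketch and not
kernel code: ToyModel, ToyPage, unmap_vanilla(), unmap_luf() and
cpu_flushed() are hypothetical names made up for illustration, and only
prep_new_page() borrows its name from the discussion.  It also assumes
that a CPU which has flushed its TLB for some other reason in the
meantime drops out of the pending set, which is one way the deferred
shootdown could end up with a smaller cpumask.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPage:
    # CPUs that may still cache a stale translation for this page.
    pending_cpus: set = field(default_factory=set)


class ToyModel:
    """Toy bookkeeping model; none of these names are kernel APIs."""

    def __init__(self):
        self.pages = []
        # Log of (context, frozenset of target CPUs), one per shootdown.
        self.shootdowns = []

    def new_page(self):
        page = ToyPage()
        self.pages.append(page)
        return page

    def unmap_vanilla(self, page, mapped_cpus):
        # Vanilla: the shootdown is paid in the reclaim/migration context.
        self.shootdowns.append(("reclaim", frozenset(mapped_cpus)))

    def unmap_luf(self, page, mapped_cpus):
        # luf: skip the shootdown, only remember who may hold stale entries.
        page.pending_cpus |= set(mapped_cpus)

    def cpu_flushed(self, cpu):
        # Assumption: a CPU that flushed its TLB for any other reason
        # no longer needs to be targeted by the deferred flush.
        for page in self.pages:
            page.pending_cpus.discard(cpu)

    def prep_new_page(self, page):
        # Flush only if something is still pending for this page.
        if page.pending_cpus:
            self.shootdowns.append(("alloc", frozenset(page.pending_cpus)))
            page.pending_cpus.clear()
        return page


if __name__ == "__main__":
    model = ToyModel()
    page = model.new_page()
    model.unmap_luf(page, {0, 1, 2, 3})   # no shootdown logged yet
    for cpu in (0, 1, 2):
        model.cpu_flushed(cpu)            # unrelated flushes shrink the mask
    model.prep_new_page(page)             # deferred flush, allocation context
    for context, cpus in model.shootdowns:
        print(context, sorted(cpus))
```

In this model the total number of shootdowns never exceeds what vanilla
would have issued, but the one that remains is charged to whoever calls
prep_new_page(), which is exactly the "different context" concern raised
above.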