Date: Mon, 4 Nov 2024 13:12:57 -0500
From: Gregory Price <gourry@gourry.net>
To: "Huang, Ying"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, david@redhat.com, nphamcs@gmail.com,
 nehagholkar@meta.com, abhishekd@meta.com, Johannes Weiner, Feng Tang
Subject: Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
References: <20240803094715.23900-1-gourry@gourry.net>
 <875xrxhs5j.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87ikvefswp.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87ikvefswp.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
On Mon, Sep 02, 2024 at 02:53:26PM +0800, Huang, Ying wrote:
> Gregory Price writes:
>
> > On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote:
> >> Gregory Price writes:
> >>
> >> > Unmapped pagecache pages can be demoted to low-tier memory, but
> >> > they can only be promoted if a process maps the pages into the
> >> > memory space (so that NUMA hint faults can be caught).
> >> > This can
> >> > cause significant performance degradation as the pagecache ages
> >> > and unmapped, cached files are accessed.
> >> >
> >> > This patch series enables the pagecache to request a promotion of
> >> > a folio when it is accessed via the pagecache.
> >> >
> >> > We add a new `numa_hint_page_cache` counter in vmstat to capture
> >> > information on when these migrations occur.
> >>
> >> It appears that you will promote page cache page on the second access.
> >> Do you have some better way to identify hot pages from the not-so-hot
> >> pages? How to balance between unmapped and mapped pages? We have hot
> >> page selection for hot pages.
> >>
> >> [snip]
> >>
> >
> > I've since explored moving this down under a (referenced && active) check.
> >
> > This would be more like promotion on third access within an LRU shrink
> > round (the LRU should, in theory, hack off the active bits on some decent
> > time interval when the system is pressured).
> >
> > Barring adding new counters to folios to track hits, I don't see a clear
> > and obvious way to track hotness. The primary observation here is
> > that pagecache is un-mapped, and so cannot use numa-fault hints.
> >
> > This is more complicated with MGLRU, but I'm saving that for after I
> > figure out the plan for plain old LRU.
>
> Several years ago, we tried to use the access time tracking mechanism
> of NUMA balancing to track the access time latency of unmapped file
> cache folios. The original implementation is as follows:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329
>
> What do you think about this?
>

Coming back around to explore this topic a bit more, I dug into this old
patch and the LRU patch by Keith - I'm struggling to find a good option
that doesn't over-complicate things or propose something contentious.
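For reference, my reading of the access-time idea in that tiering patch
boils down to something like the following userspace sketch. All names,
fields, and the threshold here are hypothetical stand-ins - the real
patch operates on kernel folios and NUMA balancing scan timestamps:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the access-time heuristic: record when an
 * unmapped cache folio was last touched, and promote it only when a
 * repeat access arrives quickly (i.e. it looks hot) while the folio
 * sits on a slow node.  Not kernel API. */

#define SLOW_NODE        1
#define HOT_THRESHOLD_MS 1000   /* made-up cutoff for "hot" */

struct mock_folio {
	uint64_t last_access_ms;  /* time of the previous access */
	int nid;                  /* node the folio currently lives on */
};

static bool should_promote(struct mock_folio *f, uint64_t now_ms)
{
	bool hot = (now_ms - f->last_access_ms) < HOT_THRESHOLD_MS;
	bool on_slow_tier = (f->nid == SLOW_NODE);

	f->last_access_ms = now_ms;  /* remember this access for next time */
	return on_slow_tier && hot;
}
```

The appeal of this shape is that it needs only one timestamp per folio
rather than a hit counter, at the cost of picking a latency threshold.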
I did a browse through lore and did not see any discussion of this patch
or of Keith's LRU patch, so I presume discussion on this happened
largely off-list. If you have any context as to why this wasn't RFC'd
officially, I would like more information.

My observations across these three proposals:

- The page-lock state is complex when trying to interpose in
  mark_folio_accessed, meaning inline promotion inside that interface is
  a non-starter. We found one deadlock during task exit due to the PTL
  being held. This worries me more generally, but we did find some
  success changing certain calls to mark_folio_accessed to
  mark_folio_accessed_and_promote - rather than modifying
  mark_folio_accessed itself. This ends up changing code in similar
  places to your hook - but catches more of the conditions that mark a
  page accessed.

- For Keith's proposal, promotions via LRU require memory pressure on
  the lower tier to cause a shrink and therefore promotions. I'm not
  well versed in LRU semantics, but it seems we could try proactive
  reclaim here. Doing promote-reclaim and demote/swap/evict reclaim on
  the same triggers seems counter-intuitive.

- Doing promotions inline with access creates overhead. I've seen some
  research suggesting 60us+ per migration - so aggressiveness could harm
  performance. Doing it async would alleviate inline access overheads -
  but it could also make promotion pointless if the time-to-promote is
  too far removed from the liveness of the pages.

- Doing async promotion may also require something like PG_PROMOTABLE
  (as proposed by Keith's patch), which will obviously be a very
  contentious topic.

tl;dr: I'm leaning towards a solution like you have here, but we may
need a sysfs switch similar to demotion_enabled in case of poor
performance due to heuristically degenerate access patterns, and we may
need to expose some form of adjustable aggressiveness value to make it
tunable.
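To make the wrapper idea concrete, here is a simplified userspace model
of the third-access heuristic gated by an enablement knob. Everything
here is illustrative - the real referenced/active state machine in the
kernel's folio_mark_accessed is more involved, and the knob is only a
stand-in for the sysfs switch suggested above:

```c
#include <stdbool.h>

/* Hypothetical model of a mark_folio_accessed_and_promote() wrapper:
 * the first access sets referenced, the second sets active, and only a
 * third access to a folio on a slow node requests promotion. */

#define SLOW_NODE 1

struct mock_folio {
	bool referenced;
	bool active;
	int nid;
};

static bool pagecache_promotion_enabled = true;  /* stand-in sysfs knob */

static bool mark_accessed_and_promote(struct mock_folio *f)
{
	if (!f->referenced) {
		f->referenced = true;   /* first access */
	} else if (!f->active) {
		f->active = true;       /* second access */
	} else if (pagecache_promotion_enabled && f->nid == SLOW_NODE) {
		return true;            /* third access: request promotion */
	}
	return false;
}
```

Because LRU shrinking periodically clears the active bit under
pressure, the "third access" effectively means three accesses within
one aging round rather than three accesses ever.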
Reading more into the code surrounding this and other migration logic, I
also think we should explore an optimization to mempolicy that tries to
aggressively keep certain classes of memory on the local node (RX memory
and stack, for example). Other areas of reclaim actively avoid demoting
this type of memory, so we should try not to allocate it on a lower tier
in the first place.

~Gregory

> --
> Best Regards,
> Huang, Ying
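The mempolicy idea could be as small as a node-selection helper along
these lines - flag values and the function itself are hypothetical, not
the kernel's mempolicy code:

```c
/* Hypothetical allocation-policy helper: steer executable and stack
 * mappings to the local node up front, since reclaim already tries to
 * avoid demoting them later.  Mock flag values, not the kernel's. */

#define MOCK_VM_EXEC  0x1UL
#define MOCK_VM_STACK 0x2UL

static int preferred_alloc_node(unsigned long vm_flags,
				int local_nid, int policy_nid)
{
	if (vm_flags & (MOCK_VM_EXEC | MOCK_VM_STACK))
		return local_nid;   /* keep text and stack near the CPU */
	return policy_nid;          /* otherwise honor the mempolicy */
}
```

The point is only that the class of the mapping, not just the
mempolicy, would feed into the initial node choice.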