From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Gregory Price
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, nehagholkar@meta.com, abhishekd@meta.com, kernel-team@meta.com, david@redhat.com, nphamcs@gmail.com, akpm@linux-foundation.org, hannes@cmpxchg.org, kbusch@meta.com
Subject: Re: [RFC v2 PATCH 0/5] Promotion of Unmapped Page Cache Folios.
In-Reply-To: (Gregory Price's message of "Tue, 31 Dec 2024 02:32:55 -0500")
References: <20241210213744.2968-1-gourry@gourry.net> <87o715r4vn.fsf@DESKTOP-5N7EMDA> <87wmfsi47b.fsf@DESKTOP-5N7EMDA> <87v7v5g99x.fsf@DESKTOP-5N7EMDA>
Date: Thu, 02 Jan 2025 10:58:40 +0800
Message-ID: <87ed1lexb3.fsf@DESKTOP-5N7EMDA>
Gregory Price writes:

> On Fri, Dec 27, 2024 at 10:38:45PM -0500, Gregory Price wrote:
>> On Fri, Dec 27, 2024 at 02:09:50PM -0500, Gregory Price wrote:
>>
>> This seems to imply that the overhead we're seeing from read() even
>> when the file cache is on the remote node isn't actually related to
>> memory speed, but is instead likely related to some kind of stale
>> metadata in the filesystem or file cache layers.
>>
>> ~Gregory
>
> Mystery solved:
>
>> +void promotion_candidate(struct folio *folio)
>> +{
> ... snip ...
>> +	list_add(&folio->lru, promo_list);
>> +}
>
> read(file, length) does a linear read, and promotion_candidate() adds
> each page at the head of the promotion list, resulting in a reversed
> promotion order.
>
> So if you read folios [1,2,3,4], you'll promote them in [4,3,2,1] order.
>
> On an unloaded system, the result is essentially that pages end up in
> the worst possible layout for the prefetcher, and therefore for TLB
> hits. I figured this out because I was seeing the additional ~30%
> overhead show up purely in `copy_page_to_iter()` (i.e. copy_to_user).
>
> Swapping this for list_add_tail results in the following test result:
>
> initializing
> Read loop took 9.41 seconds   <- reading from CXL
> Read loop took 31.74 seconds  <- migration enabled
> Read loop took 10.31 seconds

This shows that migration causes a significant disturbance to the
workload, which may not be acceptable in real life. Can you check
whether the promotion rate limit can improve the situation?
> Read loop took 7.71 seconds   <- migration finished
> Read loop took 7.71 seconds
> Read loop took 7.70 seconds
> Read loop took 7.75 seconds
> Read loop took 19.34 seconds  <- dropped caches
> Read loop took 13.68 seconds  <- cache refilling to DRAM
> Read loop took 7.37 seconds
> Read loop took 7.68 seconds
> Read loop took 7.65 seconds   <- back to DRAM baseline
>
> On our CXL devices, we're seeing a 22-27% performance penalty for a file
> hosted entirely out of CXL. When we promote this file out of CXL, we see
> a 22-27% performance boost.

This is a good number! Thanks!

> list_add_tail is probably right here, and since files *tend to* be read
> linearly with `read()`, this should *tend toward* optimal. That said, we
> could probably make this more reliable by adding a batch migration
> function `mpol_migrate_misplaced_batch()` that also tries to do bulk
> allocation of destination folios. That would also probably save us a
> bunch of invalidation overhead.
>
> I'm also noticing that the migration limit (256MB/s) is not being
> respected, probably because we're migrating one folio at a time instead
> of a batch. I will probably look at changing promotion_candidate() to
> limit the number of pages selected for promotion per read() call.

The migration rate limit is checked in should_numa_migrate_memory(). You
may want to take a look at that function.

> ---
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f965814b7d40..99b584f22bcb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2675,7 +2675,7 @@ void promotion_candidate(struct folio *folio)
>  		folio_putback_lru(folio);
>  		return;
>  	}
> -	list_add(&folio->lru, promo_list);
> +	list_add_tail(&folio->lru, promo_list);
>
>  	return;
>  }

[snip]

---
Best Regards,
Huang, Ying