Date: Tue, 31 Dec 2024 02:32:55 -0500
From: Gregory Price
To: "Huang, Ying"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, nehagholkar@meta.com, abhishekd@meta.com, kernel-team@meta.com, david@redhat.com, nphamcs@gmail.com, akpm@linux-foundation.org, hannes@cmpxchg.org, kbusch@meta.com
Subject: Re: [RFC v2 PATCH 0/5] Promotion of Unmapped Page Cache Folios.
References: <20241210213744.2968-1-gourry@gourry.net> <87o715r4vn.fsf@DESKTOP-5N7EMDA> <87wmfsi47b.fsf@DESKTOP-5N7EMDA> <87v7v5g99x.fsf@DESKTOP-5N7EMDA>

On Fri, Dec 27, 2024 at 10:38:45PM -0500, Gregory
Price wrote:
> On Fri, Dec 27, 2024 at 02:09:50PM -0500, Gregory Price wrote:
>
> This seems to imply that the overhead we're seeing from read() even
> when filecache is on the remote node isn't actually related to the
> memory speed, but instead likely related to some kind of stale
> metadata in the filesystem or filecache layers.
>
> ~Gregory

Mystery solved

> +void promotion_candidate(struct folio *folio)
> +{
... snip ...
> +	list_add(&folio->lru, promo_list);
> +}

read(file, length) will do a linear read, and promotion_candidate will
add those pages to the head of the promotion list, resulting in a
reversed promotion order:

if you read folios [1,2,3,4], you'll promote them in [4,3,2,1] order.

The result of this, on an unloaded system, is essentially that pages end
up in the worst possible configuration for the prefetcher, and therefore
for TLB hits.

I figured this out because I was seeing the additional ~30% overhead
show up purely in `copy_page_to_iter()` (i.e. copy_to_user).

Swapping this for list_add_tail results in the following test result:

initializing
Read loop took 9.41 seconds  <- reading from CXL
Read loop took 31.74 seconds <- migration enabled
Read loop took 10.31 seconds
Read loop took 7.71 seconds  <- migration finished
Read loop took 7.71 seconds
Read loop took 7.70 seconds
Read loop took 7.75 seconds
Read loop took 19.34 seconds <- dropped caches
Read loop took 13.68 seconds <- cache refilling to DRAM
Read loop took 7.37 seconds
Read loop took 7.68 seconds
Read loop took 7.65 seconds  <- back to DRAM baseline

On our CXL devices, we're seeing a 22-27% performance penalty for a file
being hosted entirely out of CXL.  When we promote this file out of CXL,
we see a 22-27% performance boost.

list_add_tail is probably right here: since files *tend to* be read
linearly with `read()`, this should *tend toward* optimal.
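The ordering effect is easy to reproduce outside the kernel.  What follows
is only a userspace sketch, not kernel code: `struct node`, `list_init`,
`push_head`, `push_tail`, `drain_order` and `queue_and_drain` are
hypothetical stand-ins for `struct list_head`, `list_add()`,
`list_add_tail()` and a consumer that is assumed to walk the promotion
list front to back.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head.
 * Userspace stand-in only; names here are hypothetical. */
struct node {
	struct node *prev, *next;
	int id;		/* stands in for a folio's position in the file */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Insert right after head: the position list_add() uses. */
static void push_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Insert right before head: the position list_add_tail() uses. */
static void push_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Walk front to back, recording ids in out[]; returns the count. */
static size_t drain_order(const struct node *head, int *out, size_t cap)
{
	size_t n = 0;

	for (const struct node *p = head->next; p != head; p = p->next) {
		assert(n < cap);
		out[n++] = p->id;
	}
	return n;
}

/* Queue ids 1..count in read order, then report the order a
 * front-to-back consumer (assumed here) would see them in. */
static size_t queue_and_drain(struct node *nodes, size_t count,
			      int at_tail, int *out)
{
	struct node head;

	list_init(&head);
	for (size_t i = 0; i < count; i++) {
		nodes[i].id = (int)i + 1;
		if (at_tail)
			push_tail(&nodes[i], &head);
		else
			push_head(&nodes[i], &head);
	}
	return drain_order(&head, out, count);
}
```

With four nodes, `queue_and_drain(nodes, 4, 0, out)` fills `out` with
4, 3, 2, 1, while passing `at_tail = 1` yields 1, 2, 3, 4, which matches
the reversal described above.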

That said, we can probably make this more reliable by adding a batch
migration function `mpol_migrate_misplaced_batch()` which also tries
to do bulk allocation of destination folios.  This will also probably
save us a bunch of invalidation overhead.

I'm also noticing that the migration limit (256mbps) is not being
respected, probably because we're doing 1 folio at a time instead of a
batch.  Will probably look at changing promotion_candidate to limit
the number of selected pages to promote per read-call.

---

diff --git a/mm/migrate.c b/mm/migrate.c
index f965814b7d40..99b584f22bcb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2675,7 +2675,7 @@ void promotion_candidate(struct folio *folio)
 		folio_putback_lru(folio);
 		return;
 	}
-	list_add(&folio->lru, promo_list);
+	list_add_tail(&folio->lru, promo_list);
 
 	return;
 }