Date: Tue, 21 Feb 2023 22:08:01 -0500
From: Kent Overstreet
To: Yang Shi
Cc: Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: What size anonymous folios should we allocate?

On Tue, Feb 21, 2023 at 03:05:33PM -0800, Yang Shi wrote:
> On Tue, Feb 21, 2023 at 1:49 PM Matthew Wilcox wrote:
> >
> > In a sense this question is premature, because we don't have any code
> > in place to handle folios which are any size but PMD_SIZE or PAGE_SIZE,
> > but let's pretend that code already exists and is just waiting for us
> > to answer this
> > policy question.
> >
> > I'd like to reject three ideas up front: 1. a CONFIG option, 2. a boot
> > option and 3. a sysfs tunable.  It is foolish to expect the distro
> > packager or the sysadmin to be able to make such a decision.  The
> > correct decision will depend upon the instantaneous workload of the
> > entire machine and we'll want different answers for different VMAs.
>
> Yeah, I agree those 3 options should be avoided. For some
> architectures, there are a or multiple sweet size(s) benefiting from
> hardware. For example, ARM64 contiguous PTE supports up to 16
> consecutive 4K pages to form a 64K entry in TLB instead of 16 4K
> entries. Some implementations may support intermediate sizes (for
> example, 8K, 16K and 32K, but this may make the hardware design
> harder), but some may not. AMD's coalesce PTE supports a different
> size (128K if I remember correctly). So the multiple of the size
> supported by hardware (64K or 128K) seems like the common ground from
> maximizing hardware benefit point of view. Of course, nothing prevents
> the kernel from allocating other orders.
>
> ARM even supports contiguous PMD, but that would be too big to
> allocate by buddy allocator.

Every time this discussion comes up it seems like MM people have a major
blind spot, where they're only thinking about PTE looking and TLB
overhead and forgetting every other codepath in the kernel that deals
with cached data - historically one physical page at a time.

By framing the discussion in terms of what's best for the hardware,
you're screwing over all the pure software codepaths.

This stupidity has gone on for long enough with the ridiculous
normalpage/hugepage split; let's not continue it.

Talk to any filesystem person: you don't want to fragment data
unnecessarily. That's effectively what you're advocating for, by
continuing to talk about hardware page sizes.

You need to get away from designing things around hardware limitations
and think in more general terms.

The correct answer is "anonymous pages should be any power of two size".
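
To make "any power of two size" concrete, here is a rough userspace sketch in Python (not kernel code). It assumes 4K base pages and a 2M PMD, which is only one common configuration; folio_sizes() is a hypothetical helper name used for illustration, not an existing kernel or library function.

```python
# Illustrative userspace sketch, not kernel code.
# Assumed values: 4 KiB base pages and a 2 MiB PMD (one common configuration).
PAGE_SIZE = 4 * 1024
PMD_SIZE = 2 * 1024 * 1024


def folio_sizes(page_size=PAGE_SIZE, max_size=PMD_SIZE):
    """Return (order, size_in_bytes) for every power-of-two folio size
    from a single base page up to max_size, inclusive."""
    sizes = []
    order = 0
    while (page_size << order) <= max_size:
        sizes.append((order, page_size << order))
        order += 1
    return sizes


if __name__ == "__main__":
    for order, size in folio_sizes():
        print(f"order {order}: {size // 1024}K")
```

Under those assumptions, the 64K and 128K sizes mentioned in the quoted text are just order 4 and order 5 in this list, two candidates among every order from 0 up to the PMD order.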