Date: Fri, 14 Apr 2023 17:06:49 +0100
From: Ryan Roberts
To: "Kirill A. Shutemov"
Cc: Andrew Morton, "Matthew Wilcox (Oracle)", Yu Zhao, "Yin, Fengwei", linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC v2 PATCH 05/17] mm: Routines to determine max anon folio allocation order
In-Reply-To: <20230414153747.n5kyhvb5a726lvrz@box.shutemov.name>
References: <20230414130303.2345383-1-ryan.roberts@arm.com> <20230414130303.2345383-6-ryan.roberts@arm.com> <20230414140948.7pcaz6niyr2tpa7s@box.shutemov.name> <2b76ee7e-06d1-94ca-d22e-46b6302b7c30@arm.com> <20230414153747.n5kyhvb5a726lvrz@box.shutemov.name>

On 14/04/2023 16:37, Kirill A. Shutemov wrote:
> On Fri, Apr 14, 2023 at 03:38:35PM +0100, Ryan Roberts wrote:
>> On 14/04/2023 15:09, Kirill A. Shutemov wrote:
>>> On Fri, Apr 14, 2023 at 02:02:51PM +0100, Ryan Roberts wrote:
>>>> For variable-order anonymous folios, we want to tune the order that we
>>>> prefer to allocate based on the vma. Add the routines to manage that
>>>> heuristic.
>>>>
>>>> TODO: Currently we always use the global maximum. Add per-vma logic!
>>>>
>>>> Signed-off-by: Ryan Roberts
>>>> ---
>>>>  include/linux/mm.h | 5 +++++
>>>>  mm/memory.c        | 8 ++++++++
>>>>  2 files changed, 13 insertions(+)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index cdb8c6031d0f..cc8d0b239116 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -3674,4 +3674,9 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
>>>>  }
>>>>  #endif
>>>>
>>>> +/*
>>>> + * TODO: Should this be set per-architecture?
>>>> + */
>>>> +#define ANON_FOLIO_ORDER_MAX 4
>>>> +
>>>
>>> I think it has to be derived from size in bytes, not directly specifies
>>> page order. For 4K pages, order 4 is 64k and for 64k pages it is 1M.
>>>
>>
>> Yes I see where you are coming from. What's your feel for what a sensible upper
>> bound in bytes is?
>>
>> My difficulty is that I would like to be able to use this allocation mechanism
>> to enable using the "contiguous bit" on arm64; that's a set of contiguous PTEs
>> that are mapped to physically contiguous memory, and the HW can use that hint to
>> coalesce the TLB entries.
>>
>> For 4KB pages, the contig size is 64KB (order-4), so that works nicely. But for
>> 16KB and 64KB pages, its 2MB (order-7 and order-5 respectively). Do you think
>> allocating 2MB pages here is going to lead to too much memory wastage?
>
> I think it boils down to the specifics of the microarchitecture.
>
> We can justify 2M PMD-mapped THP in many cases. But PMD-mapped THP is not
> only reduces TLB pressure (that contiguous bit does too, I believe), but
> also saves one more memory access on page table walk.
>
> It may or may not matter for the processor. It has to be evaluated.

I think you are saying that if the performance uplift is good, then some extra
memory wastage can be justified?

The point I'm thinking about is that for 4K pages, we need to allocate 64K
blocks to use the contig bit. Roughly, I guess that means going from an average
of 2K wastage per anon VMA to 32K. Perhaps you can get away with that for a
decent perf uplift.

But for 64K pages, we need to allocate 2M blocks to use the contig bit. So that
takes average wastage from 32K to 1M. That feels a bit harder to justify.
Perhaps here, we should make a decision based on MADV_HUGEPAGE?

So perhaps we actually want 2 values: one for when MADV_HUGEPAGE is not set on
the VMA, and one for when it is? (With 64K pages I'm guessing there are many
cases where we won't PMD-map THPs, since the PMD size is 512MB.)

>
> Maybe moving it to per-arch is the right way. With default in generic code
> to be ilog2(SZ_64K >> PAGE_SIZE) or something.

Yes, I agree that sounds like a good starting point for the !MADV_HUGEPAGE case.
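
To make the numbers above concrete, here is a rough userspace sketch of the arithmetic, not kernel code. It assumes the suggested default is meant as a shift by the page shift (i.e. ilog2(SZ_64K >> PAGE_SHIFT)), and that average wastage is half a folio because a VMA ends at a uniformly random offset within its last folio. The helper names (ilog2_ul, anon_folio_order_for_bytes, anon_folio_order_max, avg_wastage_bytes) are made up for illustration, and the two-value MADV_HUGEPAGE policy is only the idea floated in this mail, not something the patch implements.

```c
#include <assert.h>
#include <stdbool.h>

#define SZ_64K (64UL * 1024)
#define SZ_2M  (2UL * 1024 * 1024)

/* Floor of log2(n) for n > 0; a stand-in for the kernel's ilog2(). */
static unsigned int ilog2_ul(unsigned long n)
{
	unsigned int order = 0;

	assert(n > 0);
	while (n >>= 1)
		order++;
	return order;
}

/*
 * Hypothetical helper: largest folio order whose size does not exceed
 * 'bytes' for a given page shift (12 = 4K, 14 = 16K, 16 = 64K pages).
 * Falls back to order 0 when 'bytes' is smaller than one page.
 */
static unsigned int anon_folio_order_for_bytes(unsigned long bytes,
					       unsigned int page_shift)
{
	unsigned long pages = bytes >> page_shift;

	return pages ? ilog2_ul(pages) : 0;
}

/*
 * Hypothetical two-value policy: cap at 64K by default, and at the arm64
 * contiguous-bit block size quoted in this thread (64K for 4K pages, 2M
 * for 16K and 64K pages) when MADV_HUGEPAGE is set on the VMA.
 */
static unsigned int anon_folio_order_max(unsigned int page_shift,
					 bool madv_hugepage)
{
	unsigned long contig_bytes = (page_shift == 12) ? SZ_64K : SZ_2M;
	unsigned long cap = madv_hugepage ? contig_bytes : SZ_64K;

	return anon_folio_order_for_bytes(cap, page_shift);
}

/* Average tail wastage under the assumption above: half the folio size. */
static unsigned long avg_wastage_bytes(unsigned int order,
				       unsigned int page_shift)
{
	return (1UL << (order + page_shift)) / 2;
}
```

With these assumptions, the default cap works out to order 4, 2 and 0 for 4K, 16K and 64K pages, the MADV_HUGEPAGE cap to order 4, 7 and 5, and avg_wastage_bytes() reproduces the 2K to 32K (4K pages) and 32K to 1M (64K pages) figures above.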