From: Hillf Danton
To: Ryan Roberts
Cc: Vlastimil Babka, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: What size anonymous folios should we allocate?
Date: Tue, 28 Mar 2023 19:56:48 +0800
Message-Id: <20230328115648.2557-1-hdanton@sina.com>
In-Reply-To: <7981dd12-4e56-a449-980b-52f27279df81@arm.com>
References: <022e1c15-7988-9975-acbc-e661e989ca4a@suse.cz>

On 28 Mar 2023 11:12:54 +0100 Ryan Roberts
> On 27/03/2023 16:48, Vlastimil Babka wrote:
> >
> > Hm if it's 4 entries on arm64 and presumably 8 on AMD, maybe we can only
> > care about how actively accessed are the individual "subpages" above that
> > size, to avoid dealing with this uncertainty whether HW tracks them. At such
> > smallish sizes we shouldn't induce massive overhead?
>
> I'm not sure I've fully understood this point. For arm64's HPA, there is no
> "uncertainty [about] whether HW tracks them"; HW will always track access/dirty
> individually for each base page. The problem is the inverse; if SW (or HW) sets
> those bits differently in each page, then TLB coalescing performance may
> decrease. Or are you actually suggesting that SW should always set the bits the
> same for a 4 or 8 page run, and forgo the extra granularity?

That inverse side looks like a bleak cloud above anon order-5/6 pages, for
instance.

> >> I'm hearing that there are workloads where being able to use the contiguous bit
> >> really does make a difference, so I would like to explore solutions that can
> >> work when we only have access/dirty at the folio level.
> >
> > And on the higher orders where we have explicit control via bits, we could
> > split the explicitly contiguous mappings once in a while to determine if the
> > sub-folios are still accessed? Although maybe with 16x4kB pages limit it may
> > still be not worth the trouble?
>
> I have a bigger-picture question; why is it useful to split these large folios?
> I think there are 2 potential reasons (but would like to be educated):
>
> 1. If a set of sub-pages that were pre-faulted as part of a large folio have
> _never_ been accessed and we are under memory pressure, I guess we would like to
> split the folio and free those pages?
>
> 2. If a set of subpages within a folio are cold (but were written in the past)
> and a separate set of subpages within the same folio are hot and we are under
> memory pressure, we would like to swap out the cold pages?
>
> If the first reason is important, I guess we would want to initially map
> non-contig, then only remap as contig once every subpage has been touched at
> least once.
>
> For the second reason, my intuition says that a conceptual single access and
> dirty bit per folio should be sufficient, and folios could be split from
> time-to-time to see if one half is cold?

It makes no sense to detect hot pages at the cost of an order-5 compound page,
particularly given the bleakness above.
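
For reference, a rough sketch of the sizes and the bit-uniformity condition discussed above. It assumes 4 kB base pages (as in the quoted "16x4kB" figure). All names and flag values here are hypothetical and are not kernel APIs or a real PTE layout. The uniformity check is only a toy model of the condition Ryan describes, not how hardware or the kernel actually decides on coalescing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 kB base pages, matching the "16x4kB" figure quoted above. */
#define SKETCH_PAGE_SHIFT 12u

/* Hypothetical per-page flags; not the PTE layout of any real architecture. */
#define SKETCH_ACCESSED 0x1u
#define SKETCH_DIRTY    0x2u

/* Number of base pages covered by a folio of the given order. */
static unsigned long sketch_nr_pages(unsigned int order)
{
	assert(order < 20);	/* keep the shifts below well within range */
	return 1UL << order;
}

/* Folio size in kB for the given order. */
static unsigned long sketch_size_kb(unsigned int order)
{
	return sketch_nr_pages(order) << (SKETCH_PAGE_SHIFT - 10);
}

/*
 * True if every page in the run carries the same accessed/dirty flags,
 * i.e. no page in the run has had its bits set differently from the rest.
 */
static bool sketch_run_is_uniform(const unsigned int *flags, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (flags[i] != flags[0])
			return false;
	return true;
}
```

Under these assumptions an order-5 folio is 32 base pages (128 kB) and an order-6 folio is 64 base pages (256 kB), against the 16-page (64 kB) contiguous run mentioned in the quoted text. A single page whose accessed or dirty bit differs from its neighbours makes the run non-uniform in this model.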