From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82654C001DE for ; Thu, 10 Aug 2023 10:41:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2196A6B0071; Thu, 10 Aug 2023 06:41:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1C9A66B0074; Thu, 10 Aug 2023 06:41:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0B96B6B0075; Thu, 10 Aug 2023 06:41:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id F009F6B0071 for ; Thu, 10 Aug 2023 06:40:59 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id D15FCB1A60 for ; Thu, 10 Aug 2023 10:40:59 +0000 (UTC) X-FDA: 81107852238.07.8D38D82 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf10.hostedemail.com (Postfix) with ESMTP id D6EB5C0022 for ; Thu, 10 Aug 2023 10:40:57 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf10.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691664058; a=rsa-sha256; cv=none; b=mmmtLeT44CQC1lvXn9kFefD3qRHPLJqJ0KHrO72GVgYUh5YPPF80V6frM1UJdbogs9yVpG B3dZlQKfAM89NA0wHselgsMtV2jcDbnBVTCbp/1Oq86tuRZaufgsi3RRxITsbOqtxFwH81 D58V0LFN4HjJgj3WQ8zIgSX2zOgrBZo= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf10.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691664058; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=J/aQDsLPVt0IRruRwWl6IeMgWJ3rPknxey2+RFHV8rI=; b=JB/vwKK3EXgdbriAm8vJMWyFjUE2GTjF3Hd09zd0BlsKv67sD7xROfa9dXzvqeH6mM09gK R/5BOmRwbOGppZN7DulUTClkt4mzQ4wEpFUE+jpOviE1SlIWc0wLq1Mw8pehEUksBmzDBE Ns2HAJq102XT8uv+/M0qvbnj+K4UadY= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 631762F4; Thu, 10 Aug 2023 03:41:39 -0700 (PDT) Received: from [10.1.27.169] (XHFQ2J9959.cambridge.arm.com [10.1.27.169]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 645A93F6C4; Thu, 10 Aug 2023 03:40:55 -0700 (PDT) Message-ID: <09b0f874-b84a-45a5-ab9b-53e348eae3df@arm.com> Date: Thu, 10 Aug 2023 11:40:54 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios Content-Language: en-GB To: David Hildenbrand , linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, Andrew Morton , Jonathan Corbet , Mike Kravetz , Hugh Dickins , "Matthew Wilcox (Oracle)" , Yin Fengwei , Yang Shi , Zi Yan References: <20230809083256.699513-1-david@redhat.com> <181fcc79-b1c6-412f-9ca1-d1f21ef33e32@arm.com> <60b5b2a2-1d1d-661c-d61e-855178fff44d@redhat.com> From: Ryan Roberts In-Reply-To: <60b5b2a2-1d1d-661c-d61e-855178fff44d@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: D6EB5C0022 X-Stat-Signature: 4bsms3a1asiobxsc3cgwcugt6c4eopk4 X-Rspam-User: X-HE-Tag: 1691664057-53467 X-HE-Meta: U2FsdGVkX19xK8yuEmF/Umi98WaASW2X7oplwzHaGx0dYQBRePBCiKu1UgvXPr7fT+eb5HG8M7pM2cwvzH7WeW0eAnWK2THzeKbDfSKvDMuVWgnp6fRkwA3HzFYpxNK7udo2QHi9QTdPexK/jKjygdJs9LqVXw1JfBhmJhA7C4RASvpCq4BYfPebf5D19Fi0d5ShjgK/ARgX9zYQCXAYu70IGCvNXRBxmLevWhZQXR7yZbEb4H04QITkBxcuGs3NZ/BRu6QZqyRwD6w8K6Am+XXPuX1H0O1QupjleDq450ghpb4fpG4Is+lCvpU4jM9/RANrCImxZtZhRL7NYSWeLJL7ehg6GWctvpeUB2jLEokdVPeaJrtgn0IE1vJGFLnV2xTDBcsaINUxlRm2cR364okJtpJsI1bMenp7k8hEraxuktBSvarA6xdgrQYUVSvZEXTUbW4l86+Hjb3skEvHH9/7huXuHCwPWmFQNEz9GMMDQKqU5bEpYABx4VGmMUSrmOzop0DQX33BOKxUE9D4LHZw6wIGlabnYi3ERN+tXix/wgdhtnYqL0eECOv8LT3J8rN2hPoOpR/GHtO+Svmz4+5T0TTIkQA1PIB9rXNTTYuQXpycaNOd4fTjy72eT68ojiKfYHw0XQLA3nGRsc+k6yo/6ylih7+SqrfS8LU+NRzXemJO+8zTu/alaStHletrvHaC7pXdezm+jC30oH32ElN+mA9wyUyETGLNrs0cz7CF0uaWJIERsKqkHzOXZVh5A5l9M81ZPG750nc7ctUmxI5j/4YN88R8j9e3xGCWZnpYKELNn4XtUxiEiy7kQmmieSeY6pKXVQ24CpsZz+IGiUL6nGLoaMzP+w1lgsJWcEBKFh7SgRlG0VzvZHtL8XFY4qmsmbJtxjOfvP5kbJD8l/4vf1xprE3fu7RMTp3pljxe54oO80U2OgP2CrPACneBdLv7yGQ9KCAKx/0qkge 3xuForZ7 hnnXPygSQrs3VrFhSOgC3xrsi945wky8tEL8Nql+emKOSU+NRpDJVRcACEP/1Z5iVe2/M7s3i9p0kTaEntK39aQv2wowFUhMCVY8X3IoRa6fWB54TMn4aIm506lBXrKdPZ6EUo1//vBSWLaE9LGxVIQz7nAUxA/6Z9bqbjbPR+Y8ryFDLF0uSK2IVTfiEcXv+J82XJgueILaXWamVOSrYMuwaq357aro55N0TKMakdxRblATkAn8B72+22kfqWEAWc17S486MWXjWcaDV1Mf76LnksRLpo+d/r65tZNYT2RDi1oe+/oYRRm6drgHCygGcmdD/FSfkqGw2v9PaL1F6AoEqGPQpjqd9bmEIKbTwlR5lvfcIY44kP0EKlO+n4Xk5j2X2ckPzl0n1AZ25Y97gqq5UDuFGUka1TP6JsTqvg9Z7PAxD5mHyXxigtyGwDZnPXwQoqB/yPwMgXecXYrGaAX2QBzCD5V9VGReRdQAiWXY8oWkkyXp/hsJ/Cep74gb+001PeRWZIPQSib2cn3snKaENYg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 09/08/2023 20:17, David Hildenbrand wrote: > On 09.08.23 21:07, Ryan Roberts wrote: >> On 09/08/2023 09:32, David Hildenbrand wrote: >>> Let's track the total mapcount for all large folios in the first subpage. >>> >>> The total mapcount is what we actually want to know in folio_mapcount() >>> and it is also sufficient for implementing folio_mapped(). This also >>> gets rid of any "raceiness" concerns as expressed in >>> folio_total_mapcount(). >>> >>> With sub-PMD THP becoming more important and things looking promising >>> that we will soon get support for such anon THP, we want to avoid looping >>> over all pages of a folio just to calculate the total mapcount. Further, >>> we may soon want to use the total mapcount in other context more >>> frequently, so prepare for reading it efficiently and atomically. >>> >>> Make room for the total mapcount in page[1] of the folio by moving >>> _nr_pages_mapped to page[2] of the folio: it is not applicable to hugetlb >>> -- and with the total mapcount in place probably also not desirable even >>> if PMD-mappable hugetlb pages could get PTE-mapped at some point -- so we >>> can overlay the hugetlb fields. >>> >>> Note that we currently don't expect any order-1 compound pages / THP in >>> rmap code, and that such support is not planned. If ever desired, we could >>> move the compound mapcount to another page, because it only applies to >>> PMD-mappable folios that are definitely larger. Let's avoid consuming >>> more space elsewhere for now -- we might need more space for a different >>> purpose in some subpages soon. >>> >>> Maintain the total mapcount also for hugetlb pages. Use the total mapcount >>> to implement folio_mapcount(), total_mapcount(), folio_mapped() and >>> page_mapped(). >>> >>> We can now get rid of folio_total_mapcount() and >>> folio_large_is_mapped(), by just inlining reading of the total mapcount. >>> >>> _nr_pages_mapped is now only used in rmap code, so not accidentially >>> externally where it might be used on arbitrary order-1 pages. The remaining >>> usage is: >>> >>> (1) Detect how to adjust stats: NR_ANON_MAPPED and NR_FILE_MAPPED >>>   -> If we would account the total folio as mapped when mapping a >>>      page (based on the total mapcount), we could remove that usage. >>> >>> (2) Detect when to add a folio to the deferred split queue >>>   -> If we would apply a different heuristic, or scan using the rmap on >>>      the memory reclaim path for partially mapped anon folios to >>>      split them, we could remove that usage as well. >>> >>> So maybe, we can simplify things in the future and remove >>> _nr_pages_mapped. For now, leave these things as they are, they need more >>> thought. Hugh really did a nice job implementing that precise tracking >>> after all. >>> >>> Note: Not adding order-1 sanity checks to the file_rmap functions for >>>        now. For anon pages, they are certainly not required right now. >>> >>> Cc: Andrew Morton >>> Cc: Jonathan Corbet >>> Cc: Mike Kravetz >>> Cc: Hugh Dickins >>> Cc: "Matthew Wilcox (Oracle)" >>> Cc: Ryan Roberts >>> Cc: Yin Fengwei >>> Cc: Yang Shi >>> Cc: Zi Yan >>> Signed-off-by: David Hildenbrand >> >> Other than the nits and query on zeroing _total_mapcount below, LGTM. If zeroing >> is correct: >> >> Reviewed-by: Ryan Roberts > > Thanks for the review! > > [...] > >>>     static inline int total_mapcount(struct page *page) >> >> nit: couldn't total_mapcount() just be implemented as a wrapper around >> folio_mapcount()? What's the benefit of PageCompound() check instead of just >> getting the folio and checking if it's large? i.e: > > Good point, let me take a look tomorrow if the compiler can optimize in both > cases equally well. > > [...] > >>>   diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>> index 5f498e8025cc..6a614c559ccf 100644 >>> --- a/mm/hugetlb.c >>> +++ b/mm/hugetlb.c >>> @@ -1479,7 +1479,7 @@ static void __destroy_compound_gigantic_folio(struct >>> folio *folio, >>>       struct page *p; >>>         atomic_set(&folio->_entire_mapcount, 0); >>> -    atomic_set(&folio->_nr_pages_mapped, 0); >>> +    atomic_set(&folio->_total_mapcount, 0); >> >> Just checking this is definitely what you intended? _total_mapcount is -1 when >> it means "no pages mapped", so 0 means 1 page mapped? > > I was blindly doing what _entire_mapcount is doing: zeroing out the values. ;) > > But let's look into the details: in __destroy_compound_gigantic_folio(), we're > manually dissolving the whole compound page. So instead of actually returning a > compound page to the buddy (where we would make sure the mapcounts are -1, to > then zero them out !), we simply zero-out the fields we use and then dissolve > the compound page: to be left with a bunch of order-0 pages where the memmap is > in a clean state. > > (the buddy doesn't handle that page order, so we have to do things manually to > get to order-0 pages we can reuse or free) > Yeah fair enough, thanks for the explanation.