From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83941C636CD for ; Wed, 1 Feb 2023 15:56:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C02A66B0074; Wed, 1 Feb 2023 10:56:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BB14D6B0075; Wed, 1 Feb 2023 10:56:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A78F46B0078; Wed, 1 Feb 2023 10:56:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 986A86B0074 for ; Wed, 1 Feb 2023 10:56:51 -0500 (EST) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 62D441201B1 for ; Wed, 1 Feb 2023 15:56:51 +0000 (UTC) X-FDA: 80419176222.27.4DE286C Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf13.hostedemail.com (Postfix) with ESMTP id 38A8820021 for ; Wed, 1 Feb 2023 15:56:48 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=BiP0v+zi; spf=pass (imf13.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675267008; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tBjSgCtYL9pRIH06vB5AfhXuNshNbv18W+fXQWNuIh8=; b=5ZNuR56ujHWCwl6fjt5uTPzxkYh/m0do9y4Xra1X8ik3Y/Td7oV6a2hgBIQXnzgZ34v+qh 2dWVGr2r5ec54o/4Tdw2VqTP/saXA3d3NeiEvv3uyvF5B2nYgClobgBWFmaF3gVXSCD5cZ aRrIOxDMLkDWxckEjmGQvPPJLDAmR5c= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=BiP0v+zi; spf=pass (imf13.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1675267008; a=rsa-sha256; cv=none; b=xwFYpGfIsWWDtIvRLuqxjZrJe890yMKlqzWcCd9uCtG+tkrg+yLxWGhQBo2jLOAmtPwl7/ XizWSQIRzgHLZPwTCyyQ+6mxWFIAmg6ktwffOvvOD1TlkMMMB8lz25GGIe/ukCo7zTBBVB f7M9Rf5jNMuTVs/d6vRq2xwQ9usr9DA= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1675267007; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tBjSgCtYL9pRIH06vB5AfhXuNshNbv18W+fXQWNuIh8=; b=BiP0v+ziJ4fhksxgIr9F0U0I7OmTh3a/zCSH0HzrxQYScI7FvP0oBjqAWTVSvxK77m5Yjo ptmp9xVErxpON7mz50i+PHa3vzvpU0uwpvr6Xn9iGlpohU6hThKHH/kZQholH6Di56o2zv JYNhZbEaTymCGbM24Fg6ICy6V3MYNr8= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-441-CtFpAruYOCu_TICsgYMsKg-1; Wed, 01 Feb 2023 10:56:45 -0500 X-MC-Unique: CtFpAruYOCu_TICsgYMsKg-1 Received: by mail-wm1-f69.google.com with SMTP id bi10-20020a05600c3d8a00b003dd1b5d2a36so1415454wmb.1 for ; Wed, 01 Feb 2023 07:56:45 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:organization:from :content-language:references:cc:to:subject:user-agent:mime-version :date:message-id:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=tBjSgCtYL9pRIH06vB5AfhXuNshNbv18W+fXQWNuIh8=; b=21LnU4ddcA2cdwQCtiHpBnOfqkkfNYrjsNnIicyAN2sUpFpCXiHjfvxHWSe0SBlQ/S MbYmDUTOC/VYU+wr8pMmSya11OZ7o14XnFbVfgXlWgajFvUm5nBSgTLGslpwQYf9ymuQ mj2Wuai3+xCR4/S4on48kyWqQaMcK7lA8RRfJEen+EgBtdpiSpMh9I1YCa4CTnKmzgRF 8BFD9/ZDDQNFr9sZWfpzsLqocLxA7UVsf9ZoysG8ctJoeo1bliOARQixNrEzepoMhWtA F2z8jvsYLF5TnQLzM/2xWGVX8AE2UH3NTFm0O0qgyDVxM3Dd4x9X6J84iSoCjzWwre9U GPfw== X-Gm-Message-State: AO0yUKW/wYzJq5eYlq0BB3fU8lunNDQUDAw4qTWMRmctStXMZcZx3jPk mCdBbtZ0K9/nwb+LLOF4NMOs3H8BwUQJ1pFHBLUuUL7FrezK3VO2BuIg3EdWu0d9zAIUNYbCIEG NsOiHUtf5FV4= X-Received: by 2002:a5d:5147:0:b0:2bf:bd9f:23cb with SMTP id u7-20020a5d5147000000b002bfbd9f23cbmr2498531wrt.52.1675267004058; Wed, 01 Feb 2023 07:56:44 -0800 (PST) X-Google-Smtp-Source: AK7set8XqaFVv4ncJA5Y7M29pRXfoqCeQ8PCSF6C1hZlw67zZMrKZ0LMDaOzcgq+aAai1tuI92rJIg== X-Received: by 2002:a5d:5147:0:b0:2bf:bd9f:23cb with SMTP id u7-20020a5d5147000000b002bfbd9f23cbmr2498507wrt.52.1675267003788; Wed, 01 Feb 2023 07:56:43 -0800 (PST) Received: from ?IPV6:2003:cb:c705:3100:e20e:4ace:6f25:6a79? (p200300cbc7053100e20e4ace6f256a79.dip0.t-ipconnect.de. [2003:cb:c705:3100:e20e:4ace:6f25:6a79]) by smtp.gmail.com with ESMTPSA id f7-20020a5d64c7000000b002bdfe3aca17sm19206282wri.51.2023.02.01.07.56.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 01 Feb 2023 07:56:43 -0800 (PST) Message-ID: <7a1bc3c5-6efe-87ab-1276-f71fc440c821@redhat.com> Date: Wed, 1 Feb 2023 16:56:39 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0 Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range To: James Houghton , Peter Xu Cc: Mike Kravetz , Muchun Song , David Rientjes , Axel Rasmussen , Mina Almasry , Zach O'Keefe , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: From: David Hildenbrand Organization: Red Hat In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 38A8820021 X-Stat-Signature: 7smet13jqs616eqa8sprxa4mkxsdtu54 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1675267008-189407 X-HE-Meta: U2FsdGVkX1/CUi2kAVY33Sapf+SIqDiqck1d6byl3K2yWKOLShXxnntd4mk+hiCrDHN1iMh6D9QWeb8QYAp4Qv2MEq030FouunWswaiHIgSnrQqA9luouH4F8gjcyhEJUA2T9n1k6O+8OyXM/bs58GUhpi/g5IaxkEieXQ6eIdK9AWY6SR8ekXtnfJLXEBAHvX+xfMF77acwAnrI6th6EEwL+mLZtxgxJxBcwni4aEgRla5raPbP/vYzAMwBiREOs/q4eli+jqfYOKXFB7mmbt17uprqe0VymMVrihq3soV0aePCnhXcnb9k9Z/xjqgBK1QVIJxtohrM3rz9mUdRSBG4iQmoYKRehQNPN4EB/NTZXEbc4zcLtgz41+JbylpbjTqTpAhzoIf1n3bnVuwE6zqhGbeLb4ZqpiEhSMKmuQRjaXFuikJ8I0XtDui+m4P1RLsU9p8UISOyTFaR1PXxZaKlZ2HrSa7oENOooqur0Tzy77iASSk7LFE4wGFsurFmFLg+TF8NOvPpUhqljDcLpeUKFfzRP1R0mW68rZwUvJSeoTKAgoNF2d1FfgIvsOjSXlDVnKc1NmF0aro+W++V06kwBDJOcA50XaoBTez7cV6hXDWKAUXSPOaRbnSnwRmOeWi99mggnGGqx4qWH9HTtG4wwtbJJbtr/CAALT/9ptDqoH1+lL/Z7LUBbGKnOUKK8wAbPy8l0MdgE0ewyUCadj/8Q9sDy//LSY7q50KYrRB9j3qOpQA5bUE6wMRg78Ju2QbcmyGJRHu8D8cNvct8Nc2ilqJwPq7lNNTfGoBVaEluP/jkWAiEbEsowRR35en54xQ1QKznZGiQCfiY5H5DZkPXzw8C7EW3Z+AgEkOBTdFWS9iMjonIRJeRXNiC68IhCME8citH39c1gkgjs+2znAouXP9bwAnQlfUYZHSeC87nkRO57I033CDXgEV1xi47Qe8HP1izoBwSJhc+ahR jK2YBBRs 4bM5v2FbqEtL2ZuWyde0cpm5la1y0sKfm1KMsOsof9qrhlNlL77M2t6ob7sxzPMzmxtJjUQPQVjtOD5tjFDm33LsHo16BE6P3TQYpujuvY3oqVaJ4EBOQgkQL9sOAs8tWzX6tPcypu6kUSLaTn2V01XY5H/wWayABaKyh/koBetlhjXKPjjP7p2MT9+g9euvAKyVJZIWdvnBibG5Cjy4cc7eX1oMWIlVokgvVAd5FYz8XtXm2fskinyyl15UjDbwNwjdxbhii84HYvpYVqqg2nFtL+bP3uBZS0M/ZktcMmqvb1np85HlRvHvCeOouCbd4S+nhHUv9oJZ42irlKLppVHEod4JMHsjEpfx1Fq9LlufsUprNVwUuUTXsThZ1nURwAShLiB92BsPXpXmy4MEykdm064hO3UZeOMf7Wd3D9qH1wk9jWbN1bwUR7qWQcnTY2BZi/zwOyma+yHnmIvGs9LVMh4WiEn9sBp+4/CpwrT1WVbCwozjzXcqUV0JIQFsEmkjcNsDSfcvm/9IxVv+MxgtQeavfUjkPR2yAALkZBfL4tsEPqlc/6wBTVYCw5QtuJ1lFC9V2tyIljs6Z7ZfD1+qcvg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 01.02.23 16:45, James Houghton wrote: > On Tue, Jan 31, 2023 at 5:24 PM Peter Xu wrote: >> >> On Tue, Jan 31, 2023 at 04:24:15PM -0800, James Houghton wrote: >>> On Mon, Jan 30, 2023 at 1:14 PM Peter Xu wrote: >>>> >>>> On Mon, Jan 30, 2023 at 10:38:41AM -0800, James Houghton wrote: >>>>> On Mon, Jan 30, 2023 at 9:29 AM Peter Xu wrote: >>>>>> >>>>>> On Fri, Jan 27, 2023 at 01:02:02PM -0800, James Houghton wrote: > [snip] >>>>>> Another way to not use thp mapcount, nor break smaps and similar calls to >>>>>> page_mapcount() on small page, is to only increase the hpage mapcount only >>>>>> when hstate pXd (in case of 1G it's PUD) entry being populated (no matter >>>>>> as leaf or a non-leaf), and the mapcount can be decreased when the pXd >>>>>> entry is removed (for leaf, it's the same as for now; for HGM, it's when >>>>>> freeing pgtable of the PUD entry). >>>>> >>>>> Right, and this is doable. Also it seems like this is pretty close to >>>>> the direction Matthew Wilcox wants to go with THPs. >>>> >>>> I may not be familiar with it, do you mean this one? >>>> >>>> https://lore.kernel.org/all/Y9Afwds%2FJl39UjEp@casper.infradead.org/ >>> >>> Yep that's it. >>> >>>> >>>> For hugetlb I think it should be easier to maintain rather than any-sized >>>> folios, because there's the pgtable non-leaf entry to track rmap >>>> information and the folio size being static to hpage size. >>>> >>>> It'll be different to folios where it can be random sized pages chunk, so >>>> it needs to be managed by batching the ptes when install/zap. >>> >>> Agreed. It's probably easier for HugeTLB because they're always >>> "naturally aligned" and yeah they can't change sizes. >>> >>>> >>>>> >>>>> Something I noticed though, from the implementation of >>>>> folio_referenced()/folio_referenced_one(), is that folio_mapcount() >>>>> ought to report the total number of PTEs that are pointing on the page >>>>> (or the number of times page_vma_mapped_walk returns true). FWIW, >>>>> folio_referenced() is never called for hugetlb folios. >>>> >>>> FWIU folio_mapcount is the thing it needs for now to do the rmap walks - >>>> it'll walk every leaf page being mapped, big or small, so IIUC that number >>>> should match with what it expects to see later, more or less. >>> >>> I don't fully understand what you mean here. >> >> I meant the rmap_walk pairing with folio_referenced_one() will walk all the >> leaves for the folio, big or small. I think that will match the number >> with what got returned from folio_mapcount(). > > See below. > >> >>> >>>> >>>> But I agree the mapcount/referenced value itself is debatable to me, just >>>> like what you raised in the other thread on page migration. Meanwhile, I >>>> am not certain whether the mapcount is accurate either because AFAICT the >>>> mapcount can be modified if e.g. new page mapping established as long as >>>> before taking the page lock later in folio_referenced(). >>>> >>>> It's just that I don't see any severe issue either due to any of above, as >>>> long as that information is only used as a hint for next steps, e.g., to >>>> swap which page out. >>> >>> I also don't see a big problem with folio_referenced() (and you're >>> right that folio_mapcount() can be stale by the time it takes the >>> folio lock). It still seems like folio_mapcount() should return the >>> total number of PTEs that map the page though. Are you saying that >>> breaking this would be ok? >> >> I didn't quite follow - isn't that already doing so? >> >> folio_mapcount() is total_compound_mapcount() here, IIUC it is an >> accumulated value of all possible PTEs or PMDs being mapped as long as it's >> all or part of the folio being mapped. > > We've talked about 3 ways of handling mapcount: > > 1. The RFC v2 way, which is head-only, and we increment the compound > mapcount for each PT mapping we have. So a PTE-mapped 2M page, > compound_mapcount=512, subpage->_mapcount=0 (ignoring the -1 bias). > 2. The THP-like way. If we are fully mapping the hugetlb page with the > hstate-level PTE, we increment the compound mapcount, otherwise we > increment subpage->_mapcount. > 3. The RFC v1 way (the way you have suggested above), which is > head-only, and we increment the compound mapcount if the hstate-level > PTE is made present. > > With #1 and #2, there is no concern with folio_mapcount(). But with > #3, folio_mapcount() for a PTE-mapped 2M page mapped in a single VMA > would yield 1 instead of 512 (right?). That's what I mean. My 2 cents: The mapcount is primarily used (in hugetlb context) to (a) Detect if a page might be shared. mapcount > 1 implies that two independent page table hierarchies are mapping the page. We care about mapcount == 1 vs. mapcount != 1. (b) Detect if unmapping was sucessfull. We care about mapcount == 0 vs. mapcount != 0. For hugetlb, I don't see why we should care about the subpage mapcount at all. For (a) it's even good to count "somehow mapped into a single page table structure" as "mapcount == 1" For (b), we don't care as long as "still mapped" implies "mapcount != 0". -- Thanks, David / dhildenb