Date: Fri, 11 Aug 2023 11:03:22 -0400
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand
Cc: Matthew Wilcox, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-doc@vger.kernel.org, Andrew Morton, Jonathan Corbet, Mike Kravetz,
    Hugh Dickins, Ryan Roberts, Yin Fengwei, Yang Shi, Zi Yan
Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios
In-Reply-To: <73d6d29f-9947-9b50-3b94-77f1ee547387@redhat.com>
References: <20230809083256.699513-1-david@redhat.com>
 <7e31254d-8889-7e79-50e1-2630bd493d59@redhat.com>
 <73d6d29f-9947-9b50-3b94-77f1ee547387@redhat.com>
On Thu, Aug 10, 2023 at 11:59:25PM +0200, David Hildenbrand wrote:
> On 10.08.23 23:54, Matthew Wilcox wrote:
> > On Thu, Aug 10, 2023 at 05:48:19PM -0400, Peter Xu wrote:
> > > > Yes, that comment from Hugh primarily discusses how we could possibly
> > > > optimize the loop, and if relying on folio_nr_pages_mapped() to reduce the
> > > > iterations would be racy. As far as I can see, there are cases where "it
> > > > would be certainly a bad idea" :)
> > >
> > > Is the race described about mapcount being changed right after it's read?
> > > Are you aware of anything specific that will be broken, and will be fixed
> > > with this patch?
> >
> > The problem is that people check the mapcount while holding no locks;
> > not the PTL, not the page lock. So it's an unfixable race.
> >
> > > Having a total mapcount does sound helpful if partial folio is common
> > > indeed.
> > >
> > > I'm curious whether that'll be so common after the large anon folio work -
> > > isn't it be sad if partial folio will be a norm? It sounds to me that's
> > > the case when small page sizes should be used.. and it's prone to waste?
> >
> > The problem is that entire_mapcount isn't really entire_mapcount.
> > It's pmd_mapcount. I have had thoughts about using it as entire_mapcount,
> > but it gets gnarly when people do partial unmaps. So the _usual_ case
> > ends up touching every struct page. Which sucks. Also it's one of the
> > things which stands in the way of shrinking struct page.
>
> Right, so one current idea is to have a single total_mapcount and look into
> removing the subpage mapcounts (which will require first removing
> _nr_pages_mapped, because that's still one of the important users).
>
> Until we get there, also rmap code has to do eventually "more tracking" and
> might, unfortunately, end up slower.
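[ To check my own understanding of the single total_mapcount idea above,
  here is a rough userspace sketch. It is purely illustrative: the struct,
  field and function names (demo_folio, demo_map_page, ...) are made up and
  are not the kernel's actual data structures or API. ]

#include <stdatomic.h>
#include <stdio.h>

struct demo_folio {
	unsigned int nr_pages;      /* number of base pages in this folio */
	atomic_int total_mapcount;  /* one counter for the whole folio */
};

static void demo_map_page(struct demo_folio *folio)
{
	/* every new page table mapping of any page in the folio bumps it */
	atomic_fetch_add(&folio->total_mapcount, 1);
}

static void demo_unmap_page(struct demo_folio *folio)
{
	/*
	 * ... and every unmap drops it, so a partial unmap only touches
	 * this single counter instead of per-page mapcounts.
	 */
	atomic_fetch_sub(&folio->total_mapcount, 1);
}

static int demo_total_mapcount(struct demo_folio *folio)
{
	/*
	 * A lockless reader only gets a snapshot; the value can change
	 * right after it is read, which is the race discussed above.
	 */
	return atomic_load(&folio->total_mapcount);
}

int main(void)
{
	struct demo_folio folio = { .nr_pages = 16, .total_mapcount = 0 };
	unsigned int i;

	for (i = 0; i < folio.nr_pages; i++)
		demo_map_page(&folio);   /* map the whole folio */

	demo_unmap_page(&folio);         /* partially unmap one page */

	printf("total mapcount snapshot: %d\n", demo_total_mapcount(&folio));
	return 0;
}

[ The point being: map/unmap only touch one counter, while a reader that
  holds no lock still only sees a snapshot. ]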
>
> >
> > But it's kind of annoying to explain all of this to you individually.
> > There have been hundreds of emails about it over the last months on
> > this mailing list. It would be nice if you could catch up instead of
> > jumping in.
>
> To be fair, a lot of the details are not readily available and in the heads
> of selected people :)
>
> Peter, if you're interested, we can discuss the current plans, issues and
> ideas offline!

Thanks for offering help, David. Personally I'm still unclear on why
entire_mapcount cannot be used as a full-folio mapcount, and why "partial
unmap" would happen a lot (I don't expect it to), but yeah, I can try to
catch up and educate myself first.

The only issue with an offline sync-up is that even if David helps Peter
catch up on the bits, it won't scale when another Peter2 has the same
question.. So David, rather than wasting your time on helping one person,
let me try to catch up with the public threads - I'm not sure how far I'll
get myself; OTOH, thread links replied here will definitely be helpful, so
anyone else can reference them too.

I collected a list (which can be enriched) of a few threads that might be
related, in case it's helpful to anyone besides myself:

[PATCH 0/2] don't use mapcount() to check large folio sharing
https://lore.kernel.org/r/20230728161356.1784568-1-fengwei.yin@intel.com

[PATCH v1-v2 0/3] support large folio for mlock
https://lore.kernel.org/r/20230728070929.2487065-1-fengwei.yin@intel.com
https://lore.kernel.org/r/20230809061105.3369958-1-fengwei.yin@intel.com

[PATCH v1 0/4] Optimize mmap_exit for large folios
https://lore.kernel.org/r/20230810103332.3062143-1-ryan.roberts@arm.com

[PATCH v4-v5 0/5] variable-order, large folios for anonymous memory
https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@arm.com/
https://lore.kernel.org/r/20230810142942.3169679-1-ryan.roberts@arm.com

[PATCH v3-v4 0/3] Optimize large folio interaction with deferred split
(I assume this one from Ryan finally went into v5 of the previous set, so
it's listed just for the discussions as reference)
https://lore.kernel.org/r/20230720112955.643283-1-ryan.roberts@arm.com
https://lore.kernel.org/r/20230727141837.3386072-1-ryan.roberts@arm.com

[RFC PATCH v2 0/4] fix large folio for madvise_cold_or_pageout()
https://lore.kernel.org/r/20230721094043.2506691-1-fengwei.yin@intel.com

I'm not sure how far I'll get; maybe I'll start working on something else
before I finish all of them. I'll see..

Not allowing people to jump in will definitely mean less interaction and
less involvement/openness in the mm community, as sometimes people can't
easily judge when it's appropriate to jump in. IMHO the ideal solution is
to always keep discussions public (meetings with recordings, shared online
documents, on-list discussions, etc.) and then share the links.

-- 
Peter Xu