From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <8b4603f3-19c7-4979-ae4a-ef99690562d1@linux.alibaba.com>
Date: Fri, 4 Jul 2025 10:35:43 +0800
Subject: Re: [PATCH] mm: support large mapping building for tmpfs
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: David Hildenbrand, akpm@linux-foundation.org, hughd@google.com
Cc: ziy@nvidia.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <4c5ea64d-c33c-4cf5-8e71-08bc50a5f940@redhat.com>
References: <67c79f65-ca6d-43be-a4ec-decd08bbce0a@linux.alibaba.com> <7b17af10-b052-4719-bbce-ffad2d74006a@redhat.com> <4c5ea64d-c33c-4cf5-8e71-08bc50a5f940@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 2025/7/2 19:55, David Hildenbrand wrote:
> On 02.07.25 13:38, David Hildenbrand wrote:
>>
>>>> So by mapping more in a single page fault, you end up increasing "RSS".
>>>> But I wouldn't call that "expected". I rather suspect that nobody
>>>> will really care :)
>>>
>>> But tmpfs is a little special here. It uses the 'huge=' option to
>>> control large folio allocation. So, I think users should know they want
>>> to use large folios and build the whole mapping for the large folios.
>>> That is why I call it 'expected'.
>>
>> Well, if your distribution decides to set huge= on /tmp or something
>> like that, your application might have very little say in that,
>> right? :)
>>
>> Again, I assume it's fine, but we might find surprises on the way.
>>
>>>>
>>>> The thing is, when you *allocate* a new folio, it must adhere at
>>>> least to pagecache alignment (e.g., cannot place an order-2 folio
>>>> at pgoff 1) --
>>>
>>> Yes, agree.
>>>
>>>> that is what thp_vma_suitable_order() checks. Otherwise you cannot
>>>> add it to the pagecache.
>>>
>>> But this alignment is not done by thp_vma_suitable_order().
>>>
>>> For tmpfs, it will check the alignment in shmem_suitable_orders() via:
>>> "
>>>     if (!xa_find(&mapping->i_pages, &aligned_index,
>>>             aligned_index + pages - 1, XA_PRESENT))
>>> "
>>
>> That's not really an alignment check, that's just checking whether a
>> suitable folio order spans already-present entries, no?
>>
>> Finding suitable orders is still up to other code IIUC.
>>
>>>
>>> For other filesystems, it will check the alignment in
>>> __filemap_get_folio() via:
>>> "
>>>     /* If we're not aligned, allocate a smaller folio */
>>>     if (index & ((1UL << order) - 1))
>>>         order = __ffs(index);
>>> "
>>>
>>>> But once you *obtain* a folio from the pagecache and are supposed to
>>>> map it into the page tables, that must already hold true.
>>>>
>>>> So you should be able to just blindly map whatever is given to you
>>>> here AFAIKS.
>>>>
>>>> If you would get a pagecache folio that violates the linear page
>>>> offset requirement at that point, something else would have messed
>>>> up the pagecache.
>>>
>>> Yes. But the comment in thp_vma_suitable_order() is not about the
>>> pagecache alignment, it says "the order-aligned addresses in the VMA
>>> map to order-aligned offsets within the file",
>>
>> Let's dig, it's confusing.
>>
>> The code in question is:
>>
>> if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>>         hpage_size >> PAGE_SHIFT))
>>
>> So yes, I think this tells us: if we would have a PMD THP in the
>> pagecache, would we be able to map it with a PMD. If not, then don't
>> bother with allocating a PMD THP.
>>
>> Of course, this also applies to other orders, but for PMD THPs it's
>> probably most relevant: if we cannot even map it through a PMD, then
>> probably it could be a wasted THP.
>>
>> So yes, I agree: if we are both not missing something, then this is
>> primarily relevant for the PMD case.
>>
>> And it's more about "optimization" than "correctness" I guess?
>
> Correction: only if a caller doesn't assume that this is an implicit
> pagecache alignment check. Not sure if that might be the case for shmem
> when it calls thp_vma_suitable_order() with a VMA ...

I am sure shmem will not use thp_vma_suitable_order() for pagecache
alignment checks, because shmem has explicit code for those checks.
Adding thp_vma_suitable_order() in shmem is more about following the
allocation logic of anonymous pages.

I also think the 'IS_ALIGNED()' check in thp_vma_suitable_order() for
shmem is more about 'optimization' than 'correctness'. Anyway, I will
take another look at shmem's checking logic.