Message-ID: <9771e4ac-4f25-4822-9882-d8a94813e7c0@linux.alibaba.com>
Date: Fri, 4 Jul 2025 10:04:39 +0800
Subject: Re: [PATCH] mm: support large mapping building for tmpfs
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: David Hildenbrand, akpm@linux-foundation.org, hughd@google.com
Cc: ziy@nvidia.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <7b17af10-b052-4719-bbce-ffad2d74006a@redhat.com>
References: <67c79f65-ca6d-43be-a4ec-decd08bbce0a@linux.alibaba.com> <7b17af10-b052-4719-bbce-ffad2d74006a@redhat.com>

On 2025/7/2 19:38, David Hildenbrand wrote:
>
>>> So by mapping more in a single page fault, you end up increasing
>>> "RSS". But I wouldn't call that "expected". I rather suspect that
>>> nobody will really care :)
>>
>> But tmpfs is a little special here. It uses the 'huge=' option to
>> control large folio allocation. So, I think users should know they
>> want to use large folios and build the whole mapping for the large
>> folios. That is why I call it 'expected'.
>
> Well, if your distribution decides to set huge= on /tmp or something
> like that, your application might have very little say in that,
> right? :)
>
> Again, I assume it's fine, but we might find surprises on the way.
>
>>>
>>> The thing is, when you *allocate* a new folio, it must adhere at
>>> least to pagecache alignment (e.g., you cannot place an order-2
>>> folio at pgoff 1) --
>>
>> Yes, agree.
>>
>>> that is what thp_vma_suitable_order() checks. Otherwise you cannot
>>> add it to the pagecache.
>>
>> But this alignment is not done by thp_vma_suitable_order().
>>
>> For tmpfs, the alignment is checked in shmem_suitable_orders() via:
>> "
>>     if (!xa_find(&mapping->i_pages, &aligned_index,
>>             aligned_index + pages - 1, XA_PRESENT))
>> "
>
> That's not really an alignment check, that's just checking whether a
> suitable folio order spans already-present entries, no?

Because 'aligned_index' has already been rounded down before that
check, I'd still consider it an implicit alignment check:
"
pages = 1UL << order;
aligned_index = round_down(index, pages);
"
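For illustration, a minimal userspace sketch of that arithmetic (the
index and order values here are made up, and round_down() is
re-implemented for power-of-two sizes, since this is not kernel code):

#include <stdio.h>

#define round_down(x, y) ((x) & ~((y) - 1))

int main(void)
{
	unsigned long index = 5;	/* fault offset within the file, in pages */
	unsigned int order = 2;		/* candidate folio order (4 pages) */

	unsigned long pages = 1UL << order;
	unsigned long aligned_index = round_down(index, pages);

	/*
	 * Prints "index 5, order 2 -> probe range [4, 7]": the range
	 * handed to xa_find() is already naturally aligned.
	 */
	printf("index %lu, order %u -> probe range [%lu, %lu]\n",
	       index, order, aligned_index, aligned_index + pages - 1);
	return 0;
}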
> Finding suitable orders is still up to other code IIUC.
>
>>
>> For other filesystems, the alignment is checked in
>> __filemap_get_folio() via:
>> "
>>     /* If we're not aligned, allocate a smaller folio */
>>     if (index & ((1UL << order) - 1))
>>         order = __ffs(index);
>> "
>>
>>> But once you *obtain* a folio from the pagecache and are supposed to
>>> map it into the page tables, that must already hold true.
>>>
>>> So you should be able to just blindly map whatever is given to you
>>> here AFAIKS.
>>>
>>> If you would get a pagecache folio that violates the linear page
>>> offset requirement at that point, something else would have messed
>>> up the pagecache.
>>
>> Yes. But the comment in thp_vma_suitable_order() is not about
>> pagecache alignment; it says "the order-aligned addresses in the VMA
>> map to order-aligned offsets within the file".
>
> Let's dig, it's confusing.
>
> The code in question is:
>
> if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>         hpage_size >> PAGE_SHIFT))
>
> So yes, I think this tells us: if we would have a PMD THP in the
> pagecache, would we be able to map it with a PMD? If not, then don't
> bother with allocating a PMD THP.
>
> Of course, this also applies to other orders, but for PMD THPs it's
> probably most relevant: if we cannot even map it through a PMD, then
> it could be a wasted THP.
>
> So yes, I agree: if we are both not missing something, then this is
> primarily relevant for the PMD case.
>
> And it's more about "optimization" than "correctness", I guess?
>
> But when mapping a folio that is already in the pagecache, I assume
> this is not required.
>
> Assume we have a 2 MiB THP in the pagecache.
>
> If someone were to map it at virtual address 1 MiB, we could still map
> 1 MiB worth of PTEs into a single page table in one go, and not fall
> back to individual PTEs.

That's how I understand the code as well; I just wasn't very sure
before. Thanks for your explanation, which clarified my doubts. I will
drop the check in the next version.
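For illustration, a minimal userspace sketch of the checks discussed
above (the index and VMA numbers are made up; IS_ALIGNED() and __ffs()
are re-implemented here with their kernel semantics, since this is not
kernel code):

#include <stdio.h>
#include <stdbool.h>

#define PAGE_SHIFT	12			/* 4 KiB pages */
#define PMD_SIZE	(2UL << 20)		/* 2 MiB */
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

int main(void)
{
	/*
	 * __filemap_get_folio()-style fallback: an order-2 folio cannot
	 * sit at index 5, so shrink to the largest order that fits.
	 */
	unsigned long index = 5;
	unsigned int order = 2;
	if (index & ((1UL << order) - 1))
		order = __builtin_ctzl(index);	/* __ffs() in the kernel */
	printf("fallback order for index 5: %u\n", order);	/* 0 */

	/*
	 * thp_vma_suitable_order()-style check: a VMA starting at vaddr
	 * 1 MiB with vm_pgoff 0 is not PMD-co-aligned, so a PMD THP
	 * could not be mapped with a PMD and might be wasted.
	 */
	unsigned long vm_start = 1UL << 20;
	unsigned long vm_pgoff = 0;
	bool pmd_ok = IS_ALIGNED((vm_start >> PAGE_SHIFT) - vm_pgoff,
				 PMD_SIZE >> PAGE_SHIFT);
	printf("PMD-mappable: %s\n", pmd_ok ? "yes" : "no");	/* no */

	/*
	 * Even then, a 2 MiB folio already in the pagecache can still be
	 * mapped in one batch: from vaddr 1 MiB to the next 2 MiB
	 * boundary is 1 MiB, i.e. 256 PTEs in a single page table.
	 */
	unsigned long batch = (PMD_SIZE - (vm_start & (PMD_SIZE - 1)))
			      >> PAGE_SHIFT;
	printf("PTEs mappable in one go: %lu\n", batch);	/* 256 */
	return 0;
}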