From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Kefeng Wang <wangkefeng.wang@huawei.com>,
Matthew Wilcox <willy@infradead.org>,
"Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Anna Schumaker <Anna.Schumaker@netapp.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
Date: Wed, 9 Oct 2024 16:52:48 +0800
Message-ID: <796d33c3-f97d-41ad-9ba7-99ade5dcfcee@linux.alibaba.com>
In-Reply-To: <7d76fe98-4f7f-4f3d-9e8e-79d836f945cb@huawei.com>
On 2024/10/9 15:09, Kefeng Wang wrote:
>
>
> On 2024/9/30 14:48, Baolin Wang wrote:
>>
>>
>> On 2024/9/30 11:15, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/9/30 10:52, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2024/9/30 10:30, Kefeng Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/9/30 10:02, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/9/26 21:52, Matthew Wilcox wrote:
>>>>>>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung)
>>>>>>> wrote:
>>>>>>>>> So this is why I don't use mapping_set_folio_order_range()
>>>>>>>>> here, but
>>>>>>>>> correct me if I am wrong.
>>>>>>>>
>>>>>>>> Yeah, the inode is active here as the max folio size is decided
>>>>>>>> based on
>>>>>>>> the write size, so probably mapping_set_folio_order_range() will
>>>>>>>> not be
>>>>>>>> a safe option.
>>>>>>>
>>>>>>> You really are all making too much of this. Here's the patch I
>>>>>>> think we
>>>>>>> need:
>>>>>>>
>>>>>>> +++ b/mm/shmem.c
>>>>>>> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>>>>>>>  	cache_no_acl(inode);
>>>>>>>  	if (sbinfo->noswap)
>>>>>>>  		mapping_set_unevictable(inode->i_mapping);
>>>>>>> -	mapping_set_large_folios(inode->i_mapping);
>>>>>>> +	if (sbinfo->huge)
>>>>>>> +		mapping_set_large_folios(inode->i_mapping);
>>>>>>>
>>>>>>>  	switch (mode & S_IFMT) {
>>>>>>>  	default:
>>>>>>
>>>>>> IMHO, we no longer need the 'sbinfo->huge' validation after
>>>>>> adding support for large folios in the tmpfs write and fallocate
>>>>>> paths [1].
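
[ Side note, since mapping_set_folio_order_range() came up above: if I
read include/linux/pagemap.h correctly, mapping_set_large_folios() is
just the full-range case of that same helper, roughly: ]

	/* paraphrased from include/linux/pagemap.h -- check your tree */
	static inline void mapping_set_large_folios(struct address_space *mapping)
	{
		mapping_set_folio_order_range(mapping, 0, MAX_PAGECACHE_ORDER);
	}
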
>>>
>>> Forgot to mention: we still need to check sbinfo->huge. If we mount
>>> with huge=never but still fault in large chunks, writes are slower
>>> than without 9aac777aaf94; the above change or my patch could fix it.
>>
>> My patch will allow allocating large folios in the tmpfs write and
>> fallocate paths even though the 'huge' option is 'never'.
>
> Yes, indeed. After checking your patch, here are the 'Writing
> intelligently' numbers from 'Bonnie -d /mnt/tmpfs/ -s 1024' based on
> next-20241008:
>
> 1) huge=never
> baseline: 2016438 K/Sec
> my v1/v2 or Matthew's patch: 2874504 K/Sec
> your patch with the filemap_get_order() fix: 6330604 K/Sec
>
> 2) huge=always
> the write performance: 7168917 K/Sec
>
> Since large folios are now supported in the tmpfs write path, we do
> see better performance, as shown above. That's great.
Great. Thanks for testing.
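
The win here is mainly that the write path can size the folio
allocation from the write length instead of always faulting order-0
folios. A minimal sketch of the idea (hypothetical helper name, not the
actual patch):

	/* Sketch only: hypothetical helper, not the actual patch. */
	static unsigned int shmem_write_order(size_t len)
	{
		if (len <= PAGE_SIZE)
			return 0;
		/* get_order() rounds up; cap at PMD order (2M with 4K pages) */
		return min_t(unsigned int, get_order(len), HPAGE_PMD_ORDER);
	}

So a 1M write can be served by one order-8 folio instead of 256 order-0
faults, which is presumably where most of the gap above comes from.
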
>> My initial thought for supporting large folios was that, if the 'huge'
>> option is enabled, to maintain backward compatibility, we only allow
>> 2M PMD-sized order allocations. If the 'huge' option is disabled
>> (huge=never), we still allow large folio allocations based on the
>> write length.
>>
>> Another choice is to allow different-sized large folio allocations
>> based on the write length when the 'huge' option is enabled, rather
>> than just the 2M PMD size, but to force huge orders off if the 'huge'
>> option is disabled.
>>
>
> "huge=never Do not allocate huge pages. This is the default."
> Per that documentation, it's better not to allocate large folios, but
> we need some special handling for huge=never or runtime deny/force.
Yes. I'm thinking of adding a new option (something like 'huge=mTHP') to
allocate large folios based on the write size.
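
To make that concrete, the gating could look something like the sketch
below (the option name and the SHMEM_HUGE_MTHP value are hypothetical,
nothing is final):

	/* Sketch only: SHMEM_HUGE_MTHP is a hypothetical new value. */
	static unsigned int shmem_max_order(struct shmem_sb_info *sbinfo,
					    size_t len)
	{
		switch (sbinfo->huge) {
		case SHMEM_HUGE_ALWAYS:
			/* keep today's behaviour: 2M PMD-sized only */
			return HPAGE_PMD_ORDER;
		case SHMEM_HUGE_MTHP:
			/* new: size the order by the write length */
			return min_t(unsigned int, get_order(len),
				     HPAGE_PMD_ORDER);
		case SHMEM_HUGE_NEVER:
		default:
			return 0;	/* order-0 only */
		}
	}
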
I will resend the patchset, and we can discuss it there.