From: Yang Shi <yang.shi@linux.alibaba.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: hughd@google.com, kirill.shutemov@linux.intel.com,
aarcange@redhat.com, akpm@linux-foundation.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mm: shmem: allow split THP when truncating THP partially
Date: Mon, 25 Nov 2019 10:24:38 -0800 [thread overview]
Message-ID: <3a35da3a-dff0-a8ca-8269-3018fff8f21b@linux.alibaba.com> (raw)
In-Reply-To: <20191125093611.hlamtyo4hvefwibi@box>
On 11/25/19 1:36 AM, Kirill A. Shutemov wrote:
> On Sat, Nov 23, 2019 at 09:05:32AM +0800, Yang Shi wrote:
>> Currently when truncating shmem file, if the range is partial of THP
>> (start or end is in the middle of THP), the pages actually will just get
>> cleared rather than being freed unless the range cover the whole THP.
>> Even though all the subpages are truncated (randomly or sequentially),
>> the THP may still be kept in page cache. This might be fine for some
>> usecases which prefer preserving THP.
>>
>> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
>> or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used.
>> So, when using shmem THP as memory backend QEMU inflation actually doesn't
>> work as expected since it doesn't free memory. But, the inflation
>> usecase really needs get the memory freed. Anonymous THP will not get
>> freed right away too but it will be freed eventually when all subpages are
>> unmapped, but shmem THP would still stay in page cache.
>>
>> To protect the usecases which may prefer preserving THP, introduce a
>> new fallocate mode: FALLOC_FL_SPLIT_HPAGE, which means spltting THP is
>> preferred behavior if truncating partial THP. This mode just makes
>> sense to tmpfs for the time being.
> We need to clarify interaction with khugepaged. This implementation
> doesn't do anything to prevent khugepaged from collapsing the range back
> to THP just after the split.
Yes, it doesn't. Will clarify this in the commit log.
>
>> @@ -976,8 +1022,31 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> }
>> unlock_page(page);
>> }
>> +rescan_split:
>> pagevec_remove_exceptionals(&pvec);
>> pagevec_release(&pvec);
>> +
>> + if (split && PageTransCompound(page)) {
>> + /* The THP may get freed under us */
>> + if (!get_page_unless_zero(compound_head(page)))
>> + goto rescan_out;
>> +
>> + lock_page(page);
>> +
>> + /*
>> + * The extra pins from page cache lookup have been
>> + * released by pagevec_release().
>> + */
>> + if (!split_huge_page(page)) {
>> + unlock_page(page);
>> + put_page(page);
>> + /* Re-look up page cache from current index */
>> + goto again;
>> + }
>> + unlock_page(page);
>> + put_page(page);
>> + }
>> +rescan_out:
>> index++;
>> }
> Doing get_page_unless_zero() just after you've dropped the pin for the
> page looks very suboptimal.
If I don't drop the pins the THP can't be split. And, there might be
more than one pins from find_get_entries() if I read the code correctly.
For example, truncate 8K length in the middle of THP, the THP's refcount
would get bumpped twice since two sub pages would be returned.
>
next prev parent reply other threads:[~2019-11-25 18:24 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-23 1:05 Yang Shi
2019-11-25 9:36 ` Kirill A. Shutemov
2019-11-25 18:24 ` Yang Shi [this message]
2019-11-25 18:33 ` Kirill A. Shutemov
2019-11-25 19:33 ` Yang Shi
2019-11-26 23:34 ` Yang Shi
2019-11-28 3:06 ` Hugh Dickins
2019-11-28 11:34 ` Kirill A. Shutemov
2019-12-02 23:14 ` Yang Shi
2019-12-02 23:07 ` Yang Shi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3a35da3a-dff0-a8ca-8269-3018fff8f21b@linux.alibaba.com \
--to=yang.shi@linux.alibaba.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kirill@shutemov.name \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox