linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Zi Yan <zi.yan@sent.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [RFC PATCH 1/4] mm: madvise: read loop's step size beforehand in madvise_inject_error(), prepare for THP support.
Date: Thu, 24 Aug 2017 04:26:10 +0000	[thread overview]
Message-ID: <20170824042608.GA30150@hori1.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <41F3B393-6FDA-4CF9-A790-A1B4B4FDFA58@sent.com>

On Wed, Aug 23, 2017 at 10:20:02AM -0400, Zi Yan wrote:
> On 23 Aug 2017, at 3:49, Naoya Horiguchi wrote:
> 
> > On Mon, Aug 14, 2017 at 09:52:13PM -0400, Zi Yan wrote:
> >> From: Zi Yan <zi.yan@cs.rutgers.edu>
> >>
> >> The loop in madvise_inject_error() reads its step size from a page
> >> after it is soft-offlined. It works because the page is:
> >> 1) a hugetlb page: the page size does not change;
> >> 2) a base page: the page size does not change;
> >> 3) a THP: soft-offline always splits THPs, thus, it is OK to use
> >>    PAGE_SIZE as step size.
> >>
> >> It will be a problem when soft-offline supports THP migrations.
> >> When a THP is migrated without split during soft-offlining, the THP
> >> is split after migration, thus, before and after soft-offlining page
> >> sizes do not match. This causes a THP to be unnecessarily soft-lined,
> >> at most, 511 times, wasting free space.
> >
> > Hi Zi Yan,
> >
> > Thank you for the suggestion.
> >
> > I think that when madvise(MADV_SOFT_OFFLINE) is called with some range
> > over more than one 4kB page, the caller clearly intends to call
> > soft_offline_page() over all 4kB pages within the range in order to
> > simulate the multiple soft-offline events. Please note that the caller
> > only knows that specific pages are half-broken, and expect that all such
> > pages are offlined. So the end result should be same, whether the given
> > range is backed by thp or not.
> >
> 
> But if the given virtual address is backed by a THP and the THP is soft-offlined
> without splitting (enabled by following patches), the old for-loop will cause extra
> 511 THPs being soft-offlined.
> 
> For example, the caller wants to offline VPN 0-511, which is backed by a THP whose
> address range is PFN 0-511. In the first iteration of the for-loop,
> get_user_pages_fast(VPN0, ...) will return the THP and soft_offline_page() will offline the THP,
> replacing it with a new THP, say PFN 512-1023, so VPN 0-511 is backed by PFN 512-1023.
> But the original THP will be split after it is freed, thus, for-loop will not end
> at this moment, but continues to offline VPN1, which leads to PFN 512-1023 being offlined
> and replaced by another THP, say 1024-1535. This will go on and end up with
> 511 extra THPs are offlined. That is why we need to this patch to tell
> whether the THP is offlined as a whole or just its head page is offlined.

Thanks for elaborating this. I understand your point.
But I still not sure what the best behavior is.

madvise(MADV_SOFT_OFFLINE) is a test feature and giving multi-page range
on the call works like some stress testing. So multiple thp migrations
seem to me an expected behavior. At least it behaves in the same manner
as calling madvise(MADV_SOFT_OFFLINE) 512 times on VPN0-VPN511 separately,
which is consistent.

So I still feel like leaving the current behavior as long as your later
patches work without this change.

Thanks,
Naoya Horiguchi

> 
> Let me know if it is still not clear to you. Or I missed something.
> 
> >>
> >> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
> >> ---
> >>  mm/madvise.c | 21 ++++++++++++++++++---
> >>  1 file changed, 18 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/madvise.c b/mm/madvise.c
> >> index 47d8d8a25eae..49f6774db259 100644
> >> --- a/mm/madvise.c
> >> +++ b/mm/madvise.c
> >> @@ -612,19 +612,22 @@ static long madvise_remove(struct vm_area_struct *vma,
> >>  static int madvise_inject_error(int behavior,
> >>  		unsigned long start, unsigned long end)
> >>  {
> >> -	struct page *page;
> >> +	struct page *page = NULL;
> >> +	unsigned long page_size = PAGE_SIZE;
> >>
> >>  	if (!capable(CAP_SYS_ADMIN))
> >>  		return -EPERM;
> >>
> >> -	for (; start < end; start += PAGE_SIZE <<
> >> -				compound_order(compound_head(page))) {
> >> +	for (; start < end; start += page_size) {
> >>  		int ret;
> >>
> >>  		ret = get_user_pages_fast(start, 1, 0, &page);
> >>  		if (ret != 1)
> >>  			return ret;
> >>
> >> +		page_size = (PAGE_SIZE << compound_order(compound_head(page))) -
> >> +			(PAGE_SIZE * (page - compound_head(page)));
> >> +
> >
> > Assigning a value which is not 4kB or some hugepage size into page_size
> > might be confusing because that's not what the name says. You can introduce
> > 'next' virtual address and ALIGN() might be helpful to calculate it.
> 
> Like:
> 
> next = ALIGN(start, PAGE_SIZE<<compound_order(compound_head(page))) - start;
> 
> I think it works. Thanks.
> 
> 
> --
> Best Regards
> Yan Zi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-08-24  4:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-15  1:52 [RFC PATCH 0/4] mm: hwpoison: soft-offline support for thp migration Zi Yan
2017-08-15  1:52 ` [RFC PATCH 1/4] mm: madvise: read loop's step size beforehand in madvise_inject_error(), prepare for THP support Zi Yan
2017-08-23  7:49   ` Naoya Horiguchi
2017-08-23 14:20     ` Zi Yan
2017-08-24  4:26       ` Naoya Horiguchi [this message]
2017-08-24 14:26         ` Zi Yan
2017-08-15  1:52 ` [RFC PATCH 2/4] mm: soft-offline: Change soft_offline_page() interface to tell if the page is split or not Zi Yan
2017-08-15  1:52 ` [RFC PATCH 3/4] mm: soft-offline: retry to split and soft-offline the raw error if the original THP offlining fails Zi Yan
2017-08-24  7:31   ` Naoya Horiguchi
2017-08-15  1:52 ` [RFC PATCH 4/4] mm: hwpoison: soft offline supports thp migration Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170824042608.GA30150@hori1.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=zi.yan@sent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox