linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <yang.shi@linux.alibaba.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	Li Xinhai <lixinhai.lxh@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: akpm <akpm@linux-foundation.org>, mhocko <mhocko@suse.com>,
	n-horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
Date: Wed, 15 Jan 2020 13:30:58 -0800	[thread overview]
Message-ID: <9fabdc16-fc31-7e06-e306-af38850de771@linux.alibaba.com> (raw)
In-Reply-To: <c102d153-5093-94c9-7e6d-65fd6881831d@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 5502 bytes --]



On 1/15/20 1:07 PM, Mike Kravetz wrote:
> Naoya, would appreciate any comments you have as this seems to be related
> to your changes to add huge page migration support.
>
> On 1/14/20 5:07 PM, Mike Kravetz wrote:
>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>>     - Is that leftover from the the days when huge page migration was not
>>       supported?
>>     - Is it just because huge page migration is more likely to fail than
>>       base page migration.
>
> According to man page, mbind was added to v2.6.7 kernel.  git history does
> not go back that far, so I first looked at v2.6.12.
>
> Definitions from include/linux/mempolicy.h
> ------------------------------------------
> /* Policies */
> #define MPOL_DEFAULT    0
> #define MPOL_PREFERRED  1
> #define MPOL_BIND       2
> #define MPOL_INTERLEAVE 3
>
> /* Flags for mbind */
> #define MPOL_MF_STRICT  (1<<0)  /* Verify existing pages in the mapping */
>
> Note the MPOL_MF_STRICT would simply verify current page placement.  At
> this time, hugetlb pages were not part of this verification.
>
>  From v2.6.12 routine check_range()
> ...
> 	for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
> 		if (!vma->vm_next && vma->vm_end < end)
> 			return ERR_PTR(-EFAULT);
> 		if (prev && prev->vm_end < vma->vm_start)
> 			return ERR_PTR(-EFAULT);
> 		if ((flags & MPOL_MF_STRICT) && !is_vm_hugetlb_page(vma)) {
> 			err = verify_pages(vma->vm_mm,
> 					   vma->vm_start, vma->vm_end, nodes);
> 			if (err) {
> 				first = ERR_PTR(err);
> 				break;
> 			}
> 		}
> 		prev = vma;
> 	}
> ...
>
> man page history does not go back this far.  The first mbind man page has
> some things that are interesting:
> 1) It DOES NOT have the note saying MPOL_MF_STRICT is ignored on huge
>     page mappings (even though the code does this).
> 2) Support for huge page policy was added with 2.6.16
> 3) For MPOL_MF_STRICT, in 2.6.16 or later the kernel will also try to
>     move pages to the requested node with this flag.
>
> Next look at v2.6.16 kernel
> ==========================
> This kernel introduces the MPOL_MF_MOVE* flags
> #define MPOL_MF_MOVE    (1<<1)  /* Move pages owned by this process to conform
> 				   to mapping */
> #define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */
>
> Once again, the up front check_range() routine will skip hugetlb vma's.
> Note that check_range() will also isolate pages.  So since hugetlb vma's
> are skipped, no hugetlb pages will be isolated.  Since no hugetlb pages
> are isolated, none will be on the list to be migrate.  Therefore, hugetlb
> pages are effectively ignored for the new MPOL_MF_MOVE* flags.  This makes
> sense as huge page migration support was not added until the v3.12 kernel.
>
> Note that at when MPOL_MF_MOVE* flags were added to the mbind man page,
> the statement "MPOL_MF_STRICT is ignored on huge page mappings right now."
> was added.  This was later changed to "MPOL_MF_STRICT is ignored on huge
> page mappings."
>
> Next look at v3.12 kernel
> =========================
> The v3.12 code looks similiar to today's code.  In the verify/isolation
> phase, v3.12 routine queue_pages_hugetlb_pmd_range() looks very similiar
> to queue_pages_hugetlb().  Neither will cause errors at this point in the
> process.  And, hugetlb pages with a mapcount == 1 will potentially be
> isolated and added to the list of pages to be migrated.  In addition, if
> the subsequent call to migrate_pages() fails to migrate ANY page, we
> return -EIO if MPOL_MF_STRICT is set.  This is true even if the only page(s)
> that failed to migrate were hugetlb pages.
>
> It should also be noted that no mbind man page updates were made WRT
> hugetlb pages after migration support was added.
>
> Summary
> =======
> It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page
> mappings." is left over from the original mbind implementation.  When
> the huge page migration support was added, I can not be sure if ignoring
> MPOL_MF_STRICT for huge pages during the verify/isolation phase was
> intentional.  It seems like it was as the return value from
> isolate_huge_page() is ignored.

Thanks a lot for digging into the history.

>
> What should we do?
> ==================
> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>     seen as conflicting with man page has existed since v3.12 and I am
>     not aware of any complaints.
> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>     MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>     after a failure of migrate_pages().  We could simply traverse the list
>     of pages that were not migrated looking for any non-hugetlb page.

I don't think we can do this easily since migrate_pages() would put the 
migration failed hugetlb pages back to hugepage_activelist so there 
should not any hugetlb page reside on the pagelist regardless of failure 
if I read the code correctly.

Other than that traversing page list to look for a certain type page 
doesn't sound scalable to me.

> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>     and modify code accordingly.
>
> My suggestion would be for 1 or 2.  Thoughts?

By rethinking the history (thanks again for digging into it), it sounds 
#3 should be more reasonable. It sounds like the behavior was changed 
since hugetlb migration was added but the man page was not updated to 
reflect the change.



[-- Attachment #2: Type: text/html, Size: 6455 bytes --]

  reply	other threads:[~2020-01-15 21:31 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14  9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai
2020-01-14  9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai
2020-01-14 14:09   ` Li Xinhai
2020-01-14 18:27     ` Yang Shi
2020-01-15  1:07     ` Mike Kravetz
2020-01-15  1:24       ` Yang Shi
2020-01-15  4:28         ` Mike Kravetz
2020-01-15  5:23           ` Yang Shi
2020-01-15  7:36             ` Li Xinhai
2020-01-15 17:16               ` Yang Shi
2020-01-15 21:07       ` Mike Kravetz
2020-01-15 21:30         ` Yang Shi [this message]
2020-01-15 21:45           ` Mike Kravetz
2020-01-15 21:59             ` Yang Shi
2020-01-16  8:07               ` HORIGUCHI NAOYA(堀口 直也)
2020-01-16 15:32                 ` Li Xinhai
2020-01-16  7:59         ` Michal Hocko
2020-01-16 19:22           ` Mike Kravetz
2020-01-17  2:32             ` Yang Shi
2020-01-17  2:38             ` Li Xinhai
2020-01-17  7:57               ` Michal Hocko
2020-01-17 12:05                 ` Li Xinhai
2020-01-17 15:20                   ` Michal Hocko
2020-01-17 15:46                     ` Li Xinhai
2020-01-20 12:45                       ` Michal Hocko
2020-01-21 14:15                         ` Li Xinhai
2020-01-21 14:53                           ` Michal Hocko
2020-01-22 13:55                             ` Li Xinhai
2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz
2020-01-15  1:25 ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9fabdc16-fc31-7e06-e306-af38850de771@linux.alibaba.com \
    --to=yang.shi@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=lixinhai.lxh@gmail.com \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox