linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <yang.shi@linux.alibaba.com>
To: Qian Cai <cai@lca.pw>, John Hubbard <jhubbard@nvidia.com>
Cc: fabecassis@nvidia.com, mhocko@suse.com, cl@linux.com,
	vbabka@suse.cz, mgorman@techsingularity.net,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
Date: Thu, 5 Dec 2019 17:11:58 -0800	[thread overview]
Message-ID: <9588b886-7ced-6502-67f0-0eb623045903@linux.alibaba.com> (raw)
In-Reply-To: <D0A99204-A60F-428E-B77A-63DBCD7103F4@lca.pw>



On 12/5/19 4:19 PM, Qian Cai wrote:
>
>> On Dec 5, 2019, at 7:04 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> Felix's code is not random test code. It's code he wrote and he expected it to work.
> Sure, but could he show a bit detail if the kernel will start to behavior as expected like what was written in the manpage to return ENOENT in this case, why is it not acceptable? i.e., why is it important to depend on the broken behavior?


I think I found the root cause. It did return -ENOENT when the syscall 
was introduced (my first impression was wrong), but the behavior was 
changed since 2.6.28 by commit e78bbfa82624 ("mm: stop returning -ENOENT 
from sys_move_pages() if nothing got migrated"). The full commit log is 
as below:

Author: Brice Goglin <Brice.Goglin@inria.fr>
Date:   Sat Oct 18 20:27:15 2008 -0700

     mm: stop returning -ENOENT from sys_move_pages() if nothing got 
migrated

     A patchset reworking sys_move_pages().  It removes the possibly large
     vmalloc by using multiple chunks when migrating large buffers. It also
     dramatically increases the throughput for large buffers since the 
lookup
     in new_page_node() is now limited to a single chunk, causing the 
quadratic
     complexity to have a much slower impact.  There is no need to use any
     radix-tree-like structure to improve this lookup.

     sys_move_pages() duration on a 4-quadcore-opteron 2347HE (1.9Gz),
     migrating between nodes #2 and #3:

             length          move_pages (us)         move_pages+patch (us)
             4kB             126                     98
             40kB            198                     168
             400kB           963                     937
             4MB             12503                   11930
             40MB            246867                  11848

     Patches #1 and #4 are the important ones:
     1) stop returning -ENOENT from sys_move_pages() if nothing got migrated
     2) don't vmalloc a huge page_to_node array for do_pages_stat()
     3) extract do_pages_move() out of sys_move_pages()
     4) rework do_pages_move() to work on page_sized chunks
     5) move_pages: no need to set pp->page to ZERO_PAGE(0) by default

     This patch:

     There is no point in returning -ENOENT from sys_move_pages() if all 
pages
     were already on the right node, while we return 0 if only 1 page 
was not.
     Most application don't know where their pages are allocated, so 
it's not
     an error to try to migrate them anyway.

     Just return 0 and let the status array in user-space be checked if the
     application needs details.

     It will make the upcoming chunked-move_pages() support much easier.

     Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
     Acked-by: Christoph Lameter <cl@linux-foundation.org>
     Cc: Nick Piggin <nickpiggin@yahoo.com.au>
     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


So the behavior was changed in kernel intentionally but never reflected 
in the manpage. I will propose a patch to fix the document.


  reply	other threads:[~2019-12-06  1:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-05 18:54 Yang Shi
2019-12-05 19:19 ` Qian Cai
2019-12-05 19:27   ` Yang Shi
2019-12-05 19:34     ` Qian Cai
2019-12-05 22:09       ` Yang Shi
2019-12-05 22:23         ` Qian Cai
2019-12-05 22:41           ` John Hubbard
2019-12-05 23:16             ` Qian Cai
2019-12-05 23:24               ` John Hubbard
2019-12-05 23:58                 ` Qian Cai
2019-12-06  0:04                   ` John Hubbard
2019-12-06  0:19                     ` Qian Cai
2019-12-06  1:11                       ` Yang Shi [this message]
2019-12-05 19:45 ` Christopher Lameter
2019-12-05 21:59   ` Yang Shi
2019-12-06  7:35   ` Michal Hocko
2019-12-05 21:27 ` John Hubbard
2019-12-05 22:00   ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9588b886-7ced-6502-67f0-0eb623045903@linux.alibaba.com \
    --to=yang.shi@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pw \
    --cc=cl@linux.com \
    --cc=fabecassis@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox