From: Michal Hocko <mhocko@suse.com>
To: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: p.jaroszynski@gmail.com, linux-mm@kvack.org,
	Jan Stancek <jstancek@redhat.com>
Subject: Re: [PATCH] Fix do_move_pages_to_node() error handling
Date: Thu, 15 Nov 2018 09:47:52 +0100	[thread overview]
Message-ID: <20181115084752.GF23831@dhcp22.suse.cz> (raw)
In-Reply-To: <33626151-aeea-004a-36f5-27ddf6ff9008@nvidia.com>

On Wed 14-11-18 17:04:37, Piotr Jaroszynski wrote:
> On 11/14/18 1:22 PM, Michal Hocko wrote:
> > On Wed 14-11-18 10:04:45, Piotr Jaroszynski wrote:
> >> On 11/14/18 3:29 AM, Michal Hocko wrote:
> >>> On Wed 14-11-18 08:34:15, Michal Hocko wrote:
> >>>> On Tue 13-11-18 16:40:59, p.jaroszynski@gmail.com wrote:
> >>>>> From: Piotr Jaroszynski <pjaroszynski@nvidia.com>
> >>>>>
> >>>>> migrate_pages() can return the number of pages that failed to migrate
> >>>>> instead of 0 or an error code. If that happens, the positive return is
> >>>>> treated as an error all the way up through the stack leading to the
> >>>>> move_pages() syscall returning a positive number. I believe this
> >>>>> regressed with commit a49bd4d71637 ("mm, numa: rework do_pages_move")
> >>>>> that refactored a lot of this code.
> >>>>
> >>>> Yes this is correct.
> >>>>
> >>>>> Fix this by treating positive returns as success in
> >>>>> do_move_pages_to_node() as that seems to most closely follow the
> >>>>> previous code. This still leaves the question whether silently
> >>>>> considering this case a success is the right thing to do as even the
> >>>>> status of the pages will be set as if they were successfully migrated,
> >>>>> but that seems to have been the case before as well.
> >>>>
> >>>> Yes, I believe the previous semantic was just wrong and we want to fix
> >>>> it. Jan has already brought this up [1]. I believe we want to update the
> >>>> documentation rather than restore the previous hazy semantic.
> >>
> >> That's probably fair, although at least some code we have will have to
> >> be updated, as it just checks for non-zero returns from move_pages()
> >> and assumes errno is set when that happens.
> > 
> > Can you tell me more about your use case, please? I definitely do not
> > want to break any existing userspace. Making the syscall return code
> > more reasonable is still attractive, so if this new semantic works
> > better for you, it would be one more argument to keep it this way.
> >  
> 
> One of our APIs exposes a way to move a VA range to a GPU NUMA node or one of
> the CPU NUMA nodes. The code keeps retrying move_pages() and relies on
> the reported page status to decide whether each page is done, needs a
> retry (EAGAIN or EBUSY) or possibly needs a fallback (ENOMEM).
> 
> With the previous behaviour we would get success, but the page status
> would be reported incorrectly. That's bad, as we skip the migration
> without knowing about it.

Exactly.
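
For concreteness, a minimal userspace sketch of the status-driven retry
loop described above (an illustration only, not code from this thread;
the helper name, retry limit and fallback policy are assumptions):

#include <numaif.h>	/* move_pages(), MPOL_MF_MOVE; link with -lnuma */
#include <errno.h>

/*
 * Sketch: try to move `count' pages to `node', retrying pages whose
 * status comes back as -EAGAIN/-EBUSY and returning -ENOMEM so the
 * caller can fall back to something else.
 */
static long move_range(void **pages, int *status, unsigned long count,
		       int node)
{
	int nodes[count];
	unsigned long i;
	int attempt;

	for (i = 0; i < count; i++)
		nodes[i] = node;

	for (attempt = 0; attempt < 10; attempt++) {
		int need_retry = 0;
		long ret = move_pages(0, count, pages, nodes, status,
				      MPOL_MF_MOVE);

		if (ret < 0)
			return -errno;	/* the call itself failed */
		/* ret > 0 is exactly the ambiguous case discussed here */

		for (i = 0; i < count; i++) {
			if (status[i] == node)
				continue;	/* already on the target node */
			if (status[i] == -EAGAIN || status[i] == -EBUSY) {
				need_retry = 1;	/* transient, try again */
				continue;
			}
			return status[i];	/* e.g. -ENOMEM: caller falls back */
		}
		if (!need_retry)
			return 0;
	}
	return -EBUSY;
}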

> With the current code we get what we interpret as success, as errno is 0,
> but the page status is garbage/untouched. That's also bad.

Agreed.
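
For reference, a simplified sketch of where that comes from (written from
memory of mm/migrate.c after a49bd4d71637, not a verbatim quote):
migrate_pages() can return the number of pages it failed to migrate, and
do_move_pages_to_node() passes that straight up.

static int do_move_pages_to_node(struct mm_struct *mm,
		struct list_head *pagelist, int node)
{
	int err;

	if (list_empty(pagelist))
		return 0;

	/*
	 * migrate_pages() returns 0, a negative errno, or the number of
	 * pages it was unable to migrate.
	 */
	err = migrate_pages(pagelist, alloc_new_node_page, NULL, node,
			MIGRATE_SYNC, MR_SYSCALL);
	if (err)
		putback_movable_pages(pagelist);

	/*
	 * A positive err is passed up and eventually becomes the positive
	 * return of the move_pages() syscall; the patch under discussion
	 * would instead treat that case as success here.
	 */
	return err;
}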

> The proposed solution adds a new case to handle, but it will just tell
> us the page status is unusable and all we can do is retry blindly.
> If it were possible to plumb the migration status through accurately
> for each page, that would allow us to avoid redoing the call for pages
> that actually worked. Perhaps we would need a special status for pages
> skipped due to errors.

This would be possible, but with this patch applied you should know how
many pages to skip from the tail of the array.
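
Roughly, and assuming that semantic (a positive return counts pages at
the tail of the batch that were not migrated and whose status was never
written -- that reading is an assumption, and retry_pages() below is a
hypothetical helper):

	long ret = move_pages(0, count, pages, nodes, status, MPOL_MF_MOVE);

	if (ret < 0) {
		/* the call failed outright, errno is set */
	} else if (ret > 0) {
		/*
		 * Only the last `ret' pages need another attempt; their
		 * status entries cannot be trusted, so retry them blindly.
		 */
		retry_pages(pages + (count - ret), ret);
	}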

> But maybe this is all a tiny corner case aside from the bug I hit (see
> more below) and it's not worth thinking too much about.
> 
> >>>> Just wondering, how have you found out? Is there any real application
> >>>> failing because of the change or this is a result of some test?
> >>
> >> I have a test that creates a tmp file, mmaps it as shared, memsets the
> >> memory and then attempts to move it to a different node. It used to
> >> work, but now fails. I suspect the filesystem's migratepage() callback
> >> regressed and will look into it next. So far I have only tested this on
> >> powerpc with the xfs filesystem.
> > 
> > I would be surprised if the rework changed the migration behavior.
> 
> It didn't; I tracked it down to the new fs/iomap.c code used by xfs not
> being compatible with migrate_page_move_mapping(), and prepared a perhaps
> naive fix in [1].

I am not very familiar with the iomap code, TBH, so I cannot really judge
your fix.
-- 
Michal Hocko
SUSE Labs

Thread overview: 9+ messages
2018-11-14  0:40 p.jaroszynski
2018-11-14  7:34 ` Michal Hocko
2018-11-14 11:29   ` Michal Hocko
2018-11-14 18:04     ` Piotr Jaroszynski
2018-11-14 21:22       ` Michal Hocko
2018-11-15  1:04         ` Piotr Jaroszynski
2018-11-15  8:47           ` Michal Hocko [this message]
2018-11-15 18:58             ` Piotr Jaroszynski
2018-11-16 11:49               ` Michal Hocko
