linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: p.jaroszynski@gmail.com, linux-mm@kvack.org,
	Jan Stancek <jstancek@redhat.com>
Subject: Re: [PATCH] Fix do_move_pages_to_node() error handling
Date: Wed, 14 Nov 2018 22:22:24 +0100	[thread overview]
Message-ID: <20181114212224.GE23419@dhcp22.suse.cz> (raw)
In-Reply-To: <ddf79812-7702-d513-3f83-70bba1b258db@nvidia.com>

On Wed 14-11-18 10:04:45, Piotr Jaroszynski wrote:
> On 11/14/18 3:29 AM, Michal Hocko wrote:
> > On Wed 14-11-18 08:34:15, Michal Hocko wrote:
> >> On Tue 13-11-18 16:40:59, p.jaroszynski@gmail.com wrote:
> >>> From: Piotr Jaroszynski <pjaroszynski@nvidia.com>
> >>>
> >>> migrate_pages() can return the number of pages that failed to migrate
> >>> instead of 0 or an error code. If that happens, the positive return is
> >>> treated as an error all the way up through the stack leading to the
> >>> move_pages() syscall returning a positive number. I believe this
> >>> regressed with commit a49bd4d71637 ("mm, numa: rework do_pages_move")
> >>> that refactored a lot of this code.
> >>
> >> Yes this is correct.
> >>
> >>> Fix this by treating positive returns as success in
> >>> do_move_pages_to_node() as that seems to most closely follow the
> >>> previous code. This still leaves the question whether silently
> >>> considering this case a success is the right thing to do as even the
> >>> status of the pages will be set as if they were successfully migrated,
> >>> but that seems to have been the case before as well.
> >>
> >> Yes, I believe the previous semantic was just wrong and we want to fix
> >> it. Jan has already brought this up [1]. I believe we want to update the
> >> documentation rather than restore the previous hazy semantic.
> 
> That's probably fair although at least some code we have will have to be
> updated as it just checks for non-zero returns from move_pages() and
> assumes errno is set when that happens.

Can you tell me more about your usecase plase? I definitely do not want
to break any existing userspace. Making the syscall return code more
reasonable is still attractive. So if this new semantic can work better
for you it would be one argument more to keep it this way.
 
> >> Just wondering, how have you found out? Is there any real application
> >> failing because of the change or this is a result of some test?
> 
> I have a test that creates a tmp file, mmaps it as shared, memsets the
> memory and then attempts to move it to a different node. It used to
> work, but now fails. I suspect the filesystem's migratepage() callback
> regressed and will look into it next. So far I have only tested this on
> powerpc with the xfs filesystem.

I would be surprise if the rewor changed the migration behavior.

[...]
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index f7e4bfdc13b7..aa53ebc523eb 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1615,8 +1615,16 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> >  			goto out_flush;
> >  
> >  		err = do_move_pages_to_node(mm, &pagelist, current_node);
> > -		if (err)
> > +		if (err) {
> > +			/*
> > +			 * Possitive err means the number of failed pages to
> > +			 * migrate. Make sure to report the rest of the
> > +			 * nr_pages is not migrated as well.
> > +			 */
> > +			if (err > 0)
> > +				err += nr_pages - i - 1;
> >  			goto out;
> 
> Ok, so we give up after the first failure to migrate everything. That
> probably makes sense although I don't have a good idea about how
> frequent it is for the migration to give up in such a manner (short of
> the issue I'm seeing that I suspect is a separate bug). In this case,
> should the status of each page be updated to something instead of being
> left undefined? Or should it be specified that page status is only valid
> for the first N - not migrated pages?

I believe this is consistent with the previous behavior. I do not
remember in detail but I believe we haven't set the status for the
remaining pages. With this patch it seems straightforward to skip over
exact number of pages that failed.

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2018-11-14 21:22 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-14  0:40 p.jaroszynski
2018-11-14  7:34 ` Michal Hocko
2018-11-14 11:29   ` Michal Hocko
2018-11-14 18:04     ` Piotr Jaroszynski
2018-11-14 21:22       ` Michal Hocko [this message]
2018-11-15  1:04         ` Piotr Jaroszynski
2018-11-15  8:47           ` Michal Hocko
2018-11-15 18:58             ` Piotr Jaroszynski
2018-11-16 11:49               ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181114212224.GE23419@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=jstancek@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=p.jaroszynski@gmail.com \
    --cc=pjaroszynski@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox