linux-mm.kvack.org archive mirror
From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shaohua Li <shli@kernel.org>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>, Rik van Riel <riel@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Zi Yan <zi.yan@cs.rutgers.edu>
Subject: Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()
Date: Thu, 27 Sep 2018 14:12:39 -0700
Message-ID: <20180927211238.ly3e7cyvfu3rswcv@ca-dmjordan1.us.oracle.com>
In-Reply-To: <87r2hfhger.fsf@yhuang-dev.intel.com>

On Thu, Sep 27, 2018 at 09:34:36AM +0800, Huang, Ying wrote:
> Daniel Jordan <daniel.m.jordan@oracle.com> writes:
> > On Wed, Sep 26, 2018 at 08:55:59PM +0800, Huang, Ying wrote:
> >> Daniel Jordan <daniel.m.jordan@oracle.com> writes:
> >> > On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote:
> >> >>  /*
> >> >>   * Increase reference count of swap entry by 1.
> >> >> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
> >> >> - * but could not be atomically allocated.  Returns 0, just as if it succeeded,
> >> >> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
> >> >> - * might occur if a page table entry has got corrupted.
> >> >> + *
> >> >> + * Return error code in following case.
> >> >> + * - success -> 0
> >> >> + * - swap_count_continuation is required but could not be atomically allocated.
> >> >> + *   *entry is used to return swap entry to call add_swap_count_continuation().
> >> >> + *								      -> ENOMEM
> >> >> + * - otherwise same as __swap_duplicate()
> >> >>   */
> >> >> -int swap_duplicate(swp_entry_t entry)
> >> >> +int swap_duplicate(swp_entry_t *entry, int entry_size)
> >> >>  {
> >> >>  	int err = 0;
> >> >>  
> >> >> -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> >> >> -		err = add_swap_count_continuation(entry, GFP_ATOMIC);
> >> >> +	while (!err &&
> >> >> +	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
> >> >> +		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
> >> >>  	return err;
> >> >
> >> > Now we're returning any error we get from __swap_duplicate, apparently to
> >> > accommodate ENOTDIR later in the series, which is a change from the behavior
> >> > introduced in 570a335b8e22 ("swap_info: swap count continuations").  This might
> >> > belong in a separate patch given its potential for side effects.
> >> 
> >> I have checked all the calls of the function and found there will be no
> >> bad effect.  Do you have any side effect?
> >
> > Before I was just being vaguely concerned about any unintended side effects,
> > but looking again, yes I do.
> >
> > Now when swap_duplicate returns an error in copy_one_pte, copy_one_pte returns
> > a (potentially nonzero) entry.val, which copy_pte_range interprets
> > unconditionally as 'try adding a swap count continuation.'  Not what we want
> > for returns other than -ENOMEM.
> 
> Thanks for pointing this out!  Before the change in the patchset, the
> behavior is,
> 
> Something wrong is detected in swap_duplicate(), but the error is
> ignored.  Then copy_one_pte() will think everything is OK, so that it
> can proceed to call set_pte_at().  The system will be in inconsistent
> state and some data may be polluted!

Yes, the part about page table corruption in the comment above swap_duplicate.

> But this doesn't cause any problem in practical.  Per my understanding,
> because if other part of the kernel works correctly, it's impossible for
> swap_duplicate() return any error except -ENOMEM before the change in
> this patchset.

I agree with that, but it's not what I'm trying to explain.  I didn't go into
enough detail, so let me try again.  Hopefully I'm understanding this right.

While running with these patches, say we're at

  copy_pte_range
   copy_one_pte
    swap_duplicate
     __swap_duplicate
      __swap_duplicate_locked
    
And say __swap_duplicate_locked returns an error that isn't -ENOMEM, such as
-EEXIST.  That means __swap_duplicate and swap_duplicate also return -EEXIST.
copy_one_pte returns entry.val, which can be and usually is nonzero, so we
break out of the loop in copy_pte_range and then--erroneously--call
add_swap_count_continuation.
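
To make that flow concrete, here is a toy user-space model of the chain.  It
is only a sketch: the model_* names, the injected_err parameter, and the
continuation_attempts counter are made up for illustration and are not kernel
code.  It assumes copy_one_pte reports any swap_duplicate failure by returning
entry.val, and that copy_pte_range treats a nonzero value as a request for a
continuation.

```c
#include <assert.h>
#include <errno.h>

/* Toy user-space model of the call chain; not the real kernel code. */
typedef struct { unsigned long val; } swp_entry_t;

/* Counts how often the model would call add_swap_count_continuation(). */
static int continuation_attempts;

/*
 * Stand-in for swap_duplicate() with this patch applied: whatever
 * __swap_duplicate() returned (injected_err here) is passed up as is.
 */
static int model_swap_duplicate(swp_entry_t *entry, int injected_err)
{
	(void)entry;
	assert(injected_err <= 0);	/* 0 or a negative errno */
	return injected_err;
}

/* Stand-in for copy_one_pte(): entry.val on any failure, 0 on success. */
static unsigned long model_copy_one_pte(swp_entry_t entry, int injected_err)
{
	if (model_swap_duplicate(&entry, injected_err) < 0)
		return entry.val;
	return 0;
}

/* Stand-in for the tail of copy_pte_range(). */
static void model_copy_pte_range(swp_entry_t entry, int injected_err)
{
	swp_entry_t ret = { .val = model_copy_one_pte(entry, injected_err) };

	/* Nonzero is read as "needs a continuation", whatever the errno was. */
	if (ret.val)
		continuation_attempts++;
}
```

Calling model_copy_pte_range() with a nonzero entry and -EEXIST bumps the
counter even though no continuation is needed, which is the needless call
described above.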

The add_swap_count_continuation call was added in 570a335b8e22 and relies on
the assumption that callers can only get -ENOMEM from swap_duplicate.  This
patch changes that assumption.

Not a big deal: the continuation call just returns early, no harm done, but it
allocates and frees a page needlessly, so we should fix it.  One way is to
change copy_one_pte's return type to int and pass the error code back to
copy_pte_range, which can then decide whether to try adding the continuation.
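
A rough user-space sketch of that idea, under the same caveats as any toy
model: the fixed_* names, the injected_err parameter, and the counter are
hypothetical, and what copy_pte_range should do with errors other than -ENOMEM
is not decided here.  The sketch simply hands them back to the caller, which
is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Toy user-space model of the suggested change; not the real kernel code. */
typedef struct { unsigned long val; } swp_entry_t;

/* Counts how often the model would call add_swap_count_continuation(). */
static int fixed_continuation_attempts;

/*
 * Stand-in for copy_one_pte() returning int: the swap_duplicate() result
 * (injected_err here) is passed back instead of being folded into entry.val.
 */
static int fixed_copy_one_pte(swp_entry_t *entry, int injected_err)
{
	(void)entry;
	assert(injected_err <= 0);	/* 0 or a negative errno */
	return injected_err;
}

/* Stand-in for the tail of copy_pte_range(). */
static int fixed_copy_pte_range(swp_entry_t entry, int injected_err)
{
	int err = fixed_copy_one_pte(&entry, injected_err);

	if (err == -ENOMEM) {
		/* Only this case needs a continuation before retrying. */
		fixed_continuation_attempts++;
		return 0;
	}

	/* Assumption: any other error is returned to the caller unchanged. */
	return err;
}
```

With this shape, -EEXIST no longer triggers a continuation attempt, while
-ENOMEM still does.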

The other swap_duplicate caller, try_to_unmap_one, seems ok.

> But the error may be possible during development, and it
> may serve as some kind of document too.  So I suggest to add
> 
> VM_BUG_ON(error != -ENOMEM);
> 
> in swap_duplicate().  What do you think about that?

That doesn't seem necessary.

> > So it might make sense to have a separate patch that changes swap_duplicate's
> > return and makes callers handle it.
> 
> Thanks for your help to take a deep look at this.  I want to try to fix
> all potential problems firstly, because the number of the caller is
> quite limited.  Do you agree?

Yes, makes sense to me.

Daniel
