From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Shaohua Li <shli@kernel.org>, Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan@kernel.org>, Rik van Riel <riel@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Zi Yan <zi.yan@cs.rutgers.edu>
Subject: Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()
Date: Thu, 27 Sep 2018 14:12:39 -0700
Message-ID: <20180927211238.ly3e7cyvfu3rswcv@ca-dmjordan1.us.oracle.com>
In-Reply-To: <87r2hfhger.fsf@yhuang-dev.intel.com>

On Thu, Sep 27, 2018 at 09:34:36AM +0800, Huang, Ying wrote:
> Daniel Jordan <daniel.m.jordan@oracle.com> writes:
> > On Wed, Sep 26, 2018 at 08:55:59PM +0800, Huang, Ying wrote:
> >> Daniel Jordan <daniel.m.jordan@oracle.com> writes:
> >> > On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote:
> >> >>  /*
> >> >>   * Increase reference count of swap entry by 1.
> >> >> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
> >> >> - * but could not be atomically allocated.  Returns 0, just as if it succeeded,
> >> >> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
> >> >> - * might occur if a page table entry has got corrupted.
> >> >> + *
> >> >> + * Return error code in following case.
> >> >> + * - success -> 0
> >> >> + * - swap_count_continuation is required but could not be atomically allocated.
> >> >> + *   *entry is used to return swap entry to call add_swap_count_continuation().
> >> >> + *                                                          -> ENOMEM
> >> >> + * - otherwise same as __swap_duplicate()
> >> >>   */
> >> >> -int swap_duplicate(swp_entry_t entry)
> >> >> +int swap_duplicate(swp_entry_t *entry, int entry_size)
> >> >>  {
> >> >>  	int err = 0;
> >> >>
> >> >> -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> >> >> -		err = add_swap_count_continuation(entry, GFP_ATOMIC);
> >> >> +	while (!err &&
> >> >> +	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
> >> >> +		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
> >> >>  	return err;
> >> >
> >> > Now we're returning any error we get from __swap_duplicate, apparently to
> >> > accommodate ENOTDIR later in the series, which is a change from the behavior
> >> > introduced in 570a335b8e22 ("swap_info: swap count continuations"). This might
> >> > belong in a separate patch given its potential for side effects.
> >>
> >> I have checked all the calls of the function and found there will be no
> >> bad effect. Do you have any side effect?
> >
> > Before I was just being vaguely concerned about any unintended side effects,
> > but looking again, yes I do.
> >
> > Now when swap_duplicate returns an error in copy_one_pte, copy_one_pte returns
> > a (potentially nonzero) entry.val, which copy_pte_range interprets
> > unconditionally as 'try adding a swap count continuation.' Not what we want
> > for returns other than -ENOMEM.
>
> Thanks for pointing this out! Before the change in the patchset, the
> behavior is,
>
> Something wrong is detected in swap_duplicate(), but the error is
> ignored. Then copy_one_pte() will think everything is OK, so that it
> can proceed to call set_pte_at(). The system will be in inconsistent
> state and some data may be polluted!

Yes, the part about page table corruption in the comment above swap_duplicate.

> But this doesn't cause any problem in practical. Per my understanding,
> because if other part of the kernel works correctly, it's impossible for
> swap_duplicate() return any error except -ENOMEM before the change in
> this patchset.

I agree with that, but it's not what I'm trying to explain.  I didn't go into
enough detail, let me try again.  Hopefully I'm understanding this right.

While running with these patches, say we're at

    copy_pte_range
      copy_one_pte
        swap_duplicate
          __swap_duplicate
            __swap_duplicate_locked

And say __swap_duplicate_locked returns an error that isn't -ENOMEM, such as
-EEXIST.  That means __swap_duplicate and swap_duplicate also return -EEXIST.
copy_one_pte returns entry.val, which can be and usually is nonzero, so we
break out of the loop in copy_pte_range and then--erroneously--call
add_swap_count_continuation.

The add_swap_count_continuation call was added in 570a335b8e22 and relies on
the assumption that callers can only get -ENOMEM from swap_duplicate.  This
patch changes that assumption.

Not a big deal: the continuation call just returns early, no harm done, but it
allocs and frees a page needlessly, so we should fix it.  One way is to change
copy_one_pte's return to int so we can just pass the error code back to
copy_pte_range so it knows whether to try adding the continuation.

The other swap_duplicate caller, try_to_unmap_one, seems ok.

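To make the copy_one_pte suggestion concrete, here is a rough userspace sketch
of the two shapes.  It is not kernel code: every stub_*, old_* and new_* name,
the simulated_err parameter and the continuation_calls counter are made up for
illustration, and the "retry the loop after adding a continuation" step in
copy_pte_range is collapsed into a plain return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef struct { unsigned long val; } swp_entry_t;

static int continuation_calls;	/* how many times the continuation path ran */

/* Stub: returns whatever error the caller asks it to simulate. */
static int stub_swap_duplicate(swp_entry_t *entry, int simulated_err)
{
	assert(entry != NULL);
	return simulated_err;
}

/* Stub: only records that it was called. */
static int stub_add_swap_count_continuation(swp_entry_t entry)
{
	(void)entry;
	continuation_calls++;
	return 0;
}

/* Current shape: any failure is reported as a (usually nonzero) entry.val. */
static unsigned long old_copy_one_pte(swp_entry_t entry, int simulated_err)
{
	if (stub_swap_duplicate(&entry, simulated_err) < 0)
		return entry.val;
	return 0;
}

static int old_copy_pte_range(unsigned long val, int simulated_err)
{
	swp_entry_t entry = { .val = val };

	entry.val = old_copy_one_pte(entry, simulated_err);
	if (entry.val)	/* cannot tell -ENOMEM apart from other errors */
		return stub_add_swap_count_continuation(entry);
	return 0;
}

/* Suggested shape: pass the error code itself back up. */
static int new_copy_one_pte(swp_entry_t *entry, int simulated_err)
{
	return stub_swap_duplicate(entry, simulated_err);
}

static int new_copy_pte_range(unsigned long val, int simulated_err)
{
	swp_entry_t entry = { .val = val };
	int err = new_copy_one_pte(&entry, simulated_err);

	if (err == -ENOMEM)	/* the only case where a continuation can help */
		return stub_add_swap_count_continuation(entry);
	return err;
}
```

With a simulated -EEXIST and a nonzero entry value, old_copy_pte_range bumps
continuation_calls while new_copy_pte_range leaves it alone and hands -EEXIST
back to its caller.  Only a simulated -ENOMEM reaches the continuation stub in
the new shape.
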
> But the error may be possible during development, and it
> may serve as some kind of document too. So I suggest to add
>
> VM_BUG_ON(error != -ENOMEM);
>
> in swap_duplicate(). What do you think about that?

That doesn't seem necessary.

> > So it might make sense to have a separate patch that changes swap_duplicate's
> > return and makes callers handle it.
>
> Thanks for your help to take a deep look at this. I want to try to fix
> all potential problems firstly, because the number of the caller is
> quite limited. Do you agree?

Yes, makes sense to me.

Daniel