From: Hugh Dickins <hughd@google.com>
To: Richard Weinberger <richard@nod.at>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"paul.gortmaker@windriver.com" <paul.gortmaker@windriver.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: swapoff() runs forever
Date: Wed, 11 Apr 2012 23:40:26 -0700 (PDT)
Message-ID: <alpine.LSU.2.00.1204112241510.28009@eggly.anvils>
In-Reply-To: <4F860F17.2090400@nod.at>

On Thu, 12 Apr 2012, Richard Weinberger wrote:
> On 09.04.2012 20:40, Hugh Dickins wrote:
> > I've not seen any such issue in recent months (or years), but
> > I've not been using UML either.  The most likely cause that springs
> > to mind would be corruption of the vmalloc'ed swap map: that would
> > be very likely to cause such a hang.
> 
> It does not look like a swap map corruption.
> If I restart most user space processes swapoff() terminates fine.

Right, thanks, that's very useful info.

> Maybe it is a refcounting problem?

You may prove to be correct; but since killing and restarting
processes fixes it up without (I presume) issuing warnings,
it doesn't sound like a refcounting problem to me.

> 
> > You say "recent Linux kernels": I wonder what "recent" means.
> > Is this something you can reproduce quickly and reliably enough
> > to do a bisection upon?
> > 
> 
> I can reproduce the issue on any UML kernel.
> The oldest I've tested was 2.6.20.
> Therefore, bug was not introduced by me. B-)

More useful info, thank you.

I think I've spotted two problems in the UML swp_entry_t handling;
but checking whether I'm right, whether they're relevant, and how to
fix them, I'll leave to you: it's years since I tried UML and I
remember nothing of it.

One, likely to be your problem.  Take a look at unuse_pte_range() in
mm/swapfile.c, where it searches the page table for the swp_pte it's
trying to "unuse".  And take a look at set_pte() in
arch/um/include/asm/pgtable.h, which appears to add a mysterious
_PAGE_NEWPAGE bit into the page table entry.  And UML doesn't provide
its own alternative to the generic pte_same() in
include/asm-generic/pgtable.h.

My guess is that the _NEWPAGE bit prevents swapoff from matching pte
against swap entry in all or some cases (I didn't look to see if
_NEWPAGE is sometimes cleared later).

Probably a good fix to try would be providing a UML pte_same() which
takes that into account; but I don't know what conditionals it should
contain, and whether it would become too inefficient.  Or, if _NEWPAGE
is always set in a swap pte, then swp_entry_to_pte() needs to set it.
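
To make the first option concrete, here is a minimal userspace model of
the mismatch and of a masked comparison.  It is only a sketch: the
model_* names are made up, the value 0x002 for _PAGE_NEWPAGE is an
assumption to be checked against arch/um/include/asm/pgtable.h, and
set_pte() is modelled as unconditionally ORing the bit in.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit value, for illustration only: check arch/um/include/asm/pgtable.h. */
#define MODEL_PAGE_NEWPAGE 0x002UL

typedef struct { unsigned long pte; } model_pte_t;

/* Like the generic pte_same(): a plain comparison of the raw values. */
static inline bool model_generic_pte_same(model_pte_t a, model_pte_t b)
{
	return a.pte == b.pte;
}

/* Hypothetical UML variant: ignore _PAGE_NEWPAGE when comparing. */
static inline bool model_uml_pte_same(model_pte_t a, model_pte_t b)
{
	return ((a.pte ^ b.pte) & ~MODEL_PAGE_NEWPAGE) == 0;
}

/* Models a set_pte() that ORs the extra bit into the stored entry. */
static inline model_pte_t model_set_pte(model_pte_t val)
{
	val.pte |= MODEL_PAGE_NEWPAGE;
	return val;
}

static inline void model_pte_same_check(void)
{
	model_pte_t wanted = { 0x1230UL };           /* arbitrary swap pte value */
	model_pte_t stored = model_set_pte(wanted);  /* what the page table holds */

	assert(!model_generic_pte_same(stored, wanted)); /* swapoff would miss it */
	assert(model_uml_pte_same(stored, wanted));      /* masked compare matches */
}
```

Whether masking the bit is right in every case, or whether some callers
need the strict comparison, is exactly the part that needs checking in
the real code.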

(A word of warning if you're unfamiliar with swap entries: there's the
kernel's internal representation swp_entry_t, which has offset in the
low-order and type in the high-order, for efficient use with radix_tree
- see include/linux/swapops.h; and then there's the arch-dependent
representation as a page table entry, which rearranges the bits so
as not to be confused with a good present page table entry, and
traditionally has type on the lower side of offset.)
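
As a rough illustration of the internal representation only: the sketch
below packs type into the top bits of an unsigned long and offset into
the rest.  The model_* names and the 5-bit type width are assumptions
for the example; include/linux/swapops.h is the authority on the real
layout.

```c
#include <assert.h>
#include <limits.h>

/* Assumed type width for the model: 5 bits, i.e. at most 32 swap types. */
#define MODEL_SWAPFILES_SHIFT 5
#define MODEL_TYPE_SHIFT \
	(sizeof(unsigned long) * CHAR_BIT - MODEL_SWAPFILES_SHIFT)
#define MODEL_OFFSET_MASK ((1UL << MODEL_TYPE_SHIFT) - 1)

typedef struct { unsigned long val; } model_swp_entry_t;

/* Pack: type in the high-order bits, offset in the low-order bits. */
static inline model_swp_entry_t model_swp_entry(unsigned long type,
						unsigned long offset)
{
	model_swp_entry_t e;

	e.val = (type << MODEL_TYPE_SHIFT) | (offset & MODEL_OFFSET_MASK);
	return e;
}

static inline unsigned long model_swp_type(model_swp_entry_t e)
{
	return e.val >> MODEL_TYPE_SHIFT;
}

static inline unsigned long model_swp_offset(model_swp_entry_t e)
{
	return e.val & MODEL_OFFSET_MASK;
}

static inline void model_swp_entry_check(void)
{
	model_swp_entry_t e = model_swp_entry(3, 0x1234);

	assert(model_swp_type(e) == 3);
	assert(model_swp_offset(e) == 0x1234);
	/* With type 0 the value is just the offset, handy as a tree index. */
	assert(model_swp_entry(0, 0x1234).val == 0x1234);
}
```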

The other thing, which I noticed first, is probably not relevant to the
bug you're seeing, since I think you'd have mentioned if you had two
swapfiles; but the case of two or more swapfiles looks very broken to me.
_PAGE_PROTNONE is 0x010 but __swp_type(x) is (((x).val >> 4) & 0x3f):
unless I'm confused, a swap entry of type 1 will look just like a
PROT_NONE pte.
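
The arithmetic can be checked with a few lines of userspace C.  This is
a sketch using only the shifts and values quoted above (type at bit 4,
offset at bit 11, _PAGE_PROTNONE 0x010); the model_* names are made up
and nothing here is the actual UML code.

```c
#include <assert.h>

#define MODEL_PAGE_PROTNONE 0x010UL	/* value quoted above */

/* Swap pte layout as quoted: type shifted by 4, offset shifted by 11. */
static inline unsigned long model_swp_pte(unsigned long type,
					  unsigned long offset)
{
	return (type << 4) | (offset << 11);
}

static inline unsigned long model_swp_type(unsigned long pte)
{
	return (pte >> 4) & 0x3f;
}

static inline void model_protnone_check(void)
{
	/* Type 1, offset 0 is bit-for-bit the PROT_NONE flag. */
	assert(model_swp_pte(1, 0) == MODEL_PAGE_PROTNONE);
	/* Every odd type sets that bit, whatever the offset. */
	assert(model_swp_pte(3, 7) & MODEL_PAGE_PROTNONE);
	/* Type 0, the single-swapfile case, never does. */
	assert((model_swp_pte(0, 7) & MODEL_PAGE_PROTNONE) == 0);
	assert(model_swp_type(model_swp_pte(1, 7)) == 1);
}
```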

Or maybe that's resolved by the _PAGE_NEWPAGE and _PAGE_NEWPROT bits,
I didn't spend time working out what they're up to.

include/linux/swap.h does not allow MAX_SWAPFILES to exceed 32, so
five bits of type are enough: you can easily change __swp_type(x) to
use 5 and 0x1f instead (with 5 instead of 4 in __swp_entry too, of
course).  Though it doesn't cause an error, I wonder where the 11 in
__swp_offset and __swp_entry comes from: I think you can support larger
swap by making it 10.
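
For what it's worth, the suggested layout can be modelled in userspace
like this.  It is a sketch under the assumptions above (type in bits
5-9, offset from bit 10 up, _PAGE_PROTNONE 0x010), with made-up model_*
names, and it says nothing about how _PAGE_NEWPAGE or _PAGE_NEWPROT
interact with swap ptes.

```c
#include <assert.h>

#define MODEL_PAGE_PROTNONE 0x010UL	/* value quoted earlier */

/* Suggested layout: 5 bits of type at bit 5, offset from bit 10 up. */
static inline unsigned long model_swp_pte(unsigned long type,
					  unsigned long offset)
{
	return ((type & 0x1f) << 5) | (offset << 10);
}

static inline unsigned long model_swp_type(unsigned long pte)
{
	return (pte >> 5) & 0x1f;
}

static inline unsigned long model_swp_offset(unsigned long pte)
{
	return pte >> 10;
}

static inline void model_layout_check(void)
{
	unsigned long type;

	for (type = 0; type < 32; type++) {
		unsigned long pte = model_swp_pte(type, 12345);

		/* No swap type can now set the PROT_NONE bit. */
		assert((pte & MODEL_PAGE_PROTNONE) == 0);
		/* Type and offset survive the round trip. */
		assert(model_swp_type(pte) == type);
		assert(model_swp_offset(pte) == 12345);
	}
}
```

Bits 0-4 stay clear in every encoded entry, which is what keeps type 1
from aliasing _PAGE_PROTNONE in this model.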

Hugh

