From: Hugh Dickins <hughd@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Jones <davej@redhat.com>,
Cyrill Gorcunov <gorcunov@gmail.com>,
Hillf Danton <dhillf@gmail.com>, Linux-MM <linux-mm@kvack.org>,
Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: unused swap offset / bad page map.
Date: Mon, 26 Aug 2013 15:08:45 -0700 (PDT)
Message-ID: <alpine.LNX.2.00.1308261448490.4982@eggly.anvils>
In-Reply-To: <CA+55aFwQbJbR3xij1+iGbvj3EQggF9NLGAfDbmA54FkKz9xfew@mail.gmail.com>

On Mon, 26 Aug 2013, Linus Torvalds wrote:
> On Mon, Aug 26, 2013 at 1:15 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So I'm inclined to think that we are more likely to have
> > something wrong in the messy magical special cases.
>
> Of course, the good news would be if it actually ends up being the
> soft-dirty stuff, and bisection hits something recent.

I suspect so.

>
> So maybe I'm overly pessimistic. That messy swap_map[] code really
> _is_ messy, but at the same time it should also be pretty well-tested.
> I don't think it's been touched in years.

Blame me for the byte-instead-of-short continuation stuff.
But it's never yet shown any problem (okay, perhaps that's
because it's so rare to need any continuation anyway).

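The "byte-instead-of-short continuation" refers to how swap_map[] keeps one byte of reference count per swap slot and carries larger counts into continuation pages. The sketch below illustrates that idea in simplified form. It assumes a single continuation digit rather than a chain of pages, and it assumes the constants mirror the kernel's SWAP_MAP_MAX, SWAP_CONT_MAX and COUNT_CONTINUED. struct swap_slot, slot_dup(), slot_free() and slot_count() are hypothetical names, not kernel functions.

```c
#include <assert.h>

/* Assumed to mirror SWAP_MAP_MAX, SWAP_CONT_MAX and COUNT_CONTINUED;
 * everything else in this sketch is hypothetical. */
#define MAP_MAX   0x3e	/* largest count the swap_map byte holds alone */
#define CONT_MAX  0x7f	/* largest value of one continuation digit     */
#define CONT_FLAG 0x80	/* "higher digits live in the continuation"    */

struct swap_slot {
	unsigned char map;	/* low digit, plus CONT_FLAG when continued */
	unsigned char cont;	/* single continuation digit (simplified)   */
};

/* Total reference count represented by the slot. */
static unsigned int slot_count(const struct swap_slot *s)
{
	unsigned int low = s->map & ~CONT_FLAG;

	if (!(s->map & CONT_FLAG))
		return low;
	return low + (MAP_MAX + 1) * s->cont;
}

/* Take a reference: 0 on success, -1 if this simplified model overflows
 * (the kernel would allocate another continuation page instead). */
static int slot_dup(struct swap_slot *s)
{
	unsigned int low = s->map & ~CONT_FLAG;

	if (low < MAP_MAX) {
		s->map++;		/* room left in the low digit */
		return 0;
	}
	if (s->cont == CONT_MAX)
		return -1;
	s->cont++;			/* carry into the continuation digit */
	s->map = CONT_FLAG;		/* low digit wraps to 0, flag set */
	return 0;
}

/* Drop a reference; the count must be nonzero. */
static void slot_free(struct swap_slot *s)
{
	assert(slot_count(s) > 0);
	if (s->map == CONT_FLAG) {	/* low digit is 0: borrow */
		s->cont--;
		s->map = s->cont ? (MAP_MAX | CONT_FLAG) : MAP_MAX;
	} else {
		s->map--;
	}
}
```

Start from an empty slot and call slot_dup() 100 times. The slot then holds low digit 37 with CONT_FLAG set and cont == 1, since 37 + 63 * 1 = 100. Freeing 100 times walks the borrow back down to zero.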
>
> That said, google does find "swap_free: Unused swap offset entry"
> reports from over the years. Most of them seem to be single-bit
> errors, though (ie when the entry is 00000100 or similar I'm more
> inclined to blame a bit error

Yes, historically they have usually represented either single-bit
errors, or corruption of page tables by other kernel data. The
swap subsystem discovers it, but it's rarely an error of swap.
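Linus's rule of thumb is that an entry like 00000100 points to a bit error rather than a real swap entry. That check amounts to a one-bit test. The sketch below implements the heuristic. looks_like_single_bit_error() is a hypothetical triage helper, not anything in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical triage helper: a "swap_free: Unused swap offset entry"
 * value with exactly one bit set looks more like a flipped bit than
 * like a genuine swap entry. */
static bool looks_like_single_bit_error(uint64_t raw_entry)
{
	/* x & (x - 1) clears the lowest set bit; zero means one bit was set. */
	return raw_entry != 0 && (raw_entry & (raw_entry - 1)) == 0;
}
```

This is only a hint for triage. Some genuine entries can also have a single bit set, and multi-bit corruption by other kernel data passes straight through it.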

So I don't care for Dave's suggestion much earlier in this thread,
that swapoff should fail with -EINVAL if there has been a bad page
taint: a bad page taint doesn't necessarily interfere with swapoff
at all. And besides, swapoff is killable: yes, if counts go wrong,
it can cycle around endlessly, but it checks for signal_pending()
each time around the loop.

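Swapoff stays killable even when counts go wrong because the loop polls for a pending signal on every pass. The userspace sketch below shows that pattern. A SIGINT handler that sets a flag stands in for the kernel's signal_pending(). try_to_unuse_sketch(), walk_next() and stuck_next() are hypothetical stand-ins for the real swapoff internals.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* The handler only records the signal; the loop polls the flag. */
static volatile sig_atomic_t got_signal;

static void on_signal(int sig)
{
	(void)sig;
	got_signal = 1;
}

/* Hypothetical "find next slot still in use": returns 0 when done. */
typedef size_t (*next_fn)(size_t cur);

static int try_to_unuse_sketch(size_t start, next_fn next)
{
	size_t i = start;

	while ((i = next(i)) != 0) {
		if (got_signal)		/* like signal_pending() */
			return -EINTR;
		/* ... unuse slot i here ... */
	}
	return 0;
}

/* Well-behaved walk over slots 1..9. */
static size_t walk_next(size_t cur)
{
	return cur < 9 ? cur + 1 : 0;
}

/* Models broken counts: keeps returning the same slot forever, and
 * simulates the user hitting ^C on its 1000th call. */
static size_t stuck_next(size_t cur)
{
	static unsigned int rounds;

	if (++rounds == 1000)
		raise(SIGINT);
	return cur ? cur : 1;
}
```

First install the handler with signal(SIGINT, on_signal). The walk_next() loop then finishes normally. The stuck_next() loop returns -EINTR after the simulated ^C instead of spinning forever.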
> - in contrast your values look like "real" swap entries).

Indeed they do.

I just did a quick diff of 3.11-rc7/mm against 3.10, and there's
a line in mremap that worries me. That set_pte_at() is applied to
every pte that isn't pte_none(), swap entries included, so the
pte_mksoft_dirty() looks prone to corrupting a swap entry.
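OR-ing a flag into a pte that actually holds a swap entry is dangerous, and a small sketch shows why. It uses a hypothetical bit layout, not the real x86 encoding, in which the soft-dirty bit happens to overlap the swap offset field. mk_swap_pte(), swap_offset(), move_pte_buggy() and move_pte_fixed() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pte layout for illustration only, NOT the real x86 one:
 * bit 0 is "present"; a non-present, non-none pte holds a swap entry
 * with the swap type in bits 1-5 and the swap offset from bit 9 up.
 * The soft-dirty flag is assumed to be bit 11, so in this layout it
 * lands inside the offset field. */
#define PTE_PRESENT	(1ULL << 0)
#define SWP_TYPE_SHIFT	1
#define SWP_TYPE_MASK	0x1fULL
#define SWP_OFF_SHIFT	9
#define PTE_SOFT_DIRTY	(1ULL << 11)

typedef uint64_t sketch_pte;

static sketch_pte mk_swap_pte(unsigned int type, uint64_t offset)
{
	assert(type <= SWP_TYPE_MASK);
	return ((uint64_t)type << SWP_TYPE_SHIFT) | (offset << SWP_OFF_SHIFT);
}

static uint64_t swap_offset(sketch_pte pte)
{
	return pte >> SWP_OFF_SHIFT;
}

/* Before the patch: every non-none pte is marked soft-dirty when moved,
 * swap entries included, so the flag gets OR-ed into the offset. */
static sketch_pte move_pte_buggy(sketch_pte pte)
{
	return pte | PTE_SOFT_DIRTY;
}

/* After the patch: the pte is carried over unchanged. */
static sketch_pte move_pte_fixed(sketch_pte pte)
{
	return pte;
}
```

In this layout, a swap entry for offset 0x1230 comes out of move_pte_buggy() pointing at offset 0x1234, while move_pte_fixed() preserves it. An offset silently changed like that could later surface as "swap_free: Unused swap offset entry" or a bad page map.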

I've not tried matching up bits with Dave's reports, and I'm just going
into a meeting now, but this patch looks worth a try: Cyrill can probably
improve it meanwhile into what he actually wants there (I'm
surprised anything special is needed just to move a pte).

Hugh

--- 3.11-rc7/mm/mremap.c	2013-07-14 17:10:16.640003652 -0700
+++ linux/mm/mremap.c	2013-08-26 14:46:14.460027627 -0700
@@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
 			continue;
 		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
-		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
+		set_pte_at(mm, new_addr, new_pte, pte);
 	}
 
 	arch_leave_lazy_mmu_mode();