linux-mm.kvack.org archive mirror
From: Richard Yao <ryao@gentoo.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-kernel@vger.kernel.org, mthode@mthode.org,
	kernel@gentoo.org, Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>, Glauber Costa <glommer@openvz.org>,
	Rik van Riel <riel@redhat.com>,
	Vladimir Davydov <vdavydov@parallels.com>,
	Dave Chinner <dchinner@redhat.com>,
	open@kvack.org, list@kvack.org,
	MEMORY MANAGEMENT <linux-mm@kvack.org>
Subject: Re: [PATCH] mm: vmscan: unlock_page page when forcing reclaim
Date: Fri, 18 Jul 2014 14:51:43 -0400
Message-ID: <53C96CBF.4040705@gentoo.org>
In-Reply-To: <20140718163843.GK29639@cmpxchg.org>

On 07/18/2014 12:38 PM, Johannes Weiner wrote:
> I don't really understand how the scenario you describe can happen.
> 
> Successfully reclaiming a page means that __remove_mapping() was able
> to freeze a page count of 2 (page cache and LRU isolation), but
> filemap_fault() increases the refcount on the page before trying to
> lock the page.  If __remove_mapping() wins, find_get_page() does not
> work and the fault does not lock the page.  If find_get_page() wins,
> __remove_mapping() does not work and the reclaimer aborts and does a
> regular unlock_page().
> 
> page_check_references() is purely about reclaim strategy, it should
> not be essential for correctness.
> 

You are right that something else happened here. I had not spotted
the cmpxchg being done in __remove_mapping(). If I find something that
looks like it could be the actual cause, I will propose a new
fix to the list for review. Thanks for your time.
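
For reference, the ordering argument in the quoted text can be modeled in
userspace roughly as follows. This is only a sketch under stated
assumptions, not kernel code: struct toy_page and every toy_* function are
hypothetical stand-ins, and only the refcount is modeled. One side succeeds
only if it can atomically swap an exact count of 2 for 0; the other side
takes a reference only if the count is not already 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount is modeled. */
struct toy_page {
    atomic_int refcount;
};

/* Reclaim side: succeeds only if the count is exactly `expected`, and
 * atomically replaces it with 0 so no new reference can be taken. */
static bool toy_freeze_refs(struct toy_page *p, int expected)
{
    assert(expected > 0);
    return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/* Lookup side: takes a reference unless the count is already 0. */
static bool toy_get_unless_zero(struct toy_page *p)
{
    int old = atomic_load(&p->refcount);

    while (old != 0) {
        /* On failure, `old` is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Runs both orderings once, single-threaded. Returns 1 if exactly one
 * side wins in each ordering, 0 otherwise. */
static int toy_orderings_hold(void)
{
    struct toy_page a, b;

    /* Ordering 1: the reclaimer goes first, so the lookup must fail. */
    atomic_init(&a.refcount, 2);
    bool froze_a = toy_freeze_refs(&a, 2);
    bool got_a = toy_get_unless_zero(&a);

    /* Ordering 2: the lookup goes first, so the freeze must fail. */
    atomic_init(&b.refcount, 2);
    bool got_b = toy_get_unless_zero(&b);
    bool froze_b = toy_freeze_refs(&b, 2);

    return froze_a && !got_a && got_b && !froze_b;
}
```

Calling toy_orderings_hold() from a test harness exercises both orderings.
In this model, whichever side goes first makes the other side fail, which
is the property the quoted argument relies on.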

P.S. The system had ECC RAM, so this was not a bit flip. My current
method for debugging this involves using cscope to construct possible
call paths under a couple of assumptions:

1. Something that set PG_locked never called unlock_page() to release it.
2. The only ways of doing #1 that I see in the code are calling
__clear_page_locked() or failing to clear the bit. I do not believe that
a patch was accepted that did the latter, so I assume the former.
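
The difference between the two cases in assumption 2 can be illustrated
with a toy model. It is a sketch only: struct toy_lock_page and the toy_*
helpers are hypothetical, the "sleepers" counter stands in for tasks asleep
on the page's wait queue, and it assumes (as the code appears to read) that
unlock_page() wakes waiters while __clear_page_locked() only clears the bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page lock: one bit plus a count of sleepers. */
struct toy_lock_page {
    bool locked;   /* models PG_locked */
    int sleepers;  /* tasks asleep waiting for the bit to clear */
};

/* A task finds the bit set and goes to sleep waiting for a wakeup. */
static void toy_wait_on_lock(struct toy_lock_page *p)
{
    if (p->locked)
        p->sleepers++;
}

/* Models unlock_page(): clear the bit and wake everyone waiting. */
static void toy_unlock(struct toy_lock_page *p)
{
    assert(p->locked);
    p->locked = false;
    p->sleepers = 0;
}

/* Models __clear_page_locked(): clear the bit, wake nobody. */
static void toy_clear_locked_only(struct toy_lock_page *p)
{
    assert(p->locked);
    p->locked = false;
}

/* True if the bit is clear but a task is still asleep on it. */
static bool toy_has_stuck_sleeper(const struct toy_lock_page *p)
{
    return !p->locked && p->sleepers > 0;
}
```

In this model, a waiter followed by toy_clear_locked_only() leaves
toy_has_stuck_sleeper() true, while the same sequence ending in
toy_unlock() does not.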

I have root access to the system, so each time I do a lookup using
cscope, I go through the list to logically eliminate possibilities by
inspecting the system where the problem occurred. When I cannot
eliminate a possibility, I recurse. This is prone to false positives
should I miss a subtle piece of code that prevents a problem, and it is
very tedious, but I do not see a better way of debugging based on what I
have at my disposal. If anyone has any suggestions, I would appreciate them.
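
The elimination procedure above amounts to a depth-first walk over a caller
graph with pruning. A minimal sketch, assuming the cscope results have been
transcribed by hand into a table: every function name in toy_graph is an
invented placeholder (not a real kernel call chain), and the ruled_out flag
records the outcome of manual inspection.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAX_CALLERS 4
#define TOY_MAX_DEPTH   16

/* One node of a hand-built caller graph. Placeholder data only. */
struct toy_func {
    const char *name;
    int ruled_out;                 /* 1 if inspection eliminated it */
    int callers[TOY_MAX_CALLERS];  /* indices into toy_graph, -1 ends */
};

static struct toy_func toy_graph[] = {
    /* 0 */ { "clear_lock_helper", 0, { 1, 2, -1 } },
    /* 1 */ { "caller_a",          1, { 3, -1 } },    /* eliminated */
    /* 2 */ { "caller_b",          0, { 3, 4, -1 } },
    /* 3 */ { "entry_x",           0, { -1 } },
    /* 4 */ { "entry_y",           1, { -1 } },       /* eliminated */
};

/* Walks from idx up through its callers and returns how many paths to
 * an entry point survive. The depth cap also guards against cycles. */
static int toy_walk(int idx, int depth, int path[], int print)
{
    int found = 0;

    assert(idx >= 0);
    if (toy_graph[idx].ruled_out || depth >= TOY_MAX_DEPTH)
        return 0;
    path[depth] = idx;

    if (toy_graph[idx].callers[0] == -1) {
        /* No callers recorded: treat as an entry point. */
        if (print) {
            for (int i = depth; i >= 0; i--)
                printf("%s%s", toy_graph[path[i]].name, i ? " -> " : "\n");
        }
        return 1;
    }
    for (int i = 0; i < TOY_MAX_CALLERS; i++) {
        if (toy_graph[idx].callers[i] == -1)
            break;
        /* Cannot eliminate this caller yet, so recurse into it. */
        found += toy_walk(toy_graph[idx].callers[i], depth + 1, path, print);
    }
    return found;
}

/* Counts (and optionally prints) candidate call paths reaching start. */
static int toy_surviving_paths(int start, int print)
{
    int path[TOY_MAX_DEPTH];

    return toy_walk(start, 0, path, print);
}
```

With the placeholder table as written, toy_surviving_paths(0, 1) leaves a
single candidate path (entry_x -> caller_b -> clear_lock_helper). Clearing
a ruled_out flag brings the corresponding paths back, which is how a wrong
elimination would show up as a missed candidate.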

P.P.S. I *really* wish that I had used kdump when this issue happened,
but sadly, the system is not set up for kdump.
