Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Andy Lutomirski <luto@kernel.org>
To: Jerome Glisse <jglisse@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Ingo Molnar <mingo@kernel.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
Date: Thu, 1 Jun 2017 11:38:38 -0700	[thread overview]
Message-ID: <CALCETrXsDj5wEq9womFRA0JzijmPr05vNc5gqVQK6-RrK+kPzQ@mail.gmail.com> (raw)
In-Reply-To: <20170601151107.GC3961@redhat.com>

On Thu, Jun 1, 2017 at 8:11 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> On Thu, Jun 01, 2017 at 07:38:25AM -0700, Andy Lutomirski wrote:
>> On Thu, Jun 1, 2017 at 7:33 AM, Jerome Glisse <jglisse@redhat.com> wrote:
>> > On Thu, Jun 01, 2017 at 06:59:04AM -0700, Andy Lutomirski wrote:
>> >> On Wed, May 31, 2017 at 8:03 AM, Jérôme Glisse <jglisse@redhat.com> wrote:
>> >> > Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
>> >> > we no longer cleanup stall pgd entries and thus the BUG_ON() inside
>> >> > sync_global_pgds() is wrong.
>> >> >
>> >> > This patch remove the BUG_ON() and unconditionaly update stall pgd
>> >> > entries.
>> >> >
>> >> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
>> >> > Cc: Ingo Molnar <mingo@kernel.org>
>> >> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >> > ---
>> >> >  arch/x86/mm/init_64.c | 7 +------
>> >> >  1 file changed, 1 insertion(+), 6 deletions(-)
>> >> >
>> >> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> >> > index ff95fe8..36b9020 100644
>> >> > --- a/arch/x86/mm/init_64.c
>> >> > +++ b/arch/x86/mm/init_64.c
>> >> > @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
>> >> >                         pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
>> >> >                         spin_lock(pgt_lock);
>> >> >
>> >> > -                       if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
>> >> > -                               BUG_ON(p4d_page_vaddr(*p4d)
>> >> > -                                      != p4d_page_vaddr(*p4d_ref));
>> >> > -
>> >> > -                       if (p4d_none(*p4d))
>> >> > -                               set_p4d(p4d, *p4d_ref);
>> >> > +                       set_p4d(p4d, *p4d_ref);
>> >>
>> >> If we have a mismatch in the vmalloc range, vmalloc_fault is going to
>> >> screw up and we'll end up using incorrect page tables.
>> >>
>> >> What's causing the mismatch?  If you're hitting this BUG in practice,
>> >> I suspect we have a bug elsewhere.
>> >
>> > No bug elsewhere, simply hotplug memory then hotremove same memory you
>> > just hotplugged then hotplug it again and you will trigger this as on
>> > the first hotplug we allocate p4d/pud for the struct pages area, then on
>> > hot remove we free that memory and clear the p4d/pud in the mm_init pgd
>> > but not in any of the other pgds.
>>
>> That sounds like a bug to me.  Either we should remove the stale
>> entries and fix all the attendant races, or we should unconditionally
>> allocate second-highest-level kernel page tables in unremovable memory
>> and never free them.  I prefer the latter even though it's slightly
>> slower.
>>
>> > So at that point the next hotplug
>> > will trigger the BUG because of stall entries from the first hotplug.
>>
>> By the time we have a pgd with an entry pointing off into the woods,
>> we've already lost.  Removing the BUG just hides the problem.
>
> Then why did you sign of on af2cf278ef4f9289f88504c3e03cb12f76027575
> this is what introduced this change in behavior.
>
> If i understand you correctly you want to avoid deallocating p4d/pud
> directory page when hotremove happen ? But this happen in common non
> arch specific code vmemmap_populate_basepages() thought we can make
> x86 vmemmap_populate() code arch different thought.
>
> So i am not sure how to proceed here. My first attempt was to undo
> af2cf278ef4f9289f88504c3e03cb12f76027575 so that we keep all pgds
> synchronize. No if we want special case p4d/pud allocation that's
> a different approach all together.
>

The intent of that patch was to leave the pud table allocated and to
leave the pgd entry in place.  The code appears to do that.

I'm not terribly familiar with memory hotplug.  Where's the
problematic code?  I suspect there's a fairly simple bug somewhere
that needs fixing.  kernel_physical_mapping_init() looks right,
though.

--Andy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

     prev parent reply	other threads:[~2017-06-01 18:39 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-31 15:03 Jérôme Glisse
2017-06-01  9:57 ` Kirill A. Shutemov
2017-06-01 13:59 ` Andy Lutomirski
2017-06-01 14:33   ` Jerome Glisse
2017-06-01 14:38     ` Andy Lutomirski
2017-06-01 15:11       ` Jerome Glisse
2017-06-01 18:38         ` Andy Lutomirski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrXsDj5wEq9womFRA0JzijmPr05vNc5gqVQK6-RrK+kPzQ@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=jglisse@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox