linux-mm.kvack.org archive mirror
From: <kkabe@vega.pgw.jp>
To: bhe@redhat.com
Cc: bugzilla-daemon@bugzilla.kernel.org, akpm@linux-foundation.org,
	richardw.yang@linux.intel.com, david@redhat.com,
	mhocko@kernel.org, n-horiguchi@ah.jp.nec.com, linux-mm@kvack.org,
	kkabe@vega.pgw.jp
Subject: Re: [Bug 206401] kernel panic on Hyper-V after 5 minutes due to memory hot-add
Date: Mon, 17 Feb 2020 14:46:27 +0900
Message-ID: <200217144627.M0113305@vega.pgw.jp>
In-Reply-To: Your message of "Wed, 12 Feb 2020 15:31:23 +0800". <20200212073123.GG8965@MiWiFi-R3L-srv>

bhe@redhat.com said in <20200212073123.GG8965@MiWiFi-R3L-srv>

>> On 02/11/20 at 04:41pm, Andrew Morton wrote:
>> > On Tue, 11 Feb 2020 07:07:41 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:
>> > 
>> > > On Mon, Feb 10, 2020 at 02:15:51PM +0800, Baoquan He wrote:
>> > > >On 02/10/20 at 02:09pm, Baoquan He wrote:
>> > > >> On 02/09/20 at 09:56pm, Andrew Morton wrote:
>> > > >> > On Mon, 10 Feb 2020 13:40:27 +0800 Baoquan He <bhe@redhat.com> wrote:
>> > > >> > 
>> > > >> > > Hi Andrew,
>> > > >> > > 
>> > > >> > > On 02/09/20 at 09:32pm, Andrew Morton wrote:
>> > > >> > > > On Tue, 04 Feb 2020 11:25:48 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>> > > >> > > > 
>> > > >> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206401
>> > > >> > > > > 
>> > > >> > > > 
>> > > >> > > > An oops during mem hotadd.  Could someone please take a look when
>> > > >> > > > convenient?
>> > > >> > > 
>> > > >> > > This has been addressed by Wei Yang's patch, please check it here:
>> > > >> > > 
>> > > >> > > http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
>> > > >> > > 
>> > > >> > 
>> > > >> > hm, OK, thanks.  It's unfortunate that a 5.5 fix is buried in a
>> > > >> > six-patch series which is still in progress!  Can we please merge that
>> > > >> > as a standalone fix with a cc:stable, Fixes:, etc?
>> > > >
>> > > >Maybe can add Fixes tag as follow when merge:
>> > > >
>> > > >Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> > > >
>> > 
>> > The reporter (cc'ed here) is still seeing issues:
>> > https://bugzilla.kernel.org/show_bug.cgi?id=206401
>> > 
>> > Could we please continue this investigation via emailed reply-to-all,
>> > rather than via the bugzilla interface?
>> 
>> Yes, people prefer mailing list to discuss issues.


I found perplexing behavior in populate_section_memmap().

populate_section_memmap() calls alloc_pages(), and if that fails,
falls back to vmalloc().

But according to the trace, populate_section_memmap() seems to
discard the alloc_pages() result and always fall back to vmalloc(),
which could be the wrong area to use.
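
For reference, the control flow I expect can be sketched in userspace C
as below. This is only a rough sketch, not kernel code: alloc_fn,
always_fail(), from_heap() and populate_memmap_sketch() are hypothetical
stand-ins for alloc_pages() and vmalloc(). The point is that the fallback
path should be reachable only when the first allocation returned NULL.

```c
/* Userspace sketch of an allocate-then-fall-back flow; not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in type for alloc_pages() / vmalloc(). */
typedef void *(*alloc_fn)(size_t size);

static void *always_fail(size_t size) { (void)size; return NULL; }
static void *from_heap(size_t size) { return malloc(size); }

enum map_source { MAP_NONE, MAP_PRIMARY, MAP_FALLBACK };

struct memmap {
	void *addr;
	enum map_source source;
};

static struct memmap populate_memmap_sketch(alloc_fn primary,
					    alloc_fn fallback, size_t size)
{
	struct memmap m = { NULL, MAP_NONE };
	void *page = primary(size);

	if (page) {
		/* Primary allocation succeeded: use it, skip the fallback. */
		m.addr = page;
		m.source = MAP_PRIMARY;
		return m;
	}

	/* Only reachable when the primary allocator returned NULL. */
	assert(page == NULL);

	m.addr = fallback(size);
	if (m.addr)
		m.source = MAP_FALLBACK;
	return m;
}
```

Calling populate_memmap_sketch(from_heap, from_heap, 64) reports
MAP_PRIMARY, while populate_memmap_sketch(always_fail, from_heap, 64)
reports MAP_FALLBACK; in neither case can the assert fire.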

I sprinkled pr_info() calls into mm/sparse.c:populate_section_memmap() as below:

===========================================
struct page * __meminit populate_section_memmap(unsigned long pfn,
                unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
{
        struct page *page, *ret;
        unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;

        page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
        if (page) {
                goto got_map_page;
        }
pr_info("%s: alloc_pages() returned 0x%p (should be 0), reverting to vmalloc(memmap_size=%lu)\n", __func__, page, memmap_size);
BUG_ON(page != 0);

        ret = vmalloc(memmap_size);
pr_info("%s: vmalloc(%lu) returned 0x%p\n", __func__, memmap_size, ret);
        if (ret) {
                goto got_map_ptr;
        }

        return NULL;
got_map_page:
        ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
pr_info("%s: allocated struct page *page=0x%p\n", __func__, page);
got_map_ptr:

pr_info("%s: returning struct page * =0x%p\n", __func__, ret);
        return ret;
}
==================================================

and got the following panic.
It even ignores the BUG_ON() (perhaps it was optimized out).

Is this worth investigating?
The disassembly doesn't reveal anything suspicious, but I have a feeling that
the disassembly I'm looking at differs from what the CPU is actually executing.
It's too trivial to be a compiler bug.


==================================================
[root@localhost ~]# readelf -l /proc/kcore

Elf file type is CORE (Core file)
Entry point 0x0
There are 3 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  NOTE           0x000094 0x00000000 0x00000000 0x01304 0x00000     0
  LOAD           0xaff2000 0xcaff0000 0xffffffff 0x3400e000 0x3400e000 RWE 0x1000
  LOAD           0x002000 0xc0000000 0x00000000 0xa7f0000 0xa7f0000 RWE 0x1000


[  302.784196] hv_balloon: Max. dynamic memory size: 1048576 MB
[  643.475080] hv_balloon: hv_mem_hot_add: calling add_memory(nid=0, ((start_pfn=0x10000) << PAGE_SHIFT)=0x10000000, (HA_CHUNK << PAGE_SHIFT)=134217728)
[  643.513804] populate_section_memmap: alloc_pages() returned 0xb1a7c4b2 (should be 0), reverting to vmalloc(memmap_size=655360)
[  643.513849] populate_section_memmap: vmalloc(655360) returned 0x11b0e715
[  643.513872] populate_section_memmap: returning struct page * =0x11b0e715
[  643.525352] populate_section_memmap: alloc_pages() returned 0xb1a7c4b2 (should be 0), reverting to vmalloc(memmap_size=655360)
[  643.536698] populate_section_memmap: vmalloc(655360) returned 0xf2ba6510
[  643.536722] populate_section_memmap: returning struct page * =0xf2ba6510
[  643.536749] hv_balloon: hv_mem_hot_add: add_memory() returned 0
[  645.394458] BUG: unable to handle page fault for address: d13ff000
[  645.394518] #PF: supervisor write access in kernel mode
[  645.394565] #PF: error_code(0x0002) - not-present page
[  645.394584] *pde = 00000000
[  645.394601] Oops: 0002 [#1] SMP
[  645.394614] CPU: 0 PID: 361 Comm: systemd-udevd Not tainted 5.6.0-rc1.el8.i586 #1
[  645.394636] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[  645.394670] EIP: wp_page_copy+0x8e/0x750
[  645.394690] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
[  645.394739] EAX: d13ff000 EBX: c752df28 ECX: 00000000 EDX: c5e0d000
[  645.394767] ESI: c5e0d000 EDI: d13ff004 EBP: c752deec ESP: c752dea8
[  645.394790] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
[  645.394815] CR0: 80050033 CR2: d13ff000 CR3: 08e5a000 CR4: 003406d0
[  645.394840] Call Trace:
[  645.394852]  ? reuse_swap_page+0x83/0x390
[  645.394873]  do_wp_page+0x87/0x6e0
[  645.394885]  handle_mm_fault+0x808/0xe30
[  645.394893]  do_page_fault+0x19f/0x4d0
[  645.394901]  ? do_kern_addr_fault+0x80/0x80
[  645.394915]  common_exception_read_cr2+0x15a/0x15f
[  645.394930] EIP: 0xb7aaf8bb
[  645.394944] Code: 24 0c e3 2c 89 d7 83 e2 03 74 11 7a 04 aa 49 74 1f aa 49 74 1b 83 f2 01 75 02 aa 49 89 ca c1 e9 02 83 e2 03 69 c0 01 01 01 01 <f3> ab 89 d1 f3 aa 8b 44 24 08 5f c3 66 90 66 90 66 90 66 90 90 f3
[  645.394973] EAX: 00000000 EBX: b7f05f60 ECX: 0000000d EDX: 00000000
[  645.394988] ESI: 02194db4 EDI: 02194db4 EBP: b7f05db4 ESP: bffed978
[  645.395003] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00210206
[  645.395018] Modules linked in: rfkill intel_rapl_msr intel_rapl_common crc32_pclmul snd_pcm snd_timer snd soundcore intel_rapl_perf sg pcspkr hv_netvsc i2c_piix4 hyperv_fb hv_utils hv_balloon joydev ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod t10_pi ata_generic hyperv_keyboard hid_hyperv hv_storvsc scsi_transport_fc ata_piix crc32c_intel serio_raw hv_vmbus libata
[  645.395101] CR2: 00000000d13ff000
[  645.395121] ---[ end trace 3bb1d66cb8b20841 ]---
[  645.395144] EIP: wp_page_copy+0x8e/0x750
[  645.395157] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
[  645.395206] EAX: d13ff000 EBX: c752df28 ECX: 00000000 EDX: c5e0d000
[  645.395235] ESI: c5e0d000 EDI: d13ff004 EBP: c752deec ESP: c752dea8
[  645.395261] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
[  645.395278] CR0: 80050033 CR2: d13ff000 CR3: 08e5a000 CR4: 003406d0
[  645.395308] Kernel panic - not syncing: Fatal exception
[  645.395329] Kernel Offset: 0x3e00000 from 0xc1000000 (relocation range: 0xc0000000-0xcafeffff)
[  645.395354] ---[ end Kernel panic - not syncing: Fatal exception ]---
==================================================
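
As a cross-check of the sizes in the log above, the small userspace sketch
below recomputes them. It assumes 4 KiB pages (PAGE_SHIFT = 12, which is
what start_pfn=0x10000 mapping to 0x10000000 implies). SKETCH_PAGE_SHIFT,
pages_for_size() and order_for_size() are hypothetical helper names;
order_for_size() is only meant to mirror what get_order() computes.

```c
/* Userspace arithmetic check of values printed in the log; not kernel code. */
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumption: 4 KiB pages */

/* Number of pages needed to hold size bytes, rounded up. */
static size_t pages_for_size(size_t size)
{
	size_t page_size = (size_t)1 << SKETCH_PAGE_SHIFT;

	return (size + page_size - 1) >> SKETCH_PAGE_SHIFT;
}

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int order_for_size(size_t size)
{
	size_t pages = pages_for_size(size);
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

With memmap_size=655360 this gives 160 pages, hence an order-8 (256-page)
request to alloc_pages(). The hot-added chunk of 134217728 bytes is 128 MiB,
and the two populate_section_memmap() calls suggest (my inference, not
something the log states) that it was split into two sections.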

