From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
anshuman.khandual@arm.com, Will Deacon <will@kernel.org>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Steve Capper <Steve.Capper@arm.com>
Subject: Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
Date: Tue, 22 Mar 2022 17:56:17 +0000 [thread overview]
Message-ID: <YjoNwe7WvoGozemH@arm.com> (raw)
In-Reply-To: <f7534997-3e24-6c01-7510-becce0b810ac@redhat.com>
Adding Steve as well who wrote the initial hugetlb code for arm64 (and
not trimming the quoted text).
On Mon, Mar 21, 2022 at 05:34:18PM +0100, David Hildenbrand wrote:
> On 08.03.22 00:06, Mike Kravetz wrote:
> > On 2/28/22 16:26, Mike Kravetz wrote:
> >> On 2/28/22 07:39, David Hildenbrand wrote:
> >>> playing with anonymous CONT hugetlb pages on aarch64, I stumbled over the following VM_BUG_ON:
> >>>
> >>> [ 124.770288] ------------[ cut here ]------------
> >>> [ 124.774899] kernel BUG at mm/mmu_gather.c:70!
> >>> [ 124.779244] Internal error: Oops - BUG: 0 [#1] SMP
> >>> [ 124.784022] Modules linked in: mlx4_ib ib_uverbs ib_core mlx4_en rfkill vfat fat acpi_ipmi joydev ipmi_ssif igb mlx4_core ipmi_devintf ipmi_msghandler cppc_cpufreq fuse zram ip_tables xfs uas usb_storage dwc3 ulpi ast udc_core i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce sbsa_gwdt i2c_xgene_slimpro xgene_hwmon ahci_platform gpio_dwapb xhci_plat_hcd
> >>> [ 124.823045] CPU: 16 PID: 1160 Comm: test Not tainted 5.16.11-200.fc35.aarch64 #1
> >>> [ 124.830428] Hardware name: Lenovo HR350A 7X35CTO1WW /HR350A , BIOS hve104r-1.15 02/26/2021
> >>> [ 124.840240] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>> [ 124.847189] pc : __tlb_remove_page_size+0x88/0xe4
> >>> [ 124.851885] lr : __unmap_hugepage_range+0x260/0x504
> >>> [ 124.856751] sp : ffff80000f6f3ae0
> >>> [ 124.860053] x29: ffff80000f6f3ae0 x28: ffff00080b639d24 x27: ffff000802504080
> >>> [ 124.867176] x26: fffffc00210f8000 x25: 0000000000000000 x24: ffff80000a9e8750
> >>> [ 124.874299] x23: 0000ffff8da20000 x22: ffff000804f0c190 x21: 0000000000010000
> >>> [ 124.881423] x20: ffff80000f6f3cb0 x19: ffff80000f6f3cb0 x18: 0000000000000000
> >>> [ 124.888545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> >>> [ 124.895668] x14: 0000000000000000 x13: 0008000000000000 x12: 0008000000000080
> >>> [ 124.902791] x11: 0008000000000000 x10: 00f80008c3e00f43 x9 : ffff800008404e60
> >>> [ 124.909914] x8 : 0846000000000000 x7 : 0000000000000000 x6 : ffff80000a8a4000
> >>> [ 124.917038] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000001000
> >>> [ 124.924161] x2 : 0000000000010000 x1 : fffffc00210f8000 x0 : 0000000000000000
> >>> [ 124.931284] Call trace:
> >>> [ 124.933718] __tlb_remove_page_size+0x88/0xe4
> >>> [ 124.938062] __unmap_hugepage_range+0x260/0x504
> >>> [ 124.942580] __unmap_hugepage_range_final+0x24/0x40
> >>> [ 124.947445] unmap_single_vma+0x100/0x11c
> >>> [ 124.951443] unmap_vmas+0x7c/0xf4
> >>> [ 124.954746] unmap_region+0xa4/0xf0
> >>> [ 124.958222] __do_munmap+0x1b8/0x50c
> >>> [ 124.961785] __vm_munmap+0x74/0x120
> >>> [ 124.965261] __arm64_sys_munmap+0x40/0x54
> >>> [ 124.969257] invoke_syscall+0x50/0x120
> >>> [ 124.972995] el0_svc_common.constprop.0+0x4c/0x100
> >>> [ 124.977774] do_el0_svc+0x34/0xa0
> >>> [ 124.981077] el0_svc+0x30/0xd0
> >>> [ 124.984120] el0t_64_sync_handler+0xa4/0x130
> >>> [ 124.988377] el0t_64_sync+0x1a4/0x1a8
> >>> [ 124.992028] Code: b4000140 f9001660 29410402 17fffff4 (d4210000)
> >>> [ 124.998109] ---[ end trace a74a76b89c9f2d88 ]---
> >>> [ 125.002900] ------------[ cut here ]------------
> >>>
> >>>
> >>> I'm running with 64k hugetlb on 4k aarch64. Reproducer:
> >>>
> >>> #define _GNU_SOURCE
> >>> #include <string.h>
> >>> #include <unistd.h>
> >>> #include <sys/mman.h>
> >>> #include <linux/memfd.h>
> >>>
> >>> void main(void)
> >>> {
> >>> const size_t size = 64*1024;
> >>> unsigned long cur;
> >>> char *area;
> >>> int fd;
> >>>
> >>> fd = memfd_create("test", MFD_HUGETLB | MFD_HUGE_64KB);
> >>> ftruncate(fd, size);
> >>> area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> >>>
> >>> memset(area, 0, size);
> >>>
> >>> munmap(area, size);
> >>> }
> >>>
> >>>
> >>>
> >>> I assume __unmap_hugepage_range() does a
> >>>
> >>> a) tlb_remove_huge_tlb_entry()
> >>>
> >>> -> for sz != PMD_SIZE and sz != PUD_SIZE, this calls __tlb_remove_tlb_entry()
> >>>
> >>> -> __tlb_remove_tlb_entry() is a NOP on aarch64. __tlb_adjust_range() isn't called.
> >>>
> >>> b) tlb_remove_page_size()
> >>>
> >>> -> __tlb_remove_page_size() runs into VM_BUG_ON(!tlb->end);
> >>>
> >>>
> >>> Not sure if this is just "ok" and we don't have to adjust the range or if there is
> >>> some tlb range adjustment missing.
> >>>
> >>
> >> To me, it looks like we are missing range adjustment in the case where
> >> hugetlb page size != PMD_SIZE and != PUD_SIZE. Not sure how those ranges
> >> are being flushed because as you note tlb_remove_huge_tlb_entry is pretty
> >> much a NOP in this case on aarch64.
> >>
> >> Cc'ing Will and Peter as they most recently changed this code. Commit
> >> 2631ed00b049 "tlb: mmu_gather: add tlb_flush_*_range APIs" removed an
> >> unconditional call to __tlb_adjust_range() in tlb_remove_huge_tlb_entry.
> >> That might have taken care of range adjustments in earlier versions of
> >> the code. Not exactly sure what is needed now.
> >
> > I verified that commit 2631ed00b049 caused the VM_BUG when it removed the
> > unconditional call to __tlb_adjust_range(). However, I need some assistance
> > on the proper solution.
> >
> > Just adding the __tlb_adjust_range() call to tlb_remove_huge_tlb_entry in
> > the case where size != PMD_SIZE and != PUD_SIZE will silence the BUG.
> > However, one outcome of 2631ed00b049 is that cleared_p* is set if
> > __tlb_adjust_range is ever called.
> >
> > It 'seems' that tlb_flush_pte_range() should be called for the CONT PTE case
> > on arm64, and tlb_flush_pmd_range() should be called for CONT PMD. But, this
> > would require an arch specific version of tlb_remove_huge_tlb_entry.
> >
> > FYI - This same issue should exist on other architectures that support
> > hugetlb page sizes != PMD_SIZE and != PUD_SIZE.
> >
> > Suggestions on how to proceed?
>
> Unfortunately, I have absolutely no clue what would be the right thing
> to do. Any aarch64 CONT experts?
At a quick look, we wouldn't have a problem with missing TLB flushing
since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
why it needs this though, Steve added it in commit d8bdcff28764. I think
we can defer this flushing to tlb_remove_page_size().
As Mike noted, tlb_remove_huge_tlb_entry() could call
tlb_flush_pte_range() and I think this would work even when dealing with
a CONT PMD case (just more flushing at page granularity). I need to
check the range TLBI ops in the arm64 __flush_tlb_range() but if they
are fine, we can do this as a quick fix.
A better solution is probably to allow arch-specific
tlb_remove_huge_tlb_entry() that understands whether it's a contiguous
pmd or pte and sets the clear_p* accordingly.
--
Catalin
next prev parent reply other threads:[~2022-03-22 17:56 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-28 15:39 David Hildenbrand
2022-03-01 0:26 ` Mike Kravetz
2022-03-07 23:06 ` Mike Kravetz
2022-03-21 16:34 ` David Hildenbrand
2022-03-22 17:56 ` Catalin Marinas [this message]
2022-03-23 11:51 ` Steve Capper
2022-03-23 16:21 ` Catalin Marinas
2022-03-23 16:34 ` Steve Capper
2022-03-23 17:21 ` Mike Kravetz
2022-05-06 12:49 ` Will Deacon
2022-05-09 11:19 ` Anshuman Khandual
2022-05-09 12:11 ` Catalin Marinas
2022-05-11 3:42 ` Anshuman Khandual
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YjoNwe7WvoGozemH@arm.com \
--to=catalin.marinas@arm.com \
--cc=Steve.Capper@arm.com \
--cc=aneesh.kumar@linux.ibm.com \
--cc=anshuman.khandual@arm.com \
--cc=david@redhat.com \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=peterz@infradead.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox