On 19 Sep 2023, at 8:37, Zi Yan wrote:

> On 19 Sep 2023, at 2:49, Johannes Weiner wrote:
>
>> On Mon, Sep 18, 2023 at 10:40:37AM -0700, Mike Kravetz wrote:
>>> On 09/18/23 10:52, Johannes Weiner wrote:
>>>> On Mon, Sep 18, 2023 at 09:16:58AM +0200, Vlastimil Babka wrote:
>>>>> On 9/16/23 21:57, Mike Kravetz wrote:
>>>>>> On 09/15/23 10:16, Johannes Weiner wrote:
>>>>>>> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
>>>>>>
>>>>>> With the patch below applied, a slightly different workload triggers the
>>>>>> following warnings. It seems related, and appears to go away when
>>>>>> reverting the series.
>>>>>>
>>>>>> [ 331.595382] ------------[ cut here ]------------
>>>>>> [ 331.596665] page type is 5, passed migratetype is 1 (nr=512)
>>>>>> [ 331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
>>>>>
>>>>> Initially I thought this demonstrates the possible race I was suggesting in
>>>>> reply to 6/6. But, assuming you have CONFIG_CMA, page type 5 is cma and we
>>>>> are trying to get a MOVABLE page from a CMA page block, which is something
>>>>> that's normally done and the pageblock stays CMA. So yeah if the warnings
>>>>> are to stay, they need to handle this case. Maybe the same can happen with
>>>>> HIGHATOMIC blocks?
>>
>> Ok, the CMA thing gave me pause because Mike's pagetypeinfo didn't
>> show any CMA pages.
>>
>> 5 is actually MIGRATE_ISOLATE - see the double use of 3 for PCPTYPES
>> and HIGHATOMIC.
>>
>>>> This means we have an order-10 page where one half is MOVABLE and the
>>>> other is CMA.
>>
>> This means the scenario is different:
>>
>> We get a MAX_ORDER page off the MOVABLE freelist. The removal checks
>> that the first pageblock is indeed MOVABLE. During the expand, the
>> second pageblock turns out to be of type MIGRATE_ISOLATE.
>>
>> The page allocator wouldn't have merged those types. It triggers a bit
>> too fast to be a race condition.
>>
>> It appears that MIGRATE_ISOLATE is simply set on the tail pageblock
>> while the head is on the list, and then stranded there.
>>
>> Could this be an issue in the page_isolation code? Maybe a range
>> rounding error?
>>
>> Zi Yan, does this ring a bell for you?
>
> Since isolation code works on pageblocks, a scenario I can think of
> is that alloc_contig_range() is given a range starting from that tail
> pageblock.
>
> Hmm, I also notice that move_freepages_block() called by
> set_migratetype_isolate() might change isolation range by your change.
> I wonder if reverting that behavior would fix the issue. Basically, do
>
> if (!zone_spans_pfn(zone, start))
> 	start = pfn;
>
> in prep_move_freepages_block(). Just a wild guess. Mike, do you mind
> giving it a try?
>
> Meanwhile, let me try to reproduce it and look into it deeper.
>
>>
>> I don't quite see how my patches could have caused this. But AFAICS we
>> also didn't have warnings for this scenario so it could be an old bug.
>>
>>>> Mike, could you describe the workload that is triggering this?
>>>
>>> This 'slightly different workload' is actually a slightly different
>>> environment. Sorry for mis-speaking! The slight difference is that this
>>> environment does not use the 'alloc hugetlb gigantic pages from CMA'
>>> (hugetlb_cma) feature that triggered the previous issue.
>>>
>>> This is still on a 16G VM.
>>> Kernel command line here is:
>>> "BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.6.0-rc1-next-20230913+
>>> root=UUID=49c13301-2555-44dc-847b-caabe1d62bdf ro console=tty0
>>> console=ttyS0,115200 audit=0 selinux=0 transparent_hugepage=always
>>> hugetlb_free_vmemmap=on"
>>>
>>> The workload is just running this script:
>>>
>>> while true; do
>>>  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
>>>  echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>> done
>>>
>>>>
>>>> Does this reproduce instantly and reliably?
>>>>
>>>
>>> It is not 'instant' but will reproduce fairly reliably within a minute
>>> or so.
>>>
>>> Note that the 'echo 4 > .../hugepages-1048576kB/nr_hugepages' is going
>>> to end up calling alloc_contig_pages -> alloc_contig_range. Those pages
>>> will eventually be freed via __free_pages(folio, 9).
>>
>> No luck reproducing this yet, but I have a question. In that crash
>> stack trace, the expand() is called via this:

I cannot reproduce it locally either. Do you mind sharing your config
file?

Thanks.

--
Best Regards,
Yan, Zi
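
For reference, a minimal standalone sketch of the migratetype numbering
Johannes refers to above ("5 is actually MIGRATE_ISOLATE"). It mirrors the
kernel's enum migratetype when CONFIG_CMA and CONFIG_MEMORY_ISOLATION are
both enabled; the exact definition in include/linux/mmzone.h can differ
between kernel versions and configs:

#include <stdio.h>

/*
 * MIGRATE_HIGHATOMIC reuses the value of MIGRATE_PCPTYPES (3), so with
 * CMA and memory isolation enabled, MIGRATE_CMA is 4 and MIGRATE_ISOLATE
 * is 5 -- matching "page type is 5, passed migratetype is 1 (MOVABLE)"
 * in the warning above.
 */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_PCPTYPES,			/* number of types on the pcp lists */
	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
	MIGRATE_CMA,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

int main(void)
{
	printf("MOVABLE=%d HIGHATOMIC=%d CMA=%d ISOLATE=%d\n",
	       MIGRATE_MOVABLE, MIGRATE_HIGHATOMIC, MIGRATE_CMA,
	       MIGRATE_ISOLATE);	/* prints 1, 3, 4, 5 */
	return 0;
}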
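
And a small, self-contained illustration of the clamp Zi Yan suggests
trying in prep_move_freepages_block(): when a zone starts partway into a
pageblock, rounding a pfn down to the pageblock boundary can land outside
the zone, and the suggestion is to fall back to the passed-in pfn in that
case. The struct, helper names, and numbers below are simplified stand-ins
for illustration only, not the actual mm/page_alloc.c code:

#include <stdbool.h>
#include <stdio.h>

#define PAGEBLOCK_NR_PAGES	512UL	/* order-9 pageblocks, as on x86-64 */

struct zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* Simplified stand-in for the kernel's zone_spans_pfn() */
static bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
{
	return pfn >= zone->zone_start_pfn &&
	       pfn < zone->zone_start_pfn + zone->spanned_pages;
}

/*
 * Round pfn down to its pageblock, but clamp the result back to pfn if
 * the rounded value falls outside the zone -- rather than bailing out of
 * the freepage move entirely.
 */
static unsigned long block_start_pfn(const struct zone *zone, unsigned long pfn)
{
	unsigned long start = pfn & ~(PAGEBLOCK_NR_PAGES - 1);

	if (!zone_spans_pfn(zone, start))
		start = pfn;
	return start;
}

int main(void)
{
	/* Hypothetical zone that begins 128 pages into a pageblock. */
	struct zone z = { .zone_start_pfn = 0x1080, .spanned_pages = 0x4000 };

	printf("0x10a0 -> %#lx\n", block_start_pfn(&z, 0x10a0)); /* clamped to 0x10a0 */
	printf("0x1a00 -> %#lx\n", block_start_pfn(&z, 0x1a00)); /* stays 0x1a00 (aligned) */
	return 0;
}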