From: Hailong Liu <hailong.liu@oppo.com>
To: Baoquan He <bhe@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Uladzislau Rezki <urezki@gmail.com>,
Christoph Hellwig <hch@infradead.org>,
Lorenzo Stoakes <lstoakes@gmail.com>,
Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.com>,
Matthew Wilcox <willy@infradead.org>,
Tangquan Zheng <zhengtangquan@oppo.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH v2] mm/vmalloc: fix incorrect __vmap_pages_range_noflush() if vm_area_alloc_pages() from high order fallback to order0
Date: Fri, 26 Jul 2024 13:03:56 +0800 [thread overview]
Message-ID: <20240726050356.ludmpxfee6erlxxt@oppo.com> (raw)
In-Reply-To: <20240726040052.hs2gvpktrnlbvhsq@oppo.com>
On Fri, 26. Jul 12:00, Hailong Liu wrote:
> On Fri, 26. Jul 10:31, Baoquan He wrote:
> [...]
> > > The logic of this patch is somewhat similar to my first one. If high order
> > > allocation fails, it will go normal mapping.
> > >
> > > However I also save the fallback position. The ones before this position are
> > > used for huge mapping, the ones >= position for normal mapping as Barry said.
> > > "support the combination of PMD and PTE mapping". this will take some
> > > times as it needs to address the corner cases and do some tests.
> >
> > Hmm, we may not need to worry about the imperfect mapping. Currently
> > there are two places setting VM_ALLOW_HUGE_VMAP: __kvmalloc_node_noprof()
> > and vmalloc_huge().
> >
> > For vmalloc_huge(), it's called in below three interfaces which are all
> > invoked during boot. Basically they can succeed to get required contiguous
> > physical memory. I guess that's why Tangquan only spot this issue on kvmalloc
> > invocation when the required size exceeds e.g 2M. For kvmalloc_node(),
> > we have told that in the code comment above __kvmalloc_node_noprof(),
> > it's a best effort behaviour.
> >
> Take a __vmalloc_node_range(2.1M, VM_ALLOW_HUGE_VMAP) as a example.
> because the align requirement of huge. the real size is 4M.
> if allocation first order-9 successfully and the next failed. becuase the
> fallback, the layout out pages would be like order9 - 512 * order0
> order9 support huge mapping, but order0 not.
> with the patch above, would call vmap_small_pages_range_noflush() and do normal
> mapping, the huge mapping would not exist.
>
> > mm/mm_init.c <<alloc_large_system_hash>>
> > table = vmalloc_huge(size, gfp_flags);
> > net/ipv4/inet_hashtables.c <<inet_pernet_hashinfo_alloc>>
> > new_hashinfo->ehash = vmalloc_huge(ehash_entries * sizeof(struct inet_ehash_bucket),
> > net/ipv4/udp.c <<udp_pernet_table_alloc>>
> > udptable->hash = vmalloc_huge(hash_entries * 2 * sizeof(struct udp_hslot)
> >
> > Maybe we should add code comment or document to notice people that the
> > contiguous physical pages are not guaranteed for vmalloc_huge() if you
> > use it after boot.
> >
> > >
> > > IMO, the draft can fix the current issue, it also does not have significant side
> > > effects. Barry, what do you think about this patch? If you think it's okay,
> > > I will split this patch into two: one to remove the VM_ALLOW_HUGE_VMAP and the
> > > other to address the current mapping issue.
> > >
> > > --
> > > help you, help me,
> > > Hailong.
> > >
> >
> >
I check the code, the issue only happen in gfp_mask with __GFP_NOFAIL and
fallback to order 0, actuaally without this commit
e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
if __vmalloc_area_node allocation failed, it will goto fail and try order-0.
fail:
if (shift > PAGE_SHIFT) {
shift = PAGE_SHIFT;
align = real_align;
size = real_size;
goto again;
}
So do we really need fallback to order-0 if nofail?
>
> --
> help you, help me,
> Hailong.
>
--
help you, help me,
Hailong.
next prev parent reply other threads:[~2024-07-26 5:04 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-25 3:53 hailong.liu
2024-07-25 6:21 ` Barry Song
2024-07-25 9:17 ` Hailong Liu
2024-07-25 9:34 ` Barry Song
2024-07-25 9:58 ` Hailong Liu
2024-07-25 10:22 ` Barry Song
2024-07-25 11:39 ` Baoquan He
2024-07-25 16:40 ` Hailong Liu
2024-07-26 1:31 ` Barry Song
2024-07-26 2:31 ` Baoquan He
2024-07-26 3:53 ` Barry Song
2024-07-29 1:48 ` Baoquan He
2024-07-30 3:24 ` Hailong Liu
2024-07-30 4:29 ` Baoquan He
2024-07-26 4:00 ` Hailong Liu
2024-07-26 5:03 ` Hailong Liu [this message]
2024-07-26 5:29 ` Barry Song
2024-07-26 8:37 ` Baoquan He
2024-07-26 8:48 ` Hailong Liu
2024-07-26 9:00 ` Hailong Liu
2024-07-26 9:29 ` Baoquan He
2024-07-26 9:15 ` Barry Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240726050356.ludmpxfee6erlxxt@oppo.com \
--to=hailong.liu@oppo.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=hch@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lstoakes@gmail.com \
--cc=mhocko@suse.com \
--cc=urezki@gmail.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=zhengtangquan@oppo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox