From: Mike Kravetz <mike.kravetz@oracle.com>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
"Andrew Morton" <akpm@linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Muchun Song <songmuchun@bytedance.com>,
Joao Martins <joao.m.martins@oracle.com>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Vlastimil Babka <vbabka@kernel.org>
Subject: Re: [PATCH v2] hugetlb: freeze allocated pages before creating hugetlb pages
Date: Tue, 20 Sep 2022 20:17:31 -0700 [thread overview]
Message-ID: <YyqCS6+OXAgoqI8T@monkey> (raw)
In-Reply-To: <20220921014810.GA2053328@hori.linux.bs1.fc.nec.co.jp>
On 09/21/22 01:48, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Sep 16, 2022 at 02:46:38PM -0700, Mike Kravetz wrote:
> > v1 -> v2
> > - Fixed up head page in error path of __prep_compound_gigantic_page as
> > discovered by Miaohe Lin.
> > - Updated link to Matthew's Allocate and free frozen pages series.
> > - Rebased on next-20220916
> >
> > mm/hugetlb.c | 102 +++++++++++++++++++--------------------------------
> > 1 file changed, 38 insertions(+), 64 deletions(-)
>
> Hello Mike,
>
> I accidentally found a NULL pointer dereference when testing the latest
> mm-unstable, which seems to be caused (or exposed?) by this patch
> (I confirmed that it disappeared by reverting this patch).
>
> It's reproduced by doing like `sysctl vm.nr_hugepages=1000000` to allocate
> hugepages as much as possible.
>
> Could you check that this patch is related to the issue?
>
> Thanks,
> Naoya Horiguchi
>
> ---
> [ 25.634476] BUG: kernel NULL pointer dereference, address: 0000000000000034
> [ 25.635980] #PF: supervisor write access in kernel mode
> [ 25.637283] #PF: error_code(0x0002) - not-present page
> [ 25.638365] PGD 0 P4D 0
> [ 25.638906] Oops: 0002 [#1] PREEMPT SMP PTI
> [ 25.639779] CPU: 4 PID: 819 Comm: sysctl Tainted: G E N 6.0.0-rc3-v6.0-rc1-220920-1758-1398-g2b3f5+ #12
> [ 25.641928] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> [ 25.643727] RIP: 0010:alloc_buddy_huge_page.isra.0+0x8c/0x140
> [ 25.645071] Code: fe ff 41 83 fc 01 0f 84 54 94 8b 00 41 bc 01 00 00 00 44 89 f7 4c 89 f9 44 89 ea 89 de e8 7c b9 fe ff 48 89 c7 b8 01 00 00 00 <f0> 0f b1 6f 34 66 90 83 f8 01 75 c5 48 85 ff 74 52 65 48 ff 05 03
> [ 25.649006] RSP: 0018:ffffaa7181fffc18 EFLAGS: 00010286
> [ 25.650215] RAX: 0000000000000001 RBX: 0000000000000009 RCX: 0000000000000009
> [ 25.651672] RDX: ffffffffae3b6df0 RSI: ffffffffae8f7ce0 RDI: 0000000000000000
> [ 25.653115] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000c01
> [ 25.654579] R10: 0000000000000f90 R11: 0000000000000000 R12: 0000000000000002
> [ 25.656176] R13: 0000000000000000 R14: 0000000000346cca R15: ffffffffae8f7ce0
> [ 25.657637] FS: 00007f9252f2a740(0000) GS:ffff98cebbc00000(0000) knlGS:0000000000000000
> [ 25.659292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 25.660469] CR2: 0000000000000034 CR3: 000000014924c004 CR4: 0000000000170ee0
> [ 25.661928] Call Trace:
> [ 25.662469] <TASK>
> [ 25.662927] alloc_fresh_huge_page+0x16f/0x1d0
> [ 25.663859] alloc_pool_huge_page+0x6d/0xb0
> [ 25.664734] __nr_hugepages_store_common+0x189/0x3e0
> [ 25.665764] ? __do_proc_doulongvec_minmax+0x31f/0x340
> [ 25.666832] hugetlb_sysctl_handler_common+0xbf/0xd0
> [ 25.667861] ? hugetlb_register_node+0xe0/0xe0
> [ 25.668786] proc_sys_call_handler+0x196/0x2b0
> [ 25.669724] vfs_write+0x29b/0x3a0
> [ 25.670454] ksys_write+0x4f/0xd0
> [ 25.671153] do_syscall_64+0x3b/0x90
> [ 25.671909] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 25.672958] RIP: 0033:0x7f9252d3e727
> [ 25.673712] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [ 25.677470] RSP: 002b:00007ffcdf9904a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 25.679002] RAX: ffffffffffffffda RBX: 000055c6ae683210 RCX: 00007f9252d3e727
> [ 25.680456] RDX: 0000000000000006 RSI: 000055c6ae683250 RDI: 0000000000000003
> [ 25.681910] RBP: 000055c6ae685380 R08: 0000000000000003 R09: 0000000000000077
> [ 25.683373] R10: 000000000000006b R11: 0000000000000246 R12: 0000000000000006
> [ 25.684824] R13: 0000000000000006 R14: 0000000000000006 R15: 00007f9252df59e0
> [ 25.686293] </TASK>
Hello Naoya,
My bad for an obvious mistake! Note this change in the patch,
> @@ -1951,7 +1953,21 @@ static struct page *alloc_buddy_huge_page(struct hstate *h,
> gfp_mask |= __GFP_RETRY_MAYFAIL;
> if (nid == NUMA_NO_NODE)
> nid = numa_mem_id();
> +retry:
> page = __alloc_pages(gfp_mask, order, nid, nmask);
> +
> + /* Freeze head page */
> + if (!page_ref_freeze(page, 1)) {
> + __free_pages(page, order);
> + if (retry) { /* retry once */
> + retry = false;
> + goto retry;
> + }
> + /* WOW! twice in a row. */
> + pr_warn("HugeTLB head page unexpected inflated ref count\n");
> + page = NULL;
> + }
> +
> if (page)
> __count_vm_event(HTLB_BUDDY_PGALLOC);
> else
It does not check for a successful return from __alloc_pages before trying to
freeze 'page'. It should obviously check for page == NULL before trying to
freeze with something like this.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index db90658db171..a092dd639687 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1961,7 +1961,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h,
page = __alloc_pages(gfp_mask, order, nid, nmask);
/* Freeze head page */
- if (!page_ref_freeze(page, 1)) {
+ if (page && !page_ref_freeze(page, 1)) {
__free_pages(page, order);
if (retry) { /* retry once */
retry = false;
I will send out a v3 (my) tomorrow.
--
Mike Kravetz
prev parent reply other threads:[~2022-09-21 3:18 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-16 21:46 Mike Kravetz
2022-09-20 4:02 ` Oscar Salvador
2022-09-20 17:10 ` Mike Kravetz
2022-09-21 3:54 ` Oscar Salvador
2022-09-21 1:48 ` HORIGUCHI NAOYA(堀口 直也)
2022-09-21 3:17 ` Mike Kravetz [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YyqCS6+OXAgoqI8T@monkey \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=joao.m.martins@oracle.com \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=naoya.horiguchi@nec.com \
--cc=peterx@redhat.com \
--cc=songmuchun@bytedance.com \
--cc=vbabka@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox