From: Zhenguo Yao
Date: Mon, 29 Nov 2021 20:26:02 +0800
Subject: Re: Commit 'hugetlbfs: extend the definition of hugepages parameter to support node allocation' breaks old numa less syntax of reserving hugepages on boot.
To: Mike Kravetz
Cc: Maxim Levitsky, Andrew Morton, Linux Memory Management List, linux-kernel@vger.kernel.org

On Mon, Nov 29, 2021 at 12:31 PM, Mike Kravetz wrote:
>
> On 11/28/21 03:18, Maxim Levitsky wrote:
> > dmesg prints this:
> >
> > HugeTLB: allocating 64 of page size 1.00 GiB failed.  Only allocated 0 hugepages
> >
> > Huge pages were allocated on kernel command line (1/2 of 128GB system):
> >
> > 'default_hugepagesz=1G hugepagesz=1G hugepages=64'
> >
> > This is 3970X and no real support/need for NUMA, thus only fake NUMA node 0 is present.
> >
> > Reverting the commit helps.
> >
> > New syntax also works ( hugepages=0:64 )
> >
> > I can test any patches for this bug.
>
> Argh!  I think preallocation of gigantic pages on all systems with only
> a single node is broken.  The issue is at the beginning of
> __alloc_bootmem_huge_page:
>
> int __alloc_bootmem_huge_page(struct hstate *h, int nid)
> {
> 	struct huge_bootmem_page *m = NULL; /* initialize for clang */
> 	int nr_nodes, node;
>
> 	if (nid >= nr_online_nodes)
> 		return 0;
>
> Without using the node specific syntax, nid == NUMA_NO_NODE == -1.
> For the comparison, nid will be converted to an unsigned int to match
> nr_online_nodes, so we will immediately return 0 instead of doing the
> allocations.
>
> Zhenguo Yao,
> Can you verify and perhaps put together a patch?

Preallocation of gigantic pages can't work in any environment, not only on
single-node systems. I think the issue comes from replacing
nodes_weight(node_states[N_MEMORY]) with nr_online_nodes in the last version
of my patch. Sorry for my carelessness. I didn't notice that the parameter
nid is an int but nr_online_nodes is an unsigned int, so the check

	if (nid >= nr_online_nodes)

is always true when nid is NUMA_NO_NODE (-1). I will send a fix as soon as
possible. This is really a low-level mistake ^^

> > Also unrelated, is there any progress on allocating 1GB pages on demand so that I could
> > allocate them only when I run a VM?
>
> That should be possible.  Such support was added back in 2014 with commit
> 944d9fec8d7a "hugetlb: add support for gigantic page allocation at runtime".
>
> > i don't mind having these pages to be marked as to be used for userspace only,
> > since as far as I remember its the kernel usage that makes some page unmoveable.
>
> Of course, finding 1GB of contiguous space for a gigantic page is often
> difficult at runtime.  So, allocations are likely to fail the longer the
> system is up and running and fragmentation increases.
>
> > Last time (many years ago) I tried to create a zone with only userspace pages
> > (I don't remember what options I used) but it didn't work.
>
> Not too long ago, support was added to use CMA for gigantic page allocation.
> See commit cf11e85fc08c "mm: hugetlb: optionally allocate gigantic hugepages
> using cma".  This sounds like something you might want to try.
> --
> Mike Kravetz
>
> > Is there a way to debug what is causing unmoveable pages and doesn't let
> > /proc/sys/vm/nr_hugepages work (I tried it today and as usual the number
> > it can allocate steadily decreases over time).
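
For anyone who wants to see the signed/unsigned comparison discussed above in
isolation, here is a minimal userspace sketch. It is not kernel code: the
helper names rejects_nid_buggy(), rejects_nid_guarded() and self_check() are
made up for illustration, NUMA_NO_NODE is redefined locally with the value -1
mentioned in this thread, and the guarded variant is only one possible way to
write the check, not necessarily what the eventual fix will look like.

```c
#include <assert.h>

/* Local stand-in for the kernel constant; -1 is the value quoted in the thread. */
#define NUMA_NO_NODE (-1)

/*
 * Mirrors the quoted check: nid is a signed int, the node count is unsigned.
 * In the comparison nid is converted to unsigned int, so -1 becomes UINT_MAX
 * and compares greater than or equal to any node count.
 */
static int rejects_nid_buggy(int nid, unsigned int nr_online_nodes)
{
	return nid >= nr_online_nodes;
}

/*
 * Hypothetical guard (an assumption for illustration): let NUMA_NO_NODE
 * through before doing the range check, with the conversion made explicit.
 */
static int rejects_nid_guarded(int nid, unsigned int nr_online_nodes)
{
	return nid != NUMA_NO_NODE && (unsigned int)nid >= nr_online_nodes;
}

/* Small self-check for a single-node system (node count of 1). */
static void self_check(void)
{
	assert(rejects_nid_buggy(NUMA_NO_NODE, 1u));    /* wrongly rejected */
	assert(!rejects_nid_guarded(NUMA_NO_NODE, 1u)); /* allowed through */
	assert(!rejects_nid_guarded(0, 1u));            /* valid node id */
	assert(rejects_nid_guarded(1, 1u));             /* out of range */
}
```

Calling self_check() from a small main() is enough to exercise it. Building
with -Wsign-compare should also flag the comparison in rejects_nid_buggy().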