Date: Mon, 4 Apr 2022 12:41:56 +0200
From: David Hildenbrand
Organization: Red Hat
To: Mike Kravetz, Peng Liu, akpm@linux-foundation.org, yaozhenguo1@gmail.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
References: <20220401101232.2790280-1-liupeng256@huawei.com>
 <20220401101232.2790280-2-liupeng256@huawei.com>
 <0aefbc18-4232-0bae-b37a-d4c6995e3d00@redhat.com>
 <508fd247-b809-27d7-6bc8-a08c4c73cbb5@oracle.com>
In-Reply-To: <508fd247-b809-27d7-6bc8-a08c4c73cbb5@oracle.com>

On 01.04.22 19:23, Mike Kravetz wrote:
> On 4/1/22 03:43, David Hildenbrand wrote:
>> On 01.04.22 12:12, Peng Liu wrote:
>>> Hugepages can be specified to pernode since "hugetlbfs: extend
>>> the definition of hugepages parameter to support node allocation",
>>> but the
>>> following problem is observed.
>>>
>>> Confusing behavior is observed when both 1G and 2M hugepage is set
>>> after "numa=off".
>>>  cmdline hugepage settings:
>>>   hugepagesz=1G hugepages=0:3,1:3
>>>   hugepagesz=2M hugepages=0:1024,1:1024
>>>  results:
>>>   HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>>   HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>>
>>> Furthermore, confusing behavior can be also observed when invalid
>>> node behind valid node.
>>>
>>> To fix this, hugetlb_hstate_alloc_pages should be called even when
>>> hugepages_setup going to invalid.
>>
>> Shouldn't we bail out if someone requests node-specific allocations but
>> we are not running with NUMA?
>
> I thought about this as well, and could not come up with a good answer.
> Certainly, nobody SHOULD specify both 'numa=off' and ask for node specific
> allocations on the same command line. I would have no problem bailing out
> in such situations. But, I think that would also require the hugetlb command
> line processing to look for such situations.

Yes. Right now I see

	if (tmp >= nr_online_nodes)
		goto invalid;

Which seems a little strange, because IIUC, it's the number of online
nodes, which is completely wrong with a sparse online bitmap. Just
imagine node 0 and node 2 are online, and node 1 is offline. Assuming
that "node < 2" is valid is wrong.

Why don't we check for node_online() and bail out if that is not the
case? Is it too early for that check? But why does comparing against
nr_online_nodes() work, then?

Having that said, I'm not sure if all usage of nr_online_nodes in
mm/hugetlb.c is wrong, with a sparse online bitmap. Outside of that,
it's really just used for "nr_online_nodes > 1". I might be wrong, though.

>
> One could also argue that if there is only a single node (not numa=off on
> command line) and someone specifies node local allocations we should bail.

I assume "numa=off" is always parsed before hugepages_setup() is called,
right?
So we can just rely on the actual numa information.

>
> I was 'thinking' about a situation where we had multiple nodes and node
> local allocations were 'hard coded' via grub or something. Then, for some
> reason one node fails to come up on a reboot. Should we bail on all the
> hugetlb allocations, or should we try to allocate on the still available
> nodes?

Depends on what "bail" means. Printing a warning and stopping to
allocate further is certainly good enough for my taste :)

>
> When I went back and reread the reason for this change, I see that it is
> primarily for 'some debugging and test cases'.
>
>>
>> What's the result after your change?
>>
>>>
>>> Cc:
>>
>> I am not sure if this is really stable material.
>
> Right now, we partially and inconsistently process node specific allocations
> if there are missing nodes. We allocate 'regular' hugetlb pages on existing
> nodes. But, we do not allocate gigantic hugetlb pages on existing nodes.
>
> I believe this is worth fixing in stable.

I am skeptical.

https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst

" - It must fix a real bug that bothers people (not a, "This could be a
   problem..." type thing)."

While the current behavior is suboptimal, it's certainly not an urgent
bug (?) and the kernel will boot and work just fine. As you mentioned
"nobody SHOULD specify both 'numa=off' and ask for node specific
allocations on the same command line.", this is just a corner case.

Adjusting it upstream -- okay. Backporting to stable? I don't think so.

-- 
Thanks,

David / dhildenb
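
The sparse-bitmap concern raised in this message (node 0 and node 2 online, node 1 offline) can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not kernel code: MAX_NODES, with_node_online(), count_online(), valid_by_count() and valid_by_bitmap() are hypothetical names invented for illustration, and a plain unsigned int stands in for the kernel's online-node bitmap. It contrasts a check shaped like "tmp >= nr_online_nodes" with a per-node membership test of the kind node_online() would provide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical upper bound on node ids for this user-space model. */
#define MAX_NODES 8u

/* Return a copy of the mask with the given node marked online. */
static unsigned int with_node_online(unsigned int mask, unsigned int node)
{
	assert(node < MAX_NODES);
	return mask | (1u << node);
}

/* Count the set bits, i.e. how many nodes are online in this model. */
static unsigned int count_online(unsigned int mask)
{
	unsigned int count = 0;

	for (unsigned int n = 0; n < MAX_NODES; n++)
		if (mask & (1u << n))
			count++;
	return count;
}

/* Count-based check: accept any node id below the number of online nodes. */
static bool valid_by_count(unsigned int mask, unsigned int node)
{
	return node < count_online(mask);
}

/* Membership check: accept a node id only if its own bit is set. */
static bool valid_by_bitmap(unsigned int mask, unsigned int node)
{
	return node < MAX_NODES && (mask & (1u << node)) != 0;
}
```

With a mask built as with_node_online(with_node_online(0, 0), 2), the count-based check accepts the offline node 1 and rejects the online node 2, while the membership check gets both right. Whether the real node_online() can be used this early in boot is exactly the open question in the message, and the sketch does not answer it.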