From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76A10ECAAD2 for ; Thu, 1 Sep 2022 06:57:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A8ACE6B0072; Thu, 1 Sep 2022 02:57:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A3A1D6B0073; Thu, 1 Sep 2022 02:57:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8DB388D0001; Thu, 1 Sep 2022 02:57:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 797DA6B0072 for ; Thu, 1 Sep 2022 02:57:55 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 4CA5640594 for ; Thu, 1 Sep 2022 06:57:55 +0000 (UTC) X-FDA: 79862611710.19.5980FED Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf26.hostedemail.com (Postfix) with ESMTP id 267B8140058 for ; Thu, 1 Sep 2022 06:57:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662015473; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WGBsp8/YfADsG+07MiqOBFMVvtF7ThPtdD2L04nbutQ=; b=ePWs2dUoAbQJzqSInrknyGpkH7mLUiePOMXB22nZwSx9EvccSKmw8lUo2cWHx7UUbliiFR bPkycwEljjir3yprSf1odoEtp+OZ2nZJ+Nfj6CKMGyfshinbuiviTu8/8kaCnzVrMis0pw xzAbBVp4kVo/yAYVmivaxyWc0rWtbBo= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-76-yh-n2fDNPQWULcZtMQqCjg-1; Thu, 01 Sep 2022 02:57:49 -0400 X-MC-Unique: yh-n2fDNPQWULcZtMQqCjg-1 Received: by mail-wr1-f70.google.com with SMTP id r19-20020adfa153000000b00226d74bc332so2229173wrr.3 for ; Wed, 31 Aug 2022 23:57:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc; bh=WGBsp8/YfADsG+07MiqOBFMVvtF7ThPtdD2L04nbutQ=; b=D+IclB5aAj0Fp/RCQmTPrJWQaFgiM3tyoFXGCikzvOiN088Zfj0PPfHZtbl5E2HhdY rn+B+Fb+NgZtZOpijjW/1MnBwRurr5JQExCRuHShkbUAf/gs3t8z6RseyqGP1PvbmTgD WrB/zLXH506m9BBaI5+zLwn+0n3prbh/4tOaYlMVnaDqdrAoAfslOXBqbReLHSn4jl+7 eg/BCNXZ2MYqBh7m7ax3GP6huQuXxBTnmb/4q0I6XwAmHUMiVBMM8ssvI1LtcRZCSeiL oB8Ng15AAkNJjJvYQP+5aCCAaYIC6SmxPgRsInap9ZSNbSpimQRckpinEXtwdQYTsLIc JYww== X-Gm-Message-State: ACgBeo1oiPEyDBHAhyUkDcYlLqdiiHnWPxvMKEmg9sHYUcMAOR1F4gh1 bO94pIZ7MTWGKdoai0ScVpY2Fxs1dK1WB1qM0KCc1gfJLfij8qNTDxXXZSRpFspMf01fasZpwph OCqaGH+keNOc= X-Received: by 2002:a05:6000:1563:b0:222:c827:1a19 with SMTP id 3-20020a056000156300b00222c8271a19mr13328996wrz.705.1662015468676; Wed, 31 Aug 2022 23:57:48 -0700 (PDT) X-Google-Smtp-Source: AA6agR5mJHY5ht5hskPoGrlBOujfn7oZJGESLD9rrJV6B+HJOGJDhQH/AR12e1myIPVQZaDeiRqDyw== X-Received: by 2002:a05:6000:1563:b0:222:c827:1a19 with SMTP id 3-20020a056000156300b00222c8271a19mr13328976wrz.705.1662015468281; Wed, 31 Aug 2022 23:57:48 -0700 (PDT) Received: from ?IPV6:2003:cb:c707:9e00:fec0:7e96:15cb:742? (p200300cbc7079e00fec07e9615cb0742.dip0.t-ipconnect.de. [2003:cb:c707:9e00:fec0:7e96:15cb:742]) by smtp.gmail.com with ESMTPSA id i19-20020a05600c355300b003a6125562e1sm4787902wmq.46.2022.08.31.23.57.47 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 31 Aug 2022 23:57:47 -0700 (PDT) Message-ID: Date: Thu, 1 Sep 2022 08:57:46 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.12.0 To: Muchun Song Cc: Muchun Song , Greg KH , rafael@kernel.org, Mike Kravetz , Andrew Morton , Oscar Salvador , andi@firstfloor.org, linux-kernel@vger.kernel.org, Linux MM , David Rientjes References: <20220819080029.12241-1-songmuchun@bytedance.com> <60933ffc-b850-976c-78a0-0ee6e0ea9ef0@redhat.com> <685F3238-D727-4C1F-9BA4-36651ABCD91A@linux.dev> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH RESEND] mm: hugetlb: simplify per-node sysfs creation and removal In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1662015474; a=rsa-sha256; cv=none; b=6s9AOZVtiVNab04ARhxHTG/mZAp4NaIdDz5jObz+FZn3wavXgItCu42bo/z6pXKKc/6ZyE KyO9t7NOYX2cUgHEoagu4Fz3kZ58xOrDKXiQSRtV5rHudh6gzeqOkOJeMOCuiUm/YN1Bfe j+E9i18Trt3iEhFHhCnV8qk9eUsOd4A= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ePWs2dUo; spf=pass (imf26.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1662015474; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WGBsp8/YfADsG+07MiqOBFMVvtF7ThPtdD2L04nbutQ=; b=Yc/T9+ta32JdIg7cKE+z0n227Kkb8gQpD/SdirB89RVAz6OLbwj+Q14Fk3lvtRlugi4Jdu vjojHBA/4i1INP7VgSmfQ7EIN9Wtqw2cgaYsm94pXAk+9DAdfu+zWTtXMMPpvqXmAUhSOU TDvmUAtBSP7Pn/HeUZUEjd6xx6Cgniw= X-Rspamd-Queue-Id: 267B8140058 X-Rspam-User: Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ePWs2dUo; spf=pass (imf26.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspamd-Server: rspam03 X-Stat-Signature: 3jxxf1pgnybzukb7xzquq9rzq31fuc8y X-HE-Tag: 1662015473-737164 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 01.09.22 08:35, Muchun Song wrote: > > >> On Aug 24, 2022, at 11:23, Muchun Song wrote: >> >> >> >>> On Aug 23, 2022, at 18:21, David Hildenbrand wrote: >>> >>> On 19.08.22 10:00, Muchun Song wrote: >>>> The following commit offload per-node sysfs creation and removal to a kworker and >>>> did not say why it is needed. And it also said "I don't know that this is >>>> absolutely required". It seems like the author was not sure as well. Since it >>>> only complicates the code, this patch will revert the changes to simplify the code. >>>> >>>> 39da08cb074c ("hugetlb: offload per node attribute registrations") >>>> >>>> We could use memory hotplug notifier to do per-node sysfs creation and removal >>>> instead of inserting those operations to node registration and unregistration. >>>> Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can >>>> simplify the code. >>>> >>>> Signed-off-by: Muchun Song >>> >>> >>> [...] >>> >>>> @@ -683,7 +626,6 @@ static int register_node(struct node *node, int num) >>>> void unregister_node(struct node *node) >>>> { >>>> compaction_unregister_node(node); >>>> - hugetlb_unregister_node(node); /* no-op, if memoryless node */ >>>> node_remove_accesses(node); >>>> node_remove_caches(node); >>>> device_unregister(&node->dev); >>>> @@ -905,74 +847,8 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn, >>>> (void *)&nid, func); >>>> return; >>>> } >>> >>> [...] >>> >>>> /* >>>> * Create all node devices, which will properly link the node >>>> * to applicable memory block devices and already created cpu devices. >>>> diff --git a/include/linux/node.h b/include/linux/node.h >>>> index 40d641a8bfb0..ea817b507f54 100644 >>>> --- a/include/linux/node.h >>>> +++ b/include/linux/node.h >>>> @@ -2,15 +2,15 @@ >>>> /* >>>> * include/linux/node.h - generic node definition >>>> * >>>> - * This is mainly for topological representation. We define the >>>> - * basic 'struct node' here, which can be embedded in per-arch >>>> + * This is mainly for topological representation. We define the >>>> + * basic 'struct node' here, which can be embedded in per-arch >>>> * definitions of processors. >>>> * >>>> * Basic handling of the devices is done in drivers/base/node.c >>>> - * and system devices are handled in drivers/base/sys.c. >>>> + * and system devices are handled in drivers/base/sys.c. >>>> * >>>> * Nodes are exported via driverfs in the class/node/devices/ >>>> - * directory. >>>> + * directory. >>> >>> Unrelated changes. >> >> Yep, a minor cleanup BTW. >> >>> >>>> */ >>>> #ifndef _LINUX_NODE_H_ >>>> #define _LINUX_NODE_H_ >>>> @@ -18,7 +18,6 @@ >>>> #include >>>> #include >>>> #include >>>> -#include >>>> >>>> /** >>>> * struct node_hmem_attrs - heterogeneous memory performance attributes >>>> @@ -84,10 +83,6 @@ static inline void node_set_perf_attrs(unsigned int nid, >>>> struct node { >>>> struct device dev; >>>> struct list_head access_list; >>>> - >>>> -#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS) >>>> - struct work_struct node_work; >>>> -#endif >>>> #ifdef CONFIG_HMEM_REPORTING >>>> struct list_head cache_attrs; >>>> struct device *cache_dev; >>>> @@ -96,7 +91,6 @@ struct node { >>>> >>>> struct memory_block; >>>> extern struct node *node_devices[]; >>>> -typedef void (*node_registration_func_t)(struct node *); >>>> >>>> #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA) >>>> void register_memory_blocks_under_node(int nid, unsigned long start_pfn, >>>> @@ -144,11 +138,6 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk); >>>> extern int register_memory_node_under_compute_node(unsigned int mem_nid, >>>> unsigned int cpu_nid, >>>> unsigned access); >>>> - >>>> -#ifdef CONFIG_HUGETLBFS >>>> -extern void register_hugetlbfs_with_node(node_registration_func_t doregister, >>>> - node_registration_func_t unregister); >>>> -#endif >>>> #else >>>> static inline void node_dev_init(void) >>>> { >>>> @@ -176,11 +165,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) >>>> static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk) >>>> { >>>> } >>>> - >>>> -static inline void register_hugetlbfs_with_node(node_registration_func_t reg, >>>> - node_registration_func_t unreg) >>>> -{ >>>> -} >>>> #endif >>>> >>>> #define to_node(device) container_of(device, struct node, dev) >>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>>> index 536a52c29035..9a72499486c1 100644 >>>> --- a/mm/hugetlb.c >>>> +++ b/mm/hugetlb.c >>>> @@ -33,6 +33,7 @@ >>>> #include >>>> #include >>>> #include >>>> +#include >>>> >>>> #include >>>> #include >>>> @@ -3967,19 +3968,19 @@ static void hugetlb_unregister_node(struct node *node) >>>> * Register hstate attributes for a single node device. >>>> * No-op if attributes already registered. >>>> */ >>>> -static void hugetlb_register_node(struct node *node) >>>> +static int hugetlb_register_node(struct node *node) >>>> { >>>> struct hstate *h; >>>> struct node_hstate *nhs = &node_hstates[node->dev.id]; >>>> int err; >>>> >>>> if (nhs->hugepages_kobj) >>>> - return; /* already allocated */ >>>> + return 0; /* already allocated */ >>>> >>>> nhs->hugepages_kobj = kobject_create_and_add("hugepages", >>>> &node->dev.kobj); >>>> if (!nhs->hugepages_kobj) >>>> - return; >>>> + return -ENOMEM; >>>> >>>> for_each_hstate(h) { >>>> err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj, >>>> @@ -3989,9 +3990,28 @@ static void hugetlb_register_node(struct node *node) >>>> pr_err("HugeTLB: Unable to add hstate %s for node %d\n", >>>> h->name, node->dev.id); >>>> hugetlb_unregister_node(node); >>>> - break; >>>> + return -ENOMEM; >>>> } >>>> } >>>> + return 0; >>>> +} >>>> + >>>> +static int __meminit hugetlb_memory_callback(struct notifier_block *self, >>>> + unsigned long action, void *arg) >>>> +{ >>>> + int ret = 0; >>>> + struct memory_notify *mnb = arg; >>>> + int nid = mnb->status_change_nid; >>>> + >>>> + if (nid == NUMA_NO_NODE) >>>> + return NOTIFY_DONE; >>>> + >>>> + if (action == MEM_GOING_ONLINE) >>>> + ret = hugetlb_register_node(node_devices[nid]); >>>> + else if (action == MEM_CANCEL_ONLINE || action == MEM_OFFLINE) >>>> + hugetlb_unregister_node(node_devices[nid]); >>>> + >>>> + return notifier_from_errno(ret); >>>> } >>>> >>>> /* >>>> @@ -4003,18 +4023,11 @@ static void __init hugetlb_register_all_nodes(void) >>>> { >>>> int nid; >>>> >>>> - for_each_node_state(nid, N_MEMORY) { >>>> - struct node *node = node_devices[nid]; >>>> - if (node->dev.id == nid) >>>> - hugetlb_register_node(node); >>>> - } >>>> - >>>> - /* >>>> - * Let the node device driver know we're here so it can >>>> - * [un]register hstate attributes on node hotplug. >>>> - */ >>>> - register_hugetlbfs_with_node(hugetlb_register_node, >>>> - hugetlb_unregister_node); >>>> + get_online_mems(); >>>> + hotplug_memory_notifier(hugetlb_memory_callback, 0); >>>> + for_each_node_state(nid, N_MEMORY) >>>> + hugetlb_register_node(node_devices[nid]); >>>> + put_online_mems(); >>>> } >>>> #else /* !CONFIG_NUMA */ >>> >>> Do we really *need* the memory hotplug notifier and the added complexity >>> due for handling memory-less nodes? > > Hi David, > Hi, thanks for playing with the idea. > After some tries, I think it may not reduce the complexity. node_dev_init() > is called at early stage before hugetlb_register_all_nodes(). So we need to > add a mechanism to detect if the hugetlb subsystem finishes initialization > in node_dev_init() so that it can determine to help hugetlb create /sysfs > files, the mechanism is similar with the changes in drivers/base/node.c of > commit 9a30523066cd ("hugetlb: add per node hstate attributes”). This approach If I'm not wrong, all you need is a single call from node_dev_init() into hugetlb code. There, you create the hugetlb sysfs if hugetlb was already initialized, otherwise it's a NOP as you initialize when hugetlb gets initialized. When initializing hugetlb, you go over all added nodes and create hugetlb sysfs. Testing/remembering if hugetlb was initialized should be easy, no? What's the complicated part I am missing? > may add more code than the memory-notify-based approach like this patch > implemented. And it also add the code coupling between node.c and hugetlb.c. > So I tend to use memory hotplug notifier. What’s your opinion? We have hugetlb special casing all over the place, it's an integral MM part -- not some random driver where we'd really want decoupling. So I don't see why the decouling would be beneficial here and how using the memory notifier is any better then a simple callback. But again, I did not look into the details of the necessary implementation. Thanks! -- Thanks, David / dhildenb