Date: Thu, 28 Apr 2022 12:34:07 -0400
From: Peter Xu
To: David Hildenbrand
Cc: Bibo Mao, Andrew Morton, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Yang Shi
Subject: Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
References: <20220317065024.2635069-1-maobibo@loongson.cn>
	<3a441789-b3e4-236e-2e44-e7a1c7258a94@redhat.com>
In-Reply-To: <3a441789-b3e4-236e-2e44-e7a1c7258a94@redhat.com>

On Thu, Apr 28, 2022 at 05:17:07PM +0200, David Hildenbrand wrote:
> On 17.03.22 07:50, Bibo Mao wrote:
> > collapse huge page will copy huge page from general small pages,
> > dest node is calculated from most one of source pages, however
> > THP daemon is not scheduled on dest node. The performance may be
> > poor since huge page copying across nodes, also cache is not used
> > for target node. With this patch, khugepaged daemon switches to
> > the same numa node with huge page. It saves copying time and makes
> > use of local cache better.
> >
> > With this patch, specint 2006 base performance is improved with 6%
> > on Loongson 3C5000L platform with 32 cores and 8 numa nodes.
>
> If it helps, that's nice as long as it doesn't hurt other cases.
>
> >
> > Signed-off-by: Bibo Mao
> > ---
> > changelog:
> > V2: remove node record for thp daemon
> > V3: remove unlikely statement
> > ---
> >  mm/khugepaged.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 131492fd1148..b3cf0885f5a2 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	struct vm_area_struct *vma;
> >  	struct mmu_notifier_range range;
> >  	gfp_t gfp;
> > +	const struct cpumask *cpumask;
> >
> >  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> >
> > @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	 * that. We will recheck the vma after taking it again in write mode.
> >  	 */
> >  	mmap_read_unlock(mm);
> > +
> > +	/* sched to specified node before huage page memory copy */
>
> huage? I assume "huge"
>
> > +	if (task_node(current) != node) {
> > +		cpumask = cpumask_of_node(node);
> > +		if (!cpumask_empty(cpumask))
> > +			set_cpus_allowed_ptr(current, cpumask);
> > +	}
>
> I wonder if that will always be optimized out without NUMA and if we
> want to check for IS_ENABLED(CONFIG_NUMA).
>
>
> Regarding comments from others, I agree: I think what we'd actually want
> is something like "try to reschedule to one of these CPUs immediately.
> If they are all busy, just stay here.
>
>
> Also, I do wonder if there could already be scenarios where someone
> wants to let khugepaged run only on selected housekeeping CPUs (e.g.,
> when pinning VCPUs in a VM to physical CPUs). It might even degrade the
> VM performance in that case if we schedule something unrelated on these
> CPUs. (I don't know which interfaces we might already have to configure
> housekeeping CPUs for kthreads).
>
> I can spot in kernel/kthread.c:kthread()
>
> set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
>
> Hmmmmm ...

Yes, that's a valid point. For RT, afaik, many users tune the kernel
threads on demand by pinning them. So I'm not sure whether this new
algorithm could already break some users, by either (1) trying to pin
khugepaged onto some isolated cores (which can cause spikes?), or (2)
messing up the admin's previous pin settings on the khugepaged kthread.

The other thing is that khugepaged's movement across cores seems to be
quite random, because the pages it scans can be unpredictably spread
across different NUMA nodes. So logically it can easily start bouncing
on some hosts, and that does sound questionable. That is the (pure)
question I raised previously on the 2nd point, irrespective of the
benchmark result.

-- 
Peter Xu
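
To make the two concerns above concrete, here is a rough userspace sketch, not kernel code and not part of the patch. It assumes a CPU set can be modelled as a 64-bit bitmask; the type `cpu_set64` and the helpers `pick_collapse_cpus()` and `count_node_switches()` are hypothetical names made up for illustration. The first helper only moves the worker to CPUs that are both on the target node and inside the set it was already allowed to use; the second counts how often a per-collapse "follow the node" policy would change nodes for a given sequence of target nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel cpumask: bit i set means CPU i. */
typedef uint64_t cpu_set64;

/*
 * Choose the CPUs to run on for a collapse whose target node owns
 * `node_cpus`, without leaving `allowed` (the admin's pinning or the
 * housekeeping set). If the two sets do not overlap, return `allowed`
 * unchanged, i.e. "stay where you are".
 */
static cpu_set64 pick_collapse_cpus(cpu_set64 node_cpus, cpu_set64 allowed)
{
	cpu_set64 both = node_cpus & allowed;

	return both ? both : allowed;
}

/*
 * Count how many times a worker starting on `start_node` would change
 * nodes if it followed every entry of `targets` in order.
 */
static unsigned int count_node_switches(const int *targets, size_t n,
					int start_node)
{
	unsigned int switches = 0;
	int cur = start_node;
	size_t i;

	assert(targets != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (targets[i] != cur) {
			cur = targets[i];
			switches++;
		}
	}
	return switches;
}
```

With node CPUs 4-7 (`0xF0`) and an allowed set of CPUs 2-5 (`0x3C`), the first helper yields CPUs 4-5 only; with an allowed set that misses the node entirely, the worker does not move. Whether an intersection like this is the right policy for khugepaged is exactly the open question in this thread.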