From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Shi <shy828301@gmail.com>
Date: Wed, 27 Apr 2022 15:29:25 -0700
Subject: Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
To: Andrew Morton
Cc: Bibo Mao, Linux MM, Linux Kernel Mailing List, David Hildenbrand
In-Reply-To: <20220427134843.576f0a18bea28de9e798004a@linux-foundation.org>
References: <20220317065024.2635069-1-maobibo@loongson.cn> <20220427134843.576f0a18bea28de9e798004a@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
On Wed, Apr 27, 2022 at 1:48 PM Andrew Morton wrote:
>
> On Thu, 17 Mar 2022 02:50:24 -0400 Bibo Mao wrote:
>
> > Collapsing a huge page copies its contents from a set of small source
> > pages. The destination node is chosen as the node holding most of the
> > source pages, but the THP daemon (khugepaged) is not scheduled on that
> > node. Performance may be poor because the huge page is copied across
> > nodes and the target node's cache is not used. With this patch, the
> > khugepaged daemon switches to the same NUMA node as the huge page,
> > which saves copying time and makes better use of the local cache.
> >
> > With this patch, SPECint 2006 base performance improves by 6% on a
> > Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.
> >
>
> Are there any acks for this one please?

TBH, I'm a little bit reluctant about this patch. I agree that running
khugepaged on the same node as the source and destination pages could
reduce cross-socket traffic and use the cache more efficiently. But I'm
not sure whether it is really worth it. For example, on a busy system
khugepaged may bounce from CPU to CPU, which may interfere with the
scheduler, and khugepaged has to wait to run on the target CPU, which
may take an indefinite amount of time. In addition, the benefit also
depends on the locality of the source pages (how many of them are on
the same node), how often khugepaged is woken up on a different node,
and so on. Even if it were proved worthwhile, I'd prefer that
set_cpus_allowed_ptr() be called between mmap_read_unlock() and
mmap_write_lock(), to avoid wasted effort on some error paths.
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	struct vm_area_struct *vma;
> >  	struct mmu_notifier_range range;
> >  	gfp_t gfp;
> > +	const struct cpumask *cpumask;
> >
> >  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> >
> > @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	 * that. We will recheck the vma after taking it again in write mode.
> >  	 */
> >  	mmap_read_unlock(mm);
> > +
> > +	/* sched to specified node before huge page memory copy */
> > +	if (task_node(current) != node) {
> > +		cpumask = cpumask_of_node(node);
> > +		if (!cpumask_empty(cpumask))
> > +			set_cpus_allowed_ptr(current, cpumask);
> > +	}
> >  	new_page = khugepaged_alloc_page(hpage, gfp, node);
> >  	if (!new_page) {
> > 		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
> > --
> > 2.31.1
> >