Date: Sun, 25 Apr 2021 18:48:22 -0700 (PDT)
From: David Rientjes
To: "Chu,Kaiping"
cc: "mcgrof@kernel.org", "keescook@chromium.org", "yzaikin@google.com",
    "akpm@linux-foundation.org", "vbabka@suse.cz", "nigupta@nvidia.com",
    "bhe@redhat.com", "khalid.aziz@oracle.com", "iamjoonsoo.kim@lge.com",
    "mateusznosek0@gmail.com", "sh_def@163.com",
    "linux-kernel@vger.kernel.org", "linux-fsdevel@vger.kernel.org",
    "linux-mm@kvack.org"
Subject: Re: Reply: [PATCH v3] mm/compaction:let proactive compaction order configurable
In-Reply-To: <14f6897b3dfd4314b85c5865a2f2b5d0@baidu.com>
Message-ID: <8ba0751b-8310-dcb8-5f74-97b9cb65a199@google.com>
References: <1619313662-30356-1-git-send-email-chukaiping@baidu.com>
    <14f6897b3dfd4314b85c5865a2f2b5d0@baidu.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="1482994552-1277834948-1619401703=:2029386"
Sender: owner-linux-mm@kvack.org

--1482994552-1277834948-1619401703=:2029386

On Mon, 26 Apr 2021, Chu,Kaiping wrote:

> Hi Rientjes
> I already answered your question in 4.19.
> " We turn off the transparent huge page in our machines, so we don't
> care about the order 9.
> There are many user space applications, different application maybe
> allocate different order of memory, we can't know the "known order of
> interest" in advance.
> Our purpose is to keep the overall fragment index as low as possible,
> not care about the specific order.

Ok, so you don't care about a specific order but you are adding a
vm.compaction_order sysctl?

I think what you're trying to do is invoke full compaction
(cc.order = -1) at some point in time that will (1) keep node-wide
fragmentation low over the long run and (2) be relatively lightweight at
the time it is done.

I can certainly understand (1): on your configuration, which is mostly
consumed by 1GB gigantic pages, you are likely dealing with significant
memory pressure that causes fragmentation to increase over time and
eventually become unrecoverable for the most part.

And for (2), yes, using vm.compact_memory will become very heavyweight
if it's done too late.

So since proactive compaction uses cc.order = -1, same as
vm.compact_memory, it should be possible to monitor extfrag_index under
debugfs and manually trigger compaction when necessary without
intervention of the kernel.

I think we can both agree that we wouldn't want to add obscure and
undocumented sysctls that can easily be replaced by a userspace
implementation.

> Although current proactive compaction mechanism only check the
> fragment index of specific order, but it can do memory compaction for
> all order(.order = -1 in proactive_compact_node), so it's still useful
> for us.
> We set the compaction_order according to the average fragment index of
> all our machines, it's an experience value, it's a compromise of keep
> memory fragment index low and not trigger background compaction too
> much, this value can be changed in future.
> We did periodically memory compaction by command
> "echo 1 > /proc/sys/vm/compact_memory " previously, but it's not good
> enough, it's will compact all memory forcibly, it may lead to lots of
> memory move in short time, and affect the performance of application."
>
>
> BR,
> Chu Kaiping
>
> -----Original Message-----
> From: David Rientjes
> Sent: April 26, 2021 9:15
> To: Chu,Kaiping
> Cc: mcgrof@kernel.org; keescook@chromium.org; yzaikin@google.com;
> akpm@linux-foundation.org; vbabka@suse.cz; nigupta@nvidia.com;
> bhe@redhat.com; khalid.aziz@oracle.com; iamjoonsoo.kim@lge.com;
> mateusznosek0@gmail.com; sh_def@163.com; linux-kernel@vger.kernel.org;
> linux-fsdevel@vger.kernel.org; linux-mm@kvack.org
> Subject: Re: [PATCH v3] mm/compaction:let proactive compaction order configurable
>
> On Sun, 25 Apr 2021, chukaiping wrote:
>
> > Currently the proactive compaction order is fixed to
> > COMPACTION_HPAGE_ORDER(9), it's OK in most machines with lots of
> > normal 4KB memory, but it's too high for the machines with small
> > normal memory, for example the machines with most memory configured as
> > 1GB hugetlbfs huge pages. In these machines the max order of free
> > pages is often below 9, and it's always below 9 even with hard
> > compaction. This will lead to proactive compaction be triggered very
> > frequently. In these machines we only care about order of 3 or 4.
> > This patch export the order to proc and let it configurable by user,
> > and the default value is still COMPACTION_HPAGE_ORDER.
> >
>
> As asked in the review of the v1 of the patch, why is this not a
> userspace policy decision?  If you are interested in order-3 or order-4
> fragmentation, for whatever reason, you could periodically check
> /proc/buddyinfo and manually invoke compaction on the system.
>
> In other words, why does this need to live in the kernel?
>
> > Signed-off-by: chukaiping
> > Reported-by: kernel test robot
> > ---
> >
> > Changes in v3:
> >     - change the min value of compaction_order to 1 because the fragmentation
> >       index of order 0 is always 0
> >     - move the definition of max_buddy_zone into #ifdef CONFIG_COMPACTION
> >
> > Changes in v2:
> >     - fix the compile error in ia64 and powerpc, move the initialization
> >       of sysctl_compaction_order to kcompactd_init because
> >       COMPACTION_HPAGE_ORDER is a variable in these architectures
> >     - change the hard coded max order number from 10 to MAX_ORDER - 1
> >
> >  include/linux/compaction.h |    1 +
> >  kernel/sysctl.c            |   10 ++++++++++
> >  mm/compaction.c            |    9 ++++++---
> >  3 files changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> > index ed4070e..151ccd1 100644
> > --- a/include/linux/compaction.h
> > +++ b/include/linux/compaction.h
> > @@ -83,6 +83,7 @@ static inline unsigned long compact_gap(unsigned int order)
> >  #ifdef CONFIG_COMPACTION
> >  extern int sysctl_compact_memory;
> >  extern unsigned int sysctl_compaction_proactiveness;
> > +extern unsigned int sysctl_compaction_order;
> >  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
> >  			void *buffer, size_t *length, loff_t *ppos);
> >  extern int sysctl_extfrag_threshold;
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index 62fbd09..e50f7d2 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -196,6 +196,7 @@ enum sysctl_writes_mode {
> >  #endif /* CONFIG_SCHED_DEBUG */
> >
> >  #ifdef CONFIG_COMPACTION
> > +static int max_buddy_zone = MAX_ORDER - 1;
> >  static int min_extfrag_threshold;
> >  static int max_extfrag_threshold = 1000;
> >  #endif
> > @@ -2871,6 +2872,15 @@ int proc_do_static_key(struct ctl_table *table, int write,
> >  		.extra2		= &one_hundred,
> >  	},
> >  	{
> > +		.procname	= "compaction_order",
> > +		.data		= &sysctl_compaction_order,
> > +		.maxlen		= sizeof(sysctl_compaction_order),
> > +		.mode		= 0644,
> > +		.proc_handler	= proc_dointvec_minmax,
> > +		.extra1		= SYSCTL_ONE,
> > +		.extra2		= &max_buddy_zone,
> > +	},
> > +	{
> >  		.procname	= "extfrag_threshold",
> >  		.data		= &sysctl_extfrag_threshold,
> >  		.maxlen		= sizeof(int),
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index e04f447..70c0acd 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1925,16 +1925,16 @@ static bool kswapd_is_running(pg_data_t *pgdat)
> >
> >  /*
> >   * A zone's fragmentation score is the external fragmentation wrt to the
> > - * COMPACTION_HPAGE_ORDER. It returns a value in the range [0, 100].
> > + * sysctl_compaction_order. It returns a value in the range [0, 100].
> >   */
> >  static unsigned int fragmentation_score_zone(struct zone *zone)
> >  {
> > -	return extfrag_for_order(zone, COMPACTION_HPAGE_ORDER);
> > +	return extfrag_for_order(zone, sysctl_compaction_order);
> >  }
> >
> >  /*
> >   * A weighted zone's fragmentation score is the external fragmentation
> > - * wrt to the COMPACTION_HPAGE_ORDER scaled by the zone's size. It
> > + * wrt to the sysctl_compaction_order scaled by the zone's size. It
> >   * returns a value in the range [0, 100].
> >   *
> >   * The scaling factor ensures that proactive compaction focuses on larger
> > @@ -2666,6 +2666,7 @@ static void compact_nodes(void)
> >   * background. It takes values in the range [0, 100].
> >   */
> >  unsigned int __read_mostly sysctl_compaction_proactiveness = 20;
> > +unsigned int __read_mostly sysctl_compaction_order;
> >
> >  /*
> >   * This is the entry point for compacting all nodes via
> > @@ -2958,6 +2959,8 @@ static int __init kcompactd_init(void)
> >  	int nid;
> >  	int ret;
> >
> > +	sysctl_compaction_order = COMPACTION_HPAGE_ORDER;
> > +
> >  	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> >  					"mm/compaction:online",
> >  					kcompactd_cpu_online, NULL);
> > --
> > 1.7.1
> >
> >
>
--1482994552-1277834948-1619401703=:2029386--
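
The userspace alternative suggested in the reply above (monitor
extfrag_index under debugfs and trigger vm.compact_memory when needed)
could look roughly like the sketch below. It is an illustration only, not
code from this thread. It assumes debugfs is mounted at /sys/kernel/debug,
that each extfrag_index line has the form "Node N, zone NAME v0 v1 ..."
with one value per order, and that the process runs as root. The helper
names and the policy values (order 4, threshold 0.5, 60 second interval)
are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch: poll extfrag_index and request full compaction from userspace."""
import re
import time

# Assumed locations; debugfs must be mounted and readable (usually root only).
EXTFRAG_INDEX = "/sys/kernel/debug/extfrag/extfrag_index"
COMPACT_MEMORY = "/proc/sys/vm/compact_memory"

# Assumed line format: "Node 0, zone   Normal -1.000 -1.000 0.640 ..."
LINE_RE = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)$")


def parse_extfrag_index(text):
    """Map (node, zone) to a list of index values, one entry per order."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a per-zone line
        node, zone, rest = int(match.group(1)), match.group(2), match.group(3)
        result[(node, zone)] = [float(value) for value in rest.split()]
    return result


def needs_compaction(indexes, order, threshold):
    """True if any zone's index at `order` is above `threshold`.

    -1.000 means an allocation of that order would succeed, so it never
    exceeds a non-negative threshold. Values close to 1 point at
    fragmentation rather than lack of memory.
    """
    for values in indexes.values():
        if order < len(values) and values[order] > threshold:
            return True
    return False


def main(order=4, threshold=0.5, interval_s=60):
    """Hypothetical policy loop: check periodically, compact when needed."""
    while True:
        with open(EXTFRAG_INDEX) as f:
            indexes = parse_extfrag_index(f.read())
        if needs_compaction(indexes, order, threshold):
            # Same effect as: echo 1 > /proc/sys/vm/compact_memory
            with open(COMPACT_MEMORY, "w") as f:
                f.write("1\n")
        time.sleep(interval_s)


if __name__ == "__main__":
    main()
```

Keeping the order and threshold in userspace means they can be tuned per
machine without adding a new sysctl, which is the point raised in the
review.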