From: Barry Song <21cnbao@gmail.com>
Date: Fri, 12 Apr 2024 21:27:31 +1200
Subject: Re: [PATCH v4 1/2] mm: add per-order mTHP anon_alloc and anon_alloc_fallback counters
To: Ryan Roberts
Cc: david@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org, cerasuolodomenico@gmail.com, chrisl@kernel.org, kasong@tencent.com, peterx@redhat.com, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, yosryahmed@google.com, yuzhao@google.com, linux-kernel@vger.kernel.org

On Fri, Apr 12, 2024 at 9:16 PM Ryan Roberts wrote:
>
> On 11/04/2024 23:40, Barry Song wrote:
> > On Fri, Apr 12, 2024 at 4:38 AM Ryan Roberts wrote:
> >>
> >> On 05/04/2024 11:27, Barry Song wrote:
> >>> From: Barry Song
> >>>
> >>> Profiling a system blindly with mTHP has become challenging due to the
> >>> lack of visibility into its operations. Presenting the success rate of
> >>> mTHP allocations appears to be a pressing need.
> >>>
> >>> Recently, I've been experiencing significant difficulty debugging
> >>> performance improvements and regressions without these figures.
> >>> It's crucial for us to understand the true effectiveness of mTHP in
> >>> real-world scenarios, especially in systems with fragmented memory.
> >>>
> >>> This patch sets up the framework for per-order mTHP counters, starting
> >>> with the introduction of anon_alloc and anon_alloc_fallback counters.
> >>> Incorporating additional counters should now be straightforward as well.
> >>>
> >>> Signed-off-by: Barry Song
> >>> ---
> >>>  include/linux/huge_mm.h | 19 ++++++++++++++++
> >>>  mm/huge_memory.c        | 48 ++++++++++++++++++++++++++++++++++++++++
> >>>  mm/memory.c             |  2 ++
> >>>  3 files changed, 69 insertions(+)
> >>>
> >>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>> index e896ca4760f6..c5d33017a4dd 100644
> >>> --- a/include/linux/huge_mm.h
> >>> +++ b/include/linux/huge_mm.h
> >>> @@ -264,6 +264,25 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> >>>                                          enforce_sysfs, orders);
> >>>  }
> >>>
> >>> +enum mthp_stat_item {
> >>> +        MTHP_STAT_ANON_ALLOC,
> >>> +        MTHP_STAT_ANON_ALLOC_FALLBACK,
> >>> +        __MTHP_STAT_COUNT
> >>> +};
> >>> +
> >>> +struct mthp_stat {
> >>> +        unsigned long stats[PMD_ORDER + 1][__MTHP_STAT_COUNT];
> >>
> >> I saw a fix making this allocation dynamic due to powerpc's PMD_ORDER not
> >> being constant. I wonder if ilog2(MAX_PTRS_PER_PTE) would help here?
> >>
> >
> > It's a possibility. However, since we've passed all the build tests using
> > dynamic allocation, it might not be worth the effort to attempt static
> > allocation again. Who knows what will happen next :-)
>
> If the dynamic version is clear and obvious then fair enough. I tried doing
> something similar for the swap-out series but it turned out a mess, so I ended
> up falling back to static allocation, which was much easier to understand.

mthp_stats doesn't need to be dynamically released; as long as hugepage_init()
succeeds, we need it forever, even if sysfs is released or disabled.

>
> >
> >>> +};
> >>> +
> >>> +DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
> >>> +
> >>> +static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> >>
> >> I thought we were going to call this always-counting-up type of stat an
> >> "event"? "count_mthp_event"? But I'm happy with it as is, personally.
> >>
> >>> +{
> >>> +        if (unlikely(order > PMD_ORDER))
> >>> +                return;
> >>
> >> I'm wondering if it also makes sense to ignore order == 0? Although I guess
> >> if called for order-0 it's safe, since the storage exists and
> >> sum_mthp_stat() is never called for 0. Ignore this comment :)
> >
> > Agreed. I'd like to change it to ignore order 0.
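As a concrete illustration of the change agreed here, the guard would only need
one extra condition. A minimal sketch, not the committed version, assuming the
per-cpu mthp_stats array introduced by this patch:

static inline void count_mthp_stat(int order, enum mthp_stat_item item)
{
        /* Ignore order-0 folios as well as anything beyond PMD_ORDER. */
        if (unlikely(order <= 0 || order > PMD_ORDER))
                return;

        this_cpu_inc(mthp_stats.stats[order][item]);
}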
> >
> >>
> >>> +        this_cpu_inc(mthp_stats.stats[order][item]);
> >>> +}
> >>> +
> >>>  #define transparent_hugepage_use_zero_page()                        \
> >>>          (transparent_hugepage_flags &                               \
> >>>           (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 9d4b2fbf6872..5b875f0fc923 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -526,6 +526,46 @@ static const struct kobj_type thpsize_ktype = {
> >>>          .sysfs_ops = &kobj_sysfs_ops,
> >>>  };
> >>>
> >>> +DEFINE_PER_CPU(struct mthp_stat, mthp_stats) = {{{0}}};
> >>> +
> >>> +static unsigned long sum_mthp_stat(int order, enum mthp_stat_item item)
> >>> +{
> >>> +        unsigned long sum = 0;
> >>> +        int cpu;
> >>> +
> >>> +        for_each_online_cpu(cpu) {
> >>
> >> What happens if a cpu that was online and collected a bunch of stats gets
> >> offlined? The user will see stats get smaller?
> >>
> >> Perhaps this should be for_each_possible_cpu()? Although I'm not sure what
> >> happens to percpu data when a cpu goes offline? Is the data preserved? Or
> >> wiped, or unmapped? dunno. Might we need to rescue stats into a global
> >> counter at offline-time?
> >
> > Good catch. I see /proc/vmstat always uses for_each_online_cpu() yet doesn't
> > have the issue, while the mTHP counters do have the problem.
> >
> > * step 1: cat the current thp_swpout value before running a test program
> > which does swpout;
> >
> > / # cat /proc/vmstat | grep thp_swpout
> > thp_swpout 0
> > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/anon_swpout
> > 0
> >
> > * step 2: run the test program on cpu2;
> >
> > / # taskset -c 2 /home/barry/develop/linux/swpcache-2m
> >
> > * step 3: cat the current thp_swpout value after running the test program;
> >
> > / # cat /proc/vmstat | grep thp_swpout
> > thp_swpout 98
> > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/anon_swpout
> > 98
> >
> > * step 4: offline cpu2 and read thp_swpout;
> >
> > / # echo 0 > /sys/devices/system/cpu/cpu2/online
> > [  339.058661] psci: CPU2 killed (polled 0 ms)
> >
> > / # cat /proc/vmstat | grep thp_swpout
> > thp_swpout 98
> > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/anon_swpout
> > 0
> >
> > * step 5: online cpu2 and read thp_swpout;
> >
> > / # echo 1 > /sys/devices/system/cpu/cpu2/online
> > [  791.642058] CPU2: Booted secondary processor 0x0000000002 [0x000f0510]
> >
> > / # cat /proc/vmstat | grep thp_swpout
> > thp_swpout 98
> > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/anon_swpout
> > 98
> >
> > As you can see, in step 4, /proc/vmstat is all good but the mTHP counters
> > become zero.
> >
> > The reason is that /proc/vmstat folds the offline cpu's events into an
> > online cpu, but the mthp counters lack that:
> >
> > /*
> >  * Fold the foreign cpu events into our own.
> >  *
> >  * This is adding to the events on one processor
> >  * but keeps the global counts constant.
> >  */
> > void vm_events_fold_cpu(int cpu)
> > {
> >         struct vm_event_state *fold_state = &per_cpu(vm_event_states, cpu);
> >         int i;
> >
> >         for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
> >                 count_vm_events(i, fold_state->event[i]);
> >                 fold_state->event[i] = 0;
> >         }
> > }
> >
> > static int page_alloc_cpu_dead(unsigned int cpu)
> > {
> >         ...
> >         /*
> >          * Spill the event counters of the dead processor
> >          * into the current processors event counters.
> >          * This artificially elevates the count of the current
> >          * processor.
> >          */
> >         vm_events_fold_cpu(cpu);
> >         ...
> >
> >         return 0;
> > }
> >
> > So I will do the same thing for the mTHP counters - fold the offline cpu's
> > counters into an online one.
>
> That all looks like a complete mess - better avoided if possible! A quick
> search for "for_each_possible_cpu" shows loads of places where code is
> iterating over all *possible* cpus and grabbing its per-cpu data. So the data
> definitely remains accessible when the cpu is offline. Looks like it doesn't
> get wiped either.
>
> So can't you just change your sum function to iterate over all possible cpus?

I don't see why not, but I sent a v5 [1] that folds the counters like
vm_events_fold_cpu(). I can move to for_each_possible_cpu() in v6; if it is
the better approach, I can even refine vmstat the same way.

[1] https://lore.kernel.org/linux-mm/20240412073740.294272-2-21cnbao@gmail.com/
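For comparison, if the sum function simply iterated over all possible CPUs
instead of folding at offline time, only the iterator would change. A sketch
of Ryan's suggestion using the layout from this patch, not code taken from v5:

static unsigned long sum_mthp_stat(int order, enum mthp_stat_item item)
{
        unsigned long sum = 0;
        int cpu;

        /*
         * Per-cpu areas for all possible CPUs stay allocated across
         * hotplug, so counts accumulated on a since-offlined CPU are
         * still visible here and the totals never shrink.
         */
        for_each_possible_cpu(cpu) {
                struct mthp_stat *this = &per_cpu(mthp_stats, cpu);

                sum += this->stats[order][item];
        }

        return sum;
}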
>
> >
> >>
> >>> +                struct mthp_stat *this = &per_cpu(mthp_stats, cpu);
> >>> +
> >>> +                sum += this->stats[order][item];
> >>> +        }
> >>> +
> >>> +        return sum;
> >>> +}
> >>> +
> >>> +#define DEFINE_MTHP_STAT_ATTR(_name, _index)                        \
> >>> +static ssize_t _name##_show(struct kobject *kobj,                   \
> >>> +                        struct kobj_attribute *attr, char *buf)     \
> >>> +{                                                                   \
> >>> +        int order = to_thpsize(kobj)->order;                        \
> >>> +                                                                    \
> >>> +        return sysfs_emit(buf, "%lu\n", sum_mthp_stat(order, _index)); \
> >>> +}                                                                   \
> >>> +static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> >>
> >> Very nice!
> >
> > Right. I had duplicated copy-paste and a bad smell in the code, so I wrote
> > this macro.
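To make the macro concrete, hand-expanding DEFINE_MTHP_STAT_ATTR(anon_alloc,
MTHP_STAT_ANON_ALLOC) yields roughly the following (expanded here purely for
illustration):

static ssize_t anon_alloc_show(struct kobject *kobj,
                        struct kobj_attribute *attr, char *buf)
{
        int order = to_thpsize(kobj)->order;

        return sysfs_emit(buf, "%lu\n", sum_mthp_stat(order, MTHP_STAT_ANON_ALLOC));
}
static struct kobj_attribute anon_alloc_attr = __ATTR_RO(anon_alloc);

Each counter then needs only one DEFINE_MTHP_STAT_ATTR() line plus an entry in
the attribute array, which is what the diff continues with below.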
> >>
> >>> +
> >>> +DEFINE_MTHP_STAT_ATTR(anon_alloc, MTHP_STAT_ANON_ALLOC);
> >>> +DEFINE_MTHP_STAT_ATTR(anon_alloc_fallback, MTHP_STAT_ANON_ALLOC_FALLBACK);
> >>> +
> >>> +static struct attribute *stats_attrs[] = {
> >>> +        &anon_alloc_attr.attr,
> >>> +        &anon_alloc_fallback_attr.attr,
> >>> +        NULL,
> >>> +};
> >>> +
> >>> +static struct attribute_group stats_attr_group = {
> >>> +        .name = "stats",
> >>> +        .attrs = stats_attrs,
> >>> +};
> >>> +
> >>>  static struct thpsize *thpsize_create(int order, struct kobject *parent)
> >>>  {
> >>>          unsigned long size = (PAGE_SIZE << order) / SZ_1K;
> >>> @@ -549,6 +589,12 @@ static struct thpsize *thpsize_create(int order, struct kobject *parent)
> >>>                  return ERR_PTR(ret);
> >>>          }
> >>>
> >>> +        ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
> >>> +        if (ret) {
> >>> +                kobject_put(&thpsize->kobj);
> >>> +                return ERR_PTR(ret);
> >>> +        }
> >>> +
> >>>          thpsize->order = order;
> >>>          return thpsize;
> >>>  }
> >>> @@ -1050,8 +1096,10 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
> >>>          folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
> >>>          if (unlikely(!folio)) {
> >>>                  count_vm_event(THP_FAULT_FALLBACK);
> >>> +                count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_ALLOC_FALLBACK);
> >>
> >> I think we should aim for the PMD-order MTHP_STAT_ANON_ALLOC and
> >> MTHP_STAT_ANON_ALLOC_FALLBACK to match THP_FAULT_ALLOC and
> >> THP_FAULT_FALLBACK. It's not currently set up this way...
> >
> > Right. I also realized this and asked for your comments on this in another
> > thread.
>
> Ahh sorry - must have missed that.
>
> >
> >>
> >>>                  return VM_FAULT_FALLBACK;
> >>>          }
> >>> +        count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_ALLOC);
> >>>          return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
> >>>  }
> >>>
> >>> diff --git a/mm/memory.c b/mm/memory.c
> >>> index 649e3ed94487..1723c8ddf9cb 100644
> >>> --- a/mm/memory.c
> >>> +++ b/mm/memory.c
> >>> @@ -4374,8 +4374,10 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> >>>                          }
> >>>                          folio_throttle_swaprate(folio, gfp);
> >>>                          clear_huge_page(&folio->page, vmf->address, 1 << order);
> >>> +                        count_mthp_stat(order, MTHP_STAT_ANON_ALLOC);
> >>>                          return folio;
> >>>                  }
> >>> +                count_mthp_stat(order, MTHP_STAT_ANON_ALLOC_FALLBACK);
> >>
> >> ...And we should follow the same usage pattern for the smaller mTHPs here
> >> too, which means MTHP_STAT_ANON_ALLOC_FALLBACK should be after the next:
> >> label. We
> >
> > The only difference is the case
> >
> >         if (mem_cgroup_charge(folio, vma->vm_mm, gfp))
> >                 goto next;
> >
> > but vmstat counts this as a fallback, so I feel good about moving it after
> > next:
> >
> >         if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
> >                 folio_put(folio);
> >                 count_vm_event(THP_FAULT_FALLBACK);
> >                 count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> >                 return VM_FAULT_FALLBACK;
> >         }
> >
> >> could introduce a MTHP_STAT_ANON_ALLOC_FALLBACK_CHARGE which would only
> >> trigger on a fallback due to charge failure, just like
> >> THP_FAULT_FALLBACK_CHARGE?
> >
> > It is fine to add this THP_FAULT_FALLBACK_CHARGE, though it is not that
> > useful for profiling buddy fragmentation.
>
> Well I thought you were interested in isolating fallback due to fragmentation
> only. You would get that with (FAULT_FALLBACK - FAULT_FALLBACK_CHARGE)? But if
> you think the latter will be relatively small/unimportant for now, and
> therefore FAULT_FALLBACK will give a good enough approximation on its own,
> then I'm happy not to add FAULT_FALLBACK_CHARGE for now.

In v5, I actually added FAULT_FALLBACK_CHARGE. Please take a look at v5 :-)

>
> >
> >>
> >>>  next:
> >>>                  order = next_order(&orders, order);
> >>>          }
> >>
> >
> > Thanks
> > Barry
>
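For readers following the fallback-placement discussion above: the direction
agreed in the thread - counting the per-order fallback after the next: label,
plus a separate charge-failure counter as Ryan proposed - would shape the
retry loop of alloc_anon_folio() roughly as sketched below. This is a fragment
reflecting the thread's intent, using MTHP_STAT_ANON_ALLOC_FALLBACK_CHARGE as
the name suggested in the discussion; see [1] for the actual v5 code.

        while (orders) {
                addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
                folio = vma_alloc_folio(gfp, order, vma, addr, true);
                if (folio) {
                        if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
                                /* Charge failure is counted separately ... */
                                count_mthp_stat(order, MTHP_STAT_ANON_ALLOC_FALLBACK_CHARGE);
                                folio_put(folio);
                                /* ... and also as a plain fallback below. */
                                goto next;
                        }
                        folio_throttle_swaprate(folio, gfp);
                        clear_huge_page(&folio->page, vmf->address, 1 << order);
                        count_mthp_stat(order, MTHP_STAT_ANON_ALLOC);
                        return folio;
                }
next:
                count_mthp_stat(order, MTHP_STAT_ANON_ALLOC_FALLBACK);
                order = next_order(&orders, order);
        }

This mirrors the PMD-order semantics in do_huge_pmd_anonymous_page(), where a
charge failure bumps both THP_FAULT_FALLBACK and THP_FAULT_FALLBACK_CHARGE.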