From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: Yu Zhao <yuzhao@google.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCH v2 0/5] Avoid building lrugen page table walk code
Date: Fri, 7 Jul 2023 18:54:43 +0530 [thread overview]
Message-ID: <47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com> (raw)
In-Reply-To: <CAOUHufY-g7ZXymsJ53L9KCqD3nLoG0qrCpF884Tbo9snw-aSYQ@mail.gmail.com>
On 7/7/23 1:27 PM, Yu Zhao wrote:
> On Thu, Jul 6, 2023 at 12:21 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> This patchset avoids building changes added by commit bd74fdaea146 ("mm:
>> multi-gen LRU: support page table walks") on platforms that don't support
>> hardware atomic updates of access bits.
>>
>> Aneesh Kumar K.V (5):
>> mm/mglru: Create a new helper iterate_mm_list_walk
>> mm/mglru: Move Bloom filter code around
>> mm/mglru: Move code around to make future patch easy
>> mm/mglru: move iterate_mm_list_walk Helper
>> mm/mglru: Don't build multi-gen LRU page table walk code on
>> architecture not supported
>>
>> arch/Kconfig | 3 +
>> arch/arm64/Kconfig | 1 +
>> arch/x86/Kconfig | 1 +
>> include/linux/memcontrol.h | 2 +-
>> include/linux/mm_types.h | 10 +-
>> include/linux/mmzone.h | 12 +-
>> kernel/fork.c | 2 +-
>> mm/memcontrol.c | 2 +-
>> mm/vmscan.c | 955 +++++++++++++++++++------------------
>> 9 files changed, 528 insertions(+), 460 deletions(-)
>
> 1. There is no need for a new Kconfig -- the condition is simply
> defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
>
> 2. The best practice to disable static functions is not by macros but:
>
> static int const_cond(void)
> {
> return 1;
> }
>
> int main(void)
> {
> int a = const_cond();
>
> if (a)
> return 0;
>
> /* the compiler doesn't generate code for static funcs below */
> static_func_1();
> ...
> static_func_N();
>
> LTO also optimizes external functions. But not everyone uses it. So we
> still need macros for them, and of course data structures.
>
> 3. In 4/5, you have:
>
> @@ -461,6 +461,7 @@ enum {
> struct lru_gen_mm_state {
> /* set to max_seq after each iteration */
> unsigned long seq;
> +#ifdef CONFIG_LRU_TASK_PAGE_AGING
> /* where the current iteration continues after */
> struct list_head *head;
> /* where the last iteration ended before */
> @@ -469,6 +470,11 @@ struct lru_gen_mm_state {
> unsigned long *filters[NR_BLOOM_FILTERS];
> /* the mm stats for debugging */
> unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
> +#else
> + /* protect the seq update above */
> + /* May be we can use lruvec->lock? */
> + spinlock_t lock;
> +#endif
> };
>
> The answer is yes, and not only that, we don't need lru_gen_mm_state at all.
>
> I'm attaching a patch that fixes all above. If you want to post it,
> please feel free -- fully test it please, since I didn't. Otherwise I
> can ask TJ to help make this work for you.
>
> $ git diff --stat
> include/linux/memcontrol.h | 2 +-
> include/linux/mm_types.h | 12 +-
> include/linux/mmzone.h | 2 +
> kernel/bounds.c | 6 +-
> kernel/fork.c | 2 +-
> mm/vmscan.c | 169 +++++++++++++++++++--------
> 6 files changed, 137 insertions(+), 56 deletions(-)
>
> On x86:
>
> $ ./scripts/bloat-o-meter mm/vmscan.o.old mm/vmscan.o
> add/remove: 24/34 grow/shrink: 2/7 up/down: 966/-8716 (-7750)
> Function old new delta
> ...
> should_skip_vma 206 - -206
> get_pte_pfn 261 - -261
> lru_gen_add_mm 323 - -323
> lru_gen_seq_show 1710 1370 -340
> lru_gen_del_mm 432 - -432
> reset_batch_size 572 - -572
> try_to_inc_max_seq 2947 1635 -1312
> walk_pmd_range_locked 1508 - -1508
> walk_pud_range 3238 - -3238
> Total: Before=99449, After=91699, chg -7.79%
>
> $ objdump -S mm/vmscan.o | grep -A 20 "<try_to_inc_max_seq>:"
> 000000000000a350 <try_to_inc_max_seq>:
> {
> a350: e8 00 00 00 00 call a355 <try_to_inc_max_seq+0x5>
> a355: 55 push %rbp
> a356: 48 89 e5 mov %rsp,%rbp
> a359: 41 57 push %r15
> a35b: 41 56 push %r14
> a35d: 41 55 push %r13
> a35f: 41 54 push %r12
> a361: 53 push %rbx
> a362: 48 83 ec 70 sub $0x70,%rsp
> a366: 41 89 d4 mov %edx,%r12d
> a369: 49 89 f6 mov %rsi,%r14
> a36c: 49 89 ff mov %rdi,%r15
> spin_lock_irq(&lruvec->lru_lock);
> a36f: 48 8d 5f 50 lea 0x50(%rdi),%rbx
> a373: 48 89 df mov %rbx,%rdi
> a376: e8 00 00 00 00 call a37b <try_to_inc_max_seq+0x2b>
> success = max_seq == lrugen->max_seq;
> a37b: 49 8b 87 88 00 00 00 mov 0x88(%r15),%rax
> a382: 4c 39 f0 cmp %r14,%rax
For the below diff:
@@ -4497,14 +4547,16 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
struct lru_gen_mm_walk *walk;
struct mm_struct *mm = NULL;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
+ if (!mm_state)
+ return inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+
/* see the comment in iterate_mm_list() */
- if (max_seq <= READ_ONCE(lruvec->mm_state.seq)) {
- success = false;
- goto done;
- }
+ if (max_seq <= READ_ONCE(mm_state->seq))
+ return false;
/*
* If the hardware doesn't automatically set the accessed bit, fallback
@@ -4534,8 +4586,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
walk_mm(lruvec, mm, walk);
} while (mm);
done:
- if (success)
- inc_max_seq(lruvec, can_swap, force_scan);
+ if (success) {
+ success = inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+ WARN_ON_ONCE(!success);
+ }
return success;
}
@
We did discuss a possible race that can happen if we allow multiple callers hit inc_max_seq at the same time.
inc_max_seq drop the lru_lock and restart the loop at the previous value of type. ie. if we want to do the above
we might also need the below?
modified mm/vmscan.c
@@ -4368,6 +4368,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
int type, zone;
struct lru_gen_struct *lrugen = &lruvec->lrugen;
+retry:
spin_lock_irq(&lruvec->lru_lock);
VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4381,7 +4382,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
while (!inc_min_seq(lruvec, type, can_swap)) {
spin_unlock_irq(&lruvec->lru_lock);
cond_resched();
- spin_lock_irq(&lruvec->lru_lock);
+ goto retry;
}
}
I also found that allowing only one cpu to increment max seq value and making other request
with the same max_seq return false is also useful in performance runs. ie, we need an equivalent of this?
+ if (max_seq <= READ_ONCE(mm_state->seq))
+ return false;
next prev parent reply other threads:[~2023-07-07 13:24 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-06 6:20 Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 1/5] mm/mglru: Create a new helper iterate_mm_list_walk Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 2/5] mm/mglru: Move Bloom filter code around Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 3/5] mm/mglru: Move code around to make future patch easy Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 4/5] mm/mglru: move iterate_mm_list_walk Helper Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 5/5] mm/mglru: Don't build multi-gen LRU page table walk code on architecture not supported Aneesh Kumar K.V
2023-07-07 7:57 ` [PATCH v2 0/5] Avoid building lrugen page table walk code Yu Zhao
2023-07-07 13:24 ` Aneesh Kumar K V [this message]
2023-07-08 2:06 ` Yu Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com \
--to=aneesh.kumar@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=linux-mm@kvack.org \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox