From: Yu Zhao <yuzhao@google.com>
To: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCH v2 0/5] Avoid building lrugen page table walk code
Date: Fri, 7 Jul 2023 20:06:49 -0600 [thread overview]
Message-ID: <CAOUHufbO7CaVm=xjEb1avDhHVvnC8pJmGyKcFf2iY_dpf+zR3w@mail.gmail.com> (raw)
In-Reply-To: <47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com>
On Fri, Jul 7, 2023 at 7:24 AM Aneesh Kumar K V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 7/7/23 1:27 PM, Yu Zhao wrote:
> > On Thu, Jul 6, 2023 at 12:21 AM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> This patchset avoids building changes added by commit bd74fdaea146 ("mm:
> >> multi-gen LRU: support page table walks") on platforms that don't support
> >> hardware atomic updates of access bits.
> >>
> >> Aneesh Kumar K.V (5):
> >> mm/mglru: Create a new helper iterate_mm_list_walk
> >> mm/mglru: Move Bloom filter code around
> >> mm/mglru: Move code around to make future patch easy
> >> mm/mglru: move iterate_mm_list_walk Helper
> >> mm/mglru: Don't build multi-gen LRU page table walk code on
> >> architecture not supported
> >>
> >> arch/Kconfig | 3 +
> >> arch/arm64/Kconfig | 1 +
> >> arch/x86/Kconfig | 1 +
> >> include/linux/memcontrol.h | 2 +-
> >> include/linux/mm_types.h | 10 +-
> >> include/linux/mmzone.h | 12 +-
> >> kernel/fork.c | 2 +-
> >> mm/memcontrol.c | 2 +-
> >> mm/vmscan.c | 955 +++++++++++++++++++------------------
> >> 9 files changed, 528 insertions(+), 460 deletions(-)
> >
> > 1. There is no need for a new Kconfig -- the condition is simply
> > defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
> >
> > 2. The best practice for disabling static functions is not macros but:
> >
> > static int const_cond(void)
> > {
> >         return 1;
> > }
> >
> > int main(void)
> > {
> >         int a = const_cond();
> >
> >         if (a)
> >                 return 0;
> >
> >         /* the compiler doesn't generate code for the static funcs below */
> >         static_func_1();
> >         ...
> >         static_func_N();
> > }
> >
> > LTO also optimizes away external functions, but not everyone uses it, so we
> > still need macros for those -- and of course for data structures.
> >
> > 3. In 4/5, you have:
> >
> > @@ -461,6 +461,7 @@ enum {
> > struct lru_gen_mm_state {
> > /* set to max_seq after each iteration */
> > unsigned long seq;
> > +#ifdef CONFIG_LRU_TASK_PAGE_AGING
> > /* where the current iteration continues after */
> > struct list_head *head;
> > /* where the last iteration ended before */
> > @@ -469,6 +470,11 @@ struct lru_gen_mm_state {
> > unsigned long *filters[NR_BLOOM_FILTERS];
> > /* the mm stats for debugging */
> > unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
> > +#else
> > + /* protect the seq update above */
> > + /* Maybe we can use lruvec->lock? */
> > + spinlock_t lock;
> > +#endif
> > };
> >
> > The answer is yes, and not only that, we don't need lru_gen_mm_state at all.
> >
> > I'm attaching a patch that fixes all of the above. If you want to post it,
> > please feel free -- but please test it fully, since I didn't. Otherwise I
> > can ask TJ to help make this work for you.
> >
> > $ git diff --stat
> > include/linux/memcontrol.h | 2 +-
> > include/linux/mm_types.h | 12 +-
> > include/linux/mmzone.h | 2 +
> > kernel/bounds.c | 6 +-
> > kernel/fork.c | 2 +-
> > mm/vmscan.c | 169 +++++++++++++++++++--------
> > 6 files changed, 137 insertions(+), 56 deletions(-)
> >
> > On x86:
> >
> > $ ./scripts/bloat-o-meter mm/vmscan.o.old mm/vmscan.o
> > add/remove: 24/34 grow/shrink: 2/7 up/down: 966/-8716 (-7750)
> > Function old new delta
> > ...
> > should_skip_vma 206 - -206
> > get_pte_pfn 261 - -261
> > lru_gen_add_mm 323 - -323
> > lru_gen_seq_show 1710 1370 -340
> > lru_gen_del_mm 432 - -432
> > reset_batch_size 572 - -572
> > try_to_inc_max_seq 2947 1635 -1312
> > walk_pmd_range_locked 1508 - -1508
> > walk_pud_range 3238 - -3238
> > Total: Before=99449, After=91699, chg -7.79%
> >
> > $ objdump -S mm/vmscan.o | grep -A 20 "<try_to_inc_max_seq>:"
> > 000000000000a350 <try_to_inc_max_seq>:
> > {
> > a350: e8 00 00 00 00 call a355 <try_to_inc_max_seq+0x5>
> > a355: 55 push %rbp
> > a356: 48 89 e5 mov %rsp,%rbp
> > a359: 41 57 push %r15
> > a35b: 41 56 push %r14
> > a35d: 41 55 push %r13
> > a35f: 41 54 push %r12
> > a361: 53 push %rbx
> > a362: 48 83 ec 70 sub $0x70,%rsp
> > a366: 41 89 d4 mov %edx,%r12d
> > a369: 49 89 f6 mov %rsi,%r14
> > a36c: 49 89 ff mov %rdi,%r15
> > spin_lock_irq(&lruvec->lru_lock);
> > a36f: 48 8d 5f 50 lea 0x50(%rdi),%rbx
> > a373: 48 89 df mov %rbx,%rdi
> > a376: e8 00 00 00 00 call a37b <try_to_inc_max_seq+0x2b>
> > success = max_seq == lrugen->max_seq;
> > a37b: 49 8b 87 88 00 00 00 mov 0x88(%r15),%rax
> > a382: 4c 39 f0 cmp %r14,%rax
>
> For the below diff:
>
> @@ -4497,14 +4547,16 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
> struct lru_gen_mm_walk *walk;
> struct mm_struct *mm = NULL;
> struct lru_gen_folio *lrugen = &lruvec->lrugen;
> + struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
>
> VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
>
> + if (!mm_state)
> + return inc_max_seq(lruvec, max_seq, can_swap, force_scan);
> +
> /* see the comment in iterate_mm_list() */
> - if (max_seq <= READ_ONCE(lruvec->mm_state.seq)) {
> - success = false;
> - goto done;
> - }
> + if (max_seq <= READ_ONCE(mm_state->seq))
> + return false;
>
> /*
> * If the hardware doesn't automatically set the accessed bit, fallback
> @@ -4534,8 +4586,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
> walk_mm(lruvec, mm, walk);
> } while (mm);
> done:
> - if (success)
> - inc_max_seq(lruvec, can_swap, force_scan);
> + if (success) {
> + success = inc_max_seq(lruvec, max_seq, can_swap, force_scan);
> + WARN_ON_ONCE(!success);
> + }
>
> return success;
> }
>
> We did discuss a possible race that can happen if we allow multiple callers to hit inc_max_seq() at the same time:
> inc_max_seq() drops the lru_lock and restarts the loop at the previous value of type. I.e., if we want to do the above,
> we might also need the below?
Yes, you are right.

In fact, there is an existing bug here: even though max_seq can't change
without the patch, min_seq still can, and if it does, the initial
condition for inc_min_seq() to keep looping, i.e., nr_gens being at the
max, no longer holds:

    for (type = ANON_AND_FILE - 1; type >= 0; type--) {
        if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
            continue;

This only affects the debugfs interface (force_scan=true), which is
probably why it was never reported.
> modified mm/vmscan.c
> @@ -4368,6 +4368,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
> int type, zone;
> struct lru_gen_struct *lrugen = &lruvec->lrugen;
>
> +retry:
> spin_lock_irq(&lruvec->lru_lock);
>
> VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
> @@ -4381,7 +4382,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
> while (!inc_min_seq(lruvec, type, can_swap)) {
> spin_unlock_irq(&lruvec->lru_lock);
> cond_resched();
> - spin_lock_irq(&lruvec->lru_lock);
> + goto retry;
> }
> }
>
> I also found that allowing only one CPU to increment the max_seq value, and making other requests with the same
> max_seq return false, is also useful in performance runs. I.e., we need an equivalent of this?
>
>
> + if (max_seq <= READ_ONCE(mm_state->seq))
> + return false;
Yes, but the condition should be "<" -- ">" is a bug, which is asserted
in try_to_inc_max_seq().
Thread overview: 9+ messages
2023-07-06 6:20 Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 1/5] mm/mglru: Create a new helper iterate_mm_list_walk Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 2/5] mm/mglru: Move Bloom filter code around Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 3/5] mm/mglru: Move code around to make future patch easy Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 4/5] mm/mglru: move iterate_mm_list_walk Helper Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 5/5] mm/mglru: Don't build multi-gen LRU page table walk code on architecture not supported Aneesh Kumar K.V
2023-07-07 7:57 ` [PATCH v2 0/5] Avoid building lrugen page table walk code Yu Zhao
2023-07-07 13:24 ` Aneesh Kumar K V
2023-07-08 2:06 ` Yu Zhao [this message]