* [PATCH v2 0/5] Avoid building lrugen page table walk code
@ 2023-07-06 6:20 Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 1/5] mm/mglru: Create a new helper iterate_mm_list_walk Aneesh Kumar K.V
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06 6:20 UTC (permalink / raw)
To: linux-mm, akpm; +Cc: Yu Zhao, Aneesh Kumar K.V
This patchset avoids building changes added by commit bd74fdaea146 ("mm:
multi-gen LRU: support page table walks") on platforms that don't support
hardware atomic updates of access bits.
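On such platforms the aging falls back to iterate_mm_list_nowalk() and the walk-specific code is compiled out. As a condensed sketch of the end result (patch 5 has the actual hunks), the runtime check becomes a constant on unsupported architectures:

#ifdef CONFIG_LRU_TASK_PAGE_AGING
static bool should_walk_mmu(void)
{
	return arch_has_hw_pte_young() && get_cap(LRU_GEN_MM_WALK);
}
#else
static bool should_walk_mmu(void)
{
	/* no hardware-updated access bits: never take the page table walk path */
	return false;
}
#endif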
Aneesh Kumar K.V (5):
mm/mglru: Create a new helper iterate_mm_list_walk
mm/mglru: Move Bloom filter code around
mm/mglru: Move code around to make future patch easy
mm/mglru: move iterate_mm_list_walk Helper
mm/mglru: Don't build multi-gen LRU page table walk code on
architecture not supported
arch/Kconfig | 3 +
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
include/linux/memcontrol.h | 2 +-
include/linux/mm_types.h | 10 +-
include/linux/mmzone.h | 12 +-
kernel/fork.c | 2 +-
mm/memcontrol.c | 2 +-
mm/vmscan.c | 955 +++++++++++++++++++------------------
9 files changed, 528 insertions(+), 460 deletions(-)
--
2.41.0
* [PATCH v2 1/5] mm/mglru: Create a new helper iterate_mm_list_walk
2023-07-06 6:20 [PATCH v2 0/5] Avoid building lrugen page table walk code Aneesh Kumar K.V
@ 2023-07-06 6:20 ` Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 2/5] mm/mglru: Move Bloom filter code around Aneesh Kumar K.V
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06 6:20 UTC (permalink / raw)
To: linux-mm, akpm; +Cc: Yu Zhao, Aneesh Kumar K.V
In a later patch we will not build this code on the ppc64 architecture.
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/vmscan.c | 52 ++++++++++++++++++++++++++++++----------------------
1 file changed, 30 insertions(+), 22 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb23bb1afc64..3b183f704d5d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4491,12 +4491,37 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
spin_unlock_irq(&lruvec->lru_lock);
}
+static bool iterate_mm_list_walk(struct lruvec *lruvec, unsigned long max_seq,
+ bool can_swap, bool force_scan)
+{
+ bool success;
+ struct mm_struct *mm = NULL;
+ struct lru_gen_mm_walk *walk;
+
+ walk = set_mm_walk(NULL, true);
+ if (!walk) {
+ success = iterate_mm_list_nowalk(lruvec, max_seq);
+ return success;
+ }
+
+ walk->lruvec = lruvec;
+ walk->max_seq = max_seq;
+ walk->can_swap = can_swap;
+ walk->force_scan = force_scan;
+
+ do {
+ success = iterate_mm_list(lruvec, walk, &mm);
+ if (mm)
+ walk_mm(lruvec, mm, walk);
+ } while (mm);
+
+ return success;
+}
+
static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
struct scan_control *sc, bool can_swap, bool force_scan)
{
bool success;
- struct lru_gen_mm_walk *walk;
- struct mm_struct *mm = NULL;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
@@ -4506,34 +4531,17 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
success = false;
goto done;
}
-
/*
* If the hardware doesn't automatically set the accessed bit, fallback
* to lru_gen_look_around(), which only clears the accessed bit in a
* handful of PTEs. Spreading the work out over a period of time usually
* is less efficient, but it avoids bursty page faults.
*/
- if (!should_walk_mmu()) {
- success = iterate_mm_list_nowalk(lruvec, max_seq);
- goto done;
- }
-
- walk = set_mm_walk(NULL, true);
- if (!walk) {
+ if (!should_walk_mmu())
success = iterate_mm_list_nowalk(lruvec, max_seq);
- goto done;
- }
-
- walk->lruvec = lruvec;
- walk->max_seq = max_seq;
- walk->can_swap = can_swap;
- walk->force_scan = force_scan;
+ else
+ success = iterate_mm_list_walk(lruvec, max_seq, can_swap, force_scan);
- do {
- success = iterate_mm_list(lruvec, walk, &mm);
- if (mm)
- walk_mm(lruvec, mm, walk);
- } while (mm);
done:
if (success)
inc_max_seq(lruvec, can_swap, force_scan);
--
2.41.0
* [PATCH v2 2/5] mm/mglru: Move Bloom filter code around
2023-07-06 6:20 [PATCH v2 0/5] Avoid building lrugen page table walk code Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 1/5] mm/mglru: Create a new helper iterate_mm_list_walk Aneesh Kumar K.V
@ 2023-07-06 6:20 ` Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 3/5] mm/mglru: Move code around to make future patch easy Aneesh Kumar K.V
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06 6:20 UTC (permalink / raw)
To: linux-mm, akpm; +Cc: Yu Zhao, Aneesh Kumar K.V
This makes it possible to avoid building this code on the powerpc architecture.
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/vmscan.c | 462 ++++++++++++++++++++++++++--------------------------
1 file changed, 231 insertions(+), 231 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3b183f704d5d..c5fbc3babcd8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3325,6 +3325,237 @@ static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS;
}
+/******************************************************************************
+ * PID controller
+ ******************************************************************************/
+
+/*
+ * A feedback loop based on Proportional-Integral-Derivative (PID) controller.
+ *
+ * The P term is refaulted/(evicted+protected) from a tier in the generation
+ * currently being evicted; the I term is the exponential moving average of the
+ * P term over the generations previously evicted, using the smoothing factor
+ * 1/2; the D term isn't supported.
+ *
+ * The setpoint (SP) is always the first tier of one type; the process variable
+ * (PV) is either any tier of the other type or any other tier of the same
+ * type.
+ *
+ * The error is the difference between the SP and the PV; the correction is to
+ * turn off protection when SP>PV or turn on protection when SP<PV.
+ *
+ * For future optimizations:
+ * 1. The D term may discount the other two terms over time so that long-lived
+ * generations can resist stale information.
+ */
+struct ctrl_pos {
+ unsigned long refaulted;
+ unsigned long total;
+ int gain;
+};
+
+static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
+ struct ctrl_pos *pos)
+{
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ int hist = lru_hist_from_seq(lrugen->min_seq[type]);
+
+ pos->refaulted = lrugen->avg_refaulted[type][tier] +
+ atomic_long_read(&lrugen->refaulted[hist][type][tier]);
+ pos->total = lrugen->avg_total[type][tier] +
+ atomic_long_read(&lrugen->evicted[hist][type][tier]);
+ if (tier)
+ pos->total += lrugen->protected[hist][type][tier - 1];
+ pos->gain = gain;
+}
+
+static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
+{
+ int hist, tier;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ bool clear = carryover ? NR_HIST_GENS == 1 : NR_HIST_GENS > 1;
+ unsigned long seq = carryover ? lrugen->min_seq[type] : lrugen->max_seq + 1;
+
+ lockdep_assert_held(&lruvec->lru_lock);
+
+ if (!carryover && !clear)
+ return;
+
+ hist = lru_hist_from_seq(seq);
+
+ for (tier = 0; tier < MAX_NR_TIERS; tier++) {
+ if (carryover) {
+ unsigned long sum;
+
+ sum = lrugen->avg_refaulted[type][tier] +
+ atomic_long_read(&lrugen->refaulted[hist][type][tier]);
+ WRITE_ONCE(lrugen->avg_refaulted[type][tier], sum / 2);
+
+ sum = lrugen->avg_total[type][tier] +
+ atomic_long_read(&lrugen->evicted[hist][type][tier]);
+ if (tier)
+ sum += lrugen->protected[hist][type][tier - 1];
+ WRITE_ONCE(lrugen->avg_total[type][tier], sum / 2);
+ }
+
+ if (clear) {
+ atomic_long_set(&lrugen->refaulted[hist][type][tier], 0);
+ atomic_long_set(&lrugen->evicted[hist][type][tier], 0);
+ if (tier)
+ WRITE_ONCE(lrugen->protected[hist][type][tier - 1], 0);
+ }
+ }
+}
+
+static bool positive_ctrl_err(struct ctrl_pos *sp, struct ctrl_pos *pv)
+{
+ /*
+ * Return true if the PV has a limited number of refaults or a lower
+ * refaulted/total than the SP.
+ */
+ return pv->refaulted < MIN_LRU_BATCH ||
+ pv->refaulted * (sp->total + MIN_LRU_BATCH) * sp->gain <=
+ (sp->refaulted + 1) * pv->total * pv->gain;
+}
+
+/******************************************************************************
+ * the aging
+ ******************************************************************************/
+
+/* protect pages accessed multiple times through file descriptors */
+static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
+{
+ int type = folio_is_file_lru(folio);
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
+ unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
+
+ VM_WARN_ON_ONCE_FOLIO(!(old_flags & LRU_GEN_MASK), folio);
+
+ do {
+ new_gen = ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
+ /* folio_update_gen() has promoted this page? */
+ if (new_gen >= 0 && new_gen != old_gen)
+ return new_gen;
+
+ new_gen = (old_gen + 1) % MAX_NR_GENS;
+
+ new_flags = old_flags & ~(LRU_GEN_MASK | LRU_REFS_MASK | LRU_REFS_FLAGS);
+ new_flags |= (new_gen + 1UL) << LRU_GEN_PGOFF;
+ /* for folio_end_writeback() */
+ if (reclaiming)
+ new_flags |= BIT(PG_reclaim);
+ } while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
+
+ lru_gen_update_size(lruvec, folio, old_gen, new_gen);
+
+ return new_gen;
+}
+
+static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr)
+{
+ unsigned long pfn = pte_pfn(pte);
+
+ VM_WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end);
+
+ if (!pte_present(pte) || is_zero_pfn(pfn))
+ return -1;
+
+ if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte)))
+ return -1;
+
+ if (WARN_ON_ONCE(!pfn_valid(pfn)))
+ return -1;
+
+ return pfn;
+}
+
+static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
+ struct pglist_data *pgdat, bool can_swap)
+{
+ struct folio *folio;
+
+ /* try to avoid unnecessary memory loads */
+ if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+ return NULL;
+
+ folio = pfn_folio(pfn);
+ if (folio_nid(folio) != pgdat->node_id)
+ return NULL;
+
+ if (folio_memcg_rcu(folio) != memcg)
+ return NULL;
+
+ /* file VMAs can contain anon pages from COW */
+ if (!folio_is_file_lru(folio) && !can_swap)
+ return NULL;
+
+ return folio;
+}
+
+/* promote pages accessed through page tables */
+static int folio_update_gen(struct folio *folio, int gen)
+{
+ unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
+
+ VM_WARN_ON_ONCE(gen >= MAX_NR_GENS);
+ VM_WARN_ON_ONCE(!rcu_read_lock_held());
+
+ do {
+ /* lru_gen_del_folio() has isolated this page? */
+ if (!(old_flags & LRU_GEN_MASK)) {
+ /* for shrink_folio_list() */
+ new_flags = old_flags | BIT(PG_referenced);
+ continue;
+ }
+
+ new_flags = old_flags & ~(LRU_GEN_MASK | LRU_REFS_MASK | LRU_REFS_FLAGS);
+ new_flags |= (gen + 1UL) << LRU_GEN_PGOFF;
+ } while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
+
+ return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
+}
+
+static void update_batch_size(struct lru_gen_mm_walk *walk, struct folio *folio,
+ int old_gen, int new_gen)
+{
+ int type = folio_is_file_lru(folio);
+ int zone = folio_zonenum(folio);
+ int delta = folio_nr_pages(folio);
+
+ VM_WARN_ON_ONCE(old_gen >= MAX_NR_GENS);
+ VM_WARN_ON_ONCE(new_gen >= MAX_NR_GENS);
+
+ walk->batched++;
+
+ walk->nr_pages[old_gen][type][zone] -= delta;
+ walk->nr_pages[new_gen][type][zone] += delta;
+}
+
+static void reset_batch_size(struct lruvec *lruvec, struct lru_gen_mm_walk *walk)
+{
+ int gen, type, zone;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
+
+ walk->batched = 0;
+
+ for_each_gen_type_zone(gen, type, zone) {
+ enum lru_list lru = type * LRU_INACTIVE_FILE;
+ int delta = walk->nr_pages[gen][type][zone];
+
+ if (!delta)
+ continue;
+
+ walk->nr_pages[gen][type][zone] = 0;
+ WRITE_ONCE(lrugen->nr_pages[gen][type][zone],
+ lrugen->nr_pages[gen][type][zone] + delta);
+
+ if (lru_gen_is_active(lruvec, gen))
+ lru += LRU_ACTIVE;
+ __update_lru_size(lruvec, lru, zone, delta);
+ }
+}
+
/******************************************************************************
* Bloom filters
******************************************************************************/
@@ -3672,237 +3903,6 @@ static bool iterate_mm_list_nowalk(struct lruvec *lruvec, unsigned long max_seq)
return success;
}
-/******************************************************************************
- * PID controller
- ******************************************************************************/
-
-/*
- * A feedback loop based on Proportional-Integral-Derivative (PID) controller.
- *
- * The P term is refaulted/(evicted+protected) from a tier in the generation
- * currently being evicted; the I term is the exponential moving average of the
- * P term over the generations previously evicted, using the smoothing factor
- * 1/2; the D term isn't supported.
- *
- * The setpoint (SP) is always the first tier of one type; the process variable
- * (PV) is either any tier of the other type or any other tier of the same
- * type.
- *
- * The error is the difference between the SP and the PV; the correction is to
- * turn off protection when SP>PV or turn on protection when SP<PV.
- *
- * For future optimizations:
- * 1. The D term may discount the other two terms over time so that long-lived
- * generations can resist stale information.
- */
-struct ctrl_pos {
- unsigned long refaulted;
- unsigned long total;
- int gain;
-};
-
-static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
- struct ctrl_pos *pos)
-{
- struct lru_gen_folio *lrugen = &lruvec->lrugen;
- int hist = lru_hist_from_seq(lrugen->min_seq[type]);
-
- pos->refaulted = lrugen->avg_refaulted[type][tier] +
- atomic_long_read(&lrugen->refaulted[hist][type][tier]);
- pos->total = lrugen->avg_total[type][tier] +
- atomic_long_read(&lrugen->evicted[hist][type][tier]);
- if (tier)
- pos->total += lrugen->protected[hist][type][tier - 1];
- pos->gain = gain;
-}
-
-static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
-{
- int hist, tier;
- struct lru_gen_folio *lrugen = &lruvec->lrugen;
- bool clear = carryover ? NR_HIST_GENS == 1 : NR_HIST_GENS > 1;
- unsigned long seq = carryover ? lrugen->min_seq[type] : lrugen->max_seq + 1;
-
- lockdep_assert_held(&lruvec->lru_lock);
-
- if (!carryover && !clear)
- return;
-
- hist = lru_hist_from_seq(seq);
-
- for (tier = 0; tier < MAX_NR_TIERS; tier++) {
- if (carryover) {
- unsigned long sum;
-
- sum = lrugen->avg_refaulted[type][tier] +
- atomic_long_read(&lrugen->refaulted[hist][type][tier]);
- WRITE_ONCE(lrugen->avg_refaulted[type][tier], sum / 2);
-
- sum = lrugen->avg_total[type][tier] +
- atomic_long_read(&lrugen->evicted[hist][type][tier]);
- if (tier)
- sum += lrugen->protected[hist][type][tier - 1];
- WRITE_ONCE(lrugen->avg_total[type][tier], sum / 2);
- }
-
- if (clear) {
- atomic_long_set(&lrugen->refaulted[hist][type][tier], 0);
- atomic_long_set(&lrugen->evicted[hist][type][tier], 0);
- if (tier)
- WRITE_ONCE(lrugen->protected[hist][type][tier - 1], 0);
- }
- }
-}
-
-static bool positive_ctrl_err(struct ctrl_pos *sp, struct ctrl_pos *pv)
-{
- /*
- * Return true if the PV has a limited number of refaults or a lower
- * refaulted/total than the SP.
- */
- return pv->refaulted < MIN_LRU_BATCH ||
- pv->refaulted * (sp->total + MIN_LRU_BATCH) * sp->gain <=
- (sp->refaulted + 1) * pv->total * pv->gain;
-}
-
-/******************************************************************************
- * the aging
- ******************************************************************************/
-
-/* protect pages accessed multiple times through file descriptors */
-static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
-{
- int type = folio_is_file_lru(folio);
- struct lru_gen_folio *lrugen = &lruvec->lrugen;
- int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
- unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
-
- VM_WARN_ON_ONCE_FOLIO(!(old_flags & LRU_GEN_MASK), folio);
-
- do {
- new_gen = ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
- /* folio_update_gen() has promoted this page? */
- if (new_gen >= 0 && new_gen != old_gen)
- return new_gen;
-
- new_gen = (old_gen + 1) % MAX_NR_GENS;
-
- new_flags = old_flags & ~(LRU_GEN_MASK | LRU_REFS_MASK | LRU_REFS_FLAGS);
- new_flags |= (new_gen + 1UL) << LRU_GEN_PGOFF;
- /* for folio_end_writeback() */
- if (reclaiming)
- new_flags |= BIT(PG_reclaim);
- } while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
-
- lru_gen_update_size(lruvec, folio, old_gen, new_gen);
-
- return new_gen;
-}
-
-static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr)
-{
- unsigned long pfn = pte_pfn(pte);
-
- VM_WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end);
-
- if (!pte_present(pte) || is_zero_pfn(pfn))
- return -1;
-
- if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte)))
- return -1;
-
- if (WARN_ON_ONCE(!pfn_valid(pfn)))
- return -1;
-
- return pfn;
-}
-
-static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
- struct pglist_data *pgdat, bool can_swap)
-{
- struct folio *folio;
-
- /* try to avoid unnecessary memory loads */
- if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
- return NULL;
-
- folio = pfn_folio(pfn);
- if (folio_nid(folio) != pgdat->node_id)
- return NULL;
-
- if (folio_memcg_rcu(folio) != memcg)
- return NULL;
-
- /* file VMAs can contain anon pages from COW */
- if (!folio_is_file_lru(folio) && !can_swap)
- return NULL;
-
- return folio;
-}
-
-/* promote pages accessed through page tables */
-static int folio_update_gen(struct folio *folio, int gen)
-{
- unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
-
- VM_WARN_ON_ONCE(gen >= MAX_NR_GENS);
- VM_WARN_ON_ONCE(!rcu_read_lock_held());
-
- do {
- /* lru_gen_del_folio() has isolated this page? */
- if (!(old_flags & LRU_GEN_MASK)) {
- /* for shrink_folio_list() */
- new_flags = old_flags | BIT(PG_referenced);
- continue;
- }
-
- new_flags = old_flags & ~(LRU_GEN_MASK | LRU_REFS_MASK | LRU_REFS_FLAGS);
- new_flags |= (gen + 1UL) << LRU_GEN_PGOFF;
- } while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
-
- return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
-}
-
-static void update_batch_size(struct lru_gen_mm_walk *walk, struct folio *folio,
- int old_gen, int new_gen)
-{
- int type = folio_is_file_lru(folio);
- int zone = folio_zonenum(folio);
- int delta = folio_nr_pages(folio);
-
- VM_WARN_ON_ONCE(old_gen >= MAX_NR_GENS);
- VM_WARN_ON_ONCE(new_gen >= MAX_NR_GENS);
-
- walk->batched++;
-
- walk->nr_pages[old_gen][type][zone] -= delta;
- walk->nr_pages[new_gen][type][zone] += delta;
-}
-
-static void reset_batch_size(struct lruvec *lruvec, struct lru_gen_mm_walk *walk)
-{
- int gen, type, zone;
- struct lru_gen_folio *lrugen = &lruvec->lrugen;
-
- walk->batched = 0;
-
- for_each_gen_type_zone(gen, type, zone) {
- enum lru_list lru = type * LRU_INACTIVE_FILE;
- int delta = walk->nr_pages[gen][type][zone];
-
- if (!delta)
- continue;
-
- walk->nr_pages[gen][type][zone] = 0;
- WRITE_ONCE(lrugen->nr_pages[gen][type][zone],
- lrugen->nr_pages[gen][type][zone] + delta);
-
- if (lru_gen_is_active(lruvec, gen))
- lru += LRU_ACTIVE;
- __update_lru_size(lruvec, lru, zone, delta);
- }
-}
-
static int should_skip_vma(unsigned long start, unsigned long end, struct mm_walk *args)
{
struct address_space *mapping;
--
2.41.0
* [PATCH v2 3/5] mm/mglru: Move code around to make future patch easy
2023-07-06 6:20 [PATCH v2 0/5] Avoid building lrugen page table walk code Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 1/5] mm/mglru: Create a new helper iterate_mm_list_walk Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 2/5] mm/mglru: Move Bloom filter code around Aneesh Kumar K.V
@ 2023-07-06 6:20 ` Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 4/5] mm/mglru: move iterate_mm_list_walk Helper Aneesh Kumar K.V
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06 6:20 UTC (permalink / raw)
To: linux-mm, akpm; +Cc: Yu Zhao, Aneesh Kumar K.V
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/vmscan.c | 64 ++++++++++++++++++++++++++---------------------------
1 file changed, 32 insertions(+), 32 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c5fbc3babcd8..a846a62df0ba 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3556,6 +3556,38 @@ static void reset_batch_size(struct lruvec *lruvec, struct lru_gen_mm_walk *walk
}
}
+static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat, bool force_alloc)
+{
+ struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
+
+ if (pgdat && current_is_kswapd()) {
+ VM_WARN_ON_ONCE(walk);
+
+ walk = &pgdat->mm_walk;
+ } else if (!walk && force_alloc) {
+ VM_WARN_ON_ONCE(current_is_kswapd());
+
+ walk = kzalloc(sizeof(*walk), __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
+ }
+
+ current->reclaim_state->mm_walk = walk;
+
+ return walk;
+}
+
+static void clear_mm_walk(void)
+{
+ struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
+
+ VM_WARN_ON_ONCE(walk && memchr_inv(walk->nr_pages, 0, sizeof(walk->nr_pages)));
+ VM_WARN_ON_ONCE(walk && memchr_inv(walk->mm_stats, 0, sizeof(walk->mm_stats)));
+
+ current->reclaim_state->mm_walk = NULL;
+
+ if (!current_is_kswapd())
+ kfree(walk);
+}
+
/******************************************************************************
* Bloom filters
******************************************************************************/
@@ -4324,38 +4356,6 @@ static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_
} while (err == -EAGAIN);
}
-static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat, bool force_alloc)
-{
- struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
-
- if (pgdat && current_is_kswapd()) {
- VM_WARN_ON_ONCE(walk);
-
- walk = &pgdat->mm_walk;
- } else if (!walk && force_alloc) {
- VM_WARN_ON_ONCE(current_is_kswapd());
-
- walk = kzalloc(sizeof(*walk), __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
- }
-
- current->reclaim_state->mm_walk = walk;
-
- return walk;
-}
-
-static void clear_mm_walk(void)
-{
- struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
-
- VM_WARN_ON_ONCE(walk && memchr_inv(walk->nr_pages, 0, sizeof(walk->nr_pages)));
- VM_WARN_ON_ONCE(walk && memchr_inv(walk->mm_stats, 0, sizeof(walk->mm_stats)));
-
- current->reclaim_state->mm_walk = NULL;
-
- if (!current_is_kswapd())
- kfree(walk);
-}
-
static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
{
int zone;
--
2.41.0
* [PATCH v2 4/5] mm/mglru: move iterate_mm_list_walk Helper
2023-07-06 6:20 [PATCH v2 0/5] Avoid building lrugen page table walk code Aneesh Kumar K.V
` (2 preceding siblings ...)
2023-07-06 6:20 ` [PATCH v2 3/5] mm/mglru: Move code around to make future patch easy Aneesh Kumar K.V
@ 2023-07-06 6:20 ` Aneesh Kumar K.V
2023-07-06 6:20 ` [PATCH v2 5/5] mm/mglru: Don't build multi-gen LRU page table walk code on architecture not supported Aneesh Kumar K.V
2023-07-07 7:57 ` [PATCH v2 0/5] Avoid building lrugen page table walk code Yu Zhao
5 siblings, 0 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06 6:20 UTC (permalink / raw)
To: linux-mm, akpm; +Cc: Yu Zhao, Aneesh Kumar K.V
This helps to avoid building this code on powerpc.
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/vmscan.c | 54 ++++++++++++++++++++++++++---------------------------
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a846a62df0ba..0ea7a07990d3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4356,6 +4356,33 @@ static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_
} while (err == -EAGAIN);
}
+static bool iterate_mm_list_walk(struct lruvec *lruvec, unsigned long max_seq,
+ bool can_swap, bool force_scan)
+{
+ bool success;
+ struct mm_struct *mm = NULL;
+ struct lru_gen_mm_walk *walk;
+
+ walk = set_mm_walk(NULL, true);
+ if (!walk) {
+ success = iterate_mm_list_nowalk(lruvec, max_seq);
+ return success;
+ }
+
+ walk->lruvec = lruvec;
+ walk->max_seq = max_seq;
+ walk->can_swap = can_swap;
+ walk->force_scan = force_scan;
+
+ do {
+ success = iterate_mm_list(lruvec, walk, &mm);
+ if (mm)
+ walk_mm(lruvec, mm, walk);
+ } while (mm);
+
+ return success;
+}
+
static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
{
int zone;
@@ -4491,33 +4518,6 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
spin_unlock_irq(&lruvec->lru_lock);
}
-static bool iterate_mm_list_walk(struct lruvec *lruvec, unsigned long max_seq,
- bool can_swap, bool force_scan)
-{
- bool success;
- struct mm_struct *mm = NULL;
- struct lru_gen_mm_walk *walk;
-
- walk = set_mm_walk(NULL, true);
- if (!walk) {
- success = iterate_mm_list_nowalk(lruvec, max_seq);
- return success;
- }
-
- walk->lruvec = lruvec;
- walk->max_seq = max_seq;
- walk->can_swap = can_swap;
- walk->force_scan = force_scan;
-
- do {
- success = iterate_mm_list(lruvec, walk, &mm);
- if (mm)
- walk_mm(lruvec, mm, walk);
- } while (mm);
-
- return success;
-}
-
static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
struct scan_control *sc, bool can_swap, bool force_scan)
{
--
2.41.0
* [PATCH v2 5/5] mm/mglru: Don't build multi-gen LRU page table walk code on architecture not supported
2023-07-06 6:20 [PATCH v2 0/5] Avoid building lrugen page table walk code Aneesh Kumar K.V
` (3 preceding siblings ...)
2023-07-06 6:20 ` [PATCH v2 4/5] mm/mglru: move iterate_mm_list_walk Helper Aneesh Kumar K.V
@ 2023-07-06 6:20 ` Aneesh Kumar K.V
2023-07-07 7:57 ` [PATCH v2 0/5] Avoid building lrugen page table walk code Yu Zhao
5 siblings, 0 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06 6:20 UTC (permalink / raw)
To: linux-mm, akpm; +Cc: Yu Zhao, Aneesh Kumar K.V
Not all architectures support hardware atomic updates of access bits. On
such architectures we don't use a page table walk to classify pages into
generations. Add a kernel config option and avoid building all the page
table walk code on such architectures.
This avoids calling lru_gen related code (lru_gen_add/remove/migrate_mm)
in fork/exit/context switch. With this change we also don't build
components like the Bloom filter and the page table walk code (walk_mm
and related code) on unsupported architectures.
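An architecture opts in by selecting the new option when it has
hardware-updated access bits; an architecture such as powerpc simply does
not select it and keeps using the iterate_mm_list_nowalk() fallback. A
minimal sketch of the opt-in (the actual arm64 and x86 hunks are in the
diff below):

config LRU_TASK_PAGE_AGING
	bool

config ARM64
	# other selects unchanged
	select LRU_TASK_PAGE_AGING if LRU_GEN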
No performance change observed with the mongodb ycsb test:
Patch details      Throughput (Ops/sec)
Without patch      91252
With patch         91488
Without patch:
$ size mm/vmscan.o
   text    data   bss     dec    hex filename
 116016   36857    40  152913  25551 mm/vmscan.o
With patch:
$ size mm/vmscan.o
   text    data   bss     dec    hex filename
 112864   36437    40  149341  2475d mm/vmscan.o
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/Kconfig | 3 +++
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
include/linux/memcontrol.h | 2 +-
include/linux/mm_types.h | 10 ++++-----
include/linux/mmzone.h | 12 +++++++++-
kernel/fork.c | 2 +-
mm/memcontrol.c | 2 +-
mm/vmscan.c | 45 ++++++++++++++++++++++++++++++++++++++
9 files changed, 69 insertions(+), 9 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index aff2746c8af2..ec8662e2f3cb 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1471,6 +1471,9 @@ config DYNAMIC_SIGFRAME
config HAVE_ARCH_NODE_DEV_GROUP
bool
+config LRU_TASK_PAGE_AGING
+ bool
+
config ARCH_HAS_NONLEAF_PMD_YOUNG
bool
help
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7856c3a3e35a..d6b5d1647baa 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -233,6 +233,7 @@ config ARM64
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
select LOCK_MM_AND_FIND_VMA
+ select LRU_TASK_PAGE_AGING if LRU_GEN
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7422db409770..940d86a0a566 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -281,6 +281,7 @@ config X86
select HOTPLUG_SPLIT_STARTUP if SMP && X86_32
select IRQ_FORCED_THREADING
select LOCK_MM_AND_FIND_VMA
+ select LRU_TASK_PAGE_AGING if LRU_GEN
select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK
select NEED_SG_DMA_LENGTH
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0ab426a5696b..5ddc1abe95ae 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -321,7 +321,7 @@ struct mem_cgroup {
struct deferred_split deferred_split_queue;
#endif
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
/* per-memcg mm_struct list */
struct lru_gen_mm_list mm_list;
#endif
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index de10fc797c8e..9089762aa8e2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -793,7 +793,7 @@ struct mm_struct {
*/
unsigned long ksm_rmap_items;
#endif
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
struct {
/* this mm_struct is on lru_gen_mm_list */
struct list_head list;
@@ -808,7 +808,7 @@ struct mm_struct {
struct mem_cgroup *memcg;
#endif
} lru_gen;
-#endif /* CONFIG_LRU_GEN */
+#endif /* CONFIG_LRU_TASK_PAGE_AGING */
} __randomize_layout;
/*
@@ -837,7 +837,7 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
return (struct cpumask *)&mm->cpu_bitmap;
}
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
struct lru_gen_mm_list {
/* mm_struct list for page table walkers */
@@ -871,7 +871,7 @@ static inline void lru_gen_use_mm(struct mm_struct *mm)
WRITE_ONCE(mm->lru_gen.bitmap, -1);
}
-#else /* !CONFIG_LRU_GEN */
+#else /* !CONFIG_LRU_TASK_PAGE_AGING */
static inline void lru_gen_add_mm(struct mm_struct *mm)
{
@@ -895,7 +895,7 @@ static inline void lru_gen_use_mm(struct mm_struct *mm)
{
}
-#endif /* CONFIG_LRU_GEN */
+#endif /* CONFIG_LRU_TASK_PAGE_AGING */
struct vma_iterator {
struct ma_state mas;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5e50b78d58ea..5300696d7c2c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -461,6 +461,7 @@ enum {
struct lru_gen_mm_state {
/* set to max_seq after each iteration */
unsigned long seq;
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
/* where the current iteration continues after */
struct list_head *head;
/* where the last iteration ended before */
@@ -469,6 +470,11 @@ struct lru_gen_mm_state {
unsigned long *filters[NR_BLOOM_FILTERS];
/* the mm stats for debugging */
unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
+#else
+ /* protect the seq update above */
+ /* May be we can use lruvec->lock? */
+ spinlock_t lock;
+#endif
};
struct lru_gen_mm_walk {
@@ -546,9 +552,13 @@ struct lru_gen_memcg {
};
void lru_gen_init_pgdat(struct pglist_data *pgdat);
-
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
void lru_gen_init_memcg(struct mem_cgroup *memcg);
void lru_gen_exit_memcg(struct mem_cgroup *memcg);
+#else
+static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) {}
+static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg) {}
+#endif
void lru_gen_online_memcg(struct mem_cgroup *memcg);
void lru_gen_offline_memcg(struct mem_cgroup *memcg);
void lru_gen_release_memcg(struct mem_cgroup *memcg);
diff --git a/kernel/fork.c b/kernel/fork.c
index b85814e614a5..c7e8f65a72c8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2932,7 +2932,7 @@ pid_t kernel_clone(struct kernel_clone_args *args)
get_task_struct(p);
}
- if (IS_ENABLED(CONFIG_LRU_GEN) && !(clone_flags & CLONE_VM)) {
+ if (IS_ENABLED(CONFIG_LRU_TASK_PAGE_AGING) && !(clone_flags & CLONE_VM)) {
/* lock the task to synchronize with memcg migration */
task_lock(p);
lru_gen_add_mm(p->mm);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 92898e99e8a5..cdcf1b6baf3e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6357,7 +6357,7 @@ static void mem_cgroup_move_task(void)
}
#endif
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
static void mem_cgroup_attach(struct cgroup_taskset *tset)
{
struct task_struct *task;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0ea7a07990d3..3c9f24d8a4a6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3244,10 +3244,17 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(lru_gen_caps, NR_LRU_GEN_CAPS);
#define get_cap(cap) static_branch_unlikely(&lru_gen_caps[cap])
#endif
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
static bool should_walk_mmu(void)
{
return arch_has_hw_pte_young() && get_cap(LRU_GEN_MM_WALK);
}
+#else
+static bool should_walk_mmu(void)
+{
+ return false;
+}
+#endif
static bool should_clear_pmd_young(void)
{
@@ -3588,6 +3595,8 @@ static void clear_mm_walk(void)
kfree(walk);
}
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
+
/******************************************************************************
* Bloom filters
******************************************************************************/
@@ -4382,6 +4391,33 @@ static bool iterate_mm_list_walk(struct lruvec *lruvec, unsigned long max_seq,
return success;
}
+#else
+
+static bool iterate_mm_list_nowalk(struct lruvec *lruvec, unsigned long max_seq)
+{
+ bool success = false;
+ struct lru_gen_mm_state *mm_state = &lruvec->mm_state;
+
+ spin_lock(&mm_state->lock);
+
+ VM_WARN_ON_ONCE(mm_state->seq + 1 < max_seq);
+
+ if (max_seq > mm_state->seq) {
+ WRITE_ONCE(mm_state->seq, mm_state->seq + 1);
+ success = true;
+ }
+
+ spin_unlock(&mm_state->lock);
+
+ return success;
+}
+
+static bool iterate_mm_list_walk(struct lruvec *lruvec, unsigned long max_seq,
+ bool can_swap, bool force_scan)
+{
+ return false;
+}
+#endif
static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
{
@@ -4744,9 +4780,11 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
arch_leave_lazy_mmu_mode();
mem_cgroup_unlock_pages();
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
/* feedback from rmap walkers to page table walkers */
if (suitable_to_scan(i, young))
update_bloom_filter(lruvec, max_seq, pvmw->pmd);
+#endif
}
/******************************************************************************
@@ -5896,6 +5934,7 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
seq_putc(m, '\n');
}
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
seq_puts(m, " ");
for (i = 0; i < NR_MM_STATS; i++) {
const char *s = " ";
@@ -5912,6 +5951,7 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
seq_printf(m, " %10lu%c", n, s[i]);
}
seq_putc(m, '\n');
+#endif
}
/* see Documentation/admin-guide/mm/multigen_lru.rst for details */
@@ -6186,6 +6226,9 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
INIT_LIST_HEAD(&lrugen->folios[gen][type][zone]);
lruvec->mm_state.seq = MIN_NR_GENS;
+#ifndef CONFIG_LRU_TASK_PAGE_AGING
+ spin_lock_init(&lruvec->mm_state.lock);
+#endif
}
#ifdef CONFIG_MEMCG
@@ -6202,6 +6245,7 @@ void lru_gen_init_pgdat(struct pglist_data *pgdat)
}
}
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
void lru_gen_init_memcg(struct mem_cgroup *memcg)
{
INIT_LIST_HEAD(&memcg->mm_list.fifo);
@@ -6229,6 +6273,7 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
}
}
}
+#endif
#endif /* CONFIG_MEMCG */
--
2.41.0
* Re: [PATCH v2 0/5] Avoid building lrugen page table walk code
2023-07-06 6:20 [PATCH v2 0/5] Avoid building lrugen page table walk code Aneesh Kumar K.V
` (4 preceding siblings ...)
2023-07-06 6:20 ` [PATCH v2 5/5] mm/mglru: Don't build multi-gen LRU page table walk code on architecture not supported Aneesh Kumar K.V
@ 2023-07-07 7:57 ` Yu Zhao
2023-07-07 13:24 ` Aneesh Kumar K V
5 siblings, 1 reply; 9+ messages in thread
From: Yu Zhao @ 2023-07-07 7:57 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linux-mm, akpm
On Thu, Jul 6, 2023 at 12:21 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> This patchset avoids building changes added by commit bd74fdaea146 ("mm:
> multi-gen LRU: support page table walks") on platforms that don't support
> hardware atomic updates of access bits.
>
> Aneesh Kumar K.V (5):
> mm/mglru: Create a new helper iterate_mm_list_walk
> mm/mglru: Move Bloom filter code around
> mm/mglru: Move code around to make future patch easy
> mm/mglru: move iterate_mm_list_walk Helper
> mm/mglru: Don't build multi-gen LRU page table walk code on
> architecture not supported
>
> arch/Kconfig | 3 +
> arch/arm64/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> include/linux/memcontrol.h | 2 +-
> include/linux/mm_types.h | 10 +-
> include/linux/mmzone.h | 12 +-
> kernel/fork.c | 2 +-
> mm/memcontrol.c | 2 +-
> mm/vmscan.c | 955 +++++++++++++++++++------------------
> 9 files changed, 528 insertions(+), 460 deletions(-)
1. There is no need for a new Kconfig -- the condition is simply
defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
2. The best practice to disable static functions is not by macros but:
static int const_cond(void)
{
return 1;
}
int main(void)
{
int a = const_cond();
if (a)
return 0;
/* the compiler doesn't generate code for static funcs below */
static_func_1();
...
static_func_N();
LTO also optimizes external functions. But not everyone uses it. So we
still need macros for them, and of course data structures.
3. In 4/5, you have:
@@ -461,6 +461,7 @@ enum {
struct lru_gen_mm_state {
/* set to max_seq after each iteration */
unsigned long seq;
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
/* where the current iteration continues after */
struct list_head *head;
/* where the last iteration ended before */
@@ -469,6 +470,11 @@ struct lru_gen_mm_state {
unsigned long *filters[NR_BLOOM_FILTERS];
/* the mm stats for debugging */
unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
+#else
+ /* protect the seq update above */
+ /* May be we can use lruvec->lock? */
+ spinlock_t lock;
+#endif
};
The answer is yes, and not only that, we don't need lru_gen_mm_state at all.
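Putting 1-3 together, the attached patch roughly works like this (a
condensed sketch only; see the attached no_mm_walk.patch for the real
changes): LRU_GEN_WALKS_PGTABLES is emitted from kernel/bounds.c when
CONFIG_LRU_GEN is set and the architecture provides arch_has_hw_pte_young
as a macro, and the mm_list/mm_state machinery is reached through
accessors that become constant NULL otherwise, so the compiler discards
the walk-only paths:

#ifdef LRU_GEN_WALKS_PGTABLES
static struct lru_gen_mm_state *get_mm_state(struct lruvec *lruvec)
{
	return &lruvec->mm_state;
}
#else
static struct lru_gen_mm_state *get_mm_state(struct lruvec *lruvec)
{
	/* constant NULL: the page table walk paths below become dead code */
	return NULL;
}
#endif

static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
			       struct scan_control *sc, bool can_swap, bool force_scan)
{
	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);

	if (!mm_state)
		return inc_max_seq(lruvec, max_seq, can_swap, force_scan);
	/* ... walk the mm list as before ... */
}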
I'm attaching a patch that fixes all above. If you want to post it,
please feel free -- fully test it please, since I didn't. Otherwise I
can ask TJ to help make this work for you.
$ git diff --stat
include/linux/memcontrol.h | 2 +-
include/linux/mm_types.h | 12 +-
include/linux/mmzone.h | 2 +
kernel/bounds.c | 6 +-
kernel/fork.c | 2 +-
mm/vmscan.c | 169 +++++++++++++++++++--------
6 files changed, 137 insertions(+), 56 deletions(-)
On x86:
$ ./scripts/bloat-o-meter mm/vmscan.o.old mm/vmscan.o
add/remove: 24/34 grow/shrink: 2/7 up/down: 966/-8716 (-7750)
Function old new delta
...
should_skip_vma 206 - -206
get_pte_pfn 261 - -261
lru_gen_add_mm 323 - -323
lru_gen_seq_show 1710 1370 -340
lru_gen_del_mm 432 - -432
reset_batch_size 572 - -572
try_to_inc_max_seq 2947 1635 -1312
walk_pmd_range_locked 1508 - -1508
walk_pud_range 3238 - -3238
Total: Before=99449, After=91699, chg -7.79%
$ objdump -S mm/vmscan.o | grep -A 20 "<try_to_inc_max_seq>:"
000000000000a350 <try_to_inc_max_seq>:
{
a350: e8 00 00 00 00 call a355 <try_to_inc_max_seq+0x5>
a355: 55 push %rbp
a356: 48 89 e5 mov %rsp,%rbp
a359: 41 57 push %r15
a35b: 41 56 push %r14
a35d: 41 55 push %r13
a35f: 41 54 push %r12
a361: 53 push %rbx
a362: 48 83 ec 70 sub $0x70,%rsp
a366: 41 89 d4 mov %edx,%r12d
a369: 49 89 f6 mov %rsi,%r14
a36c: 49 89 ff mov %rdi,%r15
spin_lock_irq(&lruvec->lru_lock);
a36f: 48 8d 5f 50 lea 0x50(%rdi),%rbx
a373: 48 89 df mov %rbx,%rdi
a376: e8 00 00 00 00 call a37b <try_to_inc_max_seq+0x2b>
success = max_seq == lrugen->max_seq;
a37b: 49 8b 87 88 00 00 00 mov 0x88(%r15),%rax
a382: 4c 39 f0 cmp %r14,%rax
[-- Attachment #2: no_mm_walk.patch --]
[-- Type: application/octet-stream, Size: 17983 bytes --]
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 634a282099bf..c55156b76a92 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -321,7 +321,7 @@ struct mem_cgroup {
struct deferred_split deferred_split_queue;
#endif
-#ifdef CONFIG_LRU_GEN
+#ifdef LRU_GEN_WALKS_PGTABLES
/* per-memcg mm_struct list */
struct lru_gen_mm_list mm_list;
#endif
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fb68f7b8efb6..fbf5050a36f0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -798,7 +798,7 @@ struct mm_struct {
*/
unsigned long ksm_zero_pages;
#endif /* CONFIG_KSM */
-#ifdef CONFIG_LRU_GEN
+#ifdef LRU_GEN_WALKS_PGTABLES
struct {
/* this mm_struct is on lru_gen_mm_list */
struct list_head list;
@@ -813,7 +813,7 @@ struct mm_struct {
struct mem_cgroup *memcg;
#endif
} lru_gen;
-#endif /* CONFIG_LRU_GEN */
+#endif /* LRU_GEN_WALKS_PGTABLES */
} __randomize_layout;
/*
@@ -851,6 +851,10 @@ struct lru_gen_mm_list {
spinlock_t lock;
};
+#endif /* CONFIG_LRU_GEN */
+
+#ifdef LRU_GEN_WALKS_PGTABLES
+
void lru_gen_add_mm(struct mm_struct *mm);
void lru_gen_del_mm(struct mm_struct *mm);
#ifdef CONFIG_MEMCG
@@ -876,7 +880,7 @@ static inline void lru_gen_use_mm(struct mm_struct *mm)
WRITE_ONCE(mm->lru_gen.bitmap, -1);
}
-#else /* !CONFIG_LRU_GEN */
+#else /* !LRU_GEN_WALKS_PGTABLES */
static inline void lru_gen_add_mm(struct mm_struct *mm)
{
@@ -900,7 +904,7 @@ static inline void lru_gen_use_mm(struct mm_struct *mm)
{
}
-#endif /* CONFIG_LRU_GEN */
+#endif /* LRU_GEN_WALKS_PGTABLES */
struct vma_iterator {
struct ma_state mas;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4106fbc5b4b3..720f913747fe 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -631,9 +631,11 @@ struct lruvec {
#ifdef CONFIG_LRU_GEN
/* evictable pages divided into generations */
struct lru_gen_folio lrugen;
+#ifdef LRU_GEN_WALKS_PGTABLES
/* to concurrently iterate lru_gen_mm_list */
struct lru_gen_mm_state mm_state;
#endif
+#endif /* CONFIG_LRU_GEN */
#ifdef CONFIG_MEMCG
struct pglist_data *pgdat;
#endif
diff --git a/kernel/bounds.c b/kernel/bounds.c
index b529182e8b04..3ee2a4273fea 100644
--- a/kernel/bounds.c
+++ b/kernel/bounds.c
@@ -12,6 +12,7 @@
#include <linux/kbuild.h>
#include <linux/log2.h>
#include <linux/spinlock_types.h>
+#include <linux/pgtable.h>
int main(void)
{
@@ -25,7 +26,10 @@ int main(void)
#ifdef CONFIG_LRU_GEN
DEFINE(LRU_GEN_WIDTH, order_base_2(MAX_NR_GENS + 1));
DEFINE(__LRU_REFS_WIDTH, MAX_NR_TIERS - 2);
-#else
+#ifdef arch_has_hw_pte_young
+ DEFINE(LRU_GEN_WALKS_PGTABLES, 1);
+#endif
+#else /* !CONFIG_LRU_GEN */
DEFINE(LRU_GEN_WIDTH, 0);
DEFINE(__LRU_REFS_WIDTH, 0);
#endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 2ba918f83bde..930cb4ee9172 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2938,7 +2938,7 @@ pid_t kernel_clone(struct kernel_clone_args *args)
get_task_struct(p);
}
- if (IS_ENABLED(CONFIG_LRU_GEN) && !(clone_flags & CLONE_VM)) {
+ if (IS_ENABLED(LRU_GEN_WALKS_PGTABLES) && !(clone_flags & CLONE_VM)) {
/* lock the task to synchronize with memcg migration */
task_lock(p);
lru_gen_add_mm(p->mm);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4039620d30fe..311716db9617 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3368,13 +3368,14 @@ static void get_item_key(void *item, int *key)
key[1] = hash >> BLOOM_FILTER_SHIFT;
}
-static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+static bool test_bloom_filter(struct lru_gen_mm_state *mm_state, unsigned long seq,
+ void *item)
{
int key[2];
unsigned long *filter;
int gen = filter_gen_from_seq(seq);
- filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+ filter = READ_ONCE(mm_state->filters[gen]);
if (!filter)
return true;
@@ -3383,13 +3384,14 @@ static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *it
return test_bit(key[0], filter) && test_bit(key[1], filter);
}
-static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+static void update_bloom_filter(struct lru_gen_mm_state *mm_state, unsigned long seq,
+ void *item)
{
int key[2];
unsigned long *filter;
int gen = filter_gen_from_seq(seq);
- filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+ filter = READ_ONCE(mm_state->filters[gen]);
if (!filter)
return;
@@ -3401,12 +3403,12 @@ static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *
set_bit(key[1], filter);
}
-static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
+static void reset_bloom_filter(struct lru_gen_mm_state *mm_state, unsigned long seq)
{
unsigned long *filter;
int gen = filter_gen_from_seq(seq);
- filter = lruvec->mm_state.filters[gen];
+ filter = mm_state->filters[gen];
if (filter) {
bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
return;
@@ -3414,13 +3416,15 @@ static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
- WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
+ WRITE_ONCE(mm_state->filters[gen], filter);
}
/******************************************************************************
* mm_struct list
******************************************************************************/
+#ifdef LRU_GEN_WALKS_PGTABLES
+
static struct lru_gen_mm_list *get_mm_list(struct mem_cgroup *memcg)
{
static struct lru_gen_mm_list mm_list = {
@@ -3437,6 +3441,28 @@ static struct lru_gen_mm_list *get_mm_list(struct mem_cgroup *memcg)
return &mm_list;
}
+static struct lru_gen_mm_state *get_mm_state(struct lruvec *lruvec)
+{
+ return &lruvec->mm_state;
+}
+
+static struct mm_struct *get_next_mm(struct list_head *next, struct lru_gen_mm_walk *walk)
+{
+ int key;
+ struct mm_struct *mm;
+ struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
+
+ mm = list_entry(next, struct mm_struct, lru_gen.list);
+ key = pgdat->node_id % BITS_PER_TYPE(mm->lru_gen.bitmap);
+
+ if (!walk->force_scan && !test_bit(key, &mm->lru_gen.bitmap))
+ return NULL;
+
+ clear_bit(key, &mm->lru_gen.bitmap);
+
+ return mm;
+}
+
void lru_gen_add_mm(struct mm_struct *mm)
{
int nid;
@@ -3452,10 +3478,11 @@ void lru_gen_add_mm(struct mm_struct *mm)
for_each_node_state(nid, N_MEMORY) {
struct lruvec *lruvec = get_lruvec(memcg, nid);
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
/* the first addition since the last iteration */
- if (lruvec->mm_state.tail == &mm_list->fifo)
- lruvec->mm_state.tail = &mm->lru_gen.list;
+ if (mm_state->tail == &mm_list->fifo)
+ mm_state->tail = &mm->lru_gen.list;
}
list_add_tail(&mm->lru_gen.list, &mm_list->fifo);
@@ -3481,14 +3508,15 @@ void lru_gen_del_mm(struct mm_struct *mm)
for_each_node(nid) {
struct lruvec *lruvec = get_lruvec(memcg, nid);
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
/* where the current iteration continues after */
- if (lruvec->mm_state.head == &mm->lru_gen.list)
- lruvec->mm_state.head = lruvec->mm_state.head->prev;
+ if (mm_state->head == &mm->lru_gen.list)
+ mm_state->head = mm_state->head->prev;
/* where the last iteration ended before */
- if (lruvec->mm_state.tail == &mm->lru_gen.list)
- lruvec->mm_state.tail = lruvec->mm_state.tail->next;
+ if (mm_state->tail == &mm->lru_gen.list)
+ mm_state->tail = mm_state->tail->next;
}
list_del_init(&mm->lru_gen.list);
@@ -3531,10 +3559,30 @@ void lru_gen_migrate_mm(struct mm_struct *mm)
}
#endif
+#else /* !LRU_GEN_WALKS_PGTABLES */
+
+static struct lru_gen_mm_list *get_mm_list(struct mem_cgroup *memcg)
+{
+ return NULL;
+}
+
+static struct lru_gen_mm_state *get_mm_state(struct lruvec *lruvec)
+{
+ return NULL;
+}
+
+static struct mm_struct *get_next_mm(struct list_head *next, struct lru_gen_mm_walk *walk)
+{
+ return NULL;
+}
+
+#endif
+
static void reset_mm_stats(struct lruvec *lruvec, struct lru_gen_mm_walk *walk, bool last)
{
int i;
int hist;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
lockdep_assert_held(&get_mm_list(lruvec_memcg(lruvec))->lock);
@@ -3542,17 +3590,17 @@ static void reset_mm_stats(struct lruvec *lruvec, struct lru_gen_mm_walk *walk,
hist = lru_hist_from_seq(walk->max_seq);
for (i = 0; i < NR_MM_STATS; i++) {
- WRITE_ONCE(lruvec->mm_state.stats[hist][i],
- lruvec->mm_state.stats[hist][i] + walk->mm_stats[i]);
+ WRITE_ONCE(mm_state->stats[hist][i],
+ mm_state->stats[hist][i] + walk->mm_stats[i]);
walk->mm_stats[i] = 0;
}
}
if (NR_HIST_GENS > 1 && last) {
- hist = lru_hist_from_seq(lruvec->mm_state.seq + 1);
+ hist = lru_hist_from_seq(mm_state->seq + 1);
for (i = 0; i < NR_MM_STATS; i++)
- WRITE_ONCE(lruvec->mm_state.stats[hist][i], 0);
+ WRITE_ONCE(mm_state->stats[hist][i], 0);
}
}
@@ -3560,13 +3608,6 @@ static bool should_skip_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
{
int type;
unsigned long size = 0;
- struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
- int key = pgdat->node_id % BITS_PER_TYPE(mm->lru_gen.bitmap);
-
- if (!walk->force_scan && !test_bit(key, &mm->lru_gen.bitmap))
- return true;
-
- clear_bit(key, &mm->lru_gen.bitmap);
for (type = !walk->can_swap; type < ANON_AND_FILE; type++) {
size += type ? get_mm_counter(mm, MM_FILEPAGES) :
@@ -3588,7 +3629,7 @@ static bool iterate_mm_list(struct lruvec *lruvec, struct lru_gen_mm_walk *walk,
struct mm_struct *mm = NULL;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
- struct lru_gen_mm_state *mm_state = &lruvec->mm_state;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
/*
* mm_state->seq is incremented after each iteration of mm_list. There
@@ -3627,8 +3668,8 @@ static bool iterate_mm_list(struct lruvec *lruvec, struct lru_gen_mm_walk *walk,
walk->force_scan = true;
}
- mm = list_entry(mm_state->head, struct mm_struct, lru_gen.list);
- if (should_skip_mm(mm, walk))
+ mm = get_next_mm(mm_state->head, walk);
+ if (mm && should_skip_mm(mm, walk))
mm = NULL;
} while (!mm);
done:
@@ -3638,7 +3679,7 @@ static bool iterate_mm_list(struct lruvec *lruvec, struct lru_gen_mm_walk *walk,
spin_unlock(&mm_list->lock);
if (mm && first)
- reset_bloom_filter(lruvec, walk->max_seq + 1);
+ reset_bloom_filter(mm_state, walk->max_seq + 1);
if (*iter)
mmput_async(*iter);
@@ -3653,7 +3694,7 @@ static bool iterate_mm_list_nowalk(struct lruvec *lruvec, unsigned long max_seq)
bool success = false;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
- struct lru_gen_mm_state *mm_state = &lruvec->mm_state;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
spin_lock(&mm_list->lock);
@@ -4166,6 +4207,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
DECLARE_BITMAP(bitmap, MIN_LRU_BATCH);
unsigned long first = -1;
struct lru_gen_mm_walk *walk = args->private;
+ struct lru_gen_mm_state *mm_state = get_mm_state(walk->lruvec);
VM_WARN_ON_ONCE(pud_leaf(*pud));
@@ -4217,7 +4259,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
}
- if (!walk->force_scan && !test_bloom_filter(walk->lruvec, walk->max_seq, pmd + i))
+ if (!walk->force_scan && !test_bloom_filter(mm_state, walk->max_seq, pmd + i))
continue;
walk->mm_stats[MM_NONLEAF_FOUND]++;
@@ -4228,7 +4270,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
walk->mm_stats[MM_NONLEAF_ADDED]++;
/* carry over to the next generation */
- update_bloom_filter(walk->lruvec, walk->max_seq + 1, pmd + i);
+ update_bloom_filter(mm_state, walk->max_seq + 1, pmd + i);
}
walk_pmd_range_locked(pud, -1, vma, args, bitmap, &first);
@@ -4434,8 +4476,10 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
return success;
}
-static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
+static bool inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
+ bool can_swap, bool force_scan)
{
+ bool success;
int prev, next;
int type, zone;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
@@ -4444,6 +4488,10 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
+ success = max_seq == lrugen->max_seq;
+ if (!success)
+ goto unlock;
+
for (type = ANON_AND_FILE - 1; type >= 0; type--) {
if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
continue;
@@ -4486,8 +4534,10 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
WRITE_ONCE(lrugen->timestamps[next], jiffies);
/* make sure preceding modifications appear */
smp_store_release(&lrugen->max_seq, lrugen->max_seq + 1);
-
+unlock:
spin_unlock_irq(&lruvec->lru_lock);
+
+ return success;
}
static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
@@ -4497,14 +4547,16 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
struct lru_gen_mm_walk *walk;
struct mm_struct *mm = NULL;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
+ if (!mm_state)
+ return inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+
/* see the comment in iterate_mm_list() */
- if (max_seq <= READ_ONCE(lruvec->mm_state.seq)) {
- success = false;
- goto done;
- }
+ if (max_seq <= READ_ONCE(mm_state->seq))
+ return false;
/*
* If the hardware doesn't automatically set the accessed bit, fallback
@@ -4534,8 +4586,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
walk_mm(lruvec, mm, walk);
} while (mm);
done:
- if (success)
- inc_max_seq(lruvec, can_swap, force_scan);
+ if (success) {
+ success = inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+ WARN_ON_ONCE(!success);
+ }
return success;
}
@@ -4658,6 +4712,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
struct mem_cgroup *memcg = folio_memcg(folio);
struct pglist_data *pgdat = folio_pgdat(folio);
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
DEFINE_MAX_SEQ(lruvec);
int old_gen, new_gen = lru_gen_from_seq(max_seq);
@@ -4736,8 +4791,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
mem_cgroup_unlock_pages();
/* feedback from rmap walkers to page table walkers */
- if (suitable_to_scan(i, young))
- update_bloom_filter(lruvec, max_seq, pvmw->pmd);
+ if (mm_state && suitable_to_scan(i, young))
+ update_bloom_filter(mm_state, max_seq, pvmw->pmd);
}
/******************************************************************************
@@ -5862,6 +5917,7 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
int type, tier;
int hist = lru_hist_from_seq(seq);
struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
@@ -5887,6 +5943,9 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
seq_putc(m, '\n');
}
+ if (!mm_state)
+ return;
+
seq_puts(m, " ");
for (i = 0; i < NR_MM_STATS; i++) {
const char *s = " ";
@@ -5894,10 +5953,10 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
if (seq == max_seq && NR_HIST_GENS == 1) {
s = "LOYNFA";
- n = READ_ONCE(lruvec->mm_state.stats[hist][i]);
+ n = READ_ONCE(mm_state->stats[hist][i]);
} else if (seq != max_seq && NR_HIST_GENS > 1) {
s = "loynfa";
- n = READ_ONCE(lruvec->mm_state.stats[hist][i]);
+ n = READ_ONCE(mm_state->stats[hist][i]);
}
seq_printf(m, " %10lu%c", n, s[i]);
@@ -6166,6 +6225,7 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
int i;
int gen, type, zone;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
lrugen->max_seq = MIN_NR_GENS + 1;
lrugen->enabled = lru_gen_enabled();
@@ -6176,7 +6236,8 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
for_each_gen_type_zone(gen, type, zone)
INIT_LIST_HEAD(&lrugen->folios[gen][type][zone]);
- lruvec->mm_state.seq = MIN_NR_GENS;
+ if (mm_state)
+ mm_state->seq = MIN_NR_GENS;
}
#ifdef CONFIG_MEMCG
@@ -6195,28 +6256,38 @@ void lru_gen_init_pgdat(struct pglist_data *pgdat)
void lru_gen_init_memcg(struct mem_cgroup *memcg)
{
- INIT_LIST_HEAD(&memcg->mm_list.fifo);
- spin_lock_init(&memcg->mm_list.lock);
+ struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
+
+ if (!mm_list)
+ return;
+
+ INIT_LIST_HEAD(&mm_list->fifo);
+ spin_lock_init(&mm_list->lock);
}
void lru_gen_exit_memcg(struct mem_cgroup *memcg)
{
int i;
int nid;
+ struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
- VM_WARN_ON_ONCE(!list_empty(&memcg->mm_list.fifo));
+ VM_WARN_ON_ONCE(mm_list && !list_empty(&mm_list->fifo));
for_each_node(nid) {
struct lruvec *lruvec = get_lruvec(memcg, nid);
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
sizeof(lruvec->lrugen.nr_pages)));
lruvec->lrugen.list.next = LIST_POISON1;
+ if (!mm_state)
+ continue;
+
for (i = 0; i < NR_BLOOM_FILTERS; i++) {
- bitmap_free(lruvec->mm_state.filters[i]);
- lruvec->mm_state.filters[i] = NULL;
+ bitmap_free(mm_state->filters[i]);
+ mm_state->filters[i] = NULL;
}
}
}
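The hunks above lean on get_mm_state()/get_mm_list() accessors that return NULL when the page
table walk code is compiled out, so each caller only needs a NULL check. A minimal sketch of what
such accessors could look like (purely illustrative: the exact gating condition and field layout
are up to the final patch, and the condition suggested elsewhere in this thread is
defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)):

#if defined(CONFIG_LRU_GEN) && defined(arch_has_hw_pte_young)
static struct lru_gen_mm_state *get_mm_state(struct lruvec *lruvec)
{
	/* the walk state is only embedded when the walk code is built */
	return &lruvec->mm_state;
}

static struct lru_gen_mm_list *get_mm_list(struct mem_cgroup *memcg)
{
	return &memcg->mm_list;
}
#else
static struct lru_gen_mm_state *get_mm_state(struct lruvec *lruvec)
{
	return NULL;
}

static struct lru_gen_mm_list *get_mm_list(struct mem_cgroup *memcg)
{
	return NULL;
}
#endif

Because the NULL case folds to a compile-time constant, the compiler can then discard the
walk-only branches, and the static functions they call, on architectures that do not set the
accessed bit in hardware.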
* Re: [PATCH v2 0/5] Avoid building lrugen page table walk code
2023-07-07 7:57 ` [PATCH v2 0/5] Avoid building lrugen page table walk code Yu Zhao
@ 2023-07-07 13:24 ` Aneesh Kumar K V
2023-07-08 2:06 ` Yu Zhao
0 siblings, 1 reply; 9+ messages in thread
From: Aneesh Kumar K V @ 2023-07-07 13:24 UTC (permalink / raw)
To: Yu Zhao; +Cc: linux-mm, akpm
On 7/7/23 1:27 PM, Yu Zhao wrote:
> On Thu, Jul 6, 2023 at 12:21 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> This patchset avoids building changes added by commit bd74fdaea146 ("mm:
>> multi-gen LRU: support page table walks") on platforms that don't support
>> hardware atomic updates of access bits.
>>
>> Aneesh Kumar K.V (5):
>> mm/mglru: Create a new helper iterate_mm_list_walk
>> mm/mglru: Move Bloom filter code around
>> mm/mglru: Move code around to make future patch easy
>> mm/mglru: move iterate_mm_list_walk Helper
>> mm/mglru: Don't build multi-gen LRU page table walk code on
>> architecture not supported
>>
>> arch/Kconfig | 3 +
>> arch/arm64/Kconfig | 1 +
>> arch/x86/Kconfig | 1 +
>> include/linux/memcontrol.h | 2 +-
>> include/linux/mm_types.h | 10 +-
>> include/linux/mmzone.h | 12 +-
>> kernel/fork.c | 2 +-
>> mm/memcontrol.c | 2 +-
>> mm/vmscan.c | 955 +++++++++++++++++++------------------
>> 9 files changed, 528 insertions(+), 460 deletions(-)
>
> 1. There is no need for a new Kconfig -- the condition is simply
> defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
>
> 2. The best practice to disable static functions is not by macros but:
>
> static int const_cond(void)
> {
> return 1;
> }
>
> int main(void)
> {
> int a = const_cond();
>
> if (a)
> return 0;
>
> /* the compiler doesn't generate code for static funcs below */
> static_func_1();
> ...
> static_func_N();
>
> LTO also optimizes external functions. But not everyone uses it. So we
> still need macros for them, and of course data structures.
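To make the quoted point concrete, here is a compilable sketch of that constant-condition pattern,
with illustrative stand-in names (not taken from the actual patch): once const_cond() is folded,
the call below the early return becomes unreachable, and with optimization enabled the compiler
emits no code for the unreferenced static function.

static int const_cond(void)
{
	return 1;
}

static void walk_pages(void)
{
	/* stand-in for the page table walk helpers */
}

int main(void)
{
	if (const_cond())
		return 0;

	/* unreachable after constant folding; the compiler can drop walk_pages() entirely */
	walk_pages();
	return 0;
}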
>
> 3. In 4/5, you have:
>
> @@ -461,6 +461,7 @@ enum {
> struct lru_gen_mm_state {
> /* set to max_seq after each iteration */
> unsigned long seq;
> +#ifdef CONFIG_LRU_TASK_PAGE_AGING
> /* where the current iteration continues after */
> struct list_head *head;
> /* where the last iteration ended before */
> @@ -469,6 +470,11 @@ struct lru_gen_mm_state {
> unsigned long *filters[NR_BLOOM_FILTERS];
> /* the mm stats for debugging */
> unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
> +#else
> + /* protect the seq update above */
> + /* May be we can use lruvec->lock? */
> + spinlock_t lock;
> +#endif
> };
>
> The answer is yes, and not only that, we don't need lru_gen_mm_state at all.
>
> I'm attaching a patch that fixes all above. If you want to post it,
> please feel free -- fully test it please, since I didn't. Otherwise I
> can ask TJ to help make this work for you.
>
> $ git diff --stat
> include/linux/memcontrol.h | 2 +-
> include/linux/mm_types.h | 12 +-
> include/linux/mmzone.h | 2 +
> kernel/bounds.c | 6 +-
> kernel/fork.c | 2 +-
> mm/vmscan.c | 169 +++++++++++++++++++--------
> 6 files changed, 137 insertions(+), 56 deletions(-)
>
> On x86:
>
> $ ./scripts/bloat-o-meter mm/vmscan.o.old mm/vmscan.o
> add/remove: 24/34 grow/shrink: 2/7 up/down: 966/-8716 (-7750)
> Function old new delta
> ...
> should_skip_vma 206 - -206
> get_pte_pfn 261 - -261
> lru_gen_add_mm 323 - -323
> lru_gen_seq_show 1710 1370 -340
> lru_gen_del_mm 432 - -432
> reset_batch_size 572 - -572
> try_to_inc_max_seq 2947 1635 -1312
> walk_pmd_range_locked 1508 - -1508
> walk_pud_range 3238 - -3238
> Total: Before=99449, After=91699, chg -7.79%
>
> $ objdump -S mm/vmscan.o | grep -A 20 "<try_to_inc_max_seq>:"
> 000000000000a350 <try_to_inc_max_seq>:
> {
> a350: e8 00 00 00 00 call a355 <try_to_inc_max_seq+0x5>
> a355: 55 push %rbp
> a356: 48 89 e5 mov %rsp,%rbp
> a359: 41 57 push %r15
> a35b: 41 56 push %r14
> a35d: 41 55 push %r13
> a35f: 41 54 push %r12
> a361: 53 push %rbx
> a362: 48 83 ec 70 sub $0x70,%rsp
> a366: 41 89 d4 mov %edx,%r12d
> a369: 49 89 f6 mov %rsi,%r14
> a36c: 49 89 ff mov %rdi,%r15
> spin_lock_irq(&lruvec->lru_lock);
> a36f: 48 8d 5f 50 lea 0x50(%rdi),%rbx
> a373: 48 89 df mov %rbx,%rdi
> a376: e8 00 00 00 00 call a37b <try_to_inc_max_seq+0x2b>
> success = max_seq == lrugen->max_seq;
> a37b: 49 8b 87 88 00 00 00 mov 0x88(%r15),%rax
> a382: 4c 39 f0 cmp %r14,%rax
For the below diff:
@@ -4497,14 +4547,16 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
struct lru_gen_mm_walk *walk;
struct mm_struct *mm = NULL;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
+ if (!mm_state)
+ return inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+
/* see the comment in iterate_mm_list() */
- if (max_seq <= READ_ONCE(lruvec->mm_state.seq)) {
- success = false;
- goto done;
- }
+ if (max_seq <= READ_ONCE(mm_state->seq))
+ return false;
/*
* If the hardware doesn't automatically set the accessed bit, fallback
@@ -4534,8 +4586,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
walk_mm(lruvec, mm, walk);
} while (mm);
done:
- if (success)
- inc_max_seq(lruvec, can_swap, force_scan);
+ if (success) {
+ success = inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+ WARN_ON_ONCE(!success);
+ }
return success;
}
@
We did discuss a possible race that can happen if we allow multiple callers to hit inc_max_seq at the same time:
inc_max_seq drops the lru_lock and restarts the loop at the previous value of type. I.e., if we want to do the above,
we might also need the below?
modified mm/vmscan.c
@@ -4368,6 +4368,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
int type, zone;
struct lru_gen_struct *lrugen = &lruvec->lrugen;
+retry:
spin_lock_irq(&lruvec->lru_lock);
VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4381,7 +4382,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
while (!inc_min_seq(lruvec, type, can_swap)) {
spin_unlock_irq(&lruvec->lru_lock);
cond_resched();
- spin_lock_irq(&lruvec->lru_lock);
+ goto retry;
}
}
I also found that allowing only one CPU to increment the max_seq value, and making other requests
with the same max_seq return false, is useful in performance runs. I.e., we need an equivalent of this?
+ if (max_seq <= READ_ONCE(mm_state->seq))
+ return false;
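A userspace sketch of that single-incrementer idea (illustrative only; the kernel serializes this
through mm_state->seq under the lock rather than a cmpxchg on the sequence itself): callers racing
on the same sequence number bail out with false, so only one of them ends up doing the expensive
aging work.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_ulong seq;

static bool try_to_inc_seq(unsigned long max_seq)
{
	unsigned long cur = atomic_load(&seq);

	/* someone already advanced (or is advancing) this generation */
	if (max_seq <= cur)
		return false;

	if (!atomic_compare_exchange_strong(&seq, &cur, max_seq))
		return false;	/* lost the race to another caller */

	/* the expensive work (the page table walk, in the kernel) would go here */
	return true;
}

int main(void)
{
	printf("first caller: %d\n", try_to_inc_seq(1));	/* prints 1 */
	printf("racing caller: %d\n", try_to_inc_seq(1));	/* prints 0 */
	return 0;
}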
* Re: [PATCH v2 0/5] Avoid building lrugen page table walk code
2023-07-07 13:24 ` Aneesh Kumar K V
@ 2023-07-08 2:06 ` Yu Zhao
0 siblings, 0 replies; 9+ messages in thread
From: Yu Zhao @ 2023-07-08 2:06 UTC (permalink / raw)
To: Aneesh Kumar K V; +Cc: linux-mm, akpm
On Fri, Jul 7, 2023 at 7:24 AM Aneesh Kumar K V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 7/7/23 1:27 PM, Yu Zhao wrote:
> > On Thu, Jul 6, 2023 at 12:21 AM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> This patchset avoids building changes added by commit bd74fdaea146 ("mm:
> >> multi-gen LRU: support page table walks") on platforms that don't support
> >> hardware atomic updates of access bits.
> >>
> >> Aneesh Kumar K.V (5):
> >> mm/mglru: Create a new helper iterate_mm_list_walk
> >> mm/mglru: Move Bloom filter code around
> >> mm/mglru: Move code around to make future patch easy
> >> mm/mglru: move iterate_mm_list_walk Helper
> >> mm/mglru: Don't build multi-gen LRU page table walk code on
> >> architecture not supported
> >>
> >> arch/Kconfig | 3 +
> >> arch/arm64/Kconfig | 1 +
> >> arch/x86/Kconfig | 1 +
> >> include/linux/memcontrol.h | 2 +-
> >> include/linux/mm_types.h | 10 +-
> >> include/linux/mmzone.h | 12 +-
> >> kernel/fork.c | 2 +-
> >> mm/memcontrol.c | 2 +-
> >> mm/vmscan.c | 955 +++++++++++++++++++------------------
> >> 9 files changed, 528 insertions(+), 460 deletions(-)
> >
> > 1. There is no need for a new Kconfig -- the condition is simply
> > defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
> >
> > 2. The best practice to disable static functions is not by macros but:
> >
> > static int const_cond(void)
> > {
> > return 1;
> > }
> >
> > int main(void)
> > {
> > int a = const_cond();
> >
> > if (a)
> > return 0;
> >
> > /* the compiler doesn't generate code for static funcs below */
> > static_func_1();
> > ...
> > static_func_N();
> >
> > LTO also optimizes external functions. But not everyone uses it. So we
> > still need macros for them, and of course data structures.
> >
> > 3. In 4/5, you have:
> >
> > @@ -461,6 +461,7 @@ enum {
> > struct lru_gen_mm_state {
> > /* set to max_seq after each iteration */
> > unsigned long seq;
> > +#ifdef CONFIG_LRU_TASK_PAGE_AGING
> > /* where the current iteration continues after */
> > struct list_head *head;
> > /* where the last iteration ended before */
> > @@ -469,6 +470,11 @@ struct lru_gen_mm_state {
> > unsigned long *filters[NR_BLOOM_FILTERS];
> > /* the mm stats for debugging */
> > unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
> > +#else
> > + /* protect the seq update above */
> > + /* May be we can use lruvec->lock? */
> > + spinlock_t lock;
> > +#endif
> > };
> >
> > The answer is yes, and not only that, we don't need lru_gen_mm_state at all.
> >
> > I'm attaching a patch that fixes all above. If you want to post it,
> > please feel free -- fully test it please, since I didn't. Otherwise I
> > can ask TJ to help make this work for you.
> >
> > $ git diff --stat
> > include/linux/memcontrol.h | 2 +-
> > include/linux/mm_types.h | 12 +-
> > include/linux/mmzone.h | 2 +
> > kernel/bounds.c | 6 +-
> > kernel/fork.c | 2 +-
> > mm/vmscan.c | 169 +++++++++++++++++++--------
> > 6 files changed, 137 insertions(+), 56 deletions(-)
> >
> > On x86:
> >
> > $ ./scripts/bloat-o-meter mm/vmscan.o.old mm/vmscan.o
> > add/remove: 24/34 grow/shrink: 2/7 up/down: 966/-8716 (-7750)
> > Function old new delta
> > ...
> > should_skip_vma 206 - -206
> > get_pte_pfn 261 - -261
> > lru_gen_add_mm 323 - -323
> > lru_gen_seq_show 1710 1370 -340
> > lru_gen_del_mm 432 - -432
> > reset_batch_size 572 - -572
> > try_to_inc_max_seq 2947 1635 -1312
> > walk_pmd_range_locked 1508 - -1508
> > walk_pud_range 3238 - -3238
> > Total: Before=99449, After=91699, chg -7.79%
> >
> > $ objdump -S mm/vmscan.o | grep -A 20 "<try_to_inc_max_seq>:"
> > 000000000000a350 <try_to_inc_max_seq>:
> > {
> > a350: e8 00 00 00 00 call a355 <try_to_inc_max_seq+0x5>
> > a355: 55 push %rbp
> > a356: 48 89 e5 mov %rsp,%rbp
> > a359: 41 57 push %r15
> > a35b: 41 56 push %r14
> > a35d: 41 55 push %r13
> > a35f: 41 54 push %r12
> > a361: 53 push %rbx
> > a362: 48 83 ec 70 sub $0x70,%rsp
> > a366: 41 89 d4 mov %edx,%r12d
> > a369: 49 89 f6 mov %rsi,%r14
> > a36c: 49 89 ff mov %rdi,%r15
> > spin_lock_irq(&lruvec->lru_lock);
> > a36f: 48 8d 5f 50 lea 0x50(%rdi),%rbx
> > a373: 48 89 df mov %rbx,%rdi
> > a376: e8 00 00 00 00 call a37b <try_to_inc_max_seq+0x2b>
> > success = max_seq == lrugen->max_seq;
> > a37b: 49 8b 87 88 00 00 00 mov 0x88(%r15),%rax
> > a382: 4c 39 f0 cmp %r14,%rax
>
> For the below diff:
>
> @@ -4497,14 +4547,16 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
> struct lru_gen_mm_walk *walk;
> struct mm_struct *mm = NULL;
> struct lru_gen_folio *lrugen = &lruvec->lrugen;
> + struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
>
> VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
>
> + if (!mm_state)
> + return inc_max_seq(lruvec, max_seq, can_swap, force_scan);
> +
> /* see the comment in iterate_mm_list() */
> - if (max_seq <= READ_ONCE(lruvec->mm_state.seq)) {
> - success = false;
> - goto done;
> - }
> + if (max_seq <= READ_ONCE(mm_state->seq))
> + return false;
>
> /*
> * If the hardware doesn't automatically set the accessed bit, fallback
> @@ -4534,8 +4586,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
> walk_mm(lruvec, mm, walk);
> } while (mm);
> done:
> - if (success)
> - inc_max_seq(lruvec, can_swap, force_scan);
> + if (success) {
> + success = inc_max_seq(lruvec, max_seq, can_swap, force_scan);
> + WARN_ON_ONCE(!success);
> + }
>
> return success;
> }
> @
>
> We did discuss a possible race that can happen if we allow multiple callers to hit inc_max_seq at the same time:
> inc_max_seq drops the lru_lock and restarts the loop at the previous value of type. I.e., if we want to do the above,
> we might also need the below?
Yes, you are right.
In fact, there is an existing bug here: even though max_seq can't
change without the patch, min_seq still can, and if it does, the
initial condition for inc_min_seq() to keep looping, i.e., nr_gens at
max, doesn't hold anymore.
for (type = ANON_AND_FILE - 1; type >= 0; type--) {
if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
continue;
This only affects the debugfs interface (force_scan=true), which is
probably why it was never reported.
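The underlying rule, as a userspace sketch (illustrative, with a pthread mutex standing in for
lru_lock): any condition checked under a lock has to be re-checked once the lock has been dropped
and re-taken, which is what the retry label in the quoted diff below does for inc_max_seq().

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int nr_items = 1;	/* stand-in for the generation bookkeeping */

static void consume_one(void)
{
retry:
	pthread_mutex_lock(&lock);

	if (nr_items == 0) {
		/* let others make progress... */
		pthread_mutex_unlock(&lock);
		sched_yield();	/* stand-in for cond_resched() */
		/* ...then start over: the state may have changed meanwhile */
		goto retry;
	}

	nr_items--;	/* the real work, done with the condition freshly validated */
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	consume_one();
	printf("nr_items = %d\n", nr_items);	/* prints 0 */
	return 0;
}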
> modified mm/vmscan.c
> @@ -4368,6 +4368,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
> int type, zone;
> struct lru_gen_struct *lrugen = &lruvec->lrugen;
>
> +retry:
> spin_lock_irq(&lruvec->lru_lock);
>
> VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
> @@ -4381,7 +4382,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
> while (!inc_min_seq(lruvec, type, can_swap)) {
> spin_unlock_irq(&lruvec->lru_lock);
> cond_resched();
> - spin_lock_irq(&lruvec->lru_lock);
> + goto retry;
> }
> }
>
> I also found that allowing only one CPU to increment the max_seq value, and making other requests
> with the same max_seq return false, is useful in performance runs. I.e., we need an equivalent of this?
>
>
> + if (max_seq <= READ_ONCE(mm_state->seq))
> + return false;
Yes, but the condition should be "<" -- ">" is a bug, which was
asserted in try_to_inc_max_seq().