* [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages
@ 2022-10-11 2:20 xu.xin.sc
2022-10-11 2:21 ` [PATCH v3 1/5] ksm: abstract the function try_to_get_old_rmap_item xu.xin.sc
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: xu.xin.sc @ 2022-10-11 2:20 UTC (permalink / raw)
To: akpm
Cc: ran.xiaokai, yang.yang29, jiang.xuexin, imbrenda, david,
linux-mm, linux-kernel, xu xin
From: xu xin <xu.xin16@zte.com.cn>
use_zero_pages is useful, not just because of the cache colouring described
in the documentation, but also because it accelerates the merging of empty
pages (pages full of zeros) when there are plenty of them, since the time of
page-by-page comparisons (unstable_tree_search_insert) is saved.
But there is room for improvement: when use_zero_pages is enabled, all empty
pages are merged with the kernel zero pages instead of with each other (as
they would be with use_zero_pages disabled), and these zero pages are then no
longer managed or monitored by KSM, which leads to at least two issues:
1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not*
unshare the shared zeropage as placed by KSM (which is against the
MADV_UNMERGEABLE documentation at least); see the link:
https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@redhat.com/
2) we cannot know how many pages are zero pages placed by KSM when
use_zero_pages is enabled, so KSM is not transparent about all of the
pages it has actually merged.
Zero pages may be the most common merged pages in real environments (not only
VMs but also other applications such as containers). Enabling use_zero_pages
in an environment with plenty of empty pages (full of zeros) is therefore very
useful. Users and application developers can also benefit from knowing the
proportion of zero pages among all merged pages when optimizing applications.
With this patch series, we can both accurately unshare the KSM-placed zero
pages and count KSM zero pages while use_zero_pages is enabled.
---
v2->v3:
1) Add more descriptive information in cover letter.
2) In [patch 2/5], add more commit log for explaining reasons.
3) In [patch 2/5], fix misuse of break_ksm() in unmerge_ksm_pages():
break_ksm(vma, addr, NULL) -> break_ksm(vma, addr, false);
---
v1->v2:
[patch 4/5] fix build warning, mm/ksm.c:550, misleading indentation; statement
'rmap_item->mm->ksm_zero_pages_sharing--;' is not part of the previous 'if'.
xu xin (5):
ksm: abstract the function try_to_get_old_rmap_item
ksm: support unsharing zero pages placed by KSM
ksm: count all zero pages placed by KSM
ksm: count zero pages for each process
ksm: add zero_pages_sharing documentation
Documentation/admin-guide/mm/ksm.rst | 10 +-
fs/proc/base.c | 1 +
include/linux/mm_types.h | 7 +-
mm/ksm.c | 177 +++++++++++++++++++++------
4 files changed, 157 insertions(+), 38 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 1/5] ksm: abstract the function try_to_get_old_rmap_item
2022-10-11 2:20 [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages xu.xin.sc
@ 2022-10-11 2:21 ` xu.xin.sc
2022-10-11 2:22 ` [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM xu.xin.sc
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2022-10-11 2:21 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, xu xin, Claudio Imbrenda,
David Hildenbrand, Xuexin Jiang, Xiaokai Ran, Yang Yang
From: xu xin <xu.xin16@zte.com.cn>
A new function, try_to_get_old_rmap_item, is abstracted out of
get_next_rmap_item. It will be reused by the subsequent patches that
count ksm_zero_pages.
The patch improves the readability and reusability of the KSM code.
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Cc: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
---
mm/ksm.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index c19fcca9bc03..13c60f1071d8 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2187,23 +2187,36 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
}
}
-static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
- struct ksm_rmap_item **rmap_list,
- unsigned long addr)
+static struct ksm_rmap_item *try_to_get_old_rmap_item(unsigned long addr,
+ struct ksm_rmap_item **rmap_list)
{
- struct ksm_rmap_item *rmap_item;
-
while (*rmap_list) {
- rmap_item = *rmap_list;
+ struct ksm_rmap_item *rmap_item = *rmap_list;
if ((rmap_item->address & PAGE_MASK) == addr)
return rmap_item;
if (rmap_item->address > addr)
break;
*rmap_list = rmap_item->rmap_list;
+		/* Reaching here means the rmap_item's vma has become unmergeable */
remove_rmap_item_from_tree(rmap_item);
free_rmap_item(rmap_item);
}
+ return NULL;
+}
+
+static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
+ struct ksm_rmap_item **rmap_list,
+ unsigned long addr)
+{
+ struct ksm_rmap_item *rmap_item;
+
+	/* Look up whether we have an old rmap_item matching the addr */
+ rmap_item = try_to_get_old_rmap_item(addr, rmap_list);
+ if (rmap_item)
+ return rmap_item;
+
+ /* Need to allocate a new rmap_item */
rmap_item = alloc_rmap_item();
if (rmap_item) {
/* It has already been zeroed */
--
2.25.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM
2022-10-11 2:20 [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages xu.xin.sc
2022-10-11 2:21 ` [PATCH v3 1/5] ksm: abstract the function try_to_get_old_rmap_item xu.xin.sc
@ 2022-10-11 2:22 ` xu.xin.sc
2022-10-21 10:17 ` David Hildenbrand
2022-10-11 2:22 ` [PATCH v3 3/5] ksm: count all " xu.xin.sc
` (3 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: xu.xin.sc @ 2022-10-11 2:22 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, xu xin, David Hildenbrand,
Claudio Imbrenda, Xuexin Jiang, Xiaokai Ran, Yang Yang
From: xu xin <xu.xin16@zte.com.cn>
use_zero_pages may be very useful, not just because of the cache colouring
described in the documentation, but also because it accelerates the merging
of empty pages (pages full of zeros) when there are plenty of them, since the
time of page-by-page comparisons (unstable_tree_search_insert) is saved.
But when use_zero_pages is enabled, madvise(addr, len, MADV_UNMERGEABLE) and
other ways of triggering unsharing (like writing 2 to /sys/kernel/mm/ksm/run)
will *not* unshare the shared zeropages placed by KSM (which is, at least,
against the MADV_UNMERGEABLE documentation).
To avoid blindly unsharing all shared zero pages in applicable VMAs, the patch
introduces a dedicated flag, ZERO_PAGE_FLAG, to mark the rmap_items of those
shared zero pages, and guarantees that these rmap_items are not freed as long
as the zero pages have not been written to, so that only the *KSM-placed*
zero pages are unshared.
The patch does not degrade the performance of use_zero_pages, as it does not
change the way empty pages are merged by that feature.
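For reference, a minimal userspace sketch of how the restored behaviour could
be observed (illustrative only, not part of the patch; it assumes ksmd is
running with use_zero_pages enabled, the region size and sleeps are arbitrary,
and the unsharing takes effect once ksmd drops the corresponding rmap_items):

    /*
     * Merge a region of zero-filled pages, then unmerge it and expect the
     * process RSS (read from /proc/self/statm) to grow back as the
     * KSM-placed zeropages are unshared again.
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    static long rss_pages(void)
    {
            long size, resident;
            FILE *f = fopen("/proc/self/statm", "r");

            if (!f)
                    return -1;
            if (fscanf(f, "%ld %ld", &size, &resident) != 2)
                    resident = -1;
            fclose(f);
            return resident;
    }

    int main(void)
    {
            size_t len = 1024 * sysconf(_SC_PAGESIZE);
            char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (mem == MAP_FAILED)
                    return 1;
            memset(mem, 0, len);            /* pages full of zeros */
            madvise(mem, len, MADV_MERGEABLE);
            sleep(10);      /* give ksmd time to map them to the zeropage */
            printf("rss after merge:   %ld\n", rss_pages());

            madvise(mem, len, MADV_UNMERGEABLE);
            sleep(10);      /* unshared as ksmd drops the rmap_items */
            printf("rss after unmerge: %ld\n", rss_pages());
            return 0;
    }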
Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
Reported-by: David Hildenbrand <david@redhat.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Co-developed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Signed-off-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Co-developed-by: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/ksm.c | 136 ++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 105 insertions(+), 31 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 13c60f1071d8..e351d7b6d15e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -213,6 +213,7 @@ struct ksm_rmap_item {
#define SEQNR_MASK 0x0ff /* low bits of unstable tree seqnr */
#define UNSTABLE_FLAG 0x100 /* is a node of the unstable tree */
#define STABLE_FLAG 0x200 /* is listed from the stable tree */
+#define ZERO_PAGE_FLAG 0x400 /* is zero page placed by KSM */
/* The stable and unstable tree heads */
static struct rb_root one_stable_tree[1] = { RB_ROOT };
@@ -381,14 +382,6 @@ static inline struct ksm_rmap_item *alloc_rmap_item(void)
return rmap_item;
}
-static inline void free_rmap_item(struct ksm_rmap_item *rmap_item)
-{
- ksm_rmap_items--;
- rmap_item->mm->ksm_rmap_items--;
- rmap_item->mm = NULL; /* debug safety */
- kmem_cache_free(rmap_item_cache, rmap_item);
-}
-
static inline struct ksm_stable_node *alloc_stable_node(void)
{
/*
@@ -420,7 +413,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
}
/*
- * We use break_ksm to break COW on a ksm page: it's a stripped down
+ * We use break_ksm to break COW on a ksm page or a KSM-placed zero page (the
+ * latter only happens when use_zero_pages is enabled): it's a stripped down
*
* if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
* put_page(page);
@@ -434,7 +428,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
* of the process that owns 'vma'. We also do not want to enforce
* protection keys here anyway.
*/
-static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
+static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
+ bool ksm_check_bypass)
{
struct page *page;
vm_fault_t ret = 0;
@@ -449,6 +444,16 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
ret = handle_mm_fault(vma, addr,
FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
NULL);
+ else if (ksm_check_bypass && is_zero_pfn(page_to_pfn(page))) {
+ /*
+			 * Although this is not a ksm page, it is a zero page
+			 * placed by KSM's use_zero_pages, so we should unshare
+			 * it when ksm_check_bypass is true.
+ */
+ ret = handle_mm_fault(vma, addr,
+ FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
+ NULL);
+ }
else
ret = VM_FAULT_WRITE;
put_page(page);
@@ -496,6 +501,11 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
return vma;
}
+/*
+ * Note: Don't call break_cow() in a context already protected by
+ * mmap_read_lock(); it may deadlock, because break_cow() takes
+ * mmap_read_lock() itself.
+ */
static void break_cow(struct ksm_rmap_item *rmap_item)
{
struct mm_struct *mm = rmap_item->mm;
@@ -511,10 +521,35 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
mmap_read_lock(mm);
vma = find_mergeable_vma(mm, addr);
if (vma)
- break_ksm(vma, addr);
+ break_ksm(vma, addr, false);
mmap_read_unlock(mm);
}
+/* Only called when rmap_item->address has ZERO_PAGE_FLAG set */
+static inline int unshare_zero_pages(struct ksm_rmap_item *rmap_item)
+{
+ struct mm_struct *mm = rmap_item->mm;
+ struct vm_area_struct *vma;
+ unsigned long addr = rmap_item->address;
+ int err = -EFAULT;
+
+ vma = vma_lookup(mm, addr);
+ if (vma)
+ err = break_ksm(vma, addr, true);
+
+ return err;
+}
+
+static inline void free_rmap_item(struct ksm_rmap_item *rmap_item)
+{
+ if (rmap_item->address & ZERO_PAGE_FLAG)
+ unshare_zero_pages(rmap_item);
+ ksm_rmap_items--;
+ rmap_item->mm->ksm_rmap_items--;
+ rmap_item->mm = NULL; /* debug safety */
+ kmem_cache_free(rmap_item_cache, rmap_item);
+}
+
static struct page *get_mergeable_page(struct ksm_rmap_item *rmap_item)
{
struct mm_struct *mm = rmap_item->mm;
@@ -825,7 +860,7 @@ static int unmerge_ksm_pages(struct vm_area_struct *vma,
if (signal_pending(current))
err = -ERESTARTSYS;
else
- err = break_ksm(vma, addr);
+ err = break_ksm(vma, addr, false);
}
return err;
}
@@ -2017,6 +2052,36 @@ static void stable_tree_append(struct ksm_rmap_item *rmap_item,
rmap_item->mm->ksm_merging_pages++;
}
+static int try_to_merge_with_kernel_zero_page(struct mm_struct *mm,
+ struct ksm_rmap_item *rmap_item,
+ struct page *page)
+{
+ int err = 0;
+
+ if (!(rmap_item->address & ZERO_PAGE_FLAG)) {
+ struct vm_area_struct *vma;
+
+ mmap_read_lock(mm);
+ vma = find_mergeable_vma(mm, rmap_item->address);
+ if (vma) {
+ err = try_to_merge_one_page(vma, page,
+ ZERO_PAGE(rmap_item->address));
+ } else {
+ /* If the vma is out of date, we do not need to continue. */
+ err = 0;
+ }
+ mmap_read_unlock(mm);
+ /*
+ * In case of failure, the page was not really empty, so we
+ * need to continue. Otherwise we're done.
+ */
+ if (!err)
+ rmap_item->address |= ZERO_PAGE_FLAG;
+ }
+
+ return err;
+}
+
/*
* cmp_and_merge_page - first see if page can be merged into the stable tree;
* if not, compare checksum to previous and if it's the same, see if page can
@@ -2101,29 +2166,21 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
* Same checksum as an empty page. We attempt to merge it with the
* appropriate zero page if the user enabled this via sysfs.
*/
- if (ksm_use_zero_pages && (checksum == zero_checksum)) {
- struct vm_area_struct *vma;
-
- mmap_read_lock(mm);
- vma = find_mergeable_vma(mm, rmap_item->address);
- if (vma) {
- err = try_to_merge_one_page(vma, page,
- ZERO_PAGE(rmap_item->address));
- } else {
+ if (ksm_use_zero_pages) {
+ if (checksum == zero_checksum) {
+ /* If success, just return. Otherwise, continue */
+ if (!try_to_merge_with_kernel_zero_page(mm, rmap_item, page))
+ return;
+ } else if (rmap_item->address & ZERO_PAGE_FLAG) {
/*
- * If the vma is out of date, we do not need to
- * continue.
+			 * The page is no longer the kernel zero page (COW has
+			 * happened to it), but its rmap_item still has the zero-page
+			 * flag set, so reset the flag and update the corresponding count.
*/
- err = 0;
+ rmap_item->address &= PAGE_MASK;
}
- mmap_read_unlock(mm);
- /*
- * In case of failure, the page was not really empty, so we
- * need to continue. Otherwise we're done.
- */
- if (!err)
- return;
}
+
tree_rmap_item =
unstable_tree_search_insert(rmap_item, page, &tree_page);
if (tree_rmap_item) {
@@ -2337,6 +2394,23 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
mmap_read_unlock(mm);
return rmap_item;
}
+ /*
+		 * Because we want to monitor KSM zero pages, which are not
+		 * anonymous pages, we must try to return the rmap_items
+		 * of those kernel zero pages that replaced their
+		 * original anonymous empty pages due to the
+		 * use_zero_pages feature.
+ */
+ if (is_zero_pfn(page_to_pfn(*page))) {
+ rmap_item = try_to_get_old_rmap_item(ksm_scan.address,
+ ksm_scan.rmap_list);
+ if (rmap_item && (rmap_item->address & ZERO_PAGE_FLAG)) {
+ ksm_scan.rmap_list = &rmap_item->rmap_list;
+ ksm_scan.address += PAGE_SIZE;
+ mmap_read_unlock(mm);
+ return rmap_item;
+ }
+ }
next_page:
put_page(*page);
ksm_scan.address += PAGE_SIZE;
--
2.25.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 3/5] ksm: count all zero pages placed by KSM
2022-10-11 2:20 [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages xu.xin.sc
2022-10-11 2:21 ` [PATCH v3 1/5] ksm: abstract the function try_to_get_old_rmap_item xu.xin.sc
2022-10-11 2:22 ` [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM xu.xin.sc
@ 2022-10-11 2:22 ` xu.xin.sc
2022-10-11 2:22 ` [PATCH v3 4/5] ksm: count zero pages for each process xu.xin.sc
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2022-10-11 2:22 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, xu xin, Claudio Imbrenda,
David Hildenbrand, Xuexin Jiang, Xiaokai Ran, Yang Yang
From: xu xin <xu.xin16@zte.com.cn>
Since pages_sharing and pages_shared don't include the number of zero pages
merged by KSM, we cannot know how many pages are zero pages placed by KSM when
use_zero_pages is enabled, so KSM is not transparent about all of the pages it
has actually merged. In the early days of use_zero_pages, those zero pages
could not be unshared by ways like MADV_UNMERGEABLE, so it was hard to count
how many times one of them was later unmerged. But now that accurate unsharing
of KSM-placed zero pages has been achieved, we can easily count both how many
times a page full of zeroes was merged with the zero page and how many times
one of those pages was then unmerged. This helps to estimate memory demands
when each and every shared page could get unshared.
So add zero_pages_sharing under /sys/kernel/mm/ksm/ to show the number of all
zero pages placed by KSM.
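As a quick illustration (not part of the patch), the new counter can then be
read like any other KSM sysfs file:

    #include <stdio.h>

    int main(void)
    {
            unsigned long zero_pages_sharing;
            FILE *f = fopen("/sys/kernel/mm/ksm/zero_pages_sharing", "r");

            /* Path added by this patch; requires a kernel with the series applied. */
            if (!f || fscanf(f, "%lu", &zero_pages_sharing) != 1)
                    return 1;
            fclose(f);
            printf("zero pages placed by KSM: %lu\n", zero_pages_sharing);
            return 0;
    }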
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Cc: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/ksm.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index e351d7b6d15e..2970a7062db6 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -275,6 +275,9 @@ static unsigned int zero_checksum __read_mostly;
/* Whether to merge empty (zeroed) pages with actual zero pages */
static bool ksm_use_zero_pages __read_mostly;
+/* The number of zero pages placed by KSM use_zero_pages */
+static unsigned long ksm_zero_pages_sharing;
+
#ifdef CONFIG_NUMA
/* Zeroed when merging across nodes is not allowed */
static unsigned int ksm_merge_across_nodes = 1;
@@ -542,8 +545,10 @@ static inline int unshare_zero_pages(struct ksm_rmap_item *rmap_item)
static inline void free_rmap_item(struct ksm_rmap_item *rmap_item)
{
- if (rmap_item->address & ZERO_PAGE_FLAG)
- unshare_zero_pages(rmap_item);
+ if (rmap_item->address & ZERO_PAGE_FLAG) {
+ if (!unshare_zero_pages(rmap_item))
+ ksm_zero_pages_sharing--;
+ }
ksm_rmap_items--;
rmap_item->mm->ksm_rmap_items--;
rmap_item->mm = NULL; /* debug safety */
@@ -2075,8 +2080,10 @@ static int try_to_merge_with_kernel_zero_page(struct mm_struct *mm,
* In case of failure, the page was not really empty, so we
* need to continue. Otherwise we're done.
*/
- if (!err)
+ if (!err) {
rmap_item->address |= ZERO_PAGE_FLAG;
+ ksm_zero_pages_sharing++;
+ }
}
return err;
@@ -2178,6 +2185,7 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
* to reset the flag and update the corresponding count.
*/
rmap_item->address &= PAGE_MASK;
+ ksm_zero_pages_sharing--;
}
}
@@ -3190,6 +3198,13 @@ static ssize_t pages_volatile_show(struct kobject *kobj,
}
KSM_ATTR_RO(pages_volatile);
+static ssize_t zero_pages_sharing_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%ld\n", ksm_zero_pages_sharing);
+}
+KSM_ATTR_RO(zero_pages_sharing);
+
static ssize_t stable_node_dups_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
@@ -3250,6 +3265,7 @@ static struct attribute *ksm_attrs[] = {
&merge_across_nodes_attr.attr,
#endif
&max_page_sharing_attr.attr,
+ &zero_pages_sharing_attr.attr,
&stable_node_chains_attr.attr,
&stable_node_dups_attr.attr,
&stable_node_chains_prune_millisecs_attr.attr,
--
2.25.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 4/5] ksm: count zero pages for each process
2022-10-11 2:20 [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages xu.xin.sc
` (2 preceding siblings ...)
2022-10-11 2:22 ` [PATCH v3 3/5] ksm: count all " xu.xin.sc
@ 2022-10-11 2:22 ` xu.xin.sc
2022-10-11 2:23 ` [PATCH v3 5/5] ksm: add zero_pages_sharing documentation xu.xin.sc
2022-10-17 23:55 ` [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages Andrew Morton
5 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2022-10-11 2:22 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, xu xin, Claudio Imbrenda,
David Hildenbrand, Xuexin Jiang, Xiaokai Ran, Yang Yang
From: xu xin <xu.xin16@zte.com.cn>
As the number of KSM zero pages is not included in a process's
ksm_merging_pages when use_zero_pages is enabled, it is unclear how many pages
are actually merged by KSM for that process. To let users accurately estimate
their memory demands when unsharing KSM zero pages, it is necessary to show
the KSM zero pages per process.
Since accurate unsharing of zero pages placed by KSM has been achieved,
tracking the merging and unmerging of empty pages is no longer difficult.
Since we already have /proc/<pid>/ksm_stat, just add the zero_pages_sharing
information to it.
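A minimal illustrative reader (not part of the patch; the field name matches
the seq_printf() output added below, and the /proc path defaults to the
current process):

    #include <stdio.h>

    int main(int argc, char **argv)
    {
            char line[128];
            unsigned long zero_sharing = 0, val;
            FILE *f = fopen(argc > 1 ? argv[1] : "/proc/self/ksm_stat", "r");

            if (!f)
                    return 1;
            while (fgets(line, sizeof(line), f)) {
                    if (sscanf(line, "zero_pages_sharing %lu", &val) == 1)
                            zero_sharing = val;
            }
            fclose(f);
            printf("KSM-placed zero pages in this mm: %lu\n", zero_sharing);
            return 0;
    }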
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Cc: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
fs/proc/base.c | 1 +
include/linux/mm_types.h | 7 ++++++-
mm/ksm.c | 6 +++++-
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9e479d7d202b..ac9ebe972be0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3207,6 +3207,7 @@ static int proc_pid_ksm_stat(struct seq_file *m, struct pid_namespace *ns,
mm = get_task_mm(task);
if (mm) {
seq_printf(m, "ksm_rmap_items %lu\n", mm->ksm_rmap_items);
+ seq_printf(m, "zero_pages_sharing %lu\n", mm->ksm_zero_pages_sharing);
mmput(mm);
}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 500e536796ca..78a4ee264645 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -691,7 +691,7 @@ struct mm_struct {
#ifdef CONFIG_KSM
/*
* Represent how many pages of this process are involved in KSM
- * merging.
+ * merging (not including ksm_zero_pages_sharing).
*/
unsigned long ksm_merging_pages;
/*
@@ -699,6 +699,11 @@ struct mm_struct {
* including merged and not merged.
*/
unsigned long ksm_rmap_items;
+ /*
+ * Represent how many empty pages are merged with kernel zero
+ * pages when enabling KSM use_zero_pages.
+ */
+ unsigned long ksm_zero_pages_sharing;
#endif
#ifdef CONFIG_LRU_GEN
struct {
diff --git a/mm/ksm.c b/mm/ksm.c
index 2970a7062db6..c049a95afc26 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -546,8 +546,10 @@ static inline int unshare_zero_pages(struct ksm_rmap_item *rmap_item)
static inline void free_rmap_item(struct ksm_rmap_item *rmap_item)
{
if (rmap_item->address & ZERO_PAGE_FLAG) {
- if (!unshare_zero_pages(rmap_item))
+ if (!unshare_zero_pages(rmap_item)) {
ksm_zero_pages_sharing--;
+ rmap_item->mm->ksm_zero_pages_sharing--;
+ }
}
ksm_rmap_items--;
rmap_item->mm->ksm_rmap_items--;
@@ -2083,6 +2085,7 @@ static int try_to_merge_with_kernel_zero_page(struct mm_struct *mm,
if (!err) {
rmap_item->address |= ZERO_PAGE_FLAG;
ksm_zero_pages_sharing++;
+ rmap_item->mm->ksm_zero_pages_sharing++;
}
}
@@ -2186,6 +2189,7 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
*/
rmap_item->address &= PAGE_MASK;
ksm_zero_pages_sharing--;
+ rmap_item->mm->ksm_zero_pages_sharing--;
}
}
--
2.25.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 5/5] ksm: add zero_pages_sharing documentation
2022-10-11 2:20 [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages xu.xin.sc
` (3 preceding siblings ...)
2022-10-11 2:22 ` [PATCH v3 4/5] ksm: count zero pages for each process xu.xin.sc
@ 2022-10-11 2:23 ` xu.xin.sc
2022-10-17 23:55 ` [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages Andrew Morton
5 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2022-10-11 2:23 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, xu xin, Xiaokai Ran, Yang Yang,
Jiang Xuexin, Claudio Imbrenda, David Hildenbrand
From: xu xin <xu.xin16@zte.com.cn>
When use_zero_pages is enabled, pages_sharing alone cannot represent how much
memory is really saved; zero_pages_sharing + pages_sharing does.
Add the description of zero_pages_sharing.
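For example (illustrative numbers only): with 4 KiB pages, pages_sharing =
2048 and zero_pages_sharing = 1024 mean roughly (2048 + 1024) * 4 KiB = 12 MiB
saved, whereas pages_sharing alone would suggest only 8 MiB.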
Cc: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
Documentation/admin-guide/mm/ksm.rst | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index fb6ba2002a4b..484665aa7418 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -162,7 +162,7 @@ The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
pages_shared
how many shared pages are being used
pages_sharing
- how many more sites are sharing them i.e. how much saved
+ how many more sites are sharing them
pages_unshared
how many pages unique but repeatedly checked for merging
pages_volatile
@@ -173,6 +173,14 @@ stable_node_chains
the number of KSM pages that hit the ``max_page_sharing`` limit
stable_node_dups
number of duplicated KSM pages
+zero_pages_sharing
+        how many empty pages are sharing the kernel zero page(s) instead
+        of each other, as would otherwise happen. Only effective when the
+        ``use_zero_pages`` knob is enabled.
+
+If ``use_zero_pages`` is 0, ``pages_sharing`` alone represents how much
+is saved. Otherwise, ``pages_sharing`` + ``zero_pages_sharing`` represents
+how much is actually saved.
A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
--
2.25.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages
2022-10-11 2:20 [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages xu.xin.sc
` (4 preceding siblings ...)
2022-10-11 2:23 ` [PATCH v3 5/5] ksm: add zero_pages_sharing documentation xu.xin.sc
@ 2022-10-17 23:55 ` Andrew Morton
2022-10-18 9:00 ` xu xin
5 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2022-10-17 23:55 UTC (permalink / raw)
To: xu.xin.sc
Cc: ran.xiaokai, yang.yang29, jiang.xuexin, imbrenda, david,
linux-mm, linux-kernel, xu xin
On Tue, 11 Oct 2022 02:20:06 +0000 xu.xin.sc@gmail.com wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> use_zero_pages is good, not just because of cache colouring as described
> in doc, but also because use_zero_pages can accelerate merging empty pages
> when there are plenty of empty pages (full of zeros) as the time of
> page-by-page comparisons (unstable_tree_search_insert) is saved.
>
> But there is something to improve, that is, when enabling use_zero_pages,
> all empty pages will be merged with kernel zero pages instead of with each
> other as use_zero_pages is disabled, and then these zero-pages are no longer
> managed and monitor by KSM, which leads to two issues at least:
Sorry, but I'm struggling to understand what real value this patchset
offers.
> 1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not*
> unshare the shared zeropage as placed by KSM (which is against the
> MADV_UNMERGEABLE documentation at least); see the link:
> https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@redhat.com/
Is that causing users any real-world problem? If not, just change the
documentation?
> 2) we cannot know how many pages are zero pages placed by KSM when
> enabling use_zero_pages, which leads to KSM not being transparent
> with all actual merged pages by KSM.
Why is this a problem?
A full description of the real-world end-user operational benefits of
these changes would help, please.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages
2022-10-17 23:55 ` [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages Andrew Morton
@ 2022-10-18 9:00 ` xu xin
2022-10-18 22:54 ` [PATCH " Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: xu xin @ 2022-10-18 9:00 UTC (permalink / raw)
To: akpm
Cc: david, imbrenda, jiang.xuexin, linux-kernel, linux-mm,
ran.xiaokai, xu.xin.sc, xu.xin16, yang.yang29
>> From: xu xin <xu.xin16@zte.com.cn>
>>
>> use_zero_pages is good, not just because of cache colouring as described
>> in doc, but also because use_zero_pages can accelerate merging empty pages
>> when there are plenty of empty pages (full of zeros) as the time of
>> page-by-page comparisons (unstable_tree_search_insert) is saved.
>>
>> But there is something to improve, that is, when enabling use_zero_pages,
>> all empty pages will be merged with kernel zero pages instead of with each
>> other as use_zero_pages is disabled, and then these zero-pages are no longer
>> managed and monitor by KSM, which leads to two issues at least:
>
>Sorry, but I'm struggling to understand what real value this patchset
>offers.
>
>> 1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not*
>> unshare the shared zeropage as placed by KSM (which is against the
>> MADV_UNMERGEABLE documentation at least); see the link:
>> https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@redhat.com/
>
>Is that causing users any real-world problem? If not, just change the
>documentation?
>
>> 2) we cannot know how many pages are zero pages placed by KSM when
>> enabling use_zero_pages, which leads to KSM not being transparent
>> with all actual merged pages by KSM.
>
>Why is this a problem?
>
>A full description of the real-world end-user operational benefits of
>these changes would help, please.
>
The core idea of this patch set is to let users perceive the number of all
pages merged by KSM, regardless of whether the use_zero_pages switch has been
turned on, so that users can know how much of the free-memory increase is
really due to their madvise(MERGEABLE) actions. My motivation for doing this
is that, while doing KSM application optimization on embedded Linux for a 5G
platform, I found that ksm_merging_pages of some processes had become very
small (whereas it used to be large), which led me to suspect a problem with
the application's KSM-madvise strategy; in fact, it was only because
use_zero_pages was on.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages
2022-10-18 9:00 ` xu xin
@ 2022-10-18 22:54 ` Andrew Morton
2022-10-21 10:18 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2022-10-18 22:54 UTC (permalink / raw)
To: xu xin
Cc: david, imbrenda, jiang.xuexin, linux-kernel, linux-mm,
ran.xiaokai, xu.xin16, yang.yang29
On Tue, 18 Oct 2022 09:00:22 +0000 xu xin <xu.xin.sc@gmail.com> wrote:
> >A full description of the real-world end-user operational benefits of
> >these changes would help, please.
> >
>
> The core idea of this patch set is to enable users to perceive the number of any
> pages merged by KSM, regardless of whether use_zero_page switch has been turned
> on, so that users can know how much free memory increase is really due to their
> madvise(MERGEABLE) actions.
OK, thanks.
> The motivation for me to do this is that when I do
> an application optimization of KSM on embedded Linux for 5G platform, I find
> that ksm_merging_pages of some process becomes very small(but used to be large),
> which led me to think that there was any problem with the application KSM-madvise
> strategy, but in fact, it was only because use_zero_pages is on.
Please expand on the above motivation and experience, and include it in
the [0/n] changelog. But let's leave it a few days to see if there's
additional reviewer input.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM
2022-10-11 2:22 ` [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM xu.xin.sc
@ 2022-10-21 10:17 ` David Hildenbrand
2022-10-21 12:54 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2022-10-21 10:17 UTC (permalink / raw)
To: xu.xin.sc, akpm
Cc: linux-mm, linux-kernel, xu xin, Claudio Imbrenda, Xuexin Jiang,
Xiaokai Ran, Yang Yang
On 11.10.22 04:22, xu.xin.sc@gmail.com wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> use_zero_pages may be very useful, not just because of cache colouring
> as described in doc, but also because use_zero_pages can accelerate
> merging empty pages when there are plenty of empty pages (full of zeros)
> as the time of page-by-page comparisons (unstable_tree_search_insert) is
> saved.
>
> But when enabling use_zero_pages, madvise(addr, len, MADV_UNMERGEABLE) and
> other ways (like write 2 to /sys/kernel/mm/ksm/run) to trigger unsharing
> will *not* unshare the shared zeropage as placed by KSM (which may be
> against the MADV_UNMERGEABLE documentation at least).
>
> To not blindly unshare all shared zero_pages in applicable VMAs, the patch
> introduces a dedicated flag ZERO_PAGE_FLAG to mark the rmap_items of those
> shared zero_pages. and guarantee that these rmap_items will be not freed
> during the time of zero_pages not being writing, so we can only unshare
> the *KSM-placed* zero_pages.
>
> The patch will not degrade the performance of use_zero_pages as it doesn't
> change the way of merging empty pages in use_zero_pages's feature.
>
> Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
> Reported-by: David Hildenbrand <david@redhat.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> Co-developed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
> Signed-off-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
> Co-developed-by: Yang Yang <yang.yang29@zte.com.cn>
> Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> ---
> mm/ksm.c | 136 ++++++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 105 insertions(+), 31 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 13c60f1071d8..e351d7b6d15e 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -213,6 +213,7 @@ struct ksm_rmap_item {
> #define SEQNR_MASK 0x0ff /* low bits of unstable tree seqnr */
> #define UNSTABLE_FLAG 0x100 /* is a node of the unstable tree */
> #define STABLE_FLAG 0x200 /* is listed from the stable tree */
> +#define ZERO_PAGE_FLAG 0x400 /* is zero page placed by KSM */
>
> /* The stable and unstable tree heads */
> static struct rb_root one_stable_tree[1] = { RB_ROOT };
> @@ -381,14 +382,6 @@ static inline struct ksm_rmap_item *alloc_rmap_item(void)
> return rmap_item;
> }
>
> -static inline void free_rmap_item(struct ksm_rmap_item *rmap_item)
> -{
> - ksm_rmap_items--;
> - rmap_item->mm->ksm_rmap_items--;
> - rmap_item->mm = NULL; /* debug safety */
> - kmem_cache_free(rmap_item_cache, rmap_item);
> -}
> -
> static inline struct ksm_stable_node *alloc_stable_node(void)
> {
> /*
> @@ -420,7 +413,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
> }
>
> /*
> - * We use break_ksm to break COW on a ksm page: it's a stripped down
> + * We use break_ksm to break COW on a ksm page or KSM-placed zero page (only
> + * happen when enabling use_zero_pages): it's a stripped down
> *
> * if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
> * put_page(page);
> @@ -434,7 +428,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
> * of the process that owns 'vma'. We also do not want to enforce
> * protection keys here anyway.
> */
> -static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
> +static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
> + bool ksm_check_bypass)
> {
> struct page *page;
> vm_fault_t ret = 0;
> @@ -449,6 +444,16 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
> ret = handle_mm_fault(vma, addr,
> FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
> NULL);
> + else if (ksm_check_bypass && is_zero_pfn(page_to_pfn(page))) {
> + /*
> + * Although it's not ksm page, it's zero page as placed by
> + * KSM use_zero_page, so we should unshare it when
> + * ksm_check_bypass is true.
> + */
> + ret = handle_mm_fault(vma, addr,
> + FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
> + NULL);
> + }
Please don't duplicate that page fault triggering code.
Also, please be aware that this collides with
https://lkml.kernel.org/r/20221021101141.84170-1-david@redhat.com
Adjustments should be comparatively easy.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages
2022-10-18 22:54 ` [PATCH " Andrew Morton
@ 2022-10-21 10:18 ` David Hildenbrand
2022-10-24 3:07 ` xu xin
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2022-10-21 10:18 UTC (permalink / raw)
To: Andrew Morton, xu xin
Cc: imbrenda, jiang.xuexin, linux-kernel, linux-mm, ran.xiaokai,
xu.xin16, yang.yang29
On 19.10.22 00:54, Andrew Morton wrote:
> On Tue, 18 Oct 2022 09:00:22 +0000 xu xin <xu.xin.sc@gmail.com> wrote:
>
>>> A full description of the real-world end-user operational benefits of
>>> these changes would help, please.
>>>
>>
>> The core idea of this patch set is to enable users to perceive the number of any
>> pages merged by KSM, regardless of whether use_zero_page switch has been turned
>> on, so that users can know how much free memory increase is really due to their
>> madvise(MERGEABLE) actions.
>
> OK, thanks.
>
>> The motivation for me to do this is that when I do
>> an application optimization of KSM on embedded Linux for 5G platform, I find
>> that ksm_merging_pages of some process becomes very small(but used to be large),
>> which led me to think that there was any problem with the application KSM-madvise
>> strategy, but in fact, it was only because use_zero_pages is on.
>
> Please expand on the above motivation and experience, and include it in
> the [0/n] changelog. But let's leave it a few days to see if there's
> additional reviewer input.
>
I just posted a selftest:
https://lore.kernel.org/all/20221021101141.84170-5-david@redhat.com/T/#u
That could (should) be extended to test if unmerging works as expected.
Having that said, I think we really want a second pair of (KSM-expert)
eyes on these changes before moving forward with them.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM
2022-10-21 10:17 ` David Hildenbrand
@ 2022-10-21 12:54 ` David Hildenbrand
2022-11-09 10:40 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2022-10-21 12:54 UTC (permalink / raw)
To: xu.xin.sc, akpm
Cc: linux-mm, linux-kernel, xu xin, Claudio Imbrenda, Xuexin Jiang,
Xiaokai Ran, Yang Yang
On 21.10.22 12:17, David Hildenbrand wrote:
> On 11.10.22 04:22, xu.xin.sc@gmail.com wrote:
>> From: xu xin <xu.xin16@zte.com.cn>
>>
>> use_zero_pages may be very useful, not just because of cache colouring
>> as described in doc, but also because use_zero_pages can accelerate
>> merging empty pages when there are plenty of empty pages (full of zeros)
>> as the time of page-by-page comparisons (unstable_tree_search_insert) is
>> saved.
>>
>> But when enabling use_zero_pages, madvise(addr, len, MADV_UNMERGEABLE) and
>> other ways (like write 2 to /sys/kernel/mm/ksm/run) to trigger unsharing
>> will *not* unshare the shared zeropage as placed by KSM (which may be
>> against the MADV_UNMERGEABLE documentation at least).
>>
>> To not blindly unshare all shared zero_pages in applicable VMAs, the patch
>> introduces a dedicated flag ZERO_PAGE_FLAG to mark the rmap_items of those
>> shared zero_pages. and guarantee that these rmap_items will be not freed
>> during the time of zero_pages not being writing, so we can only unshare
>> the *KSM-placed* zero_pages.
>>
>> The patch will not degrade the performance of use_zero_pages as it doesn't
>> change the way of merging empty pages in use_zero_pages's feature.
>>
>> Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
>> Reported-by: David Hildenbrand <david@redhat.com>
>> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
>> Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
>> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
>> Co-developed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
>> Signed-off-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
>> Co-developed-by: Yang Yang <yang.yang29@zte.com.cn>
>> Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
>> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
>> ---
>> mm/ksm.c | 136 ++++++++++++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 105 insertions(+), 31 deletions(-)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 13c60f1071d8..e351d7b6d15e 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -213,6 +213,7 @@ struct ksm_rmap_item {
>> #define SEQNR_MASK 0x0ff /* low bits of unstable tree seqnr */
>> #define UNSTABLE_FLAG 0x100 /* is a node of the unstable tree */
>> #define STABLE_FLAG 0x200 /* is listed from the stable tree */
>> +#define ZERO_PAGE_FLAG 0x400 /* is zero page placed by KSM */
>>
>> /* The stable and unstable tree heads */
>> static struct rb_root one_stable_tree[1] = { RB_ROOT };
>> @@ -381,14 +382,6 @@ static inline struct ksm_rmap_item *alloc_rmap_item(void)
>> return rmap_item;
>> }
>>
>> -static inline void free_rmap_item(struct ksm_rmap_item *rmap_item)
>> -{
>> - ksm_rmap_items--;
>> - rmap_item->mm->ksm_rmap_items--;
>> - rmap_item->mm = NULL; /* debug safety */
>> - kmem_cache_free(rmap_item_cache, rmap_item);
>> -}
>> -
>> static inline struct ksm_stable_node *alloc_stable_node(void)
>> {
>> /*
>> @@ -420,7 +413,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
>> }
>>
>> /*
>> - * We use break_ksm to break COW on a ksm page: it's a stripped down
>> + * We use break_ksm to break COW on a ksm page or KSM-placed zero page (only
>> + * happen when enabling use_zero_pages): it's a stripped down
>> *
>> * if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
>> * put_page(page);
>> @@ -434,7 +428,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
>> * of the process that owns 'vma'. We also do not want to enforce
>> * protection keys here anyway.
>> */
>> -static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
>> +static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
>> + bool ksm_check_bypass)
>> {
>> struct page *page;
>> vm_fault_t ret = 0;
>> @@ -449,6 +444,16 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
>> ret = handle_mm_fault(vma, addr,
>> FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
>> NULL);
>> + else if (ksm_check_bypass && is_zero_pfn(page_to_pfn(page))) {
>> + /*
>> + * Although it's not ksm page, it's zero page as placed by
>> + * KSM use_zero_page, so we should unshare it when
>> + * ksm_check_bypass is true.
>> + */
>> + ret = handle_mm_fault(vma, addr,
>> + FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
>> + NULL);
>> + }
>
> Please don't duplicate that page fault triggering code.
>
> Also, please be aware that this collides with
>
> https://lkml.kernel.org/r/20221021101141.84170-1-david@redhat.com
>
> Adjustments should be comparatively easy.
... except that I'm still working on FAULT_FLAG_UNSHARE support for the
shared zeropage. That will be posted soonish (within next 2 weeks).
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages
2022-10-21 10:18 ` David Hildenbrand
@ 2022-10-24 3:07 ` xu xin
0 siblings, 0 replies; 15+ messages in thread
From: xu xin @ 2022-10-24 3:07 UTC (permalink / raw)
To: david
Cc: akpm, imbrenda, jiang.xuexin, linux-kernel, linux-mm,
ran.xiaokai, xu.xin.sc, xu.xin16, yang.yang29
>
>>>> A full description of the real-world end-user operational benefits of
>>>> these changes would help, please.
>>>>
>>>
>>> The core idea of this patch set is to enable users to perceive the number of any
>>> pages merged by KSM, regardless of whether use_zero_page switch has been turned
>>> on, so that users can know how much free memory increase is really due to their
>>> madvise(MERGEABLE) actions.
>>
>> OK, thanks.
>>
>>> The motivation for me to do this is that when I do
>>> an application optimization of KSM on embedded Linux for 5G platform, I find
>>> that ksm_merging_pages of some process becomes very small(but used to be large),
>>> which led me to think that there was any problem with the application KSM-madvise
>>> strategy, but in fact, it was only because use_zero_pages is on.
>>
>> Please expand on the above motivation and experience, and include it in
>> the [0/n] changelog. But let's leave it a few days to see if there's
>> additional reviewer input.
>>
>
>I just posted a selftest:
>
>https://lore.kernel.org/all/20221021101141.84170-5-david@redhat.com/T/#u
>
>That could (should) be extended to test if unmerging works as expected.
>
Yes. As you said, those selftests can be extended to test whether unsharing
KSM-placed zero pages works as expected, and I'm happy to do that extension
after they are merged.
>
>Having that said, I think we really want a second pair of (KSM-expert)
>eyes on these changes before moving forward with them.
OK, no problem. Let it be reviewed for some more time, so as to absorb more
feedback. If necessary, I will resend the patches adjusted to the break_ksm()
changes.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM
2022-10-21 12:54 ` David Hildenbrand
@ 2022-11-09 10:40 ` David Hildenbrand
2022-11-14 3:02 ` xu xin
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2022-11-09 10:40 UTC (permalink / raw)
To: xu.xin.sc, akpm
Cc: linux-mm, linux-kernel, xu xin, Claudio Imbrenda, Xuexin Jiang,
Xiaokai Ran, Yang Yang
On 21.10.22 14:54, David Hildenbrand wrote:
> On 21.10.22 12:17, David Hildenbrand wrote:
>> On 11.10.22 04:22, xu.xin.sc@gmail.com wrote:
>>> From: xu xin <xu.xin16@zte.com.cn>
>>>
>>> use_zero_pages may be very useful, not just because of cache colouring
>>> as described in doc, but also because use_zero_pages can accelerate
>>> merging empty pages when there are plenty of empty pages (full of zeros)
>>> as the time of page-by-page comparisons (unstable_tree_search_insert) is
>>> saved.
>>>
>>> But when enabling use_zero_pages, madvise(addr, len, MADV_UNMERGEABLE) and
>>> other ways (like write 2 to /sys/kernel/mm/ksm/run) to trigger unsharing
>>> will *not* unshare the shared zeropage as placed by KSM (which may be
>>> against the MADV_UNMERGEABLE documentation at least).
>>>
>>> To not blindly unshare all shared zero_pages in applicable VMAs, the patch
>>> introduces a dedicated flag ZERO_PAGE_FLAG to mark the rmap_items of those
>>> shared zero_pages. and guarantee that these rmap_items will be not freed
>>> during the time of zero_pages not being writing, so we can only unshare
>>> the *KSM-placed* zero_pages.
>>>
>>> The patch will not degrade the performance of use_zero_pages as it doesn't
>>> change the way of merging empty pages in use_zero_pages's feature.
>>>
>>> Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
>>> Reported-by: David Hildenbrand <david@redhat.com>
>>> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>> Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
>>> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
>>> Co-developed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
>>> Signed-off-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
>>> Co-developed-by: Yang Yang <yang.yang29@zte.com.cn>
>>> Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
>>> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
>>> ---
>>> mm/ksm.c | 136 ++++++++++++++++++++++++++++++++++++++++++-------------
>>> 1 file changed, 105 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index 13c60f1071d8..e351d7b6d15e 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -213,6 +213,7 @@ struct ksm_rmap_item {
>>> #define SEQNR_MASK 0x0ff /* low bits of unstable tree seqnr */
>>> #define UNSTABLE_FLAG 0x100 /* is a node of the unstable tree */
>>> #define STABLE_FLAG 0x200 /* is listed from the stable tree */
>>> +#define ZERO_PAGE_FLAG 0x400 /* is zero page placed by KSM */
>>>
>>> /* The stable and unstable tree heads */
>>> static struct rb_root one_stable_tree[1] = { RB_ROOT };
>>> @@ -381,14 +382,6 @@ static inline struct ksm_rmap_item *alloc_rmap_item(void)
>>> return rmap_item;
>>> }
>>>
>>> -static inline void free_rmap_item(struct ksm_rmap_item *rmap_item)
>>> -{
>>> - ksm_rmap_items--;
>>> - rmap_item->mm->ksm_rmap_items--;
>>> - rmap_item->mm = NULL; /* debug safety */
>>> - kmem_cache_free(rmap_item_cache, rmap_item);
>>> -}
>>> -
>>> static inline struct ksm_stable_node *alloc_stable_node(void)
>>> {
>>> /*
>>> @@ -420,7 +413,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
>>> }
>>>
>>> /*
>>> - * We use break_ksm to break COW on a ksm page: it's a stripped down
>>> + * We use break_ksm to break COW on a ksm page or KSM-placed zero page (only
>>> + * happen when enabling use_zero_pages): it's a stripped down
>>> *
>>> * if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
>>> * put_page(page);
>>> @@ -434,7 +428,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
>>> * of the process that owns 'vma'. We also do not want to enforce
>>> * protection keys here anyway.
>>> */
>>> -static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
>>> +static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
>>> + bool ksm_check_bypass)
>>> {
>>> struct page *page;
>>> vm_fault_t ret = 0;
>>> @@ -449,6 +444,16 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
>>> ret = handle_mm_fault(vma, addr,
>>> FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
>>> NULL);
>>> + else if (ksm_check_bypass && is_zero_pfn(page_to_pfn(page))) {
>>> + /*
>>> + * Although it's not ksm page, it's zero page as placed by
>>> + * KSM use_zero_page, so we should unshare it when
>>> + * ksm_check_bypass is true.
>>> + */
>>> + ret = handle_mm_fault(vma, addr,
>>> + FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
>>> + NULL);
>>> + }
>>
>> Please don't duplicate that page fault triggering code.
>>
>> Also, please be aware that this collides with
>>
>> https://lkml.kernel.org/r/20221021101141.84170-1-david@redhat.com
>>
>> Adjustments should be comparatively easy.
>
> ... except that I'm still working on FAULT_FLAG_UNSHARE support for the
> shared zeropage. That will be posted soonish (within next 2 weeks).
>
Posted: https://lkml.kernel.org/r/20221107161740.144456-1-david@redhat.com
With that, we can use FAULT_FLAG_UNSHARE also to break COW on the shared
zeropage.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM
2022-11-09 10:40 ` David Hildenbrand
@ 2022-11-14 3:02 ` xu xin
0 siblings, 0 replies; 15+ messages in thread
From: xu xin @ 2022-11-14 3:02 UTC (permalink / raw)
To: david
Cc: akpm, imbrenda, jiang.xuexin, linux-kernel, linux-mm,
ran.xiaokai, xu.xin.sc, xu.xin16, yang.yang29
>>>> - * We use break_ksm to break COW on a ksm page: it's a stripped down
>>>> + * We use break_ksm to break COW on a ksm page or KSM-placed zero page (only
>>>> + * happen when enabling use_zero_pages): it's a stripped down
>>>> *
>>>> * if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
>>>> * put_page(page);
>>>> @@ -434,7 +428,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
>>>> * of the process that owns 'vma'. We also do not want to enforce
>>>> * protection keys here anyway.
>>>> */
>>>> -static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
>>>> +static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
>>>> + bool ksm_check_bypass)
>>>> {
>>>> struct page *page;
>>>> vm_fault_t ret = 0;
>>>> @@ -449,6 +444,16 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
>>>> ret = handle_mm_fault(vma, addr,
>>>> FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
>>>> NULL);
>>>> + else if (ksm_check_bypass && is_zero_pfn(page_to_pfn(page))) {
>>>> + /*
>>>> + * Although it's not ksm page, it's zero page as placed by
>>>> + * KSM use_zero_page, so we should unshare it when
>>>> + * ksm_check_bypass is true.
>>>> + */
>>>> + ret = handle_mm_fault(vma, addr,
>>>> + FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
>>>> + NULL);
>>>> + }
>>>
>>> Please don't duplicate that page fault triggering code.
>>>
>>> Also, please be aware that this collides with
>>>
>>> https://lkml.kernel.org/r/20221021101141.84170-1-david@redhat.com
>>>
>>> Adjustments should be comparatively easy.
>>
>> ... except that I'm still working on FAULT_FLAG_UNSHARE support for the
>> shared zeropage. That will be posted soonish (within next 2 weeks).
>>
>
>Posted: https://lkml.kernel.org/r/20221107161740.144456-1-david@redhat.com
>
>With that, we can use FAULT_FLAG_UNSHARE also to break COW on the shared
>zeropage.
That sounds like a better way of breaking COW, and it also works with reliable
R/O long-term pinning.
>--
>Thanks,
>
>David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2022-11-14 3:02 UTC | newest]
Thread overview: 15+ messages
-- links below jump to the message on this page --
2022-10-11 2:20 [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages xu.xin.sc
2022-10-11 2:21 ` [PATCH v3 1/5] ksm: abstract the function try_to_get_old_rmap_item xu.xin.sc
2022-10-11 2:22 ` [PATCH v3 2/5] ksm: support unsharing zero pages placed by KSM xu.xin.sc
2022-10-21 10:17 ` David Hildenbrand
2022-10-21 12:54 ` David Hildenbrand
2022-11-09 10:40 ` David Hildenbrand
2022-11-14 3:02 ` xu xin
2022-10-11 2:22 ` [PATCH v3 3/5] ksm: count all " xu.xin.sc
2022-10-11 2:22 ` [PATCH v3 4/5] ksm: count zero pages for each process xu.xin.sc
2022-10-11 2:23 ` [PATCH v3 5/5] ksm: add zero_pages_sharing documentation xu.xin.sc
2022-10-17 23:55 ` [PATCH v3 0/5] ksm: support tracking KSM-placed zero-pages Andrew Morton
2022-10-18 9:00 ` xu xin
2022-10-18 22:54 ` [PATCH " Andrew Morton
2022-10-21 10:18 ` David Hildenbrand
2022-10-24 3:07 ` xu xin