linux-mm.kvack.org archive mirror
* [PATCH v2 0/2] KSM: Replace jhash2 with xxhash
@ 2017-09-22  9:52 Timofey Titovets
  2017-09-22  9:52 ` [PATCH v2 1/2] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
  2017-09-22  9:52 ` [PATCH v2 2/2] KSM: Replace jhash2 with xxhash Timofey Titovets
  0 siblings, 2 replies; 4+ messages in thread
From: Timofey Titovets @ 2017-09-22  9:52 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, borntraeger, kvm, Timofey Titovets

(Resend, with kvm@vger.kernel.org added to Cc.)

KSM uses jhash2 for hashing pages; in 4.14, xxhash was
merged into the mainline kernel.

xxhash is much faster than jhash2 on large inputs (32 bytes and up).

xxhash comes in two versions: one producing a 32-bit hash
and one producing a 64-bit hash.

The 64-bit version is faster than the 32-bit one on 64-bit
architectures.

So let's take the best of both worlds and create an
arch-dependent xxhash() function that picks the fastest
variant for the current architecture.
That is what the first patch does.

Performance figures and the KSM update can be found in the
second patch.
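
A minimal userspace sketch of the dispatch idea (not the kernel
patch itself: the real version is built on BITS_PER_LONG and lives
in the first patch; here UINTPTR_MAX detects the word size, and the
xxh32()/xxh64() prototypes follow lib/xxhash.c):

  #include <stdint.h>
  #include <stddef.h>

  uint32_t xxh32(const void *input, size_t length, uint32_t seed);
  uint64_t xxh64(const void *input, size_t length, uint64_t seed);

  #if UINTPTR_MAX == UINT64_MAX           /* 64-bit build */
  typedef uint64_t xxhash_t;
  static inline xxhash_t xxhash(const void *p, size_t len, uint64_t seed)
  {
          return xxh64(p, len, seed);
  }
  #else                                   /* 32-bit build */
  typedef uint32_t xxhash_t;
  static inline xxhash_t xxhash(const void *p, size_t len, uint64_t seed)
  {
          /* note: the 64-bit seed is truncated to 32 bits here */
          return xxh32(p, len, seed);
  }
  #endif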

Changelog:
  v1 -> v2:
    - Move xxhash() to xxhash.h/c and separate patches

Timofey Titovets (2):
  xxHash: create arch dependent 32/64-bit xxhash()
  KSM: Replace jhash2 with xxhash

 include/linux/xxhash.h | 24 ++++++++++++++++++++++++
 lib/xxhash.c           | 10 ++++++++++
 mm/Kconfig             |  1 +
 mm/ksm.c               | 14 +++++++-------
 4 files changed, 42 insertions(+), 7 deletions(-)

--
2.14.1


* [PATCH v2 1/2] xxHash: create arch dependent 32/64-bit xxhash()
  2017-09-22  9:52 [PATCH v2 0/2] KSM: Replace jhash2 with xxhash Timofey Titovets
@ 2017-09-22  9:52 ` Timofey Titovets
  2017-09-22  9:52 ` [PATCH v2 2/2] KSM: Replace jhash2 with xxhash Timofey Titovets
  1 sibling, 0 replies; 4+ messages in thread
From: Timofey Titovets @ 2017-09-22  9:52 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, borntraeger, kvm, Timofey Titovets

xxh32() - fast on both 32-bit and 64-bit platforms
xxh64() - fast only on 64-bit platforms

Create xxhash(), which picks the fastest version at
compile time.

Add the xxhash_t type to carry the matching hash size.

As the result depends on the CPU word size, the main
purpose of this function is in-memory hashing.
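
A hypothetical call site, for illustration only (hash_buffer(),
buf and len are assumed names, not part of this patch):

  #include <linux/xxhash.h>

  /* Hash an in-memory buffer with whichever variant the build chose. */
  static xxhash_t hash_buffer(const void *buf, size_t len)
  {
          /* xxhash_t is u64 on 64-bit kernels, u32 on 32-bit ones */
          return xxhash(buf, len, 0);
  }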

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Linux-kernel <linux-kernel@vger.kernel.org>
---
 include/linux/xxhash.h | 24 ++++++++++++++++++++++++
 lib/xxhash.c           | 10 ++++++++++
 2 files changed, 34 insertions(+)

diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
index 9e1f42cb57e9..195a0ae10e9b 100644
--- a/include/linux/xxhash.h
+++ b/include/linux/xxhash.h
@@ -76,6 +76,7 @@
 #define XXHASH_H

 #include <linux/types.h>
+#include <linux/bitops.h> /* BITS_PER_LONG */

 /*-****************************
  * Simple Hash Functions
@@ -107,6 +108,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
  */
 uint64_t xxh64(const void *input, size_t length, uint64_t seed);

+#if BITS_PER_LONG == 64
+typedef	u64	xxhash_t;
+#else
+typedef	u32	xxhash_t;
+#endif
+
+/**
+ * xxhash() - calculate 32/64-bit hash based on CPU word size
+ *
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * This function always works as xxh32() on 32-bit systems
+ * and as xxh64() on 64-bit systems.
+ * Because the result depends on the CPU word size,
+ * the main purpose of this function is in-memory hashing.
+ *
+ * Return:  32/64-bit hash of the data.
+ */
+
+xxhash_t xxhash(const void *input, size_t length, uint64_t seed);
+
 /*-****************************
  * Streaming Hash Functions
  *****************************/
diff --git a/lib/xxhash.c b/lib/xxhash.c
index aa61e2a3802f..7dd1105fcc30 100644
--- a/lib/xxhash.c
+++ b/lib/xxhash.c
@@ -236,6 +236,16 @@ uint64_t xxh64(const void *input, const size_t len, const uint64_t seed)
 }
 EXPORT_SYMBOL(xxh64);

+xxhash_t xxhash(const void *input, size_t length, uint64_t seed)
+{
+#if BITS_PER_LONG == 64
+	return xxh64(input, length, seed);
+#else
+	return xxh32(input, length, seed);
+#endif
+}
+EXPORT_SYMBOL(xxhash);
+
 /*-**************************************************
  * Advanced Hash Functions
  ***************************************************/
--
2.14.1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] KSM: Replace jhash2 with xxhash
  2017-09-22  9:52 [PATCH v2 0/2] KSM: Replace jhash2 with xxhash Timofey Titovets
  2017-09-22  9:52 ` [PATCH v2 1/2] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
@ 2017-09-22  9:52 ` Timofey Titovets
  1 sibling, 0 replies; 4+ messages in thread
From: Timofey Titovets @ 2017-09-22  9:52 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, borntraeger, kvm, Timofey Titovets

jhash2 is used for calculating a checksum of in-memory
pages, to detect whether a page has changed.

xxhash is much faster than jhash2; some tests:
  x86_64 host:
    CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
    PAGE_SIZE: 4096, loop count: 1048576
    jhash2:   0xacbc7a5b            time: 1907 ms,  th:  2251.9 MiB/s
    xxhash32: 0x570da981            time: 739 ms,   th:  5809.4 MiB/s
    xxhash64: 0xa1fa032ab85bbb62    time: 371 ms,   th: 11556.6 MiB/s

    CPU: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
    PAGE_SIZE: 4096, loop count: 1048576
    jhash2:   0xe680b382            time: 3722 ms,  th: 1153.896680 MiB/s
    xxhash32: 0x56d00be4            time: 1183 ms,  th: 3629.130689 MiB/s
    xxhash64: 0x8c194cff29cc4dee    time: 725 ms,   th: 5918.003401 MiB/s

xxhash64 on x86_32 runs at about the same speed as jhash2.
xxhash32 on x86_32 runs at about the same speed as on x86_64.
jhash2 is faster than xxhash on input data smaller than 32 bytes.

So use xxhash(), which picks the appropriate hash version
for the target arch.

I did some benchmarks (ksmd CPU load as reported by htop):
  CPU: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
  ksm: sleep_millisecs = 1
    jhash2:   ~18%
    xxhash64: ~11%
  ksm: sleep_millisecs = 20 - default
    jhash2:   ~4.7%
    xxhash64: ~3.3%

  - 11 / 18  ~= 0.6 -> ~40% less CPU load
  - 3.3 / 4.7 ~= 0.7 -> ~30% less CPU load
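
For reference, a rough userspace harness of the kind that could
produce the throughput numbers above; this is an assumption about
the setup, not the exact tool used. xxh64() is declared with the
kernel's lib/xxhash.c prototype, so an equivalent implementation
(e.g. a wrapper around the reference library's XXH64()) must be
linked in:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <time.h>

  #define PG_SIZE 4096            /* stands in for PAGE_SIZE */
  #define LOOPS   1048576UL

  uint64_t xxh64(const void *input, size_t length, uint64_t seed);

  int main(void)
  {
          static unsigned char page[PG_SIZE];
          struct timespec a, b;
          uint64_t hash = 0;
          double ms;

          memset(page, 0x5a, sizeof(page));
          clock_gettime(CLOCK_MONOTONIC, &a);
          for (unsigned long i = 0; i < LOOPS; i++)
                  hash += xxh64(page, PG_SIZE, 0); /* keep calls live */
          clock_gettime(CLOCK_MONOTONIC, &b);

          ms = (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
          printf("xxhash64: 0x%016llx time: %.0f ms, th: %.1f MiB/s\n",
                 (unsigned long long)hash, ms,
                 (double)LOOPS * PG_SIZE / (1 << 20) / (ms / 1e3));
          return 0;
  }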

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
---
 mm/Kconfig |  1 +
 mm/ksm.c   | 14 +++++++-------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 9c4bdddd80c2..252ab266ac23 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -305,6 +305,7 @@ config MMU_NOTIFIER
 config KSM
 	bool "Enable KSM for page merging"
 	depends on MMU
+	select XXHASH
 	help
 	  Enable Kernel Samepage Merging: KSM periodically scans those areas
 	  of an application's address space that an app has advised may be
diff --git a/mm/ksm.c b/mm/ksm.c
index 15dd7415f7b3..6527fe21aaa3 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -25,7 +25,7 @@
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
 #include <linux/spinlock.h>
-#include <linux/jhash.h>
+#include <linux/xxhash.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/wait.h>
@@ -186,7 +186,7 @@ struct rmap_item {
 	};
 	struct mm_struct *mm;
 	unsigned long address;		/* + low bits used for flags below */
-	unsigned int oldchecksum;	/* when unstable */
+	xxhash_t oldchecksum;		/* when unstable */
 	union {
 		struct rb_node node;	/* when node of unstable tree */
 		struct {		/* when listed from stable tree */
@@ -255,7 +255,7 @@ static unsigned int ksm_thread_pages_to_scan = 100;
 static unsigned int ksm_thread_sleep_millisecs = 20;

 /* Checksum of an empty (zeroed) page */
-static unsigned int zero_checksum __read_mostly;
+static xxhash_t zero_checksum __read_mostly;

 /* Whether to merge empty (zeroed) pages with actual zero pages */
 static bool ksm_use_zero_pages __read_mostly;
@@ -982,11 +982,11 @@ static int unmerge_and_remove_all_rmap_items(void)
 }
 #endif /* CONFIG_SYSFS */

-static u32 calc_checksum(struct page *page)
+static xxhash_t calc_checksum(struct page *page)
 {
-	u32 checksum;
+	xxhash_t checksum;
 	void *addr = kmap_atomic(page);
-	checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+	checksum = xxhash(addr, PAGE_SIZE, 0);
 	kunmap_atomic(addr);
 	return checksum;
 }
@@ -1994,7 +1994,7 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 	struct page *tree_page = NULL;
 	struct stable_node *stable_node;
 	struct page *kpage;
-	unsigned int checksum;
+	xxhash_t checksum;
 	int err;
 	bool max_page_sharing_bypass = false;

--
2.14.1


