* [RFC PATCH 1/3] Supporting hacks to make it possible to test slab-allocated buffers in place of page_frag without rewriting lots of net code. We make several assumptions here: first, that the SLAB allocator is selected; second, that nothing else in the kernel calls get_page() or put_page() on pages marked PG_slab; third, that every slab we make these calls on is allocated page-aligned.
2014-01-16 23:17 [RFC PATCH 0/3] Use cached allocations in place of order-3 allocations for sk_page_frag_refill() and __netdev_alloc_frag() Debabrata Banerjee
@ 2014-01-16 23:17 ` Debabrata Banerjee
2014-01-16 23:17 ` [RFC PATCH 2/3] Use slab allocations for netdev page_frag receive buffers Debabrata Banerjee
2014-01-16 23:17 ` [RFC PATCH 3/3] Use slab allocations for sk page_frag send buffers Debabrata Banerjee
2 siblings, 0 replies; 4+ messages in thread
From: Debabrata Banerjee @ 2014-01-16 23:17 UTC
To: eric.dumazet, fw, netdev; +Cc: dbanerje, johunt, jbaron, davem, linux-mm
---
 include/linux/mm.h |  6 ++++++
 mm/slab.c          |  8 ++++++++
 mm/swap.c          | 13 ++++++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..de21a92 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -398,12 +398,18 @@ static inline void get_huge_page_tail(struct page *page)
}
extern bool __get_page_tail(struct page *page);
+extern struct page *slabpage_to_headpage(struct page *page);
static inline void get_page(struct page *page)
{
        if (unlikely(PageTail(page)))
                if (likely(__get_page_tail(page)))
                        return;
+
+       /* Hack for slab page */
+       if (unlikely(page->flags & (1L << PG_slab)))
+               page = slabpage_to_headpage(page);
+
        /*
         * Getting a normal page or the head of a compound page
         * requires to already have an elevated page->_count.
diff --git a/mm/slab.c b/mm/slab.c
index bd88411..36d5176 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -483,6 +483,14 @@ static inline unsigned int obj_to_index(const struct kmem_cache *cache,
        return reciprocal_divide(offset, cache->reciprocal_buffer_size);
}
+struct page *slabpage_to_headpage(struct page *page)
+{
+       /* Hack to support get_page/put_page on slabs bigger than a page */
+       unsigned int idx = obj_to_index(page->slab_cache, page->slab_page, page_address(page));
+       return virt_to_page(index_to_obj(page->slab_cache, page->slab_page, idx));
+}
+EXPORT_SYMBOL(slabpage_to_headpage);
+
static struct arraycache_init initarray_generic =
{ {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };
diff --git a/mm/swap.c b/mm/swap.c
index 9f2225f..94c75bc 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -172,9 +172,20 @@ skip_lock_tail:
        }
}
+extern struct page *slabpage_to_headpage(struct page *page);
+
void put_page(struct page *page)
{
-       if (unlikely(PageCompound(page)))
+       if (unlikely(page->flags & (1L << PG_slab))) {
+               struct page *head_page = slabpage_to_headpage(page);
+               /* Hack: assume >PAGE_SIZE, page-aligned slabs, and that no one
+                * puts the count to zero on a slab page without meaning to free it. */
+               if (put_page_testzero(head_page)) {
+                       get_page(head_page); /* restore the slab's _count of 1 */
+                       kmem_cache_free(page->slab_cache, page_address(head_page));
+               }
+       }
+       else if (unlikely(PageCompound(page)))
                put_compound_page(page);
        else if (put_page_testzero(page))
                __put_single_page(page);
--
1.8.3.4
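For anyone reading along: slabpage_to_headpage() above is just the
obj_to_index()/index_to_obj() round trip, which rounds an address inside a
slab object down to the object's first page. A minimal userspace model of
that arithmetic (sizes and names illustrative, assuming the page-aligned
multi-page objects set up in patches 2/3 and 3/3):

#include <stdint.h>
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096UL
#define OBJ_PAGES       8UL                     /* e.g. PAGE_SIZE << 3 objects */
#define OBJ_SIZE        (MODEL_PAGE_SIZE * OBJ_PAGES)

/* Model of slabpage_to_headpage(): any address within an object maps
 * back to the object's start, i.e. the head page carrying the refcount. */
static uintptr_t head_of(uintptr_t s_mem, uintptr_t addr)
{
        uintptr_t idx = (addr - s_mem) / OBJ_SIZE;      /* obj_to_index() */
        return s_mem + idx * OBJ_SIZE;                  /* index_to_obj() */
}

int main(void)
{
        uintptr_t s_mem = 0x100000;     /* page-aligned object base of a slab */
        /* an address on the 6th page of the 2nd object ... */
        uintptr_t addr = s_mem + OBJ_SIZE + 5 * MODEL_PAGE_SIZE;
        /* ... resolves to the 2nd object's first page (0x108000) */
        printf("head = %#lx\n", (unsigned long)head_of(s_mem, addr));
        return 0;
}

This only holds if objects are page-aligned, which is why the caches in the
later patches pass PAGE_SIZE as the align argument to kmem_cache_create().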
* [RFC PATCH 2/3] Use slab allocations for netdev page_frag receive buffers
2014-01-16 23:17 [RFC PATCH 0/3] Use cached allocations in place of order-3 allocations for sk_page_frag_refill() and __netdev_alloc_frag() Debabrata Banerjee
2014-01-16 23:17 ` [RFC PATCH 1/3] Supporting hacks to make it possible to test slab-allocated buffers in place of page_frag without rewriting lots of net code. We make several assumptions here: first, that the SLAB allocator is selected; second, that nothing else in the kernel calls get_page() or put_page() on pages marked PG_slab; third, that every slab we make these calls on is allocated page-aligned Debabrata Banerjee
@ 2014-01-16 23:17 ` Debabrata Banerjee
2014-01-16 23:17 ` [RFC PATCH 3/3] Use slab allocations for sk page_frag send buffers Debabrata Banerjee
2 siblings, 0 replies; 4+ messages in thread
From: Debabrata Banerjee @ 2014-01-16 23:17 UTC
To: eric.dumazet, fw, netdev; +Cc: dbanerje, johunt, jbaron, davem, linux-mm
---
 net/core/skbuff.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d9e8736..7ecb7a8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -368,6 +368,8 @@ struct netdev_alloc_cache {
};
static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
+struct kmem_cache *netdev_page_frag_cache;
+
static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
{
        struct netdev_alloc_cache *nc;
@@ -379,18 +381,22 @@ static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
        nc = &__get_cpu_var(netdev_alloc_cache);
        if (unlikely(!nc->frag.page)) {
refill:
-               for (order = NETDEV_FRAG_PAGE_MAX_ORDER; ;) {
-                       gfp_t gfp = gfp_mask;
-
-                       if (order)
-                               gfp |= __GFP_COMP | __GFP_NOWARN;
-                       nc->frag.page = alloc_pages(gfp, order);
-                       if (likely(nc->frag.page))
-                               break;
-                       if (--order < 0)
-                               goto end;
+               if (NETDEV_FRAG_PAGE_MAX_ORDER > 0) {
+                       void *kmem = kmem_cache_alloc(netdev_page_frag_cache, gfp_mask | __GFP_NOWARN);
+                       if (likely(kmem)) {
+                               nc->frag.page = virt_to_page(kmem);
+                               nc->frag.size = PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER;
+                               goto recycle;
+                       }
                }
-               nc->frag.size = PAGE_SIZE << order;
+
+               nc->frag.page = alloc_page(gfp_mask);
+
+               if (likely(nc->frag.page))
+                       nc->frag.size = PAGE_SIZE;
+               else
+                       goto end;
+
recycle:
                atomic_set(&nc->frag.page->_count, NETDEV_PAGECNT_MAX_BIAS);
                nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
@@ -3092,6 +3098,11 @@ void __init skb_init(void)
                                                0,
                                                SLAB_HWCACHE_ALIGN|SLAB_PANIC,
                                                NULL);
+       netdev_page_frag_cache = kmem_cache_create("netdev_page_frag_cache",
+                                               PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER,
+                                               PAGE_SIZE,
+                                               SLAB_HWCACHE_ALIGN,
+                                               NULL);
}
/**
--
1.8.3.4
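A note on the recycle path kept above: __netdev_alloc_frag() batches
refcounting by seeding page->_count with NETDEV_PAGECNT_MAX_BIAS and handing
one reference to each fragment it carves off; the buffer is recycled once
the stack has put every outstanding fragment reference. A rough userspace
sketch of that bookkeeping (constants and names illustrative):

#include <stdio.h>

#define BIAS 32768              /* stands in for NETDEV_PAGECNT_MAX_BIAS */

static int page_count;         /* stands in for page->_count */
static int pagecnt_bias;       /* the allocator's share of that count */

static void refill(void)
{
        /* one atomic_set() instead of one atomic_inc() per fragment */
        page_count = BIAS;
        pagecnt_bias = BIAS;
}

static void alloc_frag(void)   /* hand one reference to an skb frag */
{
        pagecnt_bias--;
}

static void frag_freed(void)   /* put_page() from the network stack */
{
        page_count--;
}

int main(void)
{
        refill();
        alloc_frag();
        frag_freed();
        alloc_frag();
        /* reusable only when the allocator holds all remaining references */
        printf("recyclable: %d\n", page_count == pagecnt_bias);  /* 0 */
        frag_freed();
        printf("recyclable: %d\n", page_count == pagecnt_bias);  /* 1 */
        return 0;
}

With slab-backed buffers the same trick works unchanged, since patch 1/3
makes get_page()/put_page() operate on the slab object's head page.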
* [RFC PATCH 3/3] Use slab allocations for sk page_frag send buffers
2014-01-16 23:17 [RFC PATCH 0/3] Use cached allocations in place of order-3 allocations for sk_page_frag_refill() and __netdev_alloc_frag() Debabrata Banerjee
2014-01-16 23:17 ` [RFC PATCH 1/3] Supporting hacks to make it possible to test slab-allocated buffers in place of page_frag without rewriting lots of net code. We make several assumptions here: first, that the SLAB allocator is selected; second, that nothing else in the kernel calls get_page() or put_page() on pages marked PG_slab; third, that every slab we make these calls on is allocated page-aligned Debabrata Banerjee
2014-01-16 23:17 ` [RFC PATCH 2/3] Use slab allocations for netdev page_frag receive buffers Debabrata Banerjee
@ 2014-01-16 23:17 ` Debabrata Banerjee
2 siblings, 0 replies; 4+ messages in thread
From: Debabrata Banerjee @ 2014-01-16 23:17 UTC
To: eric.dumazet, fw, netdev; +Cc: dbanerje, johunt, jbaron, davem, linux-mm
---
 net/core/sock.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 6565431..dbbd2f9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1792,10 +1792,12 @@ EXPORT_SYMBOL(sock_alloc_send_skb);
/* On 32bit arches, an skb frag is limited to 2^15 */
#define SKB_FRAG_PAGE_ORDER get_order(32768)
+struct kmem_cache *sk_page_frag_cache;
bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
{
        int order;
+       gfp_t gfp_mask = sk->sk_allocation;
        if (pfrag->page) {
                if (atomic_read(&pfrag->page->_count) == 1) {
@@ -1807,21 +1809,25 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
                put_page(pfrag->page);
        }
-       /* We restrict high order allocations to users that can afford to wait */
-       order = (sk->sk_allocation & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0;
+       order = SKB_FRAG_PAGE_ORDER;
-       do {
-               gfp_t gfp = sk->sk_allocation;
-
-               if (order)
-                       gfp |= __GFP_COMP | __GFP_NOWARN;
-               pfrag->page = alloc_pages(gfp, order);
-               if (likely(pfrag->page)) {
+       if (order > 0) {
+               void *kmem = kmem_cache_alloc(sk_page_frag_cache, gfp_mask | __GFP_NOWARN);
+               if (likely(kmem)) {
+                       pfrag->page = virt_to_page(kmem);
                        pfrag->offset = 0;
                        pfrag->size = PAGE_SIZE << order;
                        return true;
                }
-       } while (--order >= 0);
+       }
+
+       pfrag->page = alloc_page(gfp_mask);
+
+       if (likely(pfrag->page)) {
+               pfrag->offset = 0;
+               pfrag->size = PAGE_SIZE;
+               return true;
+       }
        sk_enter_memory_pressure(sk);
        sk_stream_moderate_sndbuf(sk);
@@ -2822,13 +2828,18 @@ static __net_init int proto_init_net(struct net *net)
{
        if (!proc_create("protocols", S_IRUGO, net->proc_net, &proto_seq_fops))
                return -ENOMEM;
-
+       sk_page_frag_cache = kmem_cache_create("sk_page_frag_cache",
+                                              PAGE_SIZE << SKB_FRAG_PAGE_ORDER,
+                                              PAGE_SIZE,
+                                              SLAB_HWCACHE_ALIGN,
+                                              NULL);
        return 0;
}
static __net_exit void proto_exit_net(struct net *net)
{
        remove_proc_entry("protocols", net->proc_net);
+       kmem_cache_destroy(sk_page_frag_cache);
}
--
1.8.3.4
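To make the new control flow in sk_page_frag_refill() concrete, here is a
small userspace model of the same ladder: try the big slab-backed buffer
first, fall back to a single page, and only then signal memory pressure
(names, sizes, and the slab_ok knob are illustrative):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096UL
#define FRAG_ORDER      3       /* stands in for SKB_FRAG_PAGE_ORDER */

struct frag { void *buf; size_t size; size_t offset; };

/* Model of the patched sk_page_frag_refill(): one slab attempt, one
 * single-page fallback, no order-by-order retry loop. */
static bool refill(struct frag *f, bool slab_ok)
{
        if (FRAG_ORDER > 0 && slab_ok) {        /* kmem_cache_alloc() path */
                f->buf = malloc(MODEL_PAGE_SIZE << FRAG_ORDER);
                if (f->buf) {
                        f->size = MODEL_PAGE_SIZE << FRAG_ORDER;
                        f->offset = 0;
                        return true;
                }
        }
        f->buf = malloc(MODEL_PAGE_SIZE);       /* alloc_page() fallback */
        if (!f->buf)
                return false;   /* caller enters memory pressure handling */
        f->size = MODEL_PAGE_SIZE;
        f->offset = 0;
        return true;
}

int main(void)
{
        struct frag f;

        if (refill(&f, true))
                printf("slab path: %zu bytes\n", f.size);
        free(f.buf);
        if (refill(&f, false))
                printf("fallback:  %zu bytes\n", f.size);
        free(f.buf);
        return 0;
}

One design point worth noting: the reuse test at the top of
sk_page_frag_refill() (atomic_read(&pfrag->page->_count) == 1) keeps working
with slab-backed buffers because patch 1/3 redirects get_page()/put_page()
to the object's head page, so the count on that page still tracks
outstanding users.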