linux-mm.kvack.org archive mirror
* Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning?
@ 2009-06-16 17:24 starlight
  0 siblings, 0 replies; 8+ messages in thread
From: starlight @ 2009-06-16 17:24 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, linux-mm, hugh.dickins, Lee.Schermerhorn,
	kosaki.motohiro, ebmunson, agl, apw, wli

>Tried increasing a few /proc/slabinfo tuneable parameters today
>and this appears to have fixed the issue so far today.

Spoke too soon.  A burst of allocation failures appeared
and some incoming data was lost.  The 'e1000e' system had
no problem.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: QUESTION: can netdev_alloc_skb() errors be reduced  by  tuning?
  2009-06-16  6:12           ` Eric Dumazet
@ 2009-07-05  3:44             ` Herbert Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Herbert Xu @ 2009-07-05  3:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: starlight, linux-kernel, mel, linux-mm, hugh.dickins,
	Lee.Schermerhorn, kosaki.motohiro, ebmunson, agl, apw, wli,
	netdev

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> Because of slab rounding, this reallocation should be done only if resulting data
> portion is really smaller (50 %) than original skb.

If we're going to do this in the core then we should only do it
in the spots where the packet may be held indefinitely.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning?
  2009-06-16  9:19       ` Mel Gorman
@ 2009-06-16 15:25         ` starlight
  0 siblings, 0 replies; 8+ messages in thread
From: starlight @ 2009-06-16 15:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, linux-mm, hugh.dickins, Lee.Schermerhorn,
	kosaki.motohiro, ebmunson, agl, apw, wli

At 10:19 AM 6/16/2009 +0100, Mel Gorman wrote:

>Can you give an example of an allocation failure? Specifically, I want to
>see what sort of allocation it was and what order.

I think it's just the basic buffer allocation for
Ethernet frames arriving in the 'ixgbe' driver.  Seems
like it's one allocation per frame.  Per the original
message the allocations are made with the 'netdev_alloc_skb()'
kernel call.  The function where this code appears is
named 'ixgbe_alloc_rx_buffers()' and the comment is
"Replace used receive buffers."

The code path in question does not generate an error.  It just
increments the 'alloc_rx_buff_failed' counter for the ethX
device.  In addition it appears that the frame is dropped
only if the PCIe hardware ring-queue associated with each
interface is full.  So on the next interrupt the allocation
is retried and appears to be successful 99% of the time.
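
A minimal user-space sketch of the behaviour described above, assuming a frame is lost only when no receive buffer is left posted in the ring. This is a toy model, not the ixgbe driver: `struct rx_ring`, `ring_refill()`, `ring_receive()` and the two allocator stubs are hypothetical names invented for illustration.

```c
#include <stdbool.h>

/* Toy model of a receive ring; all names here are hypothetical. */
struct rx_ring {
    unsigned int size;          /* total descriptors in the ring */
    unsigned int posted;        /* descriptors that currently own a buffer */
    unsigned long alloc_failed; /* plays the role of alloc_rx_buff_failed */
    unsigned long dropped;      /* frames lost because no buffer was posted */
};

/* Stub allocators standing in for netdev_alloc_skb() succeeding or failing. */
bool alloc_always_succeeds(void) { return true; }
bool alloc_always_fails(void) { return false; }

/* Refill pass: post buffers until the ring is full or an allocation fails. */
void ring_refill(struct rx_ring *r, bool (*try_alloc)(void))
{
    while (r->posted < r->size) {
        if (!try_alloc()) {
            r->alloc_failed++;  /* count it and retry on the next pass */
            return;
        }
        r->posted++;
    }
}

/* Frame arrival: consume a posted buffer, or drop if none is left. */
void ring_receive(struct rx_ring *r)
{
    if (r->posted > 0)
        r->posted--;
    else
        r->dropped++;
}
```

In this model a single failed refill only bumps the counter; frames are lost only if failures persist long enough for arrivals to consume every posted buffer.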

>For reliable protocols, an allocation failure should recover and the
>data get through but obviously there is a drop in network performance
>when this happens.

This is for a specialized high-volume UDP multicast application
where data loss of any kind is unacceptable.

>If the allocations are high-order and atomic, increasing min_free_kbytes
>can help, particularly in situations where there is a burst of network
>traffic. I won't know if they are atomic until I see an error message
>though.

Doesn't the use of 'netdev_alloc_skb()' kernel primitive
imply what the nature of the allocation is?  I followed the
call graph down into "kmem" land, but it's a complex place
and so I abandoned the review.

My impression is that 'min_free_kbytes' relates mainly to systems
where significant paging pressure exists.  The servers have zero
paging pressure and lots of free memory, though mostly in the
form of instantly discardable file data cache pages.  In the
past disabling the program that generates the cache pressure
has had no effect on data loss, though I haven't tried it in
relation to this specific issue.

Tried increasing a few /proc/slabinfo tuneable parameters today
and this appears to have fixed the issue so far today.


* Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning?
  2009-06-16  0:19     ` QUESTION: can netdev_alloc_skb() errors be reduced by tuning? starlight
  2009-06-16  2:26       ` Eric Dumazet
@ 2009-06-16  9:19       ` Mel Gorman
  2009-06-16 15:25         ` starlight
  1 sibling, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2009-06-16  9:19 UTC (permalink / raw)
  To: starlight
  Cc: linux-kernel, linux-mm, hugh.dickins, Lee.Schermerhorn,
	kosaki.motohiro, ebmunson, agl, apw, wli

On Mon, Jun 15, 2009 at 08:19:33PM -0400, starlight@binnacle.cx wrote:
> Hello,
> 
> I submitted testcase for a hugepages bug that has been 
> successfully resolved.  Have an apparently obscure question 
> related to MM, and so I am asking anyone who might have some idea 
> on this.  Nothing much turned up via Google and digging into
> the KMEM code looks daunting.
> 
> Running Intel 82598/ixgbe 10 gig Ethernet under heavy stress. 
> Generally is working well after tuning IRQ affinities, but a 
> fair number of buffer allocation failures are occurring in the 
> 'ixgbe' device driver and are reported via 'ethtool' statistics. 
>  This may be causing data loss.
> 

Can you give an example of an allocation failure? Specifically, I want to
see what sort of allocation it was and what order.

For reliable protocols, an allocation failure should recover and the
data get through but obviously there is a drop in network performance
when this happens.

> The kernel primitive returning the error is netdev_alloc_skb().
> 
> Are any tuneable parameters available that can reduce or 
> eliminate these allocation failures?  Have about eleven 
> gigabytes of free memory, though most of that is consumed 
> by non-dirty file cache data.  Total system memory is 16GB with 
> 4GB allocated to hugepages.  Zero swap usage and activity though
> swap is enabled.  Most application memory is hugepage or is
> 'mlock()'ed.
> 

If the allocations are high-order and atomic, increasing min_free_kbytes
can help, particularly in situations where there is a burst of network
traffic. I won't know if they are atomic until I see an error message
though.

> Thank you.
> 
> 
> 
> 
> 
> System rebooted before test run.
> 
> Dual Xeon E5430, 16GB FB-DIMM RAM.
> 
> 
> $ cat /proc/meminfo
> MemTotal:     16443828 kB
> MemFree:        281176 kB
> Buffers:         53896 kB
> Cached:       11331924 kB
> SwapCached:          0 kB
> Active:         200740 kB
> Inactive:     11284312 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:     16443828 kB
> LowFree:        281176 kB
> SwapTotal:     2031608 kB
> SwapFree:      2031400 kB
> Dirty:               4 kB
> Writeback:           0 kB
> AnonPages:      104464 kB
> Mapped:          14644 kB
> Slab:           440452 kB
> PageTables:       4032 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   8156368 kB
> Committed_AS:   122452 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed:    266872 kB
> VmallocChunk: 34359471043 kB
> HugePages_Total:  2048
> HugePages_Free:    735
> HugePages_Rsvd:      0
> Hugepagesize:     2048 kB
> 
> 
> # ethtool -S eth2 | egrep -v ': 0$'
> NIC statistics:
>      rx_packets: 724246449
>      tx_packets: 229847
>      rx_bytes: 152691992335
>      tx_bytes: 10573426
>      multicast: 725997241
>      broadcast: 6
>      rx_csum_offload_good: 723051776
>      alloc_rx_buff_failed: 7119
>      tx_queue_0_packets: 229847
>      tx_queue_0_bytes: 10573426
>      rx_queue_0_packets: 340698332
>      rx_queue_0_bytes: 70844299683
>      rx_queue_1_packets: 385298923
>      rx_queue_1_bytes: 82276167594
> 
> 
> ixgbe driver fragment
> =====================
>     struct sk_buff *skb = netdev_alloc_skb(adapter->netdev, bufsz);
> 
>     if (!skb) {
>         adapter->alloc_rx_buff_failed++;
>         goto no_buffers;
>     }
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: QUESTION: can netdev_alloc_skb() errors be reduced  by  tuning?
  2009-06-16  4:12         ` starlight
@ 2009-06-16  6:12           ` Eric Dumazet
  2009-07-05  3:44             ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2009-06-16  6:12 UTC (permalink / raw)
  To: starlight
  Cc: Eric Dumazet, linux-kernel, Mel Gorman, linux-mm, hugh.dickins,
	Lee.Schermerhorn, kosaki.motohiro, ebmunson, agl, apw, wli,
	Linux Netdev List

Please don't top-post; we prefer the other way around :)

starlight@binnacle.cx wrote:
> Eric,
> 
> Great thought--thank you.  Running a similar server with 
> 82571/e1000e and it does not exhibit the problem.  'e1000e' has 
> default copybreak=256 while 'ixgbe' has no copybreak.  Rationale 
> given is
> 
>    http://osdir.com/ml/linux.drivers.e1000.devel/2008-01/msg00103.html
> 
> But the comparison is a bit apples-and-oranges since the 'e1000e' 
> system is dual Opteron 2354 while the 'ixgbe' system is Xeon 
> E5430 (a painful choice thus far).  Also 'e1000e' system passes 
> data via a PACKET socket while the 'ixgbe' system passes data 
> via UDP (a configurable option).
> 
> I'm not fully up on how this all works: am I to understand that 
> the error could result from RX ring-queue buffers not freeing 
> quickly enough because they have a use-count held non-zero as
> the packet travels the stack?

Well, error is normal in stress situation, when no more kernel
memory is available.

cat /proc/net/udp

can show you (in the last column) sockets where packets were dropped
by the UDP stack because their receive queue was full.
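
A minimal sketch of pulling that last column out of one line of /proc/net/udp, assuming the drop count is the final whitespace-separated decimal field as described above. The helper name `udp_line_drops()` is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the last whitespace-separated field of one /proc/net/udp line as a
 * decimal drop count. Returns 0 on success, or -1 if the line is empty or
 * its last field is not a number (as with the header line, which ends in
 * the word "drops").
 */
int udp_line_drops(const char *line, unsigned long *drops)
{
    size_t end = strlen(line);

    /* Skip trailing whitespace, including the newline. */
    while (end > 0 && isspace((unsigned char)line[end - 1]))
        end--;
    if (end == 0)
        return -1;

    /* Walk back to the start of the last field. */
    size_t start = end;
    while (start > 0 && !isspace((unsigned char)line[start - 1]))
        start--;

    if (!isdigit((unsigned char)line[start]))
        return -1;

    char *stop;
    unsigned long value = strtoul(line + start, &stop, 10);
    if (stop != line + end)
        return -1;      /* trailing non-digit characters in the field */

    *drops = value;
    return 0;
}
```

Calling this on every data line and printing the sockets with a non-zero result would separate drops at the socket receive queue from drops that happen earlier in the driver.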

> 
> I've just doubled some SLAB tuneables that seem relevant, but 
> if the cause is the aforementioned, this won't help.  Will
> have the answer on the tweaks by the end of Tuesday.
> 
> David

Copybreak in the drivers themselves is nice because the driver can recycle
its rx skbs much faster, but it is suboptimal in forwarding (router)
workloads. It's also a lot of duplicated code in every driver.
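
For readers unfamiliar with copybreak, here is a minimal user-space sketch of the idea, assuming frames shorter than the threshold are copied into a right-sized buffer so that the large receive buffer can be reused at once. It is not code from any driver; `rx_copybreak()` and its parameters are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Toy model of copybreak. If the frame is shorter than the threshold, copy
 * it into a freshly allocated buffer of exactly `len` bytes and report that
 * the large receive buffer can be recycled. Otherwise hand the large buffer
 * itself up the stack.
 */
unsigned char *rx_copybreak(unsigned char *rx_buf, size_t len,
                            size_t copybreak, int *rx_buf_recycled)
{
    *rx_buf_recycled = 0;
    if (len < copybreak) {
        unsigned char *small = malloc(len ? len : 1);
        if (small) {
            memcpy(small, rx_buf, len);
            *rx_buf_recycled = 1;
            return small;
        }
        /* No memory for the copy: fall through and pass up the original. */
    }
    return rx_buf;
}
```

The extra memcpy is the CPU cost mentioned above, and a forwarding box pays it without benefiting, since it does not hold the packet for long.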

So we could do the skb trimming (i.e., reallocating the data portion to exactly
the size of the packet) in the core network stack, when we know the packet must
be handled by an application, and not dropped or forwarded by the kernel.

Because of slab rounding, this reallocation should be done only if resulting data
portion is really smaller (50 %) than original skb.
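
A minimal sketch of that rule, assuming allocations are rounded up to power-of-two size classes and reading "really smaller (50 %)" as "the trimmed copy lands in a class at most half the size of the original". Both points are assumptions made for illustration, and `size_class()` and `worth_trimming()` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a request up to the next power-of-two size class (minimum 32). */
size_t size_class(size_t n)
{
    size_t c = 32;
    while (c < n && c <= SIZE_MAX / 2)
        c <<= 1;
    return c;
}

/*
 * Return 1 if copying `pkt_len` bytes out of a buffer of `buf_size` bytes
 * would land in a size class at most half as large, 0 otherwise.
 */
int worth_trimming(size_t buf_size, size_t pkt_len)
{
    return size_class(pkt_len) * 2 <= size_class(buf_size);
}
```

With a 2048-byte buffer, a 210-byte packet rounds to a 256-byte class and is worth trimming, while a 1500-byte packet rounds back up to 2048 bytes and a copy would free nothing.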

> 
> 
> 
> At 04:26 AM 6/16/2009 +0200, Eric Dumazet wrote:
>> 152691992335/724246449 = 210 bytes per rx packet on average
>>
>> It could make sense to add copybreak feature in this driver to 
>> reduce memory needs, but that also would consume more cpu 
>> cycles, and slow down forwarding setups.
>>
>> Maybe this packet trimming could be done generically in UDP 
>> stack input path, before queueing packet into a receive queue, 
>> if amount of available memory is under a given threshold.
> 


* Re: QUESTION: can netdev_alloc_skb() errors be reduced by  tuning?
  2009-06-16  2:26       ` Eric Dumazet
@ 2009-06-16  4:12         ` starlight
  2009-06-16  6:12           ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: starlight @ 2009-06-16  4:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, Mel Gorman, linux-mm, hugh.dickins,
	Lee.Schermerhorn, kosaki.motohiro, ebmunson, agl, apw, wli

Eric,

Great thought--thank you.  Running a similar server with 
82571/e1000e and it does not exhibit the problem.  'e1000e' has 
default copybreak=256 while 'ixgbe' has no copybreak.  Rationale 
given is

   http://osdir.com/ml/linux.drivers.e1000.devel/2008-01/msg00103.html

But the comparison is a bit apples-and-oranges since the 'e1000e' 
system is dual Opteron 2354 while the 'ixgbe' system is Xeon 
E5430 (a painful choice thus far).  Also 'e1000e' system passes 
data via a PACKET socket while the 'ixgbe' system passes data 
via UDP (a configurable option).

I'm not fully up on how this all works: am I to understand that 
the error could result from RX ring-queue buffers not freeing 
quickly enough because they have a use-count held non-zero as
the packet travels the stack?

I've just doubled some SLAB tuneables that seem relevant, but 
if the cause is the aforementioned, this won't help.  Will
have the answer on the tweaks by the end of Tuesday.

David



At 04:26 AM 6/16/2009 +0200, Eric Dumazet wrote:
>
>152691992335/724246449 = 210 bytes per rx packet on average
>
>It could make sense to add copybreak feature in this driver to 
>reduce memory needs, but that also would consume more cpu 
>cycles, and slow down forwarding setups.
>
>Maybe this packet trimming could be done generically in UDP 
>stack input path, before queueing packet into a receive queue, 
>if amount of available memory is under a given threshold.


* Re: QUESTION: can netdev_alloc_skb() errors be reduced by  tuning?
  2009-06-16  0:19     ` QUESTION: can netdev_alloc_skb() errors be reduced by tuning? starlight
@ 2009-06-16  2:26       ` Eric Dumazet
  2009-06-16  4:12         ` starlight
  2009-06-16  9:19       ` Mel Gorman
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2009-06-16  2:26 UTC (permalink / raw)
  To: starlight
  Cc: linux-kernel, Mel Gorman, linux-mm, hugh.dickins,
	Lee.Schermerhorn, kosaki.motohiro, ebmunson, agl, apw, wli

starlight@binnacle.cx wrote:
> Hello,
> 
> I submitted testcase for a hugepages bug that has been 
> successfully resolved.  Have an apparently obscure question 
> related to MM, and so I am asking anyone who might have some idea 
> on this.  Nothing much turned up via Google and digging into
> the KMEM code looks daunting.
> 
> Running Intel 82598/ixgbe 10 gig Ethernet under heavy stress. 
> Generally is working well after tuning IRQ affinities, but a 
> fair number of buffer allocation failures are occurring in the 
> 'ixgbe' device driver and are reported via 'ethtool' statistics. 
>  This may be causing data loss.
> 
> The kernel primitive returning the error is netdev_alloc_skb().
> 
> Are any tuneable parameters available that can reduce or 
> eliminate these allocation failures?  Have about eleven 
> gigabytes of free memory, though most of that is consumed 
> by non-dirty file cache data.  Total system memory is 16GB with 
> 4GB allocated to hugepages.  Zero swap usage and activity though
> swap is enabled.  Most application memory is hugepage or is
> 'mlock()'ed.
> 
> Thank you.
> 
> 
> 
> 
> 
> System rebooted before test run.
> 
> Dual Xeon E5430, 16GB FB-DIMM RAM.
> 
> 
> $ cat /proc/meminfo
> MemTotal:     16443828 kB
> MemFree:        281176 kB
> Buffers:         53896 kB
> Cached:       11331924 kB
> SwapCached:          0 kB
> Active:         200740 kB
> Inactive:     11284312 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:     16443828 kB
> LowFree:        281176 kB
> SwapTotal:     2031608 kB
> SwapFree:      2031400 kB
> Dirty:               4 kB
> Writeback:           0 kB
> AnonPages:      104464 kB
> Mapped:          14644 kB
> Slab:           440452 kB
> PageTables:       4032 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   8156368 kB
> Committed_AS:   122452 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed:    266872 kB
> VmallocChunk: 34359471043 kB
> HugePages_Total:  2048
> HugePages_Free:    735
> HugePages_Rsvd:      0
> Hugepagesize:     2048 kB
> 
> 
> # ethtool -S eth2 | egrep -v ': 0$'
> NIC statistics:
>      rx_packets: 724246449
>      tx_packets: 229847
>      rx_bytes: 152691992335
>      tx_bytes: 10573426
>      multicast: 725997241
>      broadcast: 6
>      rx_csum_offload_good: 723051776
>      alloc_rx_buff_failed: 7119
>      tx_queue_0_packets: 229847
>      tx_queue_0_bytes: 10573426
>      rx_queue_0_packets: 340698332
>      rx_queue_0_bytes: 70844299683
>      rx_queue_1_packets: 385298923
>      rx_queue_1_bytes: 82276167594
> 
> 
> ixgbe driver fragment
> =====================
>     struct sk_buff *skb = netdev_alloc_skb(adapter->netdev, bufsz);
> 
>     if (!skb) {
>         adapter->alloc_rx_buff_failed++;
>         goto no_buffers;
>     }
> 

152691992335/724246449 = 210 bytes per rx packet on average

It could make sense to add a copybreak feature to this driver to reduce memory
needs, but that would also consume more CPU cycles and slow down forwarding setups.

Maybe this packet trimming could be done generically in the UDP stack input path,
before queueing the packet into a receive queue, if the amount of available memory
is under a given threshold.
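
The arithmetic above, and the matching failure rate from the quoted ethtool counters, can be reproduced with a small user-space sketch. The helper names are hypothetical; the inputs are the rx_bytes, rx_packets and alloc_rx_buff_failed values from the statistics.

```c
#include <stdint.h>

/* Average bytes per received packet, in whole bytes (integer division). */
uint64_t avg_bytes_per_packet(uint64_t rx_bytes, uint64_t rx_packets)
{
    return rx_packets ? rx_bytes / rx_packets : 0;
}

/* Allocation failures per million received packets. */
double alloc_failures_per_million(uint64_t failures, uint64_t rx_packets)
{
    return rx_packets ? (double)failures * 1e6 / (double)rx_packets : 0.0;
}
```

With the figures quoted above, avg_bytes_per_packet(152691992335, 724246449) gives 210, and alloc_failures_per_million(7119, 724246449) comes out at a little under ten failures per million packets.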


* QUESTION: can netdev_alloc_skb() errors be reduced by tuning?
  2009-05-27 23:19   ` Ingo Molnar
@ 2009-06-16  0:19     ` starlight
  2009-06-16  2:26       ` Eric Dumazet
  2009-06-16  9:19       ` Mel Gorman
  0 siblings, 2 replies; 8+ messages in thread
From: starlight @ 2009-06-16  0:19 UTC (permalink / raw)
  To: linux-kernel, Mel Gorman, linux-mm, hugh.dickins,
	Lee.Schermerhorn, kosaki.motohiro, ebmunson, agl, apw, wli

Hello,

I submitted testcase for a hugepages bug that has been 
successfully resolved.  Have an apparently obscure question 
related to MM, and so I am asking anyone who might have some idea 
on this.  Nothing much turned up via Google and digging into
the KMEM code looks daunting.

Running Intel 82598/ixgbe 10 gig Ethernet under heavy stress. 
Generally is working well after tuning IRQ affinities, but a 
fair number of buffer allocation failures are occurring in the 
'ixgbe' device driver and are reported via 'ethtool' statistics. 
 This may be causing data loss.

The kernel primitive returning the error is netdev_alloc_skb().

Are any tuneable parameters available that can reduce or 
eliminate these allocation failures?  Have about eleven 
gigabytes of free memory, though most of that is consumed 
by non-dirty file cache data.  Total system memory is 16GB with 
4GB allocated to hugepages.  Zero swap usage and activity though
swap is enabled.  Most application memory is hugepage or is
'mlock()'ed.

Thank you.





System rebooted before test run.

Dual Xeon E5430, 16GB FB-DIMM RAM.


$ cat /proc/meminfo
MemTotal:     16443828 kB
MemFree:        281176 kB
Buffers:         53896 kB
Cached:       11331924 kB
SwapCached:          0 kB
Active:         200740 kB
Inactive:     11284312 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     16443828 kB
LowFree:        281176 kB
SwapTotal:     2031608 kB
SwapFree:      2031400 kB
Dirty:               4 kB
Writeback:           0 kB
AnonPages:      104464 kB
Mapped:          14644 kB
Slab:           440452 kB
PageTables:       4032 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   8156368 kB
Committed_AS:   122452 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    266872 kB
VmallocChunk: 34359471043 kB
HugePages_Total:  2048
HugePages_Free:    735
HugePages_Rsvd:      0
Hugepagesize:     2048 kB


# ethtool -S eth2 | egrep -v ': 0$'
NIC statistics:
     rx_packets: 724246449
     tx_packets: 229847
     rx_bytes: 152691992335
     tx_bytes: 10573426
     multicast: 725997241
     broadcast: 6
     rx_csum_offload_good: 723051776
     alloc_rx_buff_failed: 7119
     tx_queue_0_packets: 229847
     tx_queue_0_bytes: 10573426
     rx_queue_0_packets: 340698332
     rx_queue_0_bytes: 70844299683
     rx_queue_1_packets: 385298923
     rx_queue_1_bytes: 82276167594


ixgbe driver fragment
=====================
    struct sk_buff *skb = netdev_alloc_skb(adapter->netdev, bufsz);

    if (!skb) {
        adapter->alloc_rx_buff_failed++;
        goto no_buffers;
    }


end of thread, other threads:[~2009-07-05  3:19 UTC | newest]

Thread overview: 8+ messages
2009-06-16 17:24 QUESTION: can netdev_alloc_skb() errors be reduced by tuning? starlight
  -- strict thread matches above, loose matches on Subject: below --
2009-05-27 11:12 [PATCH 0/2] Fixes for hugetlbfs-related problems on shared memory Mel Gorman
2009-05-27 20:14 ` Andrew Morton
2009-05-27 23:19   ` Ingo Molnar
2009-06-16  0:19     ` QUESTION: can netdev_alloc_skb() errors be reduced by tuning? starlight
2009-06-16  2:26       ` Eric Dumazet
2009-06-16  4:12         ` starlight
2009-06-16  6:12           ` Eric Dumazet
2009-07-05  3:44             ` Herbert Xu
2009-06-16  9:19       ` Mel Gorman
2009-06-16 15:25         ` starlight
