From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 1/1] network memory allocator.
Date: Tue, 15 Aug 2006 16:48:59 +0200
Message-ID: <1155653339.5696.282.camel@twins>
In-Reply-To: <20060815141501.GA10998@2ka.mipt.ru>

On Tue, 2006-08-15 at 18:15 +0400, Evgeniy Polyakov wrote:
> Kevent network AIO is completely different from network tree allocator.
How can that be? packets still need to be received, yes?
> So the network allocator reserves the above megabyte and works with it in a
> smart way (without too much overhead).
> Then the system goes into OOM and needs to swap out a page, so a
> notification is sent to the remote swap storage.
> The swap storage then sends an ack for that data; since network allocations
> are separated from the main system ones, the network allocator easily gets 60
> (or 4k, since it has a reserve which exceeds the maximum allowed TCP memory
> limit) bytes for the ack and processes that notification, thus "freeing" the acked
> data, and the main system can work with that freed memory.
> No need to detect OOM or anything else - it just works.
>
> I expect you will give me an example where all of the above megabyte is
> stuck somewhere.
> But... If it is not acked, each new packet goes down the slow path, since VJ
> header prediction fails and it falls into the memory limit check, which will
> drop that packet immediately without even trying to select a socket.
Not sure on the details; but you say: when we reach the threshold all
following packets will be dropped. So if you provide enough memory to
exceed the limit, you have some extra. If you then use that extra bit to
allow ACKs to pass through, then you're set.
Sounds good, but you'd have to carve a path for the ACKs, right? Or is
that already there?
Also, I'm worried about the effects of external fragmentation, especially
after long run times. Analysing non-trivial memory allocators is hard,
very often too hard.
> > > > > And there is a simple task in TODO list to dynamically grow cache when
> > > > > threshold of memory is in use. It is really simple task and will be
> > > > > implemented as soon as I complete suggestions mentioned by Andrew Morton.
> > > >
> > > > Growing will not help, the problem is you are out of memory, you cannot
> > > > grow at that point.
> > >
> > > You do not see the point of network tree allocator.
> > >
> > > It can live with main system OOM since it has preallocated separate
> > > pool, which can be increased when there is a requirement for that, for
> > > example when system is not in OOM.
> >
> > It cannot increase enough, ever. The total capacity of the network stack
> > is huge.
> > And the sole problem I'm addressing is getting the system to work
> > reliably in tight memory situations, that is during reclaim; one cannot
> > decide to grow then, nor postpone, too late.
>
> Network *is* limited, it is not terabyte array which is going to be
> placed into VFS cache.
No it is not, but you bound it.
> > > > skbuff_head_cache and skbuff_fclone_cache are SLABs.
> > >
> > > It is quite small part of the stack, isn't it?
> > > And btw, they still suffer from the SLAB design, since it is possible to get
> > > another smaller object right after all skbs are allocated from a given page.
> > > It is a minor thing of course, but nevertheless worth mentioning.
> >
> > Small but crucial, that is why I've been replacing all.
>
> Sigh, replace kmem_cache_alloc() with avl_alloc() - it does not matter.
It does matter, you need the whole packet, if you cannot allocate a
sk_buff you're still stuck.
> > > > Yes SLAB is a horrid thing on some points but very good at a lot of
> > > > other things. But surely there are frequently used sizes, kmalloc will
> > > > not know, but a developer with profiling tools might.
> > >
> > > Does not scale - the admin must run the system under profiling, add new
> > > entries into kmalloc_sizes.h, recompile the kernel... No way.
> >
> > s/admin/developer/
> > It has been the way so far.
>
> Could you say what are preferred sizes in my testing machines here? :)
> For example MMIO-read based chips (excellent realtek 8139 adapter) can
> allocate not just 1500 bytes of memory, but the real size of the received frame.
> I even used it for receiving zero-copy (really excellent hardware
> for its price) into a VFS cache implementation (without any kind of
> page-per-packet stuff), but it is not related to our discussion.
Well, generally the developer of the driver can say, and very often it
just doesn't matter, but see the widespread use of private SLABs to see
there is benefit in manually tuning stuff.
> Have you seen how many adapters support packet split?
Not many I guess. That does not make higher order allocations any more
reliable.