From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: Daniel Phillips <phillips@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
David Miller <davem@davemloft.net>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
Subject: Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD
Date: Fri, 18 Aug 2006 11:16:06 +0400 [thread overview]
Message-ID: <20060818071606.GB23264@2ka.mipt.ru> (raw)
In-Reply-To: <44E4FAAA.2050104@google.com>
On Thu, Aug 17, 2006 at 04:24:26PM -0700, Daniel Phillips (phillips@google.com) wrote:
> >Feel free to implement any receiving policy inside a _separated_ allocator
> >to meet your needs, but if the allocator depends on the main system's memory
> >conditions it is always possible that it will fail to make forward
> >progress.
>
> Wrong. Our main allocator has a special reserve that can be accessed
> only by a task that has its PF_MEMALLOC flag set. This reserve exists
> in order to guarantee forward progress in just such situations as the
> network runs into when it is trying to receive responses from a remote
> disk. Anything otherwise is a bug.
Ok, I see your point.
You create a special fix for a special configuration.
In the general case it does not work.
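To be explicit about what we are both describing: the mechanism amounts to a
main pool plus a small reserve that only specially flagged tasks may dip
into. A toy userspace model of just that idea (an illustration of the
concept only, not mm/page_alloc.c, and the names are made up):

#include <stdio.h>
#include <stdbool.h>

#define MAIN_POOL_PAGES 4
#define RESERVE_PAGES   2

static int main_pool = MAIN_POOL_PAGES;
static int reserve   = RESERVE_PAGES;

/* Ordinary tasks fail once the main pool is empty; a task flagged as
 * "memalloc" (PF_MEMALLOC in the real kernel) may consume the reserve. */
static bool toy_alloc_page(bool memalloc)
{
        if (main_pool > 0) {
                main_pool--;
                return true;
        }
        if (memalloc && reserve > 0) {
                reserve--;
                return true;
        }
        return false;   /* ordinary allocation fails under pressure */
}

int main(void)
{
        for (int i = 0; i < 5; i++)
                printf("ordinary alloc %d: %s\n", i,
                       toy_alloc_page(false) ? "ok" : "FAIL");
        /* the writeout path still makes progress from the reserve */
        printf("memalloc alloc: %s\n", toy_alloc_page(true) ? "ok" : "FAIL");
        return 0;
}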
> >>>I do not argue that your approach is bad or does not solve the problem,
> >>>I'm just trying to show that further evolution of that idea eventually
> >>>ends up in a separate allocator (since most robust systems separate
> >>>their operations), which can improve things in a lot of other ways too.
> >>
> >>Not a separate allocator per se, but a separate socket group; they are
> >>serviced by the kernel, they will never refuse to process data, and it
> >>is critical for the continued well-being of your kernel that they get
> >>their data.
>
> The memalloc reserve is indeed a separate reserve, however it is a
> reserve that already exists, and you are busy creating another separate
> reserve to further partition memory. Partitioning memory is not the
> direction we should be going, we should be trying to unify our reserves
> wherever possible, and find ways such as Andrew and others propose to
> implement "reserve on demand", or in other words, take advantage of
> "easily freeable" pages.
Such an approach does not fix the problem.
Why does no one complain about privilege separation on the grounds that
"we can fix all existing applications"?
It is possible that there will not be any "easily freeable" pages, and
your special reserve will not be filled.
> If your allocation code is so much more efficient than slab then why
> don't you fix slab instead of replicating functionality that already
> exists elsewhere in the system, and has since day one?
No one needs an excuse to rewrite something.
> >You do not know in advance which sockets must be separated (since only
> >in the simplest situation it is the same as in NBD and is
> >kernelspace-only),
>
> Yes we do, they are exactly those sockets that lie in the block IO path.
> The VM cannot deadlock on any others.
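As far as I understand the patches, marking those sockets amounts to roughly
one call in the block driver, something like the fragment below. The helper
name sk_set_memalloc() is my assumption about the interface; the actual RFC
code may spell it differently:

#include <linux/net.h>
#include <net/sock.h>

static void nbd_tag_memalloc_socket(struct socket *sock)
{
        /* Mark this socket as serving memory writeout, so that its
         * packets may be fed from the emergency reserve and must not
         * be dropped when ordinary allocations start failing. */
        sk_set_memalloc(sock->sk);
}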
There are parts of the world beyond NBD and iSCSI.
> >you cannot solve the problem of ARP/ICMP/route changes and other control
> >messages, netfilter, IPsec and compression, which can still happen in your
> >setup,
>
> If you bothered to read the patches you would know that ICMP is indeed
> handled. ARP I don't think so, we may need to do that since ARP can
> believably be required for remote disk interface failover. Anybody
> who designs ssh into remote disk failover is an idiot. Ssh for
> configuration, monitoring and administration is fine. Ssh for fencing
> or whatever is just plain stupid, unless the nodes running both server
> and client are not allowed to mount the remote disk.
As far as I recall, only sockets with sk_memalloc set are handled: no ICMP,
no ARP. What about other control messages in a bonding setup or in failover?
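To make the concern concrete: the receive policy I read out of the patches is
essentially "when running on the reserve, keep only traffic for memalloc
sockets and drop the rest". A toy model of that policy (an illustration only,
not the patch code) shows why ARP, bonding and other control traffic is
exactly what gets dropped:

#include <stdio.h>
#include <stdbool.h>

struct toy_sock {
        const char *name;
        bool memalloc;
};

/* Under memory pressure, keep a packet only if its destination socket
 * is marked memalloc; otherwise process everything as usual. */
static bool rx_accept(const struct toy_sock *sk, bool under_pressure)
{
        return !under_pressure || sk->memalloc;
}

int main(void)
{
        struct toy_sock nbd  = { "nbd",               true  };
        struct toy_sock ctrl = { "control (arp etc)", false };

        printf("under pressure: %s %s, %s %s\n",
               nbd.name,  rx_accept(&nbd,  true) ? "kept" : "dropped",
               ctrl.name, rx_accept(&ctrl, true) ? "kept" : "dropped");
        printf("no pressure: %s %s\n",
               ctrl.name, rx_accept(&ctrl, false) ? "kept" : "dropped");
        return 0;
}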
> >if something goes wrong and receiving requires an additional allocation
> >in the network datapath, the system is dead;
> >these strict conditions do not allow flexible control over possible
> >connections and do not allow creating additional connections.
>
> You know that how?
>
> >>Also, I do not think people would like it if say 100M of their 1G system
> >>just disappears, never to be used again for e.g. page-cache in periods of
> >>low network traffic.
> >
> >Just for clarification: the network tree allocator gets 512kb and then
> >increases its cache size when required. The default value can be changed,
> >of course.
>
> Great. Now why does the network layer need its own, invented-in-netland
> allocator? Why can't everybody use your allocator if it is better?
As far as I recall, I have already said several times that there is no
problem with using that allocator in other places; MMU-less systems in
particular will benefit greatly from it (since it was designed for them too).
> Also, please don't get the idea that your allocator by itself solves the
> block IO receive starvation problem. At the very least you need to do
> something about network traffic that is unrelated to forward progress of
> memory writeout, yet can starve the memory writeout. Oh wait, our patch
> set already does that.
It is not my allocator by itself that solves the problem, but the situation
in which the pools are separated!
And you are slowly going in that direction too (a global reserve instead of a
per-socket one is the first step).
The problem is not solved while critical allocations depend on a reserve
which itself depends on the main system's memory conditions.
> Regards,
>
> Daniel
--
Evgeniy Polyakov