From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Indan Zupancic <indan@nul.nu>
Cc: Daniel Phillips <phillips@google.com>,
netdev@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH 2/9] deadlock prevention core
Date: Wed, 09 Aug 2006 21:45:44 +0200 [thread overview]
Message-ID: <1155152744.23134.67.camel@lappy> (raw)
In-Reply-To: <62411.194.109.238.121.1155148442.squirrel@194.109.238.121>
On Wed, 2006-08-09 at 20:34 +0200, Indan Zupancic wrote:
> On Wed, August 9, 2006 16:00, Peter Zijlstra said:
> > On Wed, 2006-08-09 at 15:48 +0200, Indan Zupancic wrote:
> >> On Wed, August 9, 2006 14:54, Peter Zijlstra said:
> >> > On Wed, 2006-08-09 at 14:02 +0200, Indan Zupancic wrote:
> >> >> That avoids lots of checks and should guarantee that the
> >> >> accounting is correct, except in the case when the IFF_MEMALLOC flag is
> >> >> cleared and the counter is set to zero manually. Can't that be avoided and
> >> >> just let it decrease to zero naturally?
> >> >
> >> > That would put the atomic op on the free path unconditionally, I think
> >> > davem gets nightmares from that.
> >>
> >> I confused SOCK_MEMALLOC with sk_buff::memalloc, sorry. What I meant was
> >> to unconditionally decrement the reserved usage only when memalloc is true
> >> on the free path. That way all skbs that increased the reserve also decrease
> >> it, and the counter should never go below zero.
> >
> > OK, so far so good, except we loose the notion of getting memory back
> > from regular skbs.
>
> I don't understand this, regular skbs don't have anything to do with
> rx_reserve_used as far as I can see. I'm only talking about keeping
> that field up to date and correct. rx_reserve_used is only increased
> by a skb when memalloc is set to true on that skb, so only if that field
> is set rx_reserve_used needs to be reduced when the skb is freed.
I know what you ment, and if you've looked at -v2 you'll see that I've
done this, basically because its easier. However the thought behind the
other semantics is, any skb freed will reduce memory pressure.
> Why is it needed for the protocol specific code to call dev_unreserve_skb?
It uses this to get an indication of memory pressure; if we have
memalloc'ed skbs memory pressure must be high, hence we must drop all
non critical packets. But you are right in that this is a problematic
area; the mapping from skb to device is non trivial.
Your suggestion of testing skb->memalloc might work just as good; indeed
if we have regressed into the fallback allocator we know we have
pressure.
> Only problem is if the device can change. rx_reserve_used should probably
> be updated when that happens, as a skb can't use reserved memory on a device
> it was moved away from. (right?)
Well yes, this is a problem, only today have I understood how volatile
the mapping actually is. I think you are right in that transferring the
accounting from the old to the new device is correct solution.
However this brings us the problem of limiting the fallback allocator;
currently this is done in __netdev_alloc_skb where rx_reserve_used it
compared against rx_reserve. If we transfer accounting away this will
not work anymore. I'll have to think about this case, perhaps we already
have a problem here.
> >> Also as far as I can see it should be possible to replace all atomic
> >> "if (unlikely(dev_reserve_used(skb->dev)))" checks witha check if
> >> memalloc is set. That should make davem happy, as there aren't any
> >> atomic instructions left in hot paths.
> >
> > dev_reserve_used() uses atomic_read() which isn't actually a LOCK'ed
> > instruction, so that should not matter.
>
> Perhaps, but the main reason to check memalloc instead of using
> dev_reserve_used is because the latter doesn't tell which skb did the
> reservation.
Very good point indeed.
> >> If IFF_MEMALLOC is set new skbs set memalloc and increase the reserve.
> >
> > Not quite, if IFF_MEMALLOC is set new skbs _could_ get memalloc set. We
> > only fall back to alloc_pages() if the regular path fails to alloc. If the
> > skb is backed by a page (as opposed to kmem_cache fluff) sk_buff::memalloc
> > is set.
>
> Yes, true. But doesn't matter for the rx_reserve_used accounting, as long as
> memalloc set means that it did increase rx_reserve_used.
>
> > Also, I've been thinking (more pain), should I not up the reserve for
> > each SOCK_MEMALLOC socket.
>
> Up rx_reserve_used or the total ammount of reserved memory? Probably 'no' for
> both though, as it's either device specific or skb dependent.
I came up with yes, if for each socket you gain a request queue, the
number of in-flight pages is proportional to the number of sockets.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-08-09 19:45 UTC|newest]
Thread overview: 140+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-08 19:33 [RFC][PATCH 0/9] Network receive deadlock prevention for NBD Peter Zijlstra
2006-08-08 19:33 ` [RFC][PATCH 1/9] pfn_to_kaddr() for UML Peter Zijlstra
2006-08-08 19:33 ` [RFC][PATCH 2/9] deadlock prevention core Peter Zijlstra
2006-08-08 20:57 ` Stephen Hemminger
2006-08-08 21:05 ` Peter Zijlstra
2006-08-09 1:33 ` Daniel Phillips
2006-08-09 1:38 ` David Miller, Daniel Phillips
2006-08-08 21:17 ` Thomas Graf
2006-08-09 1:34 ` Daniel Phillips
2006-08-09 1:39 ` David Miller, Daniel Phillips
2006-08-09 5:47 ` Daniel Phillips
2006-08-09 13:19 ` Thomas Graf
2006-08-09 14:07 ` Peter Zijlstra
2006-08-09 16:18 ` Thomas Graf
2006-08-09 16:19 ` Peter Zijlstra
2006-08-10 0:01 ` David Miller, Peter Zijlstra
2006-08-09 23:58 ` David Miller, Peter Zijlstra
2006-08-10 6:25 ` Peter Zijlstra
2006-08-11 4:24 ` Stephen Hemminger
2006-08-13 21:22 ` Daniel Phillips
2006-08-13 23:49 ` David Miller, Daniel Phillips
2006-08-14 1:15 ` Daniel Phillips
2006-08-11 2:37 ` Rik van Riel
2006-08-13 22:05 ` Daniel Phillips
2006-08-13 23:55 ` David Miller, Daniel Phillips
2006-08-14 1:31 ` Daniel Phillips
2006-08-14 1:53 ` Andrew Morton
2006-08-14 4:40 ` Peter Zijlstra
2006-08-14 4:58 ` Andrew Morton
2006-08-14 5:03 ` Peter Zijlstra
2006-08-14 5:22 ` Andrew Morton
2006-08-14 6:45 ` Peter Zijlstra
2006-08-14 7:07 ` Andrew Morton
2006-08-14 8:15 ` Peter Zijlstra
2006-08-14 8:25 ` Evgeniy Polyakov
2006-08-14 8:35 ` Peter Zijlstra
2006-08-14 8:33 ` David Miller, Andrew Morton
2006-08-17 4:27 ` Daniel Phillips
2006-08-14 7:17 ` Neil Brown
2006-08-14 7:31 ` Evgeniy Polyakov
2006-08-17 3:58 ` Daniel Phillips
2006-08-17 5:57 ` Andrew Morton
2006-08-17 23:53 ` Daniel Phillips
2006-08-18 0:24 ` Rik van Riel
2006-08-18 0:35 ` Daniel Phillips
2006-08-18 1:14 ` Neil Brown
2006-08-18 6:05 ` Andrew Morton
2006-08-18 21:22 ` Daniel Phillips
2006-08-18 22:34 ` Andrew Morton
2006-08-18 23:44 ` Daniel Phillips
2006-08-19 2:44 ` Andrew Morton
2006-08-19 4:14 ` Network receive stall avoidance (was [PATCH 2/9] deadlock prevention core) Daniel Phillips
2006-08-19 7:28 ` Andrew Morton
2006-08-19 15:06 ` [RFC][PATCH 2/9] deadlock prevention core Rik van Riel
2006-08-20 1:33 ` Andre Tomt
2006-08-19 16:53 ` Ray Lee
2006-08-21 13:27 ` Philip R. Auld
2006-08-25 10:47 ` Pavel Machek
2006-08-21 13:38 ` Jens Axboe
2006-08-08 22:10 ` David Miller
2006-08-09 1:35 ` Daniel Phillips
2006-08-09 1:41 ` David Miller, Daniel Phillips
2006-08-09 5:44 ` Daniel Phillips
2006-08-09 7:00 ` Peter Zijlstra
[not found] ` <42414.81.207.0.53.1155080443.squirrel@81.207.0.53>
2006-08-09 0:25 ` Daniel Phillips
2006-08-09 12:02 ` Indan Zupancic
2006-08-09 12:54 ` Peter Zijlstra
2006-08-09 13:48 ` Indan Zupancic
2006-08-09 14:00 ` Peter Zijlstra
2006-08-09 18:34 ` Indan Zupancic
2006-08-09 19:45 ` Peter Zijlstra [this message]
2006-08-09 20:19 ` Peter Zijlstra
2006-08-10 1:21 ` Indan Zupancic
2006-08-09 16:05 ` -v2 " Peter Zijlstra
2006-08-08 19:33 ` [RFC][PATCH 3/9] e1000 driver conversion Peter Zijlstra
2006-08-08 20:50 ` Auke Kok
2006-08-08 20:59 ` Peter Zijlstra
2006-08-08 22:32 ` David Miller, Auke Kok
2006-08-08 22:42 ` Auke Kok
2006-08-08 19:34 ` [RFC][PATCH 4/9] e100 " Peter Zijlstra
2006-08-08 20:13 ` Auke Kok
2006-08-08 20:18 ` Peter Zijlstra
2006-08-08 19:34 ` [RFC][PATCH 5/9] r8169 " Peter Zijlstra
2006-08-08 19:34 ` [RFC][PATCH 6/9] tg3 " Peter Zijlstra
2006-08-08 19:34 ` [RFC][PATCH 7/9] UML eth " Peter Zijlstra
2006-08-08 19:34 ` [RFC][PATCH 8/9] 3c59x " Peter Zijlstra
2006-08-08 23:07 ` Jeff Garzik
2006-08-09 5:51 ` Daniel Phillips
2006-08-09 5:55 ` David Miller, Daniel Phillips
2006-08-09 6:30 ` Jeff Garzik
2006-08-09 7:03 ` Peter Zijlstra
2006-08-09 7:20 ` Jeff Garzik
2006-08-13 19:38 ` Daniel Phillips
2006-08-13 19:53 ` Jeff Garzik
2006-08-08 19:34 ` [RFC][PATCH 9/9] deadlock prevention for NBD Peter Zijlstra
2006-08-09 5:46 ` [RFC][PATCH 0/9] Network receive " Evgeniy Polyakov
2006-08-09 5:52 ` Daniel Phillips
2006-08-09 5:56 ` David Miller, Daniel Phillips
2006-08-09 5:53 ` David Miller, Evgeniy Polyakov
2006-08-09 5:55 ` Evgeniy Polyakov
2006-08-09 12:37 ` Peter Zijlstra
2006-08-09 13:07 ` Evgeniy Polyakov
2006-08-09 13:32 ` Peter Zijlstra
2006-08-09 19:29 ` Evgeniy Polyakov
2006-08-09 23:54 ` David Miller, Peter Zijlstra
2006-08-10 6:06 ` Peter Zijlstra
2006-08-13 20:16 ` Daniel Phillips
2006-08-14 5:13 ` Evgeniy Polyakov
2006-08-14 6:45 ` Peter Zijlstra
2006-08-14 6:54 ` Evgeniy Polyakov
2006-08-17 4:49 ` Daniel Phillips
2006-08-17 4:48 ` Daniel Phillips
2006-08-17 5:36 ` Evgeniy Polyakov
2006-08-17 18:01 ` Daniel Phillips
2006-08-17 18:42 ` Evgeniy Polyakov
2006-08-17 19:15 ` Peter Zijlstra
2006-08-17 19:48 ` Evgeniy Polyakov
2006-08-17 23:24 ` Daniel Phillips
2006-08-18 7:16 ` Evgeniy Polyakov
2006-08-12 3:42 ` Rik van Riel
2006-08-12 8:47 ` Evgeniy Polyakov
2006-08-12 9:19 ` Peter Zijlstra
2006-08-12 9:37 ` Evgeniy Polyakov
2006-08-12 10:18 ` Peter Zijlstra
2006-08-12 10:42 ` Evgeniy Polyakov
2006-08-12 10:51 ` Evgeniy Polyakov
2006-08-12 11:40 ` Peter Zijlstra
2006-08-12 11:53 ` Evgeniy Polyakov
2006-08-13 0:46 ` David Miller, Peter Zijlstra
2006-08-13 1:11 ` Rik van Riel
2006-08-12 14:40 ` Rik van Riel
2006-08-12 14:49 ` Evgeniy Polyakov
2006-08-12 14:56 ` Rik van Riel
2006-08-12 15:08 ` Evgeniy Polyakov
2006-08-12 15:22 ` Peter Zijlstra
2006-08-14 0:56 ` Daniel Phillips
2006-08-13 0:46 ` David Miller, Evgeniy Polyakov
2006-08-13 9:06 ` Evgeniy Polyakov
2006-08-13 9:52 ` Evgeniy Polyakov
2006-08-15 19:17 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1155152744.23134.67.camel@lappy \
--to=a.p.zijlstra@chello.nl \
--cc=indan@nul.nu \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=netdev@vger.kernel.org \
--cc=phillips@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox