From: "Indan Zupancic" <indan@nul.nu>
To: Daniel Phillips <phillips@google.com>
Cc: netdev@vger.kernel.org, Peter Zijlstra <a.p.zijlstra@chello.nl>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH 2/9] deadlock prevention core
Date: Wed, 9 Aug 2006 14:02:36 +0200 (CEST) [thread overview]
Message-ID: <35608.81.207.0.53.1155124956.squirrel@81.207.0.53> (raw)
In-Reply-To: <44D92B78.20408@google.com>
On Wed, August 9, 2006 2:25, Daniel Phillips said:
> Indan Zupancic wrote:
>> Hello,
>> Saw the patch on lkml, and wondered about some things.
>> On Tue, August 8, 2006 21:33, Peter Zijlstra said:
>>
>>>+static inline void dev_unreserve_skb(struct net_device *dev)
>>>+{
>>>+ if (atomic_dec_return(&dev->rx_reserve_used) < 0)
>>>+ atomic_inc(&dev->);
>>>+}
>>>+
>>
>> This looks strange. Is it possible for rx_reserve_used to become negative?
>> A quick look at the code says no, except in the one case where there isn't a
>> "if (unlikely(dev_reserve_used(skb->dev)))" check:
>
> Yes, you can see what I'm trying to do there, I was short an atomic op to
> do it, My ugly solution may well be... probably is racy. Let's rewrite
> it with something better. We want the atomic op that some people call
> "monus": decrement unless zero.
Currently atomic_inc_not_zero(), atomic_add_unless() and atomic_cmpxchg()
exist, so making an atomic_dec_not_zero() should be easy.
>>>@@ -389,6 +480,8 @@ void __kfree_skb(struct sk_buff *skb)
>>> #endif
>>>
>>> kfree_skbmem(skb);
>>>+ if (dev && (dev->flags & IFF_MEMALLOC))
>>>+ dev_unreserve_skb(dev);
>>> }
>>
>> So it seems that that < 0 check in dev_unreserve_skb() was only added to handle
>> this case (though there seems to be a race between those two atomic ops).
>> Isn't it better to remove that check and just do:
>>
>> if (dev && (dev->flags & IFF_MEMALLOC) && dev_reserve_used(dev))
>
> Seems to me that unreserve is also called from each protocol handler.
> Agreed, the < 0 check is fairly sickening.
Yes, but those all do that (racy) "if (unlikely(dev_reserve_used(skb->dev)))"
check. Adding another racy check doesn't improve the situation.
>> The use of atomic seems a bit dubious. Either it's necessary, in which case
>> changes depending on tests seem unsafe as they're not atomic and something
>> crucial could change between the read and the check, or normal reads and writes
>> combined with barriers would be sufficient. All in all it seems better to
>> move that "if (unlikely(dev_reserve_used(skb->dev)))" check into
>> dev_unreserve_skb(), and make the whole atomic if necessary. Then let
>> dev_unreserve_skb() return wether rx_reserve_used was positive.
>> Getting rid of dev_reserve_used() and using the atomic_read directly might be
>> better, as it is set with the bare atomic instructions too and rarely used without
>> dev_unreserve_skb().
>
> Barriers should work for this reserve accounting, but is that better than
> an atomic op? I don't know, let's let the barrier mavens opine.
The atomic ops are fine, but if you do two atomic ops then that as a whole
isn't atomic any more and often racy. That's what seems to be the case
here.
> IMHO the cleanest thing to do is code up "monus", in fact I dimly recall
> somebody already added something similar.
I'm not sure, to me it looks like dev_unreserve_skb() is always called
without really knowing if it is justified or not, or else there wouldn't
be a chance that the counter became negative. So avoiding the negative
reserve usage seems like papering over bad accounting.
The assumption made seems to be that if there's reserve used, then it must
be us using it, and it's unreserved. So it appears that either that
assumption is wrong, and we can unreserve for others while we never
reserved for ourselves, or it is correct, in which case it probably makes
more sense to check for the IFF_MEMALLOC flag.
All in all it seems like a per skb flag which tells us if this skb was the
one reserving anything is missing. Or rx_reserve_used must be updated for
all in flight skbs whenever the IFF_MEMALLOC flag changes, so that we can
be sure that the accounting works correctly. Oh wait, isn't that what the
memalloc flag is for? So shouldn't it be sufficient to check only with
sk_is_memalloc()? That avoids lots of checks and should guarantee that the
accounting is correct, except in the case when the IFF_MEMALLOC flag is
cleared and the counter is set to zero manually. Can't that be avoided and
just let it decrease to zero naturally? So checking IFF_MEMALLOC for new
skbs and use sk_is_memalloc() for existing ones seems workable, if I'm not
missing anything (I think I do).
> Side note: please don't be shy, just reply-all in future so the discussion
> stays public. Part of what we do is try to share our development process
> so people see not only what we have done, but why we did it. (And sometimes
> so they can see what dumb mistakes we make, but I won't get into that...)
Yes, I know, but as I don't have much kernel programming experience I
didn't want to add unnecessary noise.
> I beg for forgiveness in advance having taken the liberty of CCing this
> reply to lkml.
No problem.
Greetings,
Indan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-08-09 12:02 UTC|newest]
Thread overview: 140+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-08 19:33 [RFC][PATCH 0/9] Network receive deadlock prevention for NBD Peter Zijlstra
2006-08-08 19:33 ` [RFC][PATCH 1/9] pfn_to_kaddr() for UML Peter Zijlstra
2006-08-08 19:33 ` [RFC][PATCH 2/9] deadlock prevention core Peter Zijlstra
2006-08-08 20:57 ` Stephen Hemminger
2006-08-08 21:05 ` Peter Zijlstra
2006-08-09 1:33 ` Daniel Phillips
2006-08-09 1:38 ` David Miller, Daniel Phillips
2006-08-08 21:17 ` Thomas Graf
2006-08-09 1:34 ` Daniel Phillips
2006-08-09 1:39 ` David Miller, Daniel Phillips
2006-08-09 5:47 ` Daniel Phillips
2006-08-09 13:19 ` Thomas Graf
2006-08-09 14:07 ` Peter Zijlstra
2006-08-09 16:18 ` Thomas Graf
2006-08-09 16:19 ` Peter Zijlstra
2006-08-10 0:01 ` David Miller, Peter Zijlstra
2006-08-09 23:58 ` David Miller, Peter Zijlstra
2006-08-10 6:25 ` Peter Zijlstra
2006-08-11 4:24 ` Stephen Hemminger
2006-08-13 21:22 ` Daniel Phillips
2006-08-13 23:49 ` David Miller, Daniel Phillips
2006-08-14 1:15 ` Daniel Phillips
2006-08-11 2:37 ` Rik van Riel
2006-08-13 22:05 ` Daniel Phillips
2006-08-13 23:55 ` David Miller, Daniel Phillips
2006-08-14 1:31 ` Daniel Phillips
2006-08-14 1:53 ` Andrew Morton
2006-08-14 4:40 ` Peter Zijlstra
2006-08-14 4:58 ` Andrew Morton
2006-08-14 5:03 ` Peter Zijlstra
2006-08-14 5:22 ` Andrew Morton
2006-08-14 6:45 ` Peter Zijlstra
2006-08-14 7:07 ` Andrew Morton
2006-08-14 8:15 ` Peter Zijlstra
2006-08-14 8:25 ` Evgeniy Polyakov
2006-08-14 8:35 ` Peter Zijlstra
2006-08-14 8:33 ` David Miller, Andrew Morton
2006-08-17 4:27 ` Daniel Phillips
2006-08-14 7:17 ` Neil Brown
2006-08-14 7:31 ` Evgeniy Polyakov
2006-08-17 3:58 ` Daniel Phillips
2006-08-17 5:57 ` Andrew Morton
2006-08-17 23:53 ` Daniel Phillips
2006-08-18 0:24 ` Rik van Riel
2006-08-18 0:35 ` Daniel Phillips
2006-08-18 1:14 ` Neil Brown
2006-08-18 6:05 ` Andrew Morton
2006-08-18 21:22 ` Daniel Phillips
2006-08-18 22:34 ` Andrew Morton
2006-08-18 23:44 ` Daniel Phillips
2006-08-19 2:44 ` Andrew Morton
2006-08-19 4:14 ` Network receive stall avoidance (was [PATCH 2/9] deadlock prevention core) Daniel Phillips
2006-08-19 7:28 ` Andrew Morton
2006-08-19 15:06 ` [RFC][PATCH 2/9] deadlock prevention core Rik van Riel
2006-08-20 1:33 ` Andre Tomt
2006-08-19 16:53 ` Ray Lee
2006-08-21 13:27 ` Philip R. Auld
2006-08-25 10:47 ` Pavel Machek
2006-08-21 13:38 ` Jens Axboe
2006-08-08 22:10 ` David Miller
2006-08-09 1:35 ` Daniel Phillips
2006-08-09 1:41 ` David Miller, Daniel Phillips
2006-08-09 5:44 ` Daniel Phillips
2006-08-09 7:00 ` Peter Zijlstra
[not found] ` <42414.81.207.0.53.1155080443.squirrel@81.207.0.53>
2006-08-09 0:25 ` Daniel Phillips
2006-08-09 12:02 ` Indan Zupancic [this message]
2006-08-09 12:54 ` Peter Zijlstra
2006-08-09 13:48 ` Indan Zupancic
2006-08-09 14:00 ` Peter Zijlstra
2006-08-09 18:34 ` Indan Zupancic
2006-08-09 19:45 ` Peter Zijlstra
2006-08-09 20:19 ` Peter Zijlstra
2006-08-10 1:21 ` Indan Zupancic
2006-08-09 16:05 ` -v2 " Peter Zijlstra
2006-08-08 19:33 ` [RFC][PATCH 3/9] e1000 driver conversion Peter Zijlstra
2006-08-08 20:50 ` Auke Kok
2006-08-08 20:59 ` Peter Zijlstra
2006-08-08 22:32 ` David Miller, Auke Kok
2006-08-08 22:42 ` Auke Kok
2006-08-08 19:34 ` [RFC][PATCH 4/9] e100 " Peter Zijlstra
2006-08-08 20:13 ` Auke Kok
2006-08-08 20:18 ` Peter Zijlstra
2006-08-08 19:34 ` [RFC][PATCH 5/9] r8169 " Peter Zijlstra
2006-08-08 19:34 ` [RFC][PATCH 6/9] tg3 " Peter Zijlstra
2006-08-08 19:34 ` [RFC][PATCH 7/9] UML eth " Peter Zijlstra
2006-08-08 19:34 ` [RFC][PATCH 8/9] 3c59x " Peter Zijlstra
2006-08-08 23:07 ` Jeff Garzik
2006-08-09 5:51 ` Daniel Phillips
2006-08-09 5:55 ` David Miller, Daniel Phillips
2006-08-09 6:30 ` Jeff Garzik
2006-08-09 7:03 ` Peter Zijlstra
2006-08-09 7:20 ` Jeff Garzik
2006-08-13 19:38 ` Daniel Phillips
2006-08-13 19:53 ` Jeff Garzik
2006-08-08 19:34 ` [RFC][PATCH 9/9] deadlock prevention for NBD Peter Zijlstra
2006-08-09 5:46 ` [RFC][PATCH 0/9] Network receive " Evgeniy Polyakov
2006-08-09 5:52 ` Daniel Phillips
2006-08-09 5:56 ` David Miller, Daniel Phillips
2006-08-09 5:53 ` David Miller, Evgeniy Polyakov
2006-08-09 5:55 ` Evgeniy Polyakov
2006-08-09 12:37 ` Peter Zijlstra
2006-08-09 13:07 ` Evgeniy Polyakov
2006-08-09 13:32 ` Peter Zijlstra
2006-08-09 19:29 ` Evgeniy Polyakov
2006-08-09 23:54 ` David Miller, Peter Zijlstra
2006-08-10 6:06 ` Peter Zijlstra
2006-08-13 20:16 ` Daniel Phillips
2006-08-14 5:13 ` Evgeniy Polyakov
2006-08-14 6:45 ` Peter Zijlstra
2006-08-14 6:54 ` Evgeniy Polyakov
2006-08-17 4:49 ` Daniel Phillips
2006-08-17 4:48 ` Daniel Phillips
2006-08-17 5:36 ` Evgeniy Polyakov
2006-08-17 18:01 ` Daniel Phillips
2006-08-17 18:42 ` Evgeniy Polyakov
2006-08-17 19:15 ` Peter Zijlstra
2006-08-17 19:48 ` Evgeniy Polyakov
2006-08-17 23:24 ` Daniel Phillips
2006-08-18 7:16 ` Evgeniy Polyakov
2006-08-12 3:42 ` Rik van Riel
2006-08-12 8:47 ` Evgeniy Polyakov
2006-08-12 9:19 ` Peter Zijlstra
2006-08-12 9:37 ` Evgeniy Polyakov
2006-08-12 10:18 ` Peter Zijlstra
2006-08-12 10:42 ` Evgeniy Polyakov
2006-08-12 10:51 ` Evgeniy Polyakov
2006-08-12 11:40 ` Peter Zijlstra
2006-08-12 11:53 ` Evgeniy Polyakov
2006-08-13 0:46 ` David Miller, Peter Zijlstra
2006-08-13 1:11 ` Rik van Riel
2006-08-12 14:40 ` Rik van Riel
2006-08-12 14:49 ` Evgeniy Polyakov
2006-08-12 14:56 ` Rik van Riel
2006-08-12 15:08 ` Evgeniy Polyakov
2006-08-12 15:22 ` Peter Zijlstra
2006-08-14 0:56 ` Daniel Phillips
2006-08-13 0:46 ` David Miller, Evgeniy Polyakov
2006-08-13 9:06 ` Evgeniy Polyakov
2006-08-13 9:52 ` Evgeniy Polyakov
2006-08-15 19:17 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=35608.81.207.0.53.1155124956.squirrel@81.207.0.53 \
--to=indan@nul.nu \
--cc=a.p.zijlstra@chello.nl \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=netdev@vger.kernel.org \
--cc=phillips@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox