From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>, netdev <netdev@vger.kernel.org>,
Daniel Phillips <phillips@google.com>,
David Miller <davem@davemloft.net>, Thomas Graf <tgraf@suug.ch>,
Indan Zupancic <indan@nul.nu>
Subject: Re: [RFC][PATCH] VM deadlock prevention core -v3
Date: Thu, 10 Aug 2006 18:50:40 +0200 [thread overview]
Message-ID: <1155228640.5696.55.camel@twins> (raw)
In-Reply-To: <20060810162229.GA27364@2ka.mipt.ru>
> > > Tricky, but since you are using own allocator here, you could change it to
> > > be not so aggressive - i.e. do not round size to number of pages.
> >
> > I'm not sure I follow you, I'm explicitly using
> > alloc_pages()/free_page(), if
> > I were to go smart here, I'd loose the whole reason for doing so.
>
> You can use page to put there several skbs for example or at least add
> there a fclone (fast clone).
fclone support is there.
> > > > +struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
> > > > + unsigned length, gfp_t gfp_mask)
> > > > +{
> > > > + struct sk_buff *skb;
> > > > +
> > > > + WARN_ON(gfp_mask & (__GFP_NOMEMALLOC | __GFP_MEMALLOC));
> > > > + gfp_mask &= ~(__GFP_NOMEMALLOC | __GFP_MEMALLOC);
> > > > +
> > > > + skb = ___netdev_alloc_skb(dev, length, gfp_mask | __GFP_NOMEMALLOC);
> > > > + if (skb)
> > > > + goto done;
> > > > +
> > > > + if (atomic_read(&dev->rx_reserve_used) >=
> > > > + dev->rx_reserve * dev->memalloc_socks)
> > > > + goto out;
> > > > +
> > > > + /*
> > > > + * pre-inc guards against a race with netdev_wait_memalloc()
> > > > + */
> > > > + atomic_inc(&dev->rx_reserve_used);
> > > > + skb = ___netdev_alloc_skb(dev, length, gfp_mask | __GFP_MEMALLOC);
> > > > + if (unlikely(!skb)) {
> > > > + atomic_dec(&dev->rx_reserve_used);
> > > > + goto out;
> > > > + }
> > >
> > > Since you have added atomic operation in that path, you can use device's
> > > reference counter instead and do not care that it can dissapear.
> >
> > Is that the sole reason taking a reference on the device is bad?
>
> Taking a reference is bad due to performance reasons, since atomic
> increment is not that cheap. If you do it for one variable for the
> purpose of reference counting you can use device's refcnt istead, which
> will solve some races.
Yes, I understand you. However I'm not sure if performance is the only
reason not to take a refcount on the device. Anyway, I think I have just
been convinced to abandon the per device thing and go global.
> > > > @@ -434,6 +567,12 @@ struct sk_buff *skb_clone(struct sk_buff
> > > > n->fclone = SKB_FCLONE_CLONE;
> > > > atomic_inc(fclone_ref);
> > > > } else {
> > > > + /*
> > > > + * should we special-case skb->memalloc cloning?
> > > > + * for now fudge it by forcing fast-clone alloc.
> > > > + */
> > > > + BUG_ON(skb->memalloc);
> > > > +
> > > > n = kmem_cache_alloc(skbuff_head_cache, gfp_mask);
> > > > if (!n)
> > > > return NULL;
> > >
> > > Ugh... cloning is a one of the shoulders of giant where Linux network
> > > stack is staying...
> >
> > Yes, I'm aware of that, I have a plan to fix this, however I haven't had
> > time
> > to implement it. My immediate concern is the point wrt. the net_device
> > mapping.
> >
> > My idea was: instead of the order, store the size, and allocate clone
> > skbuffs in the available room at the end of the page(s), allocating
> > extra pages
> > if needed.
>
> You can check if requested skb with fclone fits allocated pages, and if
> so use fclone magic, otherwise postpone clone allocation until it is
> required.
Yes the fclone magic works, however that will only let you have one
clone.
I'm just not confident no receive path will ever exceed that.
> Sockets can live without network devices at all, I expect it is enough
> to clean up in socket destructor, since packets can come from
> different devices into the same socket.
You are right if the reserve wasn't device bound - which I will abandon
because you are right that with multi-path routing, bridge device and
other advanced goodies this scheme is broken in that there is no
unambiguous
mapping from sockets to devices.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-08-10 16:50 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-10 13:32 Peter Zijlstra
2006-08-10 14:02 ` Evgeniy Polyakov
2006-08-10 14:46 ` Peter Zijlstra
2006-08-10 16:22 ` Evgeniy Polyakov
2006-08-10 16:50 ` Peter Zijlstra [this message]
2006-08-11 0:45 ` Indan Zupancic
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1155228640.5696.55.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=davem@davemloft.net \
--cc=indan@nul.nu \
--cc=johnpol@2ka.mipt.ru \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=netdev@vger.kernel.org \
--cc=phillips@google.com \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox