linux-mm.kvack.org archive mirror
From: Eric Dumazet <eric.dumazet@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>,
	David Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>, Ingo Molnar <mingo@elte.hu>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, Pavel Emelyanov <xemul@openvz.org>,
	Daniel Lezcano <daniel.lezcano@free.fr>
Subject: [PATCH] tcp: fix inet_twsk_deschedule()
Date: Sat, 19 Feb 2011 09:35:56 +0100
Message-ID: <1298104556.8559.21.camel@edumazet-laptop>
In-Reply-To: <m1sjvl2i3q.fsf@fess.ebiederm.org>

On Friday, 18 February 2011 at 12:38 -0800, Eric W. Biederman wrote:
> Arnaldo Carvalho de Melo <acme@redhat.com> writes:
> 
> > On Fri, Feb 18, 2011 at 05:01:28PM -0200, Arnaldo Carvalho de Melo wrote:
> >> On Fri, Feb 18, 2011 at 10:48:18AM -0800, Linus Torvalds wrote:
> >> > This seems to be a fairly straightforward bug.
> >> > 
> >> > In net/ipv4/inet_timewait_sock.c we have this:
> >> > 
> >> >   /* These are always called from BH context.  See callers in
> >> >    * tcp_input.c to verify this.
> >> >    */
> >> > 
> >> >   /* This is for handling early-kills of TIME_WAIT sockets. */
> >> >   void inet_twsk_deschedule(struct inet_timewait_sock *tw,
> >> >                             struct inet_timewait_death_row *twdr)
> >> >   {
> >> >           spin_lock(&twdr->death_lock);
> >> >           ..
> >> > 
> >> > and the intention is clearly that that spin_lock is BH-safe because
> >> > it's called from BH context.
> >> > 
> >> > Except that clearly isn't true. It's called from a worker thread:
> >> > 
> >> > > stack backtrace:
> >> > > Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
> >> > > Call Trace:
> >> > >  [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
> >> > >  [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180
> >> > >  [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180
> >> > >  [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20
> >> > >  [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60
> >> > >  [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0
> >> > >  [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0
> >> > >  [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0
> >> > >  [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0
> >> > >  [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330
> >> > 
> >> > so it can deadlock with a BH happening at the same time, afaik.
> >> > 
> >> > The code (and comment) is all from 2005, it looks like the BH->worker
> >> > thread has broken the code. But somebody who knows that code better
> >> > should take a deeper look at it.
> >> > 
> >> > Added acme to the cc, since the code is attributed to him back in 2005
> >> > ;). Although I don't know how active he's been in networking lately
> >> > (seems to be all perf-related). Whatever, it can't hurt.
> >> 
> >> Original code is ANK's, I just made it possible to use with DCCP, and
> >> yeah, the smiley is appropriate, something 6 years old and the world
> >> around it changing continually... well, thanks for the git blame ;-)
> >
> > But yeah, your analysis seems correct, with the bug being introduced by
> > one of these world around it changing continually issues, networking
> > namespaces broke the rules of the game on its cleanup_net() routine,
> > adding Pavel to the CC list since it doesn't hurt ;-)
> 
> Which probably gets the bug back around to me.
> 
> I guess this must be one of those ipv4 cases where the cleanup
> simply did not exist in the rmmod sense that we had to invent.
> 
> I think that was Daniel who did the time wait sockets.  I do remember
> they were a real pain.
> 
> Would a bh_disable be sufficient?  I guess I should stop remembering and
> look at the code now.
> 

Here is the patch to fix the problem.

Daniel's commit d315492b1a6ba29d (netns : fix kernel panic in timewait
socket destruction) was OK (it did use local_bh_disable()).

The problem comes from commit 575f4cd5a5b6394577
(net: Use rcu lookups in inet_twsk_purge.), added in 2.6.33.

Thanks!

[PATCH] tcp: fix inet_twsk_deschedule()

Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule().

inet_twsk_purge() runs from process context, and
commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.)
removed the BH disabling that was necessary there.

Add the BH disabling back, but fine-grained: right around the call to
inet_twsk_deschedule() instead of around the whole function.

With help from Linus Torvalds and Eric W. Biederman

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Lezcano <daniel.lezcano@free.fr>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: stable <stable@kernel.org> (# 2.6.33+)
---
 net/ipv4/inet_timewait_sock.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index c5af909..3c8dfa1 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -505,7 +505,9 @@ restart:
 			}
 
 			rcu_read_unlock();
+			local_bh_disable();
 			inet_twsk_deschedule(tw, twdr);
+			local_bh_enable();
 			inet_twsk_put(tw);
 			goto restart_rcu;
 		}
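
For readers who want to see why the two added lines matter, here is a
minimal userspace sketch of the race. It is only a model under stated
assumptions: the struct and every model_* function are hypothetical
stand-ins invented for illustration, not kernel code, and a single flag
plays the role of twdr->death_lock. It models one CPU on which a
timewait softirq is pending, and reports whether that softirq would end
up spinning on a lock already held by the task it interrupted.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: none of this is the kernel's real implementation. */
struct model_cpu {
	int  bh_disable_depth;	/* > 0: softirqs are deferred on this CPU */
	bool death_lock_held;	/* stands in for twdr->death_lock */
	bool softirq_pending;	/* a timewait softirq is waiting to run */
	bool deadlocked;	/* softirq spun on a lock its own CPU holds */
};

/* The softirq wants death_lock; if the interrupted task holds it, it spins forever. */
static void model_run_softirq(struct model_cpu *cpu)
{
	cpu->softirq_pending = false;
	if (cpu->death_lock_held)
		cpu->deadlocked = true;
}

/* A point where an interrupt may arrive and run pending softirqs on exit. */
static void model_irq_window(struct model_cpu *cpu)
{
	if (cpu->softirq_pending && cpu->bh_disable_depth == 0)
		model_run_softirq(cpu);
}

static void model_local_bh_disable(struct model_cpu *cpu)
{
	cpu->bh_disable_depth++;
}

static void model_local_bh_enable(struct model_cpu *cpu)
{
	assert(cpu->bh_disable_depth > 0);
	cpu->bh_disable_depth--;
	model_irq_window(cpu);	/* deferred softirqs run once BH is enabled again */
}

/* Models inet_twsk_deschedule(): plain lock, work, unlock. */
static void model_deschedule(struct model_cpu *cpu)
{
	cpu->death_lock_held = true;
	model_irq_window(cpu);	/* interrupt arrives while the lock is held */
	cpu->death_lock_held = false;
}

/* Models the purge loop body; returns true if the sequence would deadlock. */
bool model_purge(bool disable_bh)
{
	struct model_cpu cpu = { .softirq_pending = true };

	if (disable_bh)
		model_local_bh_disable(&cpu);
	model_deschedule(&cpu);
	if (disable_bh)
		model_local_bh_enable(&cpu);
	return cpu.deadlocked;
}
```

In this model, model_purge(false) corresponds to the code before the
patch and reports a deadlock, while model_purge(true) corresponds to the
patched sequence: the softirq is held off until the lock has been
released, so no deadlock is reported.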



