From: Linus Torvalds <torvalds@linux-foundation.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>, Ingo Molnar <mingo@elte.hu>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Octavian Purdila <opurdila@ixiacom.com>,
David Miller <davem@davemloft.net>
Subject: Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4
Date: Thu, 17 Feb 2011 09:07:47 -0800 [thread overview]
Message-ID: <AANLkTikUF+gz8H3SkW4NhD8SOT5b4bxnpcJgsVU+G-bC@mail.gmail.com> (raw)
In-Reply-To: <1297960574.2769.20.camel@edumazet-laptop>
[-- Attachment #1: Type: text/plain, Size: 3645 bytes --]
On Thu, Feb 17, 2011 at 8:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 17 février 2011 à 08:13 -0800, Linus Torvalds a écrit :
>>
>> Nope, that's roughly what I did to (in addition to doing all the .ko
>> files and checking for 0xe68 too). Which made me worry that the 0x1e68
>> offset is actually just the stack offset at some random code-path (it
>> would stay constant for a particular kernel if there is only one way
>> to reach that code, and it's always reached through some stable
>> non-irq entrypoint).
>>
>> People do use on-stack lists, and if you do it wrong I could imagine a
>> stale list entry still pointing to the stack later. And while
>> INIT_LIST_HEAD() is one pattern to get that "two consecutive words
>> pointing to themselves", so is doing a "list_del()" on the last list
>> entry that the head points to.
>>
>> So _if_ somebody has a list_head on the stack, and leaves a stale list
>> entry pointing to it, and then later on, when the stack has been
>> released that stale list entry is deleted with "list_del()", you'd see
>> the same memory corruption pattern. But I'm not aware of any new code
>> that would do anything like that.
>>
>> So I'm stumped, which is why I'm just hoping that extra debugging
>> options would catch it closer to the place where it actually occurs.
>> The "2kB allocation with a nice compile-time structure offset" sounded
>> like _such_ a great way to catch it, but it clearly doesn't :(
>>
>>
>
> Hmm, this rings a bell here.
>
> Unfortunately I have to run so cannot check right now.
>
> Please take a look at commit 443457242beb6716b43db4d (net: factorize
> sync-rcu call in unregister_netdevice_many)
>
> CC David and Octavian
>
> dev_close_many() can apparently return with an non empty list
Uhhuh. That does look scary. This would also explain why so few people
see it, and why it's often close to exit.
That __dev_close() looks very scary. When it does
static int __dev_close(struct net_device *dev)
{
LIST_HEAD(single);
list_add(&dev->unreg_list, &single);
return __dev_close_many(&single);
}
it leaves that "dev->unreg_list" entry on the on-stack "single" list.
"dev_close()" does the same.
So if "dev->unreg_list" is _ever_ touched afterwards (without being
re-initialized), you've got list corruption. And it does look like
default_device_exit_batch() does that by doing a
"unregister_netdevice_queue(dev, &dev_kill_list);" which in turn does
"list_move_tail(&dev->unreg_list, head);" which is basically just an
optimized list_del+list_add.
I haven't looked through the cases all that closely, so I might be
missing something that re-initializes the queue. But it does look
likely, and would explain why it's seen after a suspend (that takes
down the networking), and I presume Eric has been doing various
network device actions too, no?
Even if there is some guarantee that "dev->unreg_list" is never used
afterwards, I would _still_ strongly suggest that nobody ever leave
random pending on-stack list entries around when the function (that
owns the stack) exits. So at a minimum, you'd do something like the
attached.
TOTALLY UNTESTED PATCH! And I repeat: I don't know the code. I just
know "that looks damn scary".
[ Btw, that also shows another problem: "list_move()" doesn't trigger
the debugging checks when it does the __list_del(). So
CONFIG_DEBUG_LIST would never have caught the fact that the
"list_move()" was done on a list-entry that didn't have valid list
pointers any more. ]
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 758 bytes --]
net/core/dev.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 8e726cb..a18c164 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1280,10 +1280,13 @@ static int __dev_close_many(struct list_head *head)
static int __dev_close(struct net_device *dev)
{
+ int retval;
LIST_HEAD(single);
list_add(&dev->unreg_list, &single);
- return __dev_close_many(&single);
+ retval = __dev_close_many(&single);
+ list_del(&single);
+ return retval;
}
int dev_close_many(struct list_head *head)
@@ -1325,7 +1328,7 @@ int dev_close(struct net_device *dev)
list_add(&dev->unreg_list, &single);
dev_close_many(&single);
-
+ list_del(&single);
return 0;
}
EXPORT_SYMBOL(dev_close);
next prev parent reply other threads:[~2011-02-17 17:08 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-16 18:52 Michal Hocko
2011-02-16 19:37 ` Ingo Molnar
2011-02-16 19:50 ` Linus Torvalds
2011-02-16 20:09 ` Linus Torvalds
2011-02-16 20:51 ` Linus Torvalds
2011-02-17 9:09 ` Michal Hocko
2011-02-17 16:13 ` Linus Torvalds
2011-02-17 16:26 ` Michal Hocko
2011-02-17 16:35 ` Ingo Molnar
2011-02-17 18:57 ` Eric W. Biederman
2011-02-17 19:11 ` Linus Torvalds
2011-02-17 19:31 ` Eric W. Biederman
2011-02-18 3:16 ` Eric W. Biederman
2011-02-18 4:30 ` Linus Torvalds
2011-02-18 4:36 ` David Miller
2011-02-18 6:25 ` Eric Dumazet
2011-02-18 7:29 ` Eric Dumazet
2011-02-18 8:54 ` [PATCH 1/2] net: dont leave active on stack LIST_HEAD Eric Dumazet
2011-02-18 20:14 ` David Miller
2011-02-18 4:38 ` BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4 Linus Torvalds
2011-02-18 4:40 ` David Miller
2011-02-18 4:57 ` Linus Torvalds
2011-02-18 8:29 ` Eric W. Biederman
2011-02-18 5:20 ` Eric W. Biederman
2011-02-18 8:41 ` Eric Dumazet
2011-02-18 8:59 ` [PATCH 2/2] net: deinit automatic LIST_HEAD Eric Dumazet
2011-02-18 20:14 ` David Miller
2011-02-18 12:29 ` BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4 Michal Hocko
2011-02-18 16:26 ` Michal Hocko
2011-02-18 16:39 ` Linus Torvalds
2011-02-18 18:08 ` Eric W. Biederman
2011-02-18 18:48 ` Linus Torvalds
2011-02-18 19:01 ` Arnaldo Carvalho de Melo
2011-02-18 19:11 ` Arnaldo Carvalho de Melo
2011-02-18 20:38 ` Eric W. Biederman
2011-02-19 8:35 ` [PATCH] tcp: fix inet_twsk_deschedule() Eric Dumazet
2011-02-20 2:59 ` David Miller
2011-02-18 19:13 ` BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4 Eric Dumazet
2011-02-18 19:56 ` David Miller
2011-02-19 6:22 ` Eric W. Biederman
2011-02-19 15:33 ` Linus Torvalds
2011-02-20 2:01 ` Eric W. Biederman
2011-02-20 6:15 ` Linus Torvalds
2011-02-20 8:27 ` Eric Dumazet
2011-02-20 19:53 ` David Miller
2011-02-20 21:34 ` Eric W. Biederman
2011-02-18 8:54 ` Michal Hocko
2011-02-20 12:43 ` Ingo Molnar
2011-02-17 16:36 ` Eric Dumazet
2011-02-17 17:07 ` Linus Torvalds [this message]
2011-02-17 19:36 ` Eric Dumazet
2011-02-17 20:18 ` Linus Torvalds
2011-02-16 20:13 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=AANLkTikUF+gz8H3SkW4NhD8SOT5b4bxnpcJgsVU+G-bC@mail.gmail.com \
--to=torvalds@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=mingo@elte.hu \
--cc=opurdila@ixiacom.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox