From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f43.google.com (mail-wm0-f43.google.com [74.125.82.43]) by kanga.kvack.org (Postfix) with ESMTP id 09D6D828E4 for ; Mon, 29 Feb 2016 05:24:43 -0500 (EST) Received: by mail-wm0-f43.google.com with SMTP id p65so38708455wmp.0 for ; Mon, 29 Feb 2016 02:24:42 -0800 (PST) Received: from mail-wm0-x22f.google.com (mail-wm0-x22f.google.com. [2a00:1450:400c:c09::22f]) by mx.google.com with ESMTPS id q16si19612580wmb.111.2016.02.29.02.24.41 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 Feb 2016 02:24:42 -0800 (PST) Received: by mail-wm0-x22f.google.com with SMTP id p65so38707964wmp.0 for ; Mon, 29 Feb 2016 02:24:41 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <20160228234657.GA28225@MBP.local> From: Dmitry Vyukov Date: Mon, 29 Feb 2016 11:24:21 +0100 Message-ID: Subject: Re: tty: memory leak in tty_register_driver Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Catalin Marinas Cc: Greg Kroah-Hartman , Jiri Slaby , LKML , Peter Hurley , One Thousand Gnomes , J Freyensee , "linux-mm@kvack.org" , Paul Bolle , Alexander Potapenko , Kostya Serebryany , Sasha Levin , syzkaller On Mon, Feb 29, 2016 at 11:22 AM, Dmitry Vyukov wrote: > On Mon, Feb 29, 2016 at 12:47 AM, Catalin Marinas > wrote: >> On Sun, Feb 28, 2016 at 05:42:24PM +0100, Dmitry Vyukov wrote: >>> On Mon, Feb 15, 2016 at 11:42 AM, Dmitry Vyukov wrote: >>> > When I am running the following program in a parallel loop, kmemleak >>> > starts reporting memory leaks of objects allocated in >>> > tty_register_driver during boot. These leaks start popping up >>> > chaotically and as you can see they originate in different drivers >>> > (synclinkmp_init, isdn_init, chr_dev_init, sysfs_init). >>> > >>> > On commit 388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95 (4.5-rc3). >> [...] >>> > unreferenced object 0xffff88006708dc20 (size 8): >>> > comm "swapper/0", pid 1, jiffies 4294672590 (age 930.839s) >>> > hex dump (first 8 bytes): >>> > 74 74 79 53 4c 4d 38 00 ttySLM8. >>> > backtrace: >>> > [] __kmalloc_track_caller+0x1b0/0x320 mm/slub.c:4068 >>> > [] kstrdup+0x39/0x70 mm/util.c:53 >>> > [] kstrdup_const+0x46/0x60 mm/util.c:74 >>> > [] __kernfs_new_node+0x2b/0x2b0 fs/kernfs/dir.c:536 >>> > [] kernfs_new_node+0x80/0xe0 fs/kernfs/dir.c:572 >>> > [] kernfs_create_link+0x33/0x150 fs/kernfs/symlink.c:32 >>> > [] sysfs_do_create_link_sd.isra.2+0x8b/0x120 >> [...] >>> +Catalin (kmemleak maintainer) >>> >>> I am noticed a weird thing. I am not 100% sure but it seems that the >>> leaks are reported iff I run leak checking concurrently with the >>> programs running. And if I run the program several thousand times and >>> then run leak checking, then no leaks reported. >>> >>> Catalin, it is possible that it is a kmemleak false positive? >> >> Yes, it's possible. If you run kmemleak scanning continuously (or at >> very short intervals) and especially in parallel with some intensive >> tasks, it will miss pointers that may be stored in registers (on other >> CPUs) or moved between task stacks, other memory locations. Linked lists >> are especially prone to such false positives. >> >> Kmemleak tries to work around this by checksumming each object, so it >> will only be reported if it hasn't changed on two consecutive scans. >> Since the default scanning is 10min, it is very unlikely to trigger >> false positives in such scenarios. However, if you reduce the scanning >> time (or trigger it manually in a loop), you can hit this condition. >> >>> I see that kmemleak just scans thread stacks one-by-one. I would >>> expect that kmemleak should stop all threads, then scan all stacks and >>> all registers of all threads, and then restart threads. If it does not >>> scan registers or does not stop threads, then I think it should be >>> possible that a pointer value can sneak off kmemleak. Does it make >>> sense? >> >> Given how long it takes to scan the memory, stopping the threads is not >> really feasible. You could do something like stop_machine() only for >> scanning the current stack on all CPUs but it still wouldn't catch >> pointers being moved around in memory unless you stop the system >> completely for a full scan. The heuristic about periodic scanning and >> checksumming seems to work fine in normal usage scenarios. >> >> For your tests, I would recommend that you run the tests for a long(ish) >> time and only do two kmemleak scans at the end after they finished (and >> with a few seconds delay between them). Continuous scanning is less >> reliable. > > > Thanks for the explanation, Catalin! > > Let me describe my usage scenario first. I am running automatic > testing 24x7. Currently a VM executes a dozen of small programs (a > dozen of syscalls each), then I run manual leak scanning. I can't run > significantly more programs between scans, because then I won't be > able to restore reproducers for bugs and they will be unactionable. I > could run leak checking after each program, but it will increase > overhead significantly. So a dozen of programs is a trade-off. And I > disable automatic scanning. > > False positives are super unpleasant in automatic testing. If a tool > false positive rate if high, I just disable it, it is unusable. It is > not that bad for leak checking. But each false positive consumes human > (my) time. > > So I need to run scanning twice, because the first one never reports leaks. > > For the false positives due to registers/pointer jumping, will it help > if I run scanning one more time if leaks are detected? I mean: run > scanning twice, if leaks are found sleep for several seconds and run > scanning third time. Since leaks are usually not detected I can afford > to sleep more and do one or two additional scans. The question here: > will kmemleak _remove_ an object for leaked objects, if it discovered > reachable or contents change on subsequent scans? > > Regarding stopping all threads and doing proper scan, why is not it > feasible? Will kernel break if we stall all CPUs for seconds? In > automatic testing scenarios a stalled for several seconds machine is > not a problem. But on the other hand, absence of false positives is a > must. And it would improve testing bandwidth, because we don't need > sleep and second scan. Paul, regarding this particular leak, let's consider it kmemleak false positive (until proven otherwise). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org