Date: Sat, 28 Jan 2023 22:49:31 -0800 (PST)
From: Hugh Dickins
To: "Liam R. Howlett"
cc: Hugh Dickins, David Hildenbrand, Matthew Wilcox, Sanan Hasanov,
    "akpm@linux-foundation.org", "linux-mm@kvack.org",
    "linux-kernel@vger.kernel.org", "contact@pgazz.com",
    "syzkaller@googlegroups.com", Huang Ying
Subject: Re: kernel BUG in page_add_anon_rmap
In-Reply-To: <713c6242-be65-c212-b790-2b908627c1b4@google.com>
Message-ID: <9d8fb9c-1b81-67cd-e55b-34517388e1ab@google.com>
References: <713c6242-be65-c212-b790-2b908627c1b4@google.com>

On Fri, 27 Jan 2023, Hugh Dickins wrote:
> On Fri, 27 Jan 2023, David Hildenbrand wrote:
> > On 26.01.23 19:57, Matthew Wilcox wrote:
> > > On Wed, Jan 25, 2023 at 11:59:16PM +0000, Sanan Hasanov wrote:
> > >> Good day, dear maintainers,
> > >>
> > >> We found a bug using a modified kernel configuration file used by syzbot.
> > >>
> > >> We enhanced the coverage of the configuration file using our tool,
> > >> klocalizer.
> > >>
> > >> Kernel Branch: 6.2.0-rc5-next-20230124
> > >> Kernel
> > >> config: https://drive.google.com/file/d/1MZSgIF4R9QfikEuF5siUIZVPce-GiJQK/view?usp=sharing
> > >> Reproducer: https://drive.google.com/file/d/1H5KWkT9VVMWTUVVgIaZi6J-fmukRx-BM/view?usp=sharing
> > >>
> > >> Thank you!
> > >>
> > >> Best regards,
> > >> Sanan Hasanov

This is a very interesting find: the thanks go to you.

> > >>
> > >> head: 0000000000020000 0000000000000000 00000004ffffffff ffff8881002b8000
> > >> page dumped because: VM_BUG_ON_PAGE(!first && (flags & (( rmap_t)((((1UL)))
> > >> << (0)))))
> > >> ------------[ cut here ]------------
> > >
> > > I know it says "cut here" and you did that, but including just a few
> > > lines above that would be so much more helpful. I can infer that this
> > > is a multi-page folio, but more than that is hard to tell.
> > >
> > >> kernel BUG at mm/rmap.c:1248!
> > >
> > > That tracks with VM_BUG_ON_PAGE(!first && (flags & RMAP_EXCLUSIVE), page);
> > >
> > >> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > >> CPU: 7 PID: 14932 Comm: syz-executor.1 Not tainted 6.2.0-rc5-next-20230124
> > >> #1
> > >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
> > >> 04/01/2014
> > >> RIP: 0010:page_add_anon_rmap+0xddd/0x11c0 mm/rmap.c:1248
> > >> Code: c0 ff 48 8b 34 24 48 89 df e8 1f ff 07 00 49 89 c6 e9 85 f6 ff ff e8
> > >> 52 73 c0 ff 48 c7 c6 c0 3c d8 89 48 89 ef e8 b3 23 f8 ff <0f> 0b e8 3c 73
> > >> c0 ff 48 c7 c6 00 3b d8 89 48 89 ef e8 9d 23 f8 ff
> > >> RSP: 0018:ffffc9000c56f7b0 EFLAGS: 00010293
> > >> RAX: 0000000000000000 RBX: ffff88807efc6f30 RCX: 0000000000000000
> > >> RDX: ffff8880464fd7c0 RSI: ffffffff81be733d RDI: fffff520018adedb
> > >> RBP: ffffea0000c68080 R08: 0000000000000056 R09: 0000000000000000
> > >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffea0000c68000
> > >> R13: 0000000000000001 R14: ffffea0000c68088 R15: 0000000000000000
> > >> FS: 00007f717898a700(0000) GS:ffff888119f80000(0000)
> > >> knlGS:0000000000000000
> > >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> CR2: 00007f7178947d78 CR3: 000000004a9e6000 CR4: 0000000000350ee0
> > >> Call Trace:
> > >>
> > >> remove_migration_pte+0xaa6/0x1390 mm/migrate.c:261
> > >
> > >                 if (folio_test_anon(folio))
> > >                         page_add_anon_rmap(new, vma, pvmw.address,
> > >                                            rmap_flags);
> > >
> > > Earlier in that function, we had:
> > >                 if (folio_test_anon(folio) &&
> > >                     !is_readable_migration_entry(entry))
> > >                         rmap_flags |= RMAP_EXCLUSIVE;
> > >
> > > so that also makes sense. We can also infer that RMAP_COMPOUND wasn't
> > > set, so we're trying to do just one page from the folio.
> > >
> > > All right, back to rmap.c:
> > >
> > >                 first = atomic_inc_and_test(&page->_mapcount);
> > >
> > > So first is clearly false (ie _mapcount was not -1), implying somebody
> > > else already mapped this page. Not really sure what's going on at
> > > this point. Seems unlikely that the folio changes in
> > > remove_migration_pte() are responsible since they're from last January.
> > > Huang has some more changes to migrate.c that I don't feel qualified
> > > to judge.
> > >
> > > Nothing's jumping out at me as obviously wrong. Is it possible to
> > > do a bisect?
> >
> > I reproduced on next-20230127 (did not try upstream yet).

Upstream's fine; on next-20230127 (with David's repro) it bisects to
5ddaec50023e ("mm/mmap: remove __vma_adjust()"). I think I'd better
hand on to Liam, rather than delay you by puzzling over it further myself.

> >
> > I think two key things are that a) THP are set to "always" and b) we have a
> > NUMA setup [I assume].
> >
> > The relevant bits:
> >
> > [ 439.886738] page:00000000c4de9000 refcount:513 mapcount:2
> > mapping:0000000000000000 index:0x20003 pfn:0x14ee03
> > [ 439.893758] head:000000003d5b75a4 order:9 entire_mapcount:0
> > nr_pages_mapped:511 pincount:0
> > [ 439.899611] memcg:ffff986dc4689000
> > [ 439.902207] anon flags:
> > 0x17ffffc009003f(locked|referenced|uptodate|dirty|lru|active|head|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
> > [ 439.910737] raw: 0017ffffc0020000 ffffe952c53b8001 ffffe952c53b80c8
> > dead000000000400
> > [ 439.916268] raw: 0000000000000000 0000000000000000 0000000000000001
> > 0000000000000000
> > [ 439.921773] head: 0017ffffc009003f ffffe952c538b108 ffff986de35a0010
> > ffff98714338a001
> > [ 439.927360] head: 0000000000020000 0000000000000000 00000201ffffffff
> > ffff986dc4689000
> > [ 439.932341] page dumped because: VM_BUG_ON_PAGE(!first && (flags & ((
> > rmap_t)((((1UL))) << (0)))))
> >
> >
> > Indeed, the mapcount of the subpage is 2 instead of 1. The subpage is only
> > mapped into a single
> > page table (no fork() or similar).

Yes, that mapcount:2 is weird; and what's also weird is the index:0x20003:
what is remove_migration_pte(), in an mbind(0x20002000,...), doing with
index:0x20003?

My guess is that the remove-__vma_adjust() commit is not properly updating
vm_pgoff into non_vma in some case: so that when remove_migration_pte()
looks for where to insert the new pte, it's off by one page.

> >
> > I created this reduced reproducer that triggers 100%:

Very helpful, thank you.

> >
> >
> > #include <stdint.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> > #include <numaif.h>
> >
> > int main(void)
> > {
> > 	mmap((void*)0x20000000ul, 0x1000000ul, PROT_READ|PROT_WRITE|PROT_EXEC,
> > 	     MAP_ANONYMOUS|MAP_FIXED|MAP_PRIVATE, -1, 0ul);
> > 	madvise((void*)0x20000000ul, 0x1000000ul, MADV_HUGEPAGE);
> >
> > 	*(uint32_t*)0x20000080 = 0x80000;
> > 	mlock((void*)0x20001000ul, 0x2000ul);
> > 	mlock((void*)0x20000000ul, 0x3000ul);

It's not an mlock() issue in particular: quickly established by
substituting madvise(,, MADV_NOHUGEPAGE) for those mlock() calls.
Looks like a vma splitting issue now.

> > 	mbind((void*)0x20002000ul, 0x1000ul, MPOL_LOCAL, NULL, 0x7fful,
> > 	MPOL_MF_MOVE);

I guess it will turn out not to be relevant to this particular syzbug,
but what do we expect an mbind() of just 0x1000 of a THP to do?

It's a subject I've wrestled with unsuccessfully in the past: I found
myself arriving at one conclusion (split THP) in one place, and a contrary
conclusion (widen range) in another place, and never had time to work out
one unified answer.

So I do wonder what pte replaces the migration entry when the bug here
is fixed: is it a pte pointing into the THP as before, in which case
what was the point of "migration"? is it a Copy-On-Bind page?
or has the whole THP been migrated?
I ought to read through those "estimated mapcount" threads more carefully:
might be relevant, but I've not paid enough attention.

Hugh

> > 	return 0;
> > }
> >
> > We map a large-enough area for a THP and then populate a fresh anon THP (PMD
> > mapped)
> > to write to it.
> >
> > The first mlock() will trigger PTE-mapping the THP and mlocking that subpage.
> > The second mlock() seems to cause the issue. The final mbind() triggers page
> > migration.
> >
> > Removing one of the mlock() makes it work. Note that we do a double
> > mlock() of the same page -- the one we are then trying to migrate.
> >
> > Somehow, the double mlock() of the same page seems to affect our mapcount.
> >
> > CCing Hugh.
>
> Thanks David - most especially for the reproducer, not tried here yet.
> I'll assume this is my bug, and get into it later in the day.
>
> Hugh