From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B206FC4167B for ; Fri, 8 Dec 2023 18:32:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 31BF26B0098; Fri, 8 Dec 2023 13:32:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2CBF16B0099; Fri, 8 Dec 2023 13:32:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1BA326B009A; Fri, 8 Dec 2023 13:32:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 092C86B0098 for ; Fri, 8 Dec 2023 13:32:11 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id D334440309 for ; Fri, 8 Dec 2023 18:32:10 +0000 (UTC) X-FDA: 81544495620.25.D200694 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf01.hostedemail.com (Postfix) with ESMTP id 9363740017 for ; Fri, 8 Dec 2023 18:32:08 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=ftae7l2C; spf=none (imf01.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1702060329; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JTrKGjAeLkBcKGMxQNLd+P7jRhDj7Ie2FUfD3UVG5Xw=; b=QdaTsNSgwowRzBorQx5UHY01io7ovzp2d6fcb5KCVxba0yUBr1aeV8KSJ2VXSO/Bl9OyWk FvyHsvPn3TnD/PWAFl113MopH4lz4DAicUdnoTojLESS3HUY3Pcsslb/K3NoaehlWOex/Z pjQ3QHgJal8vF827K1DkI1hyh2KjVys= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1702060329; a=rsa-sha256; cv=none; b=KH4Fcq2dlZa+Tjt5eQt+sbY/AHuo3K/gpfSodtkdY6PJFvM38Kib4s06SP99tQZKD/Uwke 1yNw81Ea3zxbqPFuxEAEnOMXqX8013/uu6ZKy/8c29+DmNVVfotpi5fdoPYOll1eIn4InU Jo6IdLHabyyNB7b6MqTOzDHxg3/Zq9g= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=ftae7l2C; spf=none (imf01.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=JTrKGjAeLkBcKGMxQNLd+P7jRhDj7Ie2FUfD3UVG5Xw=; b=ftae7l2CO/3REmdvuZVVFcEEM/ y13y3H95B90PCINFlfoKOv0Rv5/dhi2hAT5ENgYpBLrTZ5m11rMJeHe6Jby/QgB+kX5ZxVoPJdA6t eM/gZIufK7qB8txfOZd1VnG4oUIUZarsEzaNvCksV6fdu8IGVNgcsI47soh3QfeBXnFwai60zqswF 6PYkC0AqaGsdrrp2fS7ZQXODbxDJo1YS9HedWAayrXlCSC8tDDiz8y4CwqWTtA8OOv3+doUOpiCo1 7J4oTkt19113Stjyk84v4/nWYWM0NlYEsQ4BYi1hb7E5wWPZJYM/nhxfRe2cObKqXpdpo7RqqoXWp 0uCvH2IA==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1rBfdi-006IQY-Tm; Fri, 08 Dec 2023 18:31:55 +0000 Date: Fri, 8 Dec 2023 18:31:54 +0000 From: Matthew Wilcox To: Vlastimil Babka Cc: Andrew Morton , Mike Kravetz , Luis Chamberlain , Oscar Salvador , Jens Axboe , io-uring@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH v2 07/13] mm: Remove HUGETLB_PAGE_DTOR Message-ID: References: <20230816151201.3655946-1-willy@infradead.org> <20230816151201.3655946-8-willy@infradead.org> <8fa1c95c-4749-33dd-42ba-243e492ab109@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8fa1c95c-4749-33dd-42ba-243e492ab109@suse.cz> X-Stat-Signature: ynxgidkgrszoo7i71os9e1z1ptaz857j X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 9363740017 X-Rspam-User: X-HE-Tag: 1702060328-155162 X-HE-Meta: U2FsdGVkX1/GZFRBhtCmhs4Lr2ISkce3oxUAZcJiAjSqJ9wi7hmfg0ejrUtj1Dss+riE3oH/CzFFbIbxVjYalsk1L1614Pe8zr4z/zwr2Abd04YeJN9u32WyE0HL3dSu4YTpbXsNeS5ljs9+cFHgTM4i9YF+EwWGZR7+AQMxLaMQYcwH8CLQtA7xbL7sQn3j59IEExlaPV2aTEB023RQ6TLIrMWtDZfIEJfXuzbzQjgEZjROttVf3jFJSEUll+8Z99LfFOFa7lEpkCCg1U1PBy3RV5FvEC93cwuMS8MdVv4z7ADDV0y2DXYvKhjmz6lus7eN8TiPWgyw9FutsFZIiQI/iERdjg1zzvNTpAq2J5KhCR2kz9L72e6M19fcyO436onDiOaIJ4DBDCJY6oHpzQqwRkLJ9K9tLqlbgFuzWjLTJwjgyuBTzqEMR++Wj36vsF277UIeLysDWciP7O89FeP/SbiFMWJAlbNY9G35FwckBZz9FoBowmggTKv8FYhlM17WXNgbq3tVMjUZuHeaLxoT9ipfSx9/YXaqkDJ8a3FRYytzQIqHteDteTRJDqXw0wR7HBDESuLuQjnax6lJubd2lexSwZbOVaR9MvsXBdPvWQ42lDXdjc4/HEp5AfCHxzLWk7hp58jjUAwlHpoYO9BaKcYkggMXmjlQJMEhpG0Wjav5C3LIjGQBmc8JFl8c/knuqOlee5X9jRlOKxsqVSnqDgp9g7SS+l5qH7YFsWsWlb7Nt/PQPgIXSmIcU6Sa7PX+qNJj5LBMDKmNeCqB0+hvPa0KQEicnHlXFTuEqK7pJOtL0cRgZH1WTMdURoZ6MyxGhrrSfWe20sRrVfmd4oEKiWLlFQKjaTZAW7N6RTdEtNSESXCa8/tk7QQke5UpHNbGL09gv8G6lDY5j6JwzVpplVG++5xJaM/BFykbh6mOx/fV/JCVdl0PvQN9+uYCrtxlHCH7JrWY7Qh0Nu4 4cyM9lN7 46ndzSvOw/rNDozjA21gkTT0ki36u4ebtosTT5iLg7Gl5Io07YSdMsPKlhZCb6DMZKXfl7D8CVTJY++byJoPrVukmoFTINl1zeoeRK/zPUXfU71jBUmnSKrUg/kqJbQWXI8L0ypxW8Mij4cTHCNox+HjtzEx6/i2ZC1pUqOv7SUtyCnOzYJIQEwD73ZiSMLPEas7hSw5iiARiRFbufpza9BEEbYv+QFdxqmHf0V3fZP3G1OQ94ApxivSTHsfRquO43MuCrAgPsx8+4savCVmwIAk1WXOal4eVSD4LNYI5fykrELE= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Dec 08, 2023 at 06:54:19PM +0100, Vlastimil Babka wrote: > On 8/16/23 17:11, Matthew Wilcox (Oracle) wrote: > > We can use a bit in page[1].flags to indicate that this folio belongs > > to hugetlb instead of using a value in page[1].dtors. That lets > > folio_test_hugetlb() become an inline function like it should be. > > We can also get rid of NULL_COMPOUND_DTOR. > > > > Signed-off-by: Matthew Wilcox (Oracle) > > I think this (as commit 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")) is > causing the bug reported by Luis here: > https://bugzilla.kernel.org/show_bug.cgi?id=218227 Luis, please stop using bugzilla. If you'd sent email like a normal kernel developer, I'd've seen this bug earlier. > > page:000000009006bf10 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x3f8a0 pfn:0x1035c0 > > flags: 0x17fffc000000000(node=0|zone=2|lastcpupid=0x1ffff) > > page_type: 0xffffff7f(buddy) > > raw: 017fffc000000000 ffffe704c422f808 ffffe704c41ac008 0000000000000000 > > raw: 000000000003f8a0 0000000000000005 00000000ffffff7f 0000000000000000 > > page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((vo> > > ------------[ cut here ]------------ > > kernel BUG at include/linux/page-flags.h:314! > > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > CPU: 6 PID: 2435641 Comm: md5sum Not tainted 6.6.0-rc5 #2 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > > RIP: 0010:folio_flags+0x65/0x70 > > Code: a8 40 74 de 48 8b 47 48 a8 01 74 d6 48 83 e8 01 48 39 c7 75 bd eb cb 48 8b 07 a8 40 75 c8 48 c7 c6 d8 a7 c3 89 e8 3b c7 fa ff <0f> 0b 66 0f > > > RSP: 0018:ffffad51c0bfb7a8 EFLAGS: 00010246 > > RAX: 000000000000015f RBX: ffffe704c40d7000 RCX: 0000000000000000 > > RDX: 0000000000000000 RSI: ffffffff89be8040 RDI: 00000000ffffffff > > RBP: 0000000000103600 R08: 0000000000000000 R09: ffffad51c0bfb658 > > R10: 0000000000000003 R11: ffffffff89eacb08 R12: 0000000000000035 > > R13: ffffe704c40d7000 R14: 0000000000000000 R15: ffffad51c0bfb930 > > FS: 00007f350c51b740(0000) GS:ffff9b62fbd80000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000555860919508 CR3: 00000001217fe002 CR4: 0000000000770ee0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > PKRU: 55555554 > > Call Trace: > > > > ? die+0x32/0x80 > > ? do_trap+0xd6/0x100 > > ? folio_flags+0x65/0x70 > > ? do_error_trap+0x6a/0x90 > > ? folio_flags+0x65/0x70 > > ? exc_invalid_op+0x4c/0x60 > > ? folio_flags+0x65/0x70 > > ? asm_exc_invalid_op+0x16/0x20 > > ? folio_flags+0x65/0x70 > > ? folio_flags+0x65/0x70 > > PageHuge+0x67/0x80 > > isolate_migratepages_block+0x1c5/0x13b0 > > compact_zone+0x746/0xfc0 > > compact_zone_order+0xbb/0x100 > > try_to_compact_pages+0xf0/0x2f0 > > __alloc_pages_direct_compact+0x78/0x210 > > __alloc_pages_slowpath.constprop.0+0xac1/0xdb0 > > __alloc_pages+0x218/0x240 > > folio_alloc+0x17/0x50 > > It's because PageHuge() now does folio_test_hugetlb() which is documented to > assume caller holds a reference, but the test in > isolate_migratepages_block() doesn't. The code is there from 2021 by Oscar, > perhaps it could be changed to take a reference (and e.g. only test > PageHead() before), but it will be a bit involved as the > isolate_or_dissolve_huge_page() it calls has some logic based on the > refcount being zero/non-zero as well. Oscar, what do you think? > Also I wonder if any of the the other existing PageHuge() callers are also > affected because they might be doing so without a reference. I don't think the warning is actually wrong! We're living very dangerously here as PageHuge() could have returned a false positive before this change [1]. Then we assume that compound_nr() returns a consistent result (and compound_order() might, for example, return a value larger than 63, leading to UB). I think there's a place for a folio_test_hugetlb_unsafe(), but that would only remove the warning, and do nothing to fix all the unsafe usage. The hugetlb handling code in isolate_migratepages_block() doesn't seem to have any understanding that it's working with pages that can change under it. I can have a go at fixing it up, but maybe Oscar would prefer to do it? [1] terribly unlikely, but consider what happens if the page starts out as a non-hugetlb compound allocation. Then it is freed and reallocated; the new owner of the second page could have put enough information into it that tricked PageHuge() into returning true. Then we call isolate_or_dissolve_huge_page() which takes a lock, but doesn't take a refcount. Now it gets a bogus hstate and things go downhill from there ...)