From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, syzbot+a504cb5bae4fe117ba94@syzkaller.appspotmail.com,
    Steve Sistare, Muchun Song, David Hildenbrand, Andrew Morton
Subject: [PATCH v3 2/3] mm/memfd: Reserve hugetlb folios before allocation
Date: Tue, 20 May 2025 22:19:36 -0700
Message-ID: <20250521052119.1105530-3-vivek.kasireddy@intel.com>
In-Reply-To: <20250521052119.1105530-1-vivek.kasireddy@intel.com>
References: <20250521052119.1105530-1-vivek.kasireddy@intel.com>

There are cases when we try to pin a folio but discover that it has
not been faulted in. So, we try to allocate it in memfd_alloc_folio(),
but the allocation can hit a crash/failure
(VM_BUG_ON(!h->resv_huge_pages)) if there are no active reservations
at that instant. This issue was reported by syzbot:

kernel BUG at mm/hugetlb.c:2403!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5315 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller-00161-g63676eefb7a0 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:alloc_hugetlb_folio_reserve+0xbc/0xc0 mm/hugetlb.c:2403
Code: 1f eb 05 e8 56 18 a0 ff 48 c7 c7 40 56 61 8e e8 ba 21 cc 09 4c 89 f0 5b 41 5c 41 5e 41 5f 5d c3 cc cc cc cc e8 35 18 a0 ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
RSP: 0018:ffffc9000d3d77f8 EFLAGS: 00010087
RAX: ffffffff81ff6beb RBX: 0000000000000000 RCX: 0000000000100000
RDX: ffffc9000e51a000 RSI: 00000000000003ec RDI: 00000000000003ed
RBP: 1ffffffff34810d9 R08: ffffffff81ff6ba3 R09: 1ffffd4000093005
R10: dffffc0000000000 R11: fffff94000093006 R12: dffffc0000000000
R13: dffffc0000000000 R14: ffffea0000498000 R15: ffffffff9a4086c8
FS:  00007f77ac12e6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f77ab54b170 CR3: 0000000040b70000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 memfd_alloc_folio+0x1bd/0x370 mm/memfd.c:88
 memfd_pin_folios+0xf10/0x1570 mm/gup.c:3750
 udmabuf_pin_folios drivers/dma-buf/udmabuf.c:346 [inline]
 udmabuf_create+0x70e/0x10c0 drivers/dma-buf/udmabuf.c:443
 udmabuf_ioctl_create drivers/dma-buf/udmabuf.c:495 [inline]
 udmabuf_ioctl+0x301/0x4e0 drivers/dma-buf/udmabuf.c:526
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:906 [inline]
 __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

To fix this, make a reservation (by calling hugetlb_reserve_pages())
before trying to allocate the folio. This ensures that the
region/subpool accounting associated with the allocation is done
properly.

While at it, move subpool_inode() into the hugetlb header and replace
the VM_BUG_ON() with WARN_ON_ONCE(): there is no need to crash the
system in this scenario, since we can just warn and fail the
allocation.

Fixes: 26a8ea80929c ("mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak")
Reported-by: syzbot+a504cb5bae4fe117ba94@syzkaller.appspotmail.com
Signed-off-by: Vivek Kasireddy
Cc: Steve Sistare
Cc: Muchun Song
Cc: David Hildenbrand
Cc: Andrew Morton
---
 include/linux/hugetlb.h |  5 +++++
 mm/hugetlb.c            | 14 ++++++--------
 mm/memfd.c              | 17 ++++++++++++++---
 3 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 793d8390d3e4..ca3d6a3acae1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -729,6 +729,11 @@ extern unsigned int default_hstate_idx;
 
 #define default_hstate (hstates[default_hstate_idx])
 
+static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
+{
+	return HUGETLBFS_SB(inode->i_sb)->spool;
+}
+
 static inline struct hugepage_subpool *hugetlb_folio_subpool(struct folio *folio)
 {
 	return folio->_hugetlb_subpool;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cba9d60a4e28..6a9b701586eb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -283,11 +283,6 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
 	return ret;
 }
 
-static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
-{
-	return HUGETLBFS_SB(inode->i_sb)->spool;
-}
-
 static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
 {
 	return subpool_inode(file_inode(vma->vm_file));
@@ -2354,12 +2349,15 @@ struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
 	struct folio *folio;
 
 	spin_lock_irq(&hugetlb_lock);
+	if (WARN_ON_ONCE(!h->resv_huge_pages)) {
+		spin_unlock_irq(&hugetlb_lock);
+		return NULL;
+	}
+
 	folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask, preferred_nid,
 					       nmask);
-	if (folio) {
-		VM_BUG_ON(!h->resv_huge_pages);
+	if (folio)
 		h->resv_huge_pages--;
-	}
 
 	spin_unlock_irq(&hugetlb_lock);
 	return folio;
diff --git a/mm/memfd.c b/mm/memfd.c
index c64df1343059..783f61de5784 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -70,7 +70,6 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 #ifdef CONFIG_HUGETLB_PAGE
 	struct folio *folio;
 	gfp_t gfp_mask;
-	int err;
 
 	if (is_file_hugepages(memfd)) {
 		/*
@@ -79,12 +78,19 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 		 * alloc from. Also, the folio will be pinned for an indefinite
 		 * amount of time, so it is not expected to be migrated away.
 		 */
+		struct inode *inode = file_inode(memfd);
 		struct hstate *h = hstate_file(memfd);
+		int err = -ENOMEM;
+		long nr_resv;
 
 		gfp_mask = htlb_alloc_mask(h);
 		gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
 		idx >>= huge_page_order(h);
 
+		nr_resv = hugetlb_reserve_pages(inode, idx, idx + 1, NULL, 0);
+		if (nr_resv < 0)
+			return ERR_PTR(nr_resv);
+
 		folio = alloc_hugetlb_folio_reserve(h,
 						    numa_node_id(),
 						    NULL,
@@ -95,12 +101,17 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 							idx);
 			if (err) {
 				folio_put(folio);
-				return ERR_PTR(err);
+				goto err_unresv;
 			}
+
+			hugetlb_set_folio_subpool(folio, subpool_inode(inode));
 			folio_unlock(folio);
 			return folio;
 		}
-		return ERR_PTR(-ENOMEM);
+err_unresv:
+		if (nr_resv > 0)
+			hugetlb_unreserve_pages(inode, idx, idx + 1, 0);
+		return ERR_PTR(err);
 	}
 #endif
 	return shmem_read_folio(memfd->f_mapping, idx);
-- 
2.49.0
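
The reserve-before-allocate ordering this patch introduces can be
illustrated with a small userspace model. This is only a sketch: struct
toy_pool and the toy_*() helpers are hypothetical names made up for the
illustration, the two counters loosely stand in for the pool's free and
reserved page counts, and none of the kernel's region, subpool, or
locking logic is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy pool; NOT the kernel's struct hstate. */
struct toy_pool {
	long free_pages;	/* pages currently sitting in the pool */
	long resv_pages;	/* pages promised to callers, not yet handed out */
};

/* Promise one page; fail if every free page is already promised. */
static int toy_reserve(struct toy_pool *p)
{
	if (p->free_pages - p->resv_pages < 1)
		return -ENOMEM;
	p->resv_pages++;
	return 0;
}

/* Give back a promise that was never turned into an allocation. */
static void toy_unreserve(struct toy_pool *p)
{
	assert(p->resv_pages > 0);
	p->resv_pages--;
}

/*
 * Hand out one reserved page. A missing reservation is reported as a
 * failed allocation instead of aborting, which is the behaviour the
 * patch moves to by replacing VM_BUG_ON() with WARN_ON_ONCE().
 */
static bool toy_alloc_reserved(struct toy_pool *p)
{
	if (p->resv_pages == 0 || p->free_pages == 0)
		return false;
	p->free_pages--;
	p->resv_pages--;
	return true;
}
```

In this model a caller that goes straight to toy_alloc_reserved() gets a
failure, while a caller that runs toy_reserve() first succeeds, and a
caller that gives up before allocating drops its promise with
toy_unreserve() so another caller can reserve the page.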