From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pasha Tatashin <pasha.tatashin@soleen.com>
Date: Mon, 8 May 2023 17:27:10 -0400
Subject: Re: usbdev_mmap causes type confusion in page_table_check
To: Ruihan Li
Cc: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.com,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 gregkh@linuxfoundation.org, linux-usb@vger.kernel.org,
 syzkaller-bugs@googlegroups.com
References: <000000000000258e5e05fae79fc1@google.com>
 <20230507135844.1231056-1-lrh2000@pku.edu.cn>
In-Reply-To: <20230507135844.1231056-1-lrh2000@pku.edu.cn>
Content-Type: text/plain; charset="UTF-8"
On Sun, May 7, 2023 at 9:58 AM Ruihan Li wrote:
>
> Hi all,

Hi Ruihan,

Thank you for bisecting, and great analysis of the problem.

> Recently, syzbot reported [1] ("kernel BUG in page_table_check_clear"). After
> some bisection, I found out that when usbdev_mmap calls remap_pfn_range on
> kmalloc'ed memory, it causes type confusion between struct folio and slab in
> page_table_check.
> As I will explain below, it seems that both the usb side and
> the mm side need some fixes. So I have cc'd linux-usb and linux-mm here, as
> well as their maintainers, to see whether there are any comments on the
> proposed way to fix.
>
> [1] https://lore.kernel.org/all/000000000000258e5e05fae79fc1@google.com/
>
> To handle mmap requests for /dev/bus/usb/XXX/YYY, usbdev_mmap first allocates
> memory by calling usb_alloc_coherent and then maps it into the user space
> using remap_pfn_range:
>
>         static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
>         {
>                 // ...
>                 mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
>                                 &dma_handle);
>                 // ...
>                 if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
>                         if (remap_pfn_range(vma, vma->vm_start,
>                                         virt_to_phys(usbm->mem) >> PAGE_SHIFT,
>                                         size, vma->vm_page_prot) < 0) {
>                                 // ...
>                         }
>                 }
>                 // ...
>         }
>
> Note that in this case, we consider the DMA-unavailable scenario, which, to be
> honest, is rare in practice. However, syzbot emulates USB devices using a
> module named dummy_hcd, and since these devices are emulated, DMA is not
> available at all.
>
> Meanwhile, usb_alloc_coherent calls hcd_buffer_alloc, which uses kmalloc for
> memory allocation:
>
>         void *hcd_buffer_alloc(
>                 struct usb_bus *bus,
>                 size_t size,
>                 gfp_t mem_flags,
>                 dma_addr_t *dma
>         )
>         {
>                 // ...
>                 if (!hcd_uses_dma(hcd)) {
>                         *dma = ~(dma_addr_t) 0;
>                         return kmalloc(size, mem_flags);
>                 }
>                 // ...
>         }
>
> However, during the update of the page table to map the kmalloc'ed page into
> the user space, page_table_check_set attempts to determine whether the page is
> anonymous using PageAnon:
>
>         static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
>                                          unsigned long pfn, unsigned long pgcnt,
>                                          bool rw)
>         {
>                 // ...
>                 anon = PageAnon(page);
>                 for (i = 0; i < pgcnt; i++) {
>                         // ...
>                         if (anon) {
>                                 BUG_ON(atomic_read(&ptc->file_map_count));
>                                 BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
>                         } else {
>                                 BUG_ON(atomic_read(&ptc->anon_map_count));
>                                 BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
>                         }
>                         // ...
>                 }
>                 // ...
>         }
>
> This call to PageAnon is invalid for slab pages because slab reuses the bits
> in struct page/folio to store its internal states, and the anonymity bit only
> exists in struct page/folio. As a result, the counters are incorrectly updated
> and checked in page_table_check_set and page_table_check_clear, leading to the
> bug being raised.

We should change the anon boolean to be:

        anon = !PageSlab(page) && PageAnon(page);

This way, we will have a correct way to determine anon pages, and the
rest will fall into file_map_count. The file (non-anon) PTEs are OK to
be double mapped, so there won't be any problems from the
page_table_check point of view even if it is a slab page.

> Intuitively, I don't think it's reasonable to call remap_pfn_range to map
> kmalloc'ed pages into the user space. In the past, kmalloc'ed pages might have
> had alignment issues when certain memory debugging options were enabled.
> Although this has been fixed in commit 59bb47985c1d ("mm, sl[aou]b: guarantee
> natural alignment for kmalloc(power-of-two)"), it has been shown that
> performing such mapping is still buggy, as demonstrated by the type confusion
> in page_table_check. Therefore, I propose adding a new function,
> hcd_buffer_alloc_pages, to guarantee the page requirements (i.e., no
> intermediate allocators, such as slab/slub). Below is a diff as a quick
> example:
>
> diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
> index fbb087b72..514bdc949 100644
> --- a/drivers/usb/core/buffer.c
> +++ b/drivers/usb/core/buffer.c
> @@ -112,7 +112,7 @@ void hcd_buffer_destroy(struct usb_hcd *hcd)
>   * better sharing and to leverage mm/slab.c intelligence.
>   */
>
> -void *hcd_buffer_alloc(
> +void *hcd_buffer_alloc_pages(
>         struct usb_bus *bus,
>         size_t size,
>         gfp_t mem_flags,
> @@ -126,12 +126,13 @@ void *hcd_buffer_alloc(
>                 return NULL;
>
>         if (hcd->localmem_pool)
> -               return gen_pool_dma_alloc(hcd->localmem_pool, size, dma);
> +               return gen_pool_dma_alloc_align(hcd->localmem_pool, size,
> +                                               dma, PAGE_SIZE);
>
>         /* some USB hosts just use PIO */
>         if (!hcd_uses_dma(hcd)) {
>                 *dma = ~(dma_addr_t) 0;
> -               return kmalloc(size, mem_flags);
> +               return (void *)__get_free_pages(mem_flags, get_order(size));
>         }
>
>         for (i = 0; i < HCD_BUFFER_POOLS; i++) {
>
> (There appears to be another issue. In cases where hcd->localmem_pool is
> non-null, I suspect that remap_pfn_range should not be used. This is a DMA
> scenario, and the DMA handle is properly returned, so dma_mmap_coherent should
> be used instead. Am I correct?)
>
> This does not actually fix the type confusion bug in page_table_check_*. It
> should be noted that by leveraging /dev/mem, users can map arbitrary physical
> memory regions into the user space, which is also achieved through
> remap_pfn_range. I'm not 100% certain whether a fix is necessary, as one may
> argue that using page table checks (a kernel hardening technique, which means
> security is important) and /dev/mem (clearly insecure and potentially

Yes, the /dev/mem device is a security problem, and would not work with
a hardened kernel.

> exploitable) together is completely unreasonable.
>
> If a fix is deemed necessary, I think taking the flag VM_PFNMAP into
> consideration is a reasonable solution. That said, in page_table_check_*:
> * when the VM_PFNMAP flag is set, all operations and checks on the
>   file_map_count and anon_map_count counters should be bypassed;
> * when the VM_PFNMAP flag is not set, an additional check should be
>   performed to ensure that folio_test_slab evaluates to false.
>
> The implementation should be straightforward.
> However, I noticed that the
> page_table_check_* hooks are called in arch/* instead of mm/*, which not only
> limits its supported architectures (currently x86_64, arm64, and risc-v) but
> also makes it difficult to get the struct vm_area_struct as a parameter, since
> the struct vm_area_struct is not passed from mm/* to arch/* when mapping or
> unmapping pages. I have looked at d283d422c6c4 ("x86: mm: add x86_64 support
> for page table check"), but I don't see a valid reason. Perhaps Pasha can
> provide some explanation about this implementation choice?

We specifically decided not to use VMA information in order to avoid
relying on MM state (except for limited "struct page" info). The
page_table_check is a state machine separate from Linux-MM that
verifies that user-accessible pages are not falsely shared.

Pasha