From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41733D2A52E for ; Wed, 16 Oct 2024 16:00:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CB2E06B007B; Wed, 16 Oct 2024 12:00:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C62766B0082; Wed, 16 Oct 2024 12:00:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ADCAA6B0083; Wed, 16 Oct 2024 12:00:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 8CBF06B007B for ; Wed, 16 Oct 2024 12:00:42 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 624A8C057B for ; Wed, 16 Oct 2024 16:00:31 +0000 (UTC) X-FDA: 82679927946.01.674AA88 Received: from mail-ed1-f46.google.com (mail-ed1-f46.google.com [209.85.208.46]) by imf07.hostedemail.com (Postfix) with ESMTP id D64874000E for ; Wed, 16 Oct 2024 16:00:27 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=0q8NhI2a; spf=pass (imf07.hostedemail.com: domain of jannh@google.com designates 209.85.208.46 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729094297; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Qa8NFpxmV2KKMPMWE09yZK0j+QN5oO3iXzDey8XQmBI=; b=uFyYHqsfZ+3tIQJDWEc3ZdKSl5VarmBmhZpnNNoyzMNj56CJNdpJOqDIWEUV8wh5XrSTVA O715hG2VKp1OXK3IOf6rfBYJiOMSx54nv+wWjLS24vUv7gi2k5v8uWiRoGuhEuRAmcifqU +vfns8fTk2jg74IvaG1ow++0uBZ7zjc= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729094297; a=rsa-sha256; cv=none; b=IjCHGlA0Rh2dP8/AP9k/BIQ8z7IEl6EzvXKfXFOOTZBmRQe6Q12SgivOn2oumilOsMu3LU 8lZv4i5EgjZjiIEr3fDWBgA3ZaCgd88M0tFsje2gzWtqdYu6BGBqHlfDjKc6vwA5CTDhyQ wMrupoC8YowXIgfj8F1vP1maNANIQ6E= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=0q8NhI2a; spf=pass (imf07.hostedemail.com: domain of jannh@google.com designates 209.85.208.46 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ed1-f46.google.com with SMTP id 4fb4d7f45d1cf-5c932b47552so24413a12.0 for ; Wed, 16 Oct 2024 09:00:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1729094438; x=1729699238; darn=kvack.org; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Qa8NFpxmV2KKMPMWE09yZK0j+QN5oO3iXzDey8XQmBI=; b=0q8NhI2aPHVScgd2IPqufcjZt58qmiLoCSdysrz0CKZyO6MAApkuEYlgi4wFbtTk5M +9ZrvbmeY3zLL/QfuLvczCtVAII2rJztDTlYDEbrrtRVF4FXo+M31KHo6eQ1+SATTUxl jvGD34/ELhSNqntT/3cD7gWTlrxF+sONrELxtQKscAoY4ZYeCa+Giej/J3dV6rXBK88x lEQUj5yaIDlt2ZYS1kEThAJbhdZGxhES4WLyZT0IORY7JjMxD3N87IzGvm5p41zdIRi0 GxrHuNl4aQEIEn/dCMS9UW/NplH8kZPBDxhfW2jSW0VfhuCB6iD1q1npFMmRYUIHKA5c jpKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729094438; x=1729699238; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Qa8NFpxmV2KKMPMWE09yZK0j+QN5oO3iXzDey8XQmBI=; b=nwJN1+F5RuuJPYtHdXKAR56VBPsjuie9cq3nn4270mdX6vXIA1dBFwEyMuBywUshjp t0n9jJq/cVBj8v1fspBWlTDCxNCKyLJsqqxTdhqCOv3rlcLtxNxyaHhfHcDFMO7y+roU GD2q8KJesclKHxT2WIyEqqqdunuU5jVjh/+wa0c0cvVYsw+CVxjQUjhfrW/+TbH1P+DW K+zEP8/W0d6qqIXNICN7OTMtugSgDskeF+c+vTHjSUqy03Stmb3Mla7L9F6bHW3FTzMq qOtrkydeS6GdKhJ4lu/W/MKuVrhuidXqFVSN4hpdT7UG2F3o/Un2PWLo6MmGKUdR7VEj ppMg== X-Forwarded-Encrypted: i=1; AJvYcCXEEE0ZEj1Pgd6Pv+BATfNhUQtk5NbvdEvaiD0c4l2JUDdFsDdKF3NyfE9U0gefyA4Knn87ynt3uQ==@kvack.org X-Gm-Message-State: AOJu0YzpmmtOmpgA9pQR1qKZCkTTEG54QmquutgZZA3IFxsNl5EVy1ue F24fDU1PLnsJlaMTVTALKUrA9VSaKowj77NBRcuQ4S8qXIRamuboSuQ0R22EaVQ55pCkCx4LrSU qn0coPIQXEaVSlpNUZNpDWt8yeW9BW5hlnscyEC4GkFD9x36tZXB2Xg0= X-Google-Smtp-Source: AGHT+IGTFrv2alUePOetWLW0RF+CqiSAQBuYFWA0NuJoz5ABIgUEJXZEmQ0VzCf+TPqF152AaBDGNyuugEy4WKrUbyQ= X-Received: by 2002:a05:6402:2681:b0:5c2:5641:af79 with SMTP id 4fb4d7f45d1cf-5c997539f25mr556341a12.0.1729094437912; Wed, 16 Oct 2024 09:00:37 -0700 (PDT) MIME-Version: 1.0 References: <20241016-fix-munmap-abort-v1-1-601c94b2240d@google.com> <3lixuwepwvc6zqy57k2zcw4dntd7cc5cwlx7xxoieuo3pvaajy@e2p5zf5mdzon> In-Reply-To: <3lixuwepwvc6zqy57k2zcw4dntd7cc5cwlx7xxoieuo3pvaajy@e2p5zf5mdzon> From: Jann Horn Date: Wed, 16 Oct 2024 17:59:59 +0200 Message-ID: Subject: Re: [PATCH fix 6.12] mm: mark mas allocation in vms_abort_munmap_vmas as __GFP_NOFAIL To: "Liam R. Howlett" , Jann Horn , Andrew Morton , Lorenzo Stoakes , Vlastimil Babka , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: D64874000E X-Stat-Signature: twy9ne6mnasbt9es9saady6ydnhmgprz X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1729094427-280471 X-HE-Meta: U2FsdGVkX18OaJl7yEJM16mj6sT3bU8pgzCBWpeUZq8e8ZBhhIhw6avjGDBx/t2ZcBNAWAUB/uOaE6PYeBVR/d19DQCeF3qoaEy3dtAIj+Xzt/sAI4pvGoevZRmc72dP4cOVVpPu7ktArsCFqhG+3qPPeNbZn85clnHKzl9XGjdQIaDa0h8bS+RXqE8M77Sg66dS2b/jlX85BN7uPXPQTGWh/pXB5PdWyZNUrKZ2ikVX1pSHjh7kfxavnGMDJ9uvLAnuukHXC4IxkaEMYuDmOEkvaSYiAarFnE4gzkc1vaJ6lgtHUbe+ncUvXSMLJSmCH6GTShpqXs2DXUzhmpy/x4iOrXz9hcJURzCxh8WAnmwMX3satjU3QdnM+HF3C9h4vrygIXiPElDDevDO7lUAzG8KRzxTlmtdFjr0UDAzQjUFJiWKkiysJUhorlY4e8/khFDal0SuHtiYWQMebx+uum4IVLiQ3e1CYj6GjhoTst6gd58HVdHooKGvR3NZh48Yuq3sAcii/Y35K255zQlmRm7Gl+1KSw28e+Iy0TUc32NYdf/kXPZ6opk//mwntSXJoApk3ygEQmc/QyQUsBMVbeiAn2totes+slWPR7GLqUvyrSyCXJXV78+Jh1HIFzDH33NRrrHJ1GSxx+Z/mZt3CQfQzPnp5RQkPJjVWvQ/XV9obbogWQw3NrB7neSM6X4NOA6tjaapG1clZDG9kvZtNP9EASm/oYfvxIn98yRJzw0QjHsk+mfzSR+Wb21GhTzt+gAXkdFhv5vp8YiPQO/mQ7/OK5UP50mFo06zeJMEXzQRfWqGpUpIUzdzlmqZfvSdaECDutHD+ajDpA3uXfWlNA9APpzo/XQ0ysvVViKdUPJSzUpkrr1xm1AwoMpsEHD/BS25KYoM+FshOevetll7g4I8puIjkNKCVjkNgbTItNb59vvalo4PrJnFLKRVzl1UTSIqi9qWfEHJLrgrPnm 9MlW8f8U W+RiHNdLyV4va7cQRsNF+KAwfuS2UAISu+w2lSdEkokYFw2XC9ZFOYBEf2T4/UShEP3qlWkfzYMUuMyukYbiC1/WOXr8soH87I2iQBi4lBa9CpGHO5qM2e3JPnHY2pn7syFUtzE8Wg9/lWW8hNHhm6zaASWwAVU3lm2U+GFu9YLj5jalA/JIjudFOlnV/YtEwN20+0Aj8X7Nv+PEayIJHkMrGKtZ3MaxchcSD4pEq//HOPET63T1SX9yutpYtXXXDYgtYGLE1Brxp8CF8Ua9X6AzvvS+UpG8dRV+8 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000005, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Oct 16, 2024 at 5:40=E2=80=AFPM Liam R. Howlett wrote: > * Jann Horn [241016 11:08]: > > vms_abort_munmap_vmas() is a recovery path where, on entry, some VMAs > > have already been torn down halfway (in a way we can't undo) but are > > still present in the maple tree. > > > > At this point, we *must* remove the VMAs from the VMA tree, otherwise > > we get UAF. > > Wait, the vma should be re-attached without PTEs and files closed in > this case. I don't see how we are hitting the UAF in your example - we > shouldn't unless there is something with the driver itself and the file > close? > > My concern is that I am missing an error scenario. Where are the files supposed to be closed? vms_clean_up_area() unlinks the VMA from the file and calls ->close() but doesn't unlink the file, right? FWIW, I tested on commit eca631b8fe80 (current mainline head), I didn't check whether anything in the MM tree already addresses this. (I probably should have...) > But since this is under a lock that allows sleeping, shouldn't the > shrinker kick in? Yeah, I think without error injection, you'd basically only fail this allocation if the OOM killer has already decided to kill your task and the system is entirely out of memory. OOM kills IIRC have two effects on the page allocator: 1. they allow allocating with no watermarks (ALLOC_OOM) (based on the theory that you might need to allocate some memory in order to be able to exit and free more memory) 2. they allow giving up on GFP_KERNEL allocations (see the "/* Avoid allocations with no watermarks from looping endlessly */" part of __alloc_pages_slowpath()) > Should we just use __GFP_NOFAIL for the first store > instead of the error path? Which first store? Do you mean get rid of vms_abort_munmap_vmas() entirely somehow? > > Fixes: 4f87153e82c4 ("mm: change failure of MAP_FIXED to restoring the = gap on failure") > > Signed-off-by: Jann Horn > > --- > > This can be tested with the following reproducer (on a kernel built wit= h > > CONFIG_KASAN=3Dy, CONFIG_FAILSLAB=3Dy, CONFIG_FAULT_INJECTION_DEBUG_FS= =3Dy, > > with the reproducer running as root): > > > > ``` > > > > typeof(x) __res =3D (x); \ > > if (__res =3D=3D (typeof(x))-1) \ > > err(1, "SYSCHK(" #x ")"); \ > > __res; \ > > }) > > > > static void write_file(char *name, char *buf) { > > int fd =3D open(name, O_WRONLY); > > if (fd =3D=3D -1) > > err(1, "unable to open for writing: %s", name); > > if (SYSCHK(write(fd, buf, strlen(buf))) !=3D strlen(buf)) > > errx(1, "write %s", name); > > SYSCHK(close(fd)); > > } > > > > int main(void) { > > // make a large area with a bunch of VMAs > > char *area =3D SYSCHK(mmap(NULL, AREA_SIZE, PROT_NONE, MAP_PRIVATE|MA= P_ANONYMOUS, -1, 0)); > > for (int off=3D0; off > SYSCHK(mmap(area+off, 0x1000, PROT_READ, MAP_FIXED|MAP_PRIVATE|MAP_= ANONYMOUS, -1, 0)); > > > > // open a file whose mappings use usbdev_vm_ops, and map it in part o= f this area > > int map_fd =3D SYSCHK(open("/dev/bus/usb/001/001", O_RDONLY)); > > void *map =3D SYSCHK(mmap(area, 0x1000, PROT_READ, MAP_SHARED|MAP_FIX= ED, map_fd, 0)); > > close(map_fd); > > > > // make RWX mapping request fail late > > SYSCHK(prctl(65/*PR_SET_MDWE*/, 1/*PR_MDWE_REFUSE_EXEC_GAIN*/, 0, 0, = 0)); > > > > // some random other file > > int fd =3D SYSCHK(open("/etc/passwd", O_RDONLY)); > > > > /* disarm for now */ > > write_file("/sys/kernel/debug/failslab/probability", "0"); > > > > /* fail allocations of maple_node... */ > > write_file("/sys/kernel/debug/failslab/cache-filter", "Y"); > > write_file("/sys/kernel/slab/maple_node/failslab", "1"); > > /* ... even though it's a sleepable allocation... */ > > write_file("/sys/kernel/debug/failslab/ignore-gfp-wait", "N"); > > /* ... in this task... */ > > write_file("/sys/kernel/debug/failslab/task-filter", "Y"); > > write_file("/proc/self/make-it-fail", "1"); > > /* ... every time... */ > > write_file("/sys/kernel/debug/failslab/times", "-1"); > > /* ... after 8 successful allocations (value chosen experimentally)..= . */ > > write_file("/sys/kernel/debug/failslab/space", "2048"); // one object= is 256 > > /* ... and print where the allocations actually failed... */ > > write_file("/sys/kernel/debug/failslab/verbose", "2"); > > /* ... and that's it, arm it */ > > write_file("/sys/kernel/debug/failslab/probability", "100"); > > > > // This mmap request will fail late due to MDWE. > > // The error recovery path will attempt to clear out the VMA pointers= with a > > // spanning_store (which will be blocked by error injection). > > mmap(area, AREA_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP= _FIXED, fd, 0); > > > > /* disarm */ > > write_file("/sys/kernel/debug/failslab/probability", "0"); > > > > SYSCHK(munmap(map, 0x1000)); // UAF expected here > > system("cat /proc/$PPID/maps"); > > } > > ``` > > > > Result: > > ``` > > FAULT_INJECTION: forcing a failure. > > name failslab, interval 1, probability 100, space 256, times -1 > > CPU: 3 UID: 0 PID: 607 Comm: unmap-fail Not tainted 6.12.0-rc3-00013-ge= ca631b8fe80 #518 > > [...] > > Call Trace: > > > > dump_stack_lvl+0x80/0xa0 > > should_fail_ex+0x4d3/0x5c0 > > [...] > > should_failslab+0xc7/0x130 > > kmem_cache_alloc_noprof+0x73/0x3a0 > > [...] > > mas_alloc_nodes+0x3a3/0x690 > > mas_nomem+0xaa/0x1d0 > > mas_store_gfp+0x515/0xa80 > > [...] > > mmap_region+0xa96/0x2590 > > [...] > > do_mmap+0x71e/0xfe0 > > [...] > > vm_mmap_pgoff+0x17a/0x2f0 > > [...] > > ksys_mmap_pgoff+0x2ee/0x460 > > do_syscall_64+0x68/0x140 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [...] > > > > mmap: unmap-fail: (607) Unable to abort munmap() operation > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > BUG: KASAN: slab-use-after-free in dec_usb_memory_use_count+0x365/0x430 > > Read of size 8 at addr ffff88810e9ba8b8 by task unmap-fail/607 > > What was this pointer? Should be the "struct usb_memory *usbm". > > > > CPU: 3 UID: 0 PID: 607 Comm: unmap-fail Not tainted 6.12.0-rc3-00013-ge= ca631b8fe80 #518 > > [...] > > Call Trace: > > > > dump_stack_lvl+0x66/0xa0 > > print_report+0xce/0x670 > > [...] > > kasan_report+0xf7/0x130 > > [...] > > dec_usb_memory_use_count+0x365/0x430 > > remove_vma+0x76/0x120 > > vms_complete_munmap_vmas+0x447/0x750 > > do_vmi_align_munmap+0x4b9/0x700 > > [...] > > do_vmi_munmap+0x164/0x2e0 > > __vm_munmap+0x128/0x2a0 > > [...] > > __x64_sys_munmap+0x59/0x80 > > do_syscall_64+0x68/0x140 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [...] > > > > > > Allocated by task 607: > > kasan_save_stack+0x33/0x60 > > kasan_save_track+0x14/0x30 > > __kasan_kmalloc+0xaa/0xb0 > > usbdev_mmap+0x1a0/0xaf0 > > mmap_region+0xf6e/0x2590 > > do_mmap+0x71e/0xfe0 > > vm_mmap_pgoff+0x17a/0x2f0 > > ksys_mmap_pgoff+0x2ee/0x460 > > do_syscall_64+0x68/0x140 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > Freed by task 607: > > kasan_save_stack+0x33/0x60 > > kasan_save_track+0x14/0x30 > > kasan_save_free_info+0x3b/0x60 > > __kasan_slab_free+0x4f/0x70 > > kfree+0x148/0x450 > > vms_clean_up_area+0x188/0x220 > > What line is this? Should be the vma->vm_ops->close(vma) call. (That would call dec_usb_memory_use_count(), which tail-calls kfree(), so the dec_usb_memory_use_count() wouldn't show up in a stack trace.) I don't have this kernel build anymore though, sorry. If you want I'll rebuild and get a properly symbolized trace.