From: Suren Baghdasaryan
Date: Fri, 21 Jul 2023 17:05:33 -0700
Subject: Re: [PATCH] mm: Lock VMA in dup_anon_vma() before setting ->anon_vma
To: Jann Horn
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230721034643.616851-1-jannh@google.com>
In-Reply-To: <20230721034643.616851-1-jannh@google.com>

On Thu, Jul 20, 2023 at 8:46 PM Jann Horn wrote:
>
> When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
> VMA that is being expanded to cover the area previously occupied by another
> VMA. This currently happens while `dst` is not write-locked.
>
> This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
> the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
> page faults can happen on `dst` under the per-VMA lock.
> This is already icky in itself, since such page faults can now install
> pages into `dst` that are attached to an `anon_vma` that is not yet tied
> back to the `anon_vma` with an `anon_vma_chain`.
> But if `anon_vma_clone()` fails due to an out-of-memory error, things get
> much worse: `anon_vma_clone()` then reverts `dst->anon_vma` back to NULL,
> and `dst` remains completely unconnected to the `anon_vma`, even though we
> can have pages in the area covered by `dst` that point to the `anon_vma`.
>
> This means the `anon_vma` of such pages can be freed while the pages are
> still mapped into userspace, which leads to UAF when a helper like
> folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
>
> This theoretically is a security bug, but I believe it is really hard to
> actually trigger as an unprivileged user because it requires that you can
> make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
> pretty hard to prevent that.
>
> I think doing the vma_start_write() call inside dup_anon_vma() is the most
> straightforward fix for now.

Indeed, this is a valid fix because we end up modifying `dst` without
locking it. Locking in vma_merge()/vma_expand() happens inside
vma_prepare(), but that's too late because dup_anon_vma() has already
run by then.

>
> For a kernel-assisted reproducer, see the notes section of the patch mail.
>
> Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
> Cc: stable@vger.kernel.org
> Cc: Suren Baghdasaryan
> Signed-off-by: Jann Horn

Reviewed-by: Suren Baghdasaryan

> ---
> To reproduce, patch mm/rmap.c by adding "#include <linux/delay.h>" and
> changing anon_vma_chain_alloc() like this:
>
> static inline struct anon_vma_chain *anon_vma_chain_alloc(gfp_t gfp)
> {
> +       if (strcmp(current->comm, "FAILME") == 0) {
> +               // inject delay and error
> +               mdelay(2000);
> +               return NULL;
> +       }
>         return kmem_cache_alloc(anon_vma_chain_cachep, gfp);
> }
>
> Then build with KASAN and run this reproducer:
>
>
> #define _GNU_SOURCE
> #include <pthread.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <err.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <sys/prctl.h>
>
> #define SYSCHK(x) ({ \
>   typeof(x) __res = (x); \
>   if (__res == (typeof(x))-1L) \
>     err(1, "SYSCHK(" #x ")"); \
>   __res; \
> })
>
> static char *area;
> static volatile int fault_thread_done;
> static volatile int spin_launch;
>
> static void *fault_thread(void *dummy) {
>   while (!spin_launch) /*spin*/;
>   sleep(1);
>   area[0] = 1;
>   fault_thread_done = 1;
>   return NULL;
> }
>
> int main(void) {
>   fault_thread_done = 0;
>   pthread_t thread;
>   if (pthread_create(&thread, NULL, fault_thread, NULL))
>     errx(1, "pthread_create");
>
>   // allocator spam
>   int fd = SYSCHK(open("/etc/hostname", O_RDONLY));
>   char *vmas[10000];
>   for (int i=0; i<5000; i++) {
>     vmas[i] = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0));
>     *vmas[i] = 1;
>   }
>
>   // create a 3-page area, no anon_vma at this point, with guard vma behind it to prevent merging with neighboring anon_vmas
>   area = SYSCHK(mmap((void*)0x10000, 0x4000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0));
>   SYSCHK(mmap(area+0x3000, 0x1000, PROT_READ, MAP_SHARED|MAP_FIXED, fd, 0));
>   // turn it into 3 VMAs
>   SYSCHK(mprotect(area+0x1000, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC));
>
>   // create an anon_vma for the tail VMA
>   area[0x2000] = 1;
>
>   // more allocator spam
>   for (int i=5000; i<10000; i++) {
>     vmas[i] = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0));
>     *vmas[i] = 1;
>   }
>
>   printf("with anon_vma on tail VMA:\n\n");
>   system("cat /proc/$PPID/smaps | head -n55");
>   printf("\n\n");
>
>   spin_launch=1;
>   // mprotect() will try to merge the VMAs but bail out due to the injected
>   // allocator failure
>   SYSCHK(prctl(PR_SET_NAME, "FAILME"));
>   SYSCHK(mprotect(area+0x1000, 0x1000, PROT_READ|PROT_WRITE));
>   SYSCHK(prctl(PR_SET_NAME, "normal"));
>
>   printf("after merge from mprotect:\n\n");
>   if (!fault_thread_done)
>     errx(1, "fault thread not done yet???");
>   system("cat /proc/$PPID/smaps | head -n55");
>   printf("\n\n");
>
>   // release the anon_vma
>   SYSCHK(munmap(area+0x1000, 0x2000));
>
>   // release spam
>   for (int i=0; i<10000; i++)
>     SYSCHK(munmap(vmas[i], 0x1000));
>
>   // wait for RCU
>   sleep(2);
>
>   // trigger UAF?
>   printf("trying to trigger uaf...\n");
>   SYSCHK(madvise(area, 0x1000, 21/*MADV_PAGEOUT*/));
> }
>
>
> You should get an ASAN splat like:
>
> BUG: KASAN: use-after-free in folio_lock_anon_vma_read+0x9d/0x2f0
> Read of size 8 at addr ffff8880053a2660 by task normal/549
>
> CPU: 1 PID: 549 Comm: normal Not tainted 6.5.0-rc2-00073-ge599e16c16a1-dirty #292
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Call Trace:
>
>  dump_stack_lvl+0x36/0x50
>  print_report+0xcf/0x660
>  [...]
>  kasan_report+0xc7/0x100
>  [...]
>  folio_lock_anon_vma_read+0x9d/0x2f0
>  rmap_walk_anon+0x282/0x350
>  [...]
>  folio_referenced+0x277/0x2a0
>  [...]
>  shrink_folio_list+0xc9f/0x15c0
>  [...]
>  reclaim_folio_list+0xdc/0x1f0
>  [...]
>  reclaim_pages+0x211/0x280
>  [...]
>  madvise_cold_or_pageout_pte_range+0x2ea/0x6a0
>  [...]
>  walk_pgd_range+0x6c5/0xb90
>  [...]
>  __walk_page_range+0x27f/0x290
>  [...]
>  walk_page_range+0x1fd/0x230
>  [...]
>  madvise_pageout+0x1cd/0x2d0
>  [...]
>  do_madvise+0xb58/0x1280
>  [...]
>  __x64_sys_madvise+0x62/0x70
>  do_syscall_64+0x3b/0x90
>  [...]
>
>
>  mm/mmap.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3eda23c9ebe7..3937479d0e07 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -615,6 +615,7 @@ static inline int dup_anon_vma(struct vm_area_struct *dst,
>          * anon pages imported.
>          */
>         if (src->anon_vma && !dst->anon_vma) {
> +               vma_start_write(dst);
>                 dst->anon_vma = src->anon_vma;
>                 return anon_vma_clone(dst, src);
>         }
>
> base-commit: e599e16c16a16be9907fb00608212df56d08d57b
> --
> 2.41.0.487.g6d72f3e995-goog
>
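
For anyone following along, the ordering problem can be illustrated with a
small user-space model. This is only a sketch, not kernel code: every name in
it (toy_vma, toy_fault, toy_dup_with_failed_clone) is hypothetical, the
"concurrent" fault is injected sequentially so the outcome is deterministic,
and the failed anon_vma_clone() is reduced to resetting the pointer. It
assumes that a fault under the per-VMA lock only proceeds when the VMA is not
write-locked and already has an anon_vma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct toy_anon_vma { int id; };

struct toy_vma {
	bool write_locked;               /* models vma_start_write() having run */
	struct toy_anon_vma *anon_vma;   /* models vma->anon_vma */
	struct toy_anon_vma *page_owner; /* anon_vma recorded by a faulted-in page */
};

/*
 * Models a page fault under the per-VMA lock. Assumption: the fast path
 * only proceeds if the VMA is not write-locked and has an anon_vma.
 */
static bool toy_fault(struct toy_vma *vma)
{
	if (vma->write_locked || vma->anon_vma == NULL)
		return false; /* would fall back to the mmap_lock path */
	vma->page_owner = vma->anon_vma;
	return true;
}

/*
 * Models dup_anon_vma() when the clone step fails. lock_first selects the
 * patched ordering. A fault is injected in the window between the
 * assignment and the failure. Returns true if a page is left pointing at
 * an anon_vma that the VMA is no longer connected to.
 */
static bool toy_dup_with_failed_clone(struct toy_vma *dst,
				      struct toy_anon_vma *src_av,
				      bool lock_first)
{
	assert(dst != NULL && src_av != NULL);

	if (lock_first)
		dst->write_locked = true; /* patched: lock before publishing */
	dst->anon_vma = src_av;           /* publish the anon_vma */
	(void)toy_fault(dst);             /* fault lands in the window */
	dst->anon_vma = NULL;             /* failed clone reverts the assignment */

	return dst->page_owner != NULL && dst->anon_vma == NULL;
}
```

With lock_first set to false the model ends up with a page that references
an anon_vma the VMA no longer points to, which is the state that leads to the
UAF in the report above. With lock_first set to true the injected fault is
refused and no such page exists.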