From: Jann Horn
Date: Mon, 7 Jun 2021 21:55:09 +0200
Subject: Re: split_huge_page_to_list() races with page_mapcount() on migration entry in smaps code? [was: Re: [syzbot] kernel BUG in __page_mapcount]
To: Matthew Wilcox
Cc: Linux-MM, Zi Yan, Peter Xu, "Kirill A. Shutemov", Konstantin Khlebnikov, Andrew Morton, chinwen.chang@mediatek.com, kernel list, syzkaller-bugs, Vlastimil Babka, Michel Lespinasse, syzbot
References: <00000000000017977605c395a751@google.com>

On Mon, Jun 7, 2021 at 8:03 PM Matthew Wilcox wrote:
> On Mon, Jun 07, 2021 at 07:27:23PM +0200, Jann Horn wrote:
> > === Short summary ===
> > I believe the issue here is a race between /proc/*/smaps and
> > split_huge_page_to_list():
> >
> > The codepath for /proc/*/smaps walks the pagetables and (e.g. in
> > smaps_account()) calls page_mapcount() not just on pages from normal
> > PTEs but also on migration entries (since commit b1d4d9e0cbd0a
> > "proc/smaps: carefully handle migration entries", from Linux v3.5).
> > page_mapcount() expects compound pages to be stable.
> >
> > The split_huge_page_to_list() path first protects the compound page by
> > locking it and replacing all its PTEs with migration entries (since
> > the THP rewrite in v4.5, I think?), then does the actual splitting
> > using __split_huge_page().
> >
> > So there's a mismatch of expectations here:
> > The smaps code expects that migration entries point to stable compound
> > pages, while the THP code expects that it's okay to split a compound
> > page while it has migration entries.
>
> Will it be a colossal performance penalty if we always get the page
> refcount after looking it up?
> That will cause split_huge_page() to
> fail to split the page if it hits this race.

Hmm - but with that approach I'm not sure you could even easily take a
refcount on a page whose refcount may be frozen and which may be in
the middle of being shattered? get_page_unless_zero() is wrong because
you can't take references on tail pages, right? (Or can you?) And
try_get_page() is wrong because it bugs out if the refcount is zero -
and even if it didn't do that, you might end up holding a reference on
the head page while the page you're actually interested in is a tail
page?

I guess if it was really necessary, it'd be possible to do some kind
of retry thing that grabs a reference on the compound head, then
checks that the tail page is still associated with the compound head,
and if not, drops the compound head and tries again?
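
To make the "retry thing" a bit more concrete, here is a rough userspace
sketch of the idea using C11 atomics. It is only a model under stated
assumptions, not kernel code: struct page_model and the model_*() helpers
are hypothetical names made up for illustration, the "head" field stands in
for whatever compound_head() would return, and a refcount of zero stands in
for a frozen (or free) page. Real kernel code would also have to worry about
memory ordering and page lifetime, which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only: "struct page_model" and every helper below are
 * hypothetical stand-ins, not the kernel's struct page or its API.
 */
struct page_model {
	atomic_int refcount;               /* 0 models a frozen or free page */
	_Atomic(struct page_model *) head; /* compound head; self if not a tail */
};

/* Take a reference unless the count is currently zero. */
static bool model_get_unless_zero(struct page_model *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded and re-checked by the loop. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken with model_get_unless_zero(). */
static void model_put(struct page_model *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/*
 * Pin the compound head that 'page' currently belongs to.
 * Returns the pinned head, or NULL if the head's refcount was zero
 * (the caller can then skip the page or try again later).
 */
static struct page_model *model_get_head_stable(struct page_model *page)
{
	for (;;) {
		struct page_model *head = atomic_load(&page->head);

		if (!model_get_unless_zero(head))
			return NULL;

		/* Still associated with the head we pinned? Then we are done. */
		if (atomic_load(&page->head) == head)
			return head;

		/* The page was split under us: drop the stale head and retry. */
		model_put(head);
	}
}
```

In this model a caller like the smaps walker would call
model_get_head_stable() on the page it found, treat NULL as "page is frozen,
skip it", and call model_put() on the returned head when it is done.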