From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4860C4727E for ; Fri, 25 Sep 2020 22:09:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2EA962076D for ; Fri, 25 Sep 2020 22:09:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=linux-foundation.org header.i=@linux-foundation.org header.b="Kv1GBbSh" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2EA962076D Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6E22C6B005D; Fri, 25 Sep 2020 18:09:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 692A96B0062; Fri, 25 Sep 2020 18:09:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5824D6B0068; Fri, 25 Sep 2020 18:09:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0126.hostedemail.com [216.40.44.126]) by kanga.kvack.org (Postfix) with ESMTP id 420556B005D for ; Fri, 25 Sep 2020 18:09:11 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 0846A8249980 for ; Fri, 25 Sep 2020 22:09:11 +0000 (UTC) X-FDA: 77302975302.22.dolls36_341428b2716b Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id D42BA18038E60 for ; Fri, 25 Sep 2020 22:09:10 +0000 (UTC) X-HE-Tag: dolls36_341428b2716b X-Filterd-Recvd-Size: 7208 Received: from mail-lf1-f68.google.com (mail-lf1-f68.google.com [209.85.167.68]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 22:09:10 +0000 (UTC) Received: by mail-lf1-f68.google.com with SMTP id w11so4475029lfn.2 for ; Fri, 25 Sep 2020 15:09:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2qHTqnIfH0BrTrH8c6GV0zk0WcCBrtPGOUQiFR1y9ws=; b=Kv1GBbShRS5ryf448gWDVZmIPK8cMF1KcxmoYwXV4eM8lJFqzkq4jrowlSJCcHHxTx 8h9/NPStUqXDhl4nu5Dix0msNC0mCw3A8pys74DFElega7J8M3CmL71s3vZGX3mQljaJ RS+srjEY/wgkIpXGQN9iWrez+qZZyiHKX9Tiw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2qHTqnIfH0BrTrH8c6GV0zk0WcCBrtPGOUQiFR1y9ws=; b=MfMK2bJKzEGItQJnLKnAkBSgCtlag6+ruSznummpotDZw3ySQD3tD7uwvpN8DEhqp5 WTFOSVx9DPHG95y7hlaYsmeNK95yK4Yx86NAiq5Mpx3nn03W6MZsiad8ZbCOmOql41OZ DHQZLJ7oYQzbUwVfwMfg5qd4Tzxt53fuZmtFOaraT49r+ONfLVvTGtszglQ8nvqGTNfE 2YJN6lFjBZvA/XQ53mpd5CelQnUi8FtalHC6viVcDHWKUcgVg8suPEU0KFkkN24dvFCl tVvfjzHk0lVG9Ggtm9JLI5vtGMVdbnQxsNWdGtTlxv/xp9ITUqOrtCSv3nME/nheSDFS 2xuQ== X-Gm-Message-State: AOAM532WoB7FhcG0S/mOnwaS6aUU5Ll4M8A2hY+JSPSbl5ieJPDp+vh7 EDyKCQ4thIytr2osxD00Ri/7g3/ao8Cd/w== X-Google-Smtp-Source: ABdhPJx5biZ7NQXcGuaYU7hh8QdOKasvuKwQ2MxyZmg83aBgep3C+gL+mMKjBdLXDM12ClduY9Za+g== X-Received: by 2002:a05:6512:2030:: with SMTP id s16mr260579lfs.277.1601071748339; Fri, 25 Sep 2020 15:09:08 -0700 (PDT) Received: from mail-lf1-f47.google.com (mail-lf1-f47.google.com. [209.85.167.47]) by smtp.gmail.com with ESMTPSA id l29sm282976lfp.11.2020.09.25.15.09.02 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 25 Sep 2020 15:09:03 -0700 (PDT) Received: by mail-lf1-f47.google.com with SMTP id w11so4474766lfn.2 for ; Fri, 25 Sep 2020 15:09:02 -0700 (PDT) X-Received: by 2002:a19:521a:: with SMTP id m26mr309828lfb.133.1601071742457; Fri, 25 Sep 2020 15:09:02 -0700 (PDT) MIME-Version: 1.0 References: <20200923002735.GN19098@xz-x1> <20200923170759.GA9916@ziepe.ca> <20200924143517.GD79898@xz-x1> <20200924165152.GE9916@ziepe.ca> <20200924175531.GH79898@xz-x1> <20200924181501.GF9916@ziepe.ca> <20200924183418.GJ79898@xz-x1> <20200924183953.GG9916@ziepe.ca> <20200924213010.GL79898@xz-x1> <20200925211321.GC188812@xz-x1> In-Reply-To: <20200925211321.GC188812@xz-x1> From: Linus Torvalds Date: Fri, 25 Sep 2020 15:08:46 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned To: Peter Xu Cc: Jason Gunthorpe , John Hubbard , Linux-MM , Linux Kernel Mailing List , Andrew Morton , Jan Kara , Michal Hocko , Kirill Tkhai , Kirill Shutemov , Hugh Dickins , Christoph Hellwig , Andrea Arcangeli , Oleg Nesterov , Leon Romanovsky , Jann Horn Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Sep 25, 2020 at 2:13 PM Peter Xu wrote: > > On Fri, Sep 25, 2020 at 12:56:05PM -0700, Linus Torvalds wrote: > > So I think we can simply add a > > > > if (page_mapcount(page) != 1) > > return false; > > > > to page_maybe_dma_pinned(), and that very naturally protects against > > the "is the page count perhaps elevated due to a lot of forking?" > > How about the MAP_SHARED case where the page is pinned by some process but also > shared (so mapcount can be >1)? MAP_SHARED doesn't matter, since it's not getting COW'ed anyway, and we keep the page around regardless. So MAP_SHARED is the easy case. We'll never get to any of this code, because is_cow_mapping() won't be true. You can still screw up MAP_SHARED if you do a truncate() on the underlying file or something like that, but that most *definitely* falls under the "you only have yourself to blame" heading. > Would the ATOMIC version always work? I mean, I thought it could fail anytime, > so any fork() can start to fail for the tests too. Sure. I'm not really happy about GFP_ATOMIC, but I suspect it works in practice. Honestly, if somebody first pins megabytes of memory, and then does a fork(), they are doing some seriously odd and wrong things. So I think this should be a "we will try to handle it gracefully, but your load is broken" case. I am still inclined to add some kind of warning to this case, but I'm also a bit on the fence wrt the whole "convenience" issue - for some very occasional use it's probably convenient to not have to worry about this in user space. Actually, what I'm even less happy about than the GFP_ATOMIC is how much annoying boilerplate this just "map anonymous page" required with the whole cgroup_charge, throttle, anon_rmap, lru_cache_add thing. Looking at that patch, it all looks _fairly_ simple, but there's a lot of details that got duplicated from the pte_none() new-page-fault case (and that the do_cow_page() case also shares) I understand why it happens, and there's not *that* many cases, it made me go "ouch, this is a lot of small details, maybe I missed some", and I got the feeling that I should try to re-organize a few helper functions to avoid duplicating the same basic code over and over again. But I decided that I wanted to keep the patch minimal and as focused as possible, so I didn't actually do that. But we clearly have decades of adding rules that just makes even something as "simple" as "add a new page to a VM" fairly complex. Also, to avoid making the patch bigger, I skipped your "pass destination vma around" patch. I think it's the right thing conceptually, but everything I looked at also screamed "we don't actually care about the differences" to me. I dunno. I'm conflicted. This really _feels_ to me like "we're so close to just fixing this once and for all", but then I also go "maybe we should just revert everything and do this for 5.10". Except "reverting everything" is sadly really really problematic too. It will fix the rdma issue, but in this case "everything" goes all the way back to "uhhuh, we have a security issue with COW going the wrong way". Otherwise I'd have gone that way two weeks ago already. Linus