From: Linus Torvalds
Date: Fri, 25 Sep 2020 12:56:05 -0700
Subject: Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
To: Peter Xu
Cc: Jason Gunthorpe, John Hubbard, Linux-MM, Linux Kernel Mailing List,
    Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai, Kirill Shutemov,
    Hugh Dickins, Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
    Leon Romanovsky, Jann Horn
In-Reply-To: <20200924213010.GL79898@xz-x1>

On Thu, Sep 24, 2020 at 2:30 PM Peter Xu wrote:
>
> > >
> > > With the extra mprotect(!WRITE), I think we'll see a !pte_write() entry.  Then
> > > it'll not go into maybe_dma_pinned() at all since cow==false.
> >
> > Hum that seems like a problem in this patch, we still need to do the
> > DMA pinned logic even if the pte is already write protected.
>
> Yes I agree.  I'll take care of that in the next version too.

You people seem to be worrying too much about crazy use cases.

The fact is, if people do pinning, they had better be careful
afterwards. I agree that marking things MADV_DONTFORK may not be
great, and there may be apps that do it. But honestly, if people then
do mprotect() to make a VM non-writable after pinning a page for
writing (and before the IO has completed), such an app only has itself
to blame.

So I don't think this issue is even worth worrying about.  At some
point, when apps do broken things, the kernel says "you broke it, you
get to keep both pieces". Not "Oh, you're doing unreasonable things,
let me help you".

This has dragged out a lot longer than I hoped it would, and I think
it's been over-complicated.

In fact, looking at this all, I'm starting to think that we don't
actually even need the mm_struct.has_pinned logic, because we can work
with something much simpler: the page mapping count.

A pinned page will have the page count increased by
GUP_PIN_COUNTING_BIAS, and my worry was that this would be ambiguous
with the traditional "fork a lot" UNIX style behavior. And that
traditional case is obviously one of the cases we very much don't want
to slow down.

But a pinned page has _another_ thing that is special about it: the
pinning action broke COW.

So I think we can simply add a

        if (page_mapcount(page) != 1)
                return false;

to page_maybe_dma_pinned(), and that very naturally protects against
the "is the page count perhaps elevated due to a lot of forking?"

Because pinning forces the mapcount to 1, and while it is pinned,
nothing else should possibly increase it - since the only thing that
would increase it is fork, and the whole point is that we won't be
doing that "page_dup_rmap()" for this page (which is what increases
the mapcount).

So we actually already have a very nice flag for "this page isn't
duplicated by forking".

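A minimal userspace sketch of the mapcount check described above, not
kernel code. It assumes a toy struct in place of struct page; the names
toy_page, toy_maybe_pinned_refcount_only and toy_maybe_pinned are
hypothetical. Only GUP_PIN_COUNTING_BIAS and its value (1U << 10) are
taken from the kernel, and the real page_maybe_dma_pinned() also has to
handle compound pages, which this model ignores.

```c
#include <stdbool.h>

/* Same value the kernel adds to the page refcount for each pin. */
#define GUP_PIN_COUNTING_BIAS (1U << 10)

/* Hypothetical stand-in for struct page: only the two counters at issue. */
struct toy_page {
        unsigned int refcount;  /* models page_ref_count() */
        unsigned int mapcount;  /* models page_mapcount() */
};

/*
 * Refcount-only heuristic. A page mapped by many forked children can
 * cross the bias without ever having been pinned (false positive).
 */
static bool toy_maybe_pinned_refcount_only(const struct toy_page *page)
{
        return page->refcount >= GUP_PIN_COUNTING_BIAS;
}

/*
 * Same heuristic with the proposed mapcount test in front. Pinning
 * breaks COW, so a pinned page is expected to have exactly one mapping;
 * anything else is treated as "not pinned".
 */
static bool toy_maybe_pinned(const struct toy_page *page)
{
        if (page->mapcount != 1)
                return false;
        return page->refcount >= GUP_PIN_COUNTING_BIAS;
}
```

In this model, a page with refcount and mapcount both at 2000 (heavily
forked, never pinned) satisfies the refcount-only test but is rejected
by toy_maybe_pinned(), while a singly mapped page with refcount
1 + GUP_PIN_COUNTING_BIAS is reported as possibly pinned by both.
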
And if we keep the existing early "ptep_set_wrprotect()", we also know
that we cannot be racing with another thread that is pinning at the
same time, because the fast-gup code won't be touching a read-only
pte.

So we'll just have to mark it writable again before we release the
page table lock, and we avoid that race too.

And honestly, since this is all getting fairly late in the rc, and it
took longer than I thought, I think we should do the GFP_ATOMIC
approach for now - not great, but since it only triggers for this case
that really should never happen anyway, I think it's probably the best
thing for 5.9, and we can improve on things later.

                   Linus