Date: Tue, 15 Sep 2020 20:22:38 -0300
From: Jason Gunthorpe
To: Peter Xu
Cc: Linus Torvalds, Leon Romanovsky, Linux-MM, Linux Kernel Mailing List,
    "Maya B . Gokhale", Yang Shi, Marty Mcfadden, Kirill Shutemov,
    Oleg Nesterov, Jann Horn, Jan Kara, Kirill Tkhai, Andrea Arcangeli,
    Christoph Hellwig, Andrew Morton
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Message-ID: <20200915232238.GO1221970@ziepe.ca>
In-Reply-To: <20200915213330.GE2949@xz-x1>

On Tue, Sep 15, 2020 at 05:33:30PM -0400, Peter Xu wrote:
> > RDMA doesn't ever use !WRITE
> >
> > I'm guessing #5 is the issue, just with a different ordering. If the
> > #3 pin_user_pages() precedes the #2 fork, don't we get to the same #5?
>
> Right, but only if without MADV_DONTFORK?

Yes, initial test results show that MADV_DONTFORK resolves the issue.
I should know in a few days whether a bigger test suite passes.

> > > If this is a problem, we may still need the fix patch (maybe not as urgent as
> > > before at least).
> > > But I'd like to double confirm, just in case I miss some
> > > obvious facts above.
> >
> > I'm worried that the sudden need to have MADV_DONTFORK is going to
> > turn into a huge regression. It already blew up our first level of
> > synthetic test cases. I'm worried what we will see when the
> > application suite is run in a few months :\
>
> For my own preference I'll consider changing kernel behavior if the impact is
> still under control (the performance report of 30%+ boost is also attractive
> after the simplify-cow patch). The other way is to maintain the old reuse
> logic forever, that'll be another kind of burden. Seems no easy way on either
> side...

It seems very strange that a physical page exclusively owned by a
process can get copied just because pin_user_pages() is active and the
process called fork() at some point.

Could the new pin_user_pages() logic help here, e.g. the
GUP_PIN_COUNTING_BIAS stuff?

Could the COW code treat a refcount of GUP_PIN_COUNTING_BIAS + 1 as
meaning the page is owned by the current mm and doesn't need COW? The
DMA pin would then be 'invisible' for COW purposes?

Perhaps the ideal would be for fork() to not write-protect pages that
carry GUP_PIN_COUNTING_BIAS (i.e. are under DMA), and instead copy them
immediately during fork(). Then we never get into this situation and
also no longer need MADV_DONTFORK (yay!!).

This feels like it could be low cost, as fork() must already be
touching the refcount?

It looks like RDMA, media, vfio, vdpa, io_uring (!!) and xdp all use
FOLL_LONGTERM and may be at risk of regression.

I can't yet say what the scope of the problem is for RDMA.
*Technically* apps that fork without MADV_DONTFORK violate the defined
programming model, but since the old kernel didn't fail robustly there
could be misses. FWIW the failure is catastrophic: the app just breaks
completely.

io_uring seems like something people would want to mix with fork().
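A minimal user-space sketch of what the MADV_DONTFORK programming model asks of such applications, assuming Linux and glibc: the buffer is marked with madvise(MADV_DONTFORK) before any fork(), so fork() skips the range entirely. The child gets no mapping for it and the parent's pages are never write-protected for COW. The helper names map_buffer() and mapped_in_child() are hypothetical, and the actual registration step (an RDMA memory region, io_uring fixed buffers, etc.) is deliberately left out.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map an anonymous buffer that the application
 * intends to register with a FOLL_LONGTERM user (RDMA MR, io_uring fixed
 * buffer, ...). With dontfork set, the range is marked MADV_DONTFORK so
 * a later fork() does not copy it into the child. */
static void *map_buffer(size_t len, bool dontfork)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;
	if (dontfork && madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}

/* Hypothetical check: fork and report whether the first page of buf is
 * mapped in the child (1 = mapped, 0 = not mapped, -1 = error).
 * mincore() fails with ENOMEM when the range is not mapped. */
static int mapped_in_child(void *buf)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		unsigned char vec[1];	/* one entry: we check one page */

		if (mincore(buf, page, vec) == 0)
			_exit(1);	/* range is mapped in the child */
		_exit(errno == ENOMEM ? 0 : 2);
	}

	int status;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status);	/* 1 = mapped, 0 = not mapped */
}
```

With dontfork set, mapped_in_child() reports 0; without it, the child inherits the range and it reports 1. The trade-off is that a child touching such a buffer gets SIGSEGV, because it has no mapping there, so every application has to opt in explicitly.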
I see mentions of MADV_DONTFORK in the io_uring documentation, however
it is not documented as 'if you ever call fork() you have to use this
API'. That seems worrying.

> > > IMHO it worked because the page to do RDMA has mapcount==1, so it was reused
> > > previously just as-is even after the fork without MADV_DONTFORK and after the
> > > child quits.
> >
> > That would match the results we see. So this patch changes things so
> > it is not re-used as-is, but replaced with Y?
>
> Yes. The patch lets "replaced with Y" (cow) happen earlier at step #3. Then
> with MADV_DONTFORK, reuse should not happen any more.

?? Step #3 is pin_user_pages(), why would that replace with COW?

Thanks,
Jason