From: Linus Torvalds
Date: Thu, 10 Dec 2020 09:23:53 -0800
Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting
To: "Kirill A. Shutemov"
Cc: Will Deacon, Linux Kernel Mailing List, Linux-MM, Linux ARM, Catalin Marinas, Jan Kara, Minchan Kim, Andrew Morton, Vinayak Menon, Android Kernel Team
References: <20201209163950.8494-1-will@kernel.org> <20201209163950.8494-2-will@kernel.org> <20201209184049.GA8778@willie-the-truck> <20201210150828.4b7pg5lx666r7l2u@black.fi.intel.com>
In-Reply-To: <20201210150828.4b7pg5lx666r7l2u@black.fi.intel.com>

On Thu, Dec 10, 2020 at 7:08 AM Kirill A. Shutemov wrote:
>
> See lightly tested patch below. Is it something you had in mind?

This is closer, in that at least it removes the ostensibly blocking
allocation (that can't happen) from the prefault path.

But the main issue remains:

> > At that point, I think the current very special and odd
> > do_fault_around() pre-allocation could be made into just a _regular_
> > "allocate the pmd if it doesn't exist". And then the pte locking could
> > be moved into filemap_map_pages(), and suddenly the semantics and
> > rules around all that would be a whole lot more obvious.
>
> No. It would stop faultaround code from mapping huge pages. We had to
> defer pte page table mapping until we know we don't have huge pages in
> page cache.

Can we please move that part to the callers too - possibly with a
separate helper function?

Because the real issue remains: as long as the map_set_pte() function
takes the pte lock, the caller cannot rely on it. And the
filemap_map_pages() code really would like to rely on it.

Because if the lock is taken there *above* the loop - or even in the
loop iteration at the top - the code can now do things that rely on
"I know I hold the page table lock". In particular, we can get rid of
that very very expensive page locking.

Which is the reason I know about the horrid current issue of
"pre-allocate in one place, lock in another, and know we are atomic in
a third place". Because I had to walk down these paths and realize that
"this loop is run under the page table lock, EXCEPT for the first
iteration, where it's taken the first time we do that non-allocating
alloc_set_pte()".

See?

          Linus