From: Nadav Amit
Subject: mm: unnecessary COW phenomenon
Date: Wed, 13 Oct 2021 15:42:08 -0700
To: Andrea Arcangeli, Peter Xu
Cc: Linux-MM, LKML

Andrea, Peter, others,

I encountered many unnecessary COW operations on my development kernel
(based on Linux 5.13). I have not seen this reported and I am not sure
how to solve it. Any advice would be appreciated.

Commit 09854ba94c6aa ("mm: do_wp_page() simplification") prevents the
reuse of a page on a write-protect fault if page_count(page) != 1. In
that case, wp_page_reuse() is not used and the page is instead COW'd by
wp_page_copy(). wp_page_copy() is obviously much more expensive, not
only because of the copying, but also because it requires a TLB flush
and potentially a TLB shootdown.

The scenario I encountered happens when I use userfaultfd, but
presumably it might happen regardless of userfaultfd (perhaps with a
swap device with SWP_SYNCHRONOUS_IO).
It involves two page faults: one that maps a new anonymous page as
read-only, and a second write-protect fault that happens shortly
afterwards on the same page. In this case the page count is almost
always elevated and therefore a COW is needed.

[ The specific scenario that I have is as follows: I map a page into
the monitored process as write-protected using UFFDIO_COPY (actually a
variant I am working on). Shortly after, a write access to the page
triggers a page fault. The uffd monitor quickly resolves the page fault
using UFFDIO_WRITEPROTECT. The kernel keeps the page write-protected in
the page tables but marks it logically as uffd-unprotected, and the
faulting access is retried. The retry triggers a COW. ]

It turns out that the elevated page count is due to the caching of the
page in the local LRU cache (by lru_cache_add(), which is called by
lru_cache_add_inactive_or_unevictable() in the userfaultfd case). Since
the first fault happened shortly before the second write-protect fault,
the LRU cache had not yet been drained, so the page count had not been
decremented and a COW is needed.

Calling lru_add_drain() during this flow resolves the issue most of the
time. Obviously, it needs to be called on the core that initially
allocated (i.e., faulted in) the page for this to work. It is possible
to do so conditionally, only if the page count is greater than 1.

My questions to you (if I may) are:

1. Am I missing something?
2. Should it happen in other cases, specifically SWP_SYNCHRONOUS_IO?
3. Do you have a better solution?
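
The page-count interaction described above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions:
every toy_* name is hypothetical, the batch size is arbitrary, and none
of it is kernel code. It models just three things taken from the
description: the local LRU cache holds an extra reference on a freshly
faulted-in page, the write-protect path reuses the page only when the
count is exactly 1, and a drain performed only when the count is
greater than 1 releases the cached reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary batch size for the toy per-CPU cache (an assumption). */
#define TOY_PVEC_SIZE 15

/* Hypothetical stand-in for a page; refcount models page_count(page). */
struct toy_page {
	int refcount;
};

/* Hypothetical stand-in for the local (per-CPU) LRU cache. */
struct toy_lru_cache {
	struct toy_page *pages[TOY_PVEC_SIZE];
	int nr;
};

/* Models lru_add_drain(): drop the references held by the cache. */
void toy_lru_add_drain(struct toy_lru_cache *c)
{
	assert(c->nr <= TOY_PVEC_SIZE);
	for (int i = 0; i < c->nr; i++)
		c->pages[i]->refcount--;
	c->nr = 0;
}

/* Models lru_cache_add(): the cache pins the page until it is drained. */
void toy_lru_cache_add(struct toy_lru_cache *c, struct toy_page *p)
{
	if (c->nr == TOY_PVEC_SIZE)
		toy_lru_add_drain(c);	/* make room when the batch is full */
	p->refcount++;
	c->pages[c->nr++] = p;
}

enum toy_wp_result { TOY_WP_REUSE, TOY_WP_COPY };

/*
 * Models the write-protect fault decision: reuse only if the count is 1.
 * With drain_if_elevated set, the cache is drained first, but only when
 * the count is greater than 1 (the conditional drain from the text).
 */
enum toy_wp_result toy_do_wp_page(struct toy_lru_cache *c,
				  struct toy_page *p,
				  bool drain_if_elevated)
{
	if (drain_if_elevated && p->refcount > 1)
		toy_lru_add_drain(c);
	return p->refcount == 1 ? TOY_WP_REUSE : TOY_WP_COPY;
}

/* First fault maps a new page; a write-protect fault follows at once. */
enum toy_wp_result toy_two_faults(bool drain_if_elevated)
{
	struct toy_lru_cache cache = { .nr = 0 };
	struct toy_page page = { .refcount = 1 };	/* held by the mapping */

	toy_lru_cache_add(&cache, &page);	/* first fault: page is cached */
	return toy_do_wp_page(&cache, &page, drain_if_elevated);
}
```

In this model, toy_two_faults(false) takes the copy path because the
cached reference keeps the count at 2, while toy_two_faults(true) takes
the reuse path. The model has a single cache, so it does not capture
the caveat that the drain must run on the core that faulted in the
page.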