Message-ID: <87ad8d4b-b117-0c7a-3d0b-723ad59a0405@redhat.com>
Date: Wed, 19 Apr 2023 12:51:02 +0200
From: David Hildenbrand
Organization: Red Hat
To: Ryan Roberts, Andrew Morton, "Matthew Wilcox (Oracle)", Yu Zhao, "Yin, Fengwei"
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory
In-Reply-To: <5b6fe242-a19e-70bf-adba-240f2d5b8548@arm.com>
References: <20230414130303.2345383-1-ryan.roberts@arm.com> <13969045-4e47-ae5d-73f4-dad40fe631be@arm.com> <568b5b73-f0e9-c385-f628-93e45825fb7b@redhat.com> <5b6fe242-a19e-70bf-adba-240f2d5b8548@arm.com>

> I'm looking to fix this problem in my code, but am struggling to see how the
> current code is safe. I'm thinking about the following scenario:
>

Let's see :)

>   - A page is CoW mapped into processes A and B.
>   - The page takes a fault in process A, and do_wp_page() determines that it is
>     "maybe-shared" and therefore must copy. So drops the PTL and calls
>     wp_page_copy().

Note that before calling wp_page_copy(), we do a folio_get(folio). Further, the
page table reference is only dropped once we actually replace the page in the
page table. So while in wp_page_copy(), the folio should have at least 2
references if the page is still mapped.

>   - Process B exits.
>   - Another thread in process A faults on the page. This time do_wp_page()
>     determines that the page is exclusive (due to the ref count), and reuses it,
>     marking it exclusive along the way.

The refcount should not be 1 (other reference from the wp_page_copy() caller),
so A won't be able to reuse it, and ...

>   - wp_page_copy() from the original thread in process A retakes the PTL and
>     copies the _now exclusive_ page.
>
> Having typed it up, I guess this can't happen, because wp_page_copy() will only
> do the copy if the PTE hasn't changed and it will have changed because it is now
> writable? So this is safe?

this applies as well. If the PTE changed (when reusing due to a write fault it's
now writable, or someone else broke COW), we back off. For FAULT_FLAG_UNSHARE,
however, the PTE may not change. But the additional reference should make it
work.

I think it works as intended. It would be clearer if we'd also recheck in
wp_page_copy(), under PT lock, whether we still don't have an exclusive anon
page -- and back off if we do.

>
> To make things more convoluted, what happens if the second thread does an
> mprotect() to make the page RO after its write fault was handled? I think
> mprotect() will serialize on the mmap write lock so this is safe too?

Yes, mprotect() synchronizes that. There are other mechanisms to write-protect a
page, though, under mmap lock in read mode (uffd-wp). So it's a valid concern.

In all of these cases, reuse should be prevented due to the additional reference
on the folio when entering wp_page_copy() right from the start, not turning the
page exclusive but instead replacing it by a copy. An additional sanity check
sounds like the right thing to do.
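
To make the refcount argument concrete, here is a toy userspace model of the scenario. It is only a sketch and not kernel code: Folio, can_reuse(), run_scenario() and copy_should_back_off() are hypothetical names, and the reuse test is reduced to "refcount == 1", which is a simplification of what the real reuse path checks.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    refcount: int                 # toy stand-in for the folio reference count
    anon_exclusive: bool = False  # toy stand-in for the "exclusive anon page" marker


def can_reuse(folio: Folio) -> bool:
    """Simplified reuse test: reuse in place only if nobody else holds a reference."""
    return folio.refcount == 1


def run_scenario(take_extra_ref: bool) -> str:
    """Replay the scenario from the thread and report what thread 2 ends up doing."""
    # The page is CoW-mapped into processes A and B: one reference per mapping.
    folio = Folio(refcount=2)

    # Thread 1 in A takes a write fault, sees "maybe shared" and decides to copy.
    # With take_extra_ref=True it pins the folio before dropping the PTL.
    if take_extra_ref:
        folio.refcount += 1

    # Process B exits and drops its mapping.
    folio.refcount -= 1

    # Thread 2 in A faults on the same page while thread 1 is still copying.
    if can_reuse(folio):
        folio.anon_exclusive = True
        return "reused"  # the hazardous outcome described in the scenario
    return "copy"        # thread 2 has to take the copy path as well


def copy_should_back_off(pte_changed: bool, folio: Folio) -> bool:
    """Recheck after retaking the PTL: back off if the PTE changed, or (the
    additional sanity check discussed above) if the page became exclusive."""
    return pte_changed or folio.anon_exclusive
```

With the extra reference held, run_scenario(True) leaves the count at 2 when thread 2 faults, so the model takes the copy path; without it, run_scenario(False) drops to 1 and the page is reused. copy_should_back_off() models the FAULT_FLAG_UNSHARE case, where the PTE may be unchanged and only the exclusive marker would catch the problem.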

>
> Sorry if this is a bit rambly, just trying to make sure I've understood
> everything correctly.

It's a very interesting corner case, thanks for bringing that up. I think the
old mapcount based approach could have suffered from this theoretical issue, but
I might be wrong.

-- 
Thanks,

David / dhildenb