From: David Hildenbrand
Organization: Red Hat
To: Matthew Wilcox, Yin Fengwei
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, yuzhao@google.com, ryan.roberts@arm.com, shy828301@gmail.com, akpm@linux-foundation.org
Subject: Re: [RFC PATCH 0/3] support large folio for mlock
Date: Fri, 7 Jul 2023 20:54:33 +0200
Message-ID: <4bb39d6e-a324-0d85-7d44-8e8a37a1cfec@redhat.com>
References: <20230707165221.4076590-1-fengwei.yin@intel.com>

On 07.07.23 19:26, Matthew Wilcox wrote:
> On Sat, Jul 08, 2023 at 12:52:18AM +0800, Yin Fengwei wrote:
>> This series identified the large folio for mlock to two types:
>>   - The large folio is in VM_LOCKED VMA range
>>   - The large folio cross VM_LOCKED VMA boundary
>
> This is somewhere that I think our fixation on MUST USE PMD ENTRIES
> has led us astray.  Today when the arguments to mlock() cross a folio
> boundary, we split the PMD entry but leave the folio intact.
> That means that we continue to manage the folio as a single entry on
> the LRU list.  But userspace may have no idea that we're doing this.
> It may have made several calls to mmap() 256kB at once, they've all
> been coalesced into a single VMA and khugepaged has come along behind
> its back and created a 2MB THP.  Now userspace calls mlock() and
> instead of treating that as a hint that oops, maybe we shouldn't've
> done that, we do our utmost to preserve the 2MB folio.
>
> I think this whole approach needs rethinking.  IMO, anonymous folios
> should not cross VMA boundaries.  Tell me why I'm wrong.

I think we touched upon that a couple of times already, and the main
issue is that while it sounds nice in theory, it's impossible in
practice.

THPs are supposed to be transparent, that is, we should not let
arbitrary operations fail.

But nothing stops user space from

(a) mmap'ing a 2 MiB region
(b) GUP-pinning the whole range
(c) GUP-pinning the first half
(d) unpinning the whole range pinned in (b)
(e) munmap'ing the second half

After (e), the pin from (c) still holds a reference on the folio, so
the folio cannot be split, and it now extends beyond the VMA.

And that's just one out of many examples I can think of, not even
considering temporary/speculative references that can prevent a split
at random points in time -- especially when splitting a VMA.

Sure, any time we PTE-map a THP we might just say "let's put that on
the deferred split queue" and cross fingers that we can eventually
split it later. (I was recently thinking about that in the context of
the mapcount ...)

It's all a big mess ...

-- 
Cheers,

David / dhildenb
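
For readers who want to step through the (a)-(e) sequence above, here
is a rough toy model in Python. It is only a sketch, not kernel code:
ToyFolio, its methods and run_scenario() are hypothetical names made up
for illustration, and the rule "any outstanding pin blocks the split"
is a deliberate simplification of the reference-count check the kernel
performs before splitting a large folio.

```python
from dataclasses import dataclass, field

FOLIO_PAGES = 512  # 2 MiB folio made of 4 KiB pages


@dataclass
class ToyFolio:
    """Toy stand-in for one 2 MiB anonymous THP; not a kernel structure."""
    mapped: set = field(default_factory=lambda: set(range(FOLIO_PAGES)))
    pins: dict = field(default_factory=dict)  # pin id -> (first page, end page)
    is_split: bool = False
    _next_id: int = 0

    def pin(self, start, end):
        """Model a GUP pin on pages [start, end); returns a handle."""
        self._next_id += 1
        self.pins[self._next_id] = (start, end)
        return self._next_id

    def unpin(self, handle):
        """Drop a pin taken earlier with pin()."""
        del self.pins[handle]

    def try_split(self):
        """Assumed rule: any outstanding pin blocks splitting the whole folio."""
        if not self.pins:
            self.is_split = True
        return self.is_split

    def munmap(self, start, end):
        """Unmapping always succeeds; splitting the folio may not."""
        self.mapped -= set(range(start, end))
        return self.try_split()

    def extends_past_vma(self):
        """True if the folio is still large but only partially mapped."""
        return not self.is_split and len(self.mapped) < FOLIO_PAGES


def run_scenario():
    folio = ToyFolio()                       # (a) mmap a 2 MiB region
    whole = folio.pin(0, FOLIO_PAGES)        # (b) pin the whole range
    half = folio.pin(0, FOLIO_PAGES // 2)    # (c) pin the first half
    folio.unpin(whole)                       # (d) drop the whole-range pin
    split_at_munmap = folio.munmap(FOLIO_PAGES // 2, FOLIO_PAGES)  # (e)
    stranded = folio.extends_past_vma()      # large folio, half of it unmapped
    folio.unpin(half)                        # later: the last pin goes away
    split_later = folio.try_split()          # a deferred split can now succeed
    return split_at_munmap, stranded, split_later
```

In this model run_scenario() reports that the split fails at munmap()
time, that the folio is left reaching past the remaining mapping, and
that a later retry succeeds once the last pin is gone. That mirrors the
"deferred split queue and cross fingers" situation, under the stated
simplifications.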