From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <6f38cb19-9847-4f70-bbe7-06881bb016be@bytedance.com>
Date: Fri, 16 Aug 2024 10:55:21 +0800
Subject: Re: [RFC PATCH v2 0/7] synchronously scan and reclaim empty user PTE pages
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, mgorman@suse.de, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, the arch/x86 maintainers
Content-Type: text/plain; charset=UTF-8; format=flowed
On 2024/8/6 11:31, Qi Zheng wrote:
> Hi all,
>
> On 2024/8/5 20:55, Qi Zheng wrote:
>
> [...]
>
>>
>> 2. When we use mmu_gather to batch-flush the TLB and free PTE pages,
>>    the TLB is not flushed before the pmd lock is unlocked.
>>    This may result in the following two situations:
>>
>>    1) Userland can trigger a page fault and fill in a huge page, which
>>       will cause a small-size TLB entry and a huge TLB entry to coexist
>>       for the same address.
>>
>>    2) Userland can also trigger a page fault and fill in a PTE page,
>>       which will cause two small-size TLB entries to coexist, but the
>>       PTE pages they map are different.
>>
>>    For case 1), according to Intel's TLB Application note (317080),
>>    some x86 CPUs do not allow it:
>>
>>    ```
>>    If software modifies the paging structures so that the page size
>>    used for a 4-KByte range of linear addresses changes, the TLBs may
>>    subsequently contain both ordinary and large-page translations for
>>    the address range. A reference to a linear address in the address
>>    range may use either translation. Which of the two translations is
>>    used may vary from one execution to another and the choice may be
>>    implementation-specific.
>>
>>    Software wishing to prevent this uncertainty should not write to a
>>    paging-structure entry in a way that would change, for any linear
>>    address, both the page size and either the page frame or attributes.
>>    It can instead use the following algorithm: first mark the relevant
>>    paging-structure entry (e.g., PDE) not present; then invalidate any
>>    translations for the affected linear addresses (see Section 5.2);
>>    and then modify the relevant paging-structure entry to mark it
>>    present and establish translation(s) for the new page size.
>>    ```
>>
>>    We can also learn more from the comments above pmdp_invalidate()
>>    in __split_huge_pmd_locked().
>>
>>    For case 2), we can see from the comments above ptep_clear_flush()
>>    in wp_page_copy() that this situation is also not allowed.
>>    Even without this patch series, madvise(MADV_DONTNEED) can also
>>    cause this situation:
>>
>>            CPU 0                         CPU 1
>>
>>    madvise(MADV_DONTNEED)
>>    --> clear pte entry
>>        pte_unmap_unlock
>>                                      touch and tlb miss
>>                                      --> set pte entry
>>        mmu_gather flush tlb
>>
>>    But strangely, I didn't see any relevant fix code. Maybe I missed
>>    something, or is this guaranteed by userland?
>
> I'm still quite confused about this. Is there anyone who is familiar
> with this part?

This is not a new issue introduced by this patch series, and I have sent
a separate RFC patch [1] to track it. I will remove this part of the
handling in the next version.

[1]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/

>
> Thanks,
> Qi
>
>>
>>    Anyway, this series defines the following two functions to be
>>    implemented by the architecture. If the architecture does not allow
>>    the above two situations, then define these two functions to flush
>>    the TLB before set_pmd_at():
>>
>>    - arch_flush_tlb_before_set_huge_page
>>    - arch_flush_tlb_before_set_pte_page
>>
>
> [...]
>
>>