From: Qi Zheng
Date: Tue, 6 Aug 2024 11:31:24 +0800
Subject: Re: [RFC PATCH v2 0/7] synchronously scan and reclaim empty user PTE pages
To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de,
    muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org,
    zokeefe@google.com, rientjes@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, the arch/x86 maintainers

Hi all,

On 2024/8/5 20:55, Qi Zheng wrote:

[...]

>
> 2. When we use mmu_gather to batch flush the TLB and free PTE pages, the TLB
>    is not flushed before the pmd lock is unlocked. This may result in the
>    following two situations:
>
>    1) Userland can trigger a page fault and fill a huge page, which leaves
>       both a small-page and a huge-page TLB entry for the same address.
>
>    2) Userland can also trigger a page fault and fill a PTE page, which
>       leaves two small-page TLB entries for the same address, but backed
>       by different PTE pages.
>
> For case 1), according to Intel's TLB Application note (317080), some x86
> CPUs do not allow it:
>
> ```
> If software modifies the paging structures so that the page size used for a
> 4-KByte range of linear addresses changes, the TLBs may subsequently contain
> both ordinary and large-page translations for the address range. A reference
> to a linear address in the address range may use either translation. Which of
> the two translations is used may vary from one execution to another and the
> choice may be implementation-specific.
>
> Software wishing to prevent this uncertainty should not write to a paging-
> structure entry in a way that would change, for any linear address, both the
> page size and either the page frame or attributes. It can instead use the
> following algorithm: first mark the relevant paging-structure entry (e.g.,
> PDE) not present; then invalidate any translations for the affected linear
> addresses (see Section 5.2); and then modify the relevant paging-structure
> entry to mark it present and establish translation(s) for the new page size.
> ```
>
> We can also learn more from the comments above pmdp_invalidate() in
> __split_huge_pmd_locked().
>
> For case 2), we can see from the comments above ptep_clear_flush() in
> wp_page_copy() that this situation is also not allowed. Even without
> this patch series, madvise(MADV_DONTNEED) can also cause this situation:
>
>         CPU 0                         CPU 1
>
>         madvise (MADV_DONTNEED)
>         -->  clear pte entry
>              pte_unmap_unlock
>                                       touch and tlb miss
>                                       --> set pte entry
>         mmu_gather flush tlb
>
> But strangely, I didn't see any relevant fix code. Maybe I missed something,
> or is this guaranteed by userland?

I'm still quite confused about this. Is anyone familiar with this part?
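
To make the interleaving in the diagram concrete, here is a userspace toy model of it. This is only a sketch, not kernel code: `struct toy_mm`, `toy_touch()`, `toy_flush_all()` and `toy_replay()` are hypothetical names made up for illustration, the "TLB" is one integer per CPU, and the `flush_before_unlock` switch is an assumed variant used for comparison, not something the kernel does today.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one PTE slot and one TLB entry per CPU. */
struct toy_mm {
	int pte;     /* page frame installed in the PTE, 0 = none */
	int tlb[2];  /* translation cached by each CPU, 0 = no entry */
};

/* A memory access: on a TLB miss, the CPU caches whatever the PTE holds. */
static void toy_touch(struct toy_mm *mm, int cpu)
{
	assert(cpu == 0 || cpu == 1);
	if (mm->tlb[cpu] == 0)
		mm->tlb[cpu] = mm->pte;
}

static void toy_flush_all(struct toy_mm *mm)
{
	mm->tlb[0] = 0;
	mm->tlb[1] = 0;
}

/* True if both CPUs hold a translation and the two differ. */
static bool toy_tlbs_disagree(const struct toy_mm *mm)
{
	return mm->tlb[0] && mm->tlb[1] && mm->tlb[0] != mm->tlb[1];
}

/*
 * Replay the interleaving from the diagram. Returns true if the two CPUs
 * cached different frames for the same address before the final flush.
 */
static bool toy_replay(bool flush_before_unlock)
{
	struct toy_mm mm = { .pte = 1, .tlb = { 0, 0 } };
	bool disagree;

	toy_touch(&mm, 0);          /* CPU 0 caches the old translation */
	mm.pte = 0;                 /* CPU 0: clear pte entry */
	if (flush_before_unlock)
		toy_flush_all(&mm); /* assumed variant: flush under the lock */
	/* pte_unmap_unlock: from here on CPU 1 can take the lock */
	mm.pte = 2;                 /* CPU 1: fault sets a new pte entry */
	toy_touch(&mm, 1);          /* CPU 1: TLB miss, fill from the new pte */
	disagree = toy_tlbs_disagree(&mm);
	toy_flush_all(&mm);         /* CPU 0: deferred mmu_gather flush */
	return disagree;
}
```

In this model `toy_replay(false)` reports that the two CPUs briefly disagree, while `toy_replay(true)` does not. It says nothing about whether real hardware or userland ordering makes that window harmless, which is exactly the open question.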

Thanks,
Qi

>
> Anyway, this series defines the following two functions to be implemented by
> the architecture. If the architecture does not allow the above two situations,
> then define these two functions to flush the tlb before set_pmd_at().
>
> - arch_flush_tlb_before_set_huge_page
> - arch_flush_tlb_before_set_pte_page
>
[...]
>
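
For readers who want a picture of how such a hook could be wired up, below is a rough userspace sketch for one of the two functions. Only the name `arch_flush_tlb_before_set_huge_page` comes from the text above. Its `(mm, addr)` signature, the `#ifndef` no-op fallback, `struct toy_mmu`, `TOY_ARCH_NEEDS_FLUSH` and `toy_set_huge_pmd()` are all assumptions made for illustration and may not match what the series actually does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct mm_struct; counts flushes for the demo. */
struct toy_mmu {
	unsigned long tlb_flushes;
	unsigned long pmd;
	bool flushed_before_set;
};

/* Assumed switch: this "architecture" does not allow the two situations. */
#define TOY_ARCH_NEEDS_FLUSH 1

#if TOY_ARCH_NEEDS_FLUSH
/* Signature is an assumption; the text above only gives the name. */
static inline void arch_flush_tlb_before_set_huge_page(struct toy_mmu *mm,
							unsigned long addr)
{
	(void)addr;
	mm->tlb_flushes++;	/* stand-in for a real TLB flush */
}
#define arch_flush_tlb_before_set_huge_page arch_flush_tlb_before_set_huge_page
#endif

/* Assumed generic fallback: other architectures get a no-op. */
#ifndef arch_flush_tlb_before_set_huge_page
static inline void arch_flush_tlb_before_set_huge_page(struct toy_mmu *mm,
							unsigned long addr)
{
	(void)mm;
	(void)addr;
}
#endif

/* Toy caller: flush first, then publish the new entry. */
static void toy_set_huge_pmd(struct toy_mmu *mm, unsigned long addr,
			     unsigned long new_pmd)
{
	arch_flush_tlb_before_set_huge_page(mm, addr);
	mm->flushed_before_set = (mm->tlb_flushes > 0);
	mm->pmd = new_pmd;	/* stands in for set_pmd_at() */
}
```

The ordering in `toy_set_huge_pmd()` follows the "invalidate, then mark present" sequence from the Intel note quoted earlier. `arch_flush_tlb_before_set_pte_page` would presumably follow the same pattern on the PTE page install path.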