From: Qi Zheng
Date: Thu, 5 Dec 2024 11:23:01 +0800
Subject: Re: [PATCH v4 09/11] mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)
To: Jann Horn
Cc: Andrew Morton, Qi Zheng, david@redhat.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-ID: <03bd739f-cceb-4024-a2fb-5331ba258d36@bytedance.com>
References: <92aba2b319a734913f18ba41e7d86a265f0b84e2.1733305182.git.zhengqi.arch@bytedance.com> <20241204143625.a09c2b8376b2415b985cf50a@linux-foundation.org>

On 2024/12/5 06:47, Jann Horn wrote:
> On Wed, Dec 4, 2024 at 11:36 PM Andrew Morton wrote:
>>
>> On Wed, 4 Dec 2024 19:09:49 +0800 Qi Zheng wrote:
>>> As a first step, this commit aims to synchronously free the empty PTE
>>> pages in madvise(MADV_DONTNEED) case. We will detect and free empty PTE
>>> pages in zap_pte_range(), and will add zap_details.reclaim_pt to exclude
>>> cases other than madvise(MADV_DONTNEED).
>>>
>>> Once an empty PTE is detected, we first try to hold the pmd lock within
>>> the pte lock. If successful, we clear the pmd entry directly (fast path).
>>> Otherwise, we wait until the pte lock is released, then re-hold the pmd
>>> and pte locks and loop PTRS_PER_PTE times to check pte_none() to re-detect
>>> whether the PTE page is empty and free it (slow path).
>>
>> "wait until the pte lock is released" sounds nasty. I'm not
>> immediately seeing the code which does this. Please provide more
>> description?
>
> It's worded a bit confusingly, but it's fine; a better description
> might be "if try_get_and_clear_pmd() fails to trylock the PMD lock
> (against lock order), then later, after we have dropped the PTE lock,
> try_to_free_pte() takes the PMD and PTE locks in the proper lock
> order".
>
> The "wait until the pte lock is released" part is just supposed to
> mean that the try_to_free_pte() call is placed after the point where
> the PTE lock has been dropped (which makes it possible to take the PMD
> lock). It does not refer to waiting for other threads.

Yes. Thanks for helping to clarify my vague statement!

>
>>> +void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
>>> +		     struct mmu_gather *tlb)
>>> +{
>>> +	pmd_t pmdval;
>>> +	spinlock_t *pml, *ptl;
>>> +	pte_t *start_pte, *pte;
>>> +	int i;
>>> +
>>> +	pml = pmd_lock(mm, pmd);
>>> +	start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
>>> +	if (!start_pte)
>>> +		goto out_ptl;
>>> +	if (ptl != pml)
>>> +		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>>> +
>>> +	/* Check if it is empty PTE page */
>>> +	for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) {
>>> +		if (!pte_none(ptep_get(pte)))
>>> +			goto out_ptl;
>>> +	}
>>
>> Are there any worst-case situations in which we'll spend unacceptable
>> amounts of time running this loop?
>
> This loop is just over a single page table, that should be no more
> expensive than what we already do in other common paths like
> zap_pte_range().

Agree.
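
For readers following the locking discussion above, here is a minimal
userspace sketch of the fast-path/slow-path pattern Jann describes. It is
not the kernel code: pthread mutexes stand in for the PMD and PTE
spinlocks, the two locks are assumed to be distinct (the patch also
handles ptl == pml), and every name in it (toy_dir, fast_unlink,
slow_unlink, zap_and_reclaim, NR_SLOTS) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define NR_SLOTS 512 /* stand-in for PTRS_PER_PTE (assumed value) */

/* Hypothetical model: one "PMD entry" pointing at one "PTE table". */
struct toy_dir {
	pthread_mutex_t pml;           /* outer lock, stand-in for the PMD lock */
	pthread_mutex_t ptl;           /* inner lock, stand-in for the PTE lock */
	unsigned long slots[NR_SLOTS]; /* 0 plays the role of pte_none() */
	bool linked;                   /* true while the "PMD entry" is set */
};

static void toy_dir_init(struct toy_dir *d)
{
	pthread_mutex_init(&d->pml, NULL);
	pthread_mutex_init(&d->ptl, NULL);
	memset(d->slots, 0, sizeof(d->slots));
	d->linked = true;
}

/* Bounded scan over a single table; the caller holds ptl. */
static bool table_is_empty(const struct toy_dir *d)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (d->slots[i] != 0)
			return false;
	return true;
}

/* Fast path: ptl is held, so pml may only be trylocked (against lock order). */
static bool fast_unlink(struct toy_dir *d)
{
	assert(table_is_empty(d));
	if (pthread_mutex_trylock(&d->pml) != 0)
		return false; /* contended: leave it to the slow path */
	d->linked = false;
	pthread_mutex_unlock(&d->pml);
	return true;
}

/* Slow path: no locks held on entry; take pml, then ptl, then re-check. */
static bool slow_unlink(struct toy_dir *d)
{
	bool unlinked = false;

	pthread_mutex_lock(&d->pml);
	pthread_mutex_lock(&d->ptl);
	if (d->linked && table_is_empty(d)) {
		d->linked = false;
		unlinked = true;
	}
	pthread_mutex_unlock(&d->ptl);
	pthread_mutex_unlock(&d->pml);
	return unlinked;
}

/* Clear every slot under ptl, try the fast path, else fall back after unlock. */
static bool zap_and_reclaim(struct toy_dir *d)
{
	bool unlinked;

	pthread_mutex_lock(&d->ptl);
	memset(d->slots, 0, sizeof(d->slots));
	unlinked = fast_unlink(d);
	pthread_mutex_unlock(&d->ptl);

	return unlinked || slow_unlink(d);
}
```

The slow path never blocks on another thread while holding the inner
lock; it simply runs after the inner lock has been dropped, and it has to
re-scan the table because entries may have been installed in between.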