From: Jann Horn
Date: Wed, 4 Dec 2024 23:47:30 +0100
Subject: Re: [PATCH v4 09/11] mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)
To: Andrew Morton
Cc: Qi Zheng, david@redhat.com, hughd@google.com, willy@infradead.org,
 muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com,
 mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com,
 rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <92aba2b319a734913f18ba41e7d86a265f0b84e2.1733305182.git.zhengqi.arch@bytedance.com>
 <20241204143625.a09c2b8376b2415b985cf50a@linux-foundation.org>
In-Reply-To: <20241204143625.a09c2b8376b2415b985cf50a@linux-foundation.org>

On Wed, Dec 4, 2024 at 11:36 PM Andrew Morton wrote:
>
> On Wed, 4 Dec 2024 19:09:49 +0800 Qi Zheng wrote:
> > As a first step, this commit aims to synchronously free the empty PTE
> > pages in madvise(MADV_DONTNEED) case. We will detect and free empty PTE
> > pages in zap_pte_range(), and will add zap_details.reclaim_pt to exclude
> > cases other than madvise(MADV_DONTNEED).
> >
> > Once an empty PTE is detected, we first try to hold the pmd lock within
> > the pte lock. If successful, we clear the pmd entry directly (fast path).
> > Otherwise, we wait until the pte lock is released, then re-hold the pmd
> > and pte locks and loop PTRS_PER_PTE times to check pte_none() to re-detect
> > whether the PTE page is empty and free it (slow path).
>
> "wait until the pte lock is released" sounds nasty.  I'm not
> immediately seeing the code which does this.  Please provide more
> description?

It's worded a bit confusingly, but it's fine; a better description
might be "if try_get_and_clear_pmd() fails to trylock the PMD lock
(against lock order), then later, after we have dropped the PTE lock,
try_to_free_pte() takes the PMD and PTE locks in the proper lock
order".

The "wait until the pte lock is released" part is just supposed to
mean that the try_to_free_pte() call is placed after the point where
the PTE lock has been dropped (which makes it possible to take the PMD
lock). It does not refer to waiting for other threads.

> > +void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
> > +		     struct mmu_gather *tlb)
> > +{
> > +	pmd_t pmdval;
> > +	spinlock_t *pml, *ptl;
> > +	pte_t *start_pte, *pte;
> > +	int i;
> > +
> > +	pml = pmd_lock(mm, pmd);
> > +	start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
> > +	if (!start_pte)
> > +		goto out_ptl;
> > +	if (ptl != pml)
> > +		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> > +
> > +	/* Check if it is empty PTE page */
> > +	for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) {
> > +		if (!pte_none(ptep_get(pte)))
> > +			goto out_ptl;
> > +	}
>
> Are there any worst-case situations in which we'll spend unacceptable
> amounts of time running this loop?

This loop only covers a single page table, so it should be no more
expensive than what we already do in other common paths like
zap_pte_range().
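
To illustrate the locking scheme described above (trylock against lock
order on the fast path, then retake both locks in the proper order on
the slow path), here is a minimal userspace sketch. It is an analogy
under stated assumptions, not the code from the patch: struct
fake_table, NR_ENTRIES and every function name below are hypothetical,
pthread mutexes stand in for the kernel's PMD and PTE spinlocks, and
the value 512 for NR_ENTRIES is only an assumed stand-in for
PTRS_PER_PTE.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PTRS_PER_PTE; the value is an assumption. */
#define NR_ENTRIES 512

/* Hypothetical model of one page table. Lock order: pmd_lock, then pte_lock. */
struct fake_table {
	pthread_mutex_t pmd_lock;		/* outer lock */
	pthread_mutex_t pte_lock;		/* inner lock */
	unsigned long entries[NR_ENTRIES];	/* 0 plays the role of pte_none() */
	bool freed;
};

void table_init(struct fake_table *t)
{
	int rc;

	rc = pthread_mutex_init(&t->pmd_lock, NULL);
	assert(rc == 0);
	rc = pthread_mutex_init(&t->pte_lock, NULL);
	assert(rc == 0);
	(void)rc;
	memset(t->entries, 0, sizeof(t->entries));
	t->freed = false;
}

/* Bounded scan: at most NR_ENTRIES iterations, i.e. one table. */
bool table_is_empty(const struct fake_table *t)
{
	for (size_t i = 0; i < NR_ENTRIES; i++) {
		if (t->entries[i] != 0)
			return false;
	}
	return true;
}

/*
 * Fast path. The caller holds pte_lock and knows the table is empty.
 * Blocking on pmd_lock here would invert the lock order, so only a
 * trylock is safe; on failure the caller falls back to the slow path.
 */
bool try_free_fast(struct fake_table *t)
{
	if (pthread_mutex_trylock(&t->pmd_lock) != 0)
		return false;
	t->freed = true;
	pthread_mutex_unlock(&t->pmd_lock);
	return true;
}

/*
 * Slow path. The caller holds neither lock. Both locks are taken in
 * the proper order, and emptiness is re-checked because the table may
 * have been repopulated while no lock was held.
 */
bool try_free_slow(struct fake_table *t)
{
	bool ok = false;

	pthread_mutex_lock(&t->pmd_lock);
	pthread_mutex_lock(&t->pte_lock);
	if (!t->freed && table_is_empty(t)) {
		t->freed = true;
		ok = true;
	}
	pthread_mutex_unlock(&t->pte_lock);
	pthread_mutex_unlock(&t->pmd_lock);
	return ok;
}

/* Clear every entry, then try the fast path and fall back to the slow one. */
bool zap_and_reclaim(struct fake_table *t)
{
	bool freed;

	pthread_mutex_lock(&t->pte_lock);
	memset(t->entries, 0, sizeof(t->entries));
	freed = try_free_fast(t);
	pthread_mutex_unlock(&t->pte_lock);

	/* The inner lock is dropped, so taking the outer lock is now legal. */
	if (!freed)
		freed = try_free_slow(t);
	return freed;
}
```

A caller would run table_init() on a struct fake_table and then call
zap_and_reclaim(). In this sketch nothing on the slow path waits for
another thread to finish its work; it only runs after the inner lock
has been dropped, and its rescan is bounded by NR_ENTRIES.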