References: <20240304103757.235352-1-21cnbao@gmail.com> <706b7129-85f6-4470-9fd9-f955a8e6bd7c@arm.com> <37f1e6da-412b-4bb4-88b7-4c49f21f5fe9@redhat.com> <804524c8-772c-42d0-93a5-90d77f13f304@redhat.com>
In-Reply-To: <804524c8-772c-42d0-93a5-90d77f13f304@redhat.com>
From: Barry Song <21cnbao@gmail.com>
Date: Tue, 5 Mar 2024 11:29:31 +1300
Subject: Re: [RFC PATCH] mm: hold PTL from the first PTE while reclaiming a large folio
To: David Hildenbrand
Cc: Ryan Roberts, akpm@linux-foundation.org, linux-mm@kvack.org, chrisl@kernel.org, yuzhao@google.com, hanchuanhua@oppo.com, linux-kernel@vger.kernel.org, willy@infradead.org, ying.huang@intel.com, xiang@kernel.org, mhocko@suse.com, shy828301@gmail.com, wangkefeng.wang@huawei.com, Barry Song, Hugh Dickins

On Tue, Mar 5, 2024 at 10:15 AM David Hildenbrand wrote:
>
>
> >>> Do we need a Fixes tag?
> >
> > I am not quite sure which commit should be here for a fixes tag.
> > I think it's more of an optimization.
>
> Good, that helps!
>
> >
> >>>
> >>
> >> What would be the description of the problem we are fixing?
> >>
> >> 1) failing to unmap?
> >>
> >> That can happen with small folios as well IIUC.
> >>
> >> 2) Putting the large folio on the deferred split queue?
> >>
> >> That sounds more reasonable.
> >
> > I don't feel it is reasonable. Avoiding this kind of accident splitting
> > from the kernel's improper code is a more reasonable approach
> > as there is always a price to pay for splitting and unfolding PTEs
> > etc.
> >
> > While we can't avoid splitting coming from userspace's
> > MADV_DONTNEED, munmap, mprotect, we have a way
> > to ensure the kernel itself doesn't accidently break up a
> > large folio.
>
> Note that on the next vmscan we would retry, find the remaining present
> entries and swapout that thing completely :)

This is true, but since we can finish the job the first time, a second
retry seems like an unnecessary cost :-)

>
> >
> > In OPPO's phones, we ran into some weird bugs due to skipped PTEs
> > in try_to_unmap_one. hardly could we fix it from the root cause. with
> > various races, figuring out their timings was really a big pain :-)
> >
>
> I can imagine. I assume, though, that it might be related to the way the
> cont-pte bit was handled. Ryan's implementation should be able to cope
> with that.

I guess you are probably right. Ryan's implementation decouples CONT-PTE
from the mm core. Nice to have it.

>
> > But we did "resolve" those bugs by entirely untouching all PTEs if we
> > found some PTEs were skipped in try_to_unmap_one [1].
> >
> > While we find we only get the PTL from 2nd, 3rd but not
> > 1st PTE, we entirely give up on try_to_unmap_one, and leave
> > all PTEs untouched.
> >
> > /* we are not starting from head */
> > if (!IS_ALIGNED((unsigned long)pvmw.pte, CONT_PTES * sizeof(*pvmw.pte))) {
> >         ret = false;
> >         atomic64_inc(&perf_stat.mapped_walk_start_from_non_head);
> >         set_pte_at(mm, address, pvmw.pte, pteval);
> >         page_vma_mapped_walk_done(&pvmw);
> >         break;
> > }
> > This will ensure all PTEs still have a unified state such as CONT-PTE
> > after try_to_unmap fails.
> > I feel this could have some false postive because when racing
> > with unmap, 1st PTE might really become pte_none. So explicitly
> > holding PTL from 1st PTE seems a better way.
>
> Can we estimate the "cost" of holding the PTL?
>

This just moves PTL acquisition one or two PTEs earlier in those corner
cases. In normal cases, it doesn't change when the PTL is taken:
page_vma_mapped_walk finds PTE0 present and takes the PTL immediately.
In the corner cases, page_vma_mapped_walk races with break-before-make;
after skipping one or two PTEs whose state is in transition, it finds a
present PTE and only then acquires the lock.

> --
> Cheers,
>
> David / dhildenb

Thanks
Barry
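
To make the point about the cost a bit more concrete, here is a rough
userspace toy model of where the lock ends up being taken. It is only a
sketch, not kernel code: the bool array stands in for the PTEs of one
large folio, and the function names (lock_index_first_present,
lock_index_from_first, skipped_before_lock) are hypothetical names made
up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (assumption): present[i] is true when PTE i of the large
 * folio is present, false when it is transiently cleared, e.g. by a
 * racing break-before-make sequence.
 */

/*
 * Models the behaviour described above: entries that are not present
 * are skipped without the lock, and the lock is taken at the first
 * present entry. Returns that index, or -1 if no entry is present.
 */
int lock_index_first_present(const bool *present, size_t nr_ptes)
{
	assert(nr_ptes > 0);
	for (size_t i = 0; i < nr_ptes; i++)
		if (present[i])
			return (int)i;
	return -1;
}

/*
 * Models the proposal: take the lock at the first PTE regardless of
 * its state, so that every entry is examined under the lock.
 */
int lock_index_from_first(const bool *present, size_t nr_ptes)
{
	assert(nr_ptes > 0);
	(void)present;	/* the state of PTE0 does not matter here */
	return 0;
}

/* Number of entries passed over before the lock is held. */
int skipped_before_lock(int lock_index, size_t nr_ptes)
{
	return lock_index < 0 ? (int)nr_ptes : lock_index;
}
```

With all entries present, both functions return index 0, which matches
the normal case where nothing changes. With the first two entries
cleared by a racing updater, the first function returns 2 and the second
still returns 0, which is the "one or two PTEs earlier" difference.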