From: Qi Zheng
Date: Mon, 18 Nov 2024 18:34:36 +0800
Subject: Re: [PATCH v3 4/9] mm: introduce skip_none_ptes()
To: David Hildenbrand
Cc: jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, peterx@redhat.com, mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, zokeefe@google.com, rientjes@google.com
In-Reply-To: <995804f4-b658-44b2-bb40-c84b8a322616@redhat.com>

On 2024/11/18 17:29, David Hildenbrand wrote:
> On 18.11.24 04:35, Qi Zheng wrote:
>>
>>
>> On 2024/11/15 22:59, David Hildenbrand wrote:
>>> On 15.11.24 15:41, Qi Zheng wrote:
>>>>
>>>>
>>>> On 2024/11/15 18:22, David Hildenbrand wrote:
>>>>>>>> *nr_skip = nr;
>>>>>>>>
>>>>>>>> and then:
>>>>>>>>
>>>>>>>> zap_pte_range
>>>>>>>> --> nr = do_zap_pte_range(tlb, vma, pte, addr, end, details,
>>>>>>>>                           &nr_skip, rss, &force_flush, &force_break);
>>>>>>>>     if (can_reclaim_pt) {
>>>>>>>>         none_nr += count_pte_none(pte, nr);
>>>>>>>>         none_nr += nr_skip;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> Right?
>>>>>>>
>>>>>>> Yes. I did not look closely at the patch that adds the counting of
>>>>>>
>>>>>> Got it.
>>>>>>
>>>>>>> pte_none though (to digest why it is required :) ).
>>>>>>
>>>>>> Because 'none_nr == PTRS_PER_PTE' is used in patch #7 to detect an
>>>>>> empty PTE page.
>>>>>
>>>>> Okay, so the problem is that "nr" would be "all processed entries", but
>>>>> there are cases where we "process an entry but do not zap it".
>>>>>
>>>>> What you really want to know is only "was any entry not zapped", which
>>>>> could be a simple boolean variable passed into do_zap_pte_range()?
>>>>>
>>>>> Because as soon as any entry was processed but not zapped, you can
>>>>> immediately give up on reclaiming that table.
>>>>
>>>> Yes, we can set can_reclaim_pt to false when a !pte_none() entry is
>>>> found in count_pte_none().
>>>
>>> I'm not sure if we'll need count_pte_none(), but I'll have to take a look
>>> at your new patch to see how this fits together with doing the pte_none
>>> detection+skipping in do_zap_pte_range().
>>>
>>> I was wondering if you cannot simply avoid the additional scanning and
>>> simply set "can_reclaim_pt" to false if you skip a zap.
>>
>> Maybe we can return the information on whether the zap was skipped from
>> zap_present_ptes() and zap_nonpresent_ptes() through parameters, like I
>> did in [PATCH v1 3/7] and [PATCH v1 4/7].
>>
>> In theory, we can detect empty PTE pages in the following two ways:
>>
>> 1) If no zap is skipped, all PTE entries have been zapped, so the PTE
>>    page must be empty.
>> 2) If all PTE entries are detected to be none, then the PTE page is
>>    empty.
>>
>> If detection goes wrong, 1) may cause non-empty PTE pages to be reclaimed
>> (which is unacceptable), while 2) will at most cause empty PTE pages
>> not to be reclaimed.
>>
>> So the most reliable and efficient method may be:
>>
>> a. If a zap is skipped, stop scanning and do not reclaim the PTE page;
>> b. Otherwise, as now, detect the empty PTE page through count_pte_none().
>
> Is there a need for count_pte_none() that I am missing?

When any_skipped == false, we should at least add a VM_BUG_ON() to recheck
that the PTEs are none.

>
> Assume we have
>
> nr = do_zap_pte_range(&any_skipped)
>
>
> If "nr" is the number of processed entries (including pte_none()), and
> "any_skipped" is set whenever we skip zapping a !pte_none() entry, we
> can detect what we need, no?
>
> If any_skipped == false after the call, we now have "nr" pte_none()
> entries. -> We can continue trying to reclaim

I prefer that "nr" should not include pte_none() entries.
Something like this:

nr = do_zap_pte_range(&any_skipped);
if (can_reclaim_pt) {
    /* Nothing skipped: every processed entry must now be none. */
    VM_BUG_ON(!any_skipped && count_pte_none(pte, nr) != nr);
    if (any_skipped)
        can_reclaim_pt = false;
}

nr: the number of processed entries (excluding pte_none() entries)
any_skipped: set whenever we skip zapping a !pte_none() entry

```
do_zap_pte_range
--> pte_t ptent = ptep_get(pte);
    int max_nr = (end - addr) / PAGE_SIZE;

    /* Skip all consecutive pte_none(). */
    if (pte_none(ptent)) {
        int nr;

        for (nr = 1; nr < max_nr; nr++) {
            ptent = ptep_get(pte + nr);
            if (!pte_none(ptent))
                break;
        }
        max_nr -= nr;
        if (!max_nr)
            return 0;
        pte += nr;
        addr += nr * PAGE_SIZE;
    }

    if (pte_present(ptent))
        return zap_present_ptes(tlb, vma, pte, ptent, max_nr,
                                addr, details, rss, force_flush,
                                force_break, any_skipped);

    return zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr,
                               addr, details, rss, any_skipped);
```

>
> If any_skipped == true, we have at least one !pte_none() entry among the
> "nr" entries. -> We cannot and must not reclaim.
>
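The reclaim decision discussed in this thread can be modeled as a small userspace sketch. The scheme is: skip each run of none entries, report any entry left in place through any_skipped, and recount none entries only when nothing was skipped. This is an illustrative model under stated assumptions, not kernel code. The enum entry states and the helpers zap_run(), count_none() and zap_table() are hypothetical. ENTRY_KEPT stands in for any !pte_none() entry the zap path declines to clear, and assert() plays the role of VM_BUG_ON(). For simplicity, zap_run() returns the number of slots it advanced over, including skipped none entries, because the caller needs that count to step forward. It does not model how the real do_zap_pte_range() counts "nr".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PTE states; not kernel types. */
enum entry { ENTRY_NONE, ENTRY_PRESENT, ENTRY_KEPT };

/* Count none entries in t[0..n). */
static int count_none(const enum entry *t, int n)
{
    int none = 0;

    for (int k = 0; k < n; k++)
        if (t[k] == ENTRY_NONE)
            none++;
    return none;
}

/*
 * Hypothetical helper: starting at t[i], skip every consecutive none
 * entry, then handle one non-none entry. ENTRY_PRESENT is cleared;
 * ENTRY_KEPT is left in place and reported through *any_skipped.
 * Returns the number of slots advanced over, including skipped nones.
 */
static int zap_run(enum entry *t, int i, int end, bool *any_skipped)
{
    int n = 0;

    while (i + n < end && t[i + n] == ENTRY_NONE)
        n++;
    if (i + n == end)
        return n;                   /* only none entries were left */

    if (t[i + n] == ENTRY_KEPT)
        *any_skipped = true;        /* entry survives: table not empty */
    else
        t[i + n] = ENTRY_NONE;      /* entry zapped */
    return n + 1;
}

/* Zap t[0..len) and report whether the table may be reclaimed. */
static bool zap_table(enum entry *t, int len)
{
    bool can_reclaim = true;
    int i = 0;

    while (i < len) {
        bool any_skipped = false;
        int nr = zap_run(t, i, len, &any_skipped);

        if (can_reclaim) {
            /* Models the proposed VM_BUG_ON(): nothing skipped => all none. */
            assert(any_skipped || count_none(t + i, nr) == nr);
            if (any_skipped)
                can_reclaim = false;    /* stop rechecking; never reclaim */
        }
        i += nr;
    }
    return can_reclaim;
}
```

The asymmetry from the thread shows up directly. If any_skipped were wrongly left false while an entry survived, the recount assertion would catch it instead of letting a non-empty table be trusted. A spurious any_skipped only costs a missed reclaim.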