Message-ID: <0317efd5-f7f8-4fd2-8892-befe9fe97f33@bytedance.com>
Date: Tue, 18 Jun 2024 17:55:17 +0800
Subject: Re: [RFC PATCH 0/3] asynchronously scan and free empty user PTE pages
To: David Hildenbrand
Cc: hughd@google.com, willy@infradead.org, mgorman@suse.de,
 muchun.song@linux.dev, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <02f8cbd0-8b2b-4c2d-ad96-f854d25bf3c2@redhat.com>
 <2cda0af6-8fde-4093-b615-7979744d6898@redhat.com>
 <86b29391-ad2a-4c4b-b9a8-974d1876632c@redhat.com>
 <025ea89a-bb94-4f60-b6ad-d8b88d3cfc60@bytedance.com>
 <4f56d1e9-2c23-42e5-9aef-6b29d072138e@redhat.com>
From: Qi Zheng
In-Reply-To: <4f56d1e9-2c23-42e5-9aef-6b29d072138e@redhat.com>

On 2024/6/18 17:40, David Hildenbrand wrote:
> On 18.06.24 09:51, Qi Zheng wrote:
>> Hi David,
>>
>> On 2024/6/18 01:49, David Hildenbrand wrote:
>>>
>>>>>
>>>>> No strong opinion, something synchronous sounds to me like the
>>>>> low-hanging fruit, that
>>>>> could add the infrastructure to be used by
>>>>> something more advanced/asynchronously :)
>>>>
>>>> Got it, I will try to do the following in the next version.
>>>>
>>>> a. for MADV_DONTNEED case, try synchronous reclaim as you said
>>>>
>>>
>>> I think that really is the low hanging fruit that would cover quite some
>>> cases already: (1) reclaim when MADV_DONTNEED spans the complete page
>>> table.
>>
>> I will check and free the PTE page in the zap_pte_range() if the
>> (end - addr >= PMD_SIZE) condition is met.
>>
>>>
>>> Then, there is (2) reclaim when MADV_DONTNEED spans only part of the
>>> page table (e.g., single PTE), but my best guess is that it's better to
>>> scan for that asynchronously than making possibly each MADV_DONTNEED
>>> syscall invocation slower.
>>
>> Maybe just mark the vma, and then scan it in the system reclaim path.
>>
>> I also plan to do this in the MADV_FREE case, instead of adding an
>> asynchronous madvise option first.
>>
>>>
>>> (1) would already help a lot and showcase how the locking/machinery
>>> would work.
>>>
>>>
>>>> b. for MADV_FREE case:
>>>>
>>>>      - add a madvise option for synchronous reclaim
>>>>
>>>>      - add another madvise option to mark the vma, then add its
>>>>        corresponding mm to a global list, and then traverse
>>>>        the list and reclaim it when the memory is tight and
>>>>        enters the system reclaim path.
>>>>        (maybe there is an option to unmark)
>>>>
>>>> c. for s390 case you mentioned, create a CONFIG_FREE_PT first, and
>>>>    then s390 will not select this config until the problem is
>>>>    solved.
>>>>
>>>> d. for lockless scan, try using disabling IRQ or (mmap read lock +
>>>> pte_offset_map_nolock).
>>>
>>> Although d) really only is desired when scanning asynchronously I think.
>>> During (1) above, we know that the table will be very likely empty
>>> (unless weird race).
>>
>> Agree.
>
> Again, thanks for working on this. Let me know (can also do privately)
> if you run into any issues or think I can be of help. :)

That's great, thank you very much!

>
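
For reference, the (end - addr >= PMD_SIZE) check from case (1) above can be
illustrated with a small user-space sketch. This is not the patch itself: the
constants assume x86-64 with 4 KiB pages, pmd_range_end() is a simplified
stand-in for the kernel's pmd_addr_end() (it ignores address wraparound), and
chunk_covers_whole_pte_table() is a hypothetical helper name used only here.
The idea is that once a range has been clamped to a single PMD entry, a length
of at least PMD_SIZE means the zap covers every entry of that PTE page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for x86-64 with 4 KiB pages; other configurations differ. */
#define PMD_SHIFT 21
#define PMD_SIZE  (1ULL << PMD_SHIFT)
#define PMD_MASK  (~(PMD_SIZE - 1))

/*
 * Simplified stand-in for the kernel's pmd_addr_end(): clamp 'end' to the
 * next PMD boundary after 'addr'. Address wraparound is not handled here.
 */
static uint64_t pmd_range_end(uint64_t addr, uint64_t end)
{
    uint64_t boundary = (addr + PMD_SIZE) & PMD_MASK;

    return boundary < end ? boundary : end;
}

/*
 * Hypothetical helper: true if the first per-PMD chunk of [addr, end)
 * spans the complete PTE page, i.e. the (end - addr >= PMD_SIZE) condition
 * holds after clamping to one PMD entry.
 */
static bool chunk_covers_whole_pte_table(uint64_t addr, uint64_t end)
{
    uint64_t chunk_end = pmd_range_end(addr, end);

    return chunk_end - addr >= PMD_SIZE;
}
```

With these assumptions, a range starting at 0x200000 and ending at 0x400000
qualifies for synchronous reclaim, while a range starting at 0x201000 does not
for its first chunk and would fall under case (2).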