Message-ID: <025ea89a-bb94-4f60-b6ad-d8b88d3cfc60@bytedance.com>
Date: Tue, 18 Jun 2024 15:51:50 +0800
Subject: Re: [RFC PATCH 0/3] asynchronously scan and free empty user PTE pages
To: David Hildenbrand
Cc: hughd@google.com, willy@infradead.org, mgorman@suse.de,
 muchun.song@linux.dev, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <02f8cbd0-8b2b-4c2d-ad96-f854d25bf3c2@redhat.com>
 <2cda0af6-8fde-4093-b615-7979744d6898@redhat.com>
 <86b29391-ad2a-4c4b-b9a8-974d1876632c@redhat.com>
From: Qi Zheng
In-Reply-To: <86b29391-ad2a-4c4b-b9a8-974d1876632c@redhat.com>

Hi David,

On 2024/6/18 01:49, David Hildenbrand wrote:
>
>>>
>>> No strong opinion, something synchronous sounds to me like the
>>> low-hanging fruit, that could add the infrastructure to be used by
>>> something more advanced/asynchronous :)
>>
>> Got it, I will try to do the following in the next version.
>>
>> a. for MADV_DONTNEED case, try synchronous reclaim as you said
>>
>
> I think that really is the low-hanging fruit that would cover quite some
> cases already: (1) reclaim when MADV_DONTNEED spans the complete page
> table.

I will check for and free the PTE page in zap_pte_range() when the
(end - addr >= PMD_SIZE) condition is met.

>
> Then, there is (2) reclaim when MADV_DONTNEED spans only part of the
> page table (e.g., single PTE), but my best guess is that it's better to
> scan for that asynchronously than making possibly each MADV_DONTNEED
> syscall invocation slower.

Maybe just mark the vma, and then scan it in the system reclaim path.

I also plan to do this in the MADV_FREE case, instead of adding an
asynchronous madvise option first.

>
> (1) would already help a lot and showcase how the locking/machinery
> would work.
>
>
>> b. for MADV_FREE case:
>>
>>     - add a madvise option for synchronous reclaim
>>
>>     - add another madvise option to mark the vma, then add its
>>       corresponding mm to a global list, and then traverse
>>       the list and reclaim it when the memory is tight and
>>       enters the system reclaim path.
>>       (maybe there is an option to unmark)
>>
>> c. for s390 case you mentioned, create a CONFIG_FREE_PT first, and
>>    then s390 will not select this config until the problem is solved.
>>
>> d. for lockless scan, try disabling IRQs or (mmap read lock +
>>    pte_offset_map_nolock).
>
> Although d) really only is desired when scanning asynchronously I think.
> During (1) above, we know that the table will be very likely empty
> (unless weird race).

Agree.

Thanks,
Qi
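
For readers following the thread, the split between case (1) and case (2) can be illustrated with a small userspace sketch. This is not kernel code and not taken from the patch series: the SKETCH_* names and both helper functions are hypothetical, and the 2 MiB span assumes 4 KiB pages with 512 PTEs per table. It also assumes, as in the (end - addr >= PMD_SIZE) check mentioned above, that each chunk handed to the classifier has already been clamped to a single PMD entry, and that addresses stay far enough below 2^64 that rounding up cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, 512 PTEs per table, so one PTE page maps 2 MiB. */
#define SKETCH_PMD_SHIFT 21
#define SKETCH_PMD_SIZE  (UINT64_C(1) << SKETCH_PMD_SHIFT)
#define SKETCH_PMD_MASK  (~(SKETCH_PMD_SIZE - 1))

/* Hypothetical outcome of looking at one chunk of a zapped range. */
enum sketch_reclaim_action {
    SKETCH_RECLAIM_NONE,  /* empty chunk, nothing to do */
    SKETCH_RECLAIM_SYNC,  /* case (1): chunk spans the complete PTE table */
    SKETCH_RECLAIM_DEFER  /* case (2): partial span, leave to a later scan */
};

/*
 * Classify one chunk [addr, end) that lies within a single PMD entry.
 * A chunk as large as the PMD span must cover the whole PTE table, so
 * the table is a candidate for synchronous freeing; anything smaller
 * is deferred (for example by marking the vma for the reclaim path).
 */
static enum sketch_reclaim_action
sketch_classify_chunk(uint64_t addr, uint64_t end)
{
    assert(addr <= end);
    if (addr == end)
        return SKETCH_RECLAIM_NONE;
    /* The chunk must not straddle a PMD boundary. */
    assert((addr & SKETCH_PMD_MASK) == ((end - 1) & SKETCH_PMD_MASK));
    if (end - addr >= SKETCH_PMD_SIZE)
        return SKETCH_RECLAIM_SYNC;
    return SKETCH_RECLAIM_DEFER;
}

/*
 * Count how many PTE tables an arbitrary range [addr, end) covers in
 * full: round the start up and the end down to PMD boundaries.
 */
static uint64_t sketch_full_tables(uint64_t addr, uint64_t end)
{
    assert(addr <= end);
    uint64_t first = (addr + SKETCH_PMD_SIZE - 1) & SKETCH_PMD_MASK;
    uint64_t last = end & SKETCH_PMD_MASK;
    return first < last ? (last - first) >> SKETCH_PMD_SHIFT : 0;
}
```

Under these assumptions, a chunk from 0x200000 to 0x400000 is classified as SKETCH_RECLAIM_SYNC, while a single 4 KiB page inside it is SKETCH_RECLAIM_DEFER. Whether a fully spanned table can actually be freed still depends on the locking and race handling discussed in the thread, which this sketch does not model.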