From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7c97d86f-57f4-f764-3e92-1660690a0f24@redhat.com>
Date: Wed, 10 Nov 2021 18:37:46 +0100
From: David Hildenbrand
Organization: Red Hat
To: Jason Gunthorpe
Cc: Qi Zheng, akpm@linux-foundation.org, tglx@linutronix.de,
 kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, songmuchun@bytedance.com,
 zhouchengming@bytedance.com
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages
In-Reply-To: <20211110163925.GX1740502@nvidia.com>
References: <20211110105428.32458-1-zhengqi.arch@bytedance.com>
 <20211110125601.GQ1740502@nvidia.com>
 <8d0bc258-58ba-52c5-2e0d-a588489f2572@redhat.com>
 <20211110143859.GS1740502@nvidia.com>
 <6ac9cc0d-7dea-0e19-51b3-625ec6561ac7@redhat.com>
 <20211110163925.GX1740502@nvidia.com>
Content-Type: text/plain; charset=UTF-8

>> It would still be a fairly coarse-grained locking, I am not sure if that
>> is a step into the right direction.
>> If you want to modify *some* page table in your process you have to
>> exclude each and every page table walker. Or did I mis-interpret what
>> you were saying?
>
> That is one possible design, it favours fast walking and penalizes
> mutation. We could also stick a lock in the PMD (instead of a
> refcount) and still logically be using a lock instead of a refcount
> scheme. Remember modify here is "want to change a table pointer into a
> leaf pointer" so it isn't an every day activity..

It will be somewhat frequent if we reclaim an empty PTE page table as
soon as it turns empty. This happens not only when zapping, but also
during writeback/swapping, so while writing back or swapping you might
be left with empty page tables to reclaim.

Of course, this is the current approach. Another approach that doesn't
require additional refcounts is scanning page tables for empty ones and
reclaiming them. This scanning can be triggered either manually from
user space or automatically from the kernel.

>
> There is some advantage with this thinking because it harmonizes well
> with the other stuff that wants to convert tables into leafs, but has
> to deal with complicated locking.
>
> On the other hand, refcounts are a degenerate kind of rwsem and only
> help with freeing pages. It also puts more atomics in normal fast
> paths since we are refcounting each PTE, not read locking the PMD.
>
> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
> means a table cannot become a leaf. I don't know if there is space
> for another atomic in the PMD level, and we'd have to use a hitching
> post/hashed waitq scheme too since there surely isn't room for a waitq
> too..
>
> I wouldn't be so quick to say one is better than the other, but at
> least let's have thought about a locking solution before merging
> refcounts :)

Yes, absolutely.
I can see the beauty in the current approach, because it just
reclaims "automatically" once possible -- page table empty and nobody is
walking it. The downside is that it doesn't always make sense to reclaim
an empty page table immediately once it turns empty.

Also, it adds complexity for something that is only a problem in some
corner cases -- sparse memory mappings, especially relevant for some
memory allocators after freeing a lot of memory, or when running VMs
with memory ballooning after inflating the balloon. Some of these use
cases might be fine with just triggering page table reclaim manually
from user space.

-- 
Thanks,

David / dhildenb
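
To make the scanning alternative mentioned above a bit more concrete,
here is a toy userspace model, not kernel code. All names (PteTable,
Pmd, zap, reclaim_empty_tables) are hypothetical and exist only for
illustration. The model ignores concurrency entirely: a real scanner
would still have to exclude concurrent page table walkers, which is
exactly the locking question discussed in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PteTable:
    """Toy stand-in for one PTE page table page (hypothetical)."""
    entries: Dict[int, int] = field(default_factory=dict)  # index -> pfn

    def is_empty(self) -> bool:
        return not self.entries


@dataclass
class Pmd:
    """Toy stand-in for one PMD page: slot index -> PTE table."""
    slots: Dict[int, PteTable] = field(default_factory=dict)


def zap(pmd: Pmd, slot: int, index: int) -> None:
    """Clear one PTE; the table stays in place even if it is now empty."""
    table = pmd.slots.get(slot)
    if table is not None:
        table.entries.pop(index, None)


def reclaim_empty_tables(pmd: Pmd) -> int:
    """Scan all PMD slots, drop empty PTE tables, return how many were freed.

    Assumption: nobody else is walking the tables while this runs.
    """
    empty = [slot for slot, table in pmd.slots.items() if table.is_empty()]
    for slot in empty:
        del pmd.slots[slot]
    return len(empty)


# Usage: zapping leaves an empty table behind; a later scan reclaims it.
pmd = Pmd()
pmd.slots[0] = PteTable({0: 0x1000, 1: 0x1001})
pmd.slots[1] = PteTable({7: 0x2000})
zap(pmd, 1, 7)
freed = reclaim_empty_tables(pmd)
```

The point of the sketch is only the decoupling: zap() does no
bookkeeping, and reclaim happens whenever the scan is triggered, whether
that trigger comes from user space or from the kernel.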