Date: Thu, 25 Apr 2024 11:00:39 -0400
From: Peter Xu
To: Matthew Wilcox
Cc: Andrew Morton, linux-mm@kvack.org, Alex Williamson
Subject: Re: [PATCH 1/5] mm: Free non-hugetlb large folios in a batch
References: <20240405153228.2563754-1-willy@infradead.org> <20240405153228.2563754-2-willy@infradead.org>

On Thu, Apr 25, 2024 at 04:39:14AM +0100, Matthew Wilcox wrote:
> On Wed, Apr 24, 2024 at 11:20:28AM -0400, Peter Xu wrote:
> > On Fri, Apr 05, 2024 at 04:32:23PM +0100, Matthew Wilcox (Oracle) wrote:
> > > free_unref_folios() can now handle non-hugetlb large folios, so keep
> > > normal large folios in the batch. hugetlb folios still need to be
> > > handled specially. I believe that folios freed using put_pages_list()
> > > cannot be accounted to a memcg (or the small folios would trip the "page
> > > still charged to cgroup" warning), but put an assertion in to check that.
> >
> > There's such user, iommu uses put_pages_list() to free IOMMU pgtables, and
> > they can be memcg accounted; since 2023 iommu_map switched to use
> > GFP_KERNEL_ACCOUNT.
> >
> > I hit below panic when testing my local branch over mm-everything when
> > running some VFIO workloads.
> >
> > For this specific vfio use case, see 160912fc3d4a ("vfio/type1: account
> > iommu allocations").
> >
> > I think we should remove the VM_BUG_ON_FOLIO() line, as the memcg will then
> > be properly taken care of later in free_pages_prepare(). Fixup attached at
> > the end that will fix this crash for me.
>
> Yes, I think you're right.
>
> I was concerned about the deferred split list / memcg charge problem,
> but (a) page table pages can't ever be on the deferred split list, (b)
> just passing them through to free_unref_folios() works fine. The problem
> was that folios_put_refs() was uncharging a batch before passing them
> to free_unref_folios().
>
> That does bring up the question though ... should we be uncharging
> these folios as a batch for better performance? Do you have a workload
> which frees a lot of page tables? Presumably an exit would do that.
> If so, does adding a call to mem_cgroup_uncharge_folios() before calling
> free_unref_folios() improve performance in any noticeable way?

Looks like something worth trying indeed.

The trace I hit was an exit path, but we can double check whether it can
also happen in some iommu hot paths like unmap, so maybe such a change
would be better justified in that case?

AFAIU, based on my reading of the current iommu pgtable mgmt, it's more
aggressive than cpu pgtable mgmt on freeing pgtable pages, so it looks like
such a batched release can happen during an iommu unmap too, rather than on
exit only:

  intel_iommu_unmap
    domain_unmap
      dma_pte_clear_level
        dma_pte_list_pagetables

But it's still worth checking in a test; perhaps the easiest way is to use
ioctl(VFIO_IOMMU_[UN]MAP_DMA).

>
> In the meantime, this patch:
>
> Reviewed-by: Matthew Wilcox (Oracle)
>
> although I think Andrew will just fold it into
> "mm: free non-hugetlb large folios in a batch"
>
> Andrew, if you do do that, please also edit out the last couple of
> sentences from the commit message:
>
>     free_unref_folios() can now handle non-hugetlb large folios, so keep
>     normal large folios in the batch. hugetlb folios still need to be handled
> -   specially. I believe that folios freed using put_pages_list() cannot be
> -   accounted to a memcg (or the small folios would trip the "page still
> -   charged to cgroup" warning), but put an assertion in to check that.
> +   specially.

Yes, we should drop these lines too.
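
On the ioctl(VFIO_IOMMU_[UN]MAP_DMA) test idea above, a rough userspace
sketch of the loop I have in mind is below. It is only an illustration, not
something I have measured with: map_unmap_loop() is a hypothetical helper
name, and it assumes the caller already has a VFIO container fd with a group
attached and a type1 IOMMU selected. Whether a given unmap really frees
page-table pages will depend on the IOMMU driver and on the size and
alignment of the range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper (not from any posted patch): repeatedly map and
 * unmap one buffer through an already configured VFIO type1 container,
 * so that IOMMU mappings are set up and torn down on every iteration.
 * Returns 0 on success, -1 on the first failing ioctl.
 */
static int map_unmap_loop(int container_fd, void *buf, uint64_t iova,
                          uint64_t size, unsigned long iterations)
{
    assert(buf != NULL && size > 0);

    for (unsigned long i = 0; i < iterations; i++) {
        struct vfio_iommu_type1_dma_map map;
        struct vfio_iommu_type1_dma_unmap unmap;

        /* Map [iova, iova + size) to the user buffer, read/write. */
        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uint64_t)(uintptr_t)buf;
        map.iova = iova;
        map.size = size;
        if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0)
            return -1;

        /* Tear the same range down again. */
        memset(&unmap, 0, sizeof(unmap));
        unmap.argsz = sizeof(unmap);
        unmap.iova = iova;
        unmap.size = size;
        if (ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap) < 0)
            return -1;
    }
    return 0;
}
```

Timing a large number of iterations over a big, well-aligned range, with and
without the batched uncharge, would presumably be enough to see whether the
difference is measurable on the unmap path.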

Thanks,

-- 
Peter Xu