From: "Liu, Jingqi" <jingqi.liu@intel.com>
To: Pasha Tatashin <pasha.tatashin@soleen.com>,
<akpm@linux-foundation.org>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <rientjes@google.com>,
<dwmw2@infradead.org>, <baolu.lu@linux.intel.com>,
<joro@8bytes.org>, <will@kernel.org>, <robin.murphy@arm.com>,
<iommu@lists.linux.dev>
Subject: Re: [RFC 1/3] iommu/intel: Use page->refcount to count number of entries in IOMMU
Date: Mon, 25 Dec 2023 10:59:16 +0800 [thread overview]
Message-ID: <a77781a4-2d3c-403a-88a6-57242c7f0246@intel.com> (raw)
In-Reply-To: <20231221031915.619337-2-pasha.tatashin@soleen.com>
On 12/21/2023 11:19 AM, Pasha Tatashin wrote:
> In order to be able to efficiently free empty page table levels, count the
> number of entries in each page table my incremeanting and decremeanting
s/incremeanting/incrementing
s/decremeanting/decrementing
> refcount every time a PTE is inserted or removed form the page table.
s/form/from
>
> For this to work correctly, add two helper function:
> dma_clear_pte and dma_set_pte where counting is performed,
>
> Also, modify the code so every page table entry is always updated using the
> two new functions.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> drivers/iommu/intel/iommu.c | 40 +++++++++++++++++++++---------------
> drivers/iommu/intel/iommu.h | 41 +++++++++++++++++++++++++++++++------
> 2 files changed, 58 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 897159dba47d..4688ef797161 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -949,7 +949,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
> if (domain->use_first_level)
> pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US | DMA_FL_PTE_ACCESS;
>
> - if (cmpxchg64(&pte->val, 0ULL, pteval))
> + if (dma_set_pte(pte, pteval))
> /* Someone else set it while we were thinking; use theirs. */
> free_pgtable_page(tmp_page);
> else
> @@ -1021,7 +1021,8 @@ static void dma_pte_clear_range(struct dmar_domain *domain,
> continue;
> }
> do {
> - dma_clear_pte(pte);
> + if (dma_pte_present(pte))
> + dma_clear_pte(pte);
> start_pfn += lvl_to_nr_pages(large_page);
> pte++;
> } while (start_pfn <= last_pfn && !first_pte_in_page(pte));
> @@ -1062,7 +1063,8 @@ static void dma_pte_free_level(struct dmar_domain *domain, int level,
> */
> if (level < retain_level && !(start_pfn > level_pfn ||
> last_pfn < level_pfn + level_size(level) - 1)) {
> - dma_clear_pte(pte);
> + if (dma_pte_present(pte))
> + dma_clear_pte(pte);
> domain_flush_cache(domain, pte, sizeof(*pte));
> free_pgtable_page(level_pte);
> }
> @@ -1093,12 +1095,13 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
> }
> }
>
> -/* When a page at a given level is being unlinked from its parent, we don't
> - need to *modify* it at all. All we need to do is make a list of all the
> - pages which can be freed just as soon as we've flushed the IOTLB and we
> - know the hardware page-walk will no longer touch them.
> - The 'pte' argument is the *parent* PTE, pointing to the page that is to
> - be freed. */
> +/*
> + * A given page at a given level is being unlinked from its parent.
> + * We need to make a list of all the pages which can be freed just as soon as
> + * we've flushed the IOTLB and we know the hardware page-walk will no longer
> + * touch them. The 'pte' argument is the *parent* PTE, pointing to the page
> + * that is to be freed.
> + */
> static void dma_pte_list_pagetables(struct dmar_domain *domain,
> int level, struct dma_pte *pte,
> struct list_head *freelist)
> @@ -1106,17 +1109,20 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
> struct page *pg;
>
> pg = pfn_to_page(dma_pte_addr(pte) >> PAGE_SHIFT);
> - list_add_tail(&pg->lru, freelist);
> -
> - if (level == 1)
> - return;
> -
> pte = page_address(pg);
> +
> do {
> - if (dma_pte_present(pte) && !dma_pte_superpage(pte))
> - dma_pte_list_pagetables(domain, level - 1, pte, freelist);
> + if (dma_pte_present(pte)) {
> + if (level > 1 && !dma_pte_superpage(pte)) {
> + dma_pte_list_pagetables(domain, level - 1, pte,
> + freelist);
> + }
> + dma_clear_pte(pte);
> + }
> pte++;
> } while (!first_pte_in_page(pte));
> +
> + list_add_tail(&pg->lru, freelist);
> }
>
How about calculating the page decrement when the pages in the freelist
are really freed in iommu_free_pgtbl_pages() ?
Thanks,
Jingqi
next prev parent reply other threads:[~2023-12-25 2:59 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-21 3:19 [RFC 0/3] iommu/intel: Free empty page tables on unmaps Pasha Tatashin
2023-12-21 3:19 ` [RFC 1/3] iommu/intel: Use page->refcount to count number of entries in IOMMU Pasha Tatashin
2023-12-25 2:59 ` Liu, Jingqi [this message]
2023-12-28 15:01 ` Pasha Tatashin
2023-12-21 3:19 ` [RFC 2/3] iommu/intel: synchronize page table map and unmap operations Pasha Tatashin
2023-12-21 3:19 ` [RFC 3/3] iommu/intel: free empty page tables on unmaps Pasha Tatashin
2023-12-21 4:16 ` [RFC 0/3] iommu/intel: Free " Matthew Wilcox
2023-12-21 5:13 ` Pasha Tatashin
2023-12-21 5:42 ` Pasha Tatashin
2023-12-21 14:06 ` Matthew Wilcox
2023-12-21 14:58 ` Pasha Tatashin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a77781a4-2d3c-403a-88a6-57242c7f0246@intel.com \
--to=jingqi.liu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=baolu.lu@linux.intel.com \
--cc=dwmw2@infradead.org \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pasha.tatashin@soleen.com \
--cc=rientjes@google.com \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox