* RFC: dev_pagemap reference counting
@ 2017-12-05 0:34 Christoph Hellwig
2017-12-05 0:34 ` [PATCH 1/2] mm: move get_dev_pagemap out of line Christoph Hellwig
2017-12-05 0:34 ` [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap Christoph Hellwig
0 siblings, 2 replies; 6+ messages in thread
From: Christoph Hellwig @ 2017-12-05 0:34 UTC (permalink / raw)
To: dan.j.williams; +Cc: linux-nvdimm, linux-mm
Hi Dan,
maybe I'm missing something, but it seems like we release the reference
to the previously found pgmap before passing it to get_dev_pagemap again.
Can you check if my findings make sense?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* [PATCH 1/2] mm: move get_dev_pagemap out of line
2017-12-05 0:34 RFC: dev_pagemap reference counting Christoph Hellwig
@ 2017-12-05 0:34 ` Christoph Hellwig
2017-12-05 0:34 ` [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap Christoph Hellwig
1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2017-12-05 0:34 UTC (permalink / raw)
To: dan.j.williams; +Cc: linux-nvdimm, linux-mm
This is a pretty big function, which should be out of line in general,
and a no-op stub if CONFIG_ZONE_DEVICE is not set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/memremap.h | 42 +++++-------------------------------------
kernel/memremap.c | 36 ++++++++++++++++++++++++++++++++++--
2 files changed, 39 insertions(+), 39 deletions(-)
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 10d23c367048..f24e0c71d6a6 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -136,8 +136,8 @@ struct dev_pagemap {
#ifdef CONFIG_ZONE_DEVICE
void *devm_memremap_pages(struct device *dev, struct resource *res,
struct percpu_ref *ref, struct vmem_altmap *altmap);
-struct dev_pagemap *find_dev_pagemap(resource_size_t phys);
-
+struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+ struct dev_pagemap *pgmap);
static inline bool is_zone_device_page(const struct page *page);
#else
static inline void *devm_memremap_pages(struct device *dev,
@@ -153,11 +153,12 @@ static inline void *devm_memremap_pages(struct device *dev,
return ERR_PTR(-ENXIO);
}
-static inline struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
+static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+ struct dev_pagemap *pgmap)
{
return NULL;
}
-#endif
+#endif /* CONFIG_ZONE_DEVICE */
#if defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_DEVICE_PUBLIC)
static inline bool is_device_private_page(const struct page *page)
@@ -173,39 +174,6 @@ static inline bool is_device_public_page(const struct page *page)
}
#endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
-/**
- * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
- * @pfn: page frame number to lookup page_map
- * @pgmap: optional known pgmap that already has a reference
- *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
- */
-static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
- struct dev_pagemap *pgmap)
-{
- const struct resource *res = pgmap ? pgmap->res : NULL;
- resource_size_t phys = PFN_PHYS(pfn);
-
- /*
- * In the cached case we're already holding a live reference so
- * we can simply do a blind increment
- */
- if (res && phys >= res->start && phys <= res->end) {
- percpu_ref_get(pgmap->ref);
- return pgmap;
- }
-
- /* fall back to slow path lookup */
- rcu_read_lock();
- pgmap = find_dev_pagemap(phys);
- if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
- pgmap = NULL;
- rcu_read_unlock();
-
- return pgmap;
-}
-
static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
{
if (pgmap)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 403ab9cdb949..f0b54eca85b0 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -314,7 +314,7 @@ static void devm_memremap_pages_release(struct device *dev, void *data)
}
/* assumes rcu_read_lock() held at entry */
-struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
+static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
{
struct page_map *page_map;
@@ -500,8 +500,40 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
return pgmap ? pgmap->altmap : NULL;
}
-#endif /* CONFIG_ZONE_DEVICE */
+/**
+ * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
+ * @pfn: page frame number to lookup page_map
+ * @pgmap: optional known pgmap that already has a reference
+ *
+ * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
+ * same mapping.
+ */
+struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+ struct dev_pagemap *pgmap)
+{
+ const struct resource *res = pgmap ? pgmap->res : NULL;
+ resource_size_t phys = PFN_PHYS(pfn);
+
+ /*
+ * In the cached case we're already holding a live reference so
+ * we can simply do a blind increment
+ */
+ if (res && phys >= res->start && phys <= res->end) {
+ percpu_ref_get(pgmap->ref);
+ return pgmap;
+ }
+
+ /* fall back to slow path lookup */
+ rcu_read_lock();
+ pgmap = find_dev_pagemap(phys);
+ if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
+ pgmap = NULL;
+ rcu_read_unlock();
+
+ return pgmap;
+}
+#endif /* CONFIG_ZONE_DEVICE */
#if IS_ENABLED(CONFIG_DEVICE_PRIVATE) || IS_ENABLED(CONFIG_DEVICE_PUBLIC)
void put_zone_device_private_or_public_page(struct page *page)
--
2.14.2
* [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap
2017-12-05 0:34 RFC: dev_pagemap reference counting Christoph Hellwig
2017-12-05 0:34 ` [PATCH 1/2] mm: move get_dev_pagemap out of line Christoph Hellwig
@ 2017-12-05 0:34 ` Christoph Hellwig
2017-12-06 2:43 ` Dan Williams
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2017-12-05 0:34 UTC (permalink / raw)
To: dan.j.williams; +Cc: linux-nvdimm, linux-mm
Neither caller of get_dev_pagemap that passes in a pgmap actually holds a
reference to the pgmap it passes in, contrary to the comment in the function.
Change the calling convention so that get_dev_pagemap always consumes the
previous reference instead of doing this using an explicit earlier call to
put_dev_pagemap in the callers.
The callers will still need to put the final reference after finishing the
loop over the pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
kernel/memremap.c | 17 +++++++++--------
mm/gup.c | 7 +++++--
2 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index f0b54eca85b0..502fa107a585 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
* @pfn: page frame number to lookup page_map
* @pgmap: optional known pgmap that already has a reference
*
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
+ * If @pgmap is non-NULL and covers @pfn it will be returned as-is. If @pgmap
+ * is non-NULL but does not cover @pfn the reference to it will be released.
*/
struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap)
{
- const struct resource *res = pgmap ? pgmap->res : NULL;
resource_size_t phys = PFN_PHYS(pfn);
/*
- * In the cached case we're already holding a live reference so
- * we can simply do a blind increment
+ * In the cached case we're already holding a live reference.
*/
- if (res && phys >= res->start && phys <= res->end) {
- percpu_ref_get(pgmap->ref);
- return pgmap;
+ if (pgmap) {
+ const struct resource *res = pgmap ? pgmap->res : NULL;
+
+ if (res && phys >= res->start && phys <= res->end)
+ return pgmap;
+ put_dev_pagemap(pgmap);
}
/* fall back to slow path lookup */
diff --git a/mm/gup.c b/mm/gup.c
index d3fb60e5bfac..9d142eb9e2e9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
VM_BUG_ON_PAGE(compound_head(page) != head, page);
- put_dev_pagemap(pgmap);
SetPageReferenced(page);
pages[*nr] = page;
(*nr)++;
@@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
ret = 1;
pte_unmap:
+ if (pgmap)
+ put_dev_pagemap(pgmap);
pte_unmap(ptem);
return ret;
}
@@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
SetPageReferenced(page);
pages[*nr] = page;
get_page(page);
- put_dev_pagemap(pgmap);
(*nr)++;
pfn++;
} while (addr += PAGE_SIZE, addr != end);
+
+ if (pgmap)
+ put_dev_pagemap(pgmap);
return 1;
}
--
2.14.2
* Re: [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap
2017-12-05 0:34 ` [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap Christoph Hellwig
@ 2017-12-06 2:43 ` Dan Williams
2017-12-06 22:44 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Dan Williams @ 2017-12-06 2:43 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvdimm, Linux MM
On Mon, Dec 4, 2017 at 4:34 PM, Christoph Hellwig <hch@lst.de> wrote:
> Neither caller of get_dev_pagemap that passes in a pgmap actually holds a
> reference to the pgmap it passes in, contrary to the comment in the function.
>
> Change the calling convention so that get_dev_pagemap always consumes the
> previous reference instead of doing this using an explicit earlier call to
> put_dev_pagemap in the callers.
>
> The callers will still need to put the final reference after finishing the
> loop over the pages.
I don't think we need this change, but perhaps the reasoning should be
added to the code as a comment... details below.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> kernel/memremap.c | 17 +++++++++--------
> mm/gup.c | 7 +++++--
> 2 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index f0b54eca85b0..502fa107a585 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
> * @pfn: page frame number to lookup page_map
> * @pgmap: optional known pgmap that already has a reference
> *
> - * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
> - * same mapping.
> + * If @pgmap is non-NULL and covers @pfn it will be returned as-is. If @pgmap
> + * is non-NULL but does not cover @pfn the reference to it will be released.
> */
> struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
> struct dev_pagemap *pgmap)
> {
> - const struct resource *res = pgmap ? pgmap->res : NULL;
> resource_size_t phys = PFN_PHYS(pfn);
>
> /*
> - * In the cached case we're already holding a live reference so
> - * we can simply do a blind increment
> + * In the cached case we're already holding a live reference.
> */
> - if (res && phys >= res->start && phys <= res->end) {
> - percpu_ref_get(pgmap->ref);
> - return pgmap;
> + if (pgmap) {
> + const struct resource *res = pgmap ? pgmap->res : NULL;
> +
> + if (res && phys >= res->start && phys <= res->end)
> + return pgmap;
> + put_dev_pagemap(pgmap);
> }
>
> /* fall back to slow path lookup */
> diff --git a/mm/gup.c b/mm/gup.c
> index d3fb60e5bfac..9d142eb9e2e9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>
> VM_BUG_ON_PAGE(compound_head(page) != head, page);
>
> - put_dev_pagemap(pgmap);
> SetPageReferenced(page);
> pages[*nr] = page;
> (*nr)++;
> @@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> ret = 1;
>
> pte_unmap:
> + if (pgmap)
> + put_dev_pagemap(pgmap);
> pte_unmap(ptem);
> return ret;
> }
> @@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> SetPageReferenced(page);
> pages[*nr] = page;
> get_page(page);
> - put_dev_pagemap(pgmap);
It's safe to do the put_dev_pagemap() here because the pgmap cannot be
released until the corresponding put_page() for that get_page() we
just did occurs. So we're only holding the pgmap reference long enough
to take individual page references.
We used to take and put individual pgmap references inside get_page()
/ put_page(), but that got simplified by the commit below to just take
and put a single reference at devm_memremap_pages() setup / teardown time:
71389703839e mm, zone_device: Replace {get, put}_zone_device_page()
with a single reference to fix pmem crash
* Re: [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap
2017-12-06 2:43 ` Dan Williams
@ 2017-12-06 22:44 ` Christoph Hellwig
2017-12-06 22:52 ` Dan Williams
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2017-12-06 22:44 UTC (permalink / raw)
To: Dan Williams; +Cc: Christoph Hellwig, linux-nvdimm, Linux MM
On Tue, Dec 05, 2017 at 06:43:36PM -0800, Dan Williams wrote:
> I don't think we need this change, but perhaps the reasoning should be
> added to the code as a comment... details below.
Hmm, looks like we are ok at least. But even if it's not a correctness
issue, there is no point in decrementing and incrementing the reference
count for every page.
* Re: [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap
2017-12-06 22:44 ` Christoph Hellwig
@ 2017-12-06 22:52 ` Dan Williams
0 siblings, 0 replies; 6+ messages in thread
From: Dan Williams @ 2017-12-06 22:52 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvdimm, Linux MM
On Wed, Dec 6, 2017 at 2:44 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Dec 05, 2017 at 06:43:36PM -0800, Dan Williams wrote:
>> I don't think we need this change, but perhaps the reasoning should be
>> added to the code as a comment... details below.
>
> Hmm, looks like we are ok at least. But even if it's not a correctness
> issue there is no good point in decrementing and incrementing the
> reference count every time.
True, we can take it once and drop it at the end when all the related
page references have been taken.