* RFC: dev_pagemap reference counting
From: Christoph Hellwig <hch@lst.de>
Date: 2017-12-05  0:34 UTC
To: dan.j.williams
Cc: linux-nvdimm, linux-mm

Hi Dan,

maybe I'm missing something, but it seems like we release the reference
to the previously found pgmap before passing it to get_dev_pagemap
again.  Can you check if my findings make sense?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org.  For more info on Linux MM, see: http://www.linux-mm.org/
* [PATCH 1/2] mm: move get_dev_pagemap out of line
From: Christoph Hellwig <hch@lst.de>
Date: 2017-12-05  0:34 UTC
To: dan.j.williams
Cc: linux-nvdimm, linux-mm

This is a pretty big function, which should be out of line in general,
and a no-op stub if CONFIG_ZONE_DEVICE is not set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/memremap.h | 42 +++++-------------------------------------
 kernel/memremap.c        | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 10d23c367048..f24e0c71d6a6 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -136,8 +136,8 @@ struct dev_pagemap {
 #ifdef CONFIG_ZONE_DEVICE
 void *devm_memremap_pages(struct device *dev, struct resource *res,
 		struct percpu_ref *ref, struct vmem_altmap *altmap);
-struct dev_pagemap *find_dev_pagemap(resource_size_t phys);
-
+struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+		struct dev_pagemap *pgmap);
 static inline bool is_zone_device_page(const struct page *page);
 #else
 static inline void *devm_memremap_pages(struct device *dev,
@@ -153,11 +153,12 @@ static inline void *devm_memremap_pages(struct device *dev,
 	return ERR_PTR(-ENXIO);
 }
 
-static inline struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
+static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+		struct dev_pagemap *pgmap)
 {
 	return NULL;
 }
-#endif
+#endif /* CONFIG_ZONE_DEVICE */
 
 #if defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_DEVICE_PUBLIC)
 static inline bool is_device_private_page(const struct page *page)
@@ -173,39 +174,6 @@ static inline bool is_device_public_page(const struct page *page)
 }
 #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
 
-/**
- * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
- * @pfn: page frame number to lookup page_map
- * @pgmap: optional known pgmap that already has a reference
- *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
- */
-static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
-		struct dev_pagemap *pgmap)
-{
-	const struct resource *res = pgmap ? pgmap->res : NULL;
-	resource_size_t phys = PFN_PHYS(pfn);
-
-	/*
-	 * In the cached case we're already holding a live reference so
-	 * we can simply do a blind increment
-	 */
-	if (res && phys >= res->start && phys <= res->end) {
-		percpu_ref_get(pgmap->ref);
-		return pgmap;
-	}
-
-	/* fall back to slow path lookup */
-	rcu_read_lock();
-	pgmap = find_dev_pagemap(phys);
-	if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
-		pgmap = NULL;
-	rcu_read_unlock();
-
-	return pgmap;
-}
-
 static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
 {
 	if (pgmap)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 403ab9cdb949..f0b54eca85b0 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -314,7 +314,7 @@ static void devm_memremap_pages_release(struct device *dev, void *data)
 }
 
 /* assumes rcu_read_lock() held at entry */
-struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
+static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 {
 	struct page_map *page_map;
 
@@ -500,8 +500,40 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
 	return pgmap ? pgmap->altmap : NULL;
 }
-#endif /* CONFIG_ZONE_DEVICE */
 
+/**
+ * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
+ * @pfn: page frame number to lookup page_map
+ * @pgmap: optional known pgmap that already has a reference
+ *
+ * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
+ * same mapping.
+ */
+struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+		struct dev_pagemap *pgmap)
+{
+	const struct resource *res = pgmap ? pgmap->res : NULL;
+	resource_size_t phys = PFN_PHYS(pfn);
+
+	/*
+	 * In the cached case we're already holding a live reference so
+	 * we can simply do a blind increment
+	 */
+	if (res && phys >= res->start && phys <= res->end) {
+		percpu_ref_get(pgmap->ref);
+		return pgmap;
+	}
+
+	/* fall back to slow path lookup */
+	rcu_read_lock();
+	pgmap = find_dev_pagemap(phys);
+	if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
+		pgmap = NULL;
+	rcu_read_unlock();
+
+	return pgmap;
+}
+#endif /* CONFIG_ZONE_DEVICE */
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE) || IS_ENABLED(CONFIG_DEVICE_PUBLIC)
 void put_zone_device_private_or_public_page(struct page *page)
-- 
2.14.2
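The fast-path/slow-path split in the function moved above can be sketched outside the kernel. The following is a simplified userspace model, not kernel code, and every name in it is hypothetical: a plain `refs` counter stands in for the kernel's percpu_ref, a linear scan over a static table stands in for the radix-tree lookup inside find_dev_pagemap(), and a `live` flag stands in for percpu_ref_tryget_live() failing on a dying pagemap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pagemap {
	unsigned long start, end;	/* covered pfn range, inclusive */
	int refs;			/* stand-in for percpu_ref */
	bool live;			/* stand-in for percpu_ref_tryget_live() */
};

static struct pagemap table[2] = {
	{ .start = 0,   .end = 99,  .refs = 1, .live = true },
	{ .start = 100, .end = 199, .refs = 1, .live = true },
};

/* "slow path": stands in for find_dev_pagemap() */
static struct pagemap *lookup(unsigned long pfn)
{
	for (size_t i = 0; i < 2; i++)
		if (pfn >= table[i].start && pfn <= table[i].end)
			return &table[i];
	return NULL;
}

/*
 * Mirrors the pre-series calling convention: a cached pgmap that covers
 * pfn gets a blind increment (the caller already holds a live reference),
 * otherwise fall back to lookup plus a tryget that can fail.
 */
static struct pagemap *get_pagemap(unsigned long pfn, struct pagemap *cached)
{
	if (cached && pfn >= cached->start && pfn <= cached->end) {
		cached->refs++;		/* blind increment on the fast path */
		return cached;
	}
	struct pagemap *pgmap = lookup(pfn);
	if (pgmap) {
		if (!pgmap->live)
			return NULL;	/* tryget_live failed: map is dying */
		pgmap->refs++;
	}
	return pgmap;
}
```

A caller that walks pfns 5 then 7 hits the slow path once and the fast path once, ending with the first table entry's count raised by two; a lookup into a non-live map returns NULL.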
* [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap
From: Christoph Hellwig <hch@lst.de>
Date: 2017-12-05  0:34 UTC
To: dan.j.williams
Cc: linux-nvdimm, linux-mm

Both callers of get_dev_pagemap that pass in a pgmap don't actually hold
a reference to the pgmap they pass in, contrary to the comment in the
function.

Change the calling convention so that get_dev_pagemap always consumes
the previous reference instead of doing this using an explicit earlier
call to put_dev_pagemap in the callers.  The callers will still need to
put the final reference after finishing the loop over the pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/memremap.c | 17 +++++++++--------
 mm/gup.c          |  7 +++++--
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index f0b54eca85b0..502fa107a585 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
  * @pfn: page frame number to lookup page_map
  * @pgmap: optional known pgmap that already has a reference
  *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
+ * If @pgmap is non-NULL and covers @pfn it will be returned as-is.  If @pgmap
+ * is non-NULL but does not cover @pfn the reference to it will be released.
  */
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 		struct dev_pagemap *pgmap)
 {
-	const struct resource *res = pgmap ? pgmap->res : NULL;
 	resource_size_t phys = PFN_PHYS(pfn);
 
 	/*
-	 * In the cached case we're already holding a live reference so
-	 * we can simply do a blind increment
+	 * In the cached case we're already holding a live reference.
 	 */
-	if (res && phys >= res->start && phys <= res->end) {
-		percpu_ref_get(pgmap->ref);
-		return pgmap;
+	if (pgmap) {
+		const struct resource *res = pgmap ? pgmap->res : NULL;
+
+		if (res && phys >= res->start && phys <= res->end)
+			return pgmap;
+		put_dev_pagemap(pgmap);
 	}
 
 	/* fall back to slow path lookup */
diff --git a/mm/gup.c b/mm/gup.c
index d3fb60e5bfac..9d142eb9e2e9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
-		put_dev_pagemap(pgmap);
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
@@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 	ret = 1;
 
 pte_unmap:
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	pte_unmap(ptem);
 	return ret;
 }
@@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		get_page(page);
-		put_dev_pagemap(pgmap);
 		(*nr)++;
 		pfn++;
 	} while (addr += PAGE_SIZE, addr != end);
+
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	return 1;
 }
-- 
2.14.2
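The calling convention this patch introduces can be illustrated with a toy userspace model (not kernel code; all names hypothetical). Here get_pagemap() consumes the previously returned reference — trivially, by reusing it when the cached map covers the pfn — so the caller's loop over pages ends with exactly one put instead of one per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* A single refcounted pagemap that covers every pfn in this toy model. */
struct pagemap { int refs; };
static struct pagemap the_map = { .refs = 1 };

/*
 * New-style convention: the reference previously returned is consumed.
 * A cached map covering pfn is returned as-is (no extra reference);
 * otherwise the "slow path" takes one new reference.
 */
static struct pagemap *get_pagemap(unsigned long pfn, struct pagemap *cached)
{
	(void)pfn;
	if (cached)
		return cached;	/* covers pfn: reuse the held reference */
	the_map.refs++;		/* slow path: take one reference */
	return &the_map;
}

static void put_pagemap(struct pagemap *p)
{
	p->refs--;
}

/* Caller loop in the shape of gup_pte_range() after this patch. */
static int walk(unsigned long start, unsigned long end)
{
	struct pagemap *pgmap = NULL;

	for (unsigned long pfn = start; pfn != end; pfn++) {
		pgmap = get_pagemap(pfn, pgmap);
		if (!pgmap)
			return 0;
		/* ... take a reference on the page for pfn ... */
	}
	if (pgmap)
		put_pagemap(pgmap);	/* one final put, not one per page */
	return 1;
}
```

After a walk over any range the map's count is back to its initial value: the single reference taken on the first iteration is carried through the loop and dropped once at the end.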
* Re: [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap
From: Dan Williams
Date: 2017-12-06  2:43 UTC
To: Christoph Hellwig
Cc: linux-nvdimm, Linux MM

On Mon, Dec 4, 2017 at 4:34 PM, Christoph Hellwig <hch@lst.de> wrote:
> Both callers of get_dev_pagemap that pass in a pgmap don't actually hold a
> reference to the pgmap they pass in, contrary to the comment in the function.
>
> Change the calling convention so that get_dev_pagemap always consumes the
> previous reference instead of doing this using an explicit earlier call to
> put_dev_pagemap in the callers.
>
> The callers will still need to put the final reference after finishing the
> loop over the pages.

I don't think we need this change, but perhaps the reasoning should be
added to the code as a comment... details below.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  kernel/memremap.c | 17 +++++++++--------
>  mm/gup.c          |  7 +++++--
>  2 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index f0b54eca85b0..502fa107a585 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
>   * @pfn: page frame number to lookup page_map
>   * @pgmap: optional known pgmap that already has a reference
>   *
> - * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
> - * same mapping.
> + * If @pgmap is non-NULL and covers @pfn it will be returned as-is.  If @pgmap
> + * is non-NULL but does not cover @pfn the reference to it will be released.
>   */
>  struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
>  		struct dev_pagemap *pgmap)
>  {
> -	const struct resource *res = pgmap ? pgmap->res : NULL;
>  	resource_size_t phys = PFN_PHYS(pfn);
>
>  	/*
> -	 * In the cached case we're already holding a live reference so
> -	 * we can simply do a blind increment
> +	 * In the cached case we're already holding a live reference.
>  	 */
> -	if (res && phys >= res->start && phys <= res->end) {
> -		percpu_ref_get(pgmap->ref);
> -		return pgmap;
> +	if (pgmap) {
> +		const struct resource *res = pgmap ? pgmap->res : NULL;
> +
> +		if (res && phys >= res->start && phys <= res->end)
> +			return pgmap;
> +		put_dev_pagemap(pgmap);
>  	}
>
>  	/* fall back to slow path lookup */
> diff --git a/mm/gup.c b/mm/gup.c
> index d3fb60e5bfac..9d142eb9e2e9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>
>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>
> -		put_dev_pagemap(pgmap);
>  		SetPageReferenced(page);
>  		pages[*nr] = page;
>  		(*nr)++;
> @@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  	ret = 1;
>
>  pte_unmap:
> +	if (pgmap)
> +		put_dev_pagemap(pgmap);
>  	pte_unmap(ptem);
>  	return ret;
>  }
> @@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>  		SetPageReferenced(page);
>  		pages[*nr] = page;
>  		get_page(page);
> -		put_dev_pagemap(pgmap);

It's safe to do the put_dev_pagemap() here because the pgmap cannot be
released until the corresponding put_page() for that get_page() we just
did occurs.  So we're only holding the pgmap reference long enough to
take individual page references.

We used to take and put individual pgmap references inside get_page() /
put_page(), but that got simplified in this commit to just take and put
a single reference at devm_memremap_pages() setup / teardown time:

    71389703839e mm, zone_device: Replace {get, put}_zone_device_page()
    with a single reference to fix pmem crash
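Dan's argument above — a page reference pins the pgmap until the matching put_page(), so the gup loop only needs the pgmap reference long enough to take the individual page references — can be modeled with plain counters. This is a hypothetical sketch, not the kernel's actual implementation: `elevated_pages` stands in for the mechanism by which teardown waits for outstanding page references to drain.

```c
#include <assert.h>
#include <stdbool.h>

struct pagemap { int refs; int elevated_pages; };
struct page { int count; struct pagemap *pgmap; };

static struct pagemap pm = { .refs = 1, .elevated_pages = 0 };
static struct page pg = { .count = 1, .pgmap = &pm };

static void get_page(struct page *p)
{
	p->count++;
	p->pgmap->elevated_pages++;	/* page reference pins the pgmap */
}

static void put_page(struct page *p)
{
	p->count--;
	p->pgmap->elevated_pages--;
}

/*
 * Teardown may proceed only once every elevated page count has been
 * dropped: the setup-time reference stays held until then (the point
 * of commit 71389703839e in this model).
 */
static bool can_teardown(const struct pagemap *p)
{
	return p->refs == 1 && p->elevated_pages == 0;
}
```

Replaying the gup sequence — take the pgmap reference, get_page(), drop the pgmap reference right away — leaves the pgmap unteardownable until the matching put_page(), which is exactly why the early put_dev_pagemap() is safe.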
* Re: [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap
From: Christoph Hellwig <hch@lst.de>
Date: 2017-12-06 22:44 UTC
To: Dan Williams
Cc: Christoph Hellwig, linux-nvdimm, Linux MM

On Tue, Dec 05, 2017 at 06:43:36PM -0800, Dan Williams wrote:
> I don't think we need this change, but perhaps the reasoning should be
> added to the code as a comment... details below.

Hmm, looks like we are ok at least.  But even if it's not a correctness
issue there is no good point in decrementing and incrementing the
reference count every time.
* Re: [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap
From: Dan Williams
Date: 2017-12-06 22:52 UTC
To: Christoph Hellwig
Cc: linux-nvdimm, Linux MM

On Wed, Dec 6, 2017 at 2:44 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Dec 05, 2017 at 06:43:36PM -0800, Dan Williams wrote:
>> I don't think we need this change, but perhaps the reasoning should be
>> added to the code as a comment... details below.
>
> Hmm, looks like we are ok at least.  But even if it's not a correctness
> issue there is no good point in decrementing and incrementing the
> reference count every time.

True, we can take it once and drop it at the end when all the related
page references have been taken.
Thread overview (6 messages, newest: 2017-12-06 22:52 UTC):
2017-12-05  0:34 RFC: dev_pagemap reference counting — Christoph Hellwig
2017-12-05  0:34 ` [PATCH 1/2] mm: move get_dev_pagemap out of line — Christoph Hellwig
2017-12-05  0:34 ` [PATCH 2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap — Christoph Hellwig
2017-12-06  2:43   ` Dan Williams
2017-12-06 22:44     ` Christoph Hellwig
2017-12-06 22:52       ` Dan Williams