* Re: [PATCH] dax/kmem: Fix leak of memory-hotplug resources
2023-02-16 8:36 [PATCH] dax/kmem: Fix leak of memory-hotplug resources Dan Williams
@ 2023-02-17 16:48 ` Dave Jiang
2023-02-17 18:58 ` Pasha Tatashin
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Dave Jiang @ 2023-02-17 16:48 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: stable, Oscar Salvador, David Hildenbrand, Pavel Tatashin, linux-mm
On 2/16/23 1:36 AM, Dan Williams wrote:
> While experimenting with CXL region removal the following corruption of
> /proc/iomem appeared.
>
> Before:
> f010000000-f04fffffff : CXL Window 0
> f010000000-f02fffffff : region4
> f010000000-f02fffffff : dax4.0
> f010000000-f02fffffff : System RAM (kmem)
>
> After (modprobe -r cxl_test):
> f010000000-f02fffffff : **redacted binary garbage**
> f010000000-f02fffffff : System RAM (kmem)
>
> ...and testing further the same is visible with persistent memory
> assigned to kmem:
>
> Before:
> 480000000-243fffffff : Persistent Memory
> 480000000-57e1fffff : namespace3.0
> 580000000-243fffffff : dax3.0
> 580000000-243fffffff : System RAM (kmem)
>
> After (ndctl disable-region all):
> 480000000-243fffffff : Persistent Memory
> 580000000-243fffffff : ***redacted binary garbage***
> 580000000-243fffffff : System RAM (kmem)
>
> The corrupted data is from a use-after-free of the "dax4.0" and "dax3.0"
> resources, and it also shows that the "System RAM (kmem)" resource is
> not being removed. The bug does not appear after "modprobe -r kmem", it
> requires the parent of "dax4.0" and "dax3.0" to be removed which
> re-parents the leaked "System RAM (kmem)" instances. Those in turn
> reference the freed resource as a parent.
>
> First up for the fix is release_mem_region_adjustable() needs to
> reliably delete the resource inserted by add_memory_driver_managed().
> That is thwarted by a check for IORESOURCE_SYSRAM that predates the
> dax/kmem driver, from commit:
>
> 65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable")
>
> That appears to be working around the behavior of HMM's
> "MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that
> check removed the "System RAM (kmem)" resource gets removed, but
> corruption still occurs occasionally because the "dax" resource is not
> reliably removed.
>
> The dax range information is freed before the device is unregistered, so
> the driver can not reliably recall (another use after free) what it is
> meant to release. Lastly if that use after free got lucky, the driver
> was covering up the leak of "System RAM (kmem)" due to its use of
> release_resource() which detaches, but does not free, child resources.
> The switch to remove_resource() forces remove_memory() to be responsible
> for the deletion of the resource added by add_memory_driver_managed().
>
> Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances")
> Cc: <stable@vger.kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/dax/bus.c | 2 +-
> drivers/dax/kmem.c | 4 ++--
> kernel/resource.c | 14 --------------
> 3 files changed, 3 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 012d576004e9..67a64f4c472d 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -441,8 +441,8 @@ static void unregister_dev_dax(void *dev)
> dev_dbg(dev, "%s\n", __func__);
>
> kill_dev_dax(dev_dax);
> - free_dev_dax_ranges(dev_dax);
> device_del(dev);
> + free_dev_dax_ranges(dev_dax);
> put_device(dev);
> }
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 918d01d3fbaa..7b36db6f1cbd 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -146,7 +146,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> if (rc) {
> dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
> i, range.start, range.end);
> - release_resource(res);
> + remove_resource(res);
> kfree(res);
> data->res[i] = NULL;
> if (mapped)
> @@ -195,7 +195,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
>
> rc = remove_memory(range.start, range_len(&range));
> if (rc == 0) {
> - release_resource(data->res[i]);
> + remove_resource(data->res[i]);
> kfree(data->res[i]);
> data->res[i] = NULL;
> success++;
> diff --git a/kernel/resource.c b/kernel/resource.c
> index ddbbacb9fb50..b1763b2fd7ef 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1343,20 +1343,6 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
> continue;
> }
>
> - /*
> - * All memory regions added from memory-hotplug path have the
> - * flag IORESOURCE_SYSTEM_RAM. If the resource does not have
> - * this flag, we know that we are dealing with a resource coming
> - * from HMM/devm. HMM/devm use another mechanism to add/release
> - * a resource. This goes via devm_request_mem_region and
> - * devm_release_mem_region.
> - * HMM/devm take care to release their resources when they want,
> - * so if we are dealing with them, let us just back off here.
> - */
> - if (!(res->flags & IORESOURCE_SYSRAM)) {
> - break;
> - }
> -
> if (!(res->flags & IORESOURCE_MEM))
> break;
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] dax/kmem: Fix leak of memory-hotplug resources
2023-02-16 8:36 [PATCH] dax/kmem: Fix leak of memory-hotplug resources Dan Williams
2023-02-17 16:48 ` Dave Jiang
@ 2023-02-17 18:58 ` Pasha Tatashin
2023-02-17 21:19 ` Verma, Vishal L
2023-02-20 7:53 ` Alistair Popple
3 siblings, 0 replies; 5+ messages in thread
From: Pasha Tatashin @ 2023-02-17 18:58 UTC (permalink / raw)
To: Dan Williams
Cc: linux-cxl, stable, Oscar Salvador, David Hildenbrand, linux-mm
On Thu, Feb 16, 2023 at 3:36 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> While experimenting with CXL region removal the following corruption of
> /proc/iomem appeared.
>
> Before:
> f010000000-f04fffffff : CXL Window 0
> f010000000-f02fffffff : region4
> f010000000-f02fffffff : dax4.0
> f010000000-f02fffffff : System RAM (kmem)
>
> After (modprobe -r cxl_test):
> f010000000-f02fffffff : **redacted binary garbage**
> f010000000-f02fffffff : System RAM (kmem)
>
> ...and testing further the same is visible with persistent memory
> assigned to kmem:
>
> Before:
> 480000000-243fffffff : Persistent Memory
> 480000000-57e1fffff : namespace3.0
> 580000000-243fffffff : dax3.0
> 580000000-243fffffff : System RAM (kmem)
>
> After (ndctl disable-region all):
> 480000000-243fffffff : Persistent Memory
> 580000000-243fffffff : ***redacted binary garbage***
> 580000000-243fffffff : System RAM (kmem)
>
> The corrupted data is from a use-after-free of the "dax4.0" and "dax3.0"
> resources, and it also shows that the "System RAM (kmem)" resource is
> not being removed. The bug does not appear after "modprobe -r kmem", it
> requires the parent of "dax4.0" and "dax3.0" to be removed which
> re-parents the leaked "System RAM (kmem)" instances. Those in turn
> reference the freed resource as a parent.
>
> First up for the fix is release_mem_region_adjustable() needs to
> reliably delete the resource inserted by add_memory_driver_managed().
> That is thwarted by a check for IORESOURCE_SYSRAM that predates the
> dax/kmem driver, from commit:
>
> 65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable")
>
> That appears to be working around the behavior of HMM's
> "MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that
> check removed the "System RAM (kmem)" resource gets removed, but
> corruption still occurs occasionally because the "dax" resource is not
> reliably removed.
>
> The dax range information is freed before the device is unregistered, so
> the driver can not reliably recall (another use after free) what it is
> meant to release. Lastly if that use after free got lucky, the driver
> was covering up the leak of "System RAM (kmem)" due to its use of
> release_resource() which detaches, but does not free, child resources.
> The switch to remove_resource() forces remove_memory() to be responsible
> for the deletion of the resource added by add_memory_driver_managed().
>
> Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances")
> Cc: <stable@vger.kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Thanks,
Pasha
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] dax/kmem: Fix leak of memory-hotplug resources
2023-02-16 8:36 [PATCH] dax/kmem: Fix leak of memory-hotplug resources Dan Williams
2023-02-17 16:48 ` Dave Jiang
2023-02-17 18:58 ` Pasha Tatashin
@ 2023-02-17 21:19 ` Verma, Vishal L
2023-02-20 7:53 ` Alistair Popple
3 siblings, 0 replies; 5+ messages in thread
From: Verma, Vishal L @ 2023-02-17 21:19 UTC (permalink / raw)
To: Williams, Dan J, linux-cxl
Cc: linux-mm, stable, osalvador, david, pasha.tatashin
On Thu, 2023-02-16 at 00:36 -0800, Dan Williams wrote:
> While experimenting with CXL region removal the following corruption of
> /proc/iomem appeared.
>
> Before:
> f010000000-f04fffffff : CXL Window 0
> f010000000-f02fffffff : region4
> f010000000-f02fffffff : dax4.0
> f010000000-f02fffffff : System RAM (kmem)
>
> After (modprobe -r cxl_test):
> f010000000-f02fffffff : **redacted binary garbage**
> f010000000-f02fffffff : System RAM (kmem)
>
> ...and testing further the same is visible with persistent memory
> assigned to kmem:
>
> Before:
> 480000000-243fffffff : Persistent Memory
> 480000000-57e1fffff : namespace3.0
> 580000000-243fffffff : dax3.0
> 580000000-243fffffff : System RAM (kmem)
>
> After (ndctl disable-region all):
> 480000000-243fffffff : Persistent Memory
> 580000000-243fffffff : ***redacted binary garbage***
> 580000000-243fffffff : System RAM (kmem)
>
> The corrupted data is from a use-after-free of the "dax4.0" and "dax3.0"
> resources, and it also shows that the "System RAM (kmem)" resource is
> not being removed. The bug does not appear after "modprobe -r kmem", it
> requires the parent of "dax4.0" and "dax3.0" to be removed which
> re-parents the leaked "System RAM (kmem)" instances. Those in turn
> reference the freed resource as a parent.
>
> First up for the fix is release_mem_region_adjustable() needs to
> reliably delete the resource inserted by add_memory_driver_managed().
> That is thwarted by a check for IORESOURCE_SYSRAM that predates the
> dax/kmem driver, from commit:
>
> 65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable")
>
> That appears to be working around the behavior of HMM's
> "MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that
> check removed the "System RAM (kmem)" resource gets removed, but
> corruption still occurs occasionally because the "dax" resource is not
> reliably removed.
>
> The dax range information is freed before the device is unregistered, so
> the driver can not reliably recall (another use after free) what it is
> meant to release. Lastly if that use after free got lucky, the driver
> was covering up the leak of "System RAM (kmem)" due to its use of
> release_resource() which detaches, but does not free, child resources.
> The switch to remove_resource() forces remove_memory() to be responsible
> for the deletion of the resource added by add_memory_driver_managed().
>
> Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances")
> Cc: <stable@vger.kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
> drivers/dax/bus.c | 2 +-
> drivers/dax/kmem.c | 4 ++--
> kernel/resource.c | 14 --------------
> 3 files changed, 3 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 012d576004e9..67a64f4c472d 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -441,8 +441,8 @@ static void unregister_dev_dax(void *dev)
> dev_dbg(dev, "%s\n", __func__);
>
> kill_dev_dax(dev_dax);
> - free_dev_dax_ranges(dev_dax);
> device_del(dev);
> + free_dev_dax_ranges(dev_dax);
> put_device(dev);
> }
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 918d01d3fbaa..7b36db6f1cbd 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -146,7 +146,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> if (rc) {
> dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
> i, range.start, range.end);
> - release_resource(res);
> + remove_resource(res);
> kfree(res);
> data->res[i] = NULL;
> if (mapped)
> @@ -195,7 +195,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
>
> rc = remove_memory(range.start, range_len(&range));
> if (rc == 0) {
> - release_resource(data->res[i]);
> + remove_resource(data->res[i]);
> kfree(data->res[i]);
> data->res[i] = NULL;
> success++;
> diff --git a/kernel/resource.c b/kernel/resource.c
> index ddbbacb9fb50..b1763b2fd7ef 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1343,20 +1343,6 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
> continue;
> }
>
> - /*
> - * All memory regions added from memory-hotplug path have the
> - * flag IORESOURCE_SYSTEM_RAM. If the resource does not have
> - * this flag, we know that we are dealing with a resource coming
> - * from HMM/devm. HMM/devm use another mechanism to add/release
> - * a resource. This goes via devm_request_mem_region and
> - * devm_release_mem_region.
> - * HMM/devm take care to release their resources when they want,
> - * so if we are dealing with them, let us just back off here.
> - */
> - if (!(res->flags & IORESOURCE_SYSRAM)) {
> - break;
> - }
> -
> if (!(res->flags & IORESOURCE_MEM))
> break;
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] dax/kmem: Fix leak of memory-hotplug resources
2023-02-16 8:36 [PATCH] dax/kmem: Fix leak of memory-hotplug resources Dan Williams
` (2 preceding siblings ...)
2023-02-17 21:19 ` Verma, Vishal L
@ 2023-02-20 7:53 ` Alistair Popple
3 siblings, 0 replies; 5+ messages in thread
From: Alistair Popple @ 2023-02-20 7:53 UTC (permalink / raw)
To: Dan Williams
Cc: linux-cxl, stable, Oscar Salvador, David Hildenbrand,
Pavel Tatashin, linux-mm
Dan Williams <dan.j.williams@intel.com> writes:
[...]
> First up for the fix is release_mem_region_adjustable() needs to
> reliably delete the resource inserted by add_memory_driver_managed().
> That is thwarted by a check for IORESOURCE_SYSRAM that predates the
> dax/kmem driver, from commit:
>
> 65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable")
>
> That appears to be working around the behavior of HMM's
> "MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that
> check removed the "System RAM (kmem)" resource gets removed, but
> corruption still occurs occasionally because the "dax" resource is not
> reliably removed.
And re-added in the form of MEMORY_DEVICE_COHERENT, which is pretty
similar. However I can't understand why that commit was needed in the
first place. As mentioned in that commit message "we only call
release_mem_region_adjustable() in __remove_pages if the zone is not
ZONE_DEVICE" anyway, so I don't really understand how this code path
could ever have been exercised. I can't see why it's needed now either,
so for the change to resource.c feel free to add:
Reviewed-by: Alistair Popple <apopple@nvidia.com>
> The dax range information is freed before the device is unregistered, so
> the driver can not reliably recall (another use after free) what it is
> meant to release. Lastly if that use after free got lucky, the driver
> was covering up the leak of "System RAM (kmem)" due to its use of
> release_resource() which detaches, but does not free, child resources.
> The switch to remove_resource() forces remove_memory() to be responsible
> for the deletion of the resource added by add_memory_driver_managed().
>
> Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances")
> Cc: <stable@vger.kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/dax/bus.c | 2 +-
> drivers/dax/kmem.c | 4 ++--
> kernel/resource.c | 14 --------------
> 3 files changed, 3 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 012d576004e9..67a64f4c472d 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -441,8 +441,8 @@ static void unregister_dev_dax(void *dev)
> dev_dbg(dev, "%s\n", __func__);
>
> kill_dev_dax(dev_dax);
> - free_dev_dax_ranges(dev_dax);
> device_del(dev);
> + free_dev_dax_ranges(dev_dax);
> put_device(dev);
> }
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 918d01d3fbaa..7b36db6f1cbd 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -146,7 +146,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> if (rc) {
> dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
> i, range.start, range.end);
> - release_resource(res);
> + remove_resource(res);
> kfree(res);
> data->res[i] = NULL;
> if (mapped)
> @@ -195,7 +195,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
>
> rc = remove_memory(range.start, range_len(&range));
> if (rc == 0) {
> - release_resource(data->res[i]);
> + remove_resource(data->res[i]);
> kfree(data->res[i]);
> data->res[i] = NULL;
> success++;
> diff --git a/kernel/resource.c b/kernel/resource.c
> index ddbbacb9fb50..b1763b2fd7ef 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1343,20 +1343,6 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
> continue;
> }
>
> - /*
> - * All memory regions added from memory-hotplug path have the
> - * flag IORESOURCE_SYSTEM_RAM. If the resource does not have
> - * this flag, we know that we are dealing with a resource coming
> - * from HMM/devm. HMM/devm use another mechanism to add/release
> - * a resource. This goes via devm_request_mem_region and
> - * devm_release_mem_region.
> - * HMM/devm take care to release their resources when they want,
> - * so if we are dealing with them, let us just back off here.
> - */
> - if (!(res->flags & IORESOURCE_SYSRAM)) {
> - break;
> - }
> -
> if (!(res->flags & IORESOURCE_MEM))
> break;
>
^ permalink raw reply [flat|nested] 5+ messages in thread