* [PATCH] mm: slub: Panic if the object corruption is checked. [not found] <CGME20250120083041epcas2p30d61ee801c1db5a7ebdf26fdc642883f@epcas2p3.samsung.com> @ 2025-01-20 8:28 ` Hyesoo Yu 2025-01-20 15:36 ` Matthew Wilcox ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Hyesoo Yu @ 2025-01-20 8:28 UTC (permalink / raw) Cc: janghyuck.kim, Hyesoo Yu, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-doc, linux-kernel If a slab object is corrupted or an error occurs in its internal value, continuing after restoration may cause other side effects. At this point, it is difficult to debug because the problem occurred in the past. A flag has been added that can cause a panic when there is a problem with the object. Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493 --- Documentation/mm/slub.rst | 2 ++ include/linux/slab.h | 4 ++++ mm/slub.c | 14 ++++++++++++++ 3 files changed, 20 insertions(+) diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst index 84ca1dc94e5e..ce58525db93d 100644 --- a/Documentation/mm/slub.rst +++ b/Documentation/mm/slub.rst @@ -53,6 +53,7 @@ Possible debug options are:: U User tracking (free and alloc) T Trace (please only use on single slabs) A Enable failslab filter mark for the cache + C Panic if object corruption is checked. O Switch debugging off for caches that would have caused higher minimum slab orders - Switch all debugging off (useful if the kernel is @@ -113,6 +114,7 @@ options from the ``slab_debug`` parameter translate to the following files:: U store_user T trace A failslab + C corruption_panic failslab file is writable, so writing 1 or 0 will enable or disable the option at runtime. Write returns -EINVAL if cache is an alias. diff --git a/include/linux/slab.h b/include/linux/slab.h index 10a971c2bde3..4391c30564d6 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -31,6 +31,7 @@ enum _slab_flag_bits { _SLAB_CACHE_DMA32, _SLAB_STORE_USER, _SLAB_PANIC, + _SLAB_CORRUPTION_PANIC, _SLAB_TYPESAFE_BY_RCU, _SLAB_TRACE, #ifdef CONFIG_DEBUG_OBJECTS @@ -97,6 +98,9 @@ enum _slab_flag_bits { #define SLAB_STORE_USER __SLAB_FLAG_BIT(_SLAB_STORE_USER) /* Panic if kmem_cache_create() fails */ #define SLAB_PANIC __SLAB_FLAG_BIT(_SLAB_PANIC) +/* Panic if object corruption is checked */ +#define SLAB_CORRUPTION_PANIC __SLAB_FLAG_BIT(_SLAB_CORRUPTION_PANIC) + /** * define SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS! * diff --git a/mm/slub.c b/mm/slub.c index 48cefc969480..36a8dabf1349 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1306,6 +1306,8 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab) fault, end - 1, fault - start); print_section(KERN_ERR, "Padding ", pad, remainder); + BUG_ON(s->flags & SLAB_CORRUPTION_PANIC); + restore_bytes(s, "slab padding", POISON_INUSE, fault, end); } @@ -1389,6 +1391,8 @@ static int check_object(struct kmem_cache *s, struct slab *slab, if (!ret && !slab_in_kunit_test()) { print_trailer(s, slab, object); add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + + BUG_ON(s->flags & SLAB_CORRUPTION_PANIC); } return ret; @@ -1689,6 +1693,9 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init) case 'a': *flags |= SLAB_FAILSLAB; break; + case 'c': + *flags |= SLAB_CORRUPTION_PANIC; + break; case 'o': /* * Avoid enabling debugging on caches if its minimum @@ -6874,6 +6881,12 @@ static ssize_t store_user_show(struct kmem_cache *s, char *buf) SLAB_ATTR_RO(store_user); +static ssize_t corruption_panic_show(struct kmem_cache *s, char *buf) +{ + return sysfs_emit(buf, "%d\n", !!(s->flags & SLAB_CORRUPTION_PANIC)); +} +SLAB_ATTR_RO(corruption_panic); + static ssize_t validate_show(struct kmem_cache *s, char *buf) { return 0; @@ -7092,6 +7105,7 @@ static struct attribute *slab_attrs[] = { &red_zone_attr.attr, &poison_attr.attr, &store_user_attr.attr, + &corruption_panic_attr.attr, &validate_attr.attr, #endif #ifdef CONFIG_ZONE_DMA -- 2.48.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-20 8:28 ` [PATCH] mm: slub: Panic if the object corruption is checked Hyesoo Yu @ 2025-01-20 15:36 ` Matthew Wilcox 2025-01-21 0:40 ` Hyesoo Yu 2025-01-20 15:41 ` Hyeonggon Yoo 2025-01-21 12:46 ` kernel test robot 2 siblings, 1 reply; 10+ messages in thread From: Matthew Wilcox @ 2025-01-20 15:36 UTC (permalink / raw) To: Hyesoo Yu Cc: janghyuck.kim, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-doc, linux-kernel On Mon, Jan 20, 2025 at 05:28:21PM +0900, Hyesoo Yu wrote: > If a slab object is corrupted or an error occurs in its internal > value, continuing after restoration may cause other side effects. > At this point, it is difficult to debug because the problem occurred > in the past. A flag has been added that can cause a panic when there > is a problem with the object. > > Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> > Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493 Linux does not use Change IDs. Please omit these from future patches. Panicing is a very unfriendly approach. I think a better approach would be to freeze the slab where corruption is detected. That is, no future objects are allocated from that slab, and attempts to free objects from that slab become no-ops. I don't think that should be hard to implement. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-20 15:36 ` Matthew Wilcox @ 2025-01-21 0:40 ` Hyesoo Yu 2025-01-21 10:27 ` Vlastimil Babka 0 siblings, 1 reply; 10+ messages in thread From: Hyesoo Yu @ 2025-01-21 0:40 UTC (permalink / raw) To: Matthew Wilcox Cc: janghyuck.kim, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-doc, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1608 bytes --] On Mon, Jan 20, 2025 at 03:36:08PM +0000, Matthew Wilcox wrote: > On Mon, Jan 20, 2025 at 05:28:21PM +0900, Hyesoo Yu wrote: > > If a slab object is corrupted or an error occurs in its internal > > value, continuing after restoration may cause other side effects. > > At this point, it is difficult to debug because the problem occurred > > in the past. A flag has been added that can cause a panic when there > > is a problem with the object. > > > > Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> > > Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493 > > Linux does not use Change IDs. Please omit these from future patches. > > Panicing is a very unfriendly approach. I think a better approach would > be to freeze the slab where corruption is detected. That is, no future > objects are allocated from that slab, and attempts to free objects from > that slab become no-ops. I don't think that should be hard to implement. > Thanks you for your responce. That is my mistake. I will remove the change ID. I agree that freezing is better than recovery or panic for the system's stability. However what I want from the patch is not just to make the system run stably. I need to immediately trigger a panic to investigate the slub. I would like to analyze the corrupted data at that moment to check issues like cache problem, user errors, system clock frequency and similar problems, not just passing by without any issues. However I agree that panic is not a friendly approach. I will modify it to notify the problem using warn() and then use panic_on_warn to trigger panic. Thanks, Regards. [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-21 0:40 ` Hyesoo Yu @ 2025-01-21 10:27 ` Vlastimil Babka 0 siblings, 0 replies; 10+ messages in thread From: Vlastimil Babka @ 2025-01-21 10:27 UTC (permalink / raw) To: Hyesoo Yu, Matthew Wilcox Cc: janghyuck.kim, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-doc, linux-kernel On 1/21/25 1:40 AM, Hyesoo Yu wrote: > On Mon, Jan 20, 2025 at 03:36:08PM +0000, Matthew Wilcox wrote: >> On Mon, Jan 20, 2025 at 05:28:21PM +0900, Hyesoo Yu wrote: >>> If a slab object is corrupted or an error occurs in its internal >>> value, continuing after restoration may cause other side effects. >>> At this point, it is difficult to debug because the problem occurred >>> in the past. A flag has been added that can cause a panic when there >>> is a problem with the object. >>> >>> Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> >>> Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493 >> >> Linux does not use Change IDs. Please omit these from future patches. >> >> Panicing is a very unfriendly approach. I think a better approach would >> be to freeze the slab where corruption is detected. That is, no future >> objects are allocated from that slab, and attempts to free objects from >> that slab become no-ops. I don't think that should be hard to implement. Freezing of slab is already done in some cases when corruption is detected - all objects are marked as used, and further freeing attempts on the slab are discarded. Perhaps not all cases, which could be improved. > Thanks you for your responce. That is my mistake. I will remove the change ID. > > I agree that freezing is better than recovery or panic for the system's stability. > However what I want from the patch is not just to make the system run stably. > I need to immediately trigger a panic to investigate the slub. IMHO it's a valid goal to panic more quickly when debugging, and enabling slub_debug means debugging is in progress (as opposed to normal production when we try to avoid panic). But making it possible to reuse the general panic_on_warn mechanism (which can be also expected to be enabled when debugging) is indeed preferable to introducing a new slab-specific flag. > I would like to analyze the corrupted data at that moment to check issues > like cache problem, user errors, system clock frequency and similar problems, > not just passing by without any issues. > > However I agree that panic is not a friendly approach. > I will modify it to notify the problem using warn() and then use > panic_on_warn to trigger panic. > > Thanks, > Regards. > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-20 8:28 ` [PATCH] mm: slub: Panic if the object corruption is checked Hyesoo Yu 2025-01-20 15:36 ` Matthew Wilcox @ 2025-01-20 15:41 ` Hyeonggon Yoo 2025-01-21 0:54 ` Hyesoo Yu 2025-01-21 12:46 ` kernel test robot 2 siblings, 1 reply; 10+ messages in thread From: Hyeonggon Yoo @ 2025-01-20 15:41 UTC (permalink / raw) To: Hyesoo Yu Cc: janghyuck.kim, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin, linux-mm, linux-doc, linux-kernel On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote: > > If a slab object is corrupted or an error occurs in its internal > value, continuing after restoration may cause other side effects. > At this point, it is difficult to debug because the problem occurred > in the past. A flag has been added that can cause a panic when there > is a problem with the object. Hi Hyesoo, I'm concerned about this because it goes against the effort to avoid introducing new BUG() calls [1]. And I think it would be more appropriate to use existing panic_on_warn functionality [2] which causes a panic on WARN(), rather than introducing a SLUB-specific knob to do the same thing. However SLUB does not call WARN() and uses pr_err() instead when reporting an error. Vlastimil and I talked about changing it to use WARN() a while ago [3], but neither of us have done that yet. Probably you may want to look at it, as it also aligns with your purpose? FYI, if you would like to work on it, please make sure that it WARN() is suppressed during kunit test. [1] https://docs.kernel.org/process/deprecated.html#bug-and-bug-on [2] https://www.kernel.org/doc/html/v6.9/admin-guide/sysctl/kernel.html#panic-on-warn [3] https://lore.kernel.org/linux-mm/d4219cd9-32d3-4697-93b9-6a44bf77d50c@suse.cz Best, Hyeonggon > Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> > Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493 > --- > Documentation/mm/slub.rst | 2 ++ > include/linux/slab.h | 4 ++++ > mm/slub.c | 14 ++++++++++++++ > 3 files changed, 20 insertions(+) > > diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst > index 84ca1dc94e5e..ce58525db93d 100644 > --- a/Documentation/mm/slub.rst > +++ b/Documentation/mm/slub.rst > @@ -53,6 +53,7 @@ Possible debug options are:: > U User tracking (free and alloc) > T Trace (please only use on single slabs) > A Enable failslab filter mark for the cache > + C Panic if object corruption is checked. > O Switch debugging off for caches that would have > caused higher minimum slab orders > - Switch all debugging off (useful if the kernel is > @@ -113,6 +114,7 @@ options from the ``slab_debug`` parameter translate to the following files:: > U store_user > T trace > A failslab > + C corruption_panic > > failslab file is writable, so writing 1 or 0 will enable or disable > the option at runtime. Write returns -EINVAL if cache is an alias. > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 10a971c2bde3..4391c30564d6 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -31,6 +31,7 @@ enum _slab_flag_bits { > _SLAB_CACHE_DMA32, > _SLAB_STORE_USER, > _SLAB_PANIC, > + _SLAB_CORRUPTION_PANIC, > _SLAB_TYPESAFE_BY_RCU, > _SLAB_TRACE, > #ifdef CONFIG_DEBUG_OBJECTS > @@ -97,6 +98,9 @@ enum _slab_flag_bits { > #define SLAB_STORE_USER __SLAB_FLAG_BIT(_SLAB_STORE_USER) > /* Panic if kmem_cache_create() fails */ > #define SLAB_PANIC __SLAB_FLAG_BIT(_SLAB_PANIC) > +/* Panic if object corruption is checked */ > +#define SLAB_CORRUPTION_PANIC __SLAB_FLAG_BIT(_SLAB_CORRUPTION_PANIC) > + > /** > * define SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS! > * > diff --git a/mm/slub.c b/mm/slub.c > index 48cefc969480..36a8dabf1349 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1306,6 +1306,8 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab) > fault, end - 1, fault - start); > print_section(KERN_ERR, "Padding ", pad, remainder); > > + BUG_ON(s->flags & SLAB_CORRUPTION_PANIC); > + > restore_bytes(s, "slab padding", POISON_INUSE, fault, end); > } > > @@ -1389,6 +1391,8 @@ static int check_object(struct kmem_cache *s, struct slab *slab, > if (!ret && !slab_in_kunit_test()) { > print_trailer(s, slab, object); > add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); > + > + BUG_ON(s->flags & SLAB_CORRUPTION_PANIC); > } > > return ret; > @@ -1689,6 +1693,9 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init) > case 'a': > *flags |= SLAB_FAILSLAB; > break; > + case 'c': > + *flags |= SLAB_CORRUPTION_PANIC; > + break; > case 'o': > /* > * Avoid enabling debugging on caches if its minimum > @@ -6874,6 +6881,12 @@ static ssize_t store_user_show(struct kmem_cache *s, char *buf) > > SLAB_ATTR_RO(store_user); > > +static ssize_t corruption_panic_show(struct kmem_cache *s, char *buf) > +{ > + return sysfs_emit(buf, "%d\n", !!(s->flags & SLAB_CORRUPTION_PANIC)); > +} > +SLAB_ATTR_RO(corruption_panic); > + > static ssize_t validate_show(struct kmem_cache *s, char *buf) > { > return 0; > @@ -7092,6 +7105,7 @@ static struct attribute *slab_attrs[] = { > &red_zone_attr.attr, > &poison_attr.attr, > &store_user_attr.attr, > + &corruption_panic_attr.attr, > &validate_attr.attr, > #endif > #ifdef CONFIG_ZONE_DMA > -- > 2.48.0 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-20 15:41 ` Hyeonggon Yoo @ 2025-01-21 0:54 ` Hyesoo Yu 2025-01-21 1:48 ` Hyeonggon Yoo 0 siblings, 1 reply; 10+ messages in thread From: Hyesoo Yu @ 2025-01-21 0:54 UTC (permalink / raw) To: Hyeonggon Yoo Cc: janghyuck.kim, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin, linux-mm, linux-doc, linux-kernel [-- Attachment #1: Type: text/plain, Size: 5959 bytes --] On Tue, Jan 21, 2025 at 12:41:01AM +0900, Hyeonggon Yoo wrote: > On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote: > > > > If a slab object is corrupted or an error occurs in its internal > > value, continuing after restoration may cause other side effects. > > At this point, it is difficult to debug because the problem occurred > > in the past. A flag has been added that can cause a panic when there > > is a problem with the object. > > Hi Hyesoo, > > I'm concerned about this because it goes against the effort to avoid > introducing new BUG() calls [1]. > > And I think it would be more appropriate to use existing panic_on_warn > functionality [2] which causes > a panic on WARN(), rather than introducing a SLUB-specific knob to do > the same thing. > > However SLUB does not call WARN() and uses pr_err() instead when > reporting an error. > Vlastimil and I talked about changing it to use WARN() a while ago > [3], but neither of us > have done that yet. > > Probably you may want to look at it, as it also aligns with your purpose? > FYI, if you would like to work on it, please make sure that it WARN() > is suppressed during kunit test. > > [1] https://docs.kernel.org/process/deprecated.html#bug-and-bug-on > [2] https://www.kernel.org/doc/html/v6.9/admin-guide/sysctl/kernel.html#panic-on-warn > [3] https://lore.kernel.org/linux-mm/d4219cd9-32d3-4697-93b9-6a44bf77d50c@suse.cz > > Best, > Hyeonggon Thanks for response. Using warn() instead of panic, is a great idea. Thanks for pointing out what I missed. The next patch will be changed to use warn(). Thanks. > > > Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> > > Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493 > > --- > > Documentation/mm/slub.rst | 2 ++ > > include/linux/slab.h | 4 ++++ > > mm/slub.c | 14 ++++++++++++++ > > 3 files changed, 20 insertions(+) > > > > diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst > > index 84ca1dc94e5e..ce58525db93d 100644 > > --- a/Documentation/mm/slub.rst > > +++ b/Documentation/mm/slub.rst > > @@ -53,6 +53,7 @@ Possible debug options are:: > > U User tracking (free and alloc) > > T Trace (please only use on single slabs) > > A Enable failslab filter mark for the cache > > + C Panic if object corruption is checked. > > O Switch debugging off for caches that would have > > caused higher minimum slab orders > > - Switch all debugging off (useful if the kernel is > > @@ -113,6 +114,7 @@ options from the ``slab_debug`` parameter translate to the following files:: > > U store_user > > T trace > > A failslab > > + C corruption_panic > > > > failslab file is writable, so writing 1 or 0 will enable or disable > > the option at runtime. Write returns -EINVAL if cache is an alias. > > diff --git a/include/linux/slab.h b/include/linux/slab.h > > index 10a971c2bde3..4391c30564d6 100644 > > --- a/include/linux/slab.h > > +++ b/include/linux/slab.h > > @@ -31,6 +31,7 @@ enum _slab_flag_bits { > > _SLAB_CACHE_DMA32, > > _SLAB_STORE_USER, > > _SLAB_PANIC, > > + _SLAB_CORRUPTION_PANIC, > > _SLAB_TYPESAFE_BY_RCU, > > _SLAB_TRACE, > > #ifdef CONFIG_DEBUG_OBJECTS > > @@ -97,6 +98,9 @@ enum _slab_flag_bits { > > #define SLAB_STORE_USER __SLAB_FLAG_BIT(_SLAB_STORE_USER) > > /* Panic if kmem_cache_create() fails */ > > #define SLAB_PANIC __SLAB_FLAG_BIT(_SLAB_PANIC) > > +/* Panic if object corruption is checked */ > > +#define SLAB_CORRUPTION_PANIC __SLAB_FLAG_BIT(_SLAB_CORRUPTION_PANIC) > > + > > /** > > * define SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS! > > * > > diff --git a/mm/slub.c b/mm/slub.c > > index 48cefc969480..36a8dabf1349 100644 > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -1306,6 +1306,8 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab) > > fault, end - 1, fault - start); > > print_section(KERN_ERR, "Padding ", pad, remainder); > > > > + BUG_ON(s->flags & SLAB_CORRUPTION_PANIC); > > + > > restore_bytes(s, "slab padding", POISON_INUSE, fault, end); > > } > > > > @@ -1389,6 +1391,8 @@ static int check_object(struct kmem_cache *s, struct slab *slab, > > if (!ret && !slab_in_kunit_test()) { > > print_trailer(s, slab, object); > > add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); > > + > > + BUG_ON(s->flags & SLAB_CORRUPTION_PANIC); > > } > > > > return ret; > > @@ -1689,6 +1693,9 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init) > > case 'a': > > *flags |= SLAB_FAILSLAB; > > break; > > + case 'c': > > + *flags |= SLAB_CORRUPTION_PANIC; > > + break; > > case 'o': > > /* > > * Avoid enabling debugging on caches if its minimum > > @@ -6874,6 +6881,12 @@ static ssize_t store_user_show(struct kmem_cache *s, char *buf) > > > > SLAB_ATTR_RO(store_user); > > > > +static ssize_t corruption_panic_show(struct kmem_cache *s, char *buf) > > +{ > > + return sysfs_emit(buf, "%d\n", !!(s->flags & SLAB_CORRUPTION_PANIC)); > > +} > > +SLAB_ATTR_RO(corruption_panic); > > + > > static ssize_t validate_show(struct kmem_cache *s, char *buf) > > { > > return 0; > > @@ -7092,6 +7105,7 @@ static struct attribute *slab_attrs[] = { > > &red_zone_attr.attr, > > &poison_attr.attr, > > &store_user_attr.attr, > > + &corruption_panic_attr.attr, > > &validate_attr.attr, > > #endif > > #ifdef CONFIG_ZONE_DMA > > -- > > 2.48.0 > > > [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-21 0:54 ` Hyesoo Yu @ 2025-01-21 1:48 ` Hyeonggon Yoo 2025-01-21 2:32 ` Hyesoo Yu 0 siblings, 1 reply; 10+ messages in thread From: Hyeonggon Yoo @ 2025-01-21 1:48 UTC (permalink / raw) To: Hyesoo Yu, Hyeonggon Yoo Cc: kernel_team, janghyuck.kim, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin, linux-mm, linux-doc, linux-kernel On 1/21/2025 9:54 AM, Hyesoo Yu wrote: > On Tue, Jan 21, 2025 at 12:41:01AM +0900, Hyeonggon Yoo wrote: >> On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote: >>> >>> If a slab object is corrupted or an error occurs in its internal >>> value, continuing after restoration may cause other side effects. >>> At this point, it is difficult to debug because the problem occurred >>> in the past. A flag has been added that can cause a panic when there >>> is a problem with the object. >> >> Hi Hyesoo, >> >> I'm concerned about this because it goes against the effort to avoid >> introducing new BUG() calls [1]. >> >> And I think it would be more appropriate to use existing panic_on_warn >> functionality [2] which causes >> a panic on WARN(), rather than introducing a SLUB-specific knob to do >> the same thing. >> >> However SLUB does not call WARN() and uses pr_err() instead when >> reporting an error. >> Vlastimil and I talked about changing it to use WARN() a while ago >> [3], but neither of us >> have done that yet. >> >> Probably you may want to look at it, as it also aligns with your purpose? >> FYI, if you would like to work on it, please make sure that it WARN() >> is suppressed during kunit test. >> >> [1] https://docs.kernel.org/process/deprecated.html#bug-and-bug-on >> [2] https://www.kernel.org/doc/html/v6.9/admin-guide/sysctl/kernel.html#panic-on-warn >> [3] https://lore.kernel.org/linux-mm/d4219cd9-32d3-4697-93b9-6a44bf77d50c@suse.cz >> >> Best, >> Hyeonggon > > Thanks for response. > > Using warn() instead of panic, is a great idea. > Thanks for pointing out what I missed. Just for clarification, I think changing the common error reporting logic (like, slab_bug()) to use WARN() will be preferable to inserting new WARN()s at random points, which is what this patch does now. Best, Hyeonggon ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-21 1:48 ` Hyeonggon Yoo @ 2025-01-21 2:32 ` Hyesoo Yu 2025-01-21 10:28 ` Vlastimil Babka 0 siblings, 1 reply; 10+ messages in thread From: Hyesoo Yu @ 2025-01-21 2:32 UTC (permalink / raw) To: Hyeonggon Yoo Cc: Hyeonggon Yoo, kernel_team, janghyuck.kim, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin, linux-mm, linux-doc, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2872 bytes --] On Tue, Jan 21, 2025 at 10:48:08AM +0900, Hyeonggon Yoo wrote: > > > On 1/21/2025 9:54 AM, Hyesoo Yu wrote: > > On Tue, Jan 21, 2025 at 12:41:01AM +0900, Hyeonggon Yoo wrote: > > > On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote: > > > > > > > > If a slab object is corrupted or an error occurs in its internal > > > > value, continuing after restoration may cause other side effects. > > > > At this point, it is difficult to debug because the problem occurred > > > > in the past. A flag has been added that can cause a panic when there > > > > is a problem with the object. > > > > > > Hi Hyesoo, > > > > > > I'm concerned about this because it goes against the effort to avoid > > > introducing new BUG() calls [1]. > > > > > > And I think it would be more appropriate to use existing panic_on_warn > > > functionality [2] which causes > > > a panic on WARN(), rather than introducing a SLUB-specific knob to do > > > the same thing. > > > > > > However SLUB does not call WARN() and uses pr_err() instead when > > > reporting an error. > > > Vlastimil and I talked about changing it to use WARN() a while ago > > > [3], but neither of us > > > have done that yet. > > > > > > Probably you may want to look at it, as it also aligns with your purpose? > > > FYI, if you would like to work on it, please make sure that it WARN() > > > is suppressed during kunit test. > > > > > > [1] https://docs.kernel.org/process/deprecated.html#bug-and-bug-on > > > [2] https://www.kernel.org/doc/html/v6.9/admin-guide/sysctl/kernel.html#panic-on-warn > > > [3] https://lore.kernel.org/linux-mm/d4219cd9-32d3-4697-93b9-6a44bf77d50c@suse.cz > > > > > > Best, > > > Hyeonggon > > > > Thanks for response. > > > > Using warn() instead of panic, is a great idea. > > Thanks for pointing out what I missed. > > Just for clarification, I think changing the common error reporting > logic (like, slab_bug()) to use WARN() will be preferable to inserting > new WARN()s at random points, which is what this patch does now. > > Best, > Hyeonggon > Thanks you for clarification. Actually, I considered adding BUG_ON() to slab_bug. However if we add BUG_ON() to slab_bug, it will prevent many meaningful error log from being printed subsequently. As you know, slab_bug is the log that usually is printed at the biginning of a bug in the slab. As a result, it would be difficult to figure out the problems based on the logs during our large-scale test-bed. Similary, even if I use WARN() in slab_bug, we won't be able to obtain the logs when panic_on_warn is enabled. I don't think it is useful to include WARN in slab_bug. Instead, I will implement a solution where WARN is only used in slab_fix before slab object is restored. If I add it to slab_fix, I think warning is suppressed on kunit test by slab_add_kunit_errors handling. Thanks, Regards. [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-21 2:32 ` Hyesoo Yu @ 2025-01-21 10:28 ` Vlastimil Babka 0 siblings, 0 replies; 10+ messages in thread From: Vlastimil Babka @ 2025-01-21 10:28 UTC (permalink / raw) To: Hyesoo Yu, Hyeonggon Yoo Cc: Hyeonggon Yoo, kernel_team, janghyuck.kim, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, linux-mm, linux-doc, linux-kernel On 1/21/25 3:32 AM, Hyesoo Yu wrote: > On Tue, Jan 21, 2025 at 10:48:08AM +0900, Hyeonggon Yoo wrote: >> >> >> On 1/21/2025 9:54 AM, Hyesoo Yu wrote: >>> On Tue, Jan 21, 2025 at 12:41:01AM +0900, Hyeonggon Yoo wrote: >>>> On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote: >>> >>> Thanks for response. >>> >>> Using warn() instead of panic, is a great idea. >>> Thanks for pointing out what I missed. >> >> Just for clarification, I think changing the common error reporting >> logic (like, slab_bug()) to use WARN() will be preferable to inserting >> new WARN()s at random points, which is what this patch does now. >> >> Best, >> Hyeonggon >> > > Thanks you for clarification. > > Actually, I considered adding BUG_ON() to slab_bug. However if we add BUG_ON() to slab_bug, > it will prevent many meaningful error log from being printed subsequently. > As you know, slab_bug is the log that usually is printed at the biginning of a bug > in the slab. As a result, it would be difficult to figure out the problems based on > the logs during our large-scale test-bed. > > Similary, even if I use WARN() in slab_bug, we won't be able to obtain the logs > when panic_on_warn is enabled. I don't think it is useful to include WARN in slab_bug. > > Instead, I will implement a solution where WARN is only used in slab_fix before > slab object is restored. If I add it to slab_fix, I think warning is suppressed on > kunit test by slab_add_kunit_errors handling. Right, makes sense to only do the WARN() after printing the debugging logs. Thanks. > Thanks, > Regards. > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: slub: Panic if the object corruption is checked. 2025-01-20 8:28 ` [PATCH] mm: slub: Panic if the object corruption is checked Hyesoo Yu 2025-01-20 15:36 ` Matthew Wilcox 2025-01-20 15:41 ` Hyeonggon Yoo @ 2025-01-21 12:46 ` kernel test robot 2 siblings, 0 replies; 10+ messages in thread From: kernel test robot @ 2025-01-21 12:46 UTC (permalink / raw) To: Hyesoo Yu Cc: oe-kbuild-all, janghyuck.kim, Hyesoo Yu, Andrew Morton, Jonathan Corbet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-doc, linux-kernel Hi Hyesoo, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-everything] url: https://github.com/intel-lab-lkp/linux/commits/Hyesoo-Yu/mm-slub-Panic-if-the-object-corruption-is-checked/20250120-163233 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20250120082908.4162780-1-hyesoo.yu%40samsung.com patch subject: [PATCH] mm: slub: Panic if the object corruption is checked. config: mips-randconfig-r111-20250121 (https://download.01.org/0day-ci/archive/20250121/202501212026.lUnLNhv6-lkp@intel.com/config) compiler: mips-linux-gcc (GCC) 14.2.0 reproduce: (https://download.01.org/0day-ci/archive/20250121/202501212026.lUnLNhv6-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202501212026.lUnLNhv6-lkp@intel.com/ sparse warnings: (new ones prefixed by >>) >> mm/slub.c:1308:9: sparse: sparse: cast from restricted slab_flags_t mm/slub.c:1394:17: sparse: sparse: cast from restricted slab_flags_t mm/slub.c:4440:47: sparse: sparse: context imbalance in '__slab_free' - unexpected unlock vim +1308 mm/slub.c 1273 1274 /* Check the pad bytes at the end of a slab page */ 1275 static pad_check_attributes void 1276 slab_pad_check(struct kmem_cache *s, struct slab *slab) 1277 { 1278 u8 *start; 1279 u8 *fault; 1280 u8 *end; 1281 u8 *pad; 1282 int length; 1283 int remainder; 1284 1285 if (!(s->flags & SLAB_POISON)) 1286 return; 1287 1288 start = slab_address(slab); 1289 length = slab_size(slab); 1290 end = start + length; 1291 remainder = length % s->size; 1292 if (!remainder) 1293 return; 1294 1295 pad = end - remainder; 1296 metadata_access_enable(); 1297 fault = memchr_inv(kasan_reset_tag(pad), POISON_INUSE, remainder); 1298 metadata_access_disable(); 1299 if (!fault) 1300 return; 1301 while (end > fault && end[-1] == POISON_INUSE) 1302 end--; 1303 1304 slab_err(s, slab, "Padding overwritten. 0x%p-0x%p @offset=%tu", 1305 fault, end - 1, fault - start); 1306 print_section(KERN_ERR, "Padding ", pad, remainder); 1307 > 1308 BUG_ON(s->flags & SLAB_CORRUPTION_PANIC); 1309 1310 restore_bytes(s, "slab padding", POISON_INUSE, fault, end); 1311 } 1312 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-01-21 12:47 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CGME20250120083041epcas2p30d61ee801c1db5a7ebdf26fdc642883f@epcas2p3.samsung.com>
2025-01-20 8:28 ` [PATCH] mm: slub: Panic if the object corruption is checked Hyesoo Yu
2025-01-20 15:36 ` Matthew Wilcox
2025-01-21 0:40 ` Hyesoo Yu
2025-01-21 10:27 ` Vlastimil Babka
2025-01-20 15:41 ` Hyeonggon Yoo
2025-01-21 0:54 ` Hyesoo Yu
2025-01-21 1:48 ` Hyeonggon Yoo
2025-01-21 2:32 ` Hyesoo Yu
2025-01-21 10:28 ` Vlastimil Babka
2025-01-21 12:46 ` kernel test robot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox