* [PATCH] mm: slub: Print the broken data before restoring slub.
From: Hyesoo Yu @ 2025-01-20 8:30 UTC
Cc: janghyuck.kim, Hyesoo Yu, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-kernel

Previously, the restore occurred after printing the object in slub.
After commit 47d911b ("slab: make check_object() more consistent"),
the bytes are printed after the restore. This information about the bytes
before the restore is highly valuable for debugging purposes.
For instance, in the event of a cache issue, it displays byte patterns
by breaking them down into 64-byte units. Without this information,
we can only speculate on how it was broken. Hence the corrupted regions
are printed prior to the restoration process.

Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Change-Id: Iac1df0526808edc2318f9988c757cdc3e40ae4b2
---
 mm/slub.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/slub.c b/mm/slub.c
index c2151c9fee22..48cefc969480 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1207,6 +1207,7 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
 					fault[0], value);
 
 skip_bug_print:
+	print_section(KERN_ERR, "Corrupt ", fault, end - fault);
 	restore_bytes(s, what, value, fault, end);
 	return 0;
 }
-- 
2.48.0
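
Editorial illustration of the ordering the patch argues for: a minimal user-space sketch, not kernel code. It assumes a buffer that should be filled with one expected byte, finds the corrupt range, dumps that range while it still holds the bad bytes, and only then restores it. The names first_mismatch(), dump_range(), check_dump_restore() and the POISON constant are hypothetical stand-ins chosen for this sketch; they are not the functions in mm/slub.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define POISON 0x6b	/* stand-in fill value for this sketch */

/* Return the first byte in [start, start + len) that differs from value, or NULL. */
static unsigned char *first_mismatch(unsigned char *start, unsigned char value,
				     size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (start[i] != value)
			return &start[i];
	return NULL;
}

/* Hex-dump len bytes with a prefix, 16 bytes per row, labelled by offset. */
static void dump_range(const char *prefix, const unsigned char *from, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (i % 16 == 0)
			printf("%s%s%04zx:", i ? "\n" : "", prefix, i);
		printf(" %02x", from[i]);
	}
	if (len)
		printf("\n");
}

/*
 * Check that the region holds only `value`. On a mismatch, dump the corrupt
 * range first and restore it afterwards, so the dump shows the bad bytes.
 * Returns 1 if the region was clean, 0 if corruption was found and repaired.
 * The length of the corrupt range is stored in *corrupt_len.
 */
static int check_dump_restore(unsigned char *start, unsigned char value,
			      size_t len, size_t *corrupt_len)
{
	unsigned char *fault, *end;

	assert(start != NULL && corrupt_len != NULL);
	*corrupt_len = 0;

	fault = first_mismatch(start, value, len);
	if (!fault)
		return 1;

	/* Trim trailing bytes that still hold the expected value. */
	end = start + len;
	while (end > fault && end[-1] == value)
		end--;

	*corrupt_len = (size_t)(end - fault);
	dump_range("Corrupt ", fault, *corrupt_len);	/* before the restore */
	memset(fault, value, *corrupt_len);		/* the restore itself */
	return 0;
}
```

Swapping the dump_range() and memset() calls would make the dump show only the expected fill value, which is the loss of information the patch description complains about.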
* Re: [PATCH] mm: slub: Print the broken data before restoring slub.
From: Hyeonggon Yoo @ 2025-01-21 13:35 UTC
To: Hyesoo Yu
Cc: janghyuck.kim, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin, linux-mm, linux-kernel, Chengming Zhou

On Mon, Jan 20, 2025 at 5:31 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:

Let's add Chengming, the author of the commit, to Cc,
as he might have some opinions about it.

> Previously, the restore occurred after printing the object in slub.
> After commit 47d911b ("slab: make check_object() more consistent"),

At least 12 characters of the commit hash should be used to refer to a commit.
Documentation/process/submitting-patches.rst states that:

  You should also be sure to use at least the first twelve characters
  of the SHA-1 ID. The kernel repository holds a lot of objects, making
  collisions with shorter IDs a real possibility. Bear in mind that,
  even if there is no collision with your six-character ID now, that
  condition may change five years from now.

> the bytes are printed after the restore. This information about the bytes
> before the restore is highly valuable for debugging purposes.
> For instance, in the event of a cache issue, it displays byte patterns
> by breaking them down into 64-byte units. Without this information,
> we can only speculate on how it was broken. Hence the corrupted regions
> are printed prior to the restoration process.

Probably this should be considered for -stable releases. What do you think?

[1] https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

> diff --git a/mm/slub.c b/mm/slub.c
> index c2151c9fee22..48cefc969480 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1207,6 +1207,7 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
>  					fault[0], value);
>
>  skip_bug_print:
> +	print_section(KERN_ERR, "Corrupt ", fault, end - fault);

I don't think it's supposed to report an error here, per the name of
the label "skip_bug_print".

Maybe move print_trailer() and add_taint() back to
check_bytes_and_report(), report an error only once,
and skip reporting if it has already been reported?

Best,
Hyeonggon

>  	restore_bytes(s, what, value, fault, end);
>  	return 0;
>  }
> --
> 2.48.0
>
* Re: [PATCH] mm: slub: Print the broken data before restoring slub.
From: Chengming Zhou @ 2025-01-22 3:25 UTC
To: Hyeonggon Yoo, Hyesoo Yu
Cc: janghyuck.kim, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin, linux-mm, linux-kernel

On 2025/1/21 21:35, Hyeonggon Yoo wrote:
> On Mon, Jan 20, 2025 at 5:31 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>
> Let's add Chengming, the author of the commit, to Cc,
> as he might have some opinions about it.

Thanks!

>
>> Previously, the restore occurred after printing the object in slub.
>> After commit 47d911b ("slab: make check_object() more consistent"),
>
> At least 12 characters of the commit hash should be used to refer to a commit.
> Documentation/process/submitting-patches.rst states that:
>
>   You should also be sure to use at least the first twelve characters
>   of the SHA-1 ID. The kernel repository holds a lot of objects, making
>   collisions with shorter IDs a real possibility. Bear in mind that,
>   even if there is no collision with your six-character ID now, that
>   condition may change five years from now.
>
>> the bytes are printed after the restore. This information about the bytes

Yes. Before this commit, the object was dumped as soon as we found one
error, and checking was aborted. The commit changed this to check all
sections of the object and dump the object at the end, by which time the
corrupted section has already been restored.

>> before the restore is highly valuable for debugging purposes.
>> For instance, in the event of a cache issue, it displays byte patterns
>> by breaking them down into 64-byte units. Without this information,

Actually, we already print an error message for the corrupted section in
check_bytes_and_report() when checking each section, but that's not enough
for your case. So you add print_section(), which makes sense to me.

>> we can only speculate on how it was broken. Hence the corrupted regions
>> are printed prior to the restoration process.
>
> Probably this should be considered for -stable releases. What do you think?
> [1] https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

I'm not sure, since it's not a bug; the printed message is just not enough
for this use case.

>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index c2151c9fee22..48cefc969480 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1207,6 +1207,7 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
>>  					fault[0], value);
>>
>>  skip_bug_print:
>> +	print_section(KERN_ERR, "Corrupt ", fault, end - fault);
>
> I don't think it's supposed to report an error here, per the name of
> the label "skip_bug_print".

Agreed. I think print_section() should be above skip_bug_print, which
means we skip printing the bug message when KUnit testing.

Here you print just the "Corrupt" part of this section. Another choice
is to print the whole section; I'm not sure which way is better.

>
> Maybe move print_trailer() and add_taint() back to
> check_bytes_and_report(), report an error only once,
> and skip reporting if it has already been reported?

Here is the discussion [1].

[1] https://lore.kernel.org/all/20240528-b4-slab-debug-v1-1-8694ef4802df@linux.dev/

Thanks.

>
> Best,
> Hyeonggon
>
>>  	restore_bytes(s, what, value, fault, end);
>>  	return 0;
>>  }
>> --
>> 2.48.0
>>
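
Editorial illustration of the label-placement point raised in the review: a minimal user-space sketch, assuming a `quiet` flag that stands in for the case where reporting is suppressed (the thread mentions KUnit testing). Anything placed after the skip_bug_print label runs on every path, so a dump placed there is printed even when reporting was meant to be skipped; placing it above the label keeps the quiet path silent while the restore still happens. The struct report_stats counters and check_and_restore() are hypothetical names used only so the behaviour can be observed; they are not taken from mm/slub.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Counters standing in for console output, so a caller can observe it. */
struct report_stats {
	int messages;	/* bug messages that would have been printed */
	int dumps;	/* corrupt-range dumps that would have been printed */
};

/*
 * Returns 1 if every byte equals value, 0 otherwise. When quiet is true,
 * control jumps past all reporting but the bytes are still restored.
 */
static int check_and_restore(unsigned char *start, unsigned char value,
			     size_t len, bool quiet, struct report_stats *st)
{
	unsigned char *fault = NULL;
	unsigned char *end = start + len;

	assert(st != NULL);

	for (size_t i = 0; i < len; i++) {
		if (start[i] != value) {
			fault = &start[i];
			break;
		}
	}
	if (!fault)
		return 1;

	/* Trim trailing bytes that still hold the expected value. */
	while (end > fault && end[-1] == value)
		end--;

	if (quiet)
		goto skip_bug_print;

	st->messages++;	/* stands in for the bug message */
	st->dumps++;	/* stands in for the dump: above the label */

skip_bug_print:
	memset(fault, value, (size_t)(end - fault));	/* always restore */
	return 0;
}
```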
* Re: [PATCH] mm: slub: Print the broken data before restoring slub.
From: Hyesoo Yu @ 2025-01-22 5:42 UTC
To: Chengming Zhou
Cc: Hyeonggon Yoo, janghyuck.kim, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin, linux-mm, linux-kernel

On Wed, Jan 22, 2025 at 11:25:15AM +0800, Chengming Zhou wrote:
> On 2025/1/21 21:35, Hyeonggon Yoo wrote:
> > On Mon, Jan 20, 2025 at 5:31 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >
> > Let's add Chengming, the author of the commit, to Cc,
> > as he might have some opinions about it.
>
> Thanks!
>

I am sorry, I missed your reply.

> >
> > > Previously, the restore occurred after printing the object in slub.
> > > After commit 47d911b ("slab: make check_object() more consistent"),
> >
> > At least 12 characters of the commit hash should be used to refer to a commit.
> > Documentation/process/submitting-patches.rst states that:
> >
> >   You should also be sure to use at least the first twelve characters
> >   of the SHA-1 ID. The kernel repository holds a lot of objects, making
> >   collisions with shorter IDs a real possibility. Bear in mind that,
> >   even if there is no collision with your six-character ID now, that
> >   condition may change five years from now.
> >
> > > the bytes are printed after the restore. This information about the bytes
>
> Yes. Before this commit, the object was dumped as soon as we found one
> error, and checking was aborted. The commit changed this to check all
> sections of the object and dump the object at the end, by which time the
> corrupted section has already been restored.
>
> > > before the restore is highly valuable for debugging purposes.
> > > For instance, in the event of a cache issue, it displays byte patterns
> > > by breaking them down into 64-byte units. Without this information,
>
> Actually, we already print an error message for the corrupted section in
> check_bytes_and_report() when checking each section, but that's not enough
> for your case. So you add print_section(), which makes sense to me.
>
> > > we can only speculate on how it was broken. Hence the corrupted regions
> > > are printed prior to the restoration process.
> >
> > Probably this should be considered for -stable releases. What do you think?
> > [1] https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>
> I'm not sure, since it's not a bug; the printed message is just not enough
> for this use case.
>

I agree that it is not a bug, just missing necessary information.
I won't include it in stable based on your opinion.

> >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index c2151c9fee22..48cefc969480 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1207,6 +1207,7 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> > >  					fault[0], value);
> > >
> > >  skip_bug_print:
> > > +	print_section(KERN_ERR, "Corrupt ", fault, end - fault);
> >
> > I don't think it's supposed to report an error here, per the name of
> > the label "skip_bug_print".
>
> Agreed. I think print_section() should be above skip_bug_print, which
> means we skip printing the bug message when KUnit testing.
>
> Here you print just the "Corrupt" part of this section. Another choice
> is to print the whole section; I'm not sure which way is better.
>

Yes, it is my mistake. I'll fix it.

If we add print_section() to check_bytes_and_report(), it will print the
corrupted section, and then the whole section will be printed once again
later. I guess that printing the restored section is not meaningful for
debugging. It would be more efficient for the log to print the whole
section only once. However, this would require passing an additional
parameter to check_bytes_and_report().

In the next version, I plan to print the whole section only once by
adding a boolean parameter to check_bytes_and_report().

Thanks,

> >
> > Maybe move print_trailer() and add_taint() back to
> > check_bytes_and_report(), report an error only once,
> > and skip reporting if it has already been reported?
>
> Here is the discussion [1].
>
> [1] https://lore.kernel.org/all/20240528-b4-slab-debug-v1-1-8694ef4802df@linux.dev/
>
> Thanks.
>
> >
> > Best,
> > Hyeonggon
> >
> > >  	restore_bytes(s, what, value, fault, end);
> > >  	return 0;
> > >  }
> > > --
> > > 2.48.0
> > >
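
Editorial illustration of one possible reading of the boolean-parameter idea discussed above: a minimal user-space sketch, not the follow-up patch. It assumes an object made of two checked sections and a per-section checker that takes a `dump_obj` flag; the caller passes true only until the first corruption has been reported, so the whole object is dumped once, while it is still corrupt, instead of once per section or after the restore. The names struct object_view, check_section(), check_object_sketch() and the flag itself are hypothetical; the thread does not specify the real signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A toy object: a flat byte buffer plus a counter standing in for log output. */
struct object_view {
	unsigned char *base;
	size_t size;
	int whole_dumps;	/* how many times the whole object was "dumped" */
};

/*
 * Check one section of the object against value. If it is corrupt and
 * dump_obj is true, record a whole-object dump before restoring.
 * Returns 1 if the section was clean, 0 if it was corrupt and repaired.
 */
static int check_section(struct object_view *obj, size_t off, size_t len,
			 unsigned char value, bool dump_obj)
{
	unsigned char *start = obj->base + off;
	size_t first = len, last = 0;

	assert(off + len <= obj->size);

	for (size_t i = 0; i < len; i++) {
		if (start[i] != value) {
			if (first == len)
				first = i;
			last = i;
		}
	}
	if (first == len)
		return 1;

	if (dump_obj)
		obj->whole_dumps++;	/* dump happens before the restore */

	memset(start + first, value, last - first + 1);
	return 0;
}

/* Check a leading red zone and the payload; dump the object at most once. */
static int check_object_sketch(struct object_view *obj, size_t redzone_len,
			       unsigned char redzone_val,
			       unsigned char poison_val)
{
	bool dumped = false;
	int ok = 1;

	assert(redzone_len <= obj->size);

	if (!check_section(obj, 0, redzone_len, redzone_val, !dumped)) {
		ok = 0;
		dumped = true;
	}
	if (!check_section(obj, redzone_len, obj->size - redzone_len,
			   poison_val, !dumped))
		ok = 0;

	return ok;
}
```

The trade-off in this reading is that the first corrupt section triggers the dump, so later sections appear in that dump in their still-corrupt state, and nothing needs to be printed after the restores.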
* Re: [PATCH] mm: slub: Print the broken data before restoring slub.
From: Hyesoo Yu @ 2025-01-22 5:27 UTC
To: Hyeonggon Yoo
Cc: janghyuck.kim, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin, linux-mm, linux-kernel, Chengming Zhou

On Tue, Jan 21, 2025 at 10:35:58PM +0900, Hyeonggon Yoo wrote:
> On Mon, Jan 20, 2025 at 5:31 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>
> Let's add Chengming, the author of the commit, to Cc,
> as he might have some opinions about it.
>
> > Previously, the restore occurred after printing the object in slub.
> > After commit 47d911b ("slab: make check_object() more consistent"),
>
> At least 12 characters of the commit hash should be used to refer to a commit.
> Documentation/process/submitting-patches.rst states that:
>
>   You should also be sure to use at least the first twelve characters
>   of the SHA-1 ID. The kernel repository holds a lot of objects, making
>   collisions with shorter IDs a real possibility. Bear in mind that,
>   even if there is no collision with your six-character ID now, that
>   condition may change five years from now.
>

Thanks for pointing out the mistake.

> > the bytes are printed after the restore. This information about the bytes
> > before the restore is highly valuable for debugging purposes.
> > For instance, in the event of a cache issue, it displays byte patterns
> > by breaking them down into 64-byte units. Without this information,
> > we can only speculate on how it was broken. Hence the corrupted regions
> > are printed prior to the restoration process.
>
> Probably this should be considered for -stable releases. What do you think?
> [1] https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>

Thank you for the advice. I will add Cc: stable@vger.kernel.org in the next version.

> > diff --git a/mm/slub.c b/mm/slub.c
> > index c2151c9fee22..48cefc969480 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1207,6 +1207,7 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> >  					fault[0], value);
> >
> >  skip_bug_print:
> > +	print_section(KERN_ERR, "Corrupt ", fault, end - fault);
>
> I don't think it's supposed to report an error here, per the name of
> the label "skip_bug_print".
>

That is a good point. I will move print_section() above the skip_bug_print label.

> Maybe move print_trailer() and add_taint() back to
> check_bytes_and_report(), report an error only once,
> and skip reporting if it has already been reported?
>
> Best,
> Hyeonggon
>

That could be implemented by passing a new parameter to check_bytes_and_report().
Would it be better to add a new boolean parameter to that function?
Or do you have any other ideas?

Thanks,
Hyesoo.

> >  	restore_bytes(s, what, value, fault, end);
> >  	return 0;
> >  }
> > --
> > 2.48.0
> >
Thread overview: 5+ messages
[not found] <CGME20250120083144epcas2p369584af764b617c3d2cb2a0568a45d6c@epcas2p3.samsung.com>
2025-01-20  8:30 ` [PATCH] mm: slub: Print the broken data before restoring slub Hyesoo Yu
2025-01-21 13:35   ` Hyeonggon Yoo
2025-01-22  3:25     ` Chengming Zhou
2025-01-22  5:42       ` Hyesoo Yu
2025-01-22  5:27     ` Hyesoo Yu