On 2025/7/23 17:22, Lorenzo Stoakes wrote:
> On Wed, Jul 23, 2025 at 05:14:19PM +0800, Xuanye Liu wrote:
>> On 2025/7/23 17:10, Xuanye Liu wrote:
>>> On 2025/7/23 16:42, David Hildenbrand wrote:
>>>> On 23.07.25 10:05, David Hildenbrand wrote:
>>>>> On 23.07.25 09:45, Xuanye Liu wrote:
>>>>>> On 2025/7/23 15:31, Kees Cook wrote:
>>>>>>> On Wed, Jul 23, 2025 at 03:23:49PM +0800, Xuanye Liu wrote:
>>>>>>>> The check_mm() function verifies the correctness of rss counters in
>>>>>>>> struct mm_struct. Currently, it only prints an alert when a bad
>>>>>>>> rss-counter state is detected, but lacks sufficient context for
>>>>>>>> debugging.
>>>>>>>>
>>>>>>>> This patch adds a dump_stack() call to provide a stack trace when
>>>>>>>> the rss-counter state is invalid. This helps developers identify
>>>>>>>> where the corrupted mm_struct is being checked and trace the
>>>>>>>> underlying cause of the inconsistency.
>>>>>>> Why not just convert the pr_alert to a WARN?
>>>>>> Good idea! I'll gather more feedback from others and then update to v2.
>>>>> Makes sense to me.
>>>> After discussing this with Lorenzo off-list, isn't the stack completely misleading/useless in that case?
>>>>
>>>> Whatever caused the RSS counter mismatch (e.g., unmapped the wrong pages, failed to unmap pages) quite possibly happened in a different context, way way earlier.
>>>>
>>>> Why would you think the stack trace would be of any value when destroying an MM (__mmdrop)?
>>>>
>>>> Having said that, I really hate these pr_*("BUG: ...") calls with a passion. Probably we'd want to invoke the panic_on_warn machinery, because something unexpected happened.
>>>>
>>> The stack trace dumped here may indeed not reflect the root cause:
>>> the actual error could have occurred much earlier, for example during a
>>> failed or missing page map/unmap operation.
>>> The current stack (e.g., in __mmdrop() or exit_mmap()) is merely part
>>> of the cleanup phase.
>> Dumping the stack still has some chance of helping identify the issue; at the very least, it
>> shows which task triggered the check.
> The stack will be actively misleading because it's highly likely to be totally
> unrelated.
>
> If you want to know the task, just output current->comm :)
>
> I think it's not only of no value, it's _ACTIVELY_ misleading. So it's
> definitely a no to a dump_stack().
>
> I am also not in favour of a WARN_ON() for the same reason.
>
> Really we should be catching these elsewhere.
>
> If you want to send the patch just outputting the task then all good.

We can start by adding current->comm and task_pid_nr(current) to help identify
the triggering task.

As for possible detection or monitoring mechanisms, we can continue the
discussion.

>
>>> Given that, how should we go about identifying the root cause when such an issue occurs?
>>>
>>> Is there any existing way to trace it more effectively, or could we introduce a new mechanism
>>> to monitor and detect these inconsistencies earlier?
>>>
>>> Let's brainstorm possible solutions together.
>>>
>> --
>> Thanks,
>> Xuanye
>>

--
Thanks,
Xuanye
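
For illustration only, here is a rough userspace sketch of what reporting the task name and PID alongside each bad counter might look like. It is not the actual patch. The names mock_mm, mock_task and check_mm_report() are hypothetical stand-ins for struct mm_struct, current and check_mm(), and the "Comm:%s Pid:%d" suffix is an assumed message format. In the kernel, the counters would be read from the mm's rss_stat and the line would go through pr_alert() rather than snprintf().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES, NR_MM_COUNTERS };

struct mock_mm {
	long rss_stat[NR_MM_COUNTERS];	/* plain longs instead of per-CPU counters */
};

struct mock_task {
	char comm[16];			/* models current->comm */
	int pid;			/* models task_pid_nr(current) */
};

static const char *const resident_page_types[NR_MM_COUNTERS] = {
	"MM_FILEPAGES", "MM_ANONPAGES", "MM_SWAPENTS", "MM_SHMEMPAGES",
};

/*
 * Write one report line per non-zero counter into buf, tagging each line
 * with the task name and PID. Returns the number of bad counters found.
 */
int check_mm_report(const struct mock_mm *mm, const struct mock_task *task,
		    char *buf, size_t len)
{
	int bad = 0;
	size_t off = 0;

	if (len > 0)
		buf[0] = '\0';

	for (int i = 0; i < NR_MM_COUNTERS; i++) {
		long x = mm->rss_stat[i];

		if (x == 0)
			continue;
		bad++;

		/* Stop appending once the buffer is full, but keep counting. */
		if (off < len) {
			int n = snprintf(buf + off, len - off,
					 "BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
					 (const void *)mm, resident_page_types[i],
					 x, task->comm, task->pid);
			if (n > 0)
				off += (size_t)n;
		}
	}
	return bad;
}
```

As noted in the thread, this only identifies the task that happened to run the check; it says nothing about where the counters actually went wrong.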