From: Jann Horn
Date: Mon, 18 Nov 2024 22:55:24 +0100
Subject: Re: [RFCv1 4/6] misc/page_detective: Introduce Page Detective
To: Pasha Tatashin
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org,
 akpm@linux-foundation.org, corbet@lwn.net, derek.kiernan@amd.com,
 dragan.cvetic@amd.com, arnd@arndb.de, gregkh@linuxfoundation.org,
 viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, tj@kernel.org,
 hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, Liam.Howlett@oracle.com,
 lorenzo.stoakes@oracle.com, vbabka@suse.cz, shuah@kernel.org,
 vegard.nossum@oracle.com, vattunuru@marvell.com, schalla@marvell.com,
 david@redhat.com, willy@infradead.org, osalvador@suse.de,
 usama.anjum@collabora.com, andrii@kernel.org, ryan.roberts@arm.com,
 peterx@redhat.com, oleg@redhat.com, tandersen@netflix.com,
 rientjes@google.com, gthelen@google.com
References: <20241116175922.3265872-1-pasha.tatashin@soleen.com>
 <20241116175922.3265872-5-pasha.tatashin@soleen.com>
In-Reply-To: <20241116175922.3265872-5-pasha.tatashin@soleen.com>

On Sat, Nov 16, 2024 at 6:59 PM Pasha Tatashin wrote:
> Page Detective is a kernel debugging tool that provides detailed
> information about the usage and mapping of physical memory pages.
>
> It operates through the Linux debugfs interface, providing access
> to both virtual and physical address inquiries. The output, presented
> via kernel log messages (accessible with dmesg), will help
> administrators and developers understand how specific pages are
> utilized by the system.
>
> This tool can be used to investigate various memory-related issues,
> such as checksum failures during live migration, filesystem journal
> failures, general segfaults, or other corruptions.
[...]
> +/*
> + * Walk kernel page table, and print all mappings to this pfn, return 1 if
> + * pfn is mapped in direct map, return 0 if not mapped in direct map, and
> + * return -1 if operation canceled by user.
> + */
> +static int page_detective_kernel_map_info(unsigned long pfn,
> +					  unsigned long direct_map_addr)
> +{
> +	struct pd_private_kernel pr = {0};
> +	unsigned long s, e;
> +
> +	pr.direct_map_addr = direct_map_addr;
> +	pr.pfn = pfn;
> +
> +	for (s = PAGE_OFFSET; s != ~0ul; ) {
> +		e = s + PD_WALK_MAX_RANGE;
> +		if (e < s)
> +			e = ~0ul;
> +
> +		if (walk_page_range_kernel(s, e, &pd_kernel_ops, &pr)) {

I think which parts of the kernel virtual address range you can safely
pagewalk is somewhat architecture-specific; for example, x86 can run
under Xen PV, in which case I think part of the page tables may not be
walkable because they're owned by the hypervisor for its own use?
Notably the x86 version of ptdump_walk_pgd_level_core starts walking
at GUARD_HOLE_END_ADDR instead.

See also https://kernel.org/doc/html/latest/arch/x86/x86_64/mm.html
for an ASCII table reference on address space regions.
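To make the shape of that loop concrete, here is a minimal userspace
sketch, not kernel code. It assumes the walk start is passed in as a
parameter, so that an architecture could supply something other than
PAGE_OFFSET. walk_chunks(), chunk_fn and the use of uint64_t addresses
are hypothetical names chosen for illustration; only the chunking and
the wraparound clamp mirror the quoted loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type: visit the range [start, end). */
typedef void (*chunk_fn)(uint64_t start, uint64_t end, void *arg);

/*
 * Visit [start, UINT64_MAX) in chunks of at most max_range bytes.
 * If start + max_range wraps around, the chunk end is clamped to
 * UINT64_MAX, as in the quoted loop. Returns the number of chunks.
 * The caller chooses 'start'; in this sketch that is where an
 * architecture-specific lower bound would be plugged in.
 */
static unsigned long walk_chunks(uint64_t start, uint64_t max_range,
				 chunk_fn fn, void *arg)
{
	unsigned long n = 0;
	uint64_t s = start, e;

	assert(max_range != 0);
	while (s != UINT64_MAX) {
		e = s + max_range;
		if (e < s)		/* wrapped past the top */
			e = UINT64_MAX;
		if (fn)
			fn(s, e, arg);
		n++;
		s = e;
	}
	return n;
}
```

With the start as a parameter, the generic code would not need to know
which low part of the kernel range is off limits on a given
architecture.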
> +			pr_info("Received a cancel signal from user, while scanning kernel mappings\n");
> +			return -1;
> +		}
> +		cond_resched();
> +		s = e;
> +	}
> +
> +	if (!pr.vmalloc_maps) {
> +		pr_info("The page is not mapped into kernel vmalloc area\n");
> +	} else if (pr.vmalloc_maps > 1) {
> +		pr_info("The page is mapped into vmalloc area: %ld times\n",
> +			pr.vmalloc_maps);
> +	}
> +
> +	if (!pr.direct_map)
> +		pr_info("The page is not mapped into kernel direct map\n");
> +
> +	pr_info("The page mapped into kernel page table: %ld times\n", pr.maps);
> +
> +	return pr.direct_map ? 1 : 0;
> +}
> +
> +/* Print kernel information about the pfn, return -1 if canceled by user */
> +static int page_detective_kernel(unsigned long pfn)
> +{
> +	unsigned long *mem = __va((pfn) << PAGE_SHIFT);
> +	unsigned long sum = 0;
> +	int direct_map;
> +	u64 s, e;
> +	int i;
> +
> +	s = sched_clock();
> +	direct_map = page_detective_kernel_map_info(pfn, (unsigned long)mem);
> +	e = sched_clock() - s;
> +	pr_info("Scanned kernel page table in [%llu.%09llus]\n",
> +		e / NSEC_PER_SEC, e % NSEC_PER_SEC);
> +
> +	/* Canceled by user or no direct map */
> +	if (direct_map < 1)
> +		return direct_map;
> +
> +	for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++)
> +		sum |= mem[i];

If the purpose of this interface is to inspect pages in weird states,
I wonder if it would make sense to use something like
copy_mc_to_kernel() in case that helps avoid kernel crashes due to
uncorrectable 2-bit ECC errors or such. But maybe that's not the kind
of error you're concerned about here? And I also don't have any idea
if copy_mc_to_kernel() actually does anything sensible for ECC errors.
So don't treat this as a fix suggestion, more as a random idea that
should probably be ignored unless someone who understands ECC errors
says it makes sense.

But I think you should at least be using READ_ONCE(), since you're
reading from memory that can change concurrently.
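Roughly what I mean by the READ_ONCE() point, as a userspace sketch
rather than a patch. READ_ONCE_UL() below is a stand-in I made up for
this example (a plain volatile access); in the kernel you would use
the real READ_ONCE() from the kernel headers. buf_is_zero() is
likewise a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's READ_ONCE() (an assumption made
 * for this sketch only). The volatile access makes the compiler emit
 * exactly one load per call and keeps it from re-reading or caching
 * the value.
 */
#define READ_ONCE_UL(x) (*(const volatile unsigned long *)&(x))

/* Return 1 if every word in the buffer read as zero, otherwise 0. */
static int buf_is_zero(const unsigned long *mem, size_t nwords)
{
	unsigned long sum = 0;
	size_t i;

	assert(mem != NULL);
	for (i = 0; i < nwords; i++)
		sum |= READ_ONCE_UL(mem[i]);

	return sum == 0;
}
```

This does not make the result a consistent snapshot of the page; it
only ensures each word is loaded once while the contents may be
changing underneath.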
> +	if (sum == 0)
> +		pr_info("The page contains only zeroes\n");
> +	else
> +		pr_info("The page contains some data\n");
> +
> +	return 0;
> +}
[...]
> +/*
> + * print information about mappings of pfn by mm, return -1 if canceled
> + * return number of mappings found.
> + */
> +static long page_detective_user_mm_info(struct mm_struct *mm, unsigned long pfn)
> +{
> +	struct pd_private_user pr = {0};
> +	unsigned long s, e;
> +
> +	pr.pfn = pfn;
> +	pr.mm = mm;
> +
> +	for (s = 0; s != TASK_SIZE; ) {

TASK_SIZE does not make sense when inspecting another task, because
TASK_SIZE depends on the virtual address space size of the current
task (whether you are a 32-bit or 64-bit process). Please use
TASK_SIZE_MAX for remote process access.

> +		e = s + PD_WALK_MAX_RANGE;
> +		if (e > TASK_SIZE || e < s)
> +			e = TASK_SIZE;
> +
> +		if (mmap_read_lock_killable(mm)) {
> +			pr_info("Received a cancel signal from user, while scanning user mappings\n");
> +			return -1;
> +		}
> +		walk_page_range(mm, s, e, &pd_user_ops, &pr);
> +		mmap_read_unlock(mm);
> +		cond_resched();
> +		s = e;
> +	}
> +	return pr.maps;
> +}
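
To illustrate the TASK_SIZE point above, here is a small userspace
model, not kernel code. The two limits are illustrative values I
picked to resemble x86-64 with 4-level paging, and model_task_size()
and walk_reaches() are hypothetical names. The model only shows that a
bound derived from the calling task can stop short of mappings in a
64-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative limits only (assumptions, loosely modelled on x86-64). */
#define MODEL_TASK_SIZE_MAX	((UINT64_C(1) << 47) - 4096)
#define MODEL_TASK_SIZE_32BIT	UINT64_C(0xffffe000)

/*
 * Models a TASK_SIZE-style limit: the value depends on the *calling*
 * task, not on the task whose mm is being walked.
 */
static uint64_t model_task_size(int caller_is_32bit)
{
	return caller_is_32bit ? MODEL_TASK_SIZE_32BIT : MODEL_TASK_SIZE_MAX;
}

/* Would a walk over [0, limit) visit the address addr? */
static int walk_reaches(uint64_t addr, uint64_t limit)
{
	return addr < limit;
}
```

In this model, a 32-bit caller inspecting a 64-bit target would never
reach a mapping at, say, 0x7f0000000000, while a walk bounded by the
maximum limit would.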