Date: Thu, 12 Jan 2023 13:21:53 -0500
From: Steven Rostedt
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    Joel Fernandes, Brian Norris, Ching-lin Yu
Subject: [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance
Message-ID: <20230112132153.38d52708@gandalf.local.home>

Title: Tracing mapped pages for quicker boot performance

Description:

ChromeOS currently uses ureadahead, which periodically traces the files
that processes open during the boot sequence. It then uses this
information to call the readahead() system call in order to prefetch pages
before they are needed and speed up the applications. We have seen
performance gains of upwards of 60% (and even higher in certain cases)
when it's working properly.

The ureadahead program comes from Canonical, and has not been updated since
2009 (although we've been adding patches on top of it since).

  https://launchpad.net/ubuntu/+source/ureadahead

The only change Ubuntu has been making to it is forward porting it to the
next release.
But no code has actually changed. The 0.100.0 release was last done in
2009.

Another problem with ureadahead is that it requires kernel modifications.
It adds two tracepoints to the open paths so that it can see which files
have been opened (and it doesn't handle relative paths). These tracepoints
have been rejected upstream. We've been carrying them in our ChromeOS
kernel to use ureadahead.

ureadahead only looks at the files that are opened during boot, and then
reads the extents to see what parts of the file are interesting. It stores
this information in a "pack" file. Then on subsequent boots, instead of
tracing, it reads the pack file and calls the readahead() system call on
the locations recorded there, to make sure they are in cache when the
applications need them.

One issue is that it can pick too much of the file, reading ahead portions
that will never be read, and hence wasting system resources.

I've been looking into other approaches. I wrote a simple program that
reads the page_fault_user trace event, and every time it sees a new PID, it
reads that PID's /proc/<pid>/maps file. Using the page fault trace event's
address, it can then see exactly where in the file the fault maps to.

There are several issues with this approach. The main one is the race
between reading the pid and reading the /proc/<pid>/maps file: the pid may
no longer exist, or it may have done an exec, in which case the page faults
no longer map to the right location. But even with that, it does
surprisingly well (especially since we care more about long running
applications than short ones).

  https://rostedt.org/code/file-mapping.c

The above is just a toy application that tries this out, but it could be
used as a starting point to replace ureadahead.

What I would like to discuss is whether there could be a way to add some
sort of trace events that tell an application exactly what pages in a file
are being read from disk, without such races.
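
To make the lookup described above a bit more concrete, here is a rough
sketch of turning a page fault address into a file offset using one line of
/proc/<pid>/maps. This is not taken from file-mapping.c; the names
struct mapping, parse_maps_line() and fault_to_file_offset() are made up
for illustration, and it assumes the page size is a power of two.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one file-backed line of /proc/<pid>/maps. */
struct mapping {
	unsigned long long start;	/* first virtual address of the mapping */
	unsigned long long end;		/* one past the last virtual address */
	unsigned long long pgoff;	/* byte offset of the mapping in the file */
	char path[4096];		/* backing file name */
};

/*
 * Parse one line in the maps format:
 *   start-end perms offset dev inode pathname
 * Returns 0 on success, -1 if the line is malformed or not file backed.
 */
int parse_maps_line(const char *line, struct mapping *m)
{
	char perms[5];
	unsigned int major, minor;
	unsigned long long inode;
	int n;

	m->path[0] = '\0';
	n = sscanf(line, "%llx-%llx %4s %llx %x:%x %llu %4095[^\n]",
		   &m->start, &m->end, perms, &m->pgoff,
		   &major, &minor, &inode, m->path);

	/* Anonymous memory has no pathname; [heap], [stack] etc. are not files. */
	if (n < 8 || m->path[0] != '/')
		return -1;
	return 0;
}

/*
 * Translate a faulting address into the byte offset of the file page that
 * backs it. Returns 0 and sets *off on success, -1 if addr is outside the
 * mapping. page_size is assumed to be a power of two.
 */
int fault_to_file_offset(const struct mapping *m, unsigned long long addr,
			 unsigned long long page_size,
			 unsigned long long *off)
{
	assert(m != NULL && off != NULL);

	if (addr < m->start || addr >= m->end)
		return -1;

	/* Offset into the mapping plus the mapping's offset into the file. */
	*off = (m->pgoff + (addr - m->start)) & ~(page_size - 1);
	return 0;
}
```

A tracer would call parse_maps_line() on each line of the maps file for a
newly seen PID and then fault_to_file_offset() for each fault address. The
race mentioned above is still there: by the time the maps file is read, the
mappings may no longer be the ones the fault was taken against.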

With such events, an application would simply have to read this
information and store it, and could later use it to call readahead() on
these locations of the file so that they are available when needed.

Note, in our use case boot ups do not change much. But I'm sure this could
be useful for other distributions.

This topic will require coordination with File systems, Storage, and MM.
I'm also open to having BPF help with this.

One issue I want to make sure we avoid is coming up with any ABI that will
hinder development later on.

-- Steve