Date: Wed, 12 Feb 2020 17:47:31 -0500
From: Daniel Jordan
To: lsf-pc@lists.linuxfoundation.org
Cc: linux-mm@kvack.org, Dan Williams, Dave Hansen, Tim Chen, Mike Kravetz,
    Herbert Xu, Steffen Klassert, Tejun Heo, Peter Zijlstra, Alex Williamson,
    Daniel Jordan
Subject: [LSF/MM/BPF TOPIC] kernel multithreading with padata
Message-ID: <20200212224731.kmss6o6agekkg3mw@ca-dmjordan1.us.oracle.com>

padata has been undergoing some surgery over the last year[0] and now seems
ready for another enhancement: splitting up and multithreading CPU-intensive
kernel work.

Quoting from an earlier series[1], the problem I'm trying to solve is

  A single CPU can spend an excessive amount of time in the kernel operating
  on large amounts of data.  Often these situations arise during
  initialization- and destruction-related tasks, where the data involved
  scales with system size.  These long-running jobs can slow startup and
  shutdown of applications and the system itself while extra CPUs sit idle.
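
To make the general idea concrete, here is a minimal userspace sketch in C
with pthreads.  It is not code from the padata series and does not use any
kernel API: the names (struct chunk, pick_threads, init_parallel), the
MAX_HELPERS bound, and the min_chunk parameter are all hypothetical, chosen
only for illustration.  It splits one large initialization job into
contiguous chunks, runs each chunk in its own thread, and caps the number of
threads per job.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HELPERS 16	/* hypothetical upper bound on threads per job */

/* One contiguous slice of the job, handed to a single thread. */
struct chunk {
	uint64_t *items;
	size_t start, end;	/* half-open range [start, end) */
};

/* Thread function: stand-in for expensive per-item initialization. */
static void *init_chunk(void *arg)
{
	struct chunk *c = arg;

	for (size_t i = c->start; i < c->end; i++)
		c->items[i] = i;
	return NULL;
}

/*
 * Choose a thread count: at least 1, at most max_threads, and no more
 * threads than there are min_chunk-sized pieces of work.
 */
static size_t pick_threads(size_t nitems, size_t min_chunk, size_t max_threads)
{
	size_t n = nitems / min_chunk;

	if (n < 1)
		n = 1;
	if (n > max_threads)
		n = max_threads;
	return n;
}

/* Initialize items[0..nitems) in parallel; returns the chunk count used. */
static size_t init_parallel(uint64_t *items, size_t nitems,
			    size_t min_chunk, size_t max_threads)
{
	pthread_t tids[MAX_HELPERS];
	struct chunk chunks[MAX_HELPERS];
	int started[MAX_HELPERS];
	size_t nthreads, per, rem, pos = 0;

	assert(min_chunk > 0);
	assert(max_threads >= 1 && max_threads <= MAX_HELPERS);

	nthreads = pick_threads(nitems, min_chunk, max_threads);
	per = nitems / nthreads;
	rem = nitems % nthreads;

	for (size_t t = 0; t < nthreads; t++) {
		/* The first 'rem' chunks take one extra item each. */
		size_t len = per + (t < rem ? 1 : 0);

		chunks[t] = (struct chunk){ items, pos, pos + len };
		pos += len;

		started[t] = pthread_create(&tids[t], NULL, init_chunk,
					    &chunks[t]) == 0;
		/* If no thread could be created, do the chunk inline. */
		if (!started[t])
			init_chunk(&chunks[t]);
	}

	for (size_t t = 0; t < nthreads; t++)
		if (started[t])
			pthread_join(tids[t], NULL);

	return nthreads;
}
```

With these assumptions, init_parallel(buf, 1000, 100, 4) would split 1000
items into four chunks of 250, since the ten possible 100-item pieces are
capped at four threads.  A job smaller than min_chunk runs in a single
chunk, which avoids paying thread startup costs for small amounts of work.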

Here are the current consumers:

 - struct page init (boot, hotplug, pmem)
 - VFIO page pinning (kvm guest init)
 - fallocating a hugetlb file (database shared memory init)

On a large-memory server, DRAM page init is ~23% of kernel boot (3.5s/15.2s),
and it takes over a minute to start a VFIO-enabled kvm guest or fallocate a
hugetlb file that occupy a significant fraction of memory.  This work results
in 7-20x speedups and is currently increasing the uptime of our production
kernels.

Future areas include munmap/exit, umount, and __ib_umem_release.  Some of
these need coarse locks broken up for multithreading (zone->lock, lru_lock).

Positive outcomes for the session would be...

 - Finding a strategy for capping the maximum number of threads in a job.

 - Agreeing on a way for the job's threads to respect resource controls.

   In the past few weeks I've been thinking about whether remote charging in
   the CPU controller is feasible (RFD to come), am also considering creating
   workqueue workers directly in cgroup-specific pools instead, and have
   proposed migrating workers in and out of cgroups before[2].

   There's also memory policy and sched_setaffinity() to think about.

 - Checking the overall design of this thing with the mm community, given
   that current users are all mm-related.

 - Getting advice from others (hallway track) on why some pmem devices
   perform better than others under multithreading.

This work-in-progress branch shows what it looks like now.

    git://oss.oracle.com/git/linux-dmjordan.git padata-mt-wip-v0.2
    https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-wip-v0.2

[0] https://lore.kernel.org/linux-crypto/?q=s%3Apadata+d%3A20190212..20200212
[1] https://lore.kernel.org/lkml/20181105165558.11698-1-daniel.m.jordan@oracle.com/
[2] https://lore.kernel.org/lkml/20190605133650.28545-1-daniel.m.jordan@oracle.com/