From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51])
	by kanga.kvack.org (Postfix) with ESMTP id D82096B004A
	for ; Wed, 22 Sep 2010 10:24:33 -0400 (EDT)
Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236])
	by e8.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id o8ME5ecL024205
	for ; Wed, 22 Sep 2010 10:05:40 -0400
Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170])
	by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o8MEOVJi118130
	for ; Wed, 22 Sep 2010 10:24:31 -0400
Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1])
	by d03av04.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id o8MEOOv1006073
	for ; Wed, 22 Sep 2010 08:24:30 -0600
Message-ID: <4C9A0F8F.2030409@austin.ibm.com>
Date: Wed, 22 Sep 2010 09:15:43 -0500
From: Nathan Fontenot
MIME-Version: 1.0
Subject: [PATCH 0/8] De-couple sysfs memory directories from memory sections
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@ozlabs.org
Cc: Greg KH , KAMEZAWA Hiroyuki , Dave Hansen
List-ID:

This set of patches decouples the concept that a single memory
section corresponds to a single directory in
/sys/devices/system/memory/. On systems with large amounts of memory
(1+ TB) there are performance issues related to creating the large
number of sysfs directories. For a powerpc machine with 1 TB of memory
we are creating 63,000+ directories. This is resulting in boot times of
around 45-50 minutes for systems with 1 TB of memory and 8 hours for
systems with 2 TB of memory. With this patch set applied I am now seeing
boot times of 5 minutes or less.

The root of this issue is in sysfs directory creation. Every time
a directory is created a string compare is done against all sibling
directories to ensure we do not create duplicates.
The list of directory nodes in sysfs is kept as an unsorted list, so
each duplicate check has to walk the whole list. The cost of creating
one directory therefore grows with the number of directories that
already exist, and the total cost grows roughly quadratically rather
than linearly.

The solution in this patch set is to allow a single directory in sysfs
to span multiple memory sections. This is controlled by an optional
architecture-defined function, memory_block_size_bytes(). The default
definition of this routine returns a memory block size equal to the
memory section size, so the layout of the sysfs memory directories, as
seen from userspace, remains the same as it is today.

For architectures that define their own version of this routine, as
is done for powerpc in this patch set, the view in userspace changes
such that each memoryXXX directory spans multiple memory sections.
The number of sections spanned depends on the value reported by
memory_block_size_bytes().

In both cases a new file 'end_phys_index' is created in each memoryXXX
directory. This file contains the physical id of the last memory
section covered by the sysfs directory. For the default case, the
value in 'end_phys_index' is the same as in the existing 'phys_index'
file.

-Nathan Fontenot
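
As an illustration only (this is not code from the patch set), the
relationship between memory sections, memory blocks and the
'phys_index' / 'end_phys_index' values described above can be sketched
in plain userspace C. The 16 MB section size, the 256 MB override, and
the helper names default_block_size_bytes(), arch_block_size_bytes(),
block_sections() and nr_directories() are all assumptions made for the
example. It also assumes that 'phys_index' holds the id of the first
section in the block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed section size for the example; not stated in the cover letter. */
#define SECTION_SIZE_BYTES ((uint64_t)16 << 20)	/* 16 MB */

/* Default case described above: block size equals the section size. */
static uint64_t default_block_size_bytes(void)
{
	return SECTION_SIZE_BYTES;
}

/* Hypothetical architecture override: one directory per 256 MB. */
static uint64_t arch_block_size_bytes(void)
{
	return (uint64_t)256 << 20;
}

struct block_range {
	uint64_t phys_index;	 /* assumed: id of first section in block */
	uint64_t end_phys_index; /* id of last section in block */
};

/* Section ids covered by memory block number block_nr. */
static struct block_range block_sections(uint64_t block_nr,
					 uint64_t block_size)
{
	struct block_range r;
	uint64_t sections_per_block;

	/* A block is assumed to be a whole number of sections. */
	assert(block_size >= SECTION_SIZE_BYTES);
	assert(block_size % SECTION_SIZE_BYTES == 0);

	sections_per_block = block_size / SECTION_SIZE_BYTES;
	r.phys_index = block_nr * sections_per_block;
	r.end_phys_index = r.phys_index + sections_per_block - 1;
	return r;
}

/* Number of memoryXXX directories needed to cover total_bytes. */
static uint64_t nr_directories(uint64_t total_bytes, uint64_t block_size)
{
	return total_bytes / block_size;
}
```

With these assumed sizes, nr_directories() gives 65536 directories for
1 TB in the default case and 4096 with the 256 MB block size, and in the
default case phys_index and end_phys_index are equal for every block.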