Date: Wed, 21 Mar 2018 15:19:22 -0400 (EDT)
From: Mikulas Patocka
Subject: Re: [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE
References: <20180320173512.GA19669@bombadil.infradead.org>
To: Christopher Lameter
Cc: Matthew Wilcox, Pekka Enberg, David Rientjes, Joonsoo Kim,
    Andrew Morton, linux-mm@kvack.org, dm-devel@redhat.com, Mike Snitzer

On Wed, 21 Mar 2018, Christopher Lameter wrote:

> On Wed, 21 Mar 2018, Mikulas Patocka wrote:
>
> > So, what would you recommend for allocating 640KB objects while minimizing
> > wasted space?
> > * alloc_pages - rounds up to the next power of two
> > * kmalloc - rounds up to the next power of two
> > * alloc_pages_exact - O(n*log n) complexity; and causes memory
> >   fragmentation if used excessively
> > * vmalloc - horrible performance (modifies page tables and that causes
> >   synchronization across all CPUs)
> >
> > anything else?
>
> Need to find it but there is a way to allocate N pages in sequence
> somewhere. Otherwise mempools are something that would work.

There's also the contiguous memory allocator (CMA), but it needs its
memory to be reserved at boot time. It is intended for misdesigned
hardware devices that need contiguous memory for DMA.
As it's intended for one-time allocations when loading drivers, it lacks
the performance and scalability of the slab cache and alloc_pages.

> > > > > What kind of problem could be caused here?
> > > >
> > > > Unlocked accesses are generally considered bad. For example, see this
> > > > piece of code in calculate_sizes:
> > > > s->allocflags = 0;
> > > > if (order)
> > > >         s->allocflags |= __GFP_COMP;
> > > >
> > > > if (s->flags & SLAB_CACHE_DMA)
> > > >         s->allocflags |= GFP_DMA;
> > > >
> > > > if (s->flags & SLAB_RECLAIM_ACCOUNT)
> > > >         s->allocflags |= __GFP_RECLAIMABLE;
> > > >
> > > > If you are running this while the cache is in use (i.e. when the user
> > > > writes /sys/kernel/slab//order), then other processes will see
> > > > invalid s->allocflags for a short time.
> > >
> > > Calculating sizes is done when the slab has only a single accessor. Thus
> > > no locking is needed.
> >
> > The calculation is done whenever someone writes to
> > "/sys/kernel/slab/*/order"
>
> But the flags you are mentioning do not change and the size of the object
> does not change. What changes is the number of objects in the slab page.

See this code again:

> > > s->allocflags = 0;
> > > if (order)
> > >         s->allocflags |= __GFP_COMP;
> > >
> > > if (s->flags & SLAB_CACHE_DMA)
> > >         s->allocflags |= GFP_DMA;
> > >
> > > if (s->flags & SLAB_RECLAIM_ACCOUNT)
> > >         s->allocflags |= __GFP_RECLAIMABLE;

When this function is called, the value of s->allocflags does change. At
the end, s->allocflags holds the same value as before, but it changes
temporarily.

For example, if someone creates a slab cache with the flag SLAB_CACHE_DMA,
and he allocates an object from this cache, and this allocation races with
the user writing to /sys/kernel/slab/cache/order, then the allocator can
for a small period of time see "s->allocflags == 0" and allocate a non-DMA
page. That is a bug.

Mikulas
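
One way to avoid the transient "s->allocflags == 0" window described above
is to build the new value in a local variable and publish it with a single
store, so that a concurrent reader sees either the old value or the new one.
The following is only a minimal user-space sketch in C11, not the fix
proposed in this thread. The DEMO_* constants, struct demo_cache and both
function names are hypothetical stand-ins for the kernel's definitions. In
kernel code the final store would more likely be WRITE_ONCE(), paired with
READ_ONCE() on the reader side.

```c
#include <stdatomic.h>

/* Hypothetical flag values for illustration; not the kernel's constants. */
#define DEMO_GFP_COMP             0x1u
#define DEMO_GFP_DMA              0x2u
#define DEMO_GFP_RECLAIMABLE      0x4u
#define DEMO_SLAB_CACHE_DMA       0x1u
#define DEMO_SLAB_RECLAIM_ACCOUNT 0x2u

/* Hypothetical, heavily reduced stand-in for struct kmem_cache. */
struct demo_cache {
        unsigned int flags;              /* fixed at cache creation */
        _Atomic unsigned int allocflags; /* read by concurrent allocators */
};

/* Pure function: derive the allocation flags without touching the cache. */
static unsigned int demo_compute_allocflags(unsigned int flags, int order)
{
        unsigned int allocflags = 0;

        if (order)
                allocflags |= DEMO_GFP_COMP;
        if (flags & DEMO_SLAB_CACHE_DMA)
                allocflags |= DEMO_GFP_DMA;
        if (flags & DEMO_SLAB_RECLAIM_ACCOUNT)
                allocflags |= DEMO_GFP_RECLAIMABLE;

        return allocflags;
}

/*
 * Publish the final value with one store. A reader never observes the
 * intermediate zero that "s->allocflags = 0; s->allocflags |= ..." exposes.
 */
static void demo_update_allocflags(struct demo_cache *s, int order)
{
        unsigned int allocflags = demo_compute_allocflags(s->flags, order);

        atomic_store_explicit(&s->allocflags, allocflags,
                              memory_order_relaxed);
}
```

This only removes the intermediate state of allocflags. It does not make the
other fields that change when the order is rewritten consistent with each
other, so it is not a substitute for proper locking around the whole update.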