On Mon, 2005-10-17 at 19:25 +0100, Hugh Dickins wrote:
> On Mon, 17 Oct 2005, Hugh Dickins wrote:
> > On Mon, 17 Oct 2005, Badari Pulavarty wrote:
> > >
> > > I have been looking at possible ways to extend OVERCOMMIT_ALWAYS
> > > to avoid its abuse.
> > >
> > > Few of the applications (database) would like to overcommit
> > > memory (by creating shared memory segments more than RAM+swap),
> > > but use only portion of it at any given time and get rid
> > > of portions of them through madvise(DONTNEED), when needed.
> > > They want this, especially to handle hotplug memory situations
> > > (where apps may not have clear idea on how much memory they have
> > > in the system at the time of shared memory create). Currently,
> > > they are using OVERCOMMIT_ALWAYS system wide to do this - but
> > > they are affecting every other application on the system.
> > >
> > > I am wondering, if there is a better way to do this. Simple solution
> > > would be to add IPC_OVERCOMMIT flag or add CAP_SYS_ADMIN to
> > > do the overcommit. This way only specific applications, requesting
> > > this would be able to overcommit. I am worried about, the over
> > > all affects it has on the system. But again, this can't be worse
> > > than system wide OVERCOMMIT_ALWAYS. Isn't it ?
> >
> > mmap has MAP_NORESERVE, without CAP_SYS_ADMIN or other restriction,
> > which exempts that mmap from security_vm_enough_memory checking -
> > unless current setting is OVERCOMMIT_NEVER, in which case
> > MAP_NORESERVE is ignored.
>
> Having written that, it does seem rather odd that we have a flag
> anyone can set to evade that security_ checking. It was okay when
> it was just vm_enough_memory, but now it's security_vm_enough_memory,
> I wonder if this is a significant oversight, and some CAP required.
> Might break things though. CC'ed Chris.
>
> Ah, there's a security_file_mmap earlier, which could reject the
> MAP_NORESERVE flag if it feels so inclined. Perhaps you'll need
> to allow a similar opportunity for rejection in your approach.
>
> Hugh
>
> > So if you're content to move to the OVERCOMMIT_GUESS world, I
> > don't think you could be blamed for adding an IPC_NORESERVE which
> > behaves in the same way, without CAP_SYS_ADMIN restriction.
> >
> > But if you want to move to OVERCOMMIT_NEVER, yet have a flag which
> > says overcommit now, you'll get into a tussle with NEVER-adherents.
> >
> > Hugh
>

Hugh,

As you suggested, here is the patch to add SHM_NORESERVE, which does
the same thing as MAP_NORESERVE. This flag is ignored for
OVERCOMMIT_NEVER. I decided to do SHM_NORESERVE instead of
IPC_NORESERVE, just to limit its scope.

BTW, there is a call to security_shm_alloc() earlier, which could
be modified to reject shmget() if it needs to.

Is this reasonable? Please review.

Thanks,
Badari
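
For illustration, a userspace caller might request the flag roughly as
follows. This is only a minimal sketch, not part of the patch: the
helper names create_noreserve_segment() and touch_first_page() are
made up for the example, and the fallback value 010000 for
SHM_NORESERVE is an assumption for headers that do not yet define it.

```c
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/*
 * Older userspace headers may not define the flag. 010000 is assumed
 * here as a fallback; check the installed kernel headers.
 */
#ifndef SHM_NORESERVE
#define SHM_NORESERVE 010000
#endif

/*
 * Hypothetical helper: create a private SysV segment of `size` bytes
 * and ask the kernel not to reserve memory for it up front.
 * Returns the segment id, or -1 on error (errno set by shmget).
 * Under OVERCOMMIT_NEVER the flag is expected to be ignored.
 */
static int create_noreserve_segment(size_t size)
{
    return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600 | SHM_NORESERVE);
}

/*
 * Hypothetical helper: attach the segment, write and read back its
 * first page, then detach. The segment must be at least one page
 * long. It is marked for removal straight after the attach attempt,
 * so it is destroyed once the last attachment goes away.
 * Returns 0 on success, -1 on failure.
 */
static int touch_first_page(int shmid)
{
    long page = sysconf(_SC_PAGESIZE);
    char *base = shmat(shmid, NULL, 0);
    int ok;

    /* Mark for removal whether or not the attach worked. */
    shmctl(shmid, IPC_RMID, NULL);
    if (base == (char *)-1 || page <= 0)
        return -1;

    /* Pages are only backed by memory once they are touched. */
    memset(base, 0xab, (size_t)page);
    ok = ((unsigned char)base[0] == 0xab);

    if (shmdt(base) != 0)
        return -1;
    return ok ? 0 : -1;
}
```

A database would call create_noreserve_segment() with a size larger
than it expects to touch at once; only the pages it actually writes
should then count against memory, as with MAP_NORESERVE on mmap.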