[MaraDNS list] MaraDNS and Deadwood updates ; djbdns rant

Rich Felker dalias at aerifal.cx
Sun Jul 22 18:08:20 EDT 2012


On Sat, Jul 21, 2012 at 10:40:55PM -0400, Sam Trenholme wrote:
> >> Linux does not, by
> >> default, have malloc() fail; it simply terminates processes that use
> >> too much memory.
> >
> > This is false. Even if you leave overcommit enabled, 32-bit Linux on a
> > machine with >3gb of memory will run out of virtual address space
> > before it runs out of physical memory and thus malloc will return
> > null. This is an extremely common (possibly majority) setup.
> 
> This type of configuration is an edge case and does not represent a
> typical Linux installation.

I don't think I've seen or heard of a desktop or server Linux system
with less than 4-8 gigs of total ram+swap in the past 5+ years, except
in the minimalist/old-hardware-enthusiast circles I frequent. :-)

> My hard limit for 32-bit systems is 2
> gigs of memory and it's well known it's not a good idea to have over 2
> gigs for a 32-bit OS.

Citation needed. I've never heard this recommendation, especially not
if you're counting ram+swap where the classic (and very misguided)
recommendation is to have swap size equal to 2x physical ram size.

> malloc() does not return void on the systems I
> develop on.  Processes get killed first.  Linux systems where malloc()
> fails do exist, yes, but are not very common.

Any properly deployed server will be such a system.

> > the attitude that MaraDNS has
> > undefined behavior under resource exhaustion is very troubling from a
> > security standpoint.
> 
> It's an issue, yes, but it's not a "very troubling" problem.  It's

What I meant by "very troubling" was the idea that "it could do
anything when memory runs out", as opposed to "it will do something
bad when memory runs out, but the badness is limited to the server
exiting and not any sort of privilege elevation, information leak, or
malicious data injection". I took your "undefined behavior" phrasing
as meaning the former.

> Now that malloc() has been replaced with dw_malloc(), there are three
> ways of handling malloc() failures:
> 
> * We have dw_malloc() failures terminate the Deadwood process.  I
> actually have been meaning to do this for a while.  I can then modify
> Duende to restart Deadwood whenever it terminates because of a
> malloc() failure (I can give it a special "malloc() puked" exit code).
>  I don't entirely like this approach because I can't think of a clean
> way of automatically restarting a stopped service on Windows.

If memory was exhausted, restarting the process will probably be
impossible, meaning your nameserver goes down semi-permanently. This
is why a server that's robust against DoS should handle allocation
failure gracefully (for example, by dropping caches), or better yet
not do any essential allocations after startup; once your process
exits, there's no guarantee you'll ever get it back.

> * We can have dw_malloc() be a potentially blocking call, and freeze
> the entire Deadwood process should a malloc() return NULL, and keep
> Deadwood frozen until malloc() succeeds again.   I like this a little
> more:  It's fairly easy to implement and Deadwood doesn't fail because
> of a temporary malloc() failure, but just does nothing until malloc()
> is working again.  It's also a more cross-platform solution.

This will help if the failure is caused by commit charge exhaustion
(other apps), but not if it's caused by exhaustion of the process's
virtual address space or vm size ulimit, unless there are other
threads running that might free some memory.

> * Anyone is welcome to look at all 41 places in Deadwood's code where
> malloc() is called (now with the dw_malloc() name) and refactor all
> that code to handle malloc() failures reasonably gracefully.  I myself
> am not about to do this.  If someone does this, don't submit patches
> to me -- I'm not accepting patches right now -- but maybe we can get
> MaraDNS-ng rolling so that there's a code base others can hack on.
> Just remember: I'm not responsible for any bugs any of this
> refactoring introduces.

This is definitely The Right Thing, but I respect that you don't have
the time/motivation to do it or even review the patches if somebody
else does it. You don't owe anybody your time.

> [1] Such as having the LRU cache Deadwood uses to be a fixed-sized
> cache, and not perform any malloc()s once the cache is initialized at
> system startup time.  This approach has it own issues: Such a scheme
> would use more memory, which is somewhat offset because the amount of
> memory used is fixed, and such a scheme would have placed hard limits
> on the size of binary strings one could store in Deadwood's cache.

If it were my project, this is definitely the approach I'd go with. Or
possibly a hybrid approach where the LRU cache would be enlarged (up
to a sane limit) if possible before discarding old data, but where the
failure to enlarge would not be a fatal error and would just cause the
smaller existing cache to keep getting (re)used.

Rich

