[MaraDNS list] MaraDNS and Deadwood updates ; djbdns rant

Sam Trenholme maradns at gmail.com
Sat Jul 21 22:40:55 EDT 2012


>> Linux does not, by
>> default, have malloc() fail; it simply terminates processes that use
>> too much memory.
>
> This is false. Even if you leave overcommit enabled, 32-bit Linux on a
> machine with >3gb of memory will run out of virtual address space
> before it runs out of physical memory and thus malloc will return
> null. This is an extremely common (possibly majority) setup.

This type of configuration is an edge case and does not represent a
typical Linux installation.  My hard limit for 32-bit systems is 2
gigs of memory, and it's well known that it's not a good idea to have
more than 2 gigs on a 32-bit OS.  malloc() does not return NULL on the
systems I develop on; processes get killed first.  Linux systems where
malloc() fails do exist, yes, but they are not very common.

> I understand that you don't want to spend more time working on MaraDNS
> and that's a valid sentiment. However, I think your handling of the
> issues I've reported can equally be characterized as denialism.

Exactly.  MaraDNS is finished.  I devote, pretty consistently, an hour
every month to keeping it up to date with bug fixes, and that does
include the occasional proactive security update.

> the attitude that MaraDNS has
> undefined behavior under resource exhaustion is very troubling from a
> security standpoint.

It's an issue, yes, but it's not a "very troubling" problem.  It's
only a very troubling problem when there is something in MaraDNS' code
that allows an attacker to exhaust the memory on the machine MaraDNS
is running on.  Whenever I find a memory leak in MaraDNS, I fix it and
am honest about the leak's existence.  Indeed, all of MaraDNS' CVE
reports in the last year have come from my own proactive review of
MaraDNS' codebase and from fixing the security problems I find.

> I think it would be a lot more fair to users to
> say something along the lines of "This problem exists, and it may be
> serious, but I'm not willing to devote time to fixing it."

This is exactly what I did!  Let's see
http://samiam.org/blog/20120721.html and the head of this thread:

"For the record: MaraDNS terminates upon a malloc() failure.
Deadwood's behavior is undefined should malloc() fail. If anyone is
using MaraDNS in an environment where a kernel allows malloc() to
return a NULL pointer, it is best to wrap MaraDNS in a script that
restarts it when it terminates. If using Deadwood in an environment
where malloc() may return NULL, please replace the dw_malloc() macro
with a function that can properly handle a malloc() failure."

Since I had some spare time today to devote to Deadwood after fixing
up the code around the failure that caused es-us.noticias.yahoo.com to
not resolve a couple of months ago, I went to some effort to replace
all of the malloc() calls with dw_malloc() macros:

http://maradns.org/deadwood/browse-source/head/update/3.2.03/deadwood-3.2.02-dw_malloc.patch

Now that malloc() has been replaced with dw_malloc(), there are three
ways of handling malloc() failures:

* We can have dw_malloc() failures terminate the Deadwood process.  I
actually have been meaning to do this for a while.  I can then modify
Duende to restart Deadwood whenever it terminates because of a
malloc() failure (I can give it a special "malloc() puked" exit code).
I don't entirely like this approach because I can't think of a clean
way of automatically restarting a stopped service on Windows.

* We can have dw_malloc() be a potentially blocking call, and freeze
the entire Deadwood process should a malloc() return NULL, and keep
Deadwood frozen until malloc() succeeds again.   I like this a little
more:  It's fairly easy to implement and Deadwood doesn't fail because
of a temporary malloc() failure, but just does nothing until malloc()
is working again.  It's also a more cross-platform solution.

* Anyone is welcome to look at all 41 places in Deadwood's code where
malloc() is called (now with the dw_malloc() name) and refactor all
that code to handle malloc() failures reasonably gracefully.  I myself
am not about to do this.  If someone does this, don't submit patches
to me -- I'm not accepting patches right now -- but maybe we can get
MaraDNS-ng rolling so that there's a code base others can hack on.
Just remember: I'm not responsible for any bugs any of this
refactoring introduces.

As for MaraDNS itself: I'm not even going to think about how to update
it until Deadwood has been updated.
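
To make the first two options above concrete, here is a minimal
sketch in C.  It is not taken from Deadwood's source or from the patch
linked above.  The function names dw_malloc_or_die() and
dw_malloc_blocking(), the exit code 42, and the one-second retry
interval are all made up for illustration; the only assumption about
Deadwood is that dw_malloc() is a thin wrapper that could be pointed
at either function.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#define dw_pause() Sleep(1000)  /* Win32: milliseconds */
#else
#include <unistd.h>
#define dw_pause() sleep(1)     /* POSIX: seconds */
#endif

/* Hypothetical exit code a supervisor could watch for; not a real
 * Deadwood or Duende value. */
#define DW_EXIT_NOMEM 42

/* Option 1: terminate the process when malloc() fails. */
void *dw_malloc_or_die(size_t len) {
    void *p;
    if (len == 0) {
        len = 1;  /* malloc(0) may legitimately return NULL */
    }
    p = malloc(len);
    if (p == NULL) {
        fputs("malloc() failed; exiting\n", stderr);
        exit(DW_EXIT_NOMEM);
    }
    return p;
}

/* Option 2: block the whole process until malloc() succeeds. */
void *dw_malloc_blocking(size_t len) {
    void *p;
    if (len == 0) {
        len = 1;
    }
    while ((p = malloc(len)) == NULL) {
        dw_pause();  /* do nothing for a second, then retry */
    }
    return p;
}
```

With either function in place, something like
"#define dw_malloc(len) dw_malloc_blocking(len)" would select the
behavior at compile time.  Note that the blocking variant never
returns NULL, so while memory is exhausted the process answers no
queries at all.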

Now that that is said, I am not going to devote any more time right
now to looking at handling malloc() returning NULL.  I have spent too
much time on this issue already, and there are other more important
issues with MaraDNS.

Once MaraDNS is fully functional on CentOS 6, and once all known "this
host does not resolve with Deadwood" issues are fixed (they are right
now, knock on wood), I may be able to look into how to better handle
malloc() returning NULL again.  I will let the list know when and if
that happens.

If it were 2007 again, and I knew then what I know today, I probably
would have done things differently.  [1]  I have never been 100% happy
that the amount of memory Deadwood uses is somewhat unpredictable.
But, Deadwood and MaraDNS are finished so it's water under the bridge
now.

This is my last posting to the mailing list until late August unless either:

* A security hole with a CVE number is discovered.

* Someone on the list sets up a git/cvs/svn/whatever repository for
MaraDNS-ng.  I could very well bless said branch as being the official
successor of MaraDNS.

"This host does not resolve in Deadwood" bug reports are welcome.  I
will not look at any such reports until the end of August, but getting
Deadwood's lingering resolution bugs ironed out is still important to
me.

- Sam

[1] Such as making the LRU cache Deadwood uses a fixed-size cache
that performs no malloc()s once it is initialized at startup time.
This approach has its own issues: such a scheme would use more memory
(somewhat offset by the amount of memory used being fixed), and it
would have placed hard limits on the size of binary strings one could
store in Deadwood's cache.
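
A rough sketch of what such a fixed-size cache could look like in C
follows.  This is not Deadwood's cache; the type and function names,
the slot count, and the size limits are all hypothetical, and the
linear scan is only there to keep the example short (a real cache
would use a hash).  The point is that all storage lives in one
structure set up at startup, and that oversized entries are rejected
rather than allocated.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 4   /* hypothetical; fixed at compile time */
#define KEY_MAX 32      /* hard limit on key size */
#define VAL_MAX 64      /* hard limit on value size */

typedef struct {
    int used;
    uint64_t last_used;          /* logical clock value of last access */
    size_t key_len, val_len;
    unsigned char key[KEY_MAX];
    unsigned char val[VAL_MAX];
} cache_slot;

typedef struct {
    uint64_t clock;              /* bumped on every put/get hit */
    cache_slot slot[CACHE_SLOTS];
} fixed_cache;

void cache_init(fixed_cache *c) { memset(c, 0, sizeof *c); }

static cache_slot *cache_find(fixed_cache *c, const void *key, size_t key_len) {
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        cache_slot *s = &c->slot[i];
        if (s->used && s->key_len == key_len &&
            memcmp(s->key, key, key_len) == 0) {
            return s;
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if the key or value exceeds the hard limits. */
int cache_put(fixed_cache *c, const void *key, size_t key_len,
              const void *val, size_t val_len) {
    if (key_len > KEY_MAX || val_len > VAL_MAX) {
        return -1;
    }
    cache_slot *s = cache_find(c, key, key_len);
    if (s == NULL) {
        /* Take the first free slot, else evict the least recently used. */
        s = &c->slot[0];
        for (size_t i = 0; i < CACHE_SLOTS; i++) {
            cache_slot *t = &c->slot[i];
            if (!t->used) { s = t; break; }
            if (t->last_used < s->last_used) { s = t; }
        }
    }
    memcpy(s->key, key, key_len);
    memcpy(s->val, val, val_len);
    s->key_len = key_len;
    s->val_len = val_len;
    s->used = 1;
    s->last_used = ++c->clock;
    return 0;
}

/* Returns a pointer to the stored value (and its length), or NULL. */
const unsigned char *cache_get(fixed_cache *c, const void *key,
                               size_t key_len, size_t *val_len) {
    cache_slot *s = cache_find(c, key, key_len);
    if (s == NULL) {
        return NULL;
    }
    s->last_used = ++c->clock;
    *val_len = s->val_len;
    return s->val;
}
```

Every slot reserves KEY_MAX + VAL_MAX bytes whether or not it needs
them, which is the "uses more memory" trade-off mentioned above.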

I would probably have also set up the non-blocking select() state
machine code keeping track of unfinished DNS queries to use a fixed
amount of memory; that's the other major area where Deadwood does a
lot of malloc()s and free()s of memory.  Doing this for the TCP code
would have been non-trivial.
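
The same idea applied to in-flight queries might look like the sketch
below: a fixed table of slots that is claimed and released instead of
malloc()ed and free()d.  Again, this is only an illustration under
assumptions; the pending_query structure, its fields, and MAX_PENDING
are hypothetical and do not reflect Deadwood's actual state machine.

```c
#include <stddef.h>

#define MAX_PENDING 3   /* hypothetical cap on unfinished queries */

typedef struct {
    int in_use;
    int socket_fd;              /* hypothetical per-query state */
    unsigned short query_id;
} pending_query;

/* All slots exist for the life of the process; zero-initialized. */
static pending_query pending[MAX_PENDING];

/* Claim a free slot, or return NULL when the table is full. */
pending_query *pending_acquire(void) {
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].in_use = 1;
            return &pending[i];
        }
    }
    return NULL;
}

/* Hand a slot back so a later query can reuse it. */
void pending_release(pending_query *q) {
    if (q != NULL) {
        q->in_use = 0;
    }
}
```

When pending_acquire() returns NULL the caller has to drop or refuse
the new query, so memory exhaustion turns into a predictable "too
busy" condition instead of a malloc() failure.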

