[Nobug] memory checking (braindump)

benny.lyons at uniserv.com benny.lyons at uniserv.com
Mon Aug 30 16:40:03 CEST 2010


Hi Christian,
I wonder whether this is the right way forward for NoBug, or whether all of
this would be more appropriately done under another project, e.g. NoLeak?
(I am asking a question here, not making a statement.)  Maybe someone can
supply a few convincing arguments; I'm just brain dumping here, as your email
is interesting enough to deserve comment.

Here is the thesis against:

--NoBug is small, self-contained and does what it does well.
  Trying to do the other things is a project probably bigger than NoBug
  itself; this will also require a lot of work, and never be perfect, as you
  correctly point out.
--We already have other tools: valgrind (ok, the arguments against valgrind
  are perfectly valid: it is slow and has holes, as a comparison with Purify
  will confirm; although Purify also has its problems, apart from being
  $-ware).
  Electric Fence (I cannot recall whether it is still being maintained, but,
  if I recall correctly, this is what Electric Fence set out to do.)
  The dbx debugger on OpenSolaris does these things also, but again, not
  perfectly.
  There are also a myriad of other tools out there that I can't recall off
  the top of my head.
  The point being: do we want another? Or would the time be better spent
  improving/using another tool?
  Or, put a completely different way: what will NoLeak provide that others
  don't already?  (Maybe someone does have some viable suggestions to offer!)
--Such an endeavour is error prone: take a look at valgrind, Purify,
  Electric Fence or even the dbx debugger on Solaris (check -memuse, -access).
--Such a project is going to be big and cost a lot of time.
--This sounds like Electric Fence, i.e.
   >...each allocation should have some guard area around it to detect
   >unintentional writes before and after the object. I'd recommend to have
   >this guard area at least 1 object size before and after the object (to
   >catch off-by-1 indirection errors)...
  The point I am trying to make is a question: what's the difference between
  Electric Fence and NoLeak?


...and pro NoLeak:
--Yes, it would complement NoBug, adding useful stuff.
--It could be kept simple, as your API shows.
--So what about the arguments above? The API below looks simple enough and
  has a few good ideas, enough to have a go.


Maybe someone out there knows a little more about Electric Fence & other
tools and can point out a few more arguments pro or con doing a NoLeak or
leak checker in NoBug.

Anyway, the more I think about it, the more I like the concept of doing all
of this separately from NoBug while still making it easy to plug into NoBug.
This keeps it more modular, in case things in NoLeak (or whatever it's
called) turn out not to be everyone's cup of tea.



Benny

-----Original Message-----
From: nobug-bounces at lists.pipapo.org
[mailto:nobug-bounces at lists.pipapo.org] On Behalf Of Christian Thaeter
Sent: Saturday, August 28, 2010 12:41 AM
To: nobug at lists.pipapo.org
Subject: [Nobug] memory checking (braindump)

hi,

I just want to drop a braindump about memory validation/watching this time.
Some time ago I already talked with ichthyo about this and we decided we
currently don't need this feature for Lumiera. But eventually it would be a
nice thing to complete NoBug's feature set, so I will implement it someday.
Documenting this here may help to work it out further.

First, a summary of the kinds of errors such a feature should address:
 * Writing data out of bounds
 * Unintentional writes to valid data
 * Memory leaks

What's the rationale for doing this in NoBug instead of Valgrind? Valgrind
has an insane overhead; one cannot easily run performance-demanding
applications under Valgrind. It's up to 20 times slower than running
something natively and also needs about as much more memory. More
importantly, Valgrind has some blind spots by design: some stack corruption
and other tricks with the stack go unnoticed by Valgrind. This is just to
point this out. Valgrind is a valuable tool I don't want to miss; I am just
after filling the gaps and adding memory debugging to NoBug to gain speed and
catch the cases where Valgrind is blind. The NoBug way will be explicit,
intrusive instrumentation; NoBug will not hook into the system like Valgrind
or other tools do. In this regard NoBug will have its own blind spots too;
for example, use of uninitialized memory is rather hard to track down and
certainly a strength of Valgrind.

Doing memory checks in an intrusive way has some benefits (apart from the
usual drawback that you have to set up these intrusive macros wherever
required). It means you can add memory checks at higher levels (your object
factories) as well as at lower levels, for example if you implement custom
allocators (memory pools, garbage collectors, temporary buffers, ...).

So let's look at an idea:

Each allocation should have some guard area around it to detect
unintentional writes before and after the object. I'd recommend to have
this guard area at least 1 object size before and after the object (to
catch off-by-1 indirection errors). For bulk (array, memory pool, ...)
allocations a guard area of half the object size (rounded up) around each
object would suffice, except that you want to add some more margin to the
complete memory block that holds the objects.

Unintentional writes can be detected by calculating a checksum over the
watched memory block and later asserting that it is still valid.
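
As a rough illustration of the checksum idea, here is a minimal sketch in C.
The names block_checksum and block_is_unchanged and the choice of 32-bit
FNV-1a are assumptions made for this example; the mail does not name a
checksum algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit FNV-1a over a watched memory block.
   The algorithm is an assumption; any fast checksum would do. */
static uint32_t block_checksum(const void *data, size_t size)
{
    const unsigned char *bytes = data;
    uint32_t hash = 2166136261u;          /* FNV offset basis */

    assert(data != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        hash ^= bytes[i];
        hash *= 16777619u;                /* FNV prime */
    }
    return hash;
}

/* Returns 1 if the block still matches the checksum recorded earlier. */
static int block_is_unchanged(const void *data, size_t size, uint32_t recorded)
{
    return block_checksum(data, size) == recorded;
}
```

The caller records the checksum once the block is in a known-good state and
compares against it later; a mismatch means something wrote to the block in
between.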

Memory is often allocated per domain/subsystem or hierarchically.
For leak checking you want to be sure that all child allocations are
freed before you free the parent.
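
One possible way to picture the parent/child rule is a counter of live
children per allocation. This is only a sketch; struct tracked and the two
functions are hypothetical names, not part of any existing NoBug API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-allocation bookkeeping for parent/child tracking. */
struct tracked {
    struct tracked *parent;   /* NULL for a top-level allocation */
    size_t live_children;     /* children not yet released */
};

static void tracked_init(struct tracked *node, struct tracked *parent)
{
    assert(node != NULL);
    node->parent = parent;
    node->live_children = 0;
    if (parent)
        parent->live_children++;
}

/* Returns 1 if the node may be released, 0 if children would leak. */
static int tracked_release(struct tracked *node)
{
    assert(node != NULL);
    if (node->live_children != 0)
        return 0;
    if (node->parent)
        node->parent->live_children--;
    node->parent = NULL;
    return 1;
}
```

A real implementation would presumably report the leaking children through
NoBug's logging instead of just returning 0.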

From this follow some API considerations:

Every allocation will be instrumented to include the guard areas; for
this we need a SIZEOF() which accounts for them. (I am using short names
here; names in a real API will differ.)

Then every return value of an allocation has to be adjusted by the offset of
the user data (unless the allocation indicated failure by returning
NULL). So, first, a basic instrumentation would look like:

  Object* myobject = DATAOF( malloc( SIZEOF(*myobject)));

and freeing it by:

 free(BASEOF(myobject));
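
To make the three names concrete, here is a minimal sketch of how they could
be defined. It assumes a single fixed guard size (GUARD_BYTES, a made-up
constant) instead of the per-object guard size discussed in this mail, and it
does not yet fill or check the guards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: one fixed guard size, a multiple of the strictest alignment
   on common platforms so that the user data stays aligned. */
#define GUARD_BYTES 64

/* Size to request from the allocator: object plus front and back guard. */
#define SIZEOF(object) (sizeof(object) + 2 * GUARD_BYTES)

/* Allocation base -> user data; NULL (allocation failure) passes through. */
static void *dataof(void *base)
{
    return base ? (char *)base + GUARD_BYTES : NULL;
}

/* User data -> allocation base, for handing the block back to free(). */
static void *baseof(void *data)
{
    return data ? (char *)data - GUARD_BYTES : NULL;
}

#define DATAOF(base) dataof(base)
#define BASEOF(data) baseof(data)

/* Usage, mirroring the two lines above. Returns 0 if malloc failed. */
static int example_roundtrip(void)
{
    int *myobject = DATAOF(malloc(SIZEOF(*myobject)));
    if (!myobject)
        return 0;
    *myobject = 42;
    assert((char *)myobject - (char *)BASEOF(myobject) == GUARD_BYTES);
    free(BASEOF(myobject));
    return 1;
}
```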


To implement a registry of all allocations we need a doubly linked list
chaining up all allocations in arbitrary order and a central node being the
entry to all these nodes. Doubly linked because we need fast removal of any
node. We need to store the size of the object, the size of the guard areas,
and checksums over the data and the guard areas. Further, a nobug_context
should be stored to track down where the object was allocated.
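
A sketch of such a registry, using a circular doubly linked list with a
central root node, could look like this. The names reg_node, registry and the
registry_* functions are invented for the example; a real implementation
would probably reuse an existing list type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node embedded in every tracked allocation. */
struct reg_node {
    struct reg_node *next;
    struct reg_node *prev;
};

/* Central entry node; an empty registry points at itself. */
static struct reg_node registry = { &registry, &registry };

/* Insert right after the root; the order of allocations does not matter. */
static void registry_insert(struct reg_node *node)
{
    assert(node != NULL);
    node->next = registry.next;
    node->prev = &registry;
    registry.next->prev = node;
    registry.next = node;
}

/* O(1) removal of any node, which is why the list is doubly linked. */
static void registry_remove(struct reg_node *node)
{
    assert(node != NULL);
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = node;
}

static size_t registry_count(void)
{
    size_t count = 0;
    for (struct reg_node *n = registry.next; n != &registry; n = n->next)
        count++;
    return count;
}
```

Whatever is still in the list at shutdown is a candidate for the leak report.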

With some care this metadata can be stored within the guard areas before
and after the object to preserve locality and optimize memory usage. Some
things can be optimized further:

 * It makes no sense to store a full size_t for the object size; gigantic
   allocations with (possibly equally large) guard areas in front and back
   are impractical and rare. I claim that 3 bytes for the size (objects up to
   16MB) should be enough. Bigger allocation areas should be watched by other
   means; alternatively, when allocations are that big, a bigger management
   structure which stores the size in another place won't hurt (we can
   indicate this by storing 0 in the ordinary size location).

 * The guard size could be just a factor; then a signed char will suffice.
   1 to 127 gives a multiplier, -1 to -128 an (abs) divisor.
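
The factor encoding from the last point can be sketched as follows. The
function name guard_bytes, rounding the divided size up, and treating 0 as
"no guard" are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder for the one-byte guard factor:
   1..127 multiplies the object size, -1..-128 divides it (rounded up). */
static size_t guard_bytes(size_t object_size, signed char factor)
{
    if (factor > 0) {
        assert(object_size <= SIZE_MAX / (size_t)factor);
        return object_size * (size_t)factor;
    }
    if (factor < 0) {
        size_t divisor = (size_t)(-(int)factor);   /* 1..128 */
        return object_size / divisor + (object_size % divisor != 0);
    }
    return 0;   /* assumption: factor 0 means no guard area */
}
```

For a 25-byte object a factor of 2 gives a 50-byte guard, and a factor of -2
gives a 13-byte guard.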


Thus follows:

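struct before
{
        llist node;                    /* links the allocation into the registry */
        int32_t guard_size : 8;        /* factor: >0 multiplier, <0 divisor */
        uint32_t allocation_size : 24; /* object size; 0 = stored elsewhere */
        uint32_t guard_checksum;       /* checksum over the guard areas */
};

struct after
{
        uint32_t data_checksum;        /* checksum over user data; 0 = unused */
        nobug_context context;         /* where the object was allocated */
};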


and the wrapped user data will have a layout like:
{
	char front_guard[guard_size - sizeof(struct before)];
	struct before front_meta;

	user_data here; /* this is the address of the user data */

	char pad_guard[calculate_alignment_somehow()];
	struct after back_meta;
	char back_guard[guard_size - sizeof(struct after)];
}


Now we can assert that the guard areas didn't get corrupted by calculating
and comparing the checksum (we need to do this on each llist update touching
a node). When the data checksum is set (we define '0' as being unused) we
can assert that the data didn't get modified either. This should be done with
an explicit API: MEMCHECK(object) for checking and FREEZE(object) for
recalculating the checksum after a mutation. Freeing an object should do the
MEMCHECK too. To release the FREEZE we need an UNFREEZE() too.
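
The freeze/check cycle might be sketched like this. For brevity the checksum
lives in a separate, made-up struct watched rather than in the guard area, and
the lower-case functions stand in for what the FREEZE, UNFREEZE and MEMCHECK
macros could expand to; the FNV-1a checksum is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one watched object. */
struct watched {
    void *data;
    size_t size;
    uint32_t data_checksum;   /* 0 = unused, i.e. the object is not frozen */
};

/* 32-bit FNV-1a, remapped so that it never returns the reserved value 0. */
static uint32_t data_checksum_of(const void *data, size_t size)
{
    const unsigned char *bytes = data;
    uint32_t hash = 2166136261u;

    for (size_t i = 0; i < size; i++) {
        hash ^= bytes[i];
        hash *= 16777619u;
    }
    return hash ? hash : 1;
}

/* FREEZE: record the current contents as the expected state. */
static void freeze(struct watched *w)
{
    assert(w != NULL);
    w->data_checksum = data_checksum_of(w->data, w->size);
}

/* UNFREEZE: stop watching the data for modifications. */
static void unfreeze(struct watched *w)
{
    assert(w != NULL);
    w->data_checksum = 0;
}

/* MEMCHECK: 1 if the object is not frozen or still matches, else 0. */
static int memcheck(const struct watched *w)
{
    assert(w != NULL);
    return w->data_checksum == 0
        || w->data_checksum == data_checksum_of(w->data, w->size);
}
```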

Next, we have all allocated memory available in the list; this means we can
check for potential leaks at the end of the application lifetime (or in
between) by doing a garbage-collector-like conservative scan. Since we have
no roots (as in a GC) and not all memory/references are necessarily covered
by the intrusive mechanism, this should be taken with a grain of salt, as
there is no guarantee for it to be exact.
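
A very small sketch of such a conservative scan follows. It uses a plain
array of made-up struct block entries instead of the registry list, assumes
the blocks are pointer-aligned, and treats every pointer-sized word inside a
block as a possible reference. Blocks that nothing points into are only leak
candidates, since they may still be referenced from the stack or globals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of one registered allocation. */
struct block {
    void *base;
    size_t size;
    int referenced;   /* set when another block seems to point into this one */
};

static void conservative_scan(struct block *blocks, size_t count)
{
    assert(blocks != NULL || count == 0);

    for (size_t i = 0; i < count; i++)
        blocks[i].referenced = 0;

    for (size_t i = 0; i < count; i++) {
        const unsigned char *bytes = blocks[i].base;

        /* Walk block i one pointer-sized word at a time. */
        for (size_t off = 0; off + sizeof(uintptr_t) <= blocks[i].size;
             off += sizeof(uintptr_t)) {
            uintptr_t word;
            memcpy(&word, bytes + off, sizeof word);

            /* Does the word look like an address inside another block? */
            for (size_t j = 0; j < count; j++) {
                uintptr_t lo = (uintptr_t)blocks[j].base;
                if (j != i && word >= lo && word - lo < blocks[j].size)
                    blocks[j].referenced = 1;
            }
        }
    }
}
```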



Implications?

There is a gotcha: when the program runs under Valgrind, initializing the
memory guards will prevent Valgrind from detecting illegal reads of them.
Valgrind has some hooks to mark memory areas as uninitialized, but we
possibly just want to disable all this memory checking when running
under Valgrind; otherwise it might just decrease Valgrind's
performance even more.

That's it for now; an implementation with more details (log flags and all)
will follow some day, but don't hold your breath.


       Christian

