[Nobug] memory checking (braindump)

Mon Aug 30 19:48:21 CEST 2010

On Mon, 30 Aug 2010 16:40:03 +0200
<benny.lyons at uniserv.com> wrote:

> Hi Christian,
> I wonder if this is the correct way forward for NoBug, or would all of
> this be more appropriately done under another project, e.g. NoLeak?
> (I am asking a question here, not making a statement.)  Maybe someone
> can supply a few convincing arguments. I'm just brain dumping here, as
> your email is interesting enough to deserve comment.
> 
> Here is the thesis against:
> 
> --NoBug is small, self-contained and does what it does well.
>   Trying to do the other things is a project probably bigger than
>   NoBug itself; this will also require a lot of work and never be
>   perfect, as you correctly point out.
> --We already have other tools: valgrind (ok, arguments against
>   valgrind are perfectly valid: it is slow and has holes, as a
>   comparison with Purify will confirm; although Purify also has its
>   problems, apart from being $-ware).
>   Electric Fence (I cannot recall if this is still being maintained,
>   but, if I recall correctly, that is what Electric Fence set out to
>   do.)
>   The dbx debugger on OpenSolaris does these things also, but again,
>   not perfectly.
>   There are also a myriad of other tools out there that I can't recall
>   off the top of my head.
>   The point being: do we want another? Or would the time be better
>   spent improving/using another tool?
>   Or put a completely different way: what will NoLeak provide that
>   others don't already?  (Maybe someone does have some viable
>   suggestions to offer!)
> --Such an endeavour is error prone: take a look at valgrind, Purify,
>   Electric Fence or even the dbx debugger on Solaris (check -memuse,
>   -access).
> --Such a project is going to be big and cost a lot of time.
> --This sounds like Electric Fence, i.e.
>    >...each allocation should have some guard area around it to detect
>    >unintentional writes before and after the object. I'd recommend to
>    >have this guard area at least 1 object size before and after the
>    >object (to catch off-by-1 indirection errors)...
>   The point I am trying to make is a question: what's the difference
>   between Electric Fence and NoLeak?

I agree with you, such a feature would be mostly a "because we can"
kind of thing to complement NoBug. I was thinking about splitting the
NoBug library up into different parts which then can be conditionally
linked. Someone asked me at FrOSCon about NoBug for embedded systems;
isolating the NoBug subsystems might be a benefit there. This has to
be decided. But implementation-wise this memory checker would be very
small and would reuse existing NoBug facilities, so I doubt that it
will add much weight to the library size. Basically it needs checksums
and list operations, which are already there, plus the usual logging
and assertion things. Instrumenting code with it will have some
considerable (unavoidable) overhead.

Well, there are some things where an intrusive memory checker will
shine and which other tools (like valgrind/efence etc) cannot cope
with. That is, you can instrument custom allocators. For example, in
Lumiera we have tightly packed tmpbufs and a custom memory pool (NoBug
uses it too in the resource tracker), and I would also add such to my
GC eventually. All these have in common that user memory is not
directly served by the C library's malloc; instead memory is packed
and augmented with metadata. Actually these capabilities are my main
motivation for adding the memory checker to NoBug; as I pointed out,
we (Lumiera) didn't have much other need for it yet (which may
eventually change).

With an intrusive memory checker it should also be possible to
protect objects on the stack or static objects, and maybe (not really
considered yet) to have the capability of freezing single
variables/elements.

	Christian

> 
> 
> ...and pro NoLeak
> --Yes, it would complement NoBug, adding useful stuff.
> --It could be kept simple, as your API shows.
> --So what about the arguments above? The API below looks simple
>   enough and has a few good ideas, enough to have a go.
> 
> 
> Maybe someone out there knows a little more about Electric Fence &
> other tools to point out a few more arguments pro or con doing a
> NoLeak or leak checker in NoBug.
> 
> Anyway, the more I think about it the more I like the concept of doing
> all of this separate from NoBug, but, still and all, make it easy to
> plug into NoBug.  This makes it more modular, if things in NoLeak (or
> whatever it's called) turn out to be not everyone's cup of tea.
> 
> 
> 
> Benny
> 
> -----Original Message-----
> From: nobug-bounces at lists.pipapo.org
> [mailto:nobug-bounces at lists.pipapo.org] On Behalf Of Christian Thaeter
> Sent: Saturday, August 28, 2010 12:41 AM
> To: nobug at lists.pipapo.org
> Subject: [Nobug] memory checking (braindump)
> 
> hi,
> 
> I just want to drop a braindump about memory validation/watching this
> time. Some time ago I already talked with ichthyo about this and we
> decided we currently don't need this feature for Lumiera. But
> eventually it would be a nice thing to complete NoBug's feature set,
> so I will implement it someday. Documenting this here may help to work
> it out further.
> 
> First, let me list what kinds of errors such a feature should address:
>  * Writing data out of bounds
>  * Unintentional writes to valid data
>  * Memory leaks
> 
> What's the rationale for doing this in NoBug instead of Valgrind?
> Valgrind has an insane overhead; one cannot easily run
> performance-demanding applications under valgrind. It's up to 20 times
> slower than running something natively, and its memory use grows by a
> similar factor. More importantly, Valgrind has by design some blind
> spots: some stack corruption and other tricks with the stack go
> unnoticed by valgrind. This is just to point it out; Valgrind is a
> valuable tool I don't want to miss. I am just after filling the gaps
> and adding memory debugging to NoBug to gain speed and catch the cases
> where Valgrind is blind. The NoBug way will be explicit, intrusive
> instrumentation; NoBug will not hook into the system like valgrind or
> other tools do. In this regard, NoBug will have its own blind spots
> too. For example, use of uninitialized memory is rather hard to track
> down and certainly a strength of valgrind.
> 
> Doing memory checks in an intrusive way has some benefits (apart from
> the usual drawback that you have to set up these intrusive macros
> wherever required). It means you can add memory checks at higher
> levels (your object factories) as well as at lower levels, for example
> if you implement custom allocators (memory pools, garbage collectors,
> temporary buffers,...).
> 
> So let's look at an idea:
> 
> Each allocation should have some guard area around it to detect
> unintentional writes before and after the object. I'd recommend to
> have this guard area at least 1 object size before and after the
> object (to catch off-by-1 indirection errors). For bulk (array,
> memory pool,..) allocations it would suffice to have guard areas of
> half the object size (rounded up), except that you may want to add
> some more margin to the complete memory block in which the objects
> live.
> 
> Unintentional writes can be detected by calculating a checksum
> over the watched memory block and later asserting its validity.
> 
> Memory is often allocated for some domain/subsystem or hierarchically.
> For leak checking you want to be sure that all child allocations are
> freed before you free the parent.
> 
> Thus follow these API considerations:
> 
> Every allocation will be instrumented to include the guard areas; for
> this we need a SIZEOF() which accounts for them. (I am using short
> names here; names in a real API will differ.)
> 
> Then every return value of an allocation has to be adjusted by the
> offset of the user data (unless the allocation indicated failure by
> returning NULL). So first, a basic instrumentation would look like:
> 
>   Object* myobject = DATAOF( malloc( SIZEOF(*myobject)));
> 
> and freeing it by:
> 
>  free(BASEOF(myobject));
> 
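
A minimal sketch of the SIZEOF()/DATAOF()/BASEOF() idea quoted above,
written as plain C functions rather than macros. All names here
(guard_len, guarded_sizeof, guarded_dataof, guarded_baseof,
guards_intact, GUARD_BYTE) are hypothetical and not part of NoBug. The
sketch assumes a guard of one object size, rounded up to the maximum
alignment so the user data stays aligned, and it passes the object size
explicitly because a bare pointer does not carry it. It uses a fill
pattern instead of the checksum from the proposal, just to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xA5  /* hypothetical fill pattern for guard areas */

/* Guard length for one object: at least one object size, rounded up
   so the user data keeps malloc's alignment. */
static size_t guard_len(size_t obj)
{
    size_t a = _Alignof(max_align_t);
    return (obj + a - 1) / a * a;
}

/* SIZEOF(): bytes to request for the object plus both guards. */
static size_t guarded_sizeof(size_t obj)
{
    return obj + 2 * guard_len(obj);
}

/* DATAOF(): fill both guards and return the user pointer.
   A failed allocation (NULL) is passed through unchanged. */
static void *guarded_dataof(void *base, size_t obj)
{
    if (!base)
        return NULL;
    size_t g = guard_len(obj);
    unsigned char *p = base;
    memset(p, GUARD_BYTE, g);            /* front guard */
    memset(p + g + obj, GUARD_BYTE, g);  /* back guard */
    return p + g;
}

/* BASEOF(): recover the pointer the allocator originally returned. */
static void *guarded_baseof(void *data, size_t obj)
{
    return data ? (unsigned char *)data - guard_len(obj) : NULL;
}

/* Returns 1 while both guards still hold the fill pattern. */
static int guards_intact(const void *data, size_t obj)
{
    size_t g = guard_len(obj);
    const unsigned char *front = (const unsigned char *)data - g;
    const unsigned char *back = (const unsigned char *)data + obj;
    for (size_t i = 0; i < g; i++)
        if (front[i] != GUARD_BYTE || back[i] != GUARD_BYTE)
            return 0;
    return 1;
}
```

With these, the quoted example would read roughly as
obj = guarded_dataof(malloc(guarded_sizeof(n)), n) and
free(guarded_baseof(obj, n)), with guards_intact(obj, n) asserted
before the free.
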
> 
> To implement a registry of all allocations we need a doubly linked
> list chaining up all allocations in arbitrary order and a central node
> being the entry to all these nodes. Doubly linked because we need fast
> removal of any node. We need to store the size of the object and the
> size of the guard areas, and we have to store checksums over the data
> and guard areas. Further, a nobug_context should be stored to track
> down where the object was allocated.
> 
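
The registry described above could be sketched as a circular doubly
linked list with a static root node. This is only an illustration: the
names (reg_node, registry, reg_insert, reg_remove, reg_live_bytes) are
made up, it does not use NoBug's own llist, and a plain string stands
in for the nobug_context.

```c
#include <assert.h>
#include <stddef.h>

/* One registry entry per allocation (hypothetical layout). */
struct reg_node {
    struct reg_node *next, *prev;
    size_t size;        /* size of the user object */
    const char *where;  /* stand-in for a nobug_context */
};

/* Central node: an empty registry points to itself. */
static struct reg_node registry = { &registry, &registry, 0, "root" };

/* Link a new allocation in right after the root. */
static void reg_insert(struct reg_node *n, size_t size, const char *where)
{
    n->size = size;
    n->where = where;
    n->next = registry.next;
    n->prev = &registry;
    registry.next->prev = n;
    registry.next = n;
}

/* Unlink any node in constant time; this is why the list is
   doubly linked. */
static void reg_remove(struct reg_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* Sum of all still-registered object sizes. A non-zero result at
   program exit hints at leaks. */
static size_t reg_live_bytes(void)
{
    size_t total = 0;
    for (struct reg_node *n = registry.next; n != &registry; n = n->next)
        total += n->size;
    return total;
}
```

Walking the list the same way as reg_live_bytes() and printing each
node's where field would give a simple leak report.
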
> With some care this metadata can be stored within the guard areas
> before and after the object to preserve locality and optimize memory
> usage. Some things can be optimized further:
> 
>  * It makes no sense to store a full size_t for the object size;
>    gigantic allocations with (possibly equally large) guard areas in
>    front and back are impractical and rare. I proclaim that 3 bytes
>    (objects up to 16MB) should be enough. Bigger allocation areas
>    should be watched otherwise, or, when we have allocations that big,
>    having bigger management structures which store the size at another
>    place won't hurt (we can indicate this by storing 0 in the ordinary
>    size location).
> 
>  * The guard size could be just a factor, then signed char will
> suffice. 1 to
>    127 gives a multiplier, -1 to -128 a (abs) divisor.
> 
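
To make the two encodings above concrete, here is a small sketch. The
function names and SIZE_FIELD_MAX are hypothetical. Two assumptions go
beyond the proposal: a factor of 0 is treated as "no guard" (the text
leaves 0 undefined), and divisions round up, following the "half
(rounded up)" remark for bulk allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_FIELD_MAX ((1u << 24) - 1)  /* largest value in 3 bytes */

/* Guard bytes for an object of `obj` bytes and a signed factor:
   1..127 multiplies the object size, -1..-128 divides it (rounded
   up). Factor 0 yields no guard, which is an assumption. */
static size_t guard_bytes(size_t obj, int8_t factor)
{
    if (factor > 0)
        return obj * (size_t)factor;
    if (factor < 0) {
        size_t d = (size_t)(-(int)factor);
        return (obj + d - 1) / d;
    }
    return 0;
}

/* Value for the 24-bit size field. Objects that do not fit are
   encoded as 0, meaning "the real size is stored elsewhere". */
static uint32_t encode_size(size_t obj)
{
    return obj <= SIZE_FIELD_MAX ? (uint32_t)obj : 0;
}
```

Note that a genuinely zero-sized object would also encode as 0, so a
real implementation would have to reject or special-case that.
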
> 
> Thus follows:
> 
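> struct before
> {
>         llist node;                     /* links into the allocation registry */
>         int32_t guard_size : 8;         /* signed factor: >0 multiplier, <0 divisor */
>         uint32_t allocation_size : 24;  /* object size in bytes, 0 = stored elsewhere */
>         uint32_t guard_checksum;        /* checksum over the guard areas */
> };
> 
> struct after
> {
>         uint32_t data_checksum;         /* checksum over user data, 0 = unused */
>         nobug_context context;          /* where the object was allocated */
> };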
> 
> 
> and the wrapped user data will have a layout like:
> {
> 	char front_guard[guard_size - sizeof(struct before)];
> 	struct before front_meta;
> 
> 	user_data here; /* this is the address of the user data */
> 
> 	char pad_guard[calculate_alignment_somehow()];
> 	struct after back_meta;
> 	char back_guard[guard_size - sizeof(struct after)];
> }
> 
> 
> Now we can assert that the guard areas didn't get corrupted by
> calculating and comparing the checksum (we need to do this on each
> llist update touching a node). When the data checksum is set (we
> define '0' as being unused) then we can assert that the data didn't
> get modified either. This should be done with an explicit API:
> MEMCHECK(object) for checking and FREEZE(object) for recalculating
> the checksum after a mutation. Freeing an object should do the
> MEMCHECK too. To release the FREEZE we need an UNFREEZE() too.
> 
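
A rough sketch of how MEMCHECK/FREEZE/UNFREEZE might behave. The struct
watch and the watch_* functions are hypothetical; in the proposal the
checksum would live in the metadata behind the object. The checksum is
32-bit FNV-1a, picked only because it is short, and a computed value of
0 is remapped to 1 so that 0 can keep meaning "unused".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical watch record for one object. */
struct watch {
    const void *data;
    size_t size;
    uint32_t data_checksum;  /* 0 means "not frozen" */
};

/* 32-bit FNV-1a over the object; never returns 0. */
static uint32_t checksum(const void *data, size_t size)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < size; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h ? h : 1;
}

/* FREEZE: record the checksum of the current contents. */
static void watch_freeze(struct watch *w)
{
    w->data_checksum = checksum(w->data, w->size);
}

/* UNFREEZE: stop watching the contents. */
static void watch_unfreeze(struct watch *w)
{
    w->data_checksum = 0;
}

/* MEMCHECK: 1 if the object is not frozen or is unchanged since
   the last freeze, 0 if it was modified. */
static int watch_memcheck(const struct watch *w)
{
    return w->data_checksum == 0
        || w->data_checksum == checksum(w->data, w->size);
}
```

An intended mutation is then followed by another watch_freeze() call,
while an unintended one makes the next watch_memcheck() return 0.
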
> Next, we have all allocated memory available in the list; this means
> we can check for potential leaks at the end of the application
> lifetime (or in between) by doing a garbage-collector-like
> conservative scan. Since we have no roots (as in a GC) and not all
> memory/references are necessarily covered by the intrusive mechanism,
> this should be taken with a grain of salt, as there is no guarantee
> for it to be exact.
> 
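
One way such a conservative scan could look, as a sketch only. Every
pointer-sized word inside the registered blocks is treated as a
possible pointer, and a block that no other block points into is
reported as a potential leak. The names (struct block, scan_range,
count_unreferenced) are hypothetical. The optional extra range, for
example a static table of top-level pointers, is an addition of this
sketch and not part of the proposal. It also assumes that a pointer's
bytes copied into a uintptr_t equal the cast value, which holds on
common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One registered allocation (hypothetical). */
struct block {
    const void *base;
    size_t size;
    int referenced;
};

/* Mark every block (except `self`) that some word in
   [mem, mem+len) points into. */
static void scan_range(const void *mem, size_t len, struct block *blocks,
                       size_t n, const struct block *self)
{
    const unsigned char *p = mem;
    uintptr_t word;
    for (size_t off = 0; off + sizeof word <= len; off += sizeof word) {
        memcpy(&word, p + off, sizeof word);
        for (size_t i = 0; i < n; i++) {
            uintptr_t lo = (uintptr_t)blocks[i].base;
            if (&blocks[i] != self && word >= lo
                && word - lo < blocks[i].size)
                blocks[i].referenced = 1;
        }
    }
}

/* Count blocks that nothing scanned points to. These are only
   potential leaks: references held on the stack or in unregistered
   memory are invisible, and garbage cycles look referenced. */
static size_t count_unreferenced(struct block *blocks, size_t n,
                                 const void *extra, size_t extra_len)
{
    size_t unreferenced = 0;
    for (size_t i = 0; i < n; i++)
        blocks[i].referenced = 0;
    scan_range(extra, extra_len, blocks, n, NULL);
    for (size_t i = 0; i < n; i++)
        scan_range(blocks[i].base, blocks[i].size, blocks, n, &blocks[i]);
    for (size_t i = 0; i < n; i++)
        if (!blocks[i].referenced)
            unreferenced++;
    return unreferenced;
}
```

Without any extra range, every top-level object shows up as
unreferenced, which is exactly the grain of salt mentioned above.
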
> 
> 
> Implications?
> 
> There is a gotcha now: when the program runs under valgrind,
> initializing memory guards will prevent valgrind from detecting
> illegal reads from them. Valgrind has some hooks to mark memory areas
> as uninitialized, but we possibly just want to disable all this memory
> checking when running under valgrind; otherwise this might just
> decrease valgrind's performance even more.
> 
> That's it for now; an implementation with more details (log flags and
> all) will follow some day, but don't hold your breath.
> 
> 
>        Christian