[egenix-users] BeeDict memory usage

M.-A. Lemburg mal at lemburg.com
Mon Sep 16 22:55:08 CEST 2002


Daniel Naber wrote:
> On Monday 16 September 2002 17:53, you wrote:
> 
> 
>>>index that calls free_cache() on every 50th file gets fewer matches
>>>when searching (yes, the call to free_cache() is really the only
>>>difference in the program).
>>
>>That's strange indeed. Can you come up with a short demo which
>>displays the problem ?
> 
> 
> Okay, this is not very short, as it seems you need a certain amount of data 
> to trigger the problem. Call the script like this:
> 
> ./FullText2.py /data/bigindex/test/ widget
> 
> The first parameter is a directory, the second one a search term. Then look 
> for "####" in the script, uncomment the free_cache() call, and run the 
> script again with the same parameters: you should get fewer matches when 
> free_cache() is called, and the data files are also smaller. If it doesn't 
> work, I can send you an archive of about 30 HTML files that let you 
> reproduce the problem.

Thanks for the script. I can reproduce the problem here, but
I still don't understand what is causing it. The table index size
is the same in both cases; only the file sizes differ.

This could relate to the way you store the data: using dictionaries
of lists as values in the BeeDict. I'll have to investigate this
some more.
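For what it's worth, one way that storage pattern can lose data in any
dict-like persistent store (this is a hypothesis, not a diagnosis of the
BeeDict code): if a value is fetched, mutated in place, and the cache is
flushed before the mutation is written back, the change silently
disappears. A minimal sketch of the same hazard using the standard
library's shelve module, purely for illustration:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo")

# Store a dictionary-of-lists value, then mutate it in place.
db = shelve.open(path)
db["widget"] = {"postings": [1, 2]}   # assignment: the value is persisted
db["widget"]["postings"].append(3)    # mutates a fetched copy; never written back
db.close()

# Reopen: the in-place append is gone.
db = shelve.open(path)
print(db["widget"]["postings"])       # prints [1, 2]
db.close()

# The fix: rebind the key so the store sees the updated value.
db = shelve.open(path)
entry = db["widget"]
entry["postings"].append(3)
db["widget"] = entry                  # explicit write-back
db.close()
```

If BeeDict's free_cache() discards cached values the same way before
in-place mutations are committed, that would match the symptoms you
describe (fewer matches, smaller data files).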

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



