[egenix-users] cursor.rowcount abnormalities and performance question

Wed Jul 30 13:51:21 CEST 2003

Thanks Marc-Andre, Steve and Jim for the quick responses to my questions.

I guess I shouldn't be too surprised that MS doesn't provide a mechanism
for retrieving the number of rows but it is rather comical that they do
provide 2 distinct values which you can at least (hopefully, correctly)  
infer that there are 0 rows that match -or- at least 1 row that matches.  
You'd think that they could go the proverbial extra mile and provide a
real value in the latter case, but then again, it would probably put a
huge load on SQL Server.  Cynicism of MS aside, at least I can safely test 
for the existence of data using cursor.rowcount != 0.

Interesting discussion about fetchone, fetchall and fetchmany.  Seems 
like fetchmany is the way to go if there is a large data set expectancy. 
For smaller datasets it's more of a grey area but I suppose fetchall is 
the easiest to implement.

If time permits I'll create a few test cases using all 3 methods to
retrieve small, medium and large rowsets with small, medium and large
column sets to see if I can make any general assumptions of which fetchXYZ 
method to use based on the dataset expectancy.

Thanks again for the replies,

Phil

On Wed, 30 Jul 2003, M.-A. Lemburg wrote:

> stuff at mailzilla.net wrote:
> > Hello,
> > 
> > When using mxODBC with the Windows SQL Server Driver I've noticed that 
> > when performing a query with executedirect the cursor object has a 
> > rowcount of -1 or 0 and nothing else.  I am using Python 2.2 and the 
> > mxODBC package that ships with BlackAdder.
> > 
> > -1 seems to be the initial value of rowcount before queries are performed
> > but I'm unsure as to why this would remain the value after a query
> > considering that the count should be >= 0, at least fundamentally.  In my
> > tests I have only seen values of 0 (when there are no records) and -1 when
> > there are some records-- in these cases, I am expecting this to be a
> > positive value representing the number of records that matched the query.  
> 
> -1 means "no information available".
> 
> > If I do a fetch operation (i.e. fetchone) on the cursor that has a 
> > rowcount of -1 I do indeed retrieve records.  There seems to be a 
> > disconnect between rowcount and being able to fetch records.
> > 
> > Is SQL Server/ODBC sending over misleading info or is this a problem with 
> > mxODBC?
> 
> It's a general problem with the ODBC driver you are using. MS Access
> has similar problems. Both seem to not know how many rows there
> actually are in the result set until you try to fetch them all.
> 
> Some ODBC drivers update the .rowcount information after the first
> fetch which is why mxODBC tries hard to update the information
> (it does so each time you request .rowcount until something
> other than -1 is returned).
> 
> > I generally use rowcount to determine whether I need to retreive 
> > records using fetchone inside a while loop (no need to execute the loop 
> > if rowcount is 0).  It's a little cleaner if I can reliably use rowcount 
> > but if the value doesn't sync with the actual result set then I can safely 
> > do something like this:
> > 
> > while 1:
> >    row = c.fetchone()
> >    if not row: break
> >    processTheRecord(row)
> > 
> > If rowcount can be relied upon, I could do this, which is much cleaner:
> > 
> > for i in range(c.rowcount):
> >    row = c.fetchone
> >    processTheRecord(row)
> 
> True. In this case, however, you will have to complain to
> Microsoft for not providing the necessary information to the
> application using their ODBC driver :-/
> 
> > Performance question- Since I need to perform some manipulation on each 
> > record I believe it is more efficient to perform a while loop and 
> > fetchone rather than getting everything via fetchall and then iterating 
> > over that result (since theoretically, fetchall would be iterating to get 
> > each record so technically speaking that would require iterating twice).  
> > If this assumption is false, please let me know.  Perhaps there are some 
> > internal ODBC/mxODBC optimizations that allow fetchall to actually perform 
> > much faster than several fetchone's.  My concern is that if fetchall 
> > iterates through the result set to produce the list of rows and then I am 
> > iterating through the fetched list then I can accomplish the same goal w/ 
> > 1 less iteration via a series of fetchone's.  Also, I imagine it would be 
> > safer to process each record individually then fetching the entire result 
> > set at once in terms of system resource utilization (i.e. if the result 
> > set were millions of records there would need to be data structs allocated 
> > to hold them).  
> > 
> > I'm curious to see if my reluctance to use fetchall is well-founded or
> > overly paranoid.
> 
> This is hard to tell: by using .fetchall() you save quite
> a few Python function calls which are costly, .fetchone()
> OTOH save memory. You might even be better off by using
> .fetchmany() and process e.g. 10 rows at a time.
> 
> I'd suggest you write a short test script and time the results.
> 
> > Thanks for any info,
> 
>