[egenix-users] PROBLEM: eGenix.com mxBASE beta 3
M.-A. Lemburg
mal at lemburg.com
Sat Jul 27 12:41:42 CEST 2002
Pekka Niiranen wrote:
> Hi,
>
> I tried the latest beta 3 by:
>
> a) compiling it myself from sources and
> b) installing from the precompiled package for python v2.2
>
> Of the scripts below only the script that uses Simpleparse returns
> anything.
> The others run without errors, but return [].
>
> They all run OK with the beta 2 though.
If they did, then you've hit a bug in beta2. Here are the corrected
versions. Note that the problem was with the EOF handling: if AllNotIn
doesn't match at least one character it fails, and using 0 as the jne
offset has the same effect as MatchFail.
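
To make that failure mode concrete, here is a rough pure-Python model of the "must match at least one character" rule. It is only a sketch for illustration and does not use mxTextTools; all_not_in is a hypothetical helper, not part of any library, and it returns None where the real command would report a failed match.

```python
def all_not_in(text, pos, chars):
    """Toy model of an AllNotIn-style match (hypothetical helper).

    Consumes characters of text, starting at pos, that are not in chars.
    Returns the new position, or None if not even one character matched.
    """
    end = pos
    while end < len(text) and text[end] not in chars:
        end += 1
    if end == pos:
        return None  # zero characters consumed counts as a failure
    return end


text = "aa(AA)"
after_letters = all_not_in(text, 0, "()")    # consumes "aa"
at_paren = all_not_in(text, 2, "()")         # "(" is excluded, nothing consumed
at_eof = all_not_in(text, len(text), "()")   # no characters left to consume
```

The last case is the relevant one: at the end of the text nothing can be consumed, so the match fails, and the table needs an explicit way to finish there. That appears to be the role of the EOF entries in the tables below.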
#--- solution 1 starts (with limiting letters)---
from mx.TextTools import *

def test1():
    text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
    tables = [] # used for recursion only
    tab = ('start',
           (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
           (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
           (None,AllNotIn,'()',+1,'start'), # Search all characters except "(" and ")"
           (None,EOF,Here,MatchOk),
           'nesting',
           ('group',SubTable+AppendMatch,
            ((None,Is,'(',MatchFail,+1), # Since we have looked ahead, collect "(" -sign
             (None,SubTableInList, (tables,0)), # Recurse
             )
            ),
           (None,Jump,To,'start')) # After recursion jump back to 'start'
    tables.append(tab) # Add tab to tables
    result, taglist, nextindex = tag(text,tab)
    print result, nextindex
    print taglist

#--- solution 2 starts (without limiting letters) ---
def test2():
    text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
    tab = ('start',
           (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
           (None, Is, '(', 'letters', +1),
           ('group', SubTable+AppendMatch, ThisTable), # Recurse
           (None, Skip, 1, MatchFail, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
           'letters',
           (None, AllNotIn, '()', +1, 'start'), # Collect all characters except "(" and ")"
           (None, EOF, Here, MatchOk),
           )
    result,taglist,nextindex = tag(text, tab)
    print result, nextindex
    print taglist

print 'Test 1:'
test1()
print
print 'Test 2:'
test2()
print
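
For an independent cross-check of the nesting these tables are meant to capture, a plain stack walk over the text can help. This is only a sketch: it does not use mxTextTools, nested_groups is a hypothetical helper, and its (start, end, children) tuples are an assumed format chosen for illustration, not the taglist format that tag() returns. It also rejects unbalanced input, which touches on the "equal amount of limiting letters" idea in the quoted message.

```python
def nested_groups(text, open_ch='(', close_ch=')'):
    """Return nested (start, end, children) tuples, one per balanced group.

    start is the index of the opening character, end is one past the
    closing character, so text[start:end] is the whole group.
    """
    root = []
    stack = [(None, root)]  # each frame: (index of opener, list of children)
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append((i, []))
        elif ch == close_ch:
            if len(stack) == 1:
                raise ValueError("unbalanced closing character at %d" % i)
            start, children = stack.pop()
            stack[-1][1].append((start, i + 1, children))
    if len(stack) != 1:
        raise ValueError("unbalanced opening character at %d" % stack[-1][0])
    return root


sample = "aa(AA)a((BB))"
groups = nested_groups(sample)
# groups[0] covers "(AA)", groups[1] covers "((BB))" with one child "(BB)"
```

Comparing the spans from such a helper with the slices reported in the taglist is a quick way to see whether a table change altered which groups are found.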
> I am using Windows 2000 professional, Python 2.2.1 and Winpython
> v148.
>
> -pekka-
>
>
> Pekka Niiranen wrote:
>
>
>>Thank you all for your help and inspiration! It is payback time ;)
>>
>>I have spent the past two months trying to create a parser that returns
>>strings limited by two different letters. The strings can be nested.
>>I considered recursive calls to regular expressions to be too slow
>>and decided to use mxTextTools 2.1 beta2 and the latest alpha of
>>Simpleparse 2.0.
>>
>>Below are three solutions I found.
>>Note that Simpleparse creates a different tag table from the ones I
>>found "manually".
>>
>>Further ideas to be implemented:
>>
>>1) Input of limiting letters as parameters (easy)
>>2) Unicode support
>>3) Test for an equal number of limiting letters before calling the
>>parser (will this speed up the solution?)
>>4) Parsing one line at a time without looping through the lines of the
>>text with "while" or "for"
>> (maybe "None, AllNotIn, '()\n'" )
>>
>>One development idea for mxTextTools:
>>
>>1) Instead of using a list of tables to recurse, would it be possible to
>>use a "global jump" to outside of the current table?
>>
>>--- solution 1 starts (with limiting letters)---
>>
>>from mx.TextTools import *
>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>tables = [] # used for recursion only
>>
>>tab = ('start',
>> (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
>> (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
>> (None,AllNotIn,'()',0,'start'), # Search all characters except "(" and ")"
>> 'nesting',
>> ('group',SubTable+AppendMatch,((None,Is,'(',0,+1), # Since we have looked ahead, collect "(" -sign
>> (None,SubTableInList,(tables,0)))), # Recurse
>> (None,Jump,To,'start')) # After recursion jump back to 'start'
>>
>>tables.append(tab) # Add tab to tables
>>
>>if __name__ == '__main__':
>>
>> result, taglist, nextindex = tag(text,tab)
>> print taglist
>>
>>--- solution 1 ends ---
>>
>>--- solution 2 starts (without limiting letters) ---
>>
>>from mx.TextTools import *
>>
>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>
>>tab = ('start',
>> (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
>> (None, Is, '(', 'letters', +1),
>> ('group', SubTable+AppendMatch, ThisTable), # Recurse
>> (None, Skip, 1, 0, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
>> 'letters',
>> (None, AllNotIn, '()', 0, 'start')) # Collect all characters except "(" and ")"
>>
>>result,taglist,next = tag(text, tab)
>>print taglist
>>
>>--- solution 2 ends ---
>>
>>--- solution 3 starts (Simpleparse solution) ---
>>
>>from simpleparse.parser import Parser
>>from mx.TextTools import *
>>
>>declaration = r'''
>>
>>>line< := (a/match)+
>>
>>match := '(', line, ')'
>><a> := -[()]
>>'''
>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>
>>parser = Parser(declaration)
>>success, children, nextcharacter = parser.parse(text, production =
>>"line")
>>print_tags(text,children)
>>
>>--- solution 3 ends ---
>>
>>-pekka-
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/