[egenix-users] PROBLEM: eGenix.com mxBASE beta 3
Pekka Niiranen
krissepu at vip.fi
Mon Jul 29 23:23:33 CEST 2002
Fine,
but the line:
(None,EOF,Here,MatchOk)
will make text = "aa(AA" match too. If I analysed it correctly,
it is because EOF always matches. Would it be possible
to add an mxTextTools parameter that makes EOF cause a failure when necessary?
Something like: "if EOF is encountered here, fail the whole subgroup"?
-pekka-
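A minimal pure-Python sketch of the requested behaviour, assuming nothing about mxTextTools internals: the hypothetical `scan_groups` function mirrors the recursive table ("(" opens a nested group, ")" closes it, other characters are skipped), and the hypothetical `fail_on_eof` flag stands in for the proposed parameter. With the flag off, reaching the end of the text inside an open group is accepted, as Pekka reports for `(None,EOF,Here,MatchOk)`; with it on, the unterminated group fails.

```python
def scan_groups(text, fail_on_eof=False):
    """Hypothetical helper (not part of mx.TextTools).

    Returns (ok, groups) where each group is a (start, stop, children) span
    covering the group's parentheses.
    """
    def scan(pos, depth):
        groups = []
        while pos < len(text):
            ch = text[pos]
            if ch == '(':
                # Recurse into the nested group after consuming "(".
                ok, stop, children = scan(pos + 1, depth + 1)
                if not ok:
                    return False, pos, groups
                groups.append((pos, stop, children))
                pos = stop
            elif ch == ')':
                # Consume ")" and return from this level (like MatchOk).
                return True, pos + 1, groups
            else:
                pos += 1  # skip characters other than "(" and ")"
        # End of text: inside an open group, the flag decides the outcome.
        if depth > 0 and fail_on_eof:
            return False, pos, groups
        return True, pos, groups

    ok, _, groups = scan(0, 0)
    return ok, (groups if ok else [])  # report no groups when the scan fails


print(scan_groups("aa(AA"))                    # (True, [(2, 5, [])])
print(scan_groups("aa(AA", fail_on_eof=True))  # (False, [])
```

The spans are only roughly comparable to the 'group' entries in an mx.TextTools taglist; the sketch illustrates the control flow, not the library's output format.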
"M.-A. Lemburg" wrote:
> Pekka Niiranen wrote:
> > Hi,
> >
> > I tried the latest beta 3 by:
> >
> > a) compiling it myself from sources and
> > b) installing from the precompiled package for python v2.2
> >
> > Of the scripts below, only the one that uses Simpleparse returns
> > anything.
> > The others run without errors, but return [].
> >
> > They all run OK with beta 2, though.
>
> If they did, then you've hit a bug in beta2. Here are the corrected
> versions. Note that the problem was with the EOF handling. If AllNotIn
> doesn't match at least one char it'll fail, and using 0 as the jne offset
> has the same effect as MatchFail.
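A toy interpreter makes these jump rules concrete. It is a hypothetical sketch, not the mxTextTools engine: it handles only flat tables with the commands `Is`, `AllNotIn` and `EOF` (written as strings), uses string sentinels for MatchOk and MatchFail, requires all five entry fields `(tag, command, arg, jne, je)`, and applies the two rules stated above: AllNotIn fails unless it consumes at least one character, and a failing command whose jne offset is 0 fails the table like MatchFail.

```python
# Toy stand-ins for mx.TextTools' MatchOk/MatchFail (hypothetical, not the real constants).
MATCH_OK, MATCH_FAIL = 'MatchOk', 'MatchFail'


def run_table(text, table):
    """Interpret a flat toy tag table; return (success, taglist, next_position).

    Entries are 5-tuples (tag, command, arg, jne, je); plain strings are labels.
    Only the commands 'Is', 'AllNotIn' and 'EOF' are supported.
    """
    labels, entries = {}, []
    for item in table:
        if isinstance(item, str):
            labels[item] = len(entries)  # a label names the entry that follows it
        else:
            entries.append(item)

    i, pos, taglist = 0, 0, []
    while i < len(entries):
        tag, command, arg, jne, je = entries[i]
        start = pos
        if command == 'Is':
            matched = pos < len(text) and text[pos] == arg
            if matched:
                pos += 1
        elif command == 'AllNotIn':
            while pos < len(text) and text[pos] not in arg:
                pos += 1
            matched = pos > start  # fails unless at least one char matched
        elif command == 'EOF':
            matched = pos == len(text)
        else:
            raise ValueError('unsupported command: %r' % (command,))

        if matched and tag is not None:
            taglist.append((tag, start, pos))
        jump = je if matched else jne
        if jump == MATCH_OK:
            return True, taglist, pos
        if jump == MATCH_FAIL or (not matched and jump == 0):
            return False, [], pos  # a jne of 0 behaves like MatchFail
        i = labels[jump] if isinstance(jump, str) else i + jump
        if i < 0:
            raise ValueError('jump before the start of the table')
    return True, taglist, pos  # running off the end of the table is success


beta2_style = ('start',
               ('text', 'AllNotIn', '()', 0, 'start'))   # jne 0: fails at EOF
fixed = ('start',
         ('text', 'AllNotIn', '()', +1, 'start'),        # jne +1: fall through
         (None, 'EOF', None, MATCH_OK, +1))              # at EOF, je +1 ends the table
print(run_table("aa", beta2_style))  # (False, [], 2)
print(run_table("aa", fixed))        # (True, [('text', 0, 2)], 2)
```

With jne 0, AllNotIn failing at the end of "aa" fails the whole table, so nothing is returned; with +1 it falls through to the EOF entry and the table succeeds.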
>
> #--- solution 1 starts (with limiting letters)---
>
> from mx.TextTools import *
>
> def test1():
>
>     text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>
>     tables = [] # used for recursion only
>
>     tab = ('start',
>         (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
>         (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
>         (None,AllNotIn,'()',+1,'start'), # Search all characters except "(" and ")"
>         (None,EOF,Here,MatchOk),
>         'nesting',
>         ('group',SubTable+AppendMatch,
>             ((None,Is,'(',MatchFail,+1), # Since we have looked ahead, collect "(" -sign
>              (None,SubTableInList, (tables,0)), # Recurse
>             )
>         ),
>         (None,Jump,To,'start')) # After recursion jump back to 'start'
>
>     tables.append(tab) # Add tab to tables
>
>     result, taglist, nextindex = tag(text,tab)
>     print result, nextindex
>     print taglist
>
> #--- solution 2 starts (without limiting letters) ---
>
> def test2():
>
>     text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>
>     tab = ('start',
>         (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
>         (None, Is, '(', 'letters', +1),
>         ('group', SubTable+AppendMatch, ThisTable), # Recurse
>         (None, Skip, 1, MatchFail, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
>         'letters',
>         (None, AllNotIn, '()', +1, 'start'), # Collect all characters except "(" and ")"
>         (None, EOF, Here, MatchOk),
>         )
>
>     result,taglist,nextindex = tag(text, tab)
>     print result, nextindex
>     print taglist
>
> print 'Test 1:'
> test1()
> print
>
> print 'Test 2:'
> test2()
> print
>
> > I am using Windows 2000 professional, Python 2.2.1 and Winpython
> > v148.
> >
> > -pekka-
> >
> >
> > Pekka Niiranen wrote:
> >
> >
> >>Thank you all for your help and inspiration! It is payback time ;)
> >>
> >>For the past two months I have tried to create a parser that returns
> >>strings delimited by two different characters. The strings can be nested.
> >>I considered calling a regular expression recursively to be too slow
> >>and decided to use mxTextTools 2.1 beta2 and the latest alpha of
> >>Simpleparse 2.0.
> >>
> >>Below are three solutions I found.
> >>Note that Simpleparse creates a different tag table than the ones I
> >>found "manually".
> >>
> >>Further ideas to be implemented:
> >>
> >>1) Input of limiting letters as parameters (easy)
> >>2) Unicode support
> >>3) Test for an equal number of opening and closing limiting letters before
> >>calling the parser (will this speed up the solution?)
> >>4) Parsing one line at a time without looping through the lines of the text
> >>with "while" or "for"
> >>(maybe "None, AllNotIn, '()\n'")
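Ideas 1 and 3 can be combined in a small pure-Python pre-check. The name `delimiters_balanced` and its parameters are hypothetical, and the delimiters are assumed to be two distinct single characters. Equal counts alone would still accept input such as ")(", so the sketch also tracks nesting depth. It adds one extra pass over the text, so it is only likely to save time when unbalanced input is common.

```python
def delimiters_balanced(text, open_ch='(', close_ch=')'):
    """Hypothetical pre-check: True if open_ch/close_ch are properly nested in text."""
    if text.count(open_ch) != text.count(close_ch):
        return False  # cheap test from idea 3: unequal counts can never balance
    depth = 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                return False  # a closer appeared before its opener
    return depth == 0


print(delimiters_balanced("aa(AA)a((BB))aa"))  # True
print(delimiters_balanced("aa(AA"))            # False
print(delimiters_balanced("x[y[z]]", '[', ']'))  # True
```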
> >>
> >>One development idea for mxTextTools:
> >>
> >>1) Instead of using a list of tables to recurse, would it be possible to
> >>use a "global jump" out of the current table?
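Solution 1 needs the `tables` list because a tag table is a Python tuple and cannot refer to itself while it is being built; the list adds one level of indirection that `tables.append(tab)` fills in afterwards, so `(tables, 0)` then resolves to the table itself. Solution 2 sidesteps this with `ThisTable`. The plain-Python sketch below shows only the indirection, with the string 'SubTableInList' standing in for the real command constant (an assumption made so the snippet runs without mx installed).

```python
tables = []  # empty until the table exists

# The string stands in for mx.TextTools' SubTableInList constant.
tab = ('start',
       ('group', 'SubTableInList', (tables, 0)))

tables.append(tab)  # fill the slot the table points at

table_list, index = tab[1][2]
print(table_list[index] is tab)  # True: the table reaches itself via the list
```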
> >>
> >>--- solution 1 starts (with limiting letters)---
> >>
> >>from mx.TextTools import *
> >>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
> >>tables = [] # used for recursion only
> >>
> >>tab = ('start',
> >>    (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
> >>    (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
> >>    (None,AllNotIn,'()',0,'start'), # Search all characters except "(" and ")"
> >>    'nesting',
> >>    ('group',SubTable+AppendMatch,((None,Is,'(',0,+1), # Since we have looked ahead, collect "(" -sign
> >>                                   (None,SubTableInList,(tables,0)))), # Recurse
> >>    (None,Jump,To,'start')) # After recursion jump back to 'start'
> >>
> >>tables.append(tab) # Add tab to tables
> >>
> >>if __name__ == '__main__':
> >>
> >>    result, taglist, nextindex = tag(text,tab)
> >>    print taglist
> >>
> >>--- solution 1 ends ---
> >>
> >>--- solution 2 starts (without limiting letters) ---
> >>
> >>from mx.TextTools import *
> >>
> >>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
> >>
> >>tab = ('start',
> >>    (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
> >>    (None, Is, '(', 'letters', +1),
> >>    ('group', SubTable+AppendMatch, ThisTable), # Recurse
> >>    (None, Skip, 1, 0, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
> >>    'letters',
> >>    (None, AllNotIn, '()', 0, 'start')) # Collect all characters except "(" and ")"
> >>
> >>result,taglist,next = tag(text, tab)
> >>print taglist
> >>
> >>--- solution 2 ends ---
> >>
> >>--- solution 3 starts (Simpleparse solution) ---
> >>
> >>from simpleparse.parser import Parser
> >>from mx.TextTools import *
> >>
> >>declaration = r'''
> >>
> >>>line< := (a/match)+
> >>
> >>match := '(', line, ')'
> >><a> := -[()]
> >>'''
> >>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
> >>
> >>parser = Parser(declaration)
> >>success, children, nextcharacter = parser.parse(text, production="line")
> >>print_tags(text,children)
> >>
> >>--- solution 3 ends ---
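The SimpleParse declaration maps directly onto a recursive-descent parser. The hand-written sketch below uses hypothetical function names and is not SimpleParse's implementation; it follows the three productions `line := (a/match)+`, `match := '(', line, ')'` and the unreported `<a> := -[()]`. Because `>line<` is expanded, each 'match' node holds the nested 'match' nodes directly, as `(tagname, start, stop, children)` tuples similar in shape to SimpleParse results. Since `match` requires the closing ")", an unterminated group is simply not matched.

```python
def parse_line(text, pos=0):
    """line := (a/match)+  Return (stop, children) or None if nothing matched."""
    start = pos
    children = []
    while pos < len(text):
        if text[pos] not in '()':       # <a> := -[()]  (unreported, so no node)
            pos += 1
            continue
        node = parse_match(text, pos)   # try the match alternative
        if node is None:
            break
        children.append(node)
        pos = node[2]
    if pos == start:
        return None                     # '+' needs at least one repetition
    return pos, children


def parse_match(text, pos):
    """match := '(', line, ')'  Return a (tagname, start, stop, children) node or None."""
    if pos >= len(text) or text[pos] != '(':
        return None
    inner = parse_line(text, pos + 1)
    if inner is None:
        return None
    stop, children = inner
    if stop >= len(text) or text[stop] != ')':
        return None                     # no closing ")" (e.g. EOF reached first)
    # >line< is expanded, so its children become the match node's children.
    return ('match', pos, stop + 1, children)


print(parse_line("aa(AA)a((BB))aa"))
print(parse_line("aa(AA"))   # (2, []): the unclosed group is not matched
```

Comparing the returned stop position with len(text) tells whether the whole input was consumed.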
> >>
> >>-pekka-
> >
> >
> >
> > _______________________________________________________________________
> > eGenix.com User Mailing List http://www.egenix.com/
> > http://lists.egenix.com/mailman/listinfo/egenix-users
>
> --
> Marc-Andre Lemburg
> CEO eGenix.com Software GmbH
> _______________________________________________________________________
> eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
> Python Consulting: http://www.egenix.com/
> Python Software: http://www.egenix.com/files/python/
>