[egenix-users] More questions and PEPs: How to create nested search pattern with mxTextTools?

Mon Jun 10 00:43:51 CEST 2002

I have previously tried various approaches, but two problems remain:

1)    re.findall() also returns empty matches, which I have to filter out.
2)    I am looping through several files and collecting the results as a
       dictionary: ("filename", [list of parameters in this file]), so that
       when replacing parameters later I only touch those parameters that
       exist in a given file. This should speed up replacements compared to
       running a single huge merged list (like a sed script file) of all
       (parameter, replacement) pairs against all files in one go (i.e.
       searching for parameters that do not necessarily exist in the file
       currently being processed).
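
To make point 2 concrete, here is a minimal sketch of the per-file index and
the restricted replacement. The names PARAM, index_parameters and
replace_known are made up for illustration, the files are plain in-memory
strings, and the pattern assumes flat (non-nested) ?name! parameters only:

```python
import re

# Assumption: a parameter is a flat "?...!" token with no nesting.
PARAM = re.compile(r"\?[^?!]+!")

def index_parameters(files):
    """Map each file name to the sorted, unique parameters found in its text."""
    return {name: sorted(set(PARAM.findall(text)))
            for name, text in files.items()}

def replace_known(text, params, replacements):
    """Apply only the replacements whose parameter occurs in this file."""
    for param in params:
        if param in replacements:
            text = text.replace(param, replacements[param])
    return text

files = {"a.txt": "x ?foo! y ?bar! ?foo!", "b.txt": "no params"}
index = index_parameters(files)
replacements = {"?foo!": "1", "?zzz!": "2"}
new_a = replace_known(files["a.txt"], index["a.txt"], replacements)
```

The replacement loop runs once per parameter present in the file rather than
once per entry in the merged list, which is the saving I am after.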

The current bottleneck is building the [list of parameters in this file],
which leads to the following "PEPs":

a)    A parameter to re.findall() that restricts the search result to
       non-empty matches.
b)    An option in mxTextTools to return only the unique matches from the
       processed text as a list of strings.
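
Until something like a) and b) exists, both can be approximated in plain
Python on top of re.finditer(). This is only a sketch; the helper name
unique_nonempty is hypothetical and not part of re or mxTextTools:

```python
import re

def unique_nonempty(pattern, text):
    """Return non-empty matches of pattern in text, first occurrence only,
    in the order they appear."""
    seen = set()
    result = []
    for m in re.finditer(pattern, text):
        s = m.group(0)
        if s and s not in seen:   # drop empty matches and repeats
            seen.add(s)
            result.append(s)
    return result

params = unique_nonempty(r"[a-z]*", "ab1ab2cd")
```

The set makes the duplicate check cheap, and the separate list keeps the
matches in the order they were found.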

True, I can get rid of looping through single lines, but then I have to take
the contents of the previous line into account in the EBNF, as we discussed
before. I will try that in the future too.
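
As a point of comparison for the whole-file approach, here is a small
pure-Python scanner that walks the complete text once and keeps a stack of
open '?' positions, so a parameter that spans a line break needs no special
handling. It is not the mxTextTools or SimpleParse solution; nested_params is
a made-up name, and unlike the EBNF it does not check which characters appear
between the delimiters:

```python
def nested_params(text, open_ch="?", close_ch="!"):
    """Return every balanced open_ch...close_ch span, outermost first
    (ordered by start position)."""
    stack = []   # start indices of delimiters that are still open
    found = []   # (start, substring) pairs
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch and stack:
            start = stack.pop()
            found.append((start, text[start:i + 1]))
    found.sort(key=lambda pair: pair[0])
    return [s for _, s in found]

matches = nested_params("aa?BB?CC!DD!ee?FF!gg")
```

An unmatched '!' is ignored and an unclosed '?' simply yields nothing.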

In your post below, I only have to replace the integer numbers with
mxTextTools' command names and then manually add the "+AppendMatch" flag to
certain commands, right?
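
Without AppendMatch, the matched strings can still be pulled out of the
result table with a short recursive loop. The sketch below assumes each
entry has the shape (tag, left, right, subtags), with subtags being None or
a nested list, which is how I understand the taglist. The sample table is
written by hand for illustration and is not actual tagger output;
collect_tagged is a hypothetical helper:

```python
def collect_tagged(text, taglist, wanted="match"):
    """Walk a nested taglist and return the text slices tagged as wanted."""
    found = []
    for tag, left, right, subtags in taglist or []:
        if tag == wanted:
            found.append(text[left:right])
        # Descend into nested matches (subtags may be None).
        found.extend(collect_tagged(text, subtags, wanted))
    return found

text = "aa?BB?CC!DD!ee?FF!gg"
# Hand-written stand-in for a tagging result, not real output.
sample = [("match", 2, 12, [("match", 5, 9, None)]),
          ("match", 14, 18, None)]
strings = collect_tagged(text, sample)
```

Because the walk visits each entry before its children, outer matches come
before the matches nested inside them.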

-pekka-

"Mike C. Fletcher" wrote:

> Pekka, are you sure you're not optimising this app too early?  I mean,
> you're only dealing with 2MB files.  The simple loop over the results
> table isn't likely to be a time problem compared to your original
> solution with line-by-line regex runs.  If you code the whole grammar as
> an EBNF so you can process the whole file in a single call to
> TextTools.tag, you're not likely going to be able to detect the time
> required to do the parsing, and the looping approach to getting the
> results from the tagging results is pretty fast for most uses.
>
> As for the mxTextTools equivalent of the EBNF:
>
> mxDecl = []
> mxDecl.extend(
> [
>      (
>          (None, 207, ((None, 204, (mxDecl, 2)),
>                       (None, 207, (('match', 204, (mxDecl, 1)),
>                                    (None, 204, (mxDecl, 2))), 1, 0))),),
>      (
>          (None, 207, (
>              (None, 21, '?'),
>              (None, 204,(mxDecl, 2)),
>              (None, 207, (('match', 204, (mxDecl, 1)),
>                           (None, 204, (mxDecl, 2))),1, 0),
>              (None, 21, '!'))),),
>      (
>          (None, 207, (
>              (None, 11,
>               '-_abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ:. ',
>               1),)),)
> ]
> )
>
> You can find it in the generator's tupleset attribute:
>
>         table = generator.buildParser(
>                 declaration
>         ).tupleset
>
> I've never felt the need to add AppendMatch, but the code is available
> if someone wants to add the syntax to the EBNF grammar and the
> objectgenerator code.
>
> Enjoy,
> Mike
>
> Pekka Niiranen wrote:
> > I am searching for nested strings that are delimited by (and include)
> > ? and ! signs.
> >
> > I have created a nested EBNF search pattern with the SimpleParse 1.0
> > module for mxTextTools. The problem is: I am interested only in the
> > matched strings and cannot use the "AppendMatch" flag (true or not?)
> > with SimpleParse.
> >
> > What is the mxTextTools equivalent of the EBNF notation below:
> >
> > declaration := a,(match,a)*
> > match := '?',a,(match,a)*,'!'
> > <a> := [-_a-z0-9A-Z:. ]*        #Not returning this line
> >
> > For the string "aa?BB?CC!DD!ee?FF!gg"
> > it should return:
> >
> > [?BB?CC!DD!, ?CC!, ?FF!]
> >
> > Any help appreciated,
> >
> >     -pekka-
>
> --
> _______________________________________
>    Mike C. Fletcher
>    http://members.rogers.com/mcfletch/