[egenix-users] More questions and PEPs: How to create nested search pattern with
mxTextTools?
Pekka Niiranen
krissepu at vip.fi
Mon Jun 10 00:43:51 CEST 2002
I previously tried various approaches, but two problems remain:
1) re.findall() also returns empty matches, which I have to clean out.
2) I am looping through several files and storing the results as a dictionary:
("filename", [list of parameters in this file]), so that when replacing
parameters later I only access those parameters that exist in a file.
This should speed up replacements compared to running a single huge
merged list (like a sed -f script) of all (parameter, replacement) pairs
against all files in one go (i.e. searching for parameters that do not
necessarily exist in the file currently being processed).
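A minimal sketch of the per-file index described in 2). It assumes flat, non-nested ?name! parameters and in-memory texts instead of real files; the names PARAM, index_parameters and apply_replacements are made up for illustration only.

```python
import re

# Assumption: parameters are flat ?name! strings without nesting.
PARAM = re.compile(r"\?[^?!]+!")


def index_parameters(texts):
    """Map each file name to the sorted unique parameters found in its text."""
    return {name: sorted(set(PARAM.findall(text))) for name, text in texts.items()}


def apply_replacements(texts, index, replacements):
    """Replace, per file, only the parameters that the index lists for that file."""
    result = {}
    for name, text in texts.items():
        for parameter in index[name]:
            if parameter in replacements:
                text = text.replace(parameter, replacements[parameter])
        result[name] = text
    return result


texts = {"a.txt": "x=?A! y=?B!", "b.txt": "z=?B!"}
index = index_parameters(texts)
replacements = {"?A!": "1", "?B!": "2", "?C!": "3"}
print(apply_replacements(texts, index, replacements))
```

Here "?C!" is never looked up in either text, because no file's index contains it. With nested parameters the order of replacement would matter, which this sketch does not handle.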
The current bottleneck is building the [list of parameters in this file],
which leads to the following "PEPs":
a) A parameter for re.findall() that allows only non-empty matches in the
search result.
b) An option for mxTextTools to return only the unique matches from the
processed text as a list of strings.
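Until something like a) or b) exists, both can be approximated in plain Python. The sketch below is only a workaround, not an existing option of re or mxTextTools; the helper name unique_nonempty_matches is invented here, and it assumes the pattern has at most one group, so that re.findall() returns strings and not tuples.

```python
import re


def unique_nonempty_matches(pattern, text):
    """Return the non-empty matches of pattern in text, without duplicates,
    in the order in which they first appear."""
    seen = set()
    result = []
    for match in re.findall(pattern, text):
        # re.findall() yields '' wherever the pattern matches zero characters.
        if match and match not in seen:
            seen.add(match)
            result.append(match)
    return result


print(unique_nonempty_matches(r"[a-z]*", "ab 12 ab cd"))
# ['ab', 'cd']
```

Where the pattern allows it, writing + instead of * avoids the empty matches in the first place.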
True, I can get rid of looping through single lines, but then I have to take
the contents of the previous line into account in the EBNF, as we discussed
before. I will try that in the future too.
In your post below, I only have to replace the integers with mxTextTools'
command names and then manually add the "+AppendMatch" flag to certain
commands, right?
-pekka-
"Mike C. Fletcher" wrote:
> Pekka, are you sure you're not optimising this app too early? I mean,
> you're only dealing with 2MB files. The simple loop over the results
> table isn't likely to be a time problem compared to your original
> solution with line-by-line regex runs. If you code the whole grammar as
> an EBNF so you can process the whole file in a single call to
> TextTools.tag, you're not likely to be able to detect the time
> required to do the parsing, and the looping approach to extracting the
> matches from the tagging results is pretty fast for most uses.
>
> As for the mxTextTool equivalent of the EBNF:
>
> mxDecl = []
> mxDecl.extend(
>     [
>         # [0] declaration := a,(match,a)*
>         (
>             (None, 207, ((None, 204, (mxDecl, 2)),
>                          (None, 207, (('match', 204, (mxDecl, 1)),
>                                       (None, 204, (mxDecl, 2))), 1, 0))),),
>         # [1] match := '?',a,(match,a)*,'!'
>         (
>             (None, 207, (
>                 (None, 21, '?'),
>                 (None, 204, (mxDecl, 2)),
>                 (None, 207, (('match', 204, (mxDecl, 1)),
>                              (None, 204, (mxDecl, 2))), 1, 0),
>                 (None, 21, '!'))),),
>         # [2] <a> := [-_a-z0-9A-Z:. ]*
>         (
>             (None, 207, (
>                 (None, 11,
>                  '-_abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ:. ',
>                  1),)),)
>     ]
> )
>
> You can find it in the generator's tupleset attribute:
>
> table = generator.buildParser(
> declaration
> ).tupleset
>
> I've never felt the need to add AppendMatch, but the code is available
> if someone wants to add the syntax to the EBNF grammar and the
> objectgenerator code.
>
> Enjoy,
> Mike
>
> Pekka Niiranen wrote:
> > I am searching for nested strings that are delimited by (and include)
> > ? and ! signs.
> >
> > I have created a nested EBNF search pattern with the SimpleParse-1.0 module
> > for mxTextTools. The problem is: I am interested only in the matched strings
> > and cannot use the flag (true or not?) "AppendMatch" with SimpleParse.
> >
> > What is the mxTextTool equivalent of the EBNF -notation below:
> >
> > declaration := a,(match,a)*
> > match := '?',a,(match,a)*,'!'
> > <a> := [-_a-z0-9A-Z:. ]* #Not returning this line
> >
> > in case of a string "aa?BB?CC!DD!ee?FF!gg"
> > it should return:
> >
> > [?BB?CC!DD!, ?CC!, ?FF!]
> >
> > Any help appreciated,
> >
> > -pekka-
> >
> >
> >
> > _______________________________________________________________________
> > eGenix.com User Mailing List http://www.egenix.com/
> > http://lists.egenix.com/mailman/listinfo/egenix-users
> >
>
> --
> _______________________________________
> Mike C. Fletcher
> http://members.rogers.com/mcfletch/
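
For comparison, the result asked for in the original question can also be reproduced without mxTextTools or SimpleParse by a simple stack-based scan. This is only a sketch and not the tag table discussed above: the function name nested_matches is made up, it does not check the character class of the grammar's <a> rule, and it silently ignores unbalanced ? or ! signs, where the grammar would fail to match.

```python
def nested_matches(text, opener="?", closer="!"):
    """Return every opener...closer span in text, ordered by start position."""
    open_positions = []  # stack of indexes of openers that are not yet closed
    spans = []
    for index, char in enumerate(text):
        if char == opener:
            open_positions.append(index)
        elif char == closer and open_positions:
            # A closer always pairs with the most recent unclosed opener.
            start = open_positions.pop()
            spans.append((start, index + 1))
    spans.sort()  # inner spans close first, so restore start order
    return [text[start:end] for start, end in spans]


print(nested_matches("aa?BB?CC!DD!ee?FF!gg"))
# ['?BB?CC!DD!', '?CC!', '?FF!']
```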