[egenix-users] Re: mxTextTools issues
M.-A. Lemburg
mal at egenix.com
Wed Jan 18 18:02:20 CET 2006
Scot Wilcoxon wrote:
> I tried to use mxTextTools in a pywikipedia bot (Python script for
> improving Wikipedia.org articles). I thought I'd point out several
> issues I noticed.
>
> 1. No co-nested patterns. I needed to handle several patterns which
> could be nested within other patterns, for example [[...]] could contain
> {{...}} and vice versa. Having parsing rules for those two patterns
> worked OK, but I could not seem to create a forward declaration so the
> parsing table which was declared first could refer to the second parsing
> table. (Creating a null or simple placeholder table just seemed to
> include that placeholder in the first table, rather than having it
> replaced by the second table once that was defined.)
You can have nesting by using nested tables. These then work
recursively, do rollback, etc.
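
As an illustration of the recursive idea only, here is a minimal sketch in plain Python. It does not use mxTextTools or tag tables; the function name parse_nested and the tree layout are hypothetical choices made for this example. One rule calls itself for whichever opener it meets, so [[...]] can contain {{...}} and vice versa without any forward declaration.

```python
# Plain-Python sketch, NOT mxTextTools: one recursive rule handles both
# [[...]] and {{...}}, so either construct may contain the other.
OPEN_TO_CLOSE = {"[[": "]]", "{{": "}}"}


def parse_nested(text, pos=0, closer=None):
    """Parse text from pos until closer (or end of text if closer is None).

    Returns (children, new_pos). Each child is either a plain string or a
    tuple (opener, children) for a nested construct.
    Raises ValueError if a construct is never closed.
    """
    children = []
    plain_start = pos
    while pos < len(text):
        two = text[pos:pos + 2]
        if closer is not None and two == closer:
            # End of the construct we were called for: flush pending text.
            if plain_start < pos:
                children.append(text[plain_start:pos])
            return children, pos + 2
        if two in OPEN_TO_CLOSE:
            # A nested construct starts here: flush pending text, recurse.
            if plain_start < pos:
                children.append(text[plain_start:pos])
            inner, pos = parse_nested(text, pos + 2, OPEN_TO_CLOSE[two])
            children.append((two, inner))
            plain_start = pos
        else:
            pos += 1
    if closer is not None:
        raise ValueError("missing %r" % closer)
    if plain_start < pos:
        children.append(text[plain_start:pos])
    return children, pos
```

Calling parse_nested("a [[b {{c}} d]] e") yields a tree in which the {{...}} node sits inside the [[...]] node. An unclosed construct raises ValueError instead of looping.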
> 2. Painful debugging. I realize that when mxTextTools was created,
> Python did not have the logging module available. It is hard to see
> how parsing is proceeding and to figure out why a parse pattern is not
> being recognized. It doesn't help that neither the tree-printing
> function nor how to invoke it is documented (I had to figure it out
> from the source code).
>
> 3. Infinite loops. Yes, I found the Charming Python jump_count loop
> detector.
mxTextTools uses a very low-level state machine to do the actual
parsing. That's why it's so fast, but it also makes programming
the patterns a bit cumbersome. That's the price to pay for
performance, I guess.
If you want low-level tracing, you'll have to compile mxTextTools
with debugging enabled. If you then run Python with the -d command
line flag, the module will create a log file, mxTextTools.log,
which contains a lot of detail.
Compiling the debug version is easy:
python setup.py mx_autoconf --enable-debugging install
The 2.1.0 version of mxTextTools makes writing tag tables a
lot easier by supporting jump target strings, and it is fully
Unicode aware.
This is our latest snapshot:
http://www.egenix.com/files/python/egenix-mx-base-2.1.0-2005-05-01.zip
> 4. The documentation for WordStart and WordEnd at one point (I don't
> remember where) does not make the difference between them apparent.
WordStart leaves the head on the first char of the word, WordEnd
on the last.
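
To make the difference concrete, here is a rough sketch in plain Python, not the mxTextTools commands themselves. The helper names word_start and word_end are hypothetical. The end position is reported as the slice index just past the word, which is an assumption of this sketch; the description above only says "on the last" character.

```python
# Plain-Python sketch, NOT the mxTextTools WordStart/WordEnd commands.
# Both helpers search for the next occurrence of word at or after pos.

def word_start(text, word, pos=0):
    """Return the index of the first character of the word, or None."""
    idx = text.find(word, pos)
    return None if idx < 0 else idx


def word_end(text, word, pos=0):
    """Return the slice index just past the word (assumption), or None."""
    idx = text.find(word, pos)
    return None if idx < 0 else idx + len(word)
```

With text = "foo bar baz", text[:word_start(text, "bar")] stops before the word, while text[:word_end(text, "bar")] includes it.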
> 5. Community. This email address is the only apparent contact point. A
> public forum of some sort might help make it apparent whether
> mxTextTools is a live or dead project, would allow people to help each
> other, and people with problems could search for similar previously
> solved problems.
egenix-users is our user mailing list; you can use that as a
forum. We usually listen to what our users have to say :-)
The archives are also indexed by Google, so they are easy to
search.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Jan 18 2006)