[egenix-users] A question from a mx.TextTools newbie
M.-A. Lemburg
mal at egenix.com
Fri Mar 6 23:38:40 CET 2009
On 2009-03-06 23:34, Wenping Wang wrote:
> Thanks Marc-Andre for your reply.
>
> Your explanation for why the 1st approach fails makes sense to me. However, following the same logic, should one expect that the 3rd approach would fail too since there is NO EOF testing there? That the 1st fails & the 3rd succeeds is puzzling.
The 3rd example fails as well (i.e. it doesn't match):
>>> #--- 3rd approach: success
... word = []
>>> def emiter(tl,txt,l,r,s):
...     word.append(txt[l:r])
...
>>> words = ((emiter, AllIn+CallTag, alpha),
...          (None, AllIn, whitespace, MatchFail, -1))
>>> tag('this and that', words)
(0, [], 13)
However, the tagging engine has no control over what you do in a
tag function, so it cannot undo the appends you did in there.
>>> word
['this', 'and', 'that']
That's why you still see a result.
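
To make the point concrete without needing mx.TextTools installed, here is a rough pure-Python sketch of the same situation. It is only a toy model, not the tagging engine or its API: the names all_in, scan_words, check_eof and collect_words are made up for illustration, and the loop merely imitates the "word, then whitespace, then jump back" tables discussed in this thread.

```python
import string

def all_in(text, pos, charset):
    """Return the end of the run of charset characters at pos, or None if empty."""
    end = pos
    while end < len(text) and text[end] in charset:
        end += 1
    return end if end > pos else None

def scan_words(text, on_word, check_eof=False):
    """Toy loop: match a word, then whitespace, then start over.

    Returns (1, pos) on success and (0, pos) on failure.
    Without check_eof the loop has no way to stop successfully.
    """
    pos = 0
    while True:
        end = all_in(text, pos, string.ascii_letters)
        if end is None:
            return 0, pos            # word step failed
        on_word(text, pos, end)      # side effect happens right away
        pos = end
        end = all_in(text, pos, string.whitespace)
        if end is not None:
            pos = end
        elif not check_eof:
            return 0, pos            # whitespace step failed at end of text
        if check_eof and pos == len(text):
            return 1, pos            # explicit end-of-text test succeeded

def collect_words(text, check_eof=False):
    """Buffer the callback results and keep them only if the scan succeeded."""
    pending = []
    ok, pos = scan_words(text, lambda t, l, r: pending.append(t[l:r]), check_eof)
    return (pending if ok else []), ok, pos

# Callback appends survive even though the scan itself reports failure.
seen = []
ok, pos = scan_words('this and that', lambda t, l, r: seen.append(t[l:r]))
print(ok, pos, seen)

# Buffering discards the words on failure and keeps them on success.
print(collect_words('this and that'))
print(collect_words('this and that', check_eof=True))
```

The first call mirrors what happens above: the scan fails at the end of the text, yet the list filled by the callback is left untouched. If the collected data should only be trusted for a successful match, buffer it locally and check the result flag before using it, as collect_words does.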
> Best,
> Wenping Wang
>
>
>
>
> ________________________________
> From: M.-A. Lemburg <mal at egenix.com>
> To: Wenping Wang <wenping0 at yahoo.com>
> Cc: egenix-users at egenix.com
> Sent: Friday, March 6, 2009 5:14:16 PM
> Subject: Re: [egenix-users] A question from a mx.TextTools newbie
>
> On 2009-03-06 19:21, Wenping Wang wrote:
>> Hello Friends,
>>
>> I'm new to mx.TextTools & am learning this powerful tool by reading David Mertz's excellent book "Text Processing in Python". Naturally I tried the examples in the book. One particular example concerning mx.TextTools is the one under "mx.TextTools.AppendTagobj" on Page 307.
>>
>> Unfortunately, that example from David's book doesn't work. I was able to tweak the code to make it work. But I was puzzled why it doesn't work. I attach the segment of codes here.
>>
>> #--- example code starts
>> from mx.TextTools import *
>>
>> #--- 1st approach: failure
>> words = (('word', AllIn+AppendTagobj, alpha),
>>          (None, AllIn, whitespace, MatchFail, -1))
>> tag('this and that', words)
>>
>> #--- 2nd approach: success
>> words = (('word', AllIn, alpha),
>>          (None, AllIn, whitespace, +1),
>>          (None,EOF,Here,-2))
>> tag('this and that', words)
>>
>> #--- 3rd approach: success
>> word = []
>> def emiter(tl,txt,l,r,s):
>>     word.append(txt[l:r])
>>
>> words = ((emiter, AllIn+CallTag, alpha),
>>          (None, AllIn, whitespace, MatchFail, -1))
>> tag('this and that', words)
>> #--- example code ends
>>
>> Can someone provide me some insight why the 1st approach doesn't work while the 2nd & 3rd work? BTW, I'm using mx.TextTools 3.1.2 on Windows.
>
> It fails because the tagging engine pointer moves beyond the
> last 't' in the string and there is no EOF check in the tag
> table. AllIn will never succeed on EOF.
>
> BTW: mxTextTools has support for jump labels, so it's usually
> better to use those (the compiler will then convert these to offsets),
> e.g.
>
> from mx.TextTools import *
>
> words = (
>     'parse_word',
>     ('word', AllIn, alpha, 'parse_whitespace'),
>     'parse_whitespace',
>     (None, AllIn, whitespace, 'test_for_eof', 'parse_word'),
>     'test_for_eof',
>     (None, EOF, Here, 'parse_word'),
>     )
>
> print tag('this and that', words)
>
> --
> Marc-Andre Lemburg
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Mar 06 2009)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/