Note: This thread was originally started by Andrew ("abc"). abc wrote:
Quote: I have a couple of Python scripts for reading a large Evernote XML file and pasting its contents into an Ecco outline.
The trouble is that a great many characters are dropped when I try this. This has to be a flaw either in Python or in Ecco: the inserted items are truncated at the same place on successive runs, but that place is different for every item and doesn't correspond to any particular character in the original string, nor to any particular character position. In fact, it doesn't correspond to anything I can find.
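Rather than only checking whether an item came through intact, it can help to record exactly where each item stops matching. The sketch below assumes you already have the text you sent and the text read back from Ecco as two Unicode strings (the names `sent` and `received`, and the helper `first_divergence`, are hypothetical; reading the item back over DDE is not shown). It reports the first differing position, a little context, and the Unicode name of the character at that point.

```python
import unicodedata


def first_divergence(sent, received, context=10):
    """Return (index, snippet, char_name) for the first position where the
    two strings differ, or None if they are identical.

    `snippet` is up to `context` characters of `sent` on each side of the
    divergence; `char_name` is the Unicode name of the character in `sent`
    at that index, or 'END' if `sent` ran out first.
    """
    limit = min(len(sent), len(received))
    index = limit  # if no mismatch is found, one string is a prefix of the other
    for i in range(limit):
        if sent[i] != received[i]:
            index = i
            break
    if index == limit and len(sent) == len(received):
        return None  # no difference at all
    snippet = sent[max(0, index - context):index + context]
    if index < len(sent):
        char_name = unicodedata.name(sent[index], 'UNKNOWN')
    else:
        char_name = 'END'
    return index, snippet, char_name
```

Collecting these triples for several truncated items makes it easier to see whether the cut falls at a fixed length, just after a particular character, or somewhere else entirely.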
The DDE script is too long to fit into one of these messages, so I am uploading it as a text file.
Here is my main script, which is obviously smaller than the DDE class:
Code:
#!/usr/bin/env python
# acb script to extract plain text and tags from evernote
# and subsequently move it into an ecco file
# a cheap hack 23/8/07
#
import sys, re, unicodedata
sys.path.append('D:/andrew/python')  # to bring in classeccodde
import classeccodde
try:
    from xml.etree import ElementTree as ET   # bundled with Python 2.5 and later
except ImportError:
    from elementtree import ElementTree as ET  # separate package on older Pythons

wombats = ET.parse('D:/Andrew/ENExport.enx')
root = wombats.getroot()
notes = root.getiterator('NOTE')
e = classeccodde.Ecco()

# notes is now a list of all the notes in the file. Within each note there are
# plain text contents, and tags. Some of these will be encoded images, which I
# would like to match and throw away: any note whose text starts with 100
# characters containing no lower-case letters is taken to be an image.
image = re.compile('[^a-z]{100}')

# typographic characters and their ASCII stand-ins
REPLACEMENTS = [
    (u'\u201c', u'"'), (u'\u201d', u'"'),  # curly double quotes
    (u'\u2018', u"'"), (u'\u2019', u"'"),  # curly single quotes
    (u'\u2013', u'--'),   # OOO uses en dashes (2013)...
    (u'\u2014', u'--'),   # ...not Word's em dashes (2014)
    (u'\u2026', u'...'),  # ellipsis
]

def latin1(u):
    # encode for Ecco; characters outside Latin-1 become &#NNNN; references
    return u.encode('latin-1', 'xmlcharrefreplace')

def cleancrap(enstring):
    # Ideas not done here: two spaces could become a carriage return, and tabs
    # could become sub-items (tabs are now dealt with in tabtooutline in classeccodde).
    try:
        for old, new in REPLACEMENTS:
            enstring = enstring.replace(old, new)
        return enstring
    except UnicodeDecodeError, ee:
        # raised when a byte string with non-ASCII bytes is mixed with unicode
        print ee
        bad = ee.object[ee.start]
        print 'Cleancrap Error: Dudchar was', unicodedata.name(unichr(ord(bad)), 'unknown')
        return None

for note in notes:
    title = note.get('name')  # None for an untitled note
    # drop the separators from the creation timestamp, leaving YYYYMMDDhhmm
    rc = note.get('created')
    Eccodate = rc[:4] + rc[5:7] + rc[8:10] + rc[11:13] + rc[14:16]
    rawplaintext = note.find('CONTENTPLAIN').text or u''
    # register and throw away the images
    if image.match(rawplaintext):
        print 'discarding suspected image', latin1(title or u'')
        continue
    utext = cleancrap(rawplaintext)
    if utext is None:
        continue  # cleancrap has already reported the bad character
    utext = utext.strip()
    btext = latin1(utext)  # the body exactly as it will be sent to Ecco
    ENtags = note.find('.//NOTECATEGORIES')
    if ENtags is None:
        ENtags = []
    EccoTagIDs = []  # IDs of all the checkmark folders this note should belong to
    try:
        for tag in ENtags:
            tagname = cleancrap(tag.get('name') or u'')
            if tagname and tagname != 'Done':
                EccoTagIDs.append(e.createCheckmarkFolder(latin1(tagname)))
        if title:
            NewID = e.AddTLItoFolder(','.join(EccoTagIDs), latin1(title))
            if NewID:
                BodyID = e.AddTLItoFolder(EccoTagIDs[0], btext)
                # check the body item, which is the one that received btext
                if e.areConsistent(BodyID, btext):
                    print '%s OK: %s chars copied' % (btext[:30], len(btext))
                e.AddChild(NewID, BodyID)
                e.setNoteCreationTime(NewID, Eccodate)
        else:
            NewID = e.AddTLItoFolder(','.join(EccoTagIDs), btext)
            e.setNoteCreationTime(NewID, Eccodate)
    except UnicodeEncodeError, ee:
        print ee
        print 'Error in mainloop: dudchar was', unicodedata.name(ee.object[ee.start], 'unknown')
        break
    except Exception, ee:
        print 'Some error occurred with %r: stopping (%s)' % (title, ee)
        break

e.Beepoff()
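The chain of `replace` calls in `cleancrap` can also be expressed as one translation table, and it is worth seeing what `encode('latin-1', 'xmlcharrefreplace')` does to anything left over: each character outside Latin-1 becomes a numeric reference such as `&#8226;`, so the byte string sent to Ecco can be longer than the note text and will contain literal `&`, `#` and `;` characters. A minimal sketch (the `PUNCTUATION`, `clean` and `to_ecco` names are illustrative, not part of the original scripts; `unicode.translate` in Python 2 and `str.translate` in Python 3 both accept this kind of dict):

```python
# Code points mapped to ASCII replacements (the same pairs cleancrap uses).
PUNCTUATION = {
    0x201C: u'"',    # left double quotation mark
    0x201D: u'"',    # right double quotation mark
    0x2018: u"'",    # left single quotation mark
    0x2019: u"'",    # right single quotation mark
    0x2013: u'--',   # en dash (OpenOffice)
    0x2014: u'--',   # em dash (Word)
    0x2026: u'...',  # horizontal ellipsis
}


def clean(text):
    """Replace typographic punctuation with plain ASCII equivalents."""
    return text.translate(PUNCTUATION)


def to_ecco(text):
    """Clean, then encode as Latin-1; other characters become &#NNNN;."""
    return clean(text).encode('latin-1', 'xmlcharrefreplace')
```

For example, the eight-character string `u'caf\xe9 \u2022 x'` encodes to 14 bytes, because the bullet turns into `&#8226;`. When hunting for a truncation point, positions should therefore be counted in the encoded string, not in the original Unicode text.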
I skimmed the code:
1. You likely have it somewhere, but where do you convert the text to CSV format?
2. Also, please upload the included files, especially the DDE class.
abc replies:
Quote: I don't understand the first point. All of the actual DDE code is in the class I uploaded earlier, which is too large to fit into a message, and so was uploaded as a file here.
I am not sure how to link to it from BBCode. Does
http://www.eccomagic.com/forum/Attachments/classeccodde_py.txt work? (Apparently, yes.) It is straight Python code, though I had to rename it as a text file to upload it.
Andrew
Ecco needs text in CSV format, the key points being:
1) you enclose the text of the item in double quotes,
2) you leave no lone double-quote characters inside the item (convert each one to two apostrophes '' or to a doubled quote ""),
3) and you use commas to separate the fields (of course), all on one line.
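The three rules above can be sketched as a small quoting helper. This reads rule 2 as being about lone double-quote characters inside the item, which is how the `He said "Hi! Bob!!!"` example later in the thread uses it, and it assumes Ecco accepts either replacement the reply mentions. `quote_field` and `join_fields` are illustrative names, not part of Ecco's DDE interface.

```python
def quote_field(text, style='double'):
    """Enclose one item's text in double quotes for a CSV-style command.

    style='double'     -> each embedded " becomes "" (standard CSV escaping)
    style='apostrophe' -> each embedded " becomes '' (the alternative in the reply)
    """
    if style == 'double':
        body = text.replace('"', '""')
    elif style == 'apostrophe':
        body = text.replace('"', "''")
    else:
        raise ValueError('unknown style: %r' % (style,))
    return '"' + body + '"'


def join_fields(fields):
    """Separate the fields with commas, all on one line."""
    return ','.join(str(f) for f in fields)
```

Doubling the quote keeps the original character, while the apostrophe form changes the text Ecco stores; which one is preferable depends on whether the literal `"` matters in the outline.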
And abc replies:
Quote: Ah, I see: you want the Python libraries. I will see if I can attach them here, then. I promise you that the calls I use do work normally, and are what is recommended in the (feeble) Python documentation for the DDE module. The problem is that I get characters dropped, not that I can't send anything over ...
Thank you once more for taking the trouble to look at this. But if the flaw is in the Python libraries, I don't know what we can do. Poking around, it turns out I don't have the source code for them, only the demo scripts. The official Python client demo is:
Code:# 'Request' example added jjk 11/20/98
import win32ui
import dde
server = dde.CreateServer()
server.Create("TestClient")
conversation = dde.CreateConversation(server)
conversation.ConnectTo("RunAny", "RunAnyCommand")
conversation.Exec("DoSomething")
conversation.Exec("DoSomethingElse")
conversation.ConnectTo("RunAny", "ComputeStringLength")
s = 'abcdefghi'
sl = conversation.Request(s)
print 'length of "%s" is %s'%(s,sl)
which is not a whole lot of help. But, as I said, the script normally works.
1. Where is your CSV formatting routine? I.e., this line:
Code:ddestring=('%s,"%s",%s,%s') %('CreateItem', Text, FolderID,1)
isn't doing the job for you...
"Text" works great, but what happens with: He said "Hi! Bob!!!"
You get "He said "Hi! Bob!!!"" which, well, won't work!!!
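Applied to the `ddestring` line above, the fix is to escape the text before wrapping it in quotes. This is a minimal sketch that keeps the thread's field layout (command, quoted text, folder ID, trailing 1) and assumes Ecco parses the command like ordinary CSV with doubled quotes. `make_create_item`, `fields_of` and the folder ID `17` are hypothetical; Python's standard `csv` module is used only as an independent check of what a CSV reader would recover.

```python
import csv
import io


def make_create_item(text, folder_id, flag=1):
    """Build the CreateItem command with the item text safely quoted.

    Field layout copied from the ddestring line in the thread:
    command, "text", folder ID, then the trailing 1 (passed as `flag`).
    """
    escaped = text.replace('"', '""')  # double each embedded quote
    return '%s,"%s",%s,%s' % ('CreateItem', escaped, folder_id, flag)


def fields_of(line):
    """Parse one line with the standard csv module to see which fields
    a CSV reader would get back from it."""
    return next(csv.reader(io.StringIO(line)))


cmd = make_create_item('He said "Hi! Bob!!!"', 17)
print(cmd)             # CreateItem,"He said ""Hi! Bob!!!""",17,1
print(fields_of(cmd))  # ['CreateItem', 'He said "Hi! Bob!!!"', '17', '1']
```

The round trip shows the quoted text coming back as a single field with its inner quotes intact, which is what the unescaped version cannot guarantee.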