Das deutsche Python-Forum

Hello everyone,
I have put two files in the same folder, hyphenator.py and translate.py. Now I would like to use the class Hyphenator from hyphenator.py. How do I import the class?

Code:

from hyphenator import Hyphenator
h = Hyphenator("hyph_de_CH.dic")

but I get this message:

Code:

Traceback (most recent call last):
  File "<string>", line 238, in <fragment>
Syntax Error:         print i: hyphenator.py, line 23815

The import itself worked; there is simply an error in hyphenator.py. You aren't by any chance using Python 3, where print is no longer a statement?
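
To illustrate the Python 3 point: a minimal sketch, assuming a Python 3 interpreter, that compiles the Python 2 style line from the traceback and its function-call form. The helper name `compiles` is made up for this example.

```python
def compiles(source):
    """Return True if the source compiles under the running interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Python 2 print statement: rejected by the Python 3 compiler
old_style = compiles("print i")
# Python 3 print function call: compiles (nothing is executed here)
new_style = compiles("print(i)")
print(old_style, new_style)
```

Under Python 3 the first form is rejected and the second is accepted, so changing `print i` to `print(i)` in hyphenator.py is the kind of fix that is needed.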

Exactly, I'm using Python 3 and that was the error.
One more question: what is a <huge dict 0xcf90c0> dictionary?

kostonstyle wrote: One more question: what is a <huge dict 0xcf90c0> dictionary?

Where does that show up?

@Error: If the code snippet shown really does call print(), you should change that.

When importing and creating objects, printing makes little sense and is rather unwanted.

see image

Hyperion wrote: @Error: If the code snippet shown really does call print(), you should change that. When importing and creating objects, printing makes little sense and is rather unwanted.

That has nothing to do with it. Syntax errors are always detected, even if the code is never executed.
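
A small sketch of that behaviour, assuming Python 3: the invalid line sits inside a function that is never called, yet compiling the module source already fails. The module source below is invented for the demonstration.

```python
source = '''
def never_called():
    print i   # Python 2 syntax, invalid in Python 3

x = 1
'''

try:
    compile(source, "<example>", "exec")
    error_line = None
except SyntaxError as exc:
    # The compiler rejects the whole module before any statement runs
    error_line = exc.lineno

print(error_line)
```

This is why the import fails even though the offending print sits deep inside hyphenator.py and might never be reached at runtime.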

I was referring more to the source code ...

It just looks like a data type! If you want to know more, you should look at how "patterns" is initialized.

jbs wrote:
Hyperion wrote: @Error: If the code snippet shown really does call print(), you should change that. When importing and creating objects, printing makes little sense and is rather unwanted.
That has nothing to do with it. Syntax errors are always detected, even if the code is never executed.

True - sorry

The code looks like this

Code:

"""

This is a Pure Python module to hyphenate text.

It is inspired by Ruby's Text::Hyphen, but currently reads standard *.dic files,
that must be installed separately.

In the future it's maybe nice if dictionaries could be distributed together with
this module, in a slightly prepared form, like in Ruby's Text::Hyphen.

Wilbert Berendsen, March 2008
info@wilbertberendsen.nl

License: LGPL.

"""

import sys
import re

__all__ = ("Hyphenator",)

# cache of per-file Hyph_dict objects
hdcache = {}

# precompile some stuff
parse_hex = re.compile(r'\^{2}([0-9a-f]{2})').sub
parse = re.compile(r'(\d?)(\D?)').findall

def hexrepl(matchObj):
    return chr(int(matchObj.group(1), 16))  # chr replaces unichr in Python 3


class parse_alt(object):
    """
    Parse nonstandard hyphen pattern alternative.
    The instance returns a special int with data about the current position
    in the pattern when called with an odd value.
    """
    def __init__(self, pat, alt):
        alt = alt.split(',')
        self.change = alt[0]
        if len(alt) > 2:
            self.index = int(alt[1])
            self.cut = int(alt[2]) + 1
        else:
            self.index = 1
            self.cut = len(re.sub(r'[\d\.]', '', pat)) + 1
        if pat.startswith('.'):
            self.index += 1

    def __call__(self, val):
        self.index -= 1
        val = int(val)
        if val & 1:
            return dint(val, (self.change, self.index, self.cut))
        else:
            return val


class dint(int):
    """
    Just an int some other data can be stuck to in a data attribute.
    Call with ref=other to use the data from the other dint.
    """
    def __new__(cls, value, data=None, ref=None):
        obj = int.__new__(cls, value)
        if ref and type(ref) == dint:
            obj.data = ref.data
        else:
            obj.data = data
        return obj


class Hyph_dict(object):
    """
    Reads a hyph_*.dic file and stores the hyphenation patterns.
    Parameters:
    -filename : filename of hyph_*.dic to read
    """
    def __init__(self, filename):
        self.patterns = {}
        f = open(filename)
        #charset = f.readline().strip()
        #if charset.startswith('charset '):
        #    charset = charset[8:].strip()

        for pat in f:
            #pat = pat.decode(charset).strip()
            pat = pat.strip()
            if not pat or pat[0] == '%': continue
            # replace ^^hh with the real character
            pat = parse_hex(hexrepl, pat)
            # read nonstandard hyphen alternatives
            if '/' in pat:
                pat, alt = pat.split('/', 1)
                factory = parse_alt(pat, alt)
            else:
                factory = int
            tag, value = zip(*[(s, factory(i or "0")) for i, s in parse(pat)])
            # if only zeros, skip this pattern
            if max(value) == 0: continue
            # chop zeros from beginning and end, and store start offset.
            start, end = 0, len(value)
            while not value[start]: start += 1
            while not value[end-1]: end -= 1
            self.patterns[''.join(tag)] = start, value[start:end]
        f.close()
        self.cache = {}
        self.maxlen = max(map(len, self.patterns.keys()))

    def positions(self, word):
        """
        Returns a list of positions where the word can be hyphenated.
        E.g. for the dutch word 'lettergrepen' this method returns
        the list [3, 6, 9].

        Each position is a 'data int' (dint) with a data attribute.
        If the data attribute is not None, it contains a tuple with
        information about nonstandard hyphenation at that point:
        (change, index, cut)

        change: is a string like 'ff=f', that describes how hyphenation
            should take place.
        index: where to substitute the change, counting from the current
            point
        cut: how many characters to remove while substituting the nonstandard
            hyphenation
        """
        word = word.lower()
        points = self.cache.get(word)
        if points is None:
            prepWord = '.%s.' % word
            res = [0] * (len(prepWord) + 1)
            for i in range(len(prepWord) - 1):
                for j in range(i + 1, min(i + self.maxlen, len(prepWord)) + 1):
                    p = self.patterns.get(prepWord[i:j])
                    if p:
                        offset, value = p
                        s = slice(i + offset, i + offset + len(value))
                        res[s] = map(max, value, res[s])

            points = [dint(i - 1, ref=r) for i, r in enumerate(res) if r % 2]
            self.cache[word] = points
        return points


class Hyphenator(object):
    """
    Reads a hyph_*.dic file and stores the hyphenation patterns.
    Provides methods to hyphenate strings in various ways.
    Parameters:
    -filename : filename of hyph_*.dic to read
    -left: make the first syllable not shorter than this
    -right: make the last syllable not shorter than this
    -cache: if true, use a cached copy of the dic file, if possible (the signature below defaults to False)

    left and right may also later be changed:
      h = Hyphenator(file)
      h.left = 1
    """
    def __init__(self, filename, left=2, right=2, cache=False):
        self.left  = left
        self.right = right
        if not cache or filename not in hdcache:
            hdcache[filename] = Hyph_dict(filename)
        self.hd = hdcache[filename]

    def positions(self, word):
        """
        Returns a list of positions where the word can be hyphenated.
        See also Hyph_dict.positions. The points that are too far to
        the left or right are removed.
        """
        right = len(word) - self.right
        return [i for i in self.hd.positions(word) if self.left <= i <= right]

    def iterate(self, word):
        """
        Iterate over all hyphenation possibilities, the longest first.
        """
        if isinstance(word, bytes):
            word = word.decode('latin1')
        for p in reversed(self.positions(word)):
            if p.data:
                # get the nonstandard hyphenation data
                change, index, cut = p.data
                if word.isupper():
                    change = change.upper()
                c1, c2 = change.split('=')
                yield word[:p+index] + c1, c2 + word[p+index+cut:]
            else:
                yield word[:p], word[p:]

    def wrap(self, word, width, hyphen='-'):
        """
        Return the longest possible first part and the last part of the
        hyphenated word. The first part has the hyphen already attached.
        Returns None, if there is no hyphenation point before width, or
        if the word could not be hyphenated.
        """
        width -= len(hyphen)
        for w1, w2 in self.iterate(word):
            if len(w1) <= width:
                return w1 + hyphen, w2

    def inserted(self, word, hyphen='-'):
        """
        Returns the word as a string with all the possible hyphens inserted.
        E.g. for the dutch word 'lettergrepen' this method returns
        the string 'let-ter-gre-pen'. The hyphen string to use can be
        given as the second parameter, that defaults to '-'.
        """
        if isinstance(word, bytes):
            word = word.decode('latin1')
        l = list(word)
        for p in reversed(self.positions(word)):
            if p.data:
                # get the nonstandard hyphenation data
                change, index, cut = p.data
                if word.isupper():
                    change = change.upper()
                l[p + index : p + index + cut] = change.replace('=', hyphen)
            else:
                l.insert(p, hyphen)
        return ''.join(l)

    __call__ = iterate
    
h = Hyphenator(r"C:\temp\hyph_de_CH.dic")
hyn = h.inserted('Geschmacksverstärker')
print(hyn)

I downloaded it from the internet. When debugging you can see it clearly.

You should move that much code out to a pastebin.

paste.pocoo.org is very popular here.

I don't see anything unusual in the code. IMHO a perfectly normal dict is created there. However, I don't have Python 3 installed here to test what is displayed as the type ... you would have to do that yourself.

Set a breakpoint at line 109 and look at the variable self.patterns. Then you will see that huge dict is the data type.
Everywhere you read that Python is so easy to learn; my opinion is the opposite. You can get quite a few projects done faster with Python, but it really is not easy to learn. For example

Code:

points = [dint(i - 1, ref=r) for i, r in enumerate(res) if r % 2]

How is a beginner supposed to understand this? I know this is a list comprehension, but I have no idea how it works.

Thanks, kostonstyle

I wonder what is supposed to be hard about list comprehensions. IMO it is the construct that comes closest to the form you would put into words:

Code:

For each index+value from the enumeration of 'res', create a dint instance if the value is odd.

(In English the word order fits even better than in German.)
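
For anyone who finds the one-liner hard to read, here is a sketch that unrolls it into an explicit loop. The `dint` class is copied from the posted module; the values in `res` are made up, since in the real module they are computed from the hyphenation patterns.

```python
class dint(int):
    """int with an extra data attribute (copied from the posted module)."""
    def __new__(cls, value, data=None, ref=None):
        obj = int.__new__(cls, value)
        if ref and type(ref) == dint:
            obj.data = ref.data
        else:
            obj.data = data
        return obj

# Made-up sample values, not real pattern output
res = [0, 0, 1, 2, 3, 0, 5]

# The comprehension from the post
points = [dint(i - 1, ref=r) for i, r in enumerate(res) if r % 2]

# The same thing written as an explicit loop
points_loop = []
for i, r in enumerate(res):   # i is the index, r the value at that index
    if r % 2:                 # keep only positions whose value is odd
        points_loop.append(dint(i - 1, ref=r))

print(points, points_loop)
```

Both versions build the same list; the comprehension just puts the loop, the filter and the append into a single expression.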

Besides, LCs are almost equivalent to mathematical list/set expressions.
Compare:

Code:

S = {x^2 : x in {0...9}}
V = (1; 2; 4; 8; ...; 2^12)
M = {x | x in S and x even}

Code:

S = [x**2 for x in range(10)]
V = [2**i for i in range(13)]
M = [x for x in S if x % 2 == 0]

@kostonstyle: IMHO your screenshot doesn't say much, and in particular not that `patterns` is of type "huge dict". What kind of debugger is that anyway? One in an IDE? In any case it does not show what CPython, whether 2.x or 3.x, returns for `type()`. According to the source code it is a perfectly ordinary dictionary.
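
One way to check this independently of any debugger display, sketched here with made-up entries instead of real hyphenation patterns:

```python
patterns = {}
# Fill the dict with many invented entries to make it large
for n in range(100000):
    patterns["key%d" % n] = (0, (n,))

# type() reports the plain built-in dict, however many entries there are
print(type(patterns))
print(type(patterns) is dict)
```

Putting `print(type(self.patterns))` at the end of `Hyph_dict.__init__` would be the equivalent check in the posted module.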

At the moment I'm using Wing IDE Personal. I would actually like to use Python with NetBeans, but that is still in development. It really is a huge dictionary, see the screenshots.
What I clearly notice about the Python language is that everything relates to mathematics, in contrast to ABAP.
Problems can be solved really elegantly ... long live Python

kostonstyle wrote: It really is a huge dictionary, see the screenshots.

I didn't want to go too far out on a limb, since I only have Python 2.6 installed and don't know 3.1 well yet. But BlackJack has already said it: something like a "Huge Dict" as a data type doesn't actually exist. I had already been wondering, since in the source code it is defined quite normally as

Code:

patterns = {}

So the question is, what is the underlying Python implementation? By that reasoning it can't be CPython. Could it be Jython?

Has it occurred to you that this is purely the debugger's interpretation?!
The dict holds the entire hyphenation rules of the German language! That has to be a "huge" dict...

ice2k3 wrote: Has it occurred to you that this is purely the debugger's interpretation?!

I hardly think so. What developer would build such nonsense into a debugger? Anyone who knows the subject deeply enough to build something like that should surely also know that you don't "invent" data types.

The dict holds the entire hyphenation rules of the German language! That has to be a "huge" dict...

But if the type "Huge Dict" simply doesn't exist, what is the name for? As you can easily see, it leads to confusion... In any case I found nothing about it in the docs. But perhaps you can prove the opposite to us?

Good thing Google exists:

http://wingware.com/psupport/wingide-1. ... gdebugdata

Wing may encounter values too large to handle - Wing will not package and transfer large sequences, arrays or strings that exceed the size limits set by preferences debug.huge-list-threshold and debug.huge-string-threshold (described in section 6.25). On the debugger display, oversized sequences and arrays are annotated as huge and <truncated> is prepended to large truncated strings.
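
To make the idea concrete, here is a purely hypothetical sketch of how a debugger view could label oversized containers. This is not Wing's implementation; the threshold value and the function name `debug_label` are invented for illustration.

```python
HUGE_THRESHOLD = 2000   # hypothetical limit, not Wing's actual default

def debug_label(value):
    """Describe a value the way a size-limited debugger view might."""
    type_name = type(value).__name__
    if hasattr(value, "__len__") and len(value) > HUGE_THRESHOLD:
        # Too large to transfer: show only a marker and the object id
        return "<huge %s 0x%x>" % (type_name, id(value))
    return repr(value)

small = {"a": 1}
large = {n: n for n in range(HUGE_THRESHOLD + 1)}
print(debug_label(small))
print(debug_label(large))
print(type(large) is dict)
```

The label changes with the size of the object, while the object's type stays `dict` throughout, which would fit both the screenshot and the source code.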

Exactly, I'm using Wing IDE. Which IDE would you recommend?

I use Eclipse with the PyDev plugin.

Importing a class
