Yet another regex problem...

jens · Wednesday, 9 April 2008, 09:46

First, the code:

import re

text = """
Tabelle 1 start
|Cell 1.1 |Cell 1.2 |
|Cell 2.1 |Cell 2.2 |
Tabelle 1 ende

Tabelle 2 start
|Cell 1.1 |Cell 1.2 |
|Cell 2.1 |Cell 2.2 |
Tabelle 2 ende
"""
text = re.sub(r'(?ms)(^\|.*?\|$)', r">>>\1<<<", text)
print(text)

Output:

Tabelle 1 start
>>>|Cell 1.1 |Cell 1.2 |<<<
>>>|Cell 2.1 |Cell 2.2 |<<<
Tabelle 1 ende

Tabelle 2 start
>>>|Cell 1.1 |Cell 1.2 |<<<
>>>|Cell 2.1 |Cell 2.2 |<<<
Tabelle 2 ende

All well and good... But I don't want each line to be matched on its own; I want the whole block matched in one go. Like this:

Tabelle 1 start
>>>|Cell 1.1 |Cell 1.2 |
|Cell 2.1 |Cell 2.2 |<<<
Tabelle 1 ende

Tabelle 2 start
>>>|Cell 1.1 |Cell 1.2 |
|Cell 2.1 |Cell 2.2 |<<<
Tabelle 2 ende

How can this be done? I just can't get it to work.

Right now the only thing I can think of is a non-regex solution...
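
For comparison, here is a minimal sketch of what such a non-regex solution might look like. It is not from the thread: the function name `mark_tables` and its `start`/`end` parameters are made up for illustration, and it assumes that a table row is simply any line starting with `|`. It is written for Python 3.

```python
from itertools import groupby


def mark_tables(text, start=">>>", end="<<<"):
    """Wrap each run of consecutive '|' lines in start/end markers.

    Hypothetical helper; assumes a table row is any line starting with "|".
    """
    out = []
    # groupby collects adjacent lines that share the same key, here the
    # answer to "does this line start with a pipe?".
    for is_table, lines in groupby(text.split("\n"), key=lambda l: l.startswith("|")):
        block = "\n".join(lines)
        out.append(start + block + end if is_table else block)
    return "\n".join(out)


sample = "Tabelle 1 start\n|Cell 1.1 |Cell 1.2 |\n|Cell 2.1 |Cell 2.2 |\nTabelle 1 ende\n"
print(mark_tables(sample))
```

Splitting on `"\n"` and joining with `"\n"` again leaves all text outside the tables unchanged, including blank lines.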

helduel · Wednesday, 9 April 2008, 09:54

Hi,

do you mean something like this:

import re
# `text` is the sample string from the first post
print(re.findall(r'Tabelle (\d+) start\s+(.*)\s+Tabelle \1 ende', text, re.S))
[('1', '|Cell 1.1 |Cell 1.2 |\n|Cell 2.1 |Cell 2.2 |'), ('2', '|Cell 1.1 |Cell 1.2 |\n|Cell 2.1 |Cell 2.2 |')]

Regards,
Manuel

sma · Wednesday, 9 April 2008, 10:02

This looks like a nice case for a negative lookahead: `r"(?ms)^\|.*?\|\n(?!\|)"`.

Stefan
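
To see what the lookahead pattern above does, here is a small sketch applied to a shortened version of the sample text, written for Python 3. One deliberate difference from the posted pattern: the capturing group here stops before the final newline, so that `<<<` lands at the end of the last row, as in the desired output from the first post. That placement of the group is an assumption about what is wanted, not part of the original suggestion.

```python
import re

text = """
Tabelle 1 start
|Cell 1.1 |Cell 1.2 |
|Cell 2.1 |Cell 2.2 |
Tabelle 1 ende
"""

# (?m): ^ matches at the start of every line; (?s): . also matches newlines.
# .*? grows lazily until it reaches a "|\n" that is NOT followed by another
# "|", i.e. the end of the last row of the block.
pattern = re.compile(r"(?ms)^(\|.*?\|)\n(?!\|)")

# The newline consumed by the match is put back after the closing marker.
marked = pattern.sub(r">>>\1<<<\n", text)
print(marked)
```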

jens · Wednesday, 9 April 2008, 10:21

@helduel: No, that's not what I'm looking for. I don't know the text before and after a table...
@sma: Thanks, that seems to be the right thing!

The whole thing now looks something like this:

import re

text = """
Tabelle 1 start
|=head1   |=head2   |
|Cell 1.1 |Cell 1.2 |
|Cell 2.1 |Cell 2.2 |
Tabelle 1 ende

Tabelle 2 start
|Cell 1.1 |Cell 1.2 |Cell 1.3 |
|Cell 2.1 |Cell 2.2 |Cell 2.3 |
Tabelle 2 ende
"""

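def table(matchobj):
    """Convert one matched block of |-delimited rows into an HTML table."""
    block = matchobj.group(1)

    result = ""
    for line in block.splitlines():
        # Drop the outer pipes, then split the row into its cells.
        cells = line.strip("|").split("|")
        result_line = ""
        for cell in cells:
            # A leading "=" marks a header cell.
            if cell.startswith("="):
                tag = "th"
                cell = cell[1:]
            else:
                tag = "td"
            cell = cell.strip()
            result_line += "\t<%(t)s>%(c)s</%(t)s>\n" % {"t": tag, "c": cell}

        result += "<tr>\n%s</tr>\n" % result_line

    return '<table>\n%s</table>\n' % result


# The match includes the newline after the last row; table() ends its output
# with a newline as well, so the surrounding lines stay intact.
print(re.sub(r"(?ms)(^\|.*?\|\n(?!\|))", table, text))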

This then produces:

Tabelle 1 start
<table>
<tr>
	<th>head1</th>
	<th>head2</th>
</tr>
<tr>
	<td>Cell 1.1</td>
	<td>Cell 1.2</td>
</tr>
<tr>
	<td>Cell 2.1</td>
	<td>Cell 2.2</td>
</tr>
</table>
Tabelle 1 ende

Tabelle 2 start
<table>
<tr>
	<td>Cell 1.1</td>
	<td>Cell 1.2</td>
	<td>Cell 1.3</td>
</tr>
<tr>
	<td>Cell 2.1</td>
	<td>Cell 2.2</td>
	<td>Cell 2.3</td>
</tr>
</table>
Tabelle 2 ende
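
Two things to keep in mind with the lookahead pattern: it requires a newline after the final `|`, so a table on the very last line of the input without a trailing newline is not matched; and because `.` also matches newlines under `(?s)`, a line that starts with `|` but does not end with one can pull the following non-table lines into the match. The sketch below shows one possible alternative that matches runs of complete rows line by line instead. The names `TABLE_BLOCK` and `find_tables` are made up for illustration, and it assumes that every row starts and ends with `|`, with optional trailing spaces or tabs. It is written for Python 3.

```python
import re

# One table row: a line that starts and ends with "|". Without (?s), "." stays
# on the current line; \n? lets the last row sit at the very end of the input.
TABLE_BLOCK = re.compile(r"(?m)(?:^\|.*\|[ \t]*$\n?)+")


def find_tables(text):
    """Return every run of consecutive table rows as one string."""
    return [m.group(0) for m in TABLE_BLOCK.finditer(text)]


sample = "intro\n|a |b |\n|c |d |\noutro\n|e |f |"
print(find_tables(sample))

# The lookahead pattern from the thread, for comparison: it skips the last
# table because no newline follows it.
print(re.findall(r"(?ms)^\|.*?\|\n(?!\|)", sample))
```

`TABLE_BLOCK` could be passed to `re.sub` with the `table` callback in the same way, using `matchobj.group(0)` instead of `group(1)` since this pattern has no capturing group.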