Here, as an example, is something that "obfuscates" a bit more simply: it just replaces every non-whitespace character inside string literals with a "redaction character".
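At its core that is nothing more than a regular-expression substitution on the text between the quotes. A minimal sketch of just that one step (the string here is only an example value):

Code: Select all
>>> import re
>>> re.sub(r"\S", "█", "Hello, World!")
'██████ ██████'

The complete script below additionally uses the tokenize module, so that only the contents of string literals are touched and everything else in the source is copied through unchanged: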
Code: Select all
#!/usr/bin/env python3
import io
import re
import sys
from tokenize import ENCODING, ERRORTOKEN, STRING, tokenize
STRING_RE = re.compile(
    r"""^([^'"]*('{3}|"{3}|'|"))(.*)(\2)$""", re.MULTILINE | re.DOTALL
)
"""
Regular expression that matches syntactically correct Python string literals,
including possible prefixes (f, r, u, …).
"""


def is_encodable(text, encoding):
    """
    Test if given `text` can be encoded with given `encoding`.
    >>> is_encodable("abc", "ascii")
    True
    >>> is_encodable("█", "ascii")
    False
    >>> is_encodable("█", "cp437")
    True
    >>> is_encodable("█", "utf-8")
    True
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def redact_string(string_representation, redaction_character):
    r"""
    Redact the content of a given Python `string_representation` with the given
    `redaction_character`. Only non-whitespace characters are replaced with the
    character.
    >>> redact_string('"abc def"', "█")
    '"███ ███"'
    >>> redact_string("f'This is an f-string.'", "#")
    "f'#### ## ## #########'"
    >>> redact_string("'''multi\nline\nstring'''", "X")
    "'''XXXXX\nXXXX\nXXXXXX'''"
    """
    return STRING_RE.sub(
        lambda match: (
            match[1] + re.sub(r"\S", redaction_character, match[3]) + match[4]
        ),
        string_representation,
    )


def get_lines(lines, start_position, end_position=(None, None)):
    """
    Get lines from `start_position` to `end_position`, or `start_position` to
    the end of the text if `end_position` is not given.
    Start and end are given as tuples of line number and column number, and the
    first and last line are sliced at the column numbers.
    Line numbers start at 1 and column numbers at 0!
    >>> lines = ["first", "second", "third", "fourth"]
    >>> get_lines(lines, (2, 1), (4, 3))
    ['econd', 'third', 'fou']
    >>> get_lines(lines, (3, 2))
    ['ird', 'fourth']
    """
    start_line, start_column = start_position
    end_line, end_column = end_position
    result = lines[slice(start_line - 1, end_line)]
    if len(result) == 1:
        return [result[0][slice(start_column, end_column)]]
    else:
        result[0] = result[0][slice(start_column, None)]
        result[-1] = result[-1][slice(None, end_column)]
        return result


def analyze(source_bytes):
    """
    Get encoding and string tokens from given Python source.
    >>> analyze(b'print("Hello, World!")')
    ('utf-8', [TokenInfo(type=3 (STRING), string='"Hello, World!"', start=(1, 6), end=(1, 21), line='print("Hello, World!")')])
    """
    tokens = tokenize(io.BytesIO(source_bytes).readline)
    encoding_token = next(tokens)
    if encoding_token.type != ENCODING:
        raise ValueError(f"expected encoding, got {encoding_token!r}")
    encoding = encoding_token.string
    string_tokens = []
    for token in tokens:
        if token.type == ERRORTOKEN:
            raise SyntaxError(f"{token.string} in line {token.start[0]}")
        if token.type == STRING:
            string_tokens.append(token)
    return encoding, string_tokens


def redact_strings(source_bytes, encoding, string_tokens):
    r"""
    Redact the `string_tokens` in given `source_bytes`. `encoding` is used to
    decode the input and encode the output.
    >>> source = b'''\
    ... # coding: ascii
    ... print("Hello, World!")
    ... '''
    >>> encoding, string_tokens = analyze(source)
    >>> redact_strings(source, encoding, string_tokens)
    b'# coding: ascii\nprint("XXXXXX XXXXXX")\n'
    """
    redaction_character = "█" if is_encodable("█", encoding) else "X"
    source_lines = source_bytes.decode(encoding).splitlines(keepends=True)
    result = []
    start_position = 1, 0
    for token in string_tokens:
        end_position = token.start
        #
        # Copy source code before the current string token as is.
        #
        result.extend(get_lines(source_lines, start_position, end_position))
        result.append(redact_string(token.string, redaction_character))
        start_position = token.end
    #
    # Copy source code after last string token as is.
    #
    result.extend(get_lines(source_lines, start_position))
    return "".join(result).encode(encoding)


def main():
    source_bytes = sys.stdin.buffer.read()
    encoding, string_tokens = analyze(source_bytes)
    sys.stdout.buffer.write(
        redact_strings(source_bytes, encoding, string_tokens)
    )


if __name__ == "__main__":
    main()
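The script reads Python source from standard input and writes the redacted version to standard output, so it can be used as a simple filter; the doctests in the docstrings can be checked with python3 -m doctest. A small usage sketch, assuming the script has been saved as redact.py (the file name is just an example):

Code: Select all
$ echo 'print("Hello, World!")' | python3 redact.py
print("██████ ██████")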
And the whole thing applied to itself:
Code: Select all
#!/usr/bin/env python3
import io
import re
import sys
from tokenize import ENCODING, ERRORTOKEN, STRING, tokenize
STRING_RE = re.compile(
    r"""█████████████████████████████████""", re.MULTILINE | re.DOTALL
)
"""
███████ ██████████ ████ ███████ █████████████ ███████ ██████ ██████ █████████
█████████ ████████ ████████ ███ ██ ██ ███
"""


def is_encodable(text, encoding):
    """
    ████ ██ █████ ██████ ███ ██ ███████ ████ █████ ███████████
    ███ ███████████████████ ████████
    ████
    ███ █████████████████ ████████
    █████
    ███ █████████████████ ████████
    ████
    ███ █████████████████ ████████
    ████
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def redact_string(string_representation, redaction_character):
    r"""
    ██████ ███ ███████ ██ █ █████ ██████ ███████████████████████ ████ ███ █████
    ██████████████████████ ████ ██████████████ ██████████ ███ ████████ ████ ███
    ██████████
    ███ ███████████████████ ██████ ████
    █████ █████
    ███ █████████████████████ ██ ██ ████████████ ████
    ███████ ██ ██ ███████████
    ███ ██████████████████████████████████████████ ████
    ███████████████████████████
    """
    return STRING_RE.sub(
        lambda match: (
            match[1] + re.sub(r"██", redaction_character, match[3]) + match[4]
        ),
        string_representation,
    )


def get_lines(lines, start_position, end_position=(None, None)):
    """
    ███ █████ ████ ████████████████ ██ ███████████████ ██ ████████████████ ██
    ███ ███ ██ ███ ████ ██ ██████████████ ██ ███ ██████
    █████ ███ ███ ███ █████ ██ ██████ ██ ████ ██████ ███ ██████ ███████ ███ ███
    █████ ███ ████ ████ ███ ██████ ██ ███ ██████ ████████
    ████ ███████ █████ ██ █ ███ ██████ ███████ ██ ██
    ███ █████ █ █████████ █████████ ████████ █████████
    ███ ████████████████ ███ ███ ███ ███
    █████████ ████████ ██████
    ███ ████████████████ ███ ███
    ███████ █████████
    """
    start_line, start_column = start_position
    end_line, end_column = end_position
    result = lines[slice(start_line - 1, end_line)]
    if len(result) == 1:
        return [result[0][slice(start_column, end_column)]]
    else:
        result[0] = result[0][slice(start_column, None)]
        result[-1] = result[-1][slice(None, end_column)]
        return result


def analyze(source_bytes):
    """
    ███ ████████ ███ ██████ ██████ ████ █████ ██████ ███████
    ███ ███████████████████████ ██████████
    █████████ █████████████████ █████████ ███████████████ █████████ █████████ ███ ███████ ████ ███████████████████ ████████████
    """
    tokens = tokenize(io.BytesIO(source_bytes).readline)
    encoding_token = next(tokens)
    if encoding_token.type != ENCODING:
        raise ValueError(f"████████ █████████ ███ ██████████████████")
    encoding = encoding_token.string
    string_tokens = []
    for token in tokens:
        if token.type == ERRORTOKEN:
            raise SyntaxError(f"██████████████ ██ ████ ████████████████")
        if token.type == STRING:
            string_tokens.append(token)
    return encoding, string_tokens


def redact_strings(source_bytes, encoding, string_tokens):
    r"""
    ██████ ███ ███████████████ ██ █████ ███████████████ ██████████ ██ ████ ██
    ██████ ███ █████ ███ ██████ ███ ███████
    ███ ██████ █ █████
    ███ █ ███████ █████
    ███ █████████████ ████████
    ███ ███
    ███ █████████ █████████████ █ ███████████████
    ███ ██████████████████████ █████████ ██████████████
    ███ ███████ ████████████████████ ███████████
    """
    redaction_character = "█" if is_encodable("█", encoding) else "█"
    source_lines = source_bytes.decode(encoding).splitlines(keepends=True)
    result = []
    start_position = 1, 0
    for token in string_tokens:
        end_position = token.start
        #
        # Copy source code before the current string token as is.
        #
        result.extend(get_lines(source_lines, start_position, end_position))
        result.append(redact_string(token.string, redaction_character))
        start_position = token.end
    #
    # Copy source code after last string token as is.
    #
    result.extend(get_lines(source_lines, start_position))
    return "".join(result).encode(encoding)


def main():
    source_bytes = sys.stdin.buffer.read()
    encoding, string_tokens = analyze(source_bytes)
    sys.stdout.buffer.write(
        redact_strings(source_bytes, encoding, string_tokens)
    )


if __name__ == "████████":
    main()