Neural network trains too slowly

nooby · Tuesday 3 May 2016, 18:45

Hello everyone,

I have implemented a neural network in Python and am currently training it on the MNIST database.
The network consists of 784 input, 200 hidden and 10 output neurons.
Unfortunately, training is very, very slow. Do you have any tips on how I can improve the performance?

Pastebin: http://pastebin.com/UBAgfbzN

Thanks a lot for your help.

BlackJack · Wednesday 4 May 2016, 08:55

@nooby: A few remarks on the source code:

The three `*_dim` attributes are not used anywhere.

What are all those ``np.array([])`` values in `__init__()` supposed to mean? Does the code actually compute anything with these values, or should they really be `None`, i.e. values that are not set yet and therefore fail loudly and immediately as soon as someone tries to compute something with them‽
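
A minimal illustration of the difference, with a hypothetical variable standing in for one of those attributes: some NumPy operations on an empty array succeed without complaint, whereas `None` fails at the first line that tries to compute with it.

```python
import numpy as np

# Hypothetical attribute value, as in the original __init__()
placeholder = np.array([])
doubled = placeholder * 2      # no error: just another empty array
total = np.sum(placeholder)    # no error: 0.0, a plausible-looking wrong value
print(doubled, total)

unset = None                   # explicit "not set yet"
try:
    unset * 2                  # TypeError right at the faulty line
    failed = False
except TypeError as error:
    failed = True
    print('fails immediately:', error)
```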

The methods `__tanh()` and `__softmax()` each have one underscore too many in their names. Besides, neither of them is really a method. `__softmax` is not used anywhere, and `__tanh` does something different depending on the flag passed in, so it is really two functions. These are so trivial, though, that you could just as well write them inline.
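
The Pastebin version of `__tanh()` is not reproduced in this thread; assuming its flag switched between tanh itself and its derivative, both cases are one-liners. The finite-difference comparison is only there as an independent check that the derivative can be computed from the tanh output alone (the sample points `z` are arbitrary):

```python
import numpy as np

z = np.linspace(-3, 3, 7)          # arbitrary sample points

activation = np.tanh(z)            # case 1: the activation itself
slope = 1 - activation ** 2        # case 2: tanh'(z), written in terms of tanh(z)

# Central finite difference as an independent check of the derivative
eps = 1e-6
numeric_slope = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)
print(np.max(np.abs(slope - numeric_slope)))
```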

Some lines are longer than 80 characters.

Raising `Exception` is not a good idea. How is the caller supposed to react to *that* sensibly, and above all specifically?

What is ``// 1`` supposed to achieve when the result is passed to the `int()` function right afterwards?
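
A quick sketch, assuming the value is a non-negative float such as the training-set size computed in `set_training_data()`: for non-negative numbers `int()` truncates to the same value that `// 1` produces, so the floor division adds nothing. The sample values are made up.

```python
# Hypothetical sample values: non-negative floats, like a dataset size * 0.75
values = [0.0, 0.4, 7.5, 74.99, 45000.0]
with_floor_div = [int(x // 1) for x in values]
plain_int = [int(x) for x in values]
print(with_floor_div == plain_int)  # True: for x >= 0, // 1 changes nothing
```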

`activate()` only calls `forward_propagate()` with the same arguments, so it is really just another name for `forward_propagate()`.

I end up roughly here:

import pickle
import numpy as np


class FeedForwardNetwork(object):

    def __init__(self, input_dim, hidden_dim, output_dim):
        # 
        # TODO Are those arrays really necessary or better replaced
        #   by `None`‽
        # 
        self.input_layer = np.array([])
        self.hidden_layer = np.array([])
        self.output_layer = np.array([])
        self.weights_input_hidden = (
            (2 * np.random.random((input_dim, hidden_dim)) - 1) / 1000
        )
        self.weights_hidden_output = (
            (2 * np.random.random((hidden_dim, output_dim)) - 1) / 1000
        )
        # 
        # TODO Are those arrays really necessary or better replaced
        #   by `None`‽
        # 
        self.validation_data = np.array([])
        self.validation_data_solution = np.array([])

    def set_training_data(self, training_data_input, training_data_target):
        """Splits the data up into training and validation data with a
        ratio of 0.75/0.25 and sets the data for training.
        """
        if len(training_data_input) != len(training_data_target):
            raise ValueError(
                'Number of training examples and'
                ' training targets does not match!'
            )
        len_training_data = int(len(training_data_input) / 100 * 75)
        self.input_layer = training_data_input[:len_training_data]
        self.output_layer = training_data_target[:len_training_data]
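        # No extra list around the slices: that would add a leading
        # axis of size 1 to the validation arrays.
        self.validation_data = np.asarray(
            training_data_input[len_training_data:]
        )
        self.validation_data_solution = np.asarray(
            training_data_target[len_training_data:]
        )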

    def save(self, filename):
        """Saves the weights into a pickle file."""
        with open(filename, 'wb') as network_file:
            pickle.dump(self.weights_input_hidden, network_file)
            pickle.dump(self.weights_hidden_output, network_file)

    def load(self, filename):
        """Loads network weights from a pickle file."""
        with open(filename, 'rb') as network_file:
            weights_input_hidden = pickle.load(network_file)
            weights_hidden_output = pickle.load(network_file)
            
        # Compare full shapes; len() would only check the first axis.
        if (
            weights_input_hidden.shape != self.weights_input_hidden.shape
            or weights_hidden_output.shape != self.weights_hidden_output.shape
        ):
            raise ValueError(
                'File contains weights that do not match'
                ' the size of the current network!'
            )
        self.weights_input_hidden = weights_input_hidden
        self.weights_hidden_output = weights_hidden_output

    def measure_error(self, input_data, output_data):
        return 0.5 * np.sum((output_data - self.activate(input_data))**2)

    def forward_propagate(self, input_data):
        """Propagates the input data from the input neurons to the output
        neurons and returns the output layer.
        """
        self.hidden_layer = np.tanh(
            np.dot(input_data, self.weights_input_hidden)
        )
        return np.tanh(np.dot(self.hidden_layer, self.weights_hidden_output))

    #: Sends the given input through the net and returns the net's
    #: prediction.
    activate = forward_propagate

    def back_propagate(self, input_data, output_data, eta):
        """Calculates the difference between target output and output
        and adjusts the weights to fit the target output better.
        
        The parameter eta is the learning rate.
        """
        sample_count = len(input_data)
        output_layer = self.forward_propagate(input_data)
        output_layer_error = output_data - output_layer
        # `output_layer` already holds tanh(z), so tanh'(z) = 1 - output**2.
        output_layer_delta = output_layer_error * (1 - output_layer**2)
        # 
        # How much did each hidden neuron contribute to the output
        # error?  Multiplies the delta term with the weights.
        # 
        hidden_layer_error = output_layer_delta.dot(
            self.weights_hidden_output.T
        )
        # Scale by the slope of tanh at the hidden activations.  Since
        # `self.hidden_layer` already holds tanh(z), the slope is
        # 1 - hidden**2.  Saturated neurons (activation near -1 or 1)
        # have a small slope and therefore get only a small change.
        # 
        hidden_layer_delta = (
            hidden_layer_error * (1 - self.hidden_layer**2)
        )
        # Both lines return a matrix.  A row stands for all weights
        # connected to one neuron.
        # 
        # E.g. [1, 2, 3] -> Weights to Neuron A
        #      [4, 5, 6] -> Weights to Neuron B
        # 
        hidden_weights_change = (
            input_data.T.dot(hidden_layer_delta) / sample_count
        )
        output_weights_change = (
            self.hidden_layer.T.dot(output_layer_delta) / sample_count
        )
        # The changes are already averaged over the samples above, so
        # they are not divided by `sample_count` a second time.
        self.weights_hidden_output += output_weights_change * eta
        self.weights_input_hidden += hidden_weights_change * eta

    def batch_train(self, epoch_count, eta, patience=10):
        """Trains the network in batch mode.  That means the weights
        are updated after showing all training examples.
        
        Eta is the learning rate and patience is the number of epochs
        that the validation error is allowed to increase before
        aborting.
        """
        validation_error = self.measure_error(
            self.validation_data, self.validation_data_solution
        )
        for epoch in range(epoch_count):
            self.back_propagate(self.input_layer, self.output_layer, eta)
            validation_error_new = self.measure_error(
                self.validation_data, self.validation_data_solution
            )
            if validation_error_new < validation_error:
                validation_error = validation_error_new
            else:
                patience -= 1
                if patience == 0:
                    print(
                        'Abort Training. Overfitting has started!'
                        ' Epoch: {0}. Error: {1}'.format(
                            epoch, validation_error_new
                        )
                    )
                    return
            print('Epoch: {0}, Error: {1}'.format(epoch, validation_error))
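
Whether the gradient formulas in `back_propagate()` are right can be checked numerically on a tiny network. The sketch below is a standalone re-implementation, not the class above: it assumes the same architecture (tanh hidden and output layers, error 0.5 * sum((target - output)**2) as in `measure_error()`), uses small made-up sizes and random data, and compares the analytic gradients with central finite differences.

```python
import numpy as np


def forward(x, w1, w2):
    # tanh hidden layer followed by a tanh output layer
    h = np.tanh(x @ w1)
    y = np.tanh(h @ w2)
    return h, y


def error(x, t, w1, w2):
    # Squared error as in measure_error()
    _, y = forward(x, w1, w2)
    return 0.5 * np.sum((t - y) ** 2)


def analytic_gradients(x, t, w1, w2):
    h, y = forward(x, w1, w2)
    # tanh'(z) = 1 - tanh(z)**2; h and y already hold tanh(z)
    delta_out = (t - y) * (1 - y ** 2)
    delta_hidden = (delta_out @ w2.T) * (1 - h ** 2)
    # Gradients of the error (minus sign: the deltas point downhill)
    return -x.T @ delta_hidden, -h.T @ delta_out


def numeric_gradient(f, w, eps=1e-6):
    # Central differences, perturbing one weight at a time in place
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = f()
        w[idx] = old - eps
        minus = f()
        w[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


# Standalone sketch: sizes are made up, not the 784-200-10 network
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
t = rng.uniform(-1, 1, size=(5, 3))
w1 = rng.normal(scale=0.5, size=(4, 6))
w2 = rng.normal(scale=0.5, size=(6, 3))

g1, g2 = analytic_gradients(x, t, w1, w2)
n1 = numeric_gradient(lambda: error(x, t, w1, w2), w1)
n2 = numeric_gradient(lambda: error(x, t, w1, w2), w2)
print(np.max(np.abs(g1 - n1)), np.max(np.abs(g2 - n2)))
```

The printed maximum differences should be tiny. A wrong formula, for example a tanh derivative applied to a value that has already gone through tanh, would show up as much larger differences.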

As for performance, I would first measure where most of the time is spent and what is executed how often, in order to find the spot where improvements or changes have the most effect.
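
One way to do that measurement is the standard library profiler. The sketch below is hypothetical: it splits one epoch into small functions (`forward`, `backward`, `measure_error`) modelled on the thread's code, uses random data with the thread's layer sizes (784, 200, 10) instead of MNIST, and reports per function how often it was called and how much time it took.

```python
import cProfile
import io
import pstats

import numpy as np


# Hypothetical helpers modelled on the thread's code
def forward(x, w1, w2):
    hidden = np.tanh(x @ w1)
    return hidden, np.tanh(hidden @ w2)


def backward(x, t, hidden, output, w1, w2, eta):
    # tanh deltas as in back_propagate(); update averaged over the batch
    output_delta = (t - output) * (1 - output ** 2)
    hidden_delta = (output_delta @ w2.T) * (1 - hidden ** 2)
    w2 += eta * (hidden.T @ output_delta) / len(x)
    w1 += eta * (x.T @ hidden_delta) / len(x)


def measure_error(x, t, w1, w2):
    _, output = forward(x, w1, w2)
    return 0.5 * np.sum((t - output) ** 2)


def train(x, t, epochs, eta=0.1):
    rng = np.random.default_rng(0)
    w1 = (2 * rng.random((784, 200)) - 1) / 1000
    w2 = (2 * rng.random((200, 10)) - 1) / 1000
    for _ in range(epochs):
        hidden, output = forward(x, w1, w2)
        backward(x, t, hidden, output, w1, w2, eta)
        measure_error(x, t, w1, w2)  # stands in for the validation check


rng = np.random.default_rng(1)
x = rng.random((2000, 784))                # hypothetical stand-in for MNIST
t = np.eye(10)[rng.integers(0, 10, 2000)]  # one-hot targets

profiler = cProfile.Profile()
profiler.enable()
train(x, t, epochs=5)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(10)
report = stream.getvalue()
print(report)
```

In the report, the `ncalls` column answers "how often" and `cumtime` answers "how long, including everything called from there".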

nooby · Wednesday 4 May 2016, 22:10

Thanks a lot for your remarks.

Regarding np.empty(), you are absolutely right, that's sufficient!
So should I leave out the exceptions and not catch any errors, so that numpy simply raises its own error on invalid input?
//1 is a floor division and returns a float. But I need the result as an int. Should I program the whole thing differently?

Dav1d · Wednesday 4 May 2016, 22:43

nooby wrote: //1 is a floor division and returns a float. But I need the result as an int. Should I program the whole thing differently?

I only read that one sentence: it's a float because at least one of the operands is a float, but you can easily turn the result into an int with `int()`: `int(foo // 1)`. What you are really looking for, though, is probably math.floor.
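
A short check of the result types described above, assuming Python 3 semantics (in Python 2, `math.floor()` still returned a float):

```python
import math

print(type(7.5 // 1))    # <class 'float'>: one operand is a float
print(type(7 // 1))      # <class 'int'>: both operands are ints
print(int(7.5 // 1))     # 7, converted explicitly
print(math.floor(7.5))   # 7, math.floor() already returns an int
```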

BlackJack · Wednesday 4 May 2016, 22:47

@nooby: `np.empty()`?

Why leave out the exceptions? The problem isn't the exceptions as such, but that you raise `Exception`: the general, completely meaningless exception that can't really be handled sensibly, because an ``except Exception:`` catches not only yours but *all the others as well*. That's why I have at least replaced them with `ValueError`.
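
A small sketch of why a specific exception type helps; `check_lengths()` and `load_or_report()` are hypothetical helpers modelled on the length check in `set_training_data()`. The caller handles the `ValueError` for mismatched data, while an unrelated programming error (here a `TypeError` from calling `len()` on `None`) still propagates instead of being misreported as bad data, which is what ``except Exception:`` would do.

```python
def check_lengths(inputs, targets):
    # Hypothetical helper mirroring the check in set_training_data()
    if len(inputs) != len(targets):
        raise ValueError('Number of training examples and'
                         ' training targets does not match!')


def load_or_report(inputs, targets):
    # Hypothetical caller: only the expected error is handled here
    try:
        check_lengths(inputs, targets)
    except ValueError as error:
        return 'bad data: {0}'.format(error)
    return 'ok'


print(load_or_report([1, 2], [1, 2]))   # ok
print(load_or_report([1, 2], [1]))      # bad data: ...

try:
    load_or_report(None, [1])           # a programming error, not bad data
    propagated = False
except TypeError:
    propagated = True                   # not swallowed by the ValueError handler
print(propagated)
```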

I know what ``// 1`` does (or at least I thought I did until just now). But I also know what `int()` does. And if you really want the effect that // 1 has on negative numbers, I would make that more explicit with the `math.floor()` function.
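
The difference only shows up for negative numbers, where `int()` truncates toward zero while `// 1` and `math.floor()` round toward negative infinity (the value of `x` is just an example):

```python
import math

x = -2.5
truncated = int(x)          # -2: int() truncates toward zero
floored = int(x // 1)       # -3: floor division rounds toward -infinity
explicit = math.floor(x)    # -3: same as // 1, but the intent is obvious
print(truncated, floored, explicit)
```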

nooby · Tuesday 10 May 2016, 11:04

@BlackJack: Then I'll do the whole thing with math.floor() instead of //1.
Thanks a lot.