Training a neural network faster

If you are not sure which of the other forums your question belongs in, then this forum for general questions is the right place.
uLocked
User
Posts: 27
Registered: Tuesday 9 February 2021, 10:29

Hello,

I am currently training a neural network. The problem is that with the current amount of data it takes very long, about 40 minutes per epoch.

Code: Select all

import time

from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import TensorBoard
from PIL import ImageFile

# Allow PIL to load images whose file data is truncated.
ImageFile.LOAD_TRUNCATED_IMAGES = True

CLASS_NAME = ["Klasse_1", "Klasse_2", "Klasse_3"]

# Preprocess and augment the training data
train_datagen = ImageDataGenerator(rescale=1. / 255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
# Preprocess the test data
test_datagen = ImageDataGenerator(rescale=1. / 255)

# Batch size: how many images are used per training step
BS = 128

# Create the training data
training_set = train_datagen.flow_from_directory(r"D:/Bilder/Train",
                                                 target_size=(128, 128),
                                                 batch_size=BS,
                                                 classes=CLASS_NAME,
                                                 class_mode="categorical")
# Create the test data
test_set = test_datagen.flow_from_directory(r"D:/Bilder/Test",
                                            target_size=(128, 128),
                                            batch_size=BS,
                                            classes=CLASS_NAME,
                                            class_mode="categorical")


dense_layers = [1]  # number of hidden dense layers
layer_sizes = [64]  # number of neurons per layer
conv_layers = [3]  # number of convolutional layers

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            NAME = "{}-conv-{}-nodes-{}-dense-{}".format(conv_layer, layer_size, dense_layer, int(time.time()))
            print(NAME)

            tensorBoard = TensorBoard(log_dir="logs/{}".format(NAME))

            model = Sequential()
            # Input Layer
            model.add(Conv2D(16, (3, 3), activation="relu", input_shape=(128, 128, 3)))
            model.add(MaxPooling2D(2, 2))
            for _ in range(conv_layer - 1):
                model.add(Conv2D(layer_size, (3, 3), activation="relu"))
                model.add(MaxPooling2D(2, 2))

            model.add(Flatten())
            for _ in range(dense_layer):
                model.add(Dense(layer_size, activation="relu"))

            model.add(Dropout(0.1))
            model.add(Dense(3, activation="softmax"))
            model.summary()

            opt = RMSprop(learning_rate=0.001)
            model.compile(loss="categorical_crossentropy",
                          optimizer=opt,
                          metrics=["accuracy"])

            model.fit(training_set,
                      epochs=3,
                      validation_data=test_set,
                      callbacks=[tensorBoard],
                      verbose=1)

            # Save the model
            model.save(NAME + ".model")

Here is the code. Unfortunately, experimenting with the batch size does not help much. Does anyone else have an idea?
uLocked
User
Posts: 27
Registered: Tuesday 9 February 2021, 10:29

I forgot to mention that I have 10,000 images per class.
ThomasL
User
Posts: 1366
Registered: Monday 14 May 2018, 14:44
Location: Kreis Unna NRW

For this kind of task it is an advantage to have an Nvidia GPU in your desktop/laptop. TensorFlow can use it via the CUDA drivers, which speeds up training considerably.
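
If in doubt, a minimal check (assuming TensorFlow 2.x) shows whether TensorFlow sees a GPU at all:

Code: Select all

import tensorflow as tf

# An empty list means TensorFlow will train on the CPU only.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs found:", gpus)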

If you don't have one, you can also use Google Colab for free: https://colab.research.google.com/noteb ... come.ipynb

Furthermore, it would help if your image data were on a drive with a high read rate, e.g. an SSD.
If you have a lot of RAM, you can use part of it as a RAM disk and put the images there. It doesn't get any faster than that.
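
As a sketch of the same idea in code (assuming TensorFlow >= 2.4; path and settings are just examples), tf.data can keep the decoded images in RAM after the first epoch and overlap disk reads with GPU work:

Code: Select all

import tensorflow as tf

# Example path and settings; adapt to your own directory layout.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "D:/Bilder/Train",
    image_size=(128, 128),
    batch_size=32,
    label_mode="categorical",
)

# cache() keeps the decoded images in memory after the first epoch,
# prefetch() loads the next batch while the GPU trains on the current one.
train_ds = train_ds.cache().prefetch(tf.data.AUTOTUNE)
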
I am a pacifist and do not attack anyone, not even with words.
For all my code examples: "There is always a better way."
https://projecteuler.net/profile/Brotherluii.png
uLocked
User
Posts: 27
Registered: Tuesday 9 February 2021, 10:29

I am already using CUDA, but I don't know whether it is running optimally.

Code: Select all

2021-10-19 16:12:20.438584: I tensorflow/core/profiler/lib/profiler_session.cc:131] Profiler session initializing.
2021-10-19 16:12:20.438744: I tensorflow/core/profiler/lib/profiler_session.cc:146] Profiler session started.
2021-10-19 16:12:20.463844: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1614] Profiler found 1 GPUs
2021-10-19 16:12:20.464792: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cupti64_112.dll'; dlerror: cupti64_112.dll not found
2021-10-19 16:12:20.465642: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cupti.dll'; dlerror: cupti.dll not found
2021-10-19 16:12:20.465831: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1666] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
2021-10-19 16:12:20.466115: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session tear down.
2021-10-19 16:12:20.466248: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1757] function cupti_interface_->Finalize()failed with error CUPTI could not be loaded or symbol could not be found.
2021-10-19 16:12:20.497424: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-19 16:12:20.959029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1321 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1
C:\Users\<user>\PycharmProjects\pythonProject\venv\lib\site-packages\keras\optimizer_v2\optimizer_v2.py:355: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  warnings.warn(
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 126, 126, 16)      448       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 63, 63, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 61, 61, 64)        9280      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 28, 28, 64)        36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 12544)             0         
_________________________________________________________________
dense (Dense)                (None, 1)                 12545     
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 6         
=================================================================
Total params: 59,207
Trainable params: 59,207
Non-trainable params: 0
_________________________________________________________________
2021-10-19 16:12:26.966125: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/3
2021-10-19 16:12:35.946306: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8202
2021-10-19 16:12:38.085692: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
  1/235 [..............................] - ETA: 44:37 - loss: 1.0986 - accuracy: 0.30472021-10-19 16:12:38.594412: I tensorflow/core/profiler/lib/profiler_session.cc:131] Profiler session initializing.
2021-10-19 16:12:38.594542: I tensorflow/core/profiler/lib/profiler_session.cc:146] Profiler session started.
2021-10-19 16:12:38.594680: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1666] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
  2/235 [..............................] - ETA: 12:39 - loss: 1.0986 - accuracy: 0.32812021-10-19 16:12:41.779158: I tensorflow/core/profiler/lib/profiler_session.cc:66] Profiler session collecting data.
2021-10-19 16:12:41.779341: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1757] function cupti_interface_->Finalize()failed with error CUPTI could not be loaded or symbol could not be found.
2021-10-19 16:12:41.781255: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:673]  GpuTracer has collected 0 callback api events and 0 activity events. 
2021-10-19 16:12:41.781923: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session tear down.
2021-10-19 16:12:42.058144: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42

2021-10-19 16:12:42.158963: I tensorflow/core/profiler/rpc/client/save_profile.cc:142] Dumped gzipped tool data for trace.json.gz to logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42\ODYSSEY.trace.json.gz
2021-10-19 16:12:42.166976: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42

2021-10-19 16:12:42.220108: I tensorflow/core/profiler/rpc/client/save_profile.cc:142] Dumped gzipped tool data for memory_profile.json.gz to logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42\ODYSSEY.memory_profile.json.gz
2021-10-19 16:12:42.512266: I tensorflow/core/profiler/rpc/client/capture_profile.cc:251] Creating directory: logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42
Dumped tool data for xplane.pb to logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42\ODYSSEY.xplane.pb
Dumped tool data for overview_page.pb to logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42\ODYSSEY.overview_page.pb
Dumped tool data for input_pipeline.pb to logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42\ODYSSEY.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42\ODYSSEY.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logs/3-conv-64-nodes-1-dense-1634652740\train\plugins\profile\2021_10_19_14_12_42\ODYSSEY.kernel_stats.pb

It runs, but I get a lot of warnings from TensorFlow.
The SSD or RAM disk idea is a good one. Thank you very much.
ThomasL
User
Posts: 1366
Registered: Monday 14 May 2018, 14:44
Location: Kreis Unna NRW

Code: Select all

Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1321 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1050

Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.16GiB with freed_by_count=0. 
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
So the GPU is being used, but it does not have much memory on board.
I really recommend you give Google Colab a try.

Reducing the batch size from 128 to 16 or 32 might also help a little.
128 is, imho, too high anyway.
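
Independent of the batch size, you can also ask TensorFlow to allocate GPU memory on demand instead of reserving almost all of it at startup; a minimal sketch, assuming TensorFlow 2.x:

Code: Select all

import tensorflow as tf

# Grow GPU memory allocation as needed instead of grabbing it all up front;
# this can reduce out-of-memory warnings on small cards like a GTX 1050.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
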
I am a pacifist and do not attack anyone, not even with words.
For all my code examples: "There is always a better way."
https://projecteuler.net/profile/Brotherluii.png