Problem with Linear Discriminant Analysis

me4gqp · Thursday, 30 May 2019, 12:29

Hi,
I'm having trouble with an assignment.

In this assignment you will estimate cognitive states from electroencephalogram (EEG) data. The data matrix X contains 5 selected time windows of EEG activity at 62 electrodes after a visual stimulus was presented on the screen in front of the subject. If the first row of Y is 1, the stimulus was a target stimulus; if the second row of Y is 1, the stimulus was a non-target stimulus.

Code:

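import numpy as np
import scipy as sp
import scipy.linalg  # loads the sp.linalg submodule used in lda_fit
import matplotlib.pyplot as pl
from scipy.io import loadmat
from sklearn.model_selection import train_test_split

def load_data(fname):
    # load the data
    data = loadmat(fname)
    # extract the EEG features and the labels
    X = data['X']
    Y = data['Y']
    # collapse the time-window and electrode dimensions into one feature axis,
    # then transpose so that each row is one trial (N-by-D)
    X = np.reshape(X, (X.shape[0]*X.shape[1], X.shape[2])).T
    # transform the labels to (-1, 1): +1 if the first row of Y is 1 (target), else -1
    Y = np.where(Y[0, :] > 0, 1.0, -1.0)
    return X,Y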

X,Y = load_data(fname='bcidata.mat')
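The reshape in `load_data` turns the stored array into one row per trial. A minimal sketch with a random stand-in array, assuming (as the `reshape` call implies) that `X` is stored as time windows × electrodes × trials; the trial count of 100 is arbitrary:

```python
import numpy as np

# Hypothetical stand-in for data['X']: 5 time windows x 62 electrodes x 100 trials
n_windows, n_electrodes, n_trials = 5, 62, 100
X_raw = np.random.default_rng(0).standard_normal((n_windows, n_electrodes, n_trials))

# Collapse the first two axes into one feature axis, then transpose so rows are trials
X = np.reshape(X_raw, (n_windows * n_electrodes, n_trials)).T

print(X.shape)  # (100, 310): N trials by D = 5 * 62 = 310 features
```

Each trial becomes a 310-dimensional feature vector (feature index = window · 62 + electrode), which is the 310 that shows up in the error message at the end of the thread.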

Code:

def ncc_fit(X, Y):
    '''
    Train a nearest centroid classifier for N data points in D dimensions
    
    Input: 
    X N-by-D Data Matrix
    Y label vector of length N, labels are -1 or 1
    Output: 
    w weight vector of length D
    b bias (scalar)
    '''
    # class means
    # IMPLEMENT CODE HERE
    mupos = ...
    muneg = ...
    w = ...
    b = (w.dot(mupos) + w.dot(muneg))/2.
    # return the weight vector and the bias
    return w,b

X_train, X_test, Y_train, Y_test = train_test_split(X,Y)

w_ncc, b_ncc = ncc_fit(X_train, Y_train)

pl.hist(X_test[Y_test<0, :] @ w_ncc, label='non-target')
pl.hist(X_test[Y_test>0, :] @ w_ncc, label='target')
# vertical line at the decision threshold
pl.axvline(b_ncc, color='k', label='$b_{ncc}$')
pl.xlabel('$Xw_{NCC}$')
pl.legend()
pl.ylim([0, 450])
# a trial is predicted as target when its projection exceeds the bias
acc = int(((X_test @ w_ncc > b_ncc) == (Y_test > 0)).mean()*100)
pl.title(f"NCC Acc {acc}%");
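For reference, a minimal nearest-centroid sketch on synthetic data. It assumes, as in the docstring, that rows of X are data points and labels are -1 or 1; the helper name `ncc_sketch` and the toy data are hypothetical, and the bias uses the same midpoint formula as the template:

```python
import numpy as np

def ncc_sketch(X, Y):
    """Nearest centroid classifier sketch (hypothetical helper).

    X: N-by-D data matrix, Y: length-N labels in {-1, 1}.
    Returns a weight vector w of length D and a scalar bias b.
    """
    # Boolean masks select the rows (trials) of each class; averaging over
    # axis 0 gives one D-dimensional centroid per class
    mu_pos = X[Y == 1].mean(axis=0)
    mu_neg = X[Y == -1].mean(axis=0)
    # The boundary is the hyperplane halfway between the centroids,
    # orthogonal to their difference
    w = mu_pos - mu_neg
    b = (w @ mu_pos + w @ mu_neg) / 2.0
    return w, b

# Synthetic two-class data (not the EEG set): class means at +0.5 and -0.5 in every dimension
rng = np.random.default_rng(0)
N, D = 400, 10
Y = np.where(rng.random(N) < 0.5, 1.0, -1.0)
X = rng.standard_normal((N, D)) + 0.5 * Y[:, None]

w, b = ncc_sketch(X, Y)
acc = (np.sign(X @ w - b) == Y).mean()
print(w.shape, np.ndim(b), acc)
```

Boolean masks like `X[Y == 1]` keep all D columns, so both centroids have length D and can be subtracted.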

Train a linear discriminant classifier and compare it with the NCC one.
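The eigenproblem in the LDA template follows from Fisher's criterion. A short derivation, assuming the inter-class matrix is the outer product of the mean difference and the intra-class matrix is the sum of the two class covariances (common choices; the template leaves both open):

```latex
J(w) = \frac{w^\top S_{\text{inter}}\, w}{w^\top S_{\text{intra}}\, w},
\qquad
S_{\text{inter}} = (\mu_+ - \mu_-)(\mu_+ - \mu_-)^\top,
\qquad
S_{\text{intra}} = \Sigma_+ + \Sigma_-

\nabla_w J = 0
\;\Longrightarrow\;
S_{\text{inter}}\, w = J(w)\, S_{\text{intra}}\, w
\;\Longrightarrow\;
w \propto S_{\text{intra}}^{-1} (\mu_+ - \mu_-),
\qquad
b = \tfrac{1}{2}\, w^\top (\mu_+ + \mu_-)
```

Maximizing J is a generalized eigenproblem, so the best w is the eigenvector with the largest eigenvalue. Because S_inter has rank one, that is the only nonzero eigenvalue, which gives the closed form on the right.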

Code:

def lda_fit(X,Y):
    '''
    Train a Linear Discriminant Analysis classifier
    
    Input: 
    X N-by-D Data Matrix
    Y label vector of length N, labels are -1 or 1
    Output: 
    w weight vector of length D
    b bias (scalar)
    '''
    # class means
    # IMPLEMENT CODE HERE
    mupos = ...
    muneg = ...
    
    # D-by-D inter class covariance matrix (signal)
    Sinter = ...
    # D-by-D intra class covariance matrices (noise)
    Sintra = ...
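    # solve the generalized eigenproblem Sinter w = lambda * Sintra w;
    # eigh suits symmetric matrices and returns real eigenvalues (eig may return complex ones)
    eigvals, eigvecs = sp.linalg.eigh(Sinter, Sintra)
    w = eigvecs[:, eigvals.argmax()]
    # eigenvectors are only defined up to sign: make w point from the negative
    # towards the positive class so that X @ w > b predicts the target class
    if w.dot(mupos - muneg) < 0:
        w = -w
    # bias term
    b = (w.dot(mupos) + w.dot(muneg))/2.
    # return the weight vector and the bias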
    return w,b
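A self-contained sketch of two-class LDA on synthetic data, compared with a nearest-centroid baseline. It assumes the inter-class matrix is the outer product of the mean difference and the intra-class matrix is the sum of the class covariances; the helper name `lda_sketch` and the correlated toy data are hypothetical:

```python
import numpy as np
from scipy.linalg import eigh

def lda_sketch(X, Y):
    """Two-class Fisher LDA sketch (hypothetical helper). Labels must be -1 or 1."""
    Xpos, Xneg = X[Y == 1], X[Y == -1]
    mu_pos, mu_neg = Xpos.mean(axis=0), Xneg.mean(axis=0)
    diff = mu_pos - mu_neg
    # Between-class ("inter") scatter: rank-one outer product of the mean difference
    S_inter = np.outer(diff, diff)
    # Within-class ("intra") scatter: sum of the two class covariance matrices
    S_intra = np.cov(Xpos, rowvar=False) + np.cov(Xneg, rowvar=False)
    # Generalized symmetric eigenproblem S_inter v = lambda * S_intra v
    eigvals, eigvecs = eigh(S_inter, S_intra)
    w = eigvecs[:, np.argmax(eigvals)]
    # Eigenvectors are only defined up to sign: orient w towards the positive class
    if w @ diff < 0:
        w = -w
    b = (w @ mu_pos + w @ mu_neg) / 2.0
    return w, b

# Synthetic 2-D data (not the EEG set) with strongly correlated noise,
# class means at +0.5 and -0.5 on the first axis
rng = np.random.default_rng(1)
N = 600
Y = np.where(rng.random(N) < 0.5, 1.0, -1.0)
L = np.linalg.cholesky(np.array([[1.0, 0.8], [0.8, 1.0]]))
X = rng.standard_normal((N, 2)) @ L.T + 0.5 * Y[:, None] * np.array([1.0, 0.0])

w_lda, b_lda = lda_sketch(X, Y)

# Nearest-centroid baseline on the same data: w is just the mean difference
mu_pos, mu_neg = X[Y == 1].mean(axis=0), X[Y == -1].mean(axis=0)
w_ncc = mu_pos - mu_neg
b_ncc = (w_ncc @ mu_pos + w_ncc @ mu_neg) / 2.0

# For two classes the top generalized eigenvector is parallel to the
# closed-form Fisher direction S_intra^{-1} (mu_pos - mu_neg)
S_intra = np.cov(X[Y == 1], rowvar=False) + np.cov(X[Y == -1], rowvar=False)
w_closed = np.linalg.solve(S_intra, w_ncc)
cosine = w_lda @ w_closed / (np.linalg.norm(w_lda) * np.linalg.norm(w_closed))

acc_lda = (np.sign(X @ w_lda - b_lda) == Y).mean()
acc_ncc = (np.sign(X @ w_ncc - b_ncc) == Y).mean()
print(f"cosine {cosine:.6f}, LDA acc {acc_lda:.2f}, NCC acc {acc_ncc:.2f}")
```

When the noise is correlated, LDA whitens it before separating the classes, which is why it can beat the nearest-centroid direction; with uncorrelated, equal-variance noise the two directions coincide.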

Any help is appreciated.

__deets__ · Thursday, 30 May 2019, 12:32

Are you serious? Just posting the entire assignment and expecting someone to do it for you? What exactly about your problem is unclear to you? What have you tried?

me4gqp · Thursday, 30 May 2019, 12:43

I tried the following:

Code:

mupos = np.zeros((len(np.unique(Y)),X.shape[1]))
muneg = np.zeros((len(np.unique(Y)),X.shape[0]))

The w vector is the difference between mupos and muneg. But I can't subtract them because the vectors have different sizes.
What can I do about that?
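The size mismatch can be reproduced with small stand-in arrays (the sizes are arbitrary; X is assumed to be N-by-D as in the docstrings). `np.unique(Y)` has only two entries, so both arrays get two rows, but their column counts come from different axes of X:

```python
import numpy as np

# Hypothetical stand-ins: N = 6 trials, D = 4 features
X = np.zeros((6, 4))
Y = np.array([-1.0, 1.0, 1.0, -1.0, 1.0, -1.0])

# np.unique returns the distinct label values (-1 and 1), not one entry per trial
n_classes = len(np.unique(Y))

mupos_attempt = np.zeros((n_classes, X.shape[1]))  # shape (2, D) = (2, 4)
muneg_attempt = np.zeros((n_classes, X.shape[0]))  # shape (2, N) = (2, 6)

try:
    mupos_attempt - muneg_attempt
except ValueError as err:
    # Broadcasting needs trailing dimensions to match (or be 1); 4 vs 6 does not
    print("ValueError:", err)
```

With the EEG data the column counts would be D = 310 and N, so the subtraction fails for the same reason.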

__deets__ · Thursday, 30 May 2019, 12:52

And what are mupos and muneg supposed to be?

me4gqp · Thursday, 30 May 2019, 12:54

mupos= Y label vector of length N 1
muneg= Y label vector of length N -1

__deets__ · Thursday, 30 May 2019, 12:58

An operator is missing for mupos. Is that supposed to be plus or minus? And if Y is supposed to determine the length, why do you use X? And why do you use one dimension in one place and the other dimension in the other?

me4gqp · Thursday, 30 May 2019, 13:08

For mupos it should say: Y label vector of length N +1.

Y represents the different labels.
The code

Code:

print(Y)
print(np.unique(Y))
print(Y.size)

returns:
[-1. -1. 1. ... 1. -1. -1.]
[-1. 1.]
5322
So it is a 1-dimensional array made up of the labels -1 and 1, with 5322 entries in total.
I use the different dimensions of X to assign these entries.

__deets__ · Thursday, 30 May 2019, 13:20

And why do you use unique? What is that supposed to achieve in your case?

me4gqp · Thursday, 30 May 2019, 13:32

If I enter:

Code:

mupos = np.zeros((len(Y),X.shape[0]))
muneg = np.zeros((len(Y),X.shape[0]))

I get this error message:

Code:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3991 is different from 310)
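The error says the weight vector passed to `@` has 3991 entries where 310 are expected. 310 is D = 5 · 62 features; 3991 is most likely the number of training trials, since `train_test_split` with its default `test_size=0.25` puts ceil(0.25 · 5322) = 1331 of the 5322 trials into the test set and leaves 3991 for training. A small sketch of the shape rule with arbitrary stand-in arrays:

```python
import numpy as np

# Hypothetical stand-in for X_test: 5 test trials, D = 310 features
X_test = np.ones((5, 310))

# A weight vector needs one entry per feature (length D)
w_ok = np.ones(310)
print((X_test @ w_ok).shape)  # (5,): one projection per test trial

# A vector whose length is the number of training trials does not fit
w_wrong = np.ones(3991)
try:
    X_test @ w_wrong
except ValueError as err:
    # matmul requires X_test.shape[1] == w.shape[0]; here 310 != 3991
    print("ValueError:", err)
```

So w, and with it mupos and muneg, must have length D = 310, independent of how many trials there are.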

But I'm open to suggestions.