Added statred assignment 2.

parent af701045
Run pca_main.py, then eigenimages.py, then reconstruct.py. pca_main.py saves its
results to file, and the two scripts that follow read them from there.
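The handoff between the scripts can be sketched as follows. This is a minimal illustration, assuming NumPy arrays pickled in binary mode; the helpers `save_array` and `load_array` and the temporary directory are hypothetical and not part of the submitted scripts.

```python
import os
import pickle
import tempfile

import numpy as np


def save_array(path, array):
    """Pickle an array to disk; pickle data is binary, hence mode 'wb'."""
    with open(path, "wb") as f:
        pickle.dump(array, f)


def load_array(path):
    """Load an array that was written by save_array."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for the eigenvector matrix written by the first script.
U = np.eye(4)
path = os.path.join(tempfile.mkdtemp(), "eigenvectors")

save_array(path, U)          # what the first script does when it finishes
U_loaded = load_array(path)  # what the next script does when it starts
print(np.array_equal(U, U_loaded))
```

Opening the files with `with` also closes them as soon as the data has been written or read.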
#
#Lab 1. Generating Multivariate Normal Distributed Samples
The first assignment consists of the three LabExercises from the handout
"Multivariate Random Variables" (the last three sections of the handout).
Documented Python code is more than enough for this assignment, including the
plots it produces.
The Iris dataset reference in the assignment is wrong (it points to my local
copy instead of the public one). The correct link is:
http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
The first exercise asks you to choose a mu vector and covariance matrix
yourself. That is less simple than it seems: not every 4x4 matrix is a
covariance matrix, since a covariance matrix must be symmetric and positive
semi-definite. If you cannot work one out yourself, you can also use:
mu = array( [ [3],[4],[5],[6] ] )
Sigma = array(
[[ 3.01602775, 1.02746769, -3.60224613, -2.08792829],
[ 1.02746769, 5.65146472, -3.98616664, 0.48723704],
[ -3.60224613, -3.98616664, 13.04508284, -1.59255406],
[ -2.08792829, 0.48723704, -1.59255406, 8.28742469]] )
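One way to obtain a valid covariance matrix and draw samples from it is sketched below. It is an illustration under assumptions, not the handout's prescribed method: the matrix is built as `A A^T` from a random `A`, the samples are generated with a Cholesky factor, and the names `is_covariance` and `sample_mvn` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Any matrix of the form A A^T is symmetric and positive semi-definite.
A = rng.normal(size=(4, 4))
Sigma = A @ A.T
mu = np.array([3.0, 4.0, 5.0, 6.0])


def is_covariance(S, tol=1e-10):
    """True if S is symmetric and has no eigenvalue below -tol."""
    S = np.asarray(S, dtype=float)
    return bool(np.allclose(S, S.T) and np.linalg.eigvalsh(S).min() >= -tol)


def sample_mvn(mu, Sigma, n, rng):
    """Draw n samples as mu + L z, with L L^T = Sigma and z standard normal."""
    L = np.linalg.cholesky(Sigma)
    z = rng.standard_normal((len(mu), n))
    return mu[:, None] + L @ z  # shape (dimension, n): one sample per column


X = sample_mvn(mu, Sigma, 10000, rng)
print(is_covariance(Sigma))
# Symmetric, but with eigenvalues 3 and -1, so not a covariance matrix.
print(is_covariance([[1.0, 2.0], [2.0, 1.0]]))
print(np.round(X.mean(axis=1), 2))
```

The Cholesky factor needs a strictly positive definite matrix, which a random `A A^T` is in practice; the sample mean printed at the end should land close to `mu`.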
For the last exercise the Iris data set must be visualized in a scatter matrix.
Instead of the plot command it is convenient to use the scatter function. In
addition to x and y coordinate vectors, this function accepts vectors that set
a color, size and marker type for each point.
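A scatter matrix along these lines can be sketched as follows. The data is a synthetic stand-in for Iris (three classes, four features), the off-screen `Agg` backend and the output path are assumptions made so the sketch runs anywhere, and the class label is used directly as the color value.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the Iris data: 3 classes of 50 points, 4 features.
labels = np.repeat([0, 1, 2], 50)
data = rng.normal(size=(150, 4)) + labels[:, None]

n_features = data.shape[1]
fig, axes = plt.subplots(n_features, n_features, figsize=(8, 8))
for row in range(n_features):
    for col in range(n_features):
        # c takes one value per point (mapped to a color), s sets the size
        axes[row, col].scatter(data[:, col], data[:, row], c=labels, s=8)
        axes[row, col].set_xticks([])
        axes[row, col].set_yticks([])

out_path = os.path.join(tempfile.mkdtemp(), "scatter_matrix_sketch.png")
fig.savefig(out_path)
```

In current matplotlib the `marker` argument applies to a whole `scatter` call, so giving each class its own marker would take one call per class.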
#LabExercise PCA
The second lab assignment is to do EITHER LabExercise 5.2 OR LabExercise 5.3.
It is in fact (almost) the same exercise, applied to a different dataset. You
must hand in a Python file that contains the following mandatory functions:
1. PCA(): this function reads in the data, performs the PCA analysis and shows
the 'scree diagram' in figure(1). (For LabE 5.1 this function has a parameter
that determines which dataset is used: data='natural' or data='munsell'.)
2. EigenImages(k): this function plots the first k eigenvectors in figure(2)
(as functions for LabE 5.2, as images for LabE 5.3). Again, for LabE 5.1 the
parameter data='xxx' determines which dataset is used.
3. Reconstruct( k, sample )
   1. For Lab 5.2: Reconstruct( k, sample, data='natural' ). Here k is the
   number of principal components used in the reconstruction and sample is the
   index in the dataset of the spectrum to be reconstructed.
   The result must be a figure(3) that shows the original spectrum and the
   reconstructed spectrum.
   2. For Lab 5.3: Reconstruct( k, (x,y) ). Here k is the number of principal
   components used in the reconstruction and (x,y) gives the coordinates of the
   top left point of the detail to be reconstructed.
   The result must be a figure(3) that shows the original image detail and the
   reconstructed detail side by side.
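The quantities behind a scree diagram can be sketched on synthetic data as below. The data, the helper name `sorted_eigh` and the 95% variance threshold are assumptions chosen for illustration; the assignment itself only asks to inspect the diagram.

```python
import numpy as np


def sorted_eigh(M):
    """Eigen-decomposition of a symmetric matrix, largest eigenvalue first."""
    d, U = np.linalg.eigh(M)
    order = np.argsort(d)[::-1]
    return d[order], U[:, order]


rng = np.random.default_rng(2)
# Hypothetical data: 500 samples (columns) of 5 variables with unequal spread.
scales = np.array([5.0, 3.0, 1.0, 0.2, 0.1])
X = scales[:, None] * rng.standard_normal((5, 500))

# The scree diagram is a plot of these sorted eigenvalues.
d, U = sorted_eigh(np.cov(X))

# Fraction of the total variance explained by the first 1, 2, ... components.
explained = np.cumsum(d) / np.sum(d)
# Smallest k whose components explain at least 95% of the variance.
k = int(np.searchsorted(explained, 0.95)) + 1
print(d)
print(explained)
print(k)
```

A sharp drop in the eigenvalues after the first few components is what makes a small k a reasonable choice.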
#!/usr/bin/env python
"""
File: EigenImages.py
Class: Statistisch Redeneren 2011
Author: Joris Stork
Student nr: 6185320
Created: 18 April 2011
Modified: 18 April 2011
Plots the first k eigenvectors from the dataset specified by data='xxx'
"""
from pylab import figure, subplot, imshow, gray, sqrt
import pickle

NR_COMPONENTS = 6
EV_FILENAME = 'eigenvectors'
E_IMAGES_FILENAME = 'figure_2_eigen_images.svg'
welcome = '*** Eigenimages plotter ***'

# plots the first k eigenvectors stored in vectorsfile and saves the plot to imagesfile
def eigenimages(k, vectorsfile, imagesfile):
    # load the eigenvectors created by pca_main.py (pickle data is binary)
    with open(vectorsfile, 'rb') as f:
        U = pickle.load(f)
    print("Loaded eigenvectors from file: '%s'" % vectorsfile)
    # pick the first k eigenvectors
    kU = U[:, 0:k]
    kU_rows, kU_columns = kU.shape
    # plot each chosen eigenvector: one 25x25 image per vector
    kU_sqrt = int(sqrt(kU_rows))
    # the assignment asks for the eigen images in figure(2)
    fig2 = figure(2)
    # two columns and as many rows as needed to fit k images
    nr_rows = (k + 1) // 2
    for i in range(k):
        subplot(nr_rows, 2, i + 1)
        imshow(kU[:, i].reshape(kU_sqrt, kU_sqrt))
        gray()
    fig2.savefig(imagesfile)
    print("Saved eigen images to file: '%s'" % imagesfile)

if __name__ == '__main__':
    print("\n" + welcome)
    eigenimages(NR_COMPONENTS, EV_FILENAME, E_IMAGES_FILENAME)
#!/usr/bin/env python
"""
File: PCA_Main.py
Class: Statistisch Redeneren 2011
Author: Joris Stork
Student nr: 6185320
Created: 18 April 2011
Modified: 18 April 2011
The main routine for the PCA lab exercise, and the PCA() function, which reads
in the relevant data, performs a PCA analysis, displays a 'scree diagram'.
"""
import sys
from pylab import eigh, argsort, zeros, plot, savefig, imread
import pickle
# choose number of relevant principal components by looking at scree diagram
K_VALUE = 6
DETAIL_D = 25
SCREE_FILE = 'figure_1_scree.pdf'
IMAGE_FILE = 'trui.png'
EV_FILE = 'eigenvectors'
BIG_MATRIX_FILE = 'big_matrix'
MEAN_FILE = 'mean'
welcome = '*** The PCA programme ***'
# R v.d. Boomgaard's wrapper to return the eigenvalues in sorted order
def sorted_eig(M):
    d, U = eigh(M)
    # store the index numbers that sort d in descending order in si
    si = argsort(d)[-1::-1]
    # use sorted index numbers to place elements from d in sorted order
    d = d[si]
    # do the same with the columns of U
    U = U[:, si]
    print('Sorted the eigenvalues and eigenvectors.')
    return (d, U)
# builds the matrix of all image details, derives its mean and covariance matrix,
# plots the scree diagram and saves the results (detail matrix credit: R.vd.Boomgaard)
def PCA(img, scree_file, ev_file, mean_file, big_matrix_file):
    n, N = img.shape
    # nr of detail positions in one image row
    details_per_row = N - DETAIL_D + 1
    # nr of details of size DETAIL_D X DETAIL_D in an n X N image
    nr_details = (n - DETAIL_D + 1) * details_per_row
    # nr of variables in a detail
    detail_size = DETAIL_D * DETAIL_D
    # matrix whose columns are the (nr_details) details of the image
    big_m = zeros((detail_size, nr_details), dtype='float')
    # populate big_m with all details from img: one detail per column
    sys.stdout.write('Populating the big matrix with all details')
    sys.stdout.flush()
    for i in range(nr_details):
        # top left corner (row x, column y) of detail i
        x = i // details_per_row
        y = i % details_per_row
        # flatten the detail row by row into column i
        big_m[:, i] = img[x:x + DETAIL_D, y:y + DETAIL_D].reshape(detail_size)
        if i % 2000 == 0:
            sys.stdout.write('.')
            sys.stdout.flush()
    print(' Done.')
    #
    # now we're going to derive the covariance matrix of our big matrix using an
    # incremental, memory-saving method:
    #
    # matrix that holds the running sum of each detail multiplied by its transpose
    sigma_runner = zeros((detail_size, detail_size), dtype='float')
    # vector that holds the running sum of the details
    mean_runner = zeros(detail_size, dtype='float')
    # incrementally compute the running sums
    sys.stdout.write('Incrementally calculating Sigma and mean')
    for i in range(nr_details):
        # a (d,) vector times a (d,1) column broadcasts to the d X d outer product
        sigma_runner = sigma_runner + big_m[:, i] * big_m[:, i].reshape(detail_size, 1)
        mean_runner = mean_runner + big_m[:, i]
        if i % 2000 == 0:
            sys.stdout.write('.')
            sys.stdout.flush()
    print(' Done.')
    bigm_mean = mean_runner / nr_details
    # unbiased estimate: (sum of x x^T - n * mean mean^T) / (n - 1)
    temp = sigma_runner - nr_details * (bigm_mean * bigm_mean.reshape(detail_size, 1))
    Sigma = temp / (nr_details - 1)
    # calculate eigenvalues and eigenvectors
    d, U = sorted_eig(Sigma)
    # plot (ordered) eigenvalues, save to pdf file
    plot(d)
    savefig(scree_file)
    print('Saved scree diagram to %s' % scree_file)
    # pickle data is binary, so the files are opened in 'wb' mode
    with open(big_matrix_file, 'wb') as f:
        pickle.dump(big_m, f)
    print('Saved matrix of image details to %s' % big_matrix_file)
    with open(ev_file, 'wb') as f:
        pickle.dump(U, f)
    print('Saved eigenvectors to %s' % ev_file)
    with open(mean_file, 'wb') as f:
        pickle.dump(bigm_mean, f)
    print('Saved mean to %s' % mean_file)
    return U

if __name__ == '__main__':
    print("\n" + welcome)
    img = imread(IMAGE_FILE)
    print('Loaded the image file.')
    U = PCA(img, SCREE_FILE, EV_FILE, MEAN_FILE, BIG_MATRIX_FILE)
    #eigenimages(U, K_VALUE)
    #reconstruct(K_VALUE, (40,70))
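The detail matrix and the running-sum covariance used in PCA_Main.py can be checked on a toy image. The sketch below is not the submitted method: it assumes NumPy 1.20 or newer for `sliding_window_view`, uses a hypothetical 6 x 8 random image with 3 x 3 details, and compares the incremental result with `np.cov`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

D = 3  # detail size for this toy example (the script uses 25)
rng = np.random.default_rng(3)
img = rng.random((6, 8))  # hypothetical small grayscale image

# All D x D details, each flattened row by row, one detail per column.
windows = sliding_window_view(img, (D, D))  # shape (n-D+1, N-D+1, D, D)
big_m = windows.reshape(-1, D * D).T        # shape (D*D, nr_details)

# Running sums of x x^T and x, as in the incremental method.
nr_details = big_m.shape[1]
sum_outer = np.zeros((D * D, D * D))
sum_x = np.zeros(D * D)
for i in range(nr_details):
    x = big_m[:, i]
    sum_outer += np.outer(x, x)
    sum_x += x

mean = sum_x / nr_details
Sigma = (sum_outer - nr_details * np.outer(mean, mean)) / (nr_details - 1)

print(big_m.shape)
print(np.allclose(Sigma, np.cov(big_m)))
```

The running sums only need one detail at a time, so the same loop would also work if the details were generated on the fly instead of being stored in one big matrix.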
#!/usr/bin/env python
"""
File: Reconstruct.py
Class: Statistisch Redeneren 2011
Author: Joris Stork
Student nr: 6185320
Created: 18 April 2011
Modified: 18 April 2011
performs reconstruction
"""
import pickle
from pylab import figure, subplot, gray, imshow, dot
REC_IMG_FILE = 'figure_3_3_details.pdf'
SCREE_FILE = 'figure_3_1_scree.pdf'
BIG_MATRIX_FILE = 'big_matrix'
EV_FILE = 'eigenvectors'
MEAN_FILE = 'mean'
E_IMAGES_FILE = 'figure_3_2_eigen_images.pdf'
welcome = '*** Reconstructor ***'
NR_COMPONENTS = 6
DETAIL_NR = 10123
DETAIL_D = 25
# reconstructs detail number detail_nr from its first k principal components
def reconstruct(k, detail_nr):
    # load the matrix of image details and pick the requested detail
    with open(BIG_MATRIX_FILE, 'rb') as f:
        big_m = pickle.load(f)
    print('big_m')
    print(big_m.shape)
    detail = big_m[:, detail_nr]
    print("Loaded image detail from file: '%s'" % BIG_MATRIX_FILE)
    # load eigenvectors generated by pca_main
    with open(EV_FILE, 'rb') as f:
        U = pickle.load(f)
    print("Loaded eigenvectors from file: '%s'" % EV_FILE)
    # load mean generated by pca_main
    with open(MEAN_FILE, 'rb') as f:
        bigm_mean = pickle.load(f)
    print("Loaded mean from file: '%s'" % MEAN_FILE)
    # translate + transform detail to eigenbasis + remove last (detail size - k) elements
    temp = detail - bigm_mean
    eig_detail = dot(U.transpose(), temp)
    print('eig_detail')
    print(eig_detail.shape)
    eig_detail_k = eig_detail[0:k]
    # transform + translate back to reconstruct
    kU = U[:, 0:k]
    reconstructed = dot(kU, eig_detail_k)
    reconstructed = reconstructed + bigm_mean
    reconstructed = reconstructed.reshape(DETAIL_D, DETAIL_D)
    # plot original detail and reconstructed detail side by side, save to file
    detail = detail.reshape(DETAIL_D, DETAIL_D)
    fig3 = figure(3)
    subplot(1, 2, 1)
    imshow(detail)
    gray()
    subplot(1, 2, 2)
    imshow(reconstructed)
    gray()
    fig3.savefig(REC_IMG_FILE)
    print("Saved detail and reconstructed detail to: '%s'" % REC_IMG_FILE)

if __name__ == '__main__':
    print("\n" + welcome)
    reconstruct(NR_COMPONENTS, DETAIL_NR)
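How the choice of k affects the reconstruction can be illustrated on synthetic data. The sketch below uses hypothetical random data and a hypothetical helper `reconstruct_sample`; it relies on the eigenvector matrix being orthonormal, in which case the squared reconstruction error equals the sum of the squared coefficients that were discarded.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical data: 200 samples (columns) of 6 correlated variables.
X = rng.standard_normal((6, 6)) @ rng.standard_normal((6, 200))
mean = X.mean(axis=1)

d, U = np.linalg.eigh(np.cov(X))
U = U[:, np.argsort(d)[::-1]]  # columns ordered by decreasing eigenvalue


def reconstruct_sample(x, k):
    """Approximate x using only its first k principal components."""
    coeffs = U[:, :k].T @ (x - mean)
    return U[:, :k] @ coeffs + mean


x = X[:, 0]
all_coeffs = U.T @ (x - mean)
errors = []
discarded = []
for k in range(7):
    errors.append(np.sum((x - reconstruct_sample(x, k)) ** 2))
    discarded.append(np.sum(all_coeffs[k:] ** 2))
    print(k, errors[-1], discarded[-1])
```

The error can only shrink as k grows and vanishes (up to rounding) once all components are used.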