
Note
: It is important to check the call signature of the functions produced
by f2py. Issuing the command help(mxcorr) in python prompt produces
the output:
NAMEmxcorr
FILE
/home/ahsan/MEGA/javaProgramsPostProcessing/NeutralChannel/turbStruc3D_corr3D/testJava/mxcorr.so
DESCRIPTION
This module 'mxcorr' is auto-generated with f2py (version:2).
Functions:
status,res = multilevelcrosscorrfftm(arr3d,template,nx=shape(arr3d,0),ny=shape(arr3d,1),nz=shape(arr3d,2),snx=shape(template,0),sny=shape(template,1))
compare this with the original fortran call signature
subroutine multilevelcrosscorrfftm(status, res, arr3d, nx, ny, nz, template, snx, sny)
implicit none
include 'fftw3.f'
integer,intent(in) :: nx, ny, nz, snx, sny
integer, intent(out) :: status
real(kind=8),dimension(nx,ny,nz), intent(in) :: arr3d
real(kind=8),dimension(nx,ny,nz), intent(out) :: res
It is
obvious that f2py has modified the call signature. Not only it has
removed the "intent(out)" dummy arguments from the call signature but
also it has rearranged the position of input arguments. It has moved
'template' argument from 7 th position to 2nd (next to arr3d)! Also note
'intent(out)' variables (status, res) are being returned in a tuple in
the order they appeared in original fortran subroutine.
file : forFromPy.py
[code language="python"]import numpy as np from scipy import signal import mxcorr import time import matplotlib.pyplot as plt nx = 200; ny = 200; imgf = np.random.random((nx,ny,1)) temf = imgf[:,:,0] stime = time.time() status, corrf = mxcorr.multilevelcrosscorrfftm(imgf,temf,nx,ny,1,nx,ny) etime = time.time() print "Elapsed time, fortran call :", etime-stime," sec"; corrp = np.zeros((nx,ny)); stime = time.time() corrp[:,:] = signal.correlate2d(imgf[:,:,0], temf, mode='same') etime = time.time() print "Elapsed time, python call :", etime-stime, " sec"; corrp = corrp/(sum(sum(temf**2))*sum(sum(imgf**2)))**0.50 plt.subplot(2, 2, 1) plt.imshow(corrf[:,:,0], cmap='RdBu', interpolation='nearest') plt.colorbar() plt.subplot(2, 2, 2) plt.imshow(corrp[:,:], cmap='RdBu', interpolation='nearest') plt.title('Correlation results') plt.colorbar() plt.show() [/code]
Elapsed time, fortran call : 0.053929805755 sec
Elapsed time, python call : 5.06791615486 sec
Speed Gain : 101 times !!
Well
this is one of those extreme cases! AS has been pointed out, If it is
needed to calculate correlation between very large arrays, it will not
be wise to use the regular template matching routines like
Matlab's 'xcorr2' or in this case 'correlate2d' from scipy.signal. I don
not know the exact algorithm that are being used in these
functions but I have a feeling that the correlation/convolution is being
calculated in this cases using the trivial term by term
multiplications. The trivial algorithm "slide the template image over
all elements of the target image and multiply term by term and take the sum" will have a computational cost of approximately (nx * ny * nz * (nx * ny+ nx * ny)) whereas,
calculating the correlation/convolution using optimized fft libraries
like fftw3 will incur a total computational cost of approximately
(nz * (nx * ny * log(nx * ny))). So the speed up in the execution time
is not surprising at all. A very similar function like
"multilevelcrosscorrfftm" can be written in python using pythons fft
routines and I guess it will be as fast as the fortran routine. The
point is, it is important to realize early what algorithm we are using
for our calculations. If our data file is small, say, couple of megabytes
we will not notice the performance difference. As soon as data gets
large and large, picking the right algorithm becomes a paramount
task otherwise, one may find himself in utter frustration like me ! I
particularly choose fortran in this case because other calculations
involved in my project will make use of extensive loop operations, at
least 3 level deep for loops and data set is also large. Both Matlab and
Python have got bad reputation regarding loop executions. I am just
being cautious from the beginning and writing routines using fortran.
file : multiLevelCrossCorrFFTM.F90
[code language="cpp"] ! author : ahsan, cfd.ahsan@outlook.com, Aug 2015 subroutine multilevelcrosscorrfftm(status, res, arr3d, nx, ny, nz, template, snx, sny) ! this subroutine calcuates the correlation between the template image (template) ! and target image (arr3d). the template image is a 2d array and can be smaller than ! than any of the x-y plane of the target image array (arr3d) ! the target image is assumed to be a 3d array and each x-y plane is treated ! as a separate image. the resulting correlation array has the size of the target ! array. nx, ny, nz are the number of row (assued x-direction), ! column (assumed y-direction), and page(assumed z-direction) of the arr3d array. !` snx and sny are the number of row and column of the template image implicit none include 'fftw3.f' integer, intent(in):: nx, ny, nz, snx, sny real(kind=8),dimension(nx,ny,nz), intent(in) :: arr3d real(kind=8),dimension(snx,sny), intent(in) :: template integer, intent(out) :: status real(kind=8),dimension(nx,ny,nz), intent(out) :: res real(kind=8), dimension(:,:), allocatable :: planeData, templateData,xx,yy real(kind=8), dimension(:), allocatable :: onerow, onecol complex(kind=8), dimension(:,:), allocatable :: templatefft, planeftt, corr integer :: i,j,levelRow, levelCol, templateRow, templateCol, m ,n,nrow,ncol integer(kind=8) :: planfd, planbd, planft, planbt status = 0 ! status 0 mean success ! ! add an additional row or column at the end to make data and template size odd if(mod(nx,2) == 0 .and. mod(ny,2)==0) then levelRow = nx+1; levelCol = ny+1 else if(mod(nx,2) == 0 .and. mod(ny,2)==1) then levelRow = nx+1; levelCol= ny; else if(mod(nx,2) == 1 .and. mod(ny,2)==0) then levelRow = nx; levelCol= ny+1; else if(mod(nx,2) == 1 .and. mod(ny,2)==1) then levelRow = nx; levelCol = ny; end if if(mod(snx,2) == 0 .and. mod(sny,2)==0) then templateRow = snx+1; templateCol = sny+1; else if(mod(snx,2) == 0 .and. mod(sny,2)==1) then templateRow = snx+1; templateCol= sny; else if(mod(snx,2) == 1 .and. mod(sny,2)==0) then templateRow = snx; templateCol = sny+1; else if(mod(snx,2) == 1 .and. mod(sny,2)==1) then templateRow= snx; templateCol = sny; end if allocate(planeData(levelRow+templateRow -1, levelCol+templateCol-1), &
STAT = status) allocate(templateData(levelRow+templateRow -1, levelCol+templateCol-1), &
STAT = status) allocate(templatefft(size(templateData,1)/2+1, size(templateData,2)), &
STAT=status) allocate(planeftt(size(planeData,1)/2+1, size(planeData,2)), STAT=status) allocate(corr(size(planeData,1)/2+1, size(planeData,2)), STAT=status) allocate(onerow(size(planeData,1)), onecol(size(planeData,2))) allocate(xx(size(planeData,1), ((size(planeData,2))-1)/2)) allocate(yy((size(planeData,1)-1)/2,size(planeData,2))) nrow = size(templateData,1); ncol = size(templateData,2); do i=1,nz planeData = 0.0 templateData = 0.0 xx = 0.0; yy = 0.0; planeData(1:size(arr3d(:,:,i),1), 1:size(arr3d(:,:,i),2)) = arr3d(:,:,i) templateData(1:size(template(:,:),1), 1:size(template(:,:),2)) = template call dfftw_plan_dft_r2c_2d(planfd,size(planeData,1),size(planeData,2), & planeData,planeftt,fftw_estimate) call dfftw_execute_dft_r2c(planfd, planeData,planeftt) call dfftw_destroy_plan(planfd) call dfftw_plan_dft_r2c_2d(planft,size(templateData,1), &
size(templateData,2), templateData,templatefft,fftw_estimate)
call dfftw_execute_dft_r2c(planft, templateData,templatefft) call dfftw_destroy_plan(planft) corr = planeftt * conjg(templatefft) call dfftw_plan_dft_c2r_2d(planbd,size(planeData,1), &
size(planeData,2),corr, planeData, fftw_estimate)
call dfftw_execute_dft_c2r(planbd,corr, planeData) call dfftw_destroy_plan(planbd) planeData = planeData / size(planeData) ! ! since fftw library does not have a fftshift function ! ! manually rearrange the data to keep highest auto
! ! correlation at center ! ! resulting correlation matrix output will conform to the matlab xcorr2 result onecol = planeData(:,1); xx = planeData(:,2:(size(planeData,2)-1)/2+1); planeData(:,1:(size(planeData,2)-1)/2)= & planeData(:, (size(planeData,2)-1)/2+2:size(planeData,2)); planeData(:, (size(planeData,2)-1)/2+2:size(planeData,2)) = xx; planeData(:,(size(planeData,2)-1)/2+1) = onecol onerow = planeData(1,:); yy = planeData(2:(size(planeData,2)-1)/2+1, : ) planeData(1:(size(planeData,2)-1)/2, :)= & planeData((size(planeData,2)-1)/2+2:size(planeData,2),: ) planeData((size(planeData,2)-1)/2+2:size(planeData,2),:) = yy planeData((size(planeData,2)-1)/2+1,:) = onerow ! ! normalize the result to get maximum correlation of 1 planeData = planeData/dsqrt(sum(arr3d(:,:,i)**2)*sum(template**2)) ! ! return the middle portion of resultant matrix that corresponds ! ! to the size of the input matrix res(:,:,i) = planeData((templateRow-1)/2+2:size(planeData,1)- &
(templateRow-1)/2+1, & (templateCol-1)/2+2:size(planeData,2)-(templateCol-1)/2+1) end do deallocate(planeData, onecol, onerow,planeftt, templatefft,corr, STAT=status) deallocate(templateData, xx, yy, STAT=status) end subroutine multilevelcrosscorrfftm [/code]
!! compiling with f2py to make a python callable module !! f2py -I/usr/include -c -L/usr/lib/x86_64-linux-gnu -lfftw3 -m mxcorr multiLevelCrossCorrFFTM.F90
No comments:
Post a Comment