Bits and Pieces: Calling A Fortran Subroutine from Python

Calling fortran subroutine from python is really easy using the f2py tool that does the compiling and wrapping of fortran code and ultimately produces a python callable module. The process is really painless and fun! Now the question comes why you would want to call fortran from python. One of the important reasons could be plotting ! Yes, if you had ever wanted to do all the post processing stuffs apart from the main simulation in a language that provides easy-to-use plotting libraries you would have possibly ruled out all basic languages like fortran/cpp/c/java. Most of the people I know either use matlab or python+numpy for post processing for obvious reasons. Well, life does not go smooth and we can not get all the goodies all the time. There might come a time when you may feel frustrated by the speed and memory management of python or matlab. In such situations binding fortran with python may provide you with a relief. In my case, I was trying to calculate correlation between different x-y planes of a large 3D array having dimensions or ranks 2048x2048x64. The data file takes approximately of space. I was trying to calculate correlation using Matlab's xcorr2 and it was excruciatingly slow. It is not like Matlab is slow. I picked the wrong function for my purpose. For large arrays xcorr2 or scipy.signal's 'correlate2d' is not the right choice for correlation calculations and I am going to show why. Before that, lets see how to call fortran from python. The key tool for this purpose is f2py. I use Debian/Ubuntu linux systems and it was installed from the repository. To get desired output and avoid surprise you just need to declare variables in fortran subroutines with the 'intent' attribute. From 'intent(in)' or 'intent(out)' attribute f2py will figure out how to arrange the dummy variables in call signature. f2py removes all variables with 'intent(out)' attribute. The resulting module from f2py will return the original 'intent(out)' variables in a tuple when called from python. By the way I have not used any deferred shape arrays in fortran subroutine dummy arguments. The fortran file that I compiled using f2py is "multiLevelCrossCorrFFTM.F90" and created a module named "mxcorr". "multiLevelCrossCorrFFTM.F90" uses the "fftw3" library for calculating correlation. libfftw3.so will be dynamically linked to f2py generated module without any issue. However I had hard time forcing the compiler to search for 'fftw.h' header file. After searching the web I realized that if the subroutine is saved with ".f90" extension the preprocessor does not run correctly to include the header but saving fortran files with " .F90" extension solves the problem! The produced mxcorr module by f2py is imported in "forFromPy.py" script and used. Full codes are given and the procedure should be apparent from the example.

A very good tutorial on f2py can be found here : http://www.ucs.cam.ac.uk/docs/course-notes/unix-courses/pythonfortran/files/f2py.pdf

Note : It is important to check the call signature of the functions produced by f2py. Issuing the command help(mxcorr) in python prompt produces the output:

NAME
mxcorr
FILE
/home/ahsan/MEGA/javaProgramsPostProcessing/NeutralChannel/turbStruc3D_corr3D/testJava/mxcorr.so
DESCRIPTION
This module 'mxcorr' is auto-generated with f2py (version:2).
Functions:
status,res = multilevelcrosscorrfftm(arr3d,template,nx=shape(arr3d,0),ny=shape(arr3d,1),nz=shape(arr3d,2),snx=shape(template,0),sny=shape(template,1))

compare this with the original fortran call signature

subroutine multilevelcrosscorrfftm(status, res, arr3d, nx, ny, nz, template, snx, sny)
    implicit none    
    include 'fftw3.f'
    integer,intent(in) ::  nx, ny, nz, snx, sny 
    integer, intent(out) :: status
    real(kind=8),dimension(nx,ny,nz), intent(in) :: arr3d
    real(kind=8),dimension(nx,ny,nz), intent(out) :: res

It is obvious that f2py has modified the call signature. Not only it has removed the "intent(out)" dummy arguments from the call signature but also it has rearranged the position of input arguments. It has moved 'template' argument from 7 th position to 2nd (next to arr3d)! Also note 'intent(out)' variables (status, res) are being returned in a tuple in the order they appeared in original fortran subroutine.

file : forFromPy.py

[code language="python"]import numpy as np
from scipy import signal
import mxcorr
import time
import matplotlib.pyplot as plt

nx = 200; ny = 200;
imgf = np.random.random((nx,ny,1))
temf = imgf[:,:,0]
stime = time.time()
status, corrf = mxcorr.multilevelcrosscorrfftm(imgf,temf,nx,ny,1,nx,ny)
etime = time.time()
print "Elapsed time, fortran call :", etime-stime," sec";

corrp = np.zeros((nx,ny));
stime = time.time()
corrp[:,:] = signal.correlate2d(imgf[:,:,0], temf, mode='same')
etime = time.time() 
print "Elapsed time, python call :", etime-stime, " sec";
corrp = corrp/(sum(sum(temf**2))*sum(sum(imgf**2)))**0.50

plt.subplot(2, 2, 1)
plt.imshow(corrf[:,:,0], cmap='RdBu', interpolation='nearest')
plt.colorbar()
plt.subplot(2, 2, 2)
plt.imshow(corrp[:,:], cmap='RdBu', interpolation='nearest')
plt.title('Correlation results')
plt.colorbar()
plt.show()


[/code]

Elapsed time, fortran call : 0.053929805755 sec

Elapsed time, python call : 5.06791615486 sec

Speed Gain : 101 times !!

Well this is one of those extreme cases! AS has been pointed out, If it is needed to calculate correlation between very large arrays, it will not be wise to use the regular template matching routines like Matlab's 'xcorr2' or in this case 'correlate2d' from scipy.signal. I don not know the exact algorithm that are being used in these functions but I have a feeling that the correlation/convolution is being calculated in this cases using the trivial term by term multiplications. The trivial algorithm "slide the template image over all elements of the target image and multiply term by term and take the sum" will have a computational cost of approximately (nx * ny * nz * (nx * ny+ nx * ny)) whereas, calculating the correlation/convolution using optimized fft libraries like fftw3 will incur a total computational cost of approximately (nz * (nx * ny * log(nx * ny))). So the speed up in the execution time is not surprising at all. A very similar function like "multilevelcrosscorrfftm" can be written in python using pythons fft routines and I guess it will be as fast as the fortran routine. The point is, it is important to realize early what algorithm we are using for our calculations. If our data file is small, say, couple of megabytes we will not notice the performance difference. As soon as data gets large and large, picking the right algorithm becomes a paramount task otherwise, one may find himself in utter frustration like me ! I particularly choose fortran in this case because other calculations involved in my project will make use of extensive loop operations, at least 3 level deep for loops and data set is also large. Both Matlab and Python have got bad reputation regarding loop executions. I am just being cautious from the beginning and writing routines using fortran.

file :  multiLevelCrossCorrFFTM.F90

[code language="cpp"]
! author : ahsan, cfd.ahsan@outlook.com, Aug 2015
subroutine multilevelcrosscorrfftm(status, res, arr3d, nx, ny, nz, template, snx, sny)
!    this subroutine calcuates the correlation between the template image (template) 
!    and target image (arr3d). the template image is a 2d array and can be smaller than 
!    than any of the x-y plane of the target image array (arr3d)
!    the target image is assumed to be a 3d array and each x-y plane is treated
!    as a separate image. the resulting correlation array has the size of the target
!    array. nx, ny, nz are the number of row (assued x-direction), 
!    column (assumed y-direction), and page(assumed z-direction) of the arr3d array.
!`    snx and sny are the number of row and column of the template image 
    implicit none    
    include 'fftw3.f'
    integer, intent(in)::  nx, ny, nz, snx, sny 
    real(kind=8),dimension(nx,ny,nz), intent(in) :: arr3d
    real(kind=8),dimension(snx,sny), intent(in) :: template
    integer, intent(out) :: status
    real(kind=8),dimension(nx,ny,nz), intent(out) :: res
    real(kind=8), dimension(:,:), allocatable  :: planeData, templateData,xx,yy
    real(kind=8), dimension(:), allocatable  :: onerow, onecol
    complex(kind=8), dimension(:,:), allocatable :: templatefft, planeftt, corr
    integer :: i,j,levelRow, levelCol, templateRow, templateCol, m ,n,nrow,ncol
    integer(kind=8) :: planfd, planbd, planft, planbt    
    
    status = 0 ! status 0 mean success    
! ! add an additional row or column at the end to make data and template size odd
    if(mod(nx,2) == 0 .and. mod(ny,2)==0) then
    levelRow = nx+1; levelCol = ny+1    
    else if(mod(nx,2) == 0 .and. mod(ny,2)==1) then
     levelRow = nx+1; levelCol= ny;     
    else if(mod(nx,2) == 1 .and. mod(ny,2)==0) then
    levelRow = nx; levelCol= ny+1;
    else if(mod(nx,2) == 1 .and. mod(ny,2)==1) then
    levelRow = nx; levelCol = ny;
    end if
    
    if(mod(snx,2) == 0 .and. mod(sny,2)==0) then 
    templateRow = snx+1; templateCol = sny+1;
    else if(mod(snx,2) == 0 .and. mod(sny,2)==1) then
    templateRow = snx+1; templateCol= sny;
    else if(mod(snx,2) == 1 .and. mod(sny,2)==0) then
    templateRow = snx; templateCol = sny+1;
    else if(mod(snx,2) == 1 .and. mod(sny,2)==1) then
    templateRow= snx; templateCol = sny;
    end if
    
    allocate(planeData(levelRow+templateRow -1, levelCol+templateCol-1), &

      STAT = status)
    allocate(templateData(levelRow+templateRow -1, levelCol+templateCol-1), &

      STAT = status)
    allocate(templatefft(size(templateData,1)/2+1, size(templateData,2)), &

      STAT=status)
    allocate(planeftt(size(planeData,1)/2+1, size(planeData,2)), STAT=status)
    allocate(corr(size(planeData,1)/2+1, size(planeData,2)), STAT=status)
    allocate(onerow(size(planeData,1)), onecol(size(planeData,2)))
    allocate(xx(size(planeData,1), ((size(planeData,2))-1)/2))
    allocate(yy((size(planeData,1)-1)/2,size(planeData,2)))
    nrow = size(templateData,1); ncol = size(templateData,2);
    do i=1,nz
        planeData = 0.0
        templateData = 0.0
        xx = 0.0; yy = 0.0;
        planeData(1:size(arr3d(:,:,i),1), 1:size(arr3d(:,:,i),2)) = arr3d(:,:,i)
        templateData(1:size(template(:,:),1), 1:size(template(:,:),2)) = template    
            
            
        call dfftw_plan_dft_r2c_2d(planfd,size(planeData,1),size(planeData,2), &
              planeData,planeftt,fftw_estimate)    
        call dfftw_execute_dft_r2c(planfd, planeData,planeftt)    
        call dfftw_destroy_plan(planfd)
        
        call dfftw_plan_dft_r2c_2d(planft,size(templateData,1), &

            size(templateData,2), templateData,templatefft,fftw_estimate)

        call dfftw_execute_dft_r2c(planft, templateData,templatefft)    
        call dfftw_destroy_plan(planft)
        
        corr = planeftt * conjg(templatefft)
        call dfftw_plan_dft_c2r_2d(planbd,size(planeData,1), &

          size(planeData,2),corr, planeData, fftw_estimate)

        call dfftw_execute_dft_c2r(planbd,corr, planeData)
        call dfftw_destroy_plan(planbd)
        planeData = planeData / size(planeData)
        ! !     since fftw library does not have a fftshift function
        ! !     manually rearrange the data to keep highest auto

        ! !     correlation at center
        ! !     resulting correlation matrix output will conform to the matlab xcorr2 result
        onecol = planeData(:,1);
        xx = planeData(:,2:(size(planeData,2)-1)/2+1);
        planeData(:,1:(size(planeData,2)-1)/2)= &
           planeData(:, (size(planeData,2)-1)/2+2:size(planeData,2));
        planeData(:, (size(planeData,2)-1)/2+2:size(planeData,2)) = xx;
        planeData(:,(size(planeData,2)-1)/2+1) = onecol   
     
        onerow = planeData(1,:);
        yy = planeData(2:(size(planeData,2)-1)/2+1, : ) 
        planeData(1:(size(planeData,2)-1)/2, :)= &
            planeData((size(planeData,2)-1)/2+2:size(planeData,2),: ) 
        planeData((size(planeData,2)-1)/2+2:size(planeData,2),:) = yy
        planeData((size(planeData,2)-1)/2+1,:) = onerow
        ! !     normalize the result to get maximum correlation of 1
        planeData = planeData/dsqrt(sum(arr3d(:,:,i)**2)*sum(template**2))
        ! !     return the middle portion of resultant matrix that corresponds 
        ! !     to the size of the input matrix
        res(:,:,i) = planeData((templateRow-1)/2+2:size(planeData,1)- &

            (templateRow-1)/2+1, &
            (templateCol-1)/2+2:size(planeData,2)-(templateCol-1)/2+1)

    end do   

    deallocate(planeData, onecol, onerow,planeftt, templatefft,corr, STAT=status)
    deallocate(templateData, xx, yy, STAT=status)
end subroutine multilevelcrosscorrfftm


[/code]

!! compiling with f2py to make a python callable module
!! f2py -I/usr/include -c -L/usr/lib/x86_64-linux-gnu -lfftw3 -m mxcorr multiLevelCrossCorrFFTM.F90

Bits and Pieces

Blog Archive

Saturday, August 8, 2015

Calling A Fortran Subroutine from Python

No comments:

Post a Comment