I've found another masterpiece from Prof. Brian Greene. It talks about quantum theory in the most illustrative way. I think I understood what's going on there. Highly recommended! about 52 mins. Have fun!
Wednesday, November 7, 2012
Quantum Theory: Is it understandable for us?
I've found another masterpiece from Prof. Brian Greene. It talks about quantum theory in the most illustrative way. I think I understood what's going on there. Highly recommended! about 52 mins. Have fun!
Sunday, June 17, 2012
An easy to understand documentary about Einstein theory and many new theories after that!
If you want to understand Einstein theory but you don't have time to read and derive 300 pages of mathematics, The following documentary is just made for you!
This is the link to the best cosmos doc I've ever seen.
It mainly discusses the effect of an empty space and the forces it exerts in a spacetime concept. This movie makes the hard subject to be understandable for typical non physicists. There some theories that will be discussed in addition to Einstein's theory. Enjoy!
This is the link to the best cosmos doc I've ever seen.
The Fabric of the Cosmos What Is Space
It mainly discusses the effect of an empty space and the forces it exerts in a spacetime concept. This movie makes the hard subject to be understandable for typical non physicists. There some theories that will be discussed in addition to Einstein's theory. Enjoy!
Sunday, May 6, 2012
Recommended Tutorial for Learning Emacs Basics
If you have not decided on your text editor, I strongly recommend Emacs. The learning curve is steep meaning that you have to put so much time on basic stuff that you think is stupid at first! like copy and paste which is much harder than doing Ctrl+C/Ctrl+v in other editors. But you will get used to it as soon as you finish one project and then your never gonna leave it! Vim is also good but I personally prefer Emacs. These are typically editors of choice for advanced programmers working on giant codes.
The Emacs code has been under continuous development for more than two decades and it has been proven to be able to handle a wide range of programming tasks. It has syntax highlighting for almost all programming languages to be named a few here C/C++, Python, Pascal, Java, Fortran, Octave/Matlab, Perl, VHDL, Verilog, ...
To discover Emacs I recommend to first see this awesome introductory tutorial and then you are ready to jump on the advanced techniques.
Then you need to print the following cheatsheet (http://www.cs.princeton.edu/~hains/os/emacs.html )and paste it on the walls exactly in front of you.
Try to use keyboard and shortcuts as mush as you can. Don't submit yourself to beginner style programming with mouse and graphical interface. The power of Emacs is that you can do task using keyboard much faster and easier than using a GUI. Have a nice programming experience!
The Emacs code has been under continuous development for more than two decades and it has been proven to be able to handle a wide range of programming tasks. It has syntax highlighting for almost all programming languages to be named a few here C/C++, Python, Pascal, Java, Fortran, Octave/Matlab, Perl, VHDL, Verilog, ...
To discover Emacs I recommend to first see this awesome introductory tutorial and then you are ready to jump on the advanced techniques.
Then you need to print the following cheatsheet (http://www.cs.princeton.edu/~hains/os/emacs.html )and paste it on the walls exactly in front of you.
Try to use keyboard and shortcuts as mush as you can. Don't submit yourself to beginner style programming with mouse and graphical interface. The power of Emacs is that you can do task using keyboard much faster and easier than using a GUI. Have a nice programming experience!
Friday, May 4, 2012
ThreeDimensional MixedElement Unstructured FiniteVolume Solver for Euler/NavierStokes Equations
Abstract
(Please refer the manual.pdf file in the attached archive for complete documentation of the code)In this post, we implement and validate a threedimensional algorithm for solving inviscid Euler equations on general geometries represented by a single block for a fair regime of fluid flow. To this end, we first use a set of predefined routines to read the grid topology. The input file is assumed to be in UGRID format. The grid topology is defined using the number of nodes, number of different elements in the grid, coordinates of all nodes, definitions for various elements including triangle and quadlateral for boundaries, and tetrahederal, pyramid, prism and hex for the interior regions. Prior to writing grid maps, we implement different elementbased maps which locally specify the relation between face and node, relation between node and nodes, and the relation between edge and node for a particular element. Using these local maps, we proceed to implement various gird maps to access the grid point/edge of interest in the flowsolver section. These maps include, element sourrounding a point, points surrounding a particular point, edges sorrounding point, and element sourrounding an element.
In the second part of of the work, we transfer the cellcentered grid to the equal node centered representation which is more suitable for finite volume approach. In doing so, we define our geometrical data structure based on the edges in the interior of the domain and ghost edges on the boundaries. For each triangle boundary element, we define three ghost edges while for each quad, we will have four. The volume of the new nodecentered cell together with conservative variables are stored in nodebased array. All maps and transformed nodecentered grid are validated using a set of predefined meshes including a 13 node generic grid, an allhexahederal and alltetrahederal meshes, a course and fine ramp mesh for supersonic evaluations and a heavy visoucs mesh for NACA0012 airfoil.
In the third part of the work, we implement the flowsolver. The conservationlaws are discretized using finitevolume approach \footnote{integration in space + Divergence theorem)} and a generic equation for temporal update is obtained. Flux vectors are using Roe approach. Roe averages are evaluated at the center of edges providing a Jacobianbased firstorder approximation of flow field in space. For temporal discretization, both firstorder Eulerexplicit and fourthorder classic RungeKutta are implemented. A fifthorder RungeKutta scheme from JamesonBaker is also implemented in the code.
To validate the current algorithm, a supersonic ramp flow is solved on both coarse and refined meshes. Convergence study is performed on both coarse and refined meshes. The shock angle and various parameter before and after shock are compared with the onedimensional oblique shock theory.
The method is extended to secondorder in space by using least square method for gradient construction. In addition, viscous terms are added by averaging the gradients over an edge. The final solver is validated for several benchmark problem including nonlinear acoustics of standing waves inside a pipe, vortex propagation and laminar boundary layer over flat plates. A comparison of the results with experimental data and analytical results for boundary layers proves the robustness of current solver.
Sample Results
Here are some pictures. For complete documentation of the theory and running the code please download the attached archive.
coarse grid ramp
A coarse grid supersonic ramp test case with secondorder spatial Roe scheme. 
residual history for firstorder spatial discretization 
Fine grid ramp
Fine grid ramp with firstorder spatial accuracy. The Shock is resolved perfectly! 
Viscous Boundary Layer
Top) the threedimensional grid for capturing the boundary layer on flat plate. The grid is refined where flat plate starts. Bottom) results from the current solver. For more info please see the PDF manual in the attached file.
Comparison Between Exact Blasius solution, Fluent code, and current code. 
Download the Source Code and Documentation from Here:
PDF Manual: http://ifile.it/amr59xi
Source Code in C: please put your email in the comments and I'll send you one copy of the source code.
Update Feb 2nd, 2016 : The code is uploaded here : https://github.com/arrgasm/ns3D
Wednesday, May 2, 2012
Completely Validated Unstructured TwoDimensional Euler Solver Written in C
Here is my implementation of VanLeer fluxes (and Jacobians) in the following C code. It has the following functionalities:
1 Has a special python script for plotting contours on unstructured grids!!! I wrote this Python script to get rid of third party packages like Tecplot and Fieldview. I can plot any contour using Python script right away in my Ccode without calling external program.
2 Solve twodimensional Euler equations with firstorder and secondorder spatial accuracy.
3 Implements VanLeer fluxes and Jacobians. (ANALYTICAL Jacobians)
4 Has builtin explicit and implicit timemarching schemes.
5 Reads arbitrary unstructured grid in ".mesh" format.
6 Has a library for verification and validation using method of manufacturing solutions.
Below are some results for NACA0012 airfoil
1 Has a special python script for plotting contours on unstructured grids!!! I wrote this Python script to get rid of third party packages like Tecplot and Fieldview. I can plot any contour using Python script right away in my Ccode without calling external program.
2 Solve twodimensional Euler equations with firstorder and secondorder spatial accuracy.
3 Implements VanLeer fluxes and Jacobians. (ANALYTICAL Jacobians)
4 Has builtin explicit and implicit timemarching schemes.
5 Reads arbitrary unstructured grid in ".mesh" format.
6 Has a library for verification and validation using method of manufacturing solutions.
Below are some results for NACA0012 airfoil
Fig. 1 Mach = .8 angle of attack is 1.25 degrees (secondorder)
Fig. 3 Mach = 1.2 angle of attack is 0.0 degrees (firstorder)
Fig. 4 Mach = results from method of manufacturing solution. these contours are compared to exact solution and they are same contours.
To download my code please click on the following link:
please put your email in the comments and I'll send you one copy of the source code.
UPDATE FEB 1, 2016 : The code is published in https://github.com/arrgasm/UnstEuler2D
UPDATE FEB 1, 2016 : The code is published in https://github.com/arrgasm/UnstEuler2D
Massively Parallel GMRES Ccode for Giant Sparse System of Equations
Algorithm and Implementation
The GMRES (Generalized Minimal RESidual) algorithm is completely presented and discussed in [2, 3]. Since the implementation of the algorithm is done in Cprogramming language in a modular and function oriented manner, the mainfile is very short and looks like a pseudocode itself! So we bring the main file here with some small details (like local variables, allocations, comments etc ..) omitted for brevity. In the next pages we describe the algorithm based on the implementation.In line 5 each process reads its won share of the input matrix. In line 7 each process reads the corresponding section of right hand side array. The process of localization (hacking indices) is done in line 11. Then to increase the performance of the code we pick the diagonal entries of the matrix for only one time in line 1925 and store them in array
diag
. So in each part of preconditioning when we need to have diagonal entries, instead of looping over matrix A
we just use the values stored at this array.In line 29 the initialization is done using 1. We will see that if we initialize with zero we will get different convergence behavior in the following sections. In lines 3151 the initial vector of Krylov subspace is created based on normalization of the residual of the initial guess. Also this residual is stored in
g [ 0 ]
(the right hand side vector.Then we will have the main GMRES loop in line 53 where the columns of the Krylov matrix are eventually created during this loop. Please note that the final implementation of GMRES which is presented in appendix I includes an outer loop for restarting. In lines 5561 there is a section for dynamic allocation using
realloc
function which is neglected here. In lines 6787 depending on the user flag for preconditioning, the appropriate type of matrixvector multiplication is selected and applied. Now to make the resulting vector orthogonal to the previous vectors, a modified GramSchmidtt procedure is implemented in lines 8793 where the components of the new vector along the previous vectors is gradually eliminated to have one final component which is normal to all previous vectors. Finally based on the normalization of the resulting vector, a new search vector is created in lines 9599. Then previous givens are applied to the new column of Hessenberg matrix in lines 99105. Once this is done we need to compute new givens to apply them the new column of the Hessenberg matrix. This is done in line 107111. In lines 111115 given are applied to the RHS vector g
where we obtaine the residual of the leastsquare problem and store it in g [ k+1 ]
. This is interesting because without solving ax=b and doing matrix vector multiplication we directly know the residual from the leastsquare problem that we solved using givens.Then in line 119 we decide that if the residual is less than particular value, then we terminate the gmres loop and compute the final optimal update and report the solution. In this case the optimal coefficient is obtained by backward solution of the upperdiagonal system in lines 123129. Then the update value
zk
is computed and in the case of preconditioning it is divided by the diagonal entries. Finally the final solution is computed in line 143 and reported to user in line 147.Verification and Validation
In this section, we verify the implementation.Matrix A : dense6x6.mtx
The first argument in the code is the location and name of input matrix, the second argument is the RHS matrix, the third argument is the number of GMRES iterations and the last argument is 0 for normal GMRES and 1 for preconditioned GMRES algorithm. Running the parallel gmres code for k=6 and p=1 we obtain$mpirun np 1 ./gmres ../matrices/dense6x6.mtx (void) 6 0 $ cat out.0 contents of x : 0.18861997210055, 0.15060264446066, 1.02527163328527, 0.43822000212772, 0.98705425807485, 1.21946494911443,Note that since we don't read the rhs matrix (and actually in the source code I simply comment it here!), I put an arbitrary string
(void)
to preserve the number and arrangment of the arguments.For k=6 and p=2, the parallel code gives us$mpirun np 2 ./gmres ../matrices/dense6x6.mtx (void) 6 0 $ cat out.0 out.1 contents of x : > process 0 0.18861997210055, 0.15060264446066, 1.02527163328527, contents of x : > process 1 0.43822000212772, 0.98705425807485, 1.21946494911443,For k=6 and p=3, the parallel code gives us
$mpirun np 3 ./gmres ../matrices/dense6x6.mtx (void) 6 0 $ cat out.0 out.1 out.3 contents of x : > process 0 0.18861997210055, 0.15060264446066, contents of x : > process 1 1.02527163328527, 0.43822000212772, contents of x : > process 2 0.98705425807485, 1.21946494911443,For the extreme case of six processes p=6, the parallel code gives us
$mpirun np 6 ./gmres ../matrices/dense6x6.mtx (void) 6 0 $ cat out.0 out.1 out.2 out.3 out.4 out.5 contents of x : > process 0 0.18861997210055, contents of x : > process 1 0.15060264446066, contents of x : > process 2 1.02527163328527, contents of x : > process 3 0.43822000212772, contents of x : > process 4 0.98705425807485, contents of x : > process 5 1.21946494911443,Using the GMRES program written in GNU Octave (Appendix II), we will obtain
ans = 0.188619972100554 0.150602644460663 1.025271633285273 0.438220002127719 0.987054258074850 1.219464949114430which is in exact agreement with the parallel Ccode. The residual history (including the initial residual before GMRES main for loop) is plotted in fig.(0.2.1). As we see, it converges to machine zero after 6 iterations as we expected before.
Matrix B : fidapm05.mtx
In this section we consider the residual history of matrix fidapm05.mtx which is a 42x42 sparse matrix. The structure of the matrix is shown in fig.(2).First we set k=42 for the case that we have all search vectors. Therefore we expect that the residual should reach to machine zero. This is shown in fig.(3).
To validate the parallel implementation, here we increase the number of processes to and . As shown the residual curves are identical.Another examination is to test the effect of preconditioning. For this purpose we first read the RHS vector for 42x42 given in the input file
fidamp05_rhs1.mtx
. The residual curves for GMRES with preconditioning and without preconditioning are shown in fig.(4).As we see there isn't much difference (compared to the matrix D that will be discussed in the following section) between residual curves. Thus the procedure of diagonal preconditioning does improve the convergence in the intermediate stages but generally there is no big difference. This is mainly because the matrix B42x42 is highly offdiagonal^{1} so when we approximate it with only diagonal entries, we induce considerable ambiguity into the preconditioner. Therefore we expect that diagonal preconditioning shouldn't work in this case as perfectly as it works for closetodiagonal matrices. Here I expect that if we use the tridiagonal form of matrix B42x42 instead of only main diagonal, the convergence should improve.
Matrix D : s3dkq4m2.mtx
This a huge 90448x90448 matrix will all nonzero elements close to the main diagonal. So we expect that diagonal preconditioning works well for this matrix because the main diagonal seems to be a good estimate of the matrix D itself. Here we fixed the number of GMRES iterations to 40 and without restarting we run the code with/without preconditioning option for one, two and sixteen processes. The results are presented in fig.(5).As we have expected, the diagonal preconditioning greatly improves the convergence of the original GMRES algorithm. As mentioned before, the reason for improving convergence for matrix D is that the main diagonal is a good choice for estimating the matrix since the matrix is closetothemaindiagonal oriented. We also notice that variation in the number of processes has absolutely no effect in the convergence curves and again curves are identical (to the eye).
Performance
To investigate the effect of restarting on the performance of PGMRES, we run the code for a couple of test cases represented by PGMRES(k,m) where k is the number of iterations and m is the number of restarts. For , we found that for restarts, the code converges to 1.e10. However for larger k values, the number of required restarts greatly reduces. Therefore since number of restarts 'm' should be same for all cases we keep same for all of them. The residual curves for PGMRES(10,8), PGMRES(20,8), PGMRES(40,8) and PGMRES(80,8) are shown in figs(6, 7, 8, 9) respectively. The wall time for different number of precesses/iterations is measured usingMPI_Wtime( void )
function. The results are presented in table(1). As we see there is no difference in residual curves for different number of processes. For the rest of discussion please see to the last page.

According to time table(1), we observe this key point that for small number of iterations ``k'' the computation time decreases. This is mainly because in the GMRES algorithm, there are two interior loops that depends to the value of k.
//main k loop ... for ( j = 0 ; j <= k; j++) { dotproduct(v[j], u, &h[j][k], nrowsI, commun); //dot product for( ss = 0; ss < nrowsI; ss++) u[ss] = h[j][k] * v[j][ss]; } ... updates for ( j = 0 ; j < k; j++) { delta = h[j][k]; h[j][k] = c[j] * delta + s[j] * h[j+1][k]; h[j+1][k] = s[j] * delta + c[j] * h[j+1][k]; }The first loop is off course very expensive, because a collective dot product must be performed in each cycle, So for very large number of iterations ``k'', we expect the number of dot products to dramatically increases. Therefore instead of increasing ``k'', we prefer to reach to some point in the iteration space and the restart the process again by the new solution.
The result of numerical experiments on the etowah machine validates that the restarting approach decreases computation time. For one process, and , we see that the computation time is 6.7 seconds. If we decrease ``k'' to 10 we get 2.7 seconds which is 40 percent of case . We also note that the parallelization leads to speedup which is limited by Amdahl's Law. For and , we get () speedup when we increase the number of processes from 1 to 2. Also we get () when we increase to . However, for large ``k'' the speedup improves. For to in case we get 1.8 speedup while for to in the same case we get 3.6 which is better than 3.1 for case .
Another important thing that should be mentioned here is the choice of the initial solution (guessed solution) and the way it affects the residual curves. For the previous cases the initial solution was . However we change this condition to to see what happens in the residual curves when we restarting. Below the residual curves for restarting case is plotted. As we see the behavior which was a monotonically descending straight curve before is changing to curves which bumps.
As shown in fig.(10), the more the number of gmres iterations increased, the better the solution is estimated before each restart. Therefore another important parameter is the choice of initial guess . In the previous section where we used the reason that residual curves were very similar was that after each restart the solution is a very good estimate of final solution so there is some bump but they are almost flat. But here after each restart, the solution needs to be improved and hence for each search vector that we find the accuracy of the solution dramatically improves and hence we see a visible bump.
References
[1]
[2]
[3] Y. Saad and M.H. Schultz, ``GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems'', SIAM J. Sci. Stat. Comput., 7:856869, 1986.
All codes/reports are valid under GNU license. http://www.gnu.org/licenses/gpl.html
Download my source code from here!
please put your email in the comments and I'll send you one copy of the source code.
Subscribe to:
Posts (Atom)