{"id":14,"date":"2010-03-06T00:36:53","date_gmt":"2010-03-05T23:36:53","guid":{"rendered":"http:\/\/numbercrunch.de\/blog\/?p=14"},"modified":"2023-01-18T23:40:08","modified_gmt":"2023-01-18T22:40:08","slug":"parallel-fft-performance","status":"publish","type":"post","link":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/","title":{"rendered":"Parallel FFT performance"},"content":{"rendered":"<p style=\"text-align: justify;\">In a recent project (the numerical solution of the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Dirac_equation\">Dirac equation<\/a>) I am working on, the computation of the fast Fourier transform (FFT) of four interwoven two-dimensional grids is the main computational task. Interwoven grids means that the memory layout of the matrices is such that data starts with the 1st element of the 1st matrix followed by the 1st element of the 2nd, 3rd and 4th matrix, respectively. The 5th data element represents the 2nd element of the 1st matrix and so on.<\/p>\n<p style=\"text-align: justify;\">I tested two different FFT libraries that are able to compute a FFT in parallel on shared memory computers, the <a href=\"http:\/\/www.fftw.org\/\">FFTW<\/a> (version 3.2.2) and the Intel\u00ae <a href=\"http:\/\/software.intel.com\/en-us\/intel-mkl\/\">MKL<\/a> (the version that comes with the Intel\u00ae compiler suite version 11.1.056), and was quite surprised to see very large performance differences for this particular kind of parallel FFT. The Intel\u00ae <a href=\"http:\/\/software.intel.com\/en-us\/intel-mkl\/\">MKL<\/a> FFT routines show a very poor parallel speedup. Even with eight cores, the speedup never exceeds two, whereas the FFTW library reaches a reasonable speedup.<\/p>\n<figure id=\"attachment_12\" aria-describedby=\"caption-attachment-12\" style=\"width: 576px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/numbercrunch.de\/blog\/wp-content\/uploads\/2010\/03\/fftw_timing_dirac1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-12\" title=\"fftw_timing_dirac1\" src=\"\/\/numbercrunch.de\/blog\/wp-content\/uploads\/2010\/03\/fftw_timing_dirac1.png\" alt=\"\" width=\"576\" height=\"432\" srcset=\"https:\/\/www.numbercrunch.de\/blog\/wp-content\/uploads\/2010\/03\/fftw_timing_dirac1.png 576w, https:\/\/www.numbercrunch.de\/blog\/wp-content\/uploads\/2010\/03\/fftw_timing_dirac1-300x225.png 300w\" sizes=\"(max-width: 576px) 100vw, 576px\" \/><\/a><figcaption id=\"caption-attachment-12\" class=\"wp-caption-text\">Parallel speed up of parallel FFT of four interwoven two-dimensional matrices of size N\u00d7N.<\/figcaption><\/figure>\n<p style=\"text-align: justify;\">The FFTW library also excels the MKL in terms of absolute computing time.\u00a0 Depending on the problem size, FFTW may be four times faster. Only for very small matrices, the MKL is superior. Thus, one my conclude that the FFTW library is a good example how a high quality open source software can outperform a vendor library.<\/p>\n<figure id=\"attachment_13\" aria-describedby=\"caption-attachment-13\" style=\"width: 576px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/numbercrunch.de\/blog\/wp-content\/uploads\/2010\/03\/fftw_timing_dirac2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-13\" title=\"fftw_timing_dirac2\" src=\"\/\/numbercrunch.de\/blog\/wp-content\/uploads\/2010\/03\/fftw_timing_dirac2.png\" alt=\"\" width=\"576\" height=\"432\" srcset=\"https:\/\/www.numbercrunch.de\/blog\/wp-content\/uploads\/2010\/03\/fftw_timing_dirac2.png 576w, https:\/\/www.numbercrunch.de\/blog\/wp-content\/uploads\/2010\/03\/fftw_timing_dirac2-300x225.png 300w\" sizes=\"(max-width: 576px) 100vw, 576px\" \/><\/a><figcaption id=\"caption-attachment-13\" class=\"wp-caption-text\">Ratio of computing times for parallel FFT of four interwoven two-dimensional matrices of size N\u00d7N with eight cores.<\/figcaption><\/figure>\n<p style=\"text-align: justify;\">All performance measurements have been done on a system with two Quad Core Intel\u00ae Xeon\u00ae CPUs (E5345) at 2.5GHz. FFT plans were generated in measure mode. It would be very interesting for me to do a similar FFT benchmark on <a href=\"http:\/\/www.macresearch.org\/cuda-quick-look-and-comparison-fft-performance\">graphics hardware<\/a>. However, I do not own a high performance graphics card.<\/p>\n<p style=\"text-align: justify;\"><strong>Update:<\/strong> My test program utilized a lightweight convenience wrapper around FFTW (and the MKL internal FFTW-wrapper). For this reason I do not want to publish the original code. However, here is an equivalent program that does not depend on external libraries except FFTW.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"cpp\">\/\/ depending on your compiler and os compile with\n\/\/\n\/\/ g++ -fopenmp -O3 -o time_dirac_fft time_dirac_fft.cc -lfftw3 -lfftw3_threads -pthread\n\/\/\n\/\/ or with\n\/\/\n\/\/ icc -openmp -O3 -o time_dirac_fft time_dirac_fft.cc -lmkl_intel -lmkl_intel_thread -lmkl_core\n\/\/\n\/\/ or with\n\/\/\n\/\/ icc -openmp -O3 -o time_dirac_fft time_dirac_fft.cc -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core\n\n#include &lt;cstdlib&gt;\n#include &lt;complex&gt;\n#include &lt;iostream&gt;\n#include &lt;sstream&gt;\n#include &lt;fstream&gt;\n#include &lt;omp.h&gt;\n#include &lt;fftw3.h&gt;\n#if defined __unix__\n#include &lt;unistd.h&gt;\n#include &lt;sys\/time.h&gt;\n#include &lt;sys\/times.h&gt;\n#else\n#include &lt;ctime&gt;\n#endif\n\n\/\/ helper class for time measurements\nclass timer {\nprivate:\n  const double _resolution;\n  double _t, _t_start;\n  bool isrunning;\n\n  double get_time() const {\n#if defined __unix__\n    struct timeval tv;\n    gettimeofday(&amp;tv, 0);\n    return static_cast&lt;double&gt;(tv.tv_sec)+static_cast&lt;double&gt;(tv.tv_usec)*1e-6;\n#else\n    return static_cast&lt;double&gt;(std::clock())*_resolution;\n#endif\n  }\npublic:\n  void reset() {\n    _t=0.0;\n  }\n  void start() {\n    _t_start=get_time();\n    isrunning=true;\n  }\n  void stop() {\n    if (isrunning) {\n      _t+=get_time()-_t_start;\n      isrunning=false;\n    }\n  }\n  double time() const {\n    return _t+( isrunning ? get_time()-_t_start : 0.0 );\n  }\n  double resolution() const {\n    return _resolution;\n  };\n  timer() :\n#if defined __unix__\n    _resolution(1e-6),\n#else\n    _resolution(1.0\/CLOCKS_PER_SEC),\n#endif\n    _t(0), _t_start(get_time()),\n    isrunning(true) {\n  }\n};\n\ntypedef std::complex&lt;double&gt; complex;\n\n\/\/ test FFT performance as required for one-dimensional Dirac equation\nvoid time_1d(int p, std::ostream &amp;out) {\n  out &lt;&lt; \"% one dimensional FFT\\n\"\n      &lt;&lt; \"% two grids interwoven\\n\"\n      &lt;&lt; \"%\\n\"\n      &lt;&lt; \"% number of cpus = \" &lt;&lt; p &lt;&lt; \"\\n\"\n      &lt;&lt; \"%\\n\"\n      &lt;&lt; \"% n\\ttime in sec.\\n\";\n  omp_set_num_threads(p);\n  omp_set_dynamic(false);\n  fftw_plan_with_nthreads(p);\n  for (int n=4; n&lt;=16777216; n*=2) {\n    complex *v=reinterpret_cast&lt;complex *&gt;(fftw_malloc(2*n*sizeof(*v)));\n    if (v!=0) {\n      fftw_iodim io_n[1]={ {n, 2, 2} };\n      fftw_iodim io_is[1]={ {2, 1, 1} };\n      fftw_plan p1=fftw_plan_guru_dft(1, io_n, 1, io_is,\n                      reinterpret_cast&lt;fftw_complex *&gt;(v),\n                      reinterpret_cast&lt;fftw_complex *&gt;(v),\n                      FFTW_FORWARD, FFTW_MEASURE);\n      fftw_plan p2=fftw_plan_guru_dft(1, io_n, 1, io_is,\n                      reinterpret_cast&lt;fftw_complex *&gt;(v),\n                      reinterpret_cast&lt;fftw_complex *&gt;(v),\n                      FFTW_BACKWARD, FFTW_MEASURE);\n      for (int i=0; i&lt;n; ++i) {\n    v[2*i]=complex(static_cast&lt;double&gt;(i+1)\/static_cast&lt;double&gt;(n),\n               static_cast&lt;double&gt;(i+1)\/static_cast&lt;double&gt;(n));\n    v[2*i+1]=complex(static_cast&lt;double&gt;(i+1)\/static_cast&lt;double&gt;(n),\n             static_cast&lt;double&gt;(i+1)\/static_cast&lt;double&gt;(n));\n      }\n      timer T1;\n      T1.start();\n      int i=0;\n      do {\n    fftw_execute_dft(p1,\n             reinterpret_cast&lt;fftw_complex *&gt;(v),\n             reinterpret_cast&lt;fftw_complex *&gt;(v));\n    fftw_execute_dft(p2,\n             reinterpret_cast&lt;fftw_complex *&gt;(v),\n             reinterpret_cast&lt;fftw_complex *&gt;(v));\n    ++i;\n      } while (T1.time()&lt;4 and i&lt;16);\n      double t1=T1.time()\/i;\n      fftw_free(v);\n      fftw_destroy_plan(p1);\n      fftw_destroy_plan(p2);\n      out &lt;&lt; n &lt;&lt; '\\t' &lt;&lt; t1 &lt;&lt; std::endl;\n      std::cerr &lt;&lt; \"1d\\tp = \" &lt;&lt; p\n        &lt;&lt; \"\\tN = \" &lt;&lt; n\n        &lt;&lt; \"\\ttime = \" &lt;&lt; t1 &lt;&lt; std::endl;\n    } else\n      std::cerr &lt;&lt; \"2d\\tp = \" &lt;&lt; p\n        &lt;&lt; \"\\tN = \" &lt;&lt; n\n        &lt;&lt; \"\\tnot enough memory\" &lt;&lt; std::endl;\n  }\n}\n\n\/\/ test FFT performance as required for two-dimensional Dirac equation\nvoid time_2d(int p, std::ostream &amp;out) {\n  out &lt;&lt; \"% two dimensional FFT\\n\"\n      &lt;&lt; \"% four grids interwoven\\n\"\n      &lt;&lt; \"%\\n\"\n      &lt;&lt; \"% number of cpus = \" &lt;&lt; p &lt;&lt; \"\\n\"\n      &lt;&lt; \"%\\n\"\n      &lt;&lt; \"% n\\ttime in sec.\\n\";\n  omp_set_num_threads(p);\n  omp_set_dynamic(false);\n  fftw_plan_with_nthreads(p);\n  for (int n=4; n&lt;=8192; n*=2) {\n    complex *v=reinterpret_cast&lt;complex *&gt;(fftw_malloc(4*n*n*sizeof(*v)));\n    if (v!=0) {\n      fftw_iodim io_n[2]={ {n, 4, 4}, {n, 4*n, 4*n} };\n      fftw_iodim io_is[1]={ {4, 1, 1} };\n      fftw_plan p1=fftw_plan_guru_dft(2, io_n, 1, io_is,\n                      reinterpret_cast&lt;fftw_complex *&gt;(v),\n                      reinterpret_cast&lt;fftw_complex *&gt;(v),\n                      FFTW_FORWARD, FFTW_MEASURE);\n      fftw_plan p2=fftw_plan_guru_dft(2, io_n, 1, io_is,\n                      reinterpret_cast&lt;fftw_complex *&gt;(v),\n                      reinterpret_cast&lt;fftw_complex *&gt;(v),\n                      FFTW_BACKWARD, FFTW_MEASURE);\n      for (int j=0; j&lt;n; ++j)\n    for (int i=0; i&lt;n; ++i) {\n      v[4*(j*n+i)]=complex(static_cast&lt;double&gt;(i+1)\/static_cast&lt;double&gt;(n),\n                   static_cast&lt;double&gt;(j+1)\/static_cast&lt;double&gt;(n));\n      v[4*(j*n+i)+1]=complex(static_cast&lt;double&gt;(i+1)\/static_cast&lt;double&gt;(n),\n                 static_cast&lt;double&gt;(j+1)\/static_cast&lt;double&gt;(n));\n      v[4*(j*n+i)+2]=complex(static_cast&lt;double&gt;(i+1)\/static_cast&lt;double&gt;(n),\n                 static_cast&lt;double&gt;(j+1)\/static_cast&lt;double&gt;(n));\n      v[4*(j*n+i)+3]=complex(static_cast&lt;double&gt;(i+1)\/static_cast&lt;double&gt;(n),\n                 static_cast&lt;double&gt;(j+1)\/static_cast&lt;double&gt;(n));\n    }\n      timer T1;\n      T1.start();\n      int i=0;\n      do {\n    fftw_execute_dft(p1,\n             reinterpret_cast&lt;fftw_complex *&gt;(v),\n             reinterpret_cast&lt;fftw_complex *&gt;(v));\n    fftw_execute_dft(p2,\n             reinterpret_cast&lt;fftw_complex *&gt;(v),\n             reinterpret_cast&lt;fftw_complex *&gt;(v));\n    ++i;\n      } while (T1.time()&lt;4 and i&lt;16);\n      double t1=T1.time()\/i;\n      fftw_free(v);\n      fftw_destroy_plan(p1);\n      fftw_destroy_plan(p2);\n      out &lt;&lt; n &lt;&lt; '\\t' &lt;&lt; t1 &lt;&lt; std::endl;\n      std::cerr &lt;&lt; \"2d\\tp = \" &lt;&lt; p\n        &lt;&lt; \"\\tN = \" &lt;&lt; n\n        &lt;&lt; \"\\ttime = \" &lt;&lt; t1 &lt;&lt; std::endl;\n    } else\n      std::cerr &lt;&lt; \"2d\\tp = \" &lt;&lt; p\n        &lt;&lt; \"\\tN = \" &lt;&lt; n\n        &lt;&lt; \"\\tnot enough memory\" &lt;&lt; std::endl;\n  }\n}\n\nint main() {\n  fftw_init_threads();\n  for (int p=1; p&lt;=8; ++p) {\n    {\n      std::stringstream name;\n      name &lt;&lt; \"fftw_timing_dirac_1d_p=\" &lt;&lt; p &lt;&lt; \".dat\";\n      std::ofstream out(name.str().c_str());\n      time_1d(p, out);\n    }\n    {\n      std::stringstream name;\n      name &lt;&lt; \"fftw_timing_dirac_2d_p=\" &lt;&lt; p &lt;&lt; \".dat\";\n      std::ofstream out(name.str().c_str());\n      time_2d(p, out);\n    }\n  }\n  return EXIT_SUCCESS;\n}\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In a recent project (the numerical solution of the Dirac equation) I am working on, the computation of the fast Fourier transform (FFT) of four interwoven two-dimensional grids is the main computational task. Interwoven grids means that the memory layout of the matrices is such that data starts with the 1st element of the 1st&hellip; <a href=\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Parallel FFT performance<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,5],"tags":[],"class_list":["post-14","post","type-post","status-publish","format-standard","hentry","category-performance","category-software"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Parallel FFT performance - Number Crunch<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Parallel FFT performance - Number Crunch\" \/>\n<meta property=\"og:description\" content=\"In a recent project (the numerical solution of the Dirac equation) I am working on, the computation of the fast Fourier transform (FFT) of four interwoven two-dimensional grids is the main computational task. Interwoven grids means that the memory layout of the matrices is such that data starts with the 1st element of the 1st&hellip; Continue reading Parallel FFT performance\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/\" \/>\n<meta property=\"og:site_name\" content=\"Number Crunch\" \/>\n<meta property=\"article:published_time\" content=\"2010-03-05T23:36:53+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-01-18T22:40:08+00:00\" \/>\n<meta name=\"author\" content=\"Heiko Bauke\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Heiko Bauke\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/\"},\"author\":{\"name\":\"Heiko Bauke\",\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/e73eab65b1721dd0c3d408edb887e413\"},\"headline\":\"Parallel FFT performance\",\"datePublished\":\"2010-03-05T23:36:53+00:00\",\"dateModified\":\"2023-01-18T22:40:08+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/\"},\"wordCount\":368,\"commentCount\":12,\"publisher\":{\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/e73eab65b1721dd0c3d408edb887e413\"},\"articleSection\":[\"Performance\",\"Software\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/\",\"url\":\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/\",\"name\":\"Parallel FFT performance - Number Crunch\",\"isPartOf\":{\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/#website\"},\"datePublished\":\"2010-03-05T23:36:53+00:00\",\"dateModified\":\"2023-01-18T22:40:08+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.numbercrunch.de\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Parallel FFT performance\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/#website\",\"url\":\"https:\/\/www.numbercrunch.de\/blog\/\",\"name\":\"Number Crunch\",\"description\":\"A computational science blog.\",\"publisher\":{\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/e73eab65b1721dd0c3d408edb887e413\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.numbercrunch.de\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/e73eab65b1721dd0c3d408edb887e413\",\"name\":\"Heiko Bauke\",\"logo\":{\"@id\":\"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Parallel FFT performance - Number Crunch","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/","og_locale":"en_US","og_type":"article","og_title":"Parallel FFT performance - Number Crunch","og_description":"In a recent project (the numerical solution of the Dirac equation) I am working on, the computation of the fast Fourier transform (FFT) of four interwoven two-dimensional grids is the main computational task. Interwoven grids means that the memory layout of the matrices is such that data starts with the 1st element of the 1st&hellip; Continue reading Parallel FFT performance","og_url":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/","og_site_name":"Number Crunch","article_published_time":"2010-03-05T23:36:53+00:00","article_modified_time":"2023-01-18T22:40:08+00:00","author":"Heiko Bauke","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Heiko Bauke","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/#article","isPartOf":{"@id":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/"},"author":{"name":"Heiko Bauke","@id":"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/e73eab65b1721dd0c3d408edb887e413"},"headline":"Parallel FFT performance","datePublished":"2010-03-05T23:36:53+00:00","dateModified":"2023-01-18T22:40:08+00:00","mainEntityOfPage":{"@id":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/"},"wordCount":368,"commentCount":12,"publisher":{"@id":"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/e73eab65b1721dd0c3d408edb887e413"},"articleSection":["Performance","Software"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/","url":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/","name":"Parallel FFT performance - Number Crunch","isPartOf":{"@id":"https:\/\/www.numbercrunch.de\/blog\/#website"},"datePublished":"2010-03-05T23:36:53+00:00","dateModified":"2023-01-18T22:40:08+00:00","breadcrumb":{"@id":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.numbercrunch.de\/blog\/2010\/03\/parallel-fft-performance\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.numbercrunch.de\/blog\/"},{"@type":"ListItem","position":2,"name":"Parallel FFT performance"}]},{"@type":"WebSite","@id":"https:\/\/www.numbercrunch.de\/blog\/#website","url":"https:\/\/www.numbercrunch.de\/blog\/","name":"Number Crunch","description":"A computational science blog.","publisher":{"@id":"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/e73eab65b1721dd0c3d408edb887e413"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.numbercrunch.de\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/e73eab65b1721dd0c3d408edb887e413","name":"Heiko Bauke","logo":{"@id":"https:\/\/www.numbercrunch.de\/blog\/#\/schema\/person\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/posts\/14"}],"collection":[{"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/comments?post=14"}],"version-history":[{"count":13,"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/posts\/14\/revisions"}],"predecessor-version":[{"id":997,"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/posts\/14\/revisions\/997"}],"wp:attachment":[{"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/media?parent=14"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/categories?post=14"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.numbercrunch.de\/blog\/wp-json\/wp\/v2\/tags?post=14"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}