<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Benny Istanto&#39;s Blog</title>
<link>https://benny.istan.to/site/blog.html</link>
<atom:link href="https://benny.istan.to/site/blog.xml" rel="self" type="application/rss+xml"/>
<description>Exploring climate, GIS, and data science - with stories from work, family moments, and journeys around the world</description>
<generator>quarto-1.8.27</generator>
<lastBuildDate>Fri, 28 Feb 2025 00:00:00 GMT</lastBuildDate>
<item>
  <title>Word Clock</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20250228-word-clock.html</link>
  <description><![CDATA[ 





<p>Ever wondered what time it would be if clocks could speak? This Word Clock translates the current time into natural language phrases that we use in everyday conversation. Instead of looking at hands or digits, you simply read the highlighted words to tell the time!</p>
<p><em>“IT IS TWENTY MINUTES PAST FOUR” or “IT IS QUARTER TO SEVEN”</em> - just like you’d say it to a friend.</p>
<p>The clock updates every minute, automatically highlighting the words that form the correct time phrase. Notice how at different times, different word combinations light up to create readable sentences.</p>
<p>It’s a fun, more human way to experience time passing. Enjoy watching the words change as the minutes tick by!</p>
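<p>As an illustration, here is a minimal Python sketch of the phrasing idea. The rounding and wording rules below are my own assumption for illustration; they are not the exact logic of the Observable notebook.</p>

```python
# Hypothetical sketch of a word clock's phrasing rules (not the notebook's code).
NUMBERS = ["twelve", "one", "two", "three", "four", "five", "six",
           "seven", "eight", "nine", "ten", "eleven"]
MINUTE_WORDS = {5: "five minutes", 10: "ten minutes", 15: "quarter",
                20: "twenty minutes", 25: "twenty five minutes", 30: "half"}

def time_phrase(hour, minute):
    """Render a clock time as the phrase a word clock would highlight."""
    minute = (minute // 5) * 5          # word clocks round to 5-minute steps
    if minute == 0:
        return f"IT IS {NUMBERS[hour % 12]} O'CLOCK".upper()
    if minute <= 30:                    # "... past <this hour>"
        amount, link, h = MINUTE_WORDS[minute], "past", hour
    else:                               # "... to <next hour>"
        amount, link, h = MINUTE_WORDS[60 - minute], "to", hour + 1
    return f"IT IS {amount} {link} {NUMBERS[h % 12]}".upper()
```

<p>For example, <code>time_phrase(4, 20)</code> gives “IT IS TWENTY MINUTES PAST FOUR” and <code>time_phrase(6, 45)</code> gives “IT IS QUARTER TO SEVEN”.</p>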
<p><strong>Inspiration</strong></p>
<p>Inspired by the Word Clock sold by <a href="https://www.walmart.com/ip/The-Word-Clock-Shows-The-Time-In-A-Sentence/102811752">Walmart - The Word Clock - Shows The Time In A Sentence</a></p>
<p><a href="../assets/image-blog/20250228-word-clock-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20250228-word-clock-01.jpg" class="img-fluid"></a></p>
<p>See it at Observable: <a href="https://observablehq.com/@bennyistanto/word-clock" class="uri">https://observablehq.com/@bennyistanto/word-clock</a></p>



]]></description>
  <category>Data Science</category>
  <category>General</category>
  <guid>https://benny.istan.to/site/blog/20250228-word-clock.html</guid>
  <pubDate>Fri, 28 Feb 2025 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20250228-word-clock-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>xkcd style for Country map</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20240525-xkcd-style-for-country-map.html</link>
  <description><![CDATA[ 





<p>Source: <a href="https://gist.github.com/bennyistanto/7b391b11e861334bc020dd03c06815f2" class="uri">https://gist.github.com/bennyistanto/7b391b11e861334bc020dd03c06815f2</a></p>



]]></description>
  <category>General</category>
  <category>GIS</category>
  <guid>https://benny.istan.to/site/blog/20240525-xkcd-style-for-country-map.html</guid>
  <pubDate>Sun, 26 May 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>xkcd style for LSEQM illustration</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20240523-xkcd-style-for-lseqm-illustration.html</link>
  <description><![CDATA[ 





<p>Source: <a href="https://gist.github.com/bennyistanto/b9e5d9c932dc6b034f559deaa26e2743" class="uri">https://gist.github.com/bennyistanto/b9e5d9c932dc6b034f559deaa26e2743</a></p>



]]></description>
  <category>General</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20240523-xkcd-style-for-lseqm-illustration.html</guid>
  <pubDate>Thu, 23 May 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Skip PEARSON fitting on climate-indices python package</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20240503-skip-pearson-fitting-on-climate-indices-python-package.html</link>
  <description><![CDATA[ 





<p>Source: <a href="https://gist.github.com/bennyistanto/e8710f89bfbebaf24498dd957a1fa961" class="uri">https://gist.github.com/bennyistanto/e8710f89bfbebaf24498dd957a1fa961</a></p>



]]></description>
  <category>Remote Sensing</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20240503-skip-pearson-fitting-on-climate-indices-python-package.html</guid>
  <pubDate>Fri, 03 May 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Utilizing CUDA</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20240416-utilizing-cuda.html</link>
  <description><![CDATA[ 





<p>This week I set up CUDA on my desktop to support upcoming heavy geospatial and climate analytics. It was a bit tricky, but I managed to install it on both Windows 11 and WSL2 Debian 12. See below.</p>
<section id="install-cuda-and-cudnn-using-conda" class="level3">
<h3 class="anchored" data-anchor-id="install-cuda-and-cudnn-using-conda">Install CUDA and cuDNN using Conda</h3>
<p>Tested on:</p>
<p>Windows 11 Pro for Workstations and WSL2 Debian 12<br>
Processor: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz (2 processors)<br>
Installed RAM: 384 GB<br>
VGA: NVIDIA Quadro P2000 5GB</p>
<hr>
<section id="install-the-gpu-driver" class="level4">
<h4 class="anchored" data-anchor-id="install-the-gpu-driver">1. Install the GPU driver</h4>
<p><strong>This step only applies to Windows</strong></p>
<p>Download and install the <a href="https://www.nvidia.com/download/index.aspx">NVIDIA Driver for GPU Support</a> to use with your existing CUDA ML workflows. In my case, I chose:</p>
<ul>
<li>Product type: NVIDIA RTX/Quadro</li>
<li>Product series: Quadro Series</li>
<li>Product: Quadro P2000</li>
<li>Operating System: Windows 11</li>
<li>Download Type: Production Branch/Studio</li>
<li>Language: English (US)</li>
</ul>
<p>Click Search, then Download, followed by Agree &amp; Download. This grabs a 483 MB file from <a href="https://us.download.nvidia.com/Windows/Quadro_Certified/551.86/551.86-quadro-rtx-desktop-notebook-win10-win11-64bit-international-dch-whql.exe" class="uri">https://us.download.nvidia.com/Windows/Quadro_Certified/551.86/551.86-quadro-rtx-desktop-notebook-win10-win11-64bit-international-dch-whql.exe</a>.</p>
<p>Next, run the installer and follow the steps until complete.</p>
<blockquote class="blockquote">
<p>Note</p>
<p><strong>This is the only driver we need to install. Do not install any Linux display driver in WSL.</strong></p>
<p>Reference: <a href="https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2" class="uri">https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2</a></p>
</blockquote>
<hr>
<p>Steps 2-7 below apply to both Windows and WSL.</p>
</section>
<section id="create-new-conda-environment" class="level4">
<h4 class="anchored" data-anchor-id="create-new-conda-environment">2. Create new Conda environment</h4>
<p>Open Anaconda Prompt on Windows or a terminal on WSL (both can live as tabs in the same Windows Terminal). Make sure no Conda environment is active by typing:</p>
<pre><code>conda deactivate</code></pre>
<p>Let’s create a new Conda environment called <code>cuda</code> with Python <code>3.11</code>:</p>
<pre><code>conda create -n cuda python==3.11</code></pre>
</section>
<section id="install-essential-python-package-for-geospatial-analysis-and-data-visualization" class="level4">
<h4 class="anchored" data-anchor-id="install-essential-python-package-for-geospatial-analysis-and-data-visualization">3. Install essential Python packages for geospatial analysis and data visualization</h4>
<p>I would like to use this <code>cuda</code> env for heavy geospatial and climate data processing, so I will install the Python <code>geospatial</code> <a href="https://geospatial.gishub.org/">package</a>:</p>
<pre><code>conda install -c conda-forge geospatial</code></pre>
<p>If needed, we can install other packages too, for example <code>cdo</code>, <code>nco</code>, <code>gdal</code>, <code>awscli</code>.</p>
<blockquote class="blockquote">
<p>The <code>cdo</code> package is only available in the Linux (WSL) environment.</p>
</blockquote>
</section>
<section id="install-cuda-toolkit" class="level4">
<h4 class="anchored" data-anchor-id="install-cuda-toolkit">4. Install CUDA toolkit</h4>
<p>Install <code>cudatoolkit v11.8.0</code> - <a href="https://anaconda.org/conda-forge/cudatoolkit" class="uri">https://anaconda.org/conda-forge/cudatoolkit</a></p>
<pre><code>conda install -c conda-forge cudatoolkit</code></pre>
</section>
<section id="install-cudnn" class="level4">
<h4 class="anchored" data-anchor-id="install-cudnn">5. Install cuDNN</h4>
<p>Install <code>cudnn v8.9.7</code> - <a href="https://anaconda.org/conda-forge/cudnn" class="uri">https://anaconda.org/conda-forge/cudnn</a></p>
<pre><code>conda install -c conda-forge cudnn</code></pre>
</section>
<section id="install-pytorch" class="level4">
<h4 class="anchored" data-anchor-id="install-pytorch">6. Install Pytorch</h4>
<p>Install Pytorch - <a href="https://pytorch.org/" class="uri">https://pytorch.org/</a></p>
<pre><code>conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia</code></pre>
</section>
<section id="install-tensorflow" class="level4">
<h4 class="anchored" data-anchor-id="install-tensorflow">7. Install Tensorflow</h4>
<p>Install TensorFlow 2.14.0, the last TensorFlow version compatible with CUDA 11.8. Reference: <a href="https://www.tensorflow.org/install/source#gpu" class="uri">https://www.tensorflow.org/install/source#gpu</a></p>
<pre><code>conda install -c conda-forge tensorflow=2.14.0=cuda118py311heb1bdc4_0</code></pre>
</section>
<section id="setting-the-library" class="level4">
<h4 class="anchored" data-anchor-id="setting-the-library">8. Setting the Library</h4>
<p><strong>This step only apply to WSL</strong></p>
<p>If we installed CUDA and cuDNN via Conda, we typically should not need to set <code>LD_LIBRARY_PATH</code> or <code>PATH</code> manually for these libraries, as many tutorials describe for system-wide installs, because Conda handles the environment setup for us.</p>
<p>However, if we encounter issues, such as errors about cuDNN not being registered correctly, we may still need to ensure that TensorFlow can find and use the correct libraries provided by the Conda environment.</p>
<p><strong>Why might we still need to set <code>LD_LIBRARY_PATH</code>?</strong></p>
<p>Even though Conda generally manages library paths internally, in some cases, especially when integrating complex software stacks like TensorFlow with GPU support, the automatic configuration might not work perfectly out of the box.</p>
<p><strong>Find the library paths</strong>: We can look for CUDA and cuDNN libraries within the Conda environment’s library directory:</p>
<pre><code>ls $CONDA_PREFIX/lib | grep libcudnn
ls $CONDA_PREFIX/lib | grep libcublas
ls $CONDA_PREFIX/lib | grep libcudart</code></pre>
<p><strong>Manually Set</strong> <code>LD_LIBRARY_PATH</code> (If Needed)</p>
<p>If we find that TensorFlow still fails to recognize these libraries despite them being present in the Conda environment, we might try setting <code>LD_LIBRARY_PATH</code> manually:</p>
<pre><code>export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH</code></pre>
<p>In my case, I have already set the path in <code>.zshrc</code>, so the step above is done:</p>
<pre><code># Anaconda 
# &gt;&gt;&gt; conda initialize &gt;&gt;&gt;
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/bennyistanto/anaconda3/bin/conda' 'shell.zsh' 'hook' 2&gt; /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/bennyistanto/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/bennyistanto/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/bennyistanto/anaconda3/bin:$PATH"
        export LD_LIBRARY_PATH="/home/bennyistanto/anaconda3/lib:$LD_LIBRARY_PATH"
    fi
fi
unset __conda_setup
# &lt;&lt;&lt; conda initialize &lt;&lt;&lt;</code></pre>
<p>Based on my <code>.zshrc</code> settings and the Conda environment settings, my <code>LD_LIBRARY_PATH</code> is already set to include the Conda libraries at <code>/home/bennyistanto/anaconda3/lib</code>. This should generally be sufficient for TensorFlow to locate and use the CUDA and cuDNN libraries installed via Conda, given that Conda typically manages its own library paths very well.</p>
<p><strong>Evaluation of Current Setup</strong></p>
<p>Since I’ve already set <code>LD_LIBRARY_PATH</code> in my <code>.zshrc</code>, TensorFlow should correctly recognize and utilize the CUDA and cuDNN libraries installed in my Conda environment, assuming there are no other conflicting settings or installations. The <code>LD_LIBRARY_PATH</code> in my <code>.zshrc</code> appears correctly configured to point to the general Conda library directory, but there are a few additional things we might consider:</p>
<p>Make sure we are still working inside the <code>cuda</code> environment.</p>
<p>If TensorFlow continues to have issues finding or correctly using the cuDNN libraries, we might consider adding a direct link to the specific CUDA and cuDNN library paths in <code>LD_LIBRARY_PATH</code> within our Conda activation scripts. We can modify the environment’s activation and deactivation scripts as follows:</p>
<ul>
<li><p><strong>Activate Script</strong> (<code>$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh</code>):</p>
<pre><code>#! /bin/sh
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH</code></pre></li>
<li><p><strong>Deactivate Script</strong> (<code>$CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh</code>):</p>
<pre><code>#! /bin/sh
export LD_LIBRARY_PATH=$(echo $LD_LIBRARY_PATH | sed -e "s|$CONDA_PREFIX/lib:||g")</code></pre></li>
</ul>
<p>This explicitly ensures that our specific Conda environment’s library path is prioritized while the environment is active.</p>
<p>In my case (working inside the <code>cuda</code> environment), <code>$CONDA_PREFIX</code> is <code>/home/bennyistanto/anaconda3/envs/cuda</code>.</p>
<p>If the <code>env_vars.sh</code> file does not exist in both the <code>activate.d</code> and <code>deactivate.d</code> directories within our Conda environment, we should create them. These scripts are useful for setting up and tearing down environment variables each time we activate or deactivate our Conda environment. This ensures that any customizations to our environment variables are applied only within the context of that specific environment and are cleaned up afterwards.</p>
<p>Here’s how to create and use these scripts:</p>
<p><strong>Step 1: Create the Directories</strong></p>
<p>If the <code>activate.d</code> and <code>deactivate.d</code> directories don’t exist, we’ll need to create them first. Here’s how we can do it:</p>
<pre><code>mkdir -p $CONDA_PREFIX/etc/conda/activate.d
mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d</code></pre>
<p><strong>Step 2: Create the Activation Script</strong></p>
<p>Create the <code>env_vars.sh</code> script in the <code>activate.d</code> directory. This script will run every time we activate the environment.</p>
<ol type="1">
<li><p>Navigate to the directory:</p>
<pre><code>cd $CONDA_PREFIX/etc/conda/activate.d</code></pre></li>
<li><p>Create and edit the <code>env_vars.sh</code> file:</p>
<pre><code>nano env_vars.sh</code></pre></li>
<li><p>Add the following content to set up the <code>LD_LIBRARY_PATH</code>:</p>
<pre><code>#!/bin/sh
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH</code></pre></li>
<li><p>Save and exit the editor (in nano, press <code>Ctrl+O</code>, <code>Enter</code>, and then <code>Ctrl+X</code>).</p></li>
</ol>
<p><strong>Step 3: Create the Deactivation Script</strong></p>
<p>Similarly, create the <code>env_vars.sh</code> script in the <code>deactivate.d</code> directory. This script will clear the environment variables when we deactivate the environment.</p>
<ol type="1">
<li><p>Navigate to the directory:</p>
<pre><code>cd $CONDA_PREFIX/etc/conda/deactivate.d</code></pre></li>
<li><p>Create and edit the <code>env_vars.sh</code> file:</p>
<pre><code>nano env_vars.sh</code></pre></li>
<li><p>Add the following content to remove the environment’s path from <code>LD_LIBRARY_PATH</code>:</p>
<pre><code>#!/bin/sh
export LD_LIBRARY_PATH=$(echo $LD_LIBRARY_PATH | sed -e "s|$CONDA_PREFIX/lib:||g")</code></pre></li>
<li><p>Save and exit the editor.</p></li>
</ol>
<p><strong>Step 4: Make Scripts Executable</strong></p>
<p>Ensure that both scripts are executable:</p>
<pre><code>chmod +x $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
chmod +x $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh</code></pre>
<p><strong>Step 5: Testing</strong></p>
<p>Activate our environment again to test the changes:</p>
<pre><code>conda deactivate
conda activate cuda</code></pre>
<p>Check that the <code>LD_LIBRARY_PATH</code> is correctly set:</p>
<pre><code>echo $LD_LIBRARY_PATH</code></pre>
<p>This should reflect the changes we’ve made, showing that the library path of our Conda environment is included.</p>
<p>In my case, the output of <code>echo $LD_LIBRARY_PATH</code> includes <code>/home/bennyistanto/anaconda3/envs/cuda/lib:</code>, indicating that <code>LD_LIBRARY_PATH</code> correctly points to the library directory of the <code>cuda</code> Conda environment. This is what we want: it directs the system to look in the environment’s <code>lib</code> directory for shared libraries, such as those provided by CUDA and cuDNN, which are crucial for TensorFlow to correctly utilize GPU resources.</p>
</section>
<section id="configure-jupyter-notebook" class="level4">
<h4 class="anchored" data-anchor-id="configure-jupyter-notebook">9. Configure Jupyter Notebook</h4>
<p>To configure Jupyter Notebook to use the GPU, we create a new kernel that uses the <code>cuda</code> Conda environment we set up earlier, so that the GPU-enabled libraries are available to notebooks. We can do this by running the following command:</p>
<pre><code>python -m ipykernel install --user --name cuda --display-name "Python 3 (GPU)"</code></pre>
<p>This command registers a new kernel called “Python 3 (GPU)” that uses the <code>cuda</code> Conda environment.</p>
<hr>
<p>Voilà, the installation process is complete. Next, we can test it using <code>test_GPU.ipynb</code>.</p>
<p>Github Gist file: <a href="https://gist.github.com/bennyistanto/46d8cfaf88aaa881ec69a2b5ce60cb58" class="uri">https://gist.github.com/bennyistanto/46d8cfaf88aaa881ec69a2b5ce60cb58</a></p>
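<p>As a quick sanity check, a short snippet along these lines (a sketch, not the notebook itself) can confirm that PyTorch and TensorFlow see the GPU; it degrades gracefully when either library is missing:</p>

```python
# Minimal GPU sanity check; run inside the `cuda` environment.
def gpu_report():
    """Report GPU visibility for PyTorch and TensorFlow (None = not installed)."""
    report = {}
    try:
        import torch
        report["torch_cuda"] = torch.cuda.is_available()
    except ImportError:
        report["torch_cuda"] = None
    try:
        import tensorflow as tf
        report["tf_gpus"] = len(tf.config.list_physical_devices("GPU"))
    except ImportError:
        report["tf_gpus"] = None
    return report

if __name__ == "__main__":
    for name, value in gpu_report().items():
        print(f"{name}: {value}")
```

<p>If both report <code>True</code> / a non-zero GPU count, the stack from steps 4-7 is wired up correctly.</p>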


</section>
</section>

]]></description>
  <category>General</category>
  <guid>https://benny.istan.to/site/blog/20240416-utilizing-cuda.html</guid>
  <pubDate>Tue, 16 Apr 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Maximizing Thinkpad T14 Gen 2 AMD</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20240125-maximizing-thinkpad-t14-gen-2-amd.html</link>
  <description><![CDATA[ 





<p>I bought a <a href="https://www.lenovo.com/us/en/p/laptops/thinkpad/thinkpadt/t14-g2-amd/22tpt14t4a1">Thinkpad T14 Gen 2 AMD</a> (released in August 2022) at the end of December 2023; it is second-hand, in mint condition, with the standard specification (AMD Ryzen™ 7 PRO 5850U Processor with Radeon Graphics, 256GB NVMe and 16GB RAM, FHD 14”).</p>
<p>This is my second pre-owned Thinkpad after the <a href="../blog/20220501-maximizing-thinkpad-t480">T480</a>, and as before, I did some upgrades on my T14. Here’s the list of components:</p>
<ol type="1">
<li>T14 Gen 2: 14″, 3840×2160, IPS, 500 nits, 100% Adobe RGB, Anti-glare from <a href="https://www.myfixguide.com/store/screen-for-thinkpad-t14/" class="uri">https://www.myfixguide.com/store/screen-for-thinkpad-t14/</a></li>
<li>40pin UHD cable from <a href="https://www.myfixguide.com/store/lcd-cable-for-t14-gen2/" class="uri">https://www.myfixguide.com/store/lcd-cable-for-t14-gen2/</a></li>
<li>4TB NVMe SSD from <a href="https://www.crucial.com/ssd/p3-plus/CT4000P3PSSD8" class="uri">https://www.crucial.com/ssd/p3-plus/CT4000P3PSSD8</a></li>
<li>32GB RAM DDR4 SODIMM 3200MHz from <a href="https://www.corsair.com/us/en/p/memory/cmsx32gx4m1a3200c22/corsair-high-performance-vengeance-memory-kit-cmsx32gx4m1a3200c22" class="uri">https://www.corsair.com/us/en/p/memory/cmsx32gx4m1a3200c22/corsair-high-performance-vengeance-memory-kit-cmsx32gx4m1a3200c22</a></li>
<li>WWAN card and the antenna from <a href="https://thinkparts.com/products/-4970" class="uri">https://thinkparts.com/products/-4970</a></li>
</ol>
<p>I did all the installation myself, following the awesome guides from <a href="https://www.ifixit.com/">IFIXIT</a> and using the Pro Tech Toolkit (<a href="https://www.ifixit.com/Store/Tools/Pro-Tech-Toolkit/IF145-307" class="uri">https://www.ifixit.com/Store/Tools/Pro-Tech-Toolkit/IF145-307</a>).</p>
<p>See some of the pictures below:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://benny.istan.to/site/assets/image-blog/20240125-maximizing-thinkpad-t14-gen-2-amd-01.jpg" class="img-fluid figure-img" alt="Remove the old screen and install the WWAN antenna"></p>
<figcaption>Remove the old screen and install the WWAN antenna</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://benny.istan.to/site/assets/image-blog/20240125-maximizing-thinkpad-t14-gen-2-amd-02.jpg" class="img-fluid figure-img" alt="Install the new screen"></p>
<figcaption>Install the new screen</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://benny.istan.to/site/assets/image-blog/20240125-maximizing-thinkpad-t14-gen-2-amd-06.jpg" class="img-fluid figure-img" alt="Put the memory in the free slot"></p>
<figcaption>Put the memory in the free slot</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://benny.istan.to/site/assets/image-blog/20240125-maximizing-thinkpad-t14-gen-2-amd-07.jpg" class="img-fluid figure-img" alt="Replace the old SSD with the new one, and put the WWAN card and antenna cable in place"></p>
<figcaption>Replace the old SSD with the new one, and put the WWAN card and antenna cable in place</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://benny.istan.to/site/assets/image-blog/20240125-maximizing-thinkpad-t14-gen-2-amd-05.jpg" class="img-fluid figure-img" alt="This is the machine with upgrades"></p>
<figcaption>This is the machine with upgrades</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://benny.istan.to/site/assets/image-blog/20240125-maximizing-thinkpad-t14-gen-2-amd-04.jpg" class="img-fluid figure-img" alt="Look, the Cellular connection is available now"></p>
<figcaption>Look, the Cellular connection is available now</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://benny.istan.to/site/assets/image-blog/20240125-maximizing-thinkpad-t14-gen-2-amd-03.jpg" class="img-fluid figure-img" alt="The new 4K screen is on and I just need to install the bezel 😎"></p>
<figcaption>The new 4K screen is on and I just need to install the bezel 😎</figcaption>
</figure>
</div>



]]></description>
  <category>General</category>
  <guid>https://benny.istan.to/site/blog/20240125-maximizing-thinkpad-t14-gen-2-amd.html</guid>
  <pubDate>Fri, 26 Jan 2024 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20240125-maximizing-thinkpad-t14-gen-2-amd-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Drought Propagation</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20240124-drought-propagation-01.html</link>
  <description><![CDATA[ 





<p>Last month I ran an experiment to analyze the propagation of <strong>Meteorological Drought</strong> (Standardized Precipitation Index - SPI) into <strong>Hydrological Drought</strong> (Standardized Streamflow Index - SSI) using <strong>Lagged Correlation</strong> at the pixel level, with Indonesia as the area of interest.</p>
<p>To download the full repository, you can access it via this link: <a href="https://github.com/bennyistanto/drought-propagation" class="uri">https://github.com/bennyistanto/drought-propagation</a></p>
<section id="data" class="level3">
<h3 class="anchored" data-anchor-id="data">Data</h3>
<p>I use the <strong>Standardized Precipitation Index</strong> (<a href="https://library.wmo.int/viewer/39629/download?file=wmo_1090_en.pdf&amp;type=pdf&amp;navigator=1">SPI</a>) as a proxy for meteorological drought, and the Standardized Streamflow Index (<a href="https://doi.org/10.1029/2019WR026315">SSI</a>) as a proxy for hydrological drought.</p>
<p>The SPI uses monthly gridded satellite precipitation estimates from the Climate Hazards Group InfraRed Precipitation with Station data (<a href="https://doi.org/10.1038/sdata.2015.66">CHIRPS</a>).</p>
<p>The SSI uses daily gridded river discharge over the last 24 hours from the <a href="https://doi.org/10.5194/essd-12-2043-2020">GloFAS-ERA5 operational global river discharge reanalysis 1979–present</a> as a proxy for the streamflow time series information.</p>
</section>
<section id="folder-structure-and-files" class="level3">
<h3 class="anchored" data-anchor-id="folder-structure-and-files">Folder structure and files</h3>
<p>There are three notebooks, along with supporting folders, required to run the analysis. Feel free to arrange the folders and settings to your own preference.</p>
<ol type="1">
<li><code>hyd</code> # Files required to process the hydrological drought go here.</li>
<li><code>met</code> # Files required to process the meteorological drought go here.</li>
<li><code>prop</code> # Files required to compute the propagation using lagged correlation go here.</li>
<li><code>subset</code> # This folder holds <code>idn_subset_chirps.nc</code>, a subset file used to clip the input data to the area of interest. The file comes from a shapefile polygon with a <code>land</code> attribute column (<code>value = 1</code>), converted to raster based on the <code>land</code> column with the cell size set to our standard (0.05 deg, matching the spatial resolution of the SPI and SSI), then converted to netCDF. All of this was done in ArcGIS Desktop.</li>
</ol>
<p>The notebooks:</p>
<ol type="1">
<li><a href="https://github.com/bennyistanto/drought-propagation/blob/main/1_Steps_to_Generate_SPI_Using_CHIRPS_Data.ipynb"><code>1_Steps_to_Generate_SPI_Using_CHIRPS_Data.ipynb</code></a></li>
<li><a href="https://github.com/bennyistanto/drought-propagation/blob/main/2_Steps_to_Generate_SSI_Using_GloFAS-ERA5_Data.ipynb"><code>2_Steps_to_Generate_SSI_Using_GloFAS-ERA5_Data.ipynb</code></a></li>
<li><a href="https://github.com/bennyistanto/drought-propagation/blob/main/3_Drought_Propagation_Met2Hyd_Using_CCA.ipynb"><code>3_Drought_Propagation_Met2Hyd_Using_CCA.ipynb</code></a></li>
</ol>
<blockquote class="blockquote">
<p>This uses cross-correlation for each pixel across the entire time series, and also employs noise-filtering techniques such as Singular Spectrum Analysis (SSA), which help isolate the underlying trends and patterns in our data before performing the CCA. This step is crucial for enhancing the signal-to-noise ratio in our datasets.</p>
</blockquote>
</section>
<section id="approach" class="level3">
<h3 class="anchored" data-anchor-id="approach">Approach</h3>
<p>The analysis uses combinations of various time scales [<code>3</code>, <code>6</code>, <code>9</code>, and <code>12-month</code>] and lags ranging from 1 to 12 months:</p>
<pre><code>time_scale_combinations = [
    "spi03_ssi03", "spi06_ssi03", "spi09_ssi03", "spi12_ssi03",
    "spi03_ssi06", "spi06_ssi06", "spi09_ssi06", "spi12_ssi06",
    "spi03_ssi09", "spi06_ssi09", "spi09_ssi09", "spi12_ssi09",
    "spi03_ssi12", "spi06_ssi12", "spi09_ssi12", "spi12_ssi12"
]</code></pre>
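<p>For reference, the combination list above can also be generated programmatically rather than typed out; this one-liner (my own shorthand, not code from the repository) produces the same 16 names in the same order:</p>

```python
# Generate the 16 SPI x SSI time-scale combinations shown above.
scales = ["03", "06", "09", "12"]
time_scale_combinations = [f"spi{s}_ssi{t}" for t in scales for s in scales]

print(time_scale_combinations[:4])
# → ['spi03_ssi03', 'spi06_ssi03', 'spi09_ssi03', 'spi12_ssi03']
```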
<section id="preprocessing" class="level4">
<h4 class="anchored" data-anchor-id="preprocessing">Preprocessing</h4>
<p>The drought characteristics follow the method originally proposed by Yevjevich in <a href="https://www.engr.colostate.edu/ce/facultystaff/yevjevich/papers/HydrologyPapers_n23_1967.pdf">1967</a>, which has been employed to recognize the features of droughts. The paper by Le et al. (<a href="https://www.researchgate.net/publication/333171255_Space-time_variability_of_drought_over_Vietnam">2019</a>) provides a better explanation of these characteristics: duration, severity, intensity, and interarrival.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="../assets/image-blog/20240124-drought-propagation-01.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Drought"><img src="https://benny.istan.to/site/assets/image-blog/20240124-drought-propagation-01.png" class="img-fluid figure-img" alt="Drought"></a></p>
<figcaption>Drought</figcaption>
</figure>
</div>
<p><strong>Masking for Drought Event</strong> A drought condition is set when the SPI or SSI value is negative, specifically less than -1.2. Focusing on drought conditions could be a more relevant approach for our analysis compared to using all SPI and SSI data, which include both dry and wet conditions. By concentrating on these periods, we can potentially gain more insight into the correlation between meteorological and hydrological droughts.</p>
<p><strong>Calculate Drought Magnitude</strong> Compute the absolute cumulative values during drought events for both datasets. This gives a measure of drought magnitude, which may be more meaningful for correlation analysis than using raw SPI/SSI values.</p>
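<p>The masking and magnitude steps can be sketched in Python as follows (a minimal illustration on a plain list; the actual analysis runs on gridded data, and the function name is mine):</p>

```python
def drought_magnitude(series, threshold=-1.2):
    """Absolute cumulative value over each contiguous run below the threshold.

    Returns one magnitude per drought event, mirroring the masking
    (index < -1.2) and magnitude steps described above.
    """
    magnitudes = []
    run = 0.0
    in_drought = False
    for value in series:
        if value < threshold:           # inside a drought event
            run += abs(value)
            in_drought = True
        elif in_drought:                # event just ended: record its magnitude
            magnitudes.append(run)
            run, in_drought = 0.0, False
    if in_drought:                      # series ended mid-event
        magnitudes.append(run)
    return magnitudes
```

<p>For example, an SPI series <code>[0.5, -1.5, -2.0, 0.3, -1.3]</code> contains two events with magnitudes <code>3.5</code> and <code>1.3</code>.</p>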
<p><strong>Applying Singular Spectrum Analysis (SSA)</strong> For noise filtering and trend extraction in drought magnitude data in SPI and SSI datasets. In drought propagation analysis, noise filtering with SSA is a critical step for data preparation. SSA effectively separates the underlying signal from the noise in climate datasets, such as SPI and SSI.</p>
<p>SSA decomposes a time series into a sum of components:</p>
<p><img src="https://latex.codecogs.com/png.latex?X(t)%20=%20T(t)%20+%20S(t)%20+%20N(t)"></p>
<p>Where:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?X(t)">: Original time series</li>
<li><img src="https://latex.codecogs.com/png.latex?T(t)">: Trend component</li>
<li><img src="https://latex.codecogs.com/png.latex?S(t)">: Seasonal component</li>
<li><img src="https://latex.codecogs.com/png.latex?N(t)">: Noise component</li>
</ul>
<p>This process is crucial for enhancing the clarity and accuracy of the data, which in turn facilitates a more precise understanding of drought patterns and their progression.</p>
</section>
<section id="analysis" class="level4">
<h4 class="anchored" data-anchor-id="analysis">Analysis</h4>
<p><strong>Cross-Correlation Analysis</strong> Especially when applied to data refined through SSA noise filtering, is pivotal in understanding drought propagation. This technique examines the relationship between different drought indicators across various time scales. By utilizing data filtered through SSA, which isolates the core signal from noise, Cross-Correlation Analysis can more accurately determine the time lag and intensity with which meteorological droughts (indicated by SPI) translate into hydrological droughts (indicated by SSI).</p>
<p>The cross-correlation coefficient <img src="https://latex.codecogs.com/png.latex?%5Crho_%7Bxy%7D(%5Ctau)"> at lag <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is calculated as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Crho_%7Bxy%7D(%5Ctau)%20=%20%5Cfrac%7B%5Csum((X_i%20-%20%5Cbar%7BX%7D)(Y_%7Bi+%5Ctau%7D%20-%20%5Cbar%7BY%7D))%7D%7B%5Csqrt%7B%5Csum(X_i%20-%20%5Cbar%7BX%7D)%5E2%20%5Csum(Y_i%20-%20%5Cbar%7BY%7D)%5E2%7D%7D"></p>
<p>where:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?X_i">: Value of the first time series at time <img src="https://latex.codecogs.com/png.latex?i"></li>
<li><img src="https://latex.codecogs.com/png.latex?Y_%7Bi+%5Ctau%7D">: Value of the second time series at time <img src="https://latex.codecogs.com/png.latex?i+%5Ctau"></li>
<li><img src="https://latex.codecogs.com/png.latex?%5Ctau">: Time lag</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D">: Mean of the first time series</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cbar%7BY%7D">: Mean of the second time series</li>
<li><img src="https://latex.codecogs.com/png.latex?N">: Number of data points</li>
</ul>
<p>This approach is essential for predicting the onset and progression of drought conditions, enabling timely decision-making and effective resource management to mitigate the adverse impacts of droughts.</p>
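<p>The coefficient above translates directly into code; a NumPy sketch on synthetic series, where the imposed 2-month delay (an illustrative choice, as is the noise level) is recovered as the best lag:</p>

```python
import numpy as np

def cross_corr(x, y, lag):
    """rho_xy(tau): correlate x[i] with y[i + lag], per the formula above."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xd = x - x.mean()
    yd = y - y.mean()
    num = np.sum(xd[: len(x) - lag] * yd[lag:])
    den = np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2))
    return num / den

# Synthetic SSI as a 2-month delayed copy of SPI plus noise
rng = np.random.default_rng(1)
spi = rng.normal(size=240)
ssi = np.roll(spi, 2) + rng.normal(0, 0.1, 240)

corrs = {lag: cross_corr(spi, ssi, lag) for lag in range(0, 7)}
best_lag = max(corrs, key=corrs.get)  # expected to recover lag 2
```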
<p><strong>Frequency Analysis</strong> In the context of drought propagation analysis, frequency analysis plays a critical role in identifying the most prominent patterns of correlation between meteorological and hydrological drought indicators over time. By classifying cross-correlation values into distinct ranges (e.g., 0.0-0.1, 0.1-0.2, etc.) and analyzing these across different lag times, researchers can pinpoint the range that most frequently occurs.</p>
<p>This approach helps in understanding the typical strength of correlation and the temporal shift (lag) between the onset of meteorological drought and its subsequent impact on hydrological conditions. The most frequent range provides insight into the commonality of correlation strengths, while the corresponding lag sheds light on the typical delay between atmospheric changes and their effects on hydrological systems. We can also derive the maximum correlation value, which indicates which areas show the strongest relationship, together with the lag time at which that maximum correlation between SPI and SSI is observed.</p>
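<p>A small NumPy sketch of this classification step for a single grid cell; the 0.1-wide bins follow the ranges described above, while the correlation values themselves are randomly generated for illustration:</p>

```python
import numpy as np

# Illustrative cross-correlation values for lags 1..12 at one grid cell
rng = np.random.default_rng(2)
corr_by_lag = np.clip(rng.normal(0.45, 0.15, 12), 0.0, 1.0)

# Classify into 0.1-wide ranges (0.0-0.1, 0.1-0.2, ...) and count occurrences
bins = np.arange(0.0, 1.1, 0.1)
counts, _ = np.histogram(corr_by_lag, bins=bins)
lo, hi = bins[np.argmax(counts)], bins[np.argmax(counts) + 1]
print(f"Most frequent range: {lo:.1f}-{hi:.1f}")

# Maximum correlation and the (1-based) lag at which it occurs
max_corr = corr_by_lag.max()
max_lag = int(np.argmax(corr_by_lag)) + 1
```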
</section>
</section>
<section id="visualisation" class="level3">
<h3 class="anchored" data-anchor-id="visualisation">Visualisation</h3>
<p>Two map types are used to illustrate the results of the cross-correlation analysis between meteorological and hydrological droughts.</p>
<p><strong>Lag Map</strong> This map displays the time lag (in months) between meteorological and hydrological droughts across the study area. It helps identify regions where hydrological responses to meteorological changes are immediate or delayed.</p>
<p><strong>Strength Map</strong> This map shows the strength of the correlation between meteorological and hydrological droughts. It highlights areas with a strong predictive relationship, indicating regions sensitive to meteorological changes.</p>
<p>Below are some examples of individual Strength Maps for various time-scale combinations and lags.</p>
<ol type="1">
<li><p>SPI 03 and SSI 03, Lag 1-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/cor_spi03_ssi03_lag01.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="SM1"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/cor_spi03_ssi03_lag01.png" class="img-fluid figure-img" alt="SM1"></a></p>
<figcaption>SM1</figcaption>
</figure>
</div></li>
<li><p>SPI 06 and SSI 03, Lag 1-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/cor_spi06_ssi03_lag01.png" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="SM2"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/cor_spi06_ssi03_lag01.png" class="img-fluid figure-img" alt="SM2"></a></p>
<figcaption>SM2</figcaption>
</figure>
</div></li>
<li><p>SPI 06 and SSI 03, Lag 3-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/cor_spi06_ssi03_lag03.png" class="lightbox" data-gallery="quarto-lightbox-gallery-4" title="SM3"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/cor_spi06_ssi03_lag03.png" class="img-fluid figure-img" alt="SM3"></a></p>
<figcaption>SM3</figcaption>
</figure>
</div></li>
<li><p>SPI 12 and SSI 06, Lag 6-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/cor_spi12_ssi06_lag06.png" class="lightbox" data-gallery="quarto-lightbox-gallery-5" title="SM4"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/cor_spi12_ssi06_lag06.png" class="img-fluid figure-img" alt="SM4"></a></p>
<figcaption>SM4</figcaption>
</figure>
</div></li>
</ol>
<p>And below are some examples of composite Strength and Lag Maps for various time-scale combinations.</p>
<ol type="1">
<li><p>Most frequent correlation, and the lag at which it is most frequently observed, for SPI to SSI 3-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_freq_corr_combination_1.png" class="lightbox" data-gallery="quarto-lightbox-gallery-6" title="SM1"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_freq_corr_combination_1.png" class="img-fluid figure-img" alt="SM1"></a></p>
<figcaption>SM1</figcaption>
</figure>
</div></li>
<li><p>Most frequent correlation, and the lag at which it is most frequently observed, for SPI to SSI 6-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_freq_corr_combination_2.png" class="lightbox" data-gallery="quarto-lightbox-gallery-7" title="SM2"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_freq_corr_combination_2.png" class="img-fluid figure-img" alt="SM2"></a></p>
<figcaption>SM2</figcaption>
</figure>
</div></li>
<li><p>Most frequent correlation, and the lag at which it is most frequently observed, for SPI to SSI 9-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_freq_corr_combination_3.png" class="lightbox" data-gallery="quarto-lightbox-gallery-8" title="SM3"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_freq_corr_combination_3.png" class="img-fluid figure-img" alt="SM3"></a></p>
<figcaption>SM3</figcaption>
</figure>
</div></li>
<li><p>Most frequent correlation, and the lag at which it is most frequently observed, for SPI to SSI 12-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_freq_corr_combination_4.png" class="lightbox" data-gallery="quarto-lightbox-gallery-9" title="SM4"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_freq_corr_combination_4.png" class="img-fluid figure-img" alt="SM4"></a></p>
<figcaption>SM4</figcaption>
</figure>
</div></li>
<li><p>Maximum correlation, and the lag at which it is observed, for SPI to SSI 3-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_max_corr_combination_1.png" class="lightbox" data-gallery="quarto-lightbox-gallery-10" title="SM1"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_max_corr_combination_1.png" class="img-fluid figure-img" alt="SM1"></a></p>
<figcaption>SM1</figcaption>
</figure>
</div></li>
<li><p>Maximum correlation, and the lag at which it is observed, for SPI to SSI 6-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_max_corr_combination_2.png" class="lightbox" data-gallery="quarto-lightbox-gallery-11" title="SM2"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_max_corr_combination_2.png" class="img-fluid figure-img" alt="SM2"></a></p>
<figcaption>SM2</figcaption>
</figure>
</div></li>
<li><p>Maximum correlation, and the lag at which it is observed, for SPI to SSI 9-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_max_corr_combination_3.png" class="lightbox" data-gallery="quarto-lightbox-gallery-12" title="SM3"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_max_corr_combination_3.png" class="img-fluid figure-img" alt="SM3"></a></p>
<figcaption>SM3</figcaption>
</figure>
</div></li>
<li><p>Maximum correlation, and the lag at which it is observed, for SPI to SSI 12-month</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_max_corr_combination_4.png" class="lightbox" data-gallery="quarto-lightbox-gallery-13" title="SM4"><img src="https://raw.githubusercontent.com/bennyistanto/drought-propagation/main/prop/images/idn_cli_max_corr_combination_4.png" class="img-fluid figure-img" alt="SM4"></a></p>
<figcaption>SM4</figcaption>
</figure>
</div></li>
</ol>
</section>
<section id="to-do" class="level3">
<h3 class="anchored" data-anchor-id="to-do">To do</h3>
<p>The number of lags (1-12 months) in the existing simulation is good enough.</p>
<p>Expanding the time scales from <code>3, 6, 9, 12</code> to <code>1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12</code>, and testing all their combinations, could potentially produce more insight.</p>
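<p>As a rough sense of scale: crossing twelve SPI time scales with twelve SSI time scales, each evaluated at twelve lags, can be enumerated with itertools (the full cross product is an assumption about the experiment design, not something the existing code does):</p>

```python
from itertools import product

spi_scales = range(1, 13)  # SPI 1..12-month
ssi_scales = range(1, 13)  # SSI 1..12-month
lags = range(1, 13)        # lag 1..12-month

# Every SPI/SSI time-scale pairing, evaluated at every lag
combinations = list(product(spi_scales, ssi_scales, lags))
print(len(combinations))
```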
<p><strong>THIS WORK STILL IN PROGRESS</strong></p>
</section>
<section id="live-testing" class="level3">
<h3 class="anchored" data-anchor-id="live-testing">Live testing</h3>
<p>You can access the notebook via Binder:</p>
<p><a href="https://mybinder.org/v2/gh/bennyistanto/drought-propagation/HEAD" class="uri">https://mybinder.org/v2/gh/bennyistanto/drought-propagation/HEAD</a></p>


</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Remote Sensing</category>
  <category>Research</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20240124-drought-propagation-01.html</guid>
  <pubDate>Wed, 24 Jan 2024 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20240124-drought-propagation-01.png" medium="image" type="image/png" height="55" width="144"/>
</item>
<item>
  <title>Firmware upgrade on Thuraya SatSleeve for iPhone</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20240103-firmware-upgrade-on-thuraya-satsleeve-for-iphone.html</link>
  <description><![CDATA[ 





<p><a href="../assets/image-blog/20240103-firmware-upgrade-on-thuraya-satsleeve-for-iphone-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20240103-firmware-upgrade-on-thuraya-satsleeve-for-iphone-01.jpg" class="img-fluid"></a></p>
<p>I have an old version of the <a href="https://www.thuraya.com/en/products-list/legacy/satsleeve-for-iphone">Thuraya SatSleeve for iPhone</a>, now categorized as a legacy product on the Thuraya website.</p>
<p>For about a year, the SatSleeve had been experiencing intermittent connections, both the Bluetooth link between the iPhone SE and the SatSleeve and the GPS connection. Luckily, Thuraya provides a firmware upgrade, release v3.0.1, on their website: <a href="https://www.thuraya.com/en/support/upgrades/legacy/thuraya-satsleeve-for-iphone" class="uri">https://www.thuraya.com/en/support/upgrades/legacy/thuraya-satsleeve-for-iphone</a></p>
<p>Before upgrading a SatSleeve, check which firmware is installed (SatSleeve &gt; Settings &gt; Device Info &gt; Firmware version). Perform the upgrade only if Thuraya releases a firmware version newer than your existing one (mine was v2.94).</p>
<p>To upgrade the firmware, follow steps on the website:</p>
<p>Step 1</p>
<ul>
<li><p>Download the below SatSleeve Upgrader program.</p>
<p><a href="https://www.thuraya.com/-/media/thuraya/downloads/upgrades/legacy-downloads/satsleeve-for-iphone/thurayasatsleeveupgraderv1331-1.zip?la=en&amp;hash=6383DE73A533A43B0C8043BF355806EB19BB8F05">SatSleeve upgrader</a></p></li>
<li><p>Unzip and Run the setup file - the Upgrader program including the USB driver will be installed.</p></li>
</ul>
<p>Step 2</p>
<ul>
<li><p>Download the latest Thuraya SatSleeve firmware release to your hard disk.</p>
<p><a href="https://www.thuraya.com/-/media/thuraya/downloads/upgrades/legacy-downloads/satsleeve-for-iphone/satsleevedatathurayav301.zip?la=en&amp;hash=C2145EA5F0579D34268B8B4ED1056D3CCF3436D1">SatSleeve iPhone firmware release v3.0.1</a></p>
<p>(works only on the SatSleeve for iPhone Data model) - Unzip it.</p>
<p>Release notes of v3.0.1: Fixed GPS rollover issues</p></li>
</ul>
<p>Step 3</p>
<ul>
<li><p>Connect your SatSleeve with the PC/laptop via USB data cable.</p></li>
<li><p>You can now start the SatSleeve Upgrader program (make sure you run this software as Administrator).</p>
<ul>
<li>The stated requirement is a PC with Windows 8/8.1, Windows 7, or Windows Vista, but it works fine on my Windows 11 machine</li>
<li>Right-click Thuraya SatSleeve Upgrader &gt; More &gt; Run as administrator</li>
</ul></li>
<li><p>Locate the firmware on your hard disk. The Upgrader program will guide you through the upgrade process.</p></li>
</ul>
<p>Now, my SatSleeve is working fine with my iPhone SE.</p>
<p><a href="../assets/image-blog/20240103-firmware-upgrade-on-thuraya-satsleeve-for-iphone-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://benny.istan.to/site/assets/image-blog/20240103-firmware-upgrade-on-thuraya-satsleeve-for-iphone-02.jpg" class="img-fluid"></a></p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>General</category>
  <guid>https://benny.istan.to/site/blog/20240103-firmware-upgrade-on-thuraya-satsleeve-for-iphone.html</guid>
  <pubDate>Thu, 04 Jan 2024 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20240103-firmware-upgrade-on-thuraya-satsleeve-for-iphone-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Hourly Humidity Data</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20231015-hourly-humidity-data.html</link>
  <description><![CDATA[ 





<p><a href="../assets/image-blog/20231015-hourly-humidity-data-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20231015-hourly-humidity-data-01.jpg" class="img-fluid"></a></p>
<p>Recently, I embarked on a journey to calculate humidity data from a myriad of sources. Throughout this process, I experimented with various methods, ranging from saturation water vapour pressure using Tetens’ formula (with parameters according to Buck), to saturation over ice from Alduchov and Eskridge, and finally to Clausius-Clapeyron.</p>
<p>For those in need of hourly humidity data spanning from 1 Jan 1950 to the present, there’s good news! You can seamlessly extract this information from ERA5-Land Hourly data via Google Earth Engine (GEE). The Specific and Relative Humidity is meticulously calculated based on three core parameters: T2m (Temperature at 2 meters), Dew Point, and Surface Pressure.</p>
<p>Interested in exploring further? Check out my GEE script: <a href="https://code.earthengine.google.com/9b23f929939122fb1fdc8418d17c43f5" class="uri">https://code.earthengine.google.com/9b23f929939122fb1fdc8418d17c43f5</a></p>
<p>By the way, for those diving deep into the technicalities, the GEE script I’ve shared leans on a simpler approach, kinda like a nod to the good ol’ Magnus formula. So, it’s pretty straightforward and user-friendly.</p>
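<p>The shared script runs in GEE JavaScript; an equivalent Python sketch of the Magnus-type calculation is below. The 17.625/243.04 coefficients are the improved Magnus values from the Alduchov and Eskridge reference; the sample inputs are illustrative:</p>

```python
import numpy as np

# Improved Magnus coefficients (Alduchov & Eskridge, 1996)
A, B = 17.625, 243.04

def sat_vapour_pressure(t_c):
    """Saturation vapour pressure (hPa) at air temperature t_c (deg C)."""
    return 6.1094 * np.exp(A * t_c / (B + t_c))

def relative_humidity(t2m_c, dewpoint_c):
    """Relative humidity (%) from 2 m temperature and dew point (deg C)."""
    return 100.0 * sat_vapour_pressure(dewpoint_c) / sat_vapour_pressure(t2m_c)

def specific_humidity(dewpoint_c, pressure_hpa):
    """Specific humidity (kg/kg) from dew point and surface pressure."""
    e = sat_vapour_pressure(dewpoint_c)  # actual vapour pressure (hPa)
    return 0.622 * e / (pressure_hpa - 0.378 * e)

rh = relative_humidity(30.0, 24.0)
```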
<p>I hope this proves beneficial to researchers, data scientists, and enthusiasts in the realm of climatology. If you have any suggestions, feedback, or improvements, please don’t hesitate to reach out.</p>
<p><strong>Reference</strong></p>
<p>Alduchov, O. A., &amp; Eskridge, R. E. (1996). Improved Magnus form approximation of saturation vapor pressure. Journal of Applied Meteorology, 35(4), 601-609.</p>
<hr>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Remote Sensing</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20231015-hourly-humidity-data.html</guid>
  <pubDate>Sun, 15 Oct 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20231015-hourly-humidity-data-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>A certified GISP</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230825-a-certified-gisp.html</link>
  <description><![CDATA[ 





<p><a href="../assets/image-blog/20230825-a-certified-gisp-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230825-a-certified-gisp-01.jpg" class="img-fluid"></a></p>
<p>Exciting news! I’m now a certified GIS Professional (GISP)</p>
<p>Big thanks to GIS Certification Institute (GISCI) for this recognition. To learn more about the GISP certification process, visit <a href="https://www.gisci.org" class="uri">https://www.gisci.org</a></p>
<p>Shoutout to my mentors at the WBG, <a href="https://www.worldbank.org/en/about/people/k/keith-patrick-garrett">Keith Garrett</a> and <a href="https://www.worldbank.org/en/about/people/b/benjamin-p-stewart">Ben Stewart</a>, for their guidance in the last 2 years.</p>
<p>Looking forward to doing more meaningful work in climate analytics and geospatial technology for greater impact! 🗺️</p>
<p><a href="../assets/image-blog/20230825-a-certified-gisp-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://benny.istan.to/site/assets/image-blog/20230825-a-certified-gisp-02.jpg" class="img-fluid"></a></p>
<p>Update: happy to have received the certificate at the end of 2023.</p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>General</category>
  <guid>https://benny.istan.to/site/blog/20230825-a-certified-gisp.html</guid>
  <pubDate>Fri, 25 Aug 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230825-a-certified-gisp-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Monthly mosaic of modified Radar Vegetation Index</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230824-monthly-mosaic-of-modified-radar-vegetation-index.html</link>
  <description><![CDATA[ 





<p><a href="../assets/image-blog/20230824-monthly-mosaic-of-modified-radar-vegetation-index-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230824-monthly-mosaic-of-modified-radar-vegetation-index-02.jpg" class="img-fluid"></a></p>
<p>A few months ago, I wrote a <a href="../blog/20230614-sentinel-1-modified-radar-vegetation-index">post</a> about how to calculate the modified Radar Vegetation Index (mRVI) using the Sentinel-1 satellite. It extracted the mRVI every dekad, with Ukraine as the study case.</p>
<p>For areas in Europe, getting the S1 data every dekad is doable, but it’s a bit tricky for areas outside Europe. Currently, for my work, I would like to extract the mRVI for <a href="https://en.wikipedia.org/wiki/Mpumalanga">Mpumalanga</a> province in South Africa. The location is near 25.5°S, and according to the picture below, the revisit time there is every 12 days.</p>
<p><a href="https://sentinel.esa.int/documents/247904/4748961/Sentinel-1-Repeat-Coverage-Frequency-Geometry-2021.jpg"><img src="https://benny.istan.to/site/assets/image-blog/20230824-monthly-mosaic-of-modified-radar-vegetation-index-03.jpg" class="img-fluid" alt="Sentinel-1 repeat coverage frequency geometry"></a> Source: <a href="https://sentinel.esa.int/documents/247904/4748961/Sentinel-1-Repeat-Coverage-Frequency-Geometry-2021.jpg" class="uri">https://sentinel.esa.int/documents/247904/4748961/Sentinel-1-Repeat-Coverage-Frequency-Geometry-2021.jpg</a></p>
<p>Getting a monthly mRVI mosaic seems feasible for the South Africa case; if I kept the dekad interval, some dekads would return an empty collection.</p>
<p>So, I needed to modify the existing code to build the monthly list, calculate monthly mosaics, and compute the monthly mean and the ratio anomaly.</p>
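<p>The full code runs in GEE (JavaScript), but the monthly-grouping logic itself can be sketched in Python with pandas on illustrative values; the seasonal signal, the ~12-day revisit interval, and the aggregation choices here are assumptions for demonstration only:</p>

```python
import numpy as np
import pandas as pd

# Illustrative mRVI observations on a ~12-day revisit cycle
rng = np.random.default_rng(3)
dates = pd.date_range("2020-01-01", "2022-12-31", freq="12D")
mrvi = pd.Series(0.5 + 0.1 * np.sin(2 * np.pi * dates.dayofyear / 365)
                 + rng.normal(0, 0.02, len(dates)), index=dates)

# Monthly "mosaic": aggregate all scenes that fall in each calendar month
monthly = mrvi.resample("MS").mean()

# Long-term monthly mean and the ratio anomaly against it
climatology = monthly.groupby(monthly.index.month).mean()
ratio_anomaly = monthly / climatology.loc[monthly.index.month].to_numpy()
```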
<p>Full GEE code is here: <a href="https://code.earthengine.google.com/aea00cb8f3f1ccc921d5f6698b5c0c5a" class="uri">https://code.earthengine.google.com/aea00cb8f3f1ccc921d5f6698b5c0c5a</a></p>
<p><a href="../assets/image-blog/20230824-monthly-mosaic-of-modified-radar-vegetation-index-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://benny.istan.to/site/assets/image-blog/20230824-monthly-mosaic-of-modified-radar-vegetation-index-01.jpg" class="img-fluid"></a></p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Remote Sensing</category>
  <guid>https://benny.istan.to/site/blog/20230824-monthly-mosaic-of-modified-radar-vegetation-index.html</guid>
  <pubDate>Thu, 24 Aug 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230824-monthly-mosaic-of-modified-radar-vegetation-index-02.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Parsing BMKG’s daily climate data</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230822-parsing-bmkgs-daily-climate-data.html</link>
  <description><![CDATA[ 





<p>To replicate the code below, please download daily climate data from BMKG Data Online at <a href="https://dataonline.bmkg.go.id/home" class="uri">https://dataonline.bmkg.go.id/home</a>. Just a heads up: if you haven’t already registered on the portal, you will need to do so, as registration is required before you can download any data.</p>
<p>Then go to Climate Data &gt; Daily Data; choose the Station Type, Parameter, Province, Regency, Station Name, and the Date Period; then click the Process button. You will get the data in *.xlsx format.</p>
<p>You can get one of the data example from this link: <a href="https://docs.google.com/spreadsheets/d/1xbBWeHhiMNs8IehHbsrMV9yeZlcu8GqR/edit?usp=sharing&amp;ouid=104182606454912191559&amp;rtpof=true&amp;sd=true" class="uri">https://docs.google.com/spreadsheets/d/1xbBWeHhiMNs8IehHbsrMV9yeZlcu8GqR/edit?usp=sharing&amp;ouid=104182606454912191559&amp;rtpof=true&amp;sd=true</a></p>
<p>In the example above, I retrieved daily precipitation data for all stations from 1 Jun 2000 to 31 Dec 2021. I would like to use it to correct the value and distribution of daily <a href="https://gpm.nasa.gov/data/imerg">IMERG</a> data using a bias correction method that I am currently developing.</p>
<p>Unfortunately, there are too many missing values.</p>
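<p>A minimal pandas sketch of this missing-value check. The inline frame below mimics a BMKG export, and the 8888/9999 sentinel codes are BMKG's conventional flags for unmeasured/missing data; treat both as assumptions if your export differs:</p>

```python
import numpy as np
import pandas as pd

# Illustrative frame mimicking a BMKG daily-rainfall export (RR column);
# assumption: 8888 = unmeasured, 9999 = missing (BMKG sentinel codes)
df = pd.DataFrame({
    "Tanggal": pd.date_range("2021-01-01", periods=8, freq="D"),
    "RR": [12.0, 0.0, 8888, 5.4, 9999, 9999, 3.2, 8888],
})

# Replace sentinel codes with NaN so they count as missing values
df["RR"] = df["RR"].replace([8888, 9999], np.nan)

missing_pct = df["RR"].isna().mean() * 100
print(f"Missing: {missing_pct:.0f}%")
```

With a real download you would start from <code>pd.read_excel(...)</code> on the *.xlsx file instead of the inline frame.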
<p>I should find an alternative daily precipitation time series; gridded data will probably suit my objectives.</p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20230822-parsing-bmkgs-daily-climate-data.html</guid>
  <pubDate>Tue, 22 Aug 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>SPI-based drought characteristics</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230811-spi-based-drought-characteristics.html</link>
  <description><![CDATA[ 








<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Remote Sensing</category>
  <category>GIS</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20230811-spi-based-drought-characteristics.html</guid>
  <pubDate>Fri, 11 Aug 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fourier regression model to generate monthly to daily temperature data</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data.html</link>
  <description><![CDATA[ 





<section id="introduction" class="level3">
<h3 class="anchored" data-anchor-id="introduction">1 Introduction</h3>
<p>In the sphere of meteorology, the significance of statistical models in comprehending and forecasting diverse weather patterns is incontestable. Within this context, the Fourier regression model has emerged as a formidable asset, specifically in generating daily time series from monthly temperature data (Wilks, 1998). The model lays a robust foundation for simulating temperature patterns, yielding crucial insights that are indispensable for weather prediction, climate change studies, and managing water resources.</p>
<p>The Fourier regression model has been proven to be a highly effective tool for generating daily time series from monthly temperature data, enhancing our understanding and prediction capabilities in weather forecasting, climate change studies, and water resource management. This model’s unique ability to incorporate historical context allows it to capture intricate dependencies and transitions in temperature data, which are crucial in understanding temperature patterns.</p>
<p>By applying Fourier series, it is possible to reduce the number of parameters involved in the process, thereby simplifying complex calculations and making the model more efficient. Moreover, the Fourier regression model can seamlessly replace missing values and handle anomalies, which are often challenges in data analysis. This enables more accurate simulations and predictions, making it a vital tool in fields such as agriculture and urban planning.</p>
<p>The Fourier regression model’s success in generating daily time series from monthly temperature data not only contributes to our understanding of weather patterns but also provides practical solutions for real-world challenges, making it a powerful instrument in various domains.</p>
</section>
<section id="data" class="level3">
<h3 class="anchored" data-anchor-id="data">2 Data</h3>
<p>Over the past three decades, Bogor’s climate has remained relatively consistent. The city experiences an average annual temperature of around 26 °Celsius. The temperature varies little throughout the year, with the warmest month averaging around 27 °Celsius and the coolest month averaging around 25 °Celsius.</p>
<p>Daily temperature data from the Bogor Climatological Station for 1984-2021 were used in this analysis, downloaded from BMKG Data Online in *.xlsx format. The file was then cleaned by removing the logo and unnecessary text, aggregated into monthly values, and reduced to two columns (date in column A and temperature in column B, with the records extending downwards), then saved in *.csv format.</p>
<p>The final input file is accessible via this link: <a href="https://drive.google.com/file/d/1vKT5ekDnqahkG6um5wIm-ZfhExqZTAm8/view?usp=sharing" class="uri">https://drive.google.com/file/d/1vKT5ekDnqahkG6um5wIm-ZfhExqZTAm8/view?usp=sharing</a></p>
</section>
<section id="methods" class="level3">
<h3 class="anchored" data-anchor-id="methods">3 Methods</h3>
<p>This exercise focuses on the Fourier regression model as a tool for generating daily temperature data from monthly time series (Boer, 1999).</p>
<p>Fourier series can also be employed to generate other climate data. McCaskill (1990a) utilized Fourier series regression, incorporating rainfall events to generate pan evaporation data, maximum and minimum air temperature, and daily radiation intensity (P(i)).</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-01.jpg" class="img-fluid"></a></p>
<p>where <img src="https://latex.codecogs.com/png.latex?f"> represents a rain function, <img src="https://latex.codecogs.com/png.latex?R(i+j)"> is a rain event on day <img src="https://latex.codecogs.com/png.latex?(i+j)">, and <img src="https://latex.codecogs.com/png.latex?c_j">, <img src="https://latex.codecogs.com/png.latex?l">, and <img src="https://latex.codecogs.com/png.latex?n"> are determined through regression analysis. In the context of Australia, the incorporation of rain events in the Fourier series function did not exert a significant impact, although it substantially reduced the error level of the estimated value (McCaskill, 1990a).</p>
<p>In the above equation 1, the effect of rainfall events is assumed to be additive. However, for certain regions, this rainfall event impact could be multiplicative.</p>
<p>In many cases, climate data is generally presented as monthly data, making analysis requiring daily data difficult to execute. Fourier series regression can also be used to generate daily climate data from average monthly climate data (Epstein, 1991). The equation is written as follows:</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-02.jpg" class="img-fluid"></a></p>
<p>where <img src="https://latex.codecogs.com/png.latex?t'%20=%20%5Cfrac%7B2%5Cpi%20t%7D%7B12%7D">, and <img src="https://latex.codecogs.com/png.latex?t"> is the month. This equation assumes an equal number of days in each month, which is not the case in reality. Therefore, to adjust it, the value of <img src="https://latex.codecogs.com/png.latex?t"> in the above equation is changed as the <img src="https://latex.codecogs.com/png.latex?m">-th day for the <img src="https://latex.codecogs.com/png.latex?T">-th month so that the value <img src="https://latex.codecogs.com/png.latex?t%20=%20(T-0.5)+%5Cfrac%7B(m-0.5)%7D%7BD%7D">, where <img src="https://latex.codecogs.com/png.latex?D"> is the number of days in month <img src="https://latex.codecogs.com/png.latex?T">. The use of equation 2 to create fitting lines for daily data is highly effective. The fitting lines composed from daily data and those derived from monthly data almost overlap.</p>
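<p>A NumPy sketch of equation 2 in practice: fit Fourier coefficients to twelve monthly means by least squares, then evaluate them at the day-level t values described above. The monthly temperatures and the number of harmonics are illustrative, not the Bogor data:</p>

```python
import numpy as np

def fourier_design(t, n_harmonics=2):
    """Design matrix for a Fourier series with a 12-month period."""
    tp = 2 * np.pi * np.asarray(t, float) / 12.0
    cols = [np.ones_like(tp)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(k * tp), np.sin(k * tp)]
    return np.column_stack(cols)

# Illustrative Bogor-like monthly mean temperatures (deg C), t = month number
monthly_t = np.array([25.3, 25.5, 25.8, 26.2, 26.4, 26.0,
                      25.7, 25.9, 26.1, 26.3, 25.9, 25.5])
coef, *_ = np.linalg.lstsq(fourier_design(np.arange(1, 13)), monthly_t,
                           rcond=None)

# Daily evaluation with t = (T - 0.5) + (m - 0.5) / D for day m of month T
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
t_day = np.concatenate([(T - 0.5) + (np.arange(1, D + 1) - 0.5) / D
                        for T, D in enumerate(days_in_month, start=1)])
daily_t = fourier_design(t_day) @ coef
```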
<p>For simulation purposes, an error component, <img src="https://latex.codecogs.com/png.latex?e(i)">, which has a normal distribution with a mean of 0 and a variance of <img src="https://latex.codecogs.com/png.latex?s%5E2">, is typically included. Thus, the data series generated by each simulation will differ but still reflect the seasonal diversity of the data. Errors in climate data simulation models often autocorrelate. Therefore, the error component can be modeled using a k-th order autocorrelation function (Wonnacott and Wonnacott, 1987):</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-03.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-3"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-03.jpg" class="img-fluid"></a></p>
<p>where <img src="https://latex.codecogs.com/png.latex?r"> is the correlation value and <img src="https://latex.codecogs.com/png.latex?w(i)"> is the random error (white noise). The simplest autocorrelation error function linearly connects the error on day <img src="https://latex.codecogs.com/png.latex?i"> with the error on day <img src="https://latex.codecogs.com/png.latex?i-1"> plus the random error on day <img src="https://latex.codecogs.com/png.latex?i"> (first-order autocorrelation function), namely:</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-04.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-4"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-04.jpg" class="img-fluid"></a></p>
<p>Therefore, if the value of <img src="https://latex.codecogs.com/png.latex?r"> is positive, the error on day <img src="https://latex.codecogs.com/png.latex?i"> tends to increase if the error on the previous day was high, and vice versa. Practically, the value of <img src="https://latex.codecogs.com/png.latex?r"> is always less than one, but its magnitude is unknown.</p>
</section>
<section id="implementation" class="level3">
<h3 class="anchored" data-anchor-id="implementation">4 Implementation</h3>
<p>In the implementation phase of this analysis, we used Python with the pandas, NumPy, SciPy, and Matplotlib libraries to develop a Fourier regression model that generates a daily time series from monthly temperature data.</p>
<section id="how-to" class="level4">
<h4 class="anchored" data-anchor-id="how-to">4.1 How-to?</h4>
<p>The step-by-step guide for the model is available in Google Colab, a convenient platform for data analysis and machine learning. It walks through the entire process: reshaping the data to ensure compatibility with the model, aggregating the daily data to monthly averages, assigning the monthly values across the corresponding days of each month, fitting the Fourier series and extracting its coefficients, estimating temperature from the Fourier coefficients, calculating the autocorrelated error, and producing the final error-adjusted temperature estimates.</p>
<p><strong>4.1.1 Reshape the data</strong></p>
<p>The first step in our analysis involves pre-processing and reshaping the data to fit the requirements of the subsequent statistical modeling. Our raw temperature data, originally in a CSV file, consists of daily temperature readings recorded over several years. In this data, dates are represented in a ‘YYYY-MM-DD’ format. However, for our analysis, we require the ‘day of the year’ and the ‘year’ as separate variables.</p>
<p>We start by loading the data into a Pandas DataFrame. Next, we convert the ‘date’ column into a datetime format using the pd.to_datetime() function, which facilitates date-specific manipulations. This allows us to extract the ‘day of the year’ and the ‘year’ information from each date and store these in new columns titled ‘dayofyear’ and ‘year’, respectively.</p>
<p>Since we have multiple temperature readings per day, we average these readings for each day of the year across all years. We do this by grouping the data by ‘dayofyear’ and ‘year’, and then calculating the mean temperature for each group using the groupby() and mean() functions.</p>
<p>However, this leaves us with a long format DataFrame, where each row represents a day of a particular year. For easier visualization and modeling, we convert this into a wide format DataFrame, where each column represents a year and each row represents a day of the year. This transformation is performed using the unstack() function.</p>
<p>Lastly, we reset the DataFrame index for neatness and compatibility with future operations. The resulting DataFrame is saved into a new CSV file. This reshaping of data forms the foundation for our Fourier regression model and helps ensure accuracy and efficiency in the subsequent analysis.</p>
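<p>A minimal sketch of this reshaping step is shown below. The column names ("date", "temperature") and the synthetic input frame are assumptions for illustration; the original notebook reads station records from a CSV file instead.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical daily records standing in for the original CSV input
# (e.g. pd.read_csv("temperature_daily.csv") with 'date' and 'temperature')
dates = pd.date_range("1986-01-01", "1987-12-31", freq="D")
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "date": dates,
    "temperature": 26 + 2 * np.sin(2 * np.pi * dates.dayofyear / 365)
                   + rng.normal(0, 0.5, len(dates)),
})

# Extract 'dayofyear' and 'year' from the datetime column
df["dayofyear"] = df["date"].dt.dayofyear
df["year"] = df["date"].dt.year

# Average multiple readings per (dayofyear, year), then pivot long -> wide:
# each column becomes a year, each row a day of the year
daily_mean = df.groupby(["dayofyear", "year"])["temperature"].mean()
wide = daily_mean.unstack("year").reset_index()

# wide.to_csv("temperature_daily_wide.csv", index=False)
```

The same pattern (groupby, mean, unstack, reset_index) applies to any number of years; leap days simply leave NaN in the non-leap year columns.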
<p>The code above will produce an output preview like the one shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-05.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-5"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-05.jpg" class="img-fluid"></a></p>
<p><strong>Table 1.</strong> Reshape data from long to wide format</p>
<p><strong>4.1.2 Daily to Monthly</strong></p>
<p>In addition to the daily analysis, we decided to explore the temperature trends on a monthly basis. The process for reshaping the data for monthly temperature averages mirrors the daily approach.</p>
<p>First, we load the raw temperature data from a CSV file into a Pandas DataFrame and convert the ‘date’ column to a datetime format. With the datetime format, we’re able to extract the ‘month’ and ‘year’ from each date, creating new columns for each.</p>
<p>As with the daily analysis, we handle multiple temperature readings per day by averaging these for each month of each year. We achieve this by grouping the data by ‘month’ and ‘year’, then calculating the mean temperature for each group.</p>
<p>To facilitate further analysis and visualization, we convert this long format DataFrame to a wide format DataFrame, with each column representing a year and each row representing a month. This is done using the unstack() function.</p>
<p>After resetting the DataFrame index for better data structure, we save the resulting DataFrame into a new CSV file. This CSV file contains average monthly temperatures over the years and will be useful for understanding broader temperature trends and providing context to our Fourier regression model.</p>
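<p>A sketch of the monthly aggregation, again on a small synthetic frame (the column names and data are assumptions; the original works from the same daily CSV as the previous step):</p>

```python
import numpy as np
import pandas as pd

# Hypothetical daily data standing in for the original CSV input
dates = pd.date_range("1986-01-01", "1987-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": dates,
    "temperature": 26 + 2 * np.sin(2 * np.pi * dates.dayofyear / 365)
                   + rng.normal(0, 0.5, len(dates)),
})

# Extract 'month' and 'year' from the datetime column
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year

# Monthly mean per (month, year), pivoted so each column is a year
monthly = df.groupby(["month", "year"])["temperature"].mean()
monthly_wide = monthly.unstack("year").reset_index()

# monthly_wide.to_csv("temperature_monthly_wide.csv", index=False)
```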
<p>The code above will produce an output preview like the one shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-06.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-6"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-06.jpg" class="img-fluid"></a></p>
<p><strong>Table 2.</strong> Monthly average of temperature</p>
<p><strong>4.1.3 Assigning monthly data across the corresponding days of the month</strong></p>
<p>In order to prepare our dataset for Fourier regression modeling, we need to map the average monthly temperature values to their corresponding days of the year. This step is crucial as it enables the creation of a continuous time series from the previously calculated monthly averages.</p>
<p>We begin this process by defining the number of days in each month, differentiating between leap and non-leap years. Then, we create a new DataFrame, dayofyear_df, with a ‘dayofyear’ column that sequentially enumerates each day of the year from 1 to 366. A binary ‘leap’ column is also added to indicate if the day corresponds to a leap year.</p>
<p>To map the ‘dayofyear’ to the corresponding month, we create a ‘month’ column using np.repeat() to repeat the month index according to the number of days in each month. This column is then adjusted for non-leap years.</p>
<p>The average monthly temperatures, stored in monthly_avg_df, are merged with the dayofyear_df DataFrame, repeating each monthly average across the corresponding days of the month. As a result, we obtain a DataFrame with daily granularity, which contains the corresponding average monthly temperature for each day.</p>
<p>We then handle the 366th day of non-leap years, setting the temperature to NaN, as it doesn’t exist in those years.</p>
<p>Finally, we remove the unnecessary ‘month’ and ‘leap’ columns, reset the index, and save this DataFrame into a new CSV file. This final, reshaped DataFrame serves as our input for the Fourier regression modeling, enabling us to predict temperatures at a daily level from average monthly temperatures.</p>
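<p>The day-of-year/month mapping can be sketched as follows. For brevity this version handles a single year of monthly averages on the 366-day (leap) frame; the monthly values are invented for illustration, and the original merges a multi-year <code>monthly_avg_df</code> instead.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical monthly averages for one year (values are illustrative)
monthly_avg = pd.DataFrame({
    "month": np.arange(1, 13),
    "tavg": [26.1, 26.3, 26.6, 26.9, 26.8, 26.2,
             25.8, 25.9, 26.0, 26.4, 26.5, 26.2],
})

# Days per month on the 366-day leap-year frame used in the post
days_in_month_leap = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

dayofyear_df = pd.DataFrame({"dayofyear": np.arange(1, 367)})
# Map each day of the year to its month by repeating month indices
dayofyear_df["month"] = np.repeat(np.arange(1, 13), days_in_month_leap)

# Broadcast each monthly average across the days of that month
daily_from_monthly = dayofyear_df.merge(monthly_avg, on="month", how="left")
```

For non-leap years, the post additionally sets day 366 to NaN and shifts the month mapping by one day from March onward.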
<p>The code above will produce an output preview like the one shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-07.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-7"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-07.jpg" class="img-fluid"></a></p>
<p><strong>Table 3.</strong> Assigning monthly data into daily</p>
<p><strong>4.1.4 Fourier series modeling and coefficient extraction</strong></p>
<p>The next step in the analysis process involves fitting a Fourier series to our daily temperature data. The Fourier series is a mathematical tool used for analyzing periodic functions, making it suitable for modeling periodic patterns in weather data like temperature.</p>
<p>To begin, we first load the reshaped DataFrame containing the daily average temperatures. Next, we define a Fourier function, specifying the form it should take. The function is expressed in terms of trigonometric terms (cosine and sine functions) and includes coefficients that we aim to estimate (a0, a1, b1, a2, b2).</p>
<p>To perform this estimation, we iterate over each year in the DataFrame. For each year, we calculate new variables ‘T’, ‘m’, ‘D’, and ‘t’. These variables represent respectively the month, the day of the month, the number of days in the month, and a transformed time index (where each month is considered as a unit time interval). We exclude data points with NaN or infinite values.</p>
<p>We then utilize the curve_fit function from the scipy.optimize module to fit the Fourier function to the temperature data for each year. This function returns the optimal values for the coefficients a0, a1, b1, a2, and b2 that best fit the data.</p>
<p>In cases where there’s insufficient data to fit the Fourier series, we handle the errors and assign NaN values to the coefficients for that year.</p>
<p>Once we obtain the coefficients for each year, we save this data into a new CSV file. This file will then be used to generate our Fourier regression model and perform temperature estimation. The generated Fourier coefficients provide insights into the amplitude and phase of the cyclical patterns in the temperature data.</p>
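<p>A sketch of the fit for a single year is shown below, using the second-order Fourier form and the transformed time index t = (T − 0.5) + (m − 0.5)/D described earlier. The synthetic "observed" series and its true coefficients are assumptions; the original loops this fit over every year in the reshaped DataFrame.</p>

```python
import numpy as np
from scipy.optimize import curve_fit

# Second-order Fourier series in the transformed time index t
def fourier(t, a0, a1, b1, a2, b2):
    w = 2 * np.pi * t / 12  # one cycle per 12 "month units"
    return (a0 + a1 * np.cos(w) + b1 * np.sin(w)
            + a2 * np.cos(2 * w) + b2 * np.sin(2 * w))

# Build t = (T - 0.5) + (m - 0.5) / D for one non-leap year:
# T = month, m = day of month, D = days in month
days_in_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
T = np.repeat(np.arange(1, 13), days_in_month)
m = np.concatenate([np.arange(1, d + 1) for d in days_in_month])
D = np.repeat(days_in_month, days_in_month)
t = (T - 0.5) + (m - 0.5) / D

# Hypothetical daily temperatures generated from known coefficients + noise
rng = np.random.default_rng(1)
temp = fourier(t, 26.0, 1.2, -0.8, 0.3, 0.1) + rng.normal(0, 0.2, t.size)

# Drop NaN/inf points before fitting, as in the post
mask = np.isfinite(temp)
coeffs, _ = curve_fit(fourier, t[mask], temp[mask])
a0, a1, b1, a2, b2 = coeffs
```

Because the model is linear in the coefficients, <code>curve_fit</code> converges reliably here even from its default starting values.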
<p>The code above will produce an output preview like the one shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-08.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-8"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-08.jpg" class="img-fluid"></a></p>
<p><strong>Table 4.</strong> Fourier coefficient</p>
<p><strong>4.1.5 Temperature estimation using Fourier coefficient</strong></p>
<p>Having determined the coefficients of the Fourier series for each year, we can now use these coefficients to generate temperature estimates. This step entails constructing a time series model for the daily temperatures based on the Fourier series.</p>
<p>We start by loading the DataFrame that contains the Fourier coefficients for each year. These coefficients were calculated in the previous step and are used to define the form of the Fourier series for each year.</p>
<p>Our next task is to create a new DataFrame, ‘temp_estimates’, to store our estimated temperatures. This DataFrame is initially populated with a ‘dayofyear’ column, containing each day of the year (from 1 to 366).</p>
<p>We then iterate over each year in our coefficients DataFrame. For each year, we create a separate DataFrame ‘year_df’ and calculate the transformed time index ‘t’ just as we did when fitting the Fourier series. This time index is used as the input to our Fourier function.</p>
<p>Next, we use our Fourier function, along with the coefficients for the current year, to calculate the estimated temperature for each day of that year. These estimated temperatures are then added as a new column in the ‘year_df’ DataFrame, with the column name being the current year.</p>
<p>We repeat this process for all years in our dataset, merging the temperature estimates for each year into the ‘temp_estimates’ DataFrame.</p>
<p>Finally, we save these temperature estimates to a new CSV file. The end result of this process is a DataFrame that provides a day-by-day estimate of the temperature for each year based on the Fourier regression model. These estimates serve as the basis for our subsequent analysis and allow us to visualize and quantify the cyclical patterns present in the temperature data.</p>
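<p>This estimation step can be sketched as below. The coefficient values for the two years are invented stand-ins for the CSV produced in the previous step; the Fourier function and t index follow the definitions used earlier in the post.</p>

```python
import numpy as np
import pandas as pd

def fourier(t, a0, a1, b1, a2, b2):
    w = 2 * np.pi * t / 12
    return (a0 + a1 * np.cos(w) + b1 * np.sin(w)
            + a2 * np.cos(2 * w) + b2 * np.sin(2 * w))

# Hypothetical per-year coefficients standing in for the saved CSV
coef_df = pd.DataFrame({"year": [1986, 1987],
                        "a0": [26.0, 26.2], "a1": [1.2, 1.1],
                        "b1": [-0.8, -0.7], "a2": [0.3, 0.2], "b2": [0.1, 0.1]})

# Transformed time index t for a non-leap year, as in the fitting step
days_in_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
T = np.repeat(np.arange(1, 13), days_in_month)
m = np.concatenate([np.arange(1, d + 1) for d in days_in_month])
D = np.repeat(days_in_month, days_in_month)
t = (T - 0.5) + (m - 0.5) / D

# One estimate column per year, keyed by day of year
temp_estimates = pd.DataFrame({"dayofyear": np.arange(1, t.size + 1)})
for _, row in coef_df.iterrows():
    temp_estimates[int(row["year"])] = fourier(
        t, row["a0"], row["a1"], row["b1"], row["a2"], row["b2"])
```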
<p>The code above will produce an output preview like the one shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-09.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-9"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-09.jpg" class="img-fluid"></a></p>
<p><strong>Table 5.</strong> Temperature estimates</p>
<p><strong>4.1.6 Autocorrelated error calculation</strong></p>
<p>This code calculates the autocorrelated error between the observed temperature and the estimated temperature from the Fourier model for each year, and stores the errors in a dataframe.</p>
<p>Firstly, we load the wide-format data and the estimated temperature data. Then, we specify an autocorrelation factor (r), which is a parameter that describes the correlation between values of the error at different points in time.</p>
<p>We loop over each year from 1986 to 2022, and for each year we:</p>
<p>Calculate the difference between the observed and estimated temperatures to get the error.</p>
<p>Generate a sequence of random numbers from a normal distribution, called white noise.</p>
<p>Compute the autocorrelated error. The error for the first day is simply the white noise, and for each subsequent day, the error is the autocorrelation factor multiplied by the previous day’s error, plus the white noise for that day.</p>
<p>Finally, we create a DataFrame from the dictionary of autocorrelated errors, and save it to a CSV file.</p>
<p>This autocorrelated error represents the error in our model’s estimate that cannot be explained by the model itself, but rather depends on previous errors. This could be due to factors that we did not include in our model, such as atmospheric conditions or climate change. By including this autocorrelation in our analysis, we can better understand and model these unexplained variations in temperature.</p>
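<p>The steps above can be sketched for a single year as follows. The autocorrelation factor r = 0.7, the synthetic observed/estimated series, and the choice of scaling the white noise by the residual standard deviation are all illustrative assumptions.</p>

```python
import numpy as np

r = 0.7  # assumed autocorrelation factor (illustrative)
rng = np.random.default_rng(7)

# Hypothetical observed and estimated daily series for one year
n = 365
signal = 26 + 1.5 * np.sin(2 * np.pi * np.arange(n) / 365)
observed = signal + rng.normal(0, 0.3, n)
estimated = signal

# 1. Difference between observed and estimated temperatures
residual = observed - estimated

# 2. White noise drawn from a normal distribution
#    (scaled here by the residual spread -- an assumption)
white_noise = rng.normal(0, residual.std(), n)

# 3. First-order autocorrelated error:
#    e[0] = w[0];  e[i] = r * e[i-1] + w[i]
error = np.empty(n)
error[0] = white_noise[0]
for i in range(1, n):
    error[i] = r * error[i - 1] + white_noise[i]
```

By construction, consecutive errors are positively correlated, mimicking the serial dependence described in the theory section.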
<p>The code above will produce an output preview like the one shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-10.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-10"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-10.jpg" class="img-fluid"></a></p>
<p><strong>Table 6.</strong> Autocorrelated error</p>
<p><strong>4.1.7 Final estimates adjusted temperature</strong></p>
<p>We integrated the autocorrelated error into our temperature estimates to generate a more refined model of temperature estimation. With this data in place, we transformed our wide-format data into a long-format data frame. Each row of this data frame represented a specific day from a specific year, containing information on the date, observed temperature, estimated temperature, error, and the adjusted estimated temperature (estimate + error).</p>
<p>This transformed format provided us with a holistic and granular view of our data, suitable for subsequent detailed analyses. Once the transformation was complete, the data was saved into a CSV file, enabling easy access for further research or data visualization tasks.</p>
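<p>The wide-to-long transformation and the error adjustment can be sketched like this. The three wide-format frames are synthetic stand-ins for the observed, estimated, and error CSVs built in the earlier steps.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format frames (one column per year)
days = np.arange(1, 366)
years = [1986, 1987]
rng = np.random.default_rng(3)

observed = pd.DataFrame({"dayofyear": days})
estimated = pd.DataFrame({"dayofyear": days})
errors = pd.DataFrame({"dayofyear": days})
for y in years:
    estimated[y] = 26 + 1.5 * np.sin(2 * np.pi * days / 365)
    errors[y] = rng.normal(0, 0.3, days.size)
    observed[y] = estimated[y] + rng.normal(0, 0.3, days.size)

# Wide -> long: one row per (dayofyear, year)
def to_long(df, name):
    return df.melt(id_vars="dayofyear", var_name="year", value_name=name)

long_df = (to_long(observed, "observed")
           .merge(to_long(estimated, "estimated"), on=["dayofyear", "year"])
           .merge(to_long(errors, "error"), on=["dayofyear", "year"]))

# Adjusted estimate = Fourier estimate + autocorrelated error
long_df["adjusted"] = long_df["estimated"] + long_df["error"]

# long_df.to_csv("temperature_final_adjusted.csv", index=False)
```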
<p>The code above will produce an output preview like the one shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-11.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-11"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-11.jpg" class="img-fluid"></a></p>
<p><strong>Table 7.</strong> Final adjusted temperature</p>
</section>
<section id="jupyter-notebook" class="level4">
<h4 class="anchored" data-anchor-id="jupyter-notebook">4.2 Jupyter Notebook</h4>
<p>The full notebook shows how monthly temperature data can be used to generate a daily temperature time series with Fourier regression, with each step described above implemented in code: <a href="https://gist.github.com/bennyistanto/a9e6045a78b230dbd5c443a0e0e4fa41" class="uri">https://gist.github.com/bennyistanto/a9e6045a78b230dbd5c443a0e0e4fa41</a></p>
</section>
</section>
<section id="results" class="level3">
<h3 class="anchored" data-anchor-id="results">5 Results</h3>
<p>The graphical visualization of the estimated daily temperature against the observed temperature provided a robust means of evaluating the efficacy of the Fourier model across the study period (1986-2021) at the Bogor Climatological Station. The estimated temperature, generated from the Fourier model, was superimposed onto a scatter plot of the observed temperatures. The latter were smoothed using the nonparametric LOESS technique to discern major trends within the data.</p>
<p>Each subplot delineated a separate year’s worth of data, allowing for an insightful year-to-year examination of the model’s performance. The observed temperatures were presented as scatter points, with the LOESS smoothing line capturing the general pattern of the temperature across different days of the year.</p>
<p>The comparison between the observed temperature trends and the estimates from the Fourier model revealed a substantial degree of congruence, indicating the model’s reliability in predicting daily temperature patterns. The Fourier model demonstrated a commendable ability to generate daily temperature estimates from monthly data. This affirms the model’s utility in climatological studies, particularly when granular daily data are not readily available.</p>
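<p>A minimal sketch of this comparison plot is shown below for one synthetic year. The original uses LOESS smoothing; a centered rolling mean stands in for it here, and the data, window length, and file name are assumptions.</p>

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

# Hypothetical one-year observed vs. estimated series
days = np.arange(1, 366)
rng = np.random.default_rng(5)
estimated = 26 + 1.5 * np.sin(2 * np.pi * days / 365)
observed = estimated + rng.normal(0, 0.4, days.size)

# Rolling mean as a stand-in for the LOESS smoother used in the post
smoothed = pd.Series(observed).rolling(31, center=True, min_periods=1).mean()

fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(days, observed, s=6, alpha=0.4, label="Observed")
ax.plot(days, smoothed, label="Smoothed observed (rolling mean)")
ax.plot(days, estimated, label="Fourier estimate")
ax.set_xlabel("Day of year")
ax.set_ylabel("Temperature (°C)")
ax.legend()
fig.savefig("fourier_vs_observed.png", dpi=100)
```

In the original analysis, one such panel is drawn per year (1986–2021) to allow year-by-year inspection of the fit.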
<p>The code above produces the chart shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-13.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-12"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-13.jpg" class="img-fluid"></a></p>
<p><strong>Picture 1.</strong> Observed temperature vs Estimated temperature, year-by-year</p>
<p>In this exercise, we implemented a method to compute autocorrelated errors between observed and estimated temperatures from 1986 to 2022. We began by loading our dataset, after which we defined an autocorrelation factor, a parameter that reveals the correlation between different points in time within the error series.</p>
<p>Each year’s temperature discrepancy was calculated and white noise, a random sequence derived from a normal distribution, was added. We then introduced the autocorrelation factor into the errors, with the first day’s error being only the white noise. The error for subsequent days factored in both the white noise and a portion of the previous day’s error, weighted by the autocorrelation factor.</p>
<p>Post computation, we organized the autocorrelated errors into a dictionary, subsequently transforming it into a DataFrame for further analysis. This dataset of autocorrelated errors presents the unexplained variance within our model, potentially stemming from unaccounted factors such as climate changes or certain atmospheric conditions.</p>
<p>Finally, we integrated these autocorrelated errors with our estimated temperatures, resulting in an adjusted and more refined temperature prediction. This revised dataset was then visualized in a time series plot, allowing for a comparative analysis between observed and adjusted estimated temperatures. The plot revealed a clear upward trend in the temperature at the Bogor Climatological Station from 1986 to 2021. Notably, the application of autocorrelated errors offered an excellent fit to the observed temperatures, thus affirming the effectiveness of our model.</p>
<p>The code above produces the chart shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-14.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-13"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-14.jpg" class="img-fluid"></a></p>
<p><strong>Picture 2.</strong> Observed temperature vs Estimated temperature with error</p>
<p>The Fourier model’s estimates were further scrutinized by integrating autocorrelated errors into the calculations. This facilitated the generation of a modified temperature prediction that comprised the original estimate and the error. Upon examination, the plots vividly displayed a comprehensive juxtaposition of this adjusted forecast against the observed data for each year from 1986 to 2021.</p>
<p>Notably, the error-adjusted estimates, visualized through LOESS-smoothed lines, revealed minor disparities compared to the original model predictions. These charts underscored the pertinence of accommodating inherent model errors and substantiated the robustness of the Fourier model’s initial estimates. The insights derived from this comparison can guide further refinements to the model for superior accuracy in future temperature estimates.</p>
<p>The code above produces the chart shown below.</p>
<p><a href="../assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-12.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-14"><img src="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-12.jpg" class="img-fluid"></a></p>
<p><strong>Picture 3.</strong> Observed temperature vs Estimated temperature with error, year-by-year</p>
</section>
<section id="conclusion" class="level3">
<h3 class="anchored" data-anchor-id="conclusion">6 Conclusion</h3>
<p>The application of the Fourier regression model for generating daily time series data from monthly temperature observations has demonstrated considerable efficacy in climatological studies. This modeling approach provides a mathematically rigorous way to interpolate intra-monthly fluctuations, leveraging periodicity inherent in annual temperature patterns. The model is thus capable of filling data gaps and offering granular insights into day-to-day temperature variations, a granularity that monthly data alone cannot provide.</p>
<p>Notably, the model’s provision for the autocorrelation of errors adds an additional layer of realism to the estimations, acknowledging the dependence of errors on preceding values. This factor makes the model responsive to the serial correlation often seen in climatic data, enhancing its predictive capabilities.</p>
<p>In conclusion, Fourier regression modeling serves as an invaluable tool for climatologists, offering an effective means of generating daily time series data from sparse or aggregated observations. Through its utilization, it is possible to acquire more detailed insights into temperature dynamics, paving the way for refined climate studies, policy formulation, and mitigation strategies against climatic anomalies. The model’s robustness, flexibility, and accommodating nature towards error correlation further enhance its applicability, making it a staple in the data-driven examination of climate patterns.</p>
</section>
<section id="references" class="level3">
<h3 class="anchored" data-anchor-id="references">7 References</h3>
<p>Epstein, E.S. 1991. On obtaining daily climatological values from monthly means. J. Climate 4:365-368. <a href="https://doi.org/10.1175/1520-0442(1991)004%3C0365:OODCVF%3E2.0.CO;2" class="uri">https://doi.org/10.1175/1520-0442(1991)004%3C0365:OODCVF%3E2.0.CO;2</a></p>
<p>Boer, R., Notodipuro, K.A., Las, I. 1999. Prediction of Daily Rainfall Characteristics from Monthly Climate Indices. RUT-IV report. National Research Council, Indonesia.</p>
<p>Castañeda-Miranda, A., Icaza-Herrera, M. de, &amp; Castaño, V. M. (2019). Meteorological Temperature and Humidity Prediction from Fourier-Statistical Analysis of Hourly Data. Advances in Meteorology, 2019, 1–13. <a href="https://doi.org/10.1155/2019/4164097" class="uri">https://doi.org/10.1155/2019/4164097</a></p>
<p>Hernández-Bedolla, J.; Solera, A.; Paredes-Arquiola, J.; Sanchez-Quispe, S.T.; Domínguez-Sánchez, C. A Continuous Multisite Multivariate Generator for Daily Temperature Conditioned by Precipitation Occurrence. Water 2022, 14, 3494. <a href="https://doi.org/10.3390/w14213494" class="uri">https://doi.org/10.3390/w14213494</a></p>
<p>McCaskill, M.R. 1990. An efficient method for generation of full climatological records from daily rainfall. Australian Journal of Agricultural Research 41, 595-602. <a href="https://doi.org/10.1071/AR9900595" class="uri">https://doi.org/10.1071/AR9900595</a></p>
<p>Parra-Plazas, J., Gaona-Garcia, P. &amp; Plazas-Nossa, L. Time series outlier removal and imputing methods based on Colombian weather stations data. Environ Sci Pollut Res 30, 72319–72335 (2023). <a href="https://doi.org/10.1007/s11356-023-27176-x" class="uri">https://doi.org/10.1007/s11356-023-27176-x</a></p>
<p>Srikanthan, R., &amp; McMahon, T. A. (2001). Stochastic generation of annual, monthly and daily climate data: A review. Hydrology and Earth System Sciences Discussions, 5(4), 653-670. <a href="https://doi.org/10.5194/hess-5-653-2001" class="uri">https://doi.org/10.5194/hess-5-653-2001</a></p>
<p>Stern, R. D., &amp; Coe, R. (1984). A model fitting analysis of daily rainfall data. Journal of the Royal Statistical Society. Series A (General), 147(1), 1-34. <a href="https://doi.org/10.2307/2981736" class="uri">https://doi.org/10.2307/2981736</a></p>
<p>Wonnacott, T.H. and R.J. Wonnacott. 1987. Regression: A Second Course in Statistics. Robert E. Krieger Publishing Co., Florida.</p>


</section>

]]></description>
  <category>Research</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data.html</guid>
  <pubDate>Sun, 02 Jul 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230702-fourier-regression-model-to-generate-monthly-to-daily-temperature-data-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Regression analysis with dummy variables</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230616-regression-analysis-with-dummy-variables.html</link>
  <description><![CDATA[ 





<p>This exercise aims to determine the best reduced model (RM) in regression analysis with dummy variables from annual rainfall data and altitude data in three different regions. This will result in a new regression equation capable of describing the relationship between altitude and rainfall in these three regions.</p>
<section id="summary" class="level3">
<h3 class="anchored" data-anchor-id="summary">Summary</h3>
<p>The dummy variables constructed in this article are based on regional location, specifically regions 1, 2, and 3. The initial analysis entailed the representation of data for each region through scatter plots. Thereafter, a regression analysis with all parameters was conducted to derive the Full Model (FM). Subsequently, the scatter plot patterns for each region were examined, and regression equation models with identical intercepts or slopes were identified. The objective was to generate simpler regression models or equations (Reduced Models - RM) from the dummy variables constructed based on regional location. Upon obtaining several RMs, all were statistically tested using the F-test to ascertain their similarity to the FM. It was also necessary to compute and analyze the Mallows’s Cp value for all RMs to determine the optimal RM. A good Reduced Model is one that is similar or identical to the Full Model. The F-test performed in this report was designed to determine whether the RM is similar or identical to the FM. The hypotheses for the F-test were as follows.</p>
<p>H0: FM = RM</p>
<p>H1: FM ≠ RM</p>
<p>The null hypothesis is rejected when the observed F-value exceeds the F-table value, meaning the Reduced Model (RM) cannot be considered equivalent to the Full Model (FM) (Kutner et al., 2005). The F-test in this analysis uses a 95% confidence level. The observed F-value is calculated with the following formulation (Kutner et al., 2005):</p>
<p><img src="https://latex.codecogs.com/png.latex?F_%7B%5Ctext%7BObserved%7D%7D%20=%20%5Cfrac%7B(SSR_%7BFM%7D%20-%20SSR_%7BRM%7D)/(df_%7BR,FM%7D%20-%20df_%7BR,RM%7D)%7D%7BSSE_%7BFM%7D/df_%7BE,FM%7D%7D"></p>
<p>The F-table value is derived from the F-distribution with the calculated degrees of freedom</p>
<p><img src="https://latex.codecogs.com/png.latex?F_%7B%5Ctext%7Btable%7D%7D%20=%20F(df_%7BR,FM%7D%20-%20df_%7BR,RM%7D,%20df_%7BE,FM%7D)"></p>
<p>The RM is considered as efficient or akin to the FM if the Mallows’s Cp value is equal to or less than the total number of parameters (<img src="https://latex.codecogs.com/png.latex?C_%7BP,%5Ctext%7BMallows%7D%7D%20%5Cleq%20p">) (Mallows, 1973). The Cp value is determined using the equation:</p>
<p><img src="https://latex.codecogs.com/png.latex?C_%7BP,%5Ctext%7BMallows%7D%7D%20=%20p%20+%20%5Cfrac%7B(S%5E2%20-%20%5Csigma%5E2)(n%20-%20p)%7D%7B%5Csigma%5E2%7D"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?p"> represents the number of parameters utilized in the RM, <img src="https://latex.codecogs.com/png.latex?n"> denotes the total number of observations within the model (<img src="https://latex.codecogs.com/png.latex?n=45">), <img src="https://latex.codecogs.com/png.latex?S%5E2"> is the variance of the RM, and <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2"> is the variance of the FM.</p>
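<p>The F-test and Mallows's Cp comparison above can be sketched numerically. The sums of squares, degrees of freedom, and parameter counts below are made-up fit summaries purely for illustration; only n = 45 comes from the text.</p>

```python
from scipy.stats import f as f_dist

n = 45  # observations, as stated in the text

# Hypothetical fit summaries for a Full Model (6 params) and Reduced Model (3)
sse_fm, df_e_fm = 1200.0, n - 6
sse_rm, df_e_rm = 1350.0, n - 3
ssr_fm, df_r_fm = 5400.0, 5
ssr_rm, df_r_rm = 5250.0, 2

# Observed F: drop in regression SS per dropped parameter,
# scaled by the Full Model's error mean square
f_observed = ((ssr_fm - ssr_rm) / (df_r_fm - df_r_rm)) / (sse_fm / df_e_fm)

# Critical value at the 95% level
f_table = f_dist.ppf(0.95, df_r_fm - df_r_rm, df_e_fm)

# Mallows's Cp: the RM is considered efficient when Cp <= p
p = 3                        # parameters in the RM
s2 = sse_rm / df_e_rm        # RM variance
sigma2 = sse_fm / df_e_fm    # FM variance
cp = p + (s2 - sigma2) * (n - p) / sigma2
```

With these illustrative numbers, f_observed is below f_table (fail to reject H0: FM = RM), while the Cp criterion would still flag the RM as slightly inefficient (Cp &gt; p) — the two criteria need not agree.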
</section>
<section id="data" class="level3">
<h3 class="anchored" data-anchor-id="data">Data</h3>
<p>Total rainfall in different altitude and region. The data available in csv format with columns: altitude; rainfall; region</p>
<p>The data for this analysis is available from this link: <a href="https://drive.google.com/file/d/1v3CGHBykg3UUqjKS3oyy8rIGsogN1DY5/view?usp=sharing" class="uri">https://drive.google.com/file/d/1v3CGHBykg3UUqjKS3oyy8rIGsogN1DY5/view?usp=sharing</a></p>
</section>
<section id="implementation" class="level3">
<h3 class="anchored" data-anchor-id="implementation">Implementation</h3>
<p>In the implementation phase of this analysis, we used Python with the pandas, matplotlib, seaborn, and scikit-learn libraries to develop a regression with dummy variables.</p>
<section id="plot-the-input-data" class="level4">
<h4 class="anchored" data-anchor-id="plot-the-input-data">Plot the input data</h4>
<p>The code presented aims to investigate the relationship between rainfall and altitude across different regions. The dataset, obtained from a CSV file, contains information on rainfall and altitude for various regions. The code utilizes the <code>pandas</code> library to read the data and <code>matplotlib</code> and <code>seaborn</code> libraries for data visualization.</p>
<p>To begin, unique regions in the dataset are identified. A dictionary, <code>regression_params</code>, is created to store the coefficients of the regression equations for each region. Subsequently, a scatter plot is generated for each region, where altitude is plotted on the x-axis and rainfall on the y-axis. This is achieved using the <code>sns.scatterplot</code> function from the <code>seaborn</code> library.</p>
<p>A linear regression model is then fitted to the data for each region using the <code>LinearRegression</code> class from the <code>sklearn.linear_model</code> module. The model is trained with altitude as the predictor variable (<code>X</code>) and rainfall as the target variable (<code>y</code>). The slope and intercept coefficients of the regression equation are obtained from the fitted model.</p>
<p>The regression coefficients are stored in the <code>regression_params</code> dictionary, associating them with their respective regions. Additionally, the regression equation is displayed on the plot for each region using the <code>ax.text</code> function. A regression line is drawn on the plot using the <code>ax.plot</code> function to visualize the relationship between altitude and rainfall.</p>
<p>The resulting plot showcases the rainfall-altitude relationship for different regions, with each region’s data points, regression line, and equation displayed. The plot is saved as an image file, and the figure is displayed for further examination.</p>
<p>Finally, the regression parameters for each region are printed to provide insights into the specific regression equations obtained. The slope and intercept values are extracted from the <code>regression_params</code> dictionary and displayed for each region.</p>
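<p>A minimal sketch of the workflow just described, using synthetic stand-in data so the example is self-contained (plain Matplotlib scatter is used here in place of <code>sns.scatterplot</code>, and the true slopes and intercepts are illustrative, not the post's data):</p>

```python
# Sketch of the per-region regression plot described above.
# Synthetic data stands in for the CSV (columns: altitude; rainfall; region).
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
frames = []
for region, (true_slope, true_intercept) in {1: (2.4, 1000), 2: (2.0, 1100), 3: (0.7, 600)}.items():
    alt = rng.uniform(0, 500, 15)
    rain = true_intercept + true_slope * alt + rng.normal(0, 50, 15)
    frames.append(pd.DataFrame({"altitude": alt, "rainfall": rain, "region": region}))
df = pd.concat(frames, ignore_index=True)

regression_params = {}
fig, ax = plt.subplots()
for region, sub in df.groupby("region"):
    ax.scatter(sub["altitude"], sub["rainfall"], label=f"Region {region}")
    # Fit altitude (X) against rainfall (y) for this region.
    model = LinearRegression().fit(sub[["altitude"]], sub["rainfall"])
    slope, intercept = model.coef_[0], model.intercept_
    regression_params[region] = (slope, intercept)
    # Draw the regression line and annotate its equation.
    xs = np.linspace(sub["altitude"].min(), sub["altitude"].max(), 50)
    ax.plot(xs, intercept + slope * xs)
    ax.text(xs[-1], intercept + slope * xs[-1], f"y = {slope:.2f}x + {intercept:.2f}")
ax.set_xlabel("Altitude (m)")
ax.set_ylabel("Rainfall (mm)")
ax.legend()
fig.savefig("regression_by_region.png", dpi=150)

for region, (slope, intercept) in regression_params.items():
    print(f"For region {region}, the regression equation is y = {slope:.2f}x + {intercept:.2f}")
```
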
<p><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-11.jpg" class="img-fluid"></p>
<p><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-01.jpg" class="img-fluid" alt="For region 1, the regression equation is y = 2.43x + 996.53 For region 2, the regression equation is y = 1.99x + 1096.34 For region 3, the regression equation is y = 0.70x + 623.63"></p>
<p>For region 1, the regression equation is y = 2.43x + 996.53; for region 2, y = 1.99x + 1096.34; for region 3, y = 0.70x + 623.63.</p>
<p>The provided code snippet focuses on data preprocessing and feature creation based on the information in a CSV file. It employs the pandas library for data manipulation and transformation.</p>
<p>Initially, the CSV file is read into a DataFrame using the pd.read_csv function, with the resulting DataFrame stored as df.</p>
<p>Next, several new columns are created based on the region column. These new columns serve as indicator variables to represent different regions in the dataset. Specifically, columns I1, I2, and I3 are generated using logical comparisons to check if the region value matches the respective region number. The astype(int) method is then applied to convert the resulting Boolean values to integers.</p>
<p>Similarly, additional columns H1, H2, and H3 are created by multiplying the altitude column with the corresponding indicator variables (I1, I2, and I3). This results in the creation of separate altitude columns for each region, where the altitude values are present only for the respective region and are set to zero for other regions.</p>
<p>Following this, combinations of indicator variables are generated to represent different combinations of regions. Columns I12, I13, I23, and I123 are created using logical comparisons to check if the region value matches the respective region combination. Column I123 is assigned a constant value of 1 since it represents the inclusion of all regions.</p>
<p>Similarly, new altitude columns H12, H13, H23, and H123 are created by multiplying the altitude column with the respective combination indicator variables. These columns enable the representation of altitude values for specific region combinations.</p>
<p>Lastly, the modified DataFrame is saved as a new CSV file using the to_csv function, with the file path specified and the separator set to ;. The resulting DataFrame is displayed using the df.head() method to show the first few rows of the transformed dataset.</p>
<p>In summary, this code segment demonstrates a data preprocessing step where new columns are created to represent regions and region combinations based on the original data. These transformations facilitate subsequent analysis and modeling tasks by providing a more informative and structured dataset.</p>
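<p>The preprocessing steps above can be sketched as follows; the small inline DataFrame is a stand-in for the CSV the post reads:</p>

```python
# Sketch of the indicator-variable (dummy) feature creation described above.
import pandas as pd

# Stand-in for the post's CSV data.
df = pd.DataFrame({"altitude": [100, 250, 400, 150, 300],
                   "region":   [1,   2,   3,   1,   2]})

# Region indicators I1..I3 and per-region altitude columns H1..H3.
for r in (1, 2, 3):
    df[f"I{r}"] = (df["region"] == r).astype(int)
    df[f"H{r}"] = df["altitude"] * df[f"I{r}"]

# Combination indicators for pairs of regions, and all regions together.
df["I12"] = df["region"].isin([1, 2]).astype(int)
df["I13"] = df["region"].isin([1, 3]).astype(int)
df["I23"] = df["region"].isin([2, 3]).astype(int)
df["I123"] = 1  # constant: every row belongs to the all-regions group

# Altitude columns for each region combination.
for combo in ("12", "13", "23", "123"):
    df[f"H{combo}"] = df["altitude"] * df[f"I{combo}"]

df.to_csv("rainfall_dummies.csv", sep=";", index=False)
print(df.head())
```
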
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-10.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-10.jpg" class="img-fluid"></a></p>
</section>
<section id="full-model-avoid-dummy-trap" class="level4">
<h4 class="anchored" data-anchor-id="full-model-avoid-dummy-trap">Full model, avoid dummy trap</h4>
<p>The Full Model regression equation doesn’t include I3 because of a technique used in regression analysis known as dummy coding. When we have a categorical variable with k levels (in this case, region with 3 levels), we need to create k-1 dummy variables to represent it in the regression model.</p>
<p>The reason for using k-1 dummy variables instead of k is to avoid the dummy variable trap, which is a scenario in which the independent variables are multicollinear. In other words, one variable can be predicted perfectly from the others.</p>
<p>In our case, I1, I2, and I3 represent the three regions. If we included all three in our model, we would have perfect multicollinearity because I3 can be perfectly predicted from I1 and I2 (if I1 = 0 and I2 = 0, then I3 has to be 1). This would make the model’s estimates unstable and uninterpretable.</p>
<p>By leaving out I3, we are implicitly choosing region 3 as the reference category. The coefficients for I1 and I2 then represent the difference in the outcome between regions 1 and 3, and regions 2 and 3, respectively.</p>
<p>If we want to make comparisons between regions 1 and 2, we can either change the reference category (by including I3 and leaving out I1 or I2 instead), or compute the difference between the I1 and I2 coefficients.</p>
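<p>The dummy-variable trap can be verified numerically: with an intercept present, a design matrix containing all three indicators is rank-deficient, because I1 + I2 + I3 = 1. A minimal sketch:</p>

```python
# Demonstrating the dummy-variable trap with matrix ranks.
import numpy as np
import pandas as pd

df = pd.DataFrame({"region": [1, 1, 2, 2, 3, 3]})
dummies = pd.get_dummies(df["region"], prefix="I", dtype=int)  # I_1, I_2, I_3

# Intercept plus all three dummies: columns are linearly dependent.
X_full = np.column_stack([np.ones(len(df)), dummies])
# Dropping I_3 (region 3 becomes the reference) restores full column rank.
X_k_minus_1 = np.column_stack([np.ones(len(df)), dummies[["I_1", "I_2"]]])

print(np.linalg.matrix_rank(X_full))       # 3, not 4: perfect multicollinearity
print(np.linalg.matrix_rank(X_k_minus_1))  # 3: full column rank
```
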
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-09.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-09.jpg" class="img-fluid"></a></p>
</section>
<section id="reduced-model" class="level4">
<h4 class="anchored" data-anchor-id="reduced-model">Reduced Model</h4>
<p>Next, we create Reduced Models (RMs) from a Full Model (FM) that uses dummy variables to represent regions, plus altitude variables interacted with those region dummies. The Full Model (FM) in this context is:</p>
<p>FM: y123 = a1 I1 + a2 I2 + a3 I3 + b1 H1 + b2 H2 + b3 H3</p>
<p>where:</p>
<ul>
<li>y123 represents rainfall</li>
<li>I1, I2, I3 are dummy variables for regions 1, 2, and 3, respectively</li>
<li>H1, H2, H3 are altitude variables interacted with the respective region dummies</li>
</ul>
<p>Based on this FM, we can derive 5 different Reduced Models (RMs):</p>
<ul>
<li>RM1: Common intercept across regions 1 and 2, separate slopes for each region: y123 = a12 I12 + b1H1 + b2H2 + a3 I3 + b3H3</li>
<li>RM2: Common intercept and slope across all regions: y123 = a123 I123 + b123 H123</li>
<li>RM3: Common intercept and slope across regions 1 and 3, separate intercept and slope for region 2: y123 = a13 I13 + b13H13 + a2I2 + b2H2</li>
<li>RM4: Common intercept and slope across regions 1 and 2, separate intercept and slope for region 3: y123 = a12 I12 + b12 H12 + a3I3 + b3H3</li>
<li>RM5: Separate intercepts for each region, common slope across regions 1 and 2, separate slope for region 3: y123 = a1 I1 + a2 I2 + a3 I3 + b12 H12 + b3 H3</li>
</ul>
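<p>As a sketch, the FM and RM2 can both be fitted by ordinary least squares without a global intercept, since the indicator columns supply the intercepts; the synthetic data and parameter values here are illustrative, not the post's dataset:</p>

```python
# Fitting the Full Model and RM2 by least squares (no global intercept:
# the region dummies act as the intercept terms). Synthetic stand-in data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
region = np.repeat([1, 2, 3], 15)
altitude = rng.uniform(0, 500, 45)
true = {1: (2.4, 1000), 2: (2.0, 1100), 3: (0.7, 600)}  # (slope, intercept)
rainfall = np.array([true[r][1] + true[r][0] * a for r, a in zip(region, altitude)])
rainfall += rng.normal(0, 40, 45)

df = pd.DataFrame({"region": region, "altitude": altitude, "rainfall": rainfall})
for r in (1, 2, 3):
    df[f"I{r}"] = (df["region"] == r).astype(int)
    df[f"H{r}"] = df["altitude"] * df[f"I{r}"]

# Full model: separate intercept and slope per region.
fm = LinearRegression(fit_intercept=False).fit(
    df[["I1", "I2", "I3", "H1", "H2", "H3"]], df["rainfall"])

# RM2: one common intercept and one common slope across all regions.
df["I123"], df["H123"] = 1, df["altitude"]
rm2 = LinearRegression(fit_intercept=False).fit(df[["I123", "H123"]], df["rainfall"])

print("FM coefficients (a1 a2 a3 b1 b2 b3):", fm.coef_.round(2))
print("RM2 coefficients (a123 b123):", rm2.coef_.round(2))
```
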
<div class="image-gallery">
<div class="gallery-item">
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-08.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-3"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-08.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-07.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-4"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-07.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-06.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-5"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-06.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-05.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-6"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-05.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-04.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-7"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-04.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-03.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-8"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-03.jpg" class="img-fluid"></a></p>
</div>
</div>
</section>
<section id="cp-mallows" class="level4">
<h4 class="anchored" data-anchor-id="cp-mallows">Cp Mallows</h4>
<p>Next, a summary table was created to provide a succinct overview of the Full Model (FM) and each Reduced Model (RM). This table encapsulates the vital statistics of each model: five columns - P, S, σ, n, and C_P_Mallow - and six rows corresponding to FM, RM1, RM2, RM3, RM4, and RM5.</p>
<p>Here P denotes the number of parameters in each model, S the standard deviation of the model's residuals, σ the standard deviation of the full model's residuals, n the number of observations, and C_P_Mallow the value of Mallows's C_P statistic.</p>
<p>In determining the effectiveness of the Reduced Models relative to the Full Model, the Mallows's <img src="https://latex.codecogs.com/png.latex?C_P"> statistic plays a crucial role. According to Mallows (1973), a Reduced Model can be considered comparable to the Full Model if its Mallows's <img src="https://latex.codecogs.com/png.latex?C_P"> value is less than or equal to the number of parameters (<img src="https://latex.codecogs.com/png.latex?C_%7BP,%5Ctext%7BMallow%7D%7D%20%5Cleq%20p">). This statistic is calculated using the formula:</p>
<p><img src="https://latex.codecogs.com/png.latex?C_%7BP,%5Ctext%7BMallow%7D%7D%20=%20p%20+%20%5Cfrac%7B(S%5E2%20-%20%5Csigma%5E2)(n%20-%20p)%7D%7B%5Csigma%5E2%7D"></p>
<p>In this context, <img src="https://latex.codecogs.com/png.latex?p"> corresponds to the number of parameters used in the Reduced Model, <img src="https://latex.codecogs.com/png.latex?n"> denotes the total data observations used in the model (in this case, <img src="https://latex.codecogs.com/png.latex?n=45">), <img src="https://latex.codecogs.com/png.latex?S%5E2"> is the variance of the Reduced Model, and <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2"> is the variance of the Full Model. By making use of this computation, we were able to evaluate the efficiency of each Reduced Model in comparison to the Full Model, aiding in the effective and accurate analysis of our data set.</p>
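<p>A minimal sketch of this computation, assuming S² and σ² are residual mean squares (SSE divided by n − p, a convention consistent with the formula above); the SSE values below are hypothetical:</p>

```python
# Mallows's Cp from the formula above; S2 and sigma2 are taken as residual
# mean squares SSE / (n - p), which is an assumption about the convention.
import numpy as np

def mallows_cp(sse_rm, p_rm, sse_fm, p_fm, n):
    S2 = sse_rm / (n - p_rm)      # variance of the reduced model
    sigma2 = sse_fm / (n - p_fm)  # variance of the full model
    return p_rm + (S2 - sigma2) * (n - p_rm) / sigma2

# Hypothetical sums of squared errors, for illustration only.
n = 45
cp = mallows_cp(sse_rm=52000.0, p_rm=2, sse_fm=48000.0, p_fm=6, n=n)
print(round(cp, 2))  # Cp <= p here, so this RM would count as efficient
```
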
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-9"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-02.jpg" class="img-fluid"></a></p>
</section>
<section id="plot-cp-mallows" class="level4">
<h4 class="anchored" data-anchor-id="plot-cp-mallows">Plot Cp Mallows</h4>
<p>Based on the results from the code, we can create a plot to visualize the Mallows's Cp statistic. The x-axis of the plot represents the number of parameters (P), while the y-axis represents the Cp values.</p>
<p>To begin, we draw the reference line Cp = P, extending from the point (1, 1) to the point (n, n), where n is the total number of observations. This line helps us identify the region of interest.</p>
<p>Next, we will plot the CP values on the y-axis corresponding to the respective number of predictors (P) on the x-axis. Each point on the plot will represent a reduced model, with the CP value indicating its performance compared to the full model.</p>
<p>To highlight the specific point that satisfies the criteria - the lowest number of parameters (P) among the models falling on or below the reference line - we can customize the marker style or color for that point. This makes it visually distinct from the other points on the plot.</p>
<p>By examining the plot, we can easily identify the reduced model that strikes a balance between simplicity (fewer predictors) and predictive power (CP Mallow value). The highlighted point will represent the optimal reduced model that meets these criteria.</p>
<p>This plot provides a visual representation of the CP Mallow statistic, allowing us to compare the performance of different reduced models and select the most appropriate one based on the desired balance between complexity and prediction accuracy.</p>
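<p>A sketch of such a plot, with hypothetical (P, Cp) pairs for the five reduced models; the post derives the actual values from the data:</p>

```python
# Cp-vs-P plot with the Cp = P reference line. The (P, Cp) pairs below
# are illustrative placeholders, not computed results.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

models = {"RM1": (5, 4.8), "RM2": (2, 1.9), "RM3": (4, 12.3),
          "RM4": (4, 3.7), "RM5": (5, 8.1)}  # name: (P, Cp)

n = 45
fig, ax = plt.subplots()
ax.plot([1, n], [1, n], "k--", label="Cp = P")  # reference line
for name, (p, cp) in models.items():
    efficient = cp <= p  # Mallows (1973) criterion
    ax.scatter(p, cp, color="tab:green" if efficient else "tab:red")
    ax.annotate(name, (p, cp), textcoords="offset points", xytext=(5, 5))
ax.set_xlim(0, 8)
ax.set_ylim(0, 15)
ax.set_xlabel("Number of parameters, P")
ax.set_ylabel("Mallows's Cp")
ax.legend()
fig.savefig("cp_mallows.png", dpi=150)
```
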
<p><a href="../assets/image-blog/20230616-regression-analysis-with-dummy-variables-12.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-10"><img src="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-12.jpg" class="img-fluid"></a></p>
<p>The linear regression analysis across the three designated regions indicated that annual precipitation increases with altitude. The reduced models (RMs) that corresponded most closely to the full model (FM) were RM2 and RM4, with RM2 being the most efficient. This suggests that the first and second regions share similar characteristics, while the third region diverges noticeably from both.</p>


</section>
</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Data Science</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20230616-regression-analysis-with-dummy-variables.html</guid>
  <pubDate>Fri, 16 Jun 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230616-regression-analysis-with-dummy-variables-11.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Sentinel-1 modified Radar Vegetation Index</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230614-sentinel-1-modified-radar-vegetation-index.html</link>
  <description><![CDATA[ 





<p>The Sentinel-1 modified Radar Vegetation Index (RVI) based on Google Earth Engine (GEE) script below originally developed by my friend Jose Manuel Delgado Blasco (<a href="https://scholar.google.com/citations?user=TwtlI-UAAAAJ">Scholar</a>, <a href="https://it.linkedin.com/in/josemanuel-delgadoblasco">Linkedin</a>) as part of our team (GOST) activities to support during Ukraine response last year, published as GOST Public Good’s Github repo <a href="https://github.com/worldbank/GOST_SAR/tree/master/Radar_Vegetation_Index" class="uri">https://github.com/worldbank/GOST_SAR/tree/master/Radar_Vegetation_Index</a></p>
<p>The original GEE script was meant for one-off updates. As time progresses and the need for vegetation monitoring continually increases, I believe it is necessary to obtain this RVI as time-series data, which can be matched with monthly rainfall time series for monitoring food crop phenology.</p>
<p>For this reason, I’ve added a function to mosaic every ten days and <a href="../blog/20220319-batch-task-execution-in-google-earth-engine-code-editor">batch downloading</a> if the list of data is quite extensive.</p>
<p>All credit goes to the awesome work of Jose Manuel! Hats off to him!</p>
<p><img src="https://benny.istan.to/site/assets/image-blog/20230614-sentinel-1-modified-radar-vegetation-index-01.jpg" class="img-fluid" alt="RVI in Crimean Peninsula"></p>
<p>RVI in Crimean Peninsula</p>
<p>The picture above shows vegetation indices based on Sentinel-1 (generated using the GEE script); the picture below shows vegetation indices for the same period based on Sentinel-2 (generated using Climate Engine <a href="https://climengine.page.link/sZnR" class="uri">https://climengine.page.link/sZnR</a>).</p>
<p><a href="../assets/image-blog/20230614-sentinel-1-modified-radar-vegetation-index-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230614-sentinel-1-modified-radar-vegetation-index-02.jpg" class="img-fluid"></a></p>
<p>NDVI in Crimean Peninsula</p>
<p>Full GEE code is here: <a href="https://code.earthengine.google.com/62f799954525c997629cefdd435c500e" class="uri">https://code.earthengine.google.com/62f799954525c997629cefdd435c500e</a></p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Remote Sensing</category>
  <guid>https://benny.istan.to/site/blog/20230614-sentinel-1-modified-radar-vegetation-index.html</guid>
  <pubDate>Thu, 15 Jun 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230614-sentinel-1-modified-radar-vegetation-index-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Second-order Markov chain model to generate time series of occurrence and rainfall</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall.html</link>
  <description><![CDATA[ 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">1 Introduction</h2>
<p>In the realm of meteorological studies, the use of statistical models is pivotal for understanding and predicting various weather phenomena. Among these models, the second-order Markov chain model has emerged as a powerful tool, particularly in generating time series of rainfall occurrence (Wilks, 1998). This model provides a robust framework for simulating rainfall patterns, offering valuable insights that are crucial for weather forecasting, water resource management, and climate change studies.</p>
<p>The second-order Markov chain model distinguishes itself from its first-order counterpart through its ability to consider not just the state of the system at the previous time step, but also the state at the time step before that. This additional layer of historical context allows the model to capture more complex dependencies and transitions in the rainfall data (Bellone et al., 2000). This enhanced capability significantly improves the accuracy of the generated time series, making it a powerful tool in the study of rainfall patterns.</p>
<p>Rainfall, as a natural phenomenon, exhibits a high degree of variability and randomness. The second-order Markov chain model, with its ability to incorporate historical context, is well-equipped to handle this variability (Hughes et al., 1999). By considering the state of the system at two previous time steps, the model can capture the inherent randomness in rainfall occurrence, thereby generating a time series that closely mirrors real-world rainfall patterns.</p>
<p>The application of the second-order Markov chain model to rainfall data is not just a theoretical exercise. The generated time series of rainfall occurrence can have practical applications in various fields. For instance, in the field of agriculture, understanding rainfall patterns can help farmers plan their planting and harvesting schedules (Rosenzweig et al., 2000). In urban planning, accurate rainfall predictions can inform the design of drainage systems to prevent flooding (Ashley et al., 2005).</p>
</section>
<section id="data" class="level2">
<h2 class="anchored" data-anchor-id="data">2 Data</h2>
<p>Over the past three decades, Bogor’s climate has remained relatively consistent. The city experiences an average annual temperature of around 26 °Celsius. The temperature varies little throughout the year, with the warmest month averaging around 27 °Celsius and the coolest month averaging around 25 °Celsius.</p>
<p>In terms of rainfall, Bogor receives an average annual precipitation of over 3,000 millimeters. The city experiences the most rainfall from November to March, with each of these months receiving over 300 millimeters of rain on average. Even in the driest months, from June to September, Bogor still receives over 100 millimeters of rain per month on average.</p>
<p>This consistent and significant rainfall, combined with the city’s warm temperatures, contributes to its lush, tropical environment. The climatic conditions of Bogor provide a rich dataset for the application of a second-order Markov chain model to generate time series of occurrence and rainfall.</p>
<p>Daily rainfall data from the Bogor Climatological Station for 1984-2021 were used in this analysis, downloaded from BMKG Data Online in *.xlsx format. The file was then cleaned by removing the logo and unnecessary text, leaving only two columns - date in column A and rainfall in column B - with the data extending downwards, and saved in *.csv format.</p>
<p>The final input file is accessible via this link: <a href="https://drive.google.com/file/d/1molqggv9o71Z0VT50h5OvEqCxYq4Bp1Z/view?usp=sharing" class="uri">https://drive.google.com/file/d/1molqggv9o71Z0VT50h5OvEqCxYq4Bp1Z/view?usp=sharing</a></p>
</section>
<section id="methods" class="level2">
<h2 class="anchored" data-anchor-id="methods">3 Methods</h2>
<p>This exercise focuses on the second-order Markov chain model as a tool for generating rainfall occurrence probabilities and the gamma distribution for determining rainfall height (Boer, 1999).</p>
<p>The second-order Markov chain model is widely used to represent rainfall occurrence (Stern and Coe, 1984; Haan et al., 1976). In this model, rainfall occurrence on day i is influenced by the presence or absence of rainfall on the preceding days. If rainfall on day i is influenced only by rainfall on the previous day, the model is a first-order Markov chain; if it is also influenced by rainfall two days prior, it is a second-order Markov chain, and so on.</p>
<p>The second-order Markov chain model has been demonstrated to be effective in generating time series of rainfall occurrence (Nicks and Harp, 1980; Richardson, 1981; Wilks, 1990). Furthermore, the gamma distribution is frequently utilized to determine rainfall height (Wilks, 1990). By employing both the second-order Markov chain model and the gamma distribution, this study offers a comprehensive approach to generating rainfall data.</p>
<p>The combined use of the second-order Markov chain model for rainfall occurrence and gamma distribution for rainfall height provides a robust method for generating rainfall data. This approach has significant implications for various fields, such as agriculture and urban planning, where accurate rainfall data is crucial for informed decision-making.</p>
<p>In this exercise, the focus is limited to second-order Markov chains, as the analysis for lower- or higher-order chains is fundamentally similar. The analysis uses the symbol 0 for non-rainy days and 1 for rainy days. The probability of rainfall on day i, given that it did not rain on the previous day or on the day before the previous day, is denoted as P001(i), while the probability of rain given that it rained on both of those days is represented as P111(i). The general form of the estimated probability of rainfall occurrence is as follows:</p>
<p><img src="https://latex.codecogs.com/png.latex?P_%7Bjkl%7D(i)%20=%20%5Cfrac%7Bn_%7Bjkl%7D(i)%7D%7Bn_%7Bjk0%7D(i)%20+%20n_%7Bjk1%7D(i)%7D%20%5Ctag%7B1%7D"></p>
<p>where njkl(i) represents the number of years in which event l (0 or 1) occurred on day i, while events j and k (each 0 or 1) occurred on the day before the previous day and on the previous day, respectively.</p>
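<p>Equation (1) can be sketched in code. For brevity this example pools transitions over all days rather than counting per calendar day i across years, and uses a synthetic wet/dry series in place of the Bogor data:</p>

```python
# Estimating second-order transition probabilities P_jkl from a 0/1 series.
import numpy as np

rng = np.random.default_rng(42)
wet = (rng.random(3650) < 0.4).astype(int)  # stand-in for ~10 years of data

# Count triples (state two days ago, state yesterday, state today).
counts = {}
for t in range(2, len(wet)):
    j, k, l = wet[t - 2], wet[t - 1], wet[t]
    counts[(j, k, l)] = counts.get((j, k, l), 0) + 1

def p_jkl(j, k, l):
    """P(state l today | state j two days ago, state k yesterday)."""
    denom = counts.get((j, k, 0), 0) + counts.get((j, k, 1), 0)
    return counts.get((j, k, l), 0) / denom if denom else float("nan")

for (j, k) in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"P{j}{k}1 = {p_jkl(j, k, 1):.3f}")
```
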
<section id="rainfall-occurrence-model" class="level3">
<h3 class="anchored" data-anchor-id="rainfall-occurrence-model">3.1 Rainfall occurrence model</h3>
<p>Rainfall occurrence models commonly use Fourier regression equations to predict the probability of rainfall occurrence. However, these equations can sometimes produce a fitting line with values greater than 1 or smaller than 0. To address this issue, the probability values are first transformed into a logit function gjkl(i).</p>
<p><img src="https://latex.codecogs.com/png.latex?g_%7Bjkl%7D(i)%20=%20%5Cln%5Cleft(%5Cfrac%7BP_%7Bjkl%7D(i)%7D%7B1%20-%20P_%7Bjkl%7D(i)%7D%5Cright)%20%5Ctag%7B2%7D"></p>
<p>To transform gjkl(i) back into probability values, the following equation is used:</p>
<p><img src="https://latex.codecogs.com/png.latex?P_%7Bjkl%7D(i)%20=%20%5Cfrac%7B1%7D%7B1%20+%20%5Cexp(-g_%7Bjkl%7D(i))%7D%20%5Ctag%7B3%7D"></p>
<p>The fitting line for gjkl(i) follows the form presented by Stern and Coe (1984):</p>
<p><img src="https://latex.codecogs.com/png.latex?g_%7Bjkl%7D(i)%20=%20a_0%20+%20a_1%20%5Csin(t'(i))%20+%20b_1%20%5Ccos(t'(i))%20+%20a_2%20%5Csin(2t'(i))%20+%20b_2%20%5Ccos(2t'(i))%20%5Ctag%7B4%7D"></p>
<p>Where <img src="https://latex.codecogs.com/png.latex?t'(i)%20=%20%5Cfrac%7B2%5Cpi%20i%7D%7B365%7D"> and <img src="https://latex.codecogs.com/png.latex?i%20=%201,%202,%20%5Cldots,%20365"></p>
<p>The number of harmonics, m, can be determined using multiple regression techniques, where independent variables are introduced sequentially, starting with harmonic 1, harmonic 2, and so on until no more variance is explained by the newly introduced variable.</p>
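<p>Equations (2)-(4) can be sketched as a logit transform followed by a two-harmonic least-squares fit; the daily probability series here is synthetic:</p>

```python
# Logit transform (eq. 2), Fourier fit (eq. 4), inverse logit (eq. 3).
import numpy as np

rng = np.random.default_rng(7)
i = np.arange(1, 366)
t = 2 * np.pi * i / 365
# Synthetic "raw" daily probabilities with a seasonal cycle plus noise.
p_raw = np.clip(0.5 + 0.3 * np.sin(t) + rng.normal(0, 0.05, 365), 0.01, 0.99)

g = np.log(p_raw / (1 - p_raw))                      # eq. (2): logit
X = np.column_stack([np.ones_like(t), np.sin(t), np.cos(t),
                     np.sin(2 * t), np.cos(2 * t)])  # eq. (4) regressors
coef, *_ = np.linalg.lstsq(X, g, rcond=None)         # a0, a1, b1, a2, b2
g_fit = X @ coef
p_fit = 1 / (1 + np.exp(-g_fit))                     # eq. (3): inverse logit

print("coefficients:", coef.round(3))
print("fitted P stays in (0, 1):", bool((p_fit > 0).all() and (p_fit < 1).all()))
```

<p>The inverse-logit step is what guarantees the fitted probabilities never leave (0, 1), unlike fitting the Fourier regression to the raw probabilities directly.</p>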
</section>
<section id="rainfall-generation-model" class="level3">
<h3 class="anchored" data-anchor-id="rainfall-generation-model">3.2 Rainfall generation model</h3>
<p>To generate rainfall data, the probability information required is the probability of rainfall occurrence on day i, where the previous day’s occurrence is k (0 or 1), and the day before yesterday is j (0 or 1). The estimated value for gjkl(i) can be calculated if daily rainfall observation data is available.</p>
<p>For simulation purposes, probability data must be converted into occurrence data. This is done by generating random numbers from a uniform distribution U(0, 1) (VanTassel et al., 1990). If the random value is smaller than the probability value, it indicates rainfall; otherwise, it indicates no rainfall. If the simulation result indicates rainfall, the next step is to generate the rainfall height using a theoretical distribution.</p>
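<p>The probability-to-occurrence conversion is a one-line comparison; the daily probabilities below are hypothetical placeholders:</p>

```python
# Converting daily occurrence probabilities into simulated wet/dry days
# by comparison with U(0, 1) draws.
import numpy as np

rng = np.random.default_rng(3)
p_wet = np.full(365, 0.45)            # hypothetical daily probabilities
u = rng.random(365)                   # U(0, 1) draws
occurrence = (u < p_wet).astype(int)  # 1 = rain, 0 = no rain

print("simulated wet days:", occurrence.sum())
```
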
<p>The next step in creating a rainfall data simulation model is to calculate the parameters of a theoretical distribution that approximates the rainfall data distribution. The Gamma distribution is widely used to describe rainfall intensity variability (Ison et al., 1971; Stern and Coe, 1984; Waggoner, 1989; Wilks, 1990). The probability density function is as follows:</p>
<p><img src="https://latex.codecogs.com/png.latex?f(x,%20%5Calpha,%20%5Cbeta)%20=%20%5Cfrac%7B1%7D%7B%5Cbeta%5CGamma(%5Calpha)%7D%5Cleft(%5Cfrac%7Bx%7D%7B%5Cbeta%7D%5Cright)%5E%7B%5Calpha-1%7De%5E%7B-x/%5Cbeta%7D%20%5Ctag%7B5%7D"></p>
<p>with <img src="https://latex.codecogs.com/png.latex?%5Calpha"> being the shape parameter, <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> being the scale parameter, and <img src="https://latex.codecogs.com/png.latex?%5CGamma"> the gamma function.</p>
<p>Several methods can be employed to estimate the values of the two parameters of the gamma distribution, one of which is the Maximum Likelihood Method. According to Shenton and Bowman (1970, as cited in Haan, 1979), the <img src="https://latex.codecogs.com/png.latex?%5Calpha"> value obtained from the Maximum Likelihood Method may still have a bias, and therefore needs to be corrected. The corrected <img src="https://latex.codecogs.com/png.latex?%5Calpha"> value, calculated using the Greenwood and Durand method, is:</p>
<p><img src="https://latex.codecogs.com/png.latex?FC_%5Calpha%20=%20%5Cfrac%7B(n%20-%203)%5Calpha%7D%7Bn%7D%20%5Ctag%7B6%7D"></p>
<p>Subsequently, the <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> parameter is calculated as follows:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbeta%20=%20%5Cfrac%7B%5Cbar%7BX%7D%7D%7B%5Calpha%7D%20%5Ctag%7B7%7D"></p>
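<p>A sketch of the parameter estimation in equations (5)-(7), using SciPy's maximum-likelihood gamma fit with the location fixed at zero (an assumption) and synthetic wet-day amounts in place of the observed data:</p>

```python
# MLE of the gamma shape, the (n - 3)/n bias correction (eq. 6),
# then beta = mean / alpha (eq. 7).
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
rain = rng.gamma(shape=0.8, scale=12.0, size=500)  # synthetic wet-day amounts

alpha_mle, loc, _ = stats.gamma.fit(rain, floc=0)  # MLE with location fixed at 0
n = len(rain)
alpha_corr = (n - 3) * alpha_mle / n               # eq. (6): bias correction
beta = rain.mean() / alpha_corr                    # eq. (7)

print(f"alpha = {alpha_corr:.3f}, beta = {beta:.3f}")
```
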
<p>The predicted rainfall is based on a predetermined set of patterns, P001, P010, P011 and P111, referred to as P-types. These P-types represent different combinations of rainfall on the preceding days and the current day, and are used as event triggers in the model. The rainfall data is then separated into different seasons - DJF, MAM, JJA and SON - based on the month of occurrence. The model takes into account the day-to-day variability within each season.</p>
<p>The gamma distribution, parameterized by <img src="https://latex.codecogs.com/png.latex?%5Calpha"> (shape) and <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> (scale), is used to generate the predicted rainfall. The <img src="https://latex.codecogs.com/png.latex?%5Calpha"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> parameters for each season are pre-calculated and fetched for each P-type and season combination.</p>
<p>The random gamma function, employed to simulate rainfall events, generates samples from a Gamma distribution. The number of samples drawn (pertaining to the size parameter in the gamma distribution) ideally aligns with the valid event days within a given season, conforming to the original precipitation data (Wilks, 2011). In essence, for each season and event type, the gamma distribution is simulated as frequently as the number of event days occurring within the season, according to the original data.</p>
<p>Although the simulated events from a uniform distribution and the derived rainfall values from the gamma distribution are not intrinsically connected in the simulation process, they both represent the same event category. To maintain consistency in the temporal distribution of events, the generated rainfall values are matched with the valid event days in the original data. This coherence in the number of samples drawn from the gamma distribution is accomplished by aligning it with the structure of the initial precipitation data, rather than the simulated occurrences. Consequently, the gamma distribution is simulated for as many instances as the number of simulated event days within the season, thereby aligning with the frequency of simulated events in the synthetic weather data. The methodology of generating synthetic weather data using stochastic processes is a widely recognized approach in atmospheric sciences (Rodriguez-Iturbe, Cox, &amp; Isham, 1987; Srikanthan &amp; McMahon, 2001).</p>
<p>For every P-type, the model iterates through each season. During each iteration, it identifies the days in the season when an event (rainfall) is predicted to occur. These are the days that have a corresponding 1 in the event data for the current P-type.</p>
<p>Once these event days are identified, the model generates rainfall values for these days using the gamma distribution with the α and β parameters for the current season. This process is repeated for all the P-types and seasons.</p>
<p>The result is a predicted rainfall dataset that takes into account the specific patterns of rainfall events and the seasonal characteristics of rainfall intensity.</p>
</section>
</section>
<section id="implementation" class="level2">
<h2 class="anchored" data-anchor-id="implementation">4 Implementation</h2>
<p>In the implementation phase of this analysis, we utilized Python with the Pandas, NumPy and Matplotlib libraries to develop a rainfall occurrence generation model.</p>
<section id="how-to" class="level3">
<h3 class="anchored" data-anchor-id="how-to">4.1 How-to?</h3>
<p>The step-by-step guide for the model is readily accessible in Google Colab or Jupyter Notebook, ideal platforms for data analysis and machine learning. The guide walks through the entire process: reshaping the data to ensure compatibility with the model, generating the transition probabilities essential for accurate predictions, counting the events, translating these probabilities into meaningful rainfall event information, and finally generating charts that visualize the results for clear interpretation and analysis.</p>
<hr>
<p><strong>Configuration</strong></p>
<p>Configuration is a crucial aspect of setting up any data analysis or processing workflow. Proper configuration ensures seamless access to data, efficient execution of tasks, and smooth integration of required tools and libraries. This article covers several essential subtopics related to configuration, such as connecting Google Drive to Colab, installing packages, importing libraries, and setting up working directories.</p>
<p><strong>Google Drive directory into Colab</strong></p>
<p>Connecting Google Drive to Colab is a vital step when working with data stored in Google Drive. It allows us to access and manipulate files directly from our Colab notebook. To connect our Google Drive, we can use the google.colab.drive module to mount our drive, enabling seamless access to our files and folders.</p>
<p><strong>Notes</strong></p>
<p>This only applies if we are working in Colab.</p>
<p><strong>Working Directories</strong></p>
<p>Setting up working directories involves defining the input and output directory paths for our project. This ensures that our code knows where to find the input data and where to store the results. Properly organizing our working directories makes it easier to manage our project, share it with others, and maintain a clean and structured codebase.</p>
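<p>A minimal sketch of such a setup with pathlib follows; the directory names are assumptions for illustration, not the post's actual layout. In Colab, these paths would sit under /content/drive after mounting Google Drive with google.colab.drive.mount.</p>

```python
from pathlib import Path

# Assumed project layout for illustration; adjust to your own structure.
# In Colab these would live under /content/drive/MyDrive/... after mounting.
base_dir = Path("project")
input_dir = base_dir / "input"
output_dir = base_dir / "output"

# Create the output folder if it does not exist yet.
output_dir.mkdir(parents=True, exist_ok=True)
print(output_dir.exists())  # True
```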
<hr>
<section id="rainfall-categorization" class="level4">
<h4 class="anchored" data-anchor-id="rainfall-categorization">4.1.1 Rainfall categorization</h4>
<p>In the first stage of the analysis, we import the data and categorize whether the day is rainy (value = 1) or sunny (value = 0).</p>
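<p>A minimal sketch of this step, using a small inline table in place of the imported data; the 1 mm wet-day threshold is an assumption, since the post does not state the cutoff it uses.</p>

```python
import pandas as pd

# Hypothetical daily rainfall series; in practice this would come from
# pd.read_csv("rainfall.csv", parse_dates=["date"]).
df = pd.DataFrame({
    "date": pd.date_range("1984-01-01", periods=6, freq="D"),
    "rainfall": [0.0, 5.2, 0.0, 12.1, 0.4, 3.3],
})

# Categorize each day: 1 = rainy, 0 = sunny.
# The 1 mm threshold is an assumption, not taken from the post.
threshold = 1.0
df["wet"] = (df["rainfall"] >= threshold).astype(int)
print(df["wet"].tolist())  # [0, 1, 0, 1, 0, 1]
```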
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-01.jpg" class="img-fluid"></a></p>
</section>
<section id="function-for-transition-probabilities-order-2" class="level4">
<h4 class="anchored" data-anchor-id="function-for-transition-probabilities-order-2">4.1.2 Function for transition probabilities order 2</h4>
<p>This step explains how to calculate the transition probabilities of weather states from one day to the next, considering the weather states of the previous two days, based on historical weather data. The weather states are represented as binary values: 0 for “Sunny” and 1 for “Rain”. The transition probabilities are calculated for eight different scenarios:</p>
<ul>
<li>P000: The probability that today is Sunny given that the day before yesterday was Sunny and yesterday was Sunny.</li>
<li>P010: The probability that today is Sunny given that the day before yesterday was Sunny and yesterday was Rain.</li>
<li>P100: The probability that today is Sunny given that the day before yesterday was Rain and yesterday was Sunny.</li>
<li>P110: The probability that today is Sunny given that the day before yesterday was Rain and yesterday was Rain.</li>
<li>P001: The probability that today is Rain given that the day before yesterday was Sunny and yesterday was Sunny.</li>
<li>P011: The probability that today is Rain given that the day before yesterday was Sunny and yesterday was Rain.</li>
<li>P101: The probability that today is Rain given that the day before yesterday was Rain and yesterday was Sunny.</li>
<li>P111: The probability that today is Rain given that the day before yesterday was Rain and yesterday was Rain.</li>
</ul>
<p>The given code defines a function calculate_transition_probabilities_orders_2_long that calculates transition probabilities based on weather conditions in a DataFrame (df). The function takes three conditions (condition1, condition2, and result) and checks if these conditions are met in consecutive rows of the DataFrame. It creates a new column with binary values indicating the occurrence of the specified conditions. NaN values are set for rows with missing data. The code then defines a list of conditions and results and iterates over them to calculate transition probabilities for each scenario. The resulting probabilities are stored in new columns in the DataFrame. The DataFrame is restructured and saved as a CSV file. Finally, the program prints ‘Completed!’ and displays a preview of the DataFrame.</p>
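<p>A simplified stand-in for that function can be sketched as below; the helper name and signature are illustrative, not the post's actual code.</p>

```python
import numpy as np
import pandas as pd

def transition_flags(wet, c1, c2, result):
    """Flag days whose (day-2, day-1, today) states equal (c1, c2, result).

    Returns 1/0 flags, with NaN for the first two days where no two-day
    history exists. An illustrative stand-in for the post's
    calculate_transition_probabilities_orders_2_long helper.
    """
    wet = pd.Series(wet, dtype=float)
    match = (wet.shift(2) == c1) & (wet.shift(1) == c2) & (wet == result)
    flags = match.astype(float)
    flags[wet.shift(2).isna() | wet.shift(1).isna()] = np.nan
    return flags

wet = [0, 0, 1, 1, 0, 1]
# P011: Sunny two days ago, Rain yesterday, Rain today.
print(transition_flags(wet, 0, 1, 1).tolist())  # [nan, nan, 0.0, 1.0, 0.0, 0.0]
```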
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-02.jpg" class="img-fluid"></a></p>
</section>
<section id="reshape-the-data" class="level4">
<h4 class="anchored" data-anchor-id="reshape-the-data">4.1.3 Reshape the data</h4>
<p>The provided code segment executes a series of steps to transform the weather data from long to wide format, to simplify further processing.</p>
<p>Firstly, it generates a list of unique scenarios represented by “P” values. Subsequently, a ‘year’ column is added to the DataFrame bin_df based on the ‘date’ information. The code then iterates through each unique “P” value. For each iteration, it selects the relevant columns (‘year’, ‘day’, and the current “P” value) from bin_df while removing rows with missing values. The “P” column is renamed as ‘value’. The DataFrame is then pivoted, organizing the data with ‘day’ as the index, ‘year’ as the columns, and ‘value’ as the values. Each resulting pivoted DataFrame is saved as a CSV file, with the file name corresponding to the current “P” value.</p>
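<p>The pivot at the heart of this step can be sketched as follows, on a tiny hypothetical table for a single P-type (the values and years are invented for illustration):</p>

```python
import pandas as pd

# Hypothetical long-format flags for one P-type over two years.
bin_df = pd.DataFrame({
    "year": [1984, 1984, 1985, 1985],
    "day":  [1, 2, 1, 2],
    "P011": [1.0, 0.0, 0.0, 1.0],
})

# Pivot to wide format: rows are days of year, columns are years,
# mirroring the loop the post runs for every "P" column.
wide = (bin_df.rename(columns={"P011": "value"})
              .pivot(index="day", columns="year", values="value"))
print(wide.shape)  # (2, 2)
# wide.to_csv("P011.csv")  # each P-type saved to its own CSV
```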
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-03.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-3"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-03.jpg" class="img-fluid"></a></p>
</section>
<section id="calculate-number-of-event" class="level4">
<h4 class="anchored" data-anchor-id="calculate-number-of-event">4.1.4 Calculate the number of events</h4>
<p>The code below calculates the total number of occurrences per day for each of the eight possible weather state transitions (P000, P001, P010, P100, P110, P101, P011, P111) over the entire period of the dataset.</p>
<p>In this context, each weather state transition represents a sequence of three consecutive days. For example, P010 represents a sequence where it was sunny two days ago, rained yesterday, and is sunny today. The weather states are represented as binary values: 0 for “Sunny” and 1 for “Rain”.</p>
<p>The code first calculates the total number of occurrences per day for each weather state transition by summing up the values in the respective columns of the binary DataFrame (bin_reshape_dfxxx). It then creates a new DataFrame (num_df) that includes these totals along with the corresponding day. This DataFrame provides a daily summary of the weather state transitions for the entire period of the dataset.</p>
<p>Finally, the code saves this DataFrame to a CSV file for further analysis and previews the data. This step is crucial as it allows for the inspection of the calculated totals and ensures the data is correctly processed and ready for the next steps of the analysis.</p>
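<p>The counting step can be sketched like this, with two tiny hypothetical wide tables standing in for the reshaped P-type files:</p>

```python
import pandas as pd

# Hypothetical wide (day x year) tables of 1/0 flags for two P-types.
reshape = {
    "P011": pd.DataFrame({1984: [1, 0], 1985: [1, 1]}, index=[1, 2]),
    "P010": pd.DataFrame({1984: [0, 1], 1985: [0, 0]}, index=[1, 2]),
}

# Total occurrences per calendar day across all years, as in num_df.
num_df = pd.DataFrame({f"n_{p}": tbl.sum(axis=1) for p, tbl in reshape.items()})
num_df.index.name = "day"
print(num_df["n_P011"].tolist())  # [2, 1]
```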
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-04.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-4"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-04.jpg" class="img-fluid"></a></p>
</section>
<section id="calculate-the-probabilities" class="level4">
<h4 class="anchored" data-anchor-id="calculate-the-probabilities">4.1.5 Calculate the probabilities</h4>
<p>This specific code block calculates the transition probabilities for each of the four possible weather state transitions where the current day is rainy (P001, P011, P101, P111) and another four where the current day is sunny (P000, P010, P110, P100).</p>
<p>The transition probabilities are calculated by dividing the total number of occurrences of each rainy/sunny weather state transition by the total number of occurrences of both the rainy and sunny weather state transitions for the same previous two days. For example, the transition probability P011 is calculated by dividing the total number of P011 occurrences by the sum of the total number of P011 and P010 occurrences.</p>
<p>The calculated transition probabilities are then stored in a new DataFrame (prob_df_xxxx), which also includes the corresponding day. This DataFrame provides a daily summary of the transition probabilities for the entire period of the dataset.</p>
<p>Finally, the code saves this DataFrame to a CSV file for further analysis and previews the data. This step is crucial as it allows for the inspection of the calculated probabilities and ensures the data is correctly processed and ready for the next steps of the analysis.</p>
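<p>The ratio described above (e.g. P011 from the P011 and P010 counts) can be sketched with hypothetical daily counts:</p>

```python
import pandas as pd

# Hypothetical daily counts for the paired transitions sharing the same
# two-day history "01" (Sunny two days ago, Rain yesterday).
num_df = pd.DataFrame({"n_P011": [2, 1, 3], "n_P010": [2, 3, 1]},
                      index=pd.Index([1, 2, 3], name="day"))

# P011 = n011 / (n011 + n010): chance of rain today given that history.
prob_df = pd.DataFrame(index=num_df.index)
prob_df["P011"] = num_df["n_P011"] / (num_df["n_P011"] + num_df["n_P010"])
print(prob_df["P011"].tolist())  # [0.5, 0.25, 0.75]
```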
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-05.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-5"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-05.jpg" class="img-fluid"></a></p>
</section>
<section id="converting-to-logit-function-and-transform-back-to-probability-value" class="level4">
<h4 class="anchored" data-anchor-id="converting-to-logit-function-and-transform-back-to-probability-value">4.1.6 Converting to logit function and transform back to probability value</h4>
<p>The code calculates Fourier coefficients and applies a logit transformation to the probability values in a pandas DataFrame prob_df. First, it modifies prob_df, replacing any instances of 0 or 1 probabilities with a small constant epsilon or 1 - epsilon respectively. This prevents errors when applying logarithms and exponentials later in the process. The script then calculates the trigonometric terms sin_t_prime, cos_t_prime, sin_2t_prime, and cos_2t_prime from the day of the year scaled by 2*pi/365, reflecting the cyclical nature of the calendar.</p>
<p>After that, the script computes the logit of the probabilities, g_a_df, which is the log of the odds ratio (i.e., the ratio of the probability of an event occurring to the probability of it not occurring). Fourier coefficients are calculated for each original column in prob_df. The Fourier series is a way to represent a function as a sum of periodic components, and in this context, it’s used to capture the cyclical patterns of the probabilities throughout the year.</p>
<p>Finally, the script constructs a new DataFrame result_df that includes the original probabilities, the calculated g_a_df values, fitted g_fit values (based on the Fourier series representation), and final probabilities (the inverse logit of g_a_df). This DataFrame is saved to a CSV file and then returned for review.</p>
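<p>The logit transform and two-harmonic Fourier fit can be sketched as follows, on a synthetic seasonal probability series. The least-squares fit here stands in for however the post computes the coefficients, and the seasonal curve is invented for illustration:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily probabilities with a seasonal cycle, standing in for one
# column of prob_df.
day = np.arange(1, 366)
p = 0.4 + 0.3 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 0.02, day.size)

# Keep probabilities away from exact 0/1 so the logit stays finite,
# as the post does with epsilon.
eps = 1e-6
p = np.clip(p, eps, 1 - eps)
g = np.log(p / (1 - p))  # logit (log-odds)

# Two-harmonic Fourier design matrix and a least-squares fit of g.
t = 2 * np.pi * day / 365
X = np.column_stack([np.ones_like(t), np.sin(t), np.cos(t),
                     np.sin(2 * t), np.cos(2 * t)])
coef, *_ = np.linalg.lstsq(X, g, rcond=None)
g_fit = X @ coef
p_fit = 1 / (1 + np.exp(-g_fit))  # inverse logit back to probabilities
print(np.abs(p - p_fit).mean() < 0.05)  # the fit tracks the seasonal cycle
```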
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-06.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-6"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-06.jpg" class="img-fluid"></a></p>
</section>
<section id="visualize-the-calculated-logit-and-their-fitted" class="level4">
<h4 class="anchored" data-anchor-id="visualize-the-calculated-logit-and-their-fitted">4.1.7 Visualize the calculated logit and their fitted</h4>
<p>The script visualizes the calculated logit (g) values and their fitted counterparts (g_fit) from the result_df DataFrame for both rainy and sunny day scenarios.</p>
<p>This is accomplished by setting up a 2-row, 4-column grid of subplots. In the first row, it plots the rainy day scenarios (P_types_rainy) and in the second row, it plots the sunny day scenarios (P_types_sunny). For each scenario (rainy or sunny) and each type of day (defined by P_types), it creates a scatter plot of ‘g’ values and overlays a line plot of ‘g_fit’ values over the course of the year (represented by the ‘day’ variable).</p>
<p>The script then labels each subplot with its respective day type and scenario, sets the x and y labels, and includes a legend indicating which points represent ‘g’ and which line represents ‘g_fit’.</p>
<p>Finally, it adjusts the layout for better visualization and displays the plot. This way, it helps to analyze how well the fitted values (g_fit) are approximating the calculated logit (g) values.</p>
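<p>A sketch of that 2x4 grid, using invented g and g_fit values since the fitted data is not reproduced here:</p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
day = np.arange(1, 366)

# The four rainy and four sunny P-types, one subplot each.
p_types_rainy = ["P001", "P011", "P101", "P111"]
p_types_sunny = ["P000", "P010", "P100", "P110"]

fig, axes = plt.subplots(2, 4, figsize=(16, 6), sharex=True)
for row, names in enumerate([p_types_rainy, p_types_sunny]):
    for col, name in enumerate(names):
        g_fit = np.sin(2 * np.pi * day / 365)     # stand-in fitted curve
        g = g_fit + rng.normal(0, 0.3, day.size)  # stand-in daily logits
        ax = axes[row, col]
        ax.scatter(day, g, s=4, label="g")
        ax.plot(day, g_fit, color="orange", label="g_fit")
        ax.set_title(name)
        ax.set_xlabel("day")
        ax.set_ylabel("logit")
        ax.legend()
fig.tight_layout()
fig.savefig("logit_fits.png")
```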
</section>
<section id="generate-random-numbers-from-a-uniform-distribution-to-get-the-rainfall-events" class="level4">
<h4 class="anchored" data-anchor-id="generate-random-numbers-from-a-uniform-distribution-to-get-the-rainfall-events">4.1.8 Generate random numbers from a uniform distribution to get the rainfall events</h4>
<p>This code generates random numbers from a uniform distribution for each day and compares these to our probabilities to generate the events. Events are coded as 1 for rain and 0 for no rain. The new DataFrame event_df only contains the event data, with columns named event_Pxxx as specified. The data is saved in a CSV file called events.csv.</p>
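<p>The comparison rule can be sketched for a single P-type; the four probabilities here are invented, with 0.0 and 1.0 included to show the deterministic endpoints:</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical fitted daily probabilities for one P-type.
prob = pd.Series([0.0, 1.0, 0.3, 0.8], name="P011")

# One uniform draw per day; the day is rainy (1) when the draw falls
# below that day's probability.
u = rng.uniform(size=len(prob))
event_df = pd.DataFrame({"event_P011": (u < prob).astype(int)})
# A day with probability 0.0 is always 0; probability 1.0 is always 1.
print(event_df["event_P011"].tolist())
```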
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-07.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-7"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-07.jpg" class="img-fluid"></a></p>
</section>
<section id="visualize-the-probability-of-rainfall-occurrence" class="level4">
<h4 class="anchored" data-anchor-id="visualize-the-probability-of-rainfall-occurrence">4.1.9 Visualize the probability of rainfall occurrence</h4>
<p>The given script produces a set of heatmaps to visualize event data related to different scenarios of rainfall given that the current day is rainy. The data, divided by months and days, represents whether it’s a rainy day (indicated by a color) or a sunny day (represented by a white block).</p>
<p>A heatmap is an apt choice of visualization here as it allows for an immediate visual assessment of patterns and trends in the data over a period of time (in this case, over the days of each month). Moreover, the color contrast between rainy and sunny days helps to easily distinguish between the two events. Heatmaps also excel at handling and displaying data over two dimensions (months and days, in this context), making them a clear choice for this kind of data presentation.</p>
<p>Firstly, the code defines different types of events (represented as ‘P_types’), the layout for the subplots, and the number of days in each month (accounting for leap years).</p>
<p>Then, it loops over each event type, creating a 2D array filled with NaNs to hold the event data for each day of each month. The event data is split by month and filled into this array, ensuring the correct day and month placement for each event.</p>
<p>Next, a heatmap for each event type is generated using seaborn, with a color scheme denoting the presence or absence of rainfall, and an outline for each day block to enhance readability. The heatmap’s axes and title are customized for each scenario.</p>
<p>A legend is also created to indicate the meanings of the colors in the heatmaps. The code finally adds a main title for the set of heatmaps, adjusts the layout for clear viewing, and displays the visualizations.</p>
<p>Before running the code below, please make sure you already have “seaborn” installed. If not, install it using “pip install seaborn”.</p>
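<p>A sketch of one such heatmap with seaborn, using a randomly generated month-by-day grid in place of the real event data; the P-type name and figure styling are illustrative:</p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(1)

# Hypothetical month x day grid of 1/0 rain events for one P-type;
# NaN pads the days that a month does not have (non-leap year shown).
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
grid = np.full((12, 31), np.nan)
for m, n in enumerate(days_in_month):
    grid[m, :n] = rng.integers(0, 2, n)

fig, ax = plt.subplots(figsize=(8, 4))
sns.heatmap(grid, cmap="Blues", cbar=False, linewidths=0.5,
            linecolor="grey", ax=ax)
ax.set_xlabel("Day of month")
ax.set_ylabel("Month")
ax.set_title("event_P011")
fig.savefig("event_P011.png")
```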
</section>
<section id="gamma-distribution" class="level4">
<h4 class="anchored" data-anchor-id="gamma-distribution">4.1.10 Gamma distribution</h4>
<p>This code analyses a dataset of rainfall patterns. It first loads the data, and prepares it by converting the ‘date’ column into a datetime format and adding a ‘month’ column. It then assigns each entry to a season (DJF, MAM, JJA, or SON) based on the month of the year. After isolating only the rainy days, the script applies a Gamma distribution model for each season’s rainfall data. The parameters (alpha and beta) of the Gamma distribution for each season are corrected for small sample sizes using the Greenwood and Durand method. These corrected parameters are then stored in a new DataFrame, which is exported as a CSV file for future use or analysis. The resulting DataFrame provides a seasonal breakdown of the rainfall data, and offers insights into how the rainfall pattern is distributed for each season.</p>
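<p>The Greenwood and Durand (1960) approximation can be sketched as below, checked against a synthetic sample drawn with known parameters; the function name is illustrative, not the post's actual code:</p>

```python
import numpy as np

def gamma_params_gd(x):
    """Greenwood &amp; Durand (1960) approximation to the ML gamma fit.

    Returns (alpha, beta) for positive rainfall amounts x; a sketch of
    the seasonal fitting step the post describes.
    """
    x = np.asarray(x, dtype=float)
    a = np.log(x.mean()) - np.log(x).mean()  # sample statistic D
    if a <= 0.5772:
        alpha = (0.5000876 + 0.1648852 * a - 0.0544274 * a**2) / a
    else:
        alpha = (8.898919 + 9.059950 * a + 0.9775373 * a**2) / (
            a * (17.79728 + 11.968477 * a + a**2))
    return alpha, x.mean() / alpha

rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=5.0, size=5000)  # known truth to check
alpha, beta = gamma_params_gd(sample)
print(round(alpha, 2), round(beta, 2))  # close to 2.0 and 5.0
```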
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-08.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-8"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-08.jpg" class="img-fluid"></a></p>
</section>
<section id="generate-rainfall-value" class="level4">
<h4 class="anchored" data-anchor-id="generate-rainfall-value">4.1.11 Generate rainfall value</h4>
<p>Now that we have estimated the parameters for the gamma distribution for each season, and have generated event data, we can generate rainfall values based on these parameters and events.</p>
<p>The gamma distribution is only used to generate rainfall values for rainy days (where event = 1), as it is typically used to model positive continuous data, and cannot generate the zero values corresponding to non-rainy days.</p>
<p>In this script, we create a new rainfall_PXXX column for each event_PXXX column. For each season, we select the days where event_PXXX = 1, and generate rainfall values for these days using the gamma distribution with the corresponding alpha and beta parameters. These generated values are then stored in the rainfall_PXXX column. At the end, the updated DataFrame is saved to a new CSV file.</p>
<p>Here’s how we could do this for each P-type.</p>
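<p>A minimal sketch of this step for one P-type; the events, seasons, and (alpha, beta) values here are invented for illustration:</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical events for one P-type with their season labels.
df = pd.DataFrame({
    "season": ["DJF", "DJF", "JJA", "JJA", "DJF"],
    "event_P011": [1, 0, 1, 1, 1],
})
params = {"DJF": (2.0, 8.0), "JJA": (1.5, 3.0)}  # assumed (alpha, beta)

# Draw a gamma amount for each rainy day using that season's parameters;
# dry days stay at zero, since the gamma cannot produce zeros.
df["rainfall_P011"] = 0.0
for season, (alpha, beta) in params.items():
    mask = (df["season"] == season) & (df["event_P011"] == 1)
    df.loc[mask, "rainfall_P011"] = rng.gamma(alpha, beta, mask.sum())

print(df["rainfall_P011"].round(1).tolist())
```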
<p>The code above produces an output preview like the one below.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-09.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-9"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-09.jpg" class="img-fluid"></a></p>
</section>
</section>
<section id="evaluations" class="level3">
<h3 class="anchored" data-anchor-id="evaluations">4.2 Evaluations</h3>
<p>Evaluating the quality of our predicted rainfall values depends on the specific goals of our analysis and the characteristics of our data. However, here are several common methods for evaluating prediction quality.</p>
<section id="visualize-the-rainfall-compared-to-predicted-rainfall" class="level4">
<h4 class="anchored" data-anchor-id="visualize-the-rainfall-compared-to-predicted-rainfall">4.2.1 Visualize the rainfall compared to predicted rainfall</h4>
<p>The given script produces a set of plots comparing the observed rainfall with the predicted rainfall for the different scenarios in which the current day is rainy.</p>
<p>This code is meant to load, process, and plot data on annual rainfall and rainfall predictions from the years 1984 to 2021.</p>
<p>It initializes a plot with 10 rows and 4 columns to make room for a line plot for each year from 1984 to 2021. Each plot will compare actual rainfall (in light blue) with the predicted rainfall (in orange) over the course of a year.</p>
</section>
<section id="performance" class="level4">
<h4 class="anchored" data-anchor-id="performance">4.2.2 Performance</h4>
<p>Distribution of Errors (Residuals): We can plot a histogram or a Kernel Density Estimate plot of the residuals, which are the differences between the actual and predicted values. If our model is a good fit, the residuals should be normally distributed around zero.</p>
<p>Time Series of Residuals: Plotting residuals over time can show whether the errors are consistent throughout the time series, or if they vary significantly at certain time periods.</p>
<p>Boxplot of Errors by Year: This can help us see if the model’s performance varies significantly from year to year.</p>
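<p>The numbers behind those three plots can be sketched as follows, using synthetic observed and predicted series in place of the station data:</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Hypothetical observed vs predicted daily rainfall for two years.
idx = pd.date_range("1984-01-01", "1985-12-31", freq="D")
obs = pd.Series(rng.gamma(1.2, 6.0, len(idx)), index=idx)
pred = obs + rng.normal(0, 2.0, len(idx))  # predictions with random error

resid = pred - obs  # residuals, one per day
by_year = resid.groupby(resid.index.year)

# Summary numbers behind the three diagnostics described above:
print(abs(resid.mean()) < 0.3)           # roughly centred on zero
print(by_year.std().round(1).to_dict())  # spread per year for the boxplot
```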
</section>
</section>
</section>
<section id="results" class="level2">
<h2 class="anchored" data-anchor-id="results">5 Results</h2>
<p>We delve into a comprehensive analysis of rainfall prediction and its various aspects. By examining the curve adjustment chart and transforming probabilities into rainfall events, we gain insights into the predicted outcomes. Furthermore, we assess the performance of these predictions using visual comparisons, distributed errors (residuals), time series of residuals, and boxplot of error by year. This chapter aims to elucidate the accuracy and reliability of our rainfall prediction model.</p>
<section id="adjustment-curve" class="level3">
<h3 class="anchored" data-anchor-id="adjustment-curve">5.1 Adjustment curve</h3>
<p>The scatter plot visualizes the adjustment curve for generating daily rainfall data using Fourier regression analysis. The data spans from 1984 to 2021. Each subplot corresponds to different weather patterns, characterized by the variables ‘P001’, ‘P011’, ‘P101’, ‘P111’, ‘P000’, ‘P010’, ‘P100’, and ‘P110’.</p>
<p>The top row of plots shows the fitting model for rainy days (‘P001’, ‘P011’, ‘P101’, ‘P111’). Here, the patterns in the fitted models (g_fit) align with the data generated by g_a, indicating that the Fourier model accurately captures the distribution pattern of rainfall across different types of rainy day events.</p>
<p>The second row presents the fitting model for dry days (‘P000’, ‘P010’, ‘P100’, ‘P110’). In these plots, the peak of the dry season, occurring in the June-July-August (JJA) period, is prominently reflected in the peak of the g_fit line plot. Conversely, the rainfall is lowest during this period, which is depicted as a valley in the model.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-19.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-10"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-19.jpg" class="img-fluid"></a></p>
<p>Above visualization effectively demonstrates the application and accuracy of the Fourier regression analysis in modeling and simulating daily weather patterns, both for rainy and dry conditions, over a significant period. The g_fit line plots accurately reflect the distribution patterns of the original data (g), implying that the Fourier model is a suitable tool for simulating these weather patterns.</p>
</section>
<section id="transforming-the-probability-into-rainfall-event" class="level3">
<h3 class="anchored" data-anchor-id="transforming-the-probability-into-rainfall-event">5.2 Transforming the probability into rainfall event</h3>
<p>The image is a set of four heatmaps, each representing a different scenario: ‘P001’, ‘P011’, ‘P101’, and ‘P111’, the four rainy-day transition patterns. Each heatmap shows the pattern of rainfall across a year. The x-axis denotes the day of the month while the y-axis represents the month itself, ranging from 1 (January) to 12 (December).</p>
<p>The color intensity in each cell indicates the probability of rainfall. Darker shades symbolize a higher likelihood of rain, while lighter shades indicate a lower likelihood. This color gradient allows us to visually comprehend the variability and seasonality of rainfall across different periods of the year.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-20.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-11"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-20.jpg" class="img-fluid"></a></p>
<p>From these heatmaps, one can observe the days and months when rainfall is more or less likely, given that the day is classified as ‘rainy’. These visualizations provide an intuitive understanding of rainfall patterns and their variations throughout the year for each respective scenario.</p>
</section>
<section id="predicted-rainfall" class="level3">
<h3 class="anchored" data-anchor-id="predicted-rainfall">5.3 Predicted rainfall</h3>
<p>The daily rainfall generated by the Fourier regression model is compared with the daily observation data from the Bogor Climatology Station for 1984-2021 in the image below (example using years 2008-2009 and 2012-2013). The rainfall values produced by the model are higher than the observation data (overestimated), with a pattern that tends to be somewhat dissimilar.</p>
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-10.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-12"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-10.jpg" class="img-fluid"></a></p>
</section>
<section id="performance-1" class="level3">
<h3 class="anchored" data-anchor-id="performance-1">5.4 Performance</h3>
<p>Evaluating the quality of our predicted rainfall values depends on the specific goals of our analysis and the characteristics of our data. However, here are several common methods for evaluating prediction quality:</p>
<p>Visual comparison: a plot of the predicted values against the observed values. This gives us a quick, intuitive sense of how closely our predictions match the actual values. While visually comparing the predicted and actual rainfall data is important and necessary, it is not sufficient on its own to evaluate the performance of the prediction model.</p>
<div class="image-gallery">
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-21.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-13"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-21.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-22.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-14"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-22.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-23.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-15"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-23.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-24.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-16"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-24.jpg" class="img-fluid"></a></p>
</div>
</div>
<p>Distribution of Errors (Residuals): This plot shows the distribution of residuals (errors), which are the differences between the predicted and actual rainfall values.</p>
<div class="image-gallery">
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-15.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-17"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-15.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-16.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-18"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-16.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-17.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-19"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-17.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-18.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-20"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-18.jpg" class="img-fluid"></a></p>
</div>
</div>
<p>In the context of rainfall prediction, if the residuals are normally distributed and centered around zero, it indicates that your model has made errors that are random and not biased, which is a good sign. If the distribution is not centered around zero or is highly skewed, it indicates that your model may be consistently overestimating or underestimating the rainfall.</p>
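<p>The bias check described above can be sketched numerically. The snippet below uses a synthetic residual series as a stand-in for the model errors (the data and thresholds are illustrative, not taken from this analysis):</p>

```python
import numpy as np

# Synthetic stand-in residuals (predicted minus observed rainfall, mm/day)
rng = np.random.default_rng(42)
residuals = rng.normal(loc=0.0, scale=2.0, size=365)

mean_err = residuals.mean()  # should sit near zero if the model is unbiased
skewness = ((residuals - mean_err) ** 3).mean() / residuals.std() ** 3

print(f"Mean error: {mean_err:.3f}")
print(f"Skewness:   {skewness:.3f}")

# A mean far from zero suggests systematic over- or underestimation;
# strong skew suggests an asymmetric error structure.
```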
<p>Time Series of Residuals: This plot shows how residuals change over time.</p>
<div class="image-gallery">
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-25.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-21"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-25.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-26.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-22"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-26.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-27.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-23"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-27.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-28.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-24"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-28.jpg" class="img-fluid"></a></p>
</div>
</div>
<p>We should expect to see no clear pattern in the residuals over time. If we see patterns, such as the residuals increasing or decreasing over time, it suggests that our model is not capturing some trend in the data. This could indicate a problem with our model that needs to be addressed.</p>
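<p>One simple way to screen for such a trend is to fit a straight line to the residual series and inspect its slope; the series below is a synthetic stand-in:</p>

```python
import numpy as np

# Stand-in residual series; a fitted slope well away from zero
# flags a time trend the model failed to capture
rng = np.random.default_rng(7)
t = np.arange(365)
residuals = rng.normal(0.0, 2.0, t.size)

slope, intercept = np.polyfit(t, residuals, 1)
print(f"Trend in residuals: {slope:.4f} mm/day per time step")
```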
<p>Boxplot of Error by Year: This plot shows the distribution of residuals for each year.</p>
<div class="image-gallery">
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-11.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-25"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-11.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-12.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-26"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-12.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-13.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-27"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-13.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-14.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-28"><img src="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-14.jpg" class="img-fluid"></a></p>
</div>
</div>
<p>This can help you understand if your model’s performance is consistent over time. If some years have much higher or lower residuals, it may indicate that those years had unusual rainfall patterns that your model didn’t capture. You may want to investigate further to understand what’s causing these discrepancies.</p>
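<p>The per-year grouping behind such a boxplot can be sketched with pandas; again, the residual series here is synthetic and only illustrates the mechanics:</p>

```python
import numpy as np
import pandas as pd

# Synthetic stand-in residuals indexed by day, spanning several years
rng = np.random.default_rng(0)
dates = pd.date_range("2001-01-01", "2005-12-31", freq="D")
resid = pd.Series(rng.normal(0.0, 2.0, len(dates)), index=dates)

# Per-year summary: a year whose median sits far from zero, or whose
# spread is much wider, is a candidate for closer inspection
summary = resid.groupby(resid.index.year).agg(["median", "std"])
print(summary)

# The same yearly grouping feeds a boxplot directly, e.g. via
# matplotlib's boxplot with one array of residuals per year.
```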
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">6 Conclusion</h2>
<p>Markov-chain models, when combined with Fourier regression equations and logit transformations, can be useful in estimating rainfall occurrence probabilities and generating synthetic rainfall data. This generated data can have practical applications in various fields, such as agriculture and urban planning, where accurate rainfall data is crucial for informed decision-making.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">7 References</h2>
<p>Ashley, R. M., Balmforth, D. J., Saul, A. J., &amp; Blanskby, J. D. (2005). Flooding in the future–predicting climate change, risks and responses in urban areas. Water Science and Technology, 52(5), 265-273. <a href="https://doi.org/10.2166/WST.2005.0142" class="uri">https://doi.org/10.2166/WST.2005.0142</a></p>
<p>Bellone, E., Hughes, J. P., &amp; Guttorp, P. (2000). A hidden Markov model for downscaling synoptic atmospheric patterns to precipitation amounts. Climate Research, 15(1), 1-12. <a href="https://www.jstor.org/stable/e24867295" class="uri">https://www.jstor.org/stable/e24867295</a></p>
<p>Boer, R., Notodipuro, K. A., &amp; Las, I. (1999). Prediction of Daily Rainfall Characteristics from Monthly Climate Indices. RUT-IV report. National Research Council, Indonesia.</p>
<p>Cho, H., Bowman, K. P., &amp; North, G. R. (2004). A Comparison of Gamma and Lognormal Distributions for Characterizing Satellite Rain Rates from the Tropical Rainfall Measuring Mission. J. Appl. Meteor. Climatol., 43, 1586–1597. <a href="https://doi.org/10.1175/JAM2165.1" class="uri">https://doi.org/10.1175/JAM2165.1</a></p>
<p>Hughes, J. P., Guttorp, P., &amp; Charles, S. P. (1999). A non-homogeneous hidden Markov model for precipitation occurrence. Journal of the Royal Statistical Society: Series C (Applied Statistics), 48(1), 15-30. <a href="https://doi.org/10.1111/1467-9876.00136" class="uri">https://doi.org/10.1111/1467-9876.00136</a></p>
<p>Rodriguez-Iturbe, I., Cox, D. R., &amp; Isham, V. (1987). Some models for rainfall based on stochastic point processes. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 410(1839), 269-288. <a href="https://doi.org/10.1098/rspa.1987.0039" class="uri">https://doi.org/10.1098/rspa.1987.0039</a></p>
<p>Rosenzweig, C., Tubiello, F. N., Goldberg, R., Mills, E., &amp; Bloomfield, J. (2002). Increased crop damage in the US from excess precipitation under climate change. Global Environmental Change, 12(3), 197-202. <a href="https://doi.org/10.1016/S0959-3780(02)00008-0" class="uri">https://doi.org/10.1016/S0959-3780(02)00008-0</a></p>
<p>Srikanthan, R., &amp; McMahon, T. A. (2001). Stochastic generation of annual, monthly and daily climate data: A review. Hydrology and Earth System Sciences Discussions, 5(4), 653-670. <a href="https://doi.org/10.5194/hess-5-653-2001" class="uri">https://doi.org/10.5194/hess-5-653-2001</a></p>
<p>Stern, R. D., &amp; Coe, R. (1984). A model fitting analysis of daily rainfall data. Journal of the Royal Statistical Society. Series A (General), 147(1), 1-34. <a href="https://doi.org/10.2307/2981736" class="uri">https://doi.org/10.2307/2981736</a></p>
<p>VanTassell, L. W., Richardson, J. W., &amp; Conner, J. R. (1990). Simulation of meteorological data for use in agricultural production studies. Agricultural Systems, 34, 319-336. <a href="https://doi.org/10.1016/0308-521X(90)90011-E" class="uri">https://doi.org/10.1016/0308-521X(90)90011-E</a></p>
<p>Waggoner, P. E. (1989). Anticipating the frequency distribution of precipitation if climate change alters its mean. Agricultural and Forest Meteorology, 47, 321-337. <a href="https://doi.org/10.1016/0168-1923(89)90103-2" class="uri">https://doi.org/10.1016/0168-1923(89)90103-2</a></p>
<p>Wilks, D. S. (1990). Maximum likelihood estimation for the gamma distribution using data containing zeros. Journal of Climate, 3(12), 1495-1501. <a href="https://doi.org/10.1175/1520-0442(1990)003%3C1495:MLEFTG%3E2.0.CO;2" class="uri">https://doi.org/10.1175/1520-0442(1990)003%3C1495:MLEFTG%3E2.0.CO;2</a></p>
<p>Wilks, D. S. (1998). Multisite generalization of a daily stochastic precipitation generation model. Journal of Hydrology, 210(1-4), 178-191. <a href="https://doi.org/10.1016/S0022-1694(98)00186-3" class="uri">https://doi.org/10.1016/S0022-1694(98)00186-3</a></p>
<p>Wilks, D. S. (2011). Statistical methods in the atmospheric sciences (Vol. 100). Academic press. <a href="https://doi.org/10.1016/C2017-0-03921-6" class="uri">https://doi.org/10.1016/C2017-0-03921-6</a></p>


</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Research</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall.html</guid>
  <pubDate>Fri, 26 May 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230526-second-order-markov-chain-model-to-generate-time-series-of-occurrence-and-rainfall-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Impact of climate change in cities</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230525-impact-of-climate-change-in-cities.html</link>
  <description><![CDATA[ 





<p>A new World Bank report has been <a href="https://www.worldbank.org/en/publication/thriving">launched</a>, to which I had the opportunity to contribute analysis.</p>
<p>The report examines the two-way relationship between cities and climate change, offering valuable insights to help cities boost resilience and thrive, both now and in the future.</p>
<p>The team used temperature- and precipitation-based indices (SPEI, SPI, CDD, CWD, the number of annual hot days, and annual mean temperature), together with the distance and magnitude of tropical cyclones relative to the city center, to support the analysis, which is organized into four inter-related workstreams:</p>
<ul>
<li>Who is affected?</li>
<li>Stressors that make urban development less green.</li>
<li>Stressors that make urban development less resilient.</li>
<li>Stressors that make urban development less inclusive.</li>
</ul>
<p>If you are interested in reading the story and the report, the links are below:</p>
<ol type="1">
<li>Story: <a href="https://www.worldbank.org/en/publication/thriving" class="uri">https://www.worldbank.org/en/publication/thriving</a></li>
<li>Recording of the live event: <a href="https://live.worldbank.org/events/thriving-making-cities-climate-ready" class="uri">https://live.worldbank.org/events/thriving-making-cities-climate-ready</a></li>
<li>Publication: <a href="https://openknowledge.worldbank.org/entities/publication/7d290fa9-da18-53b6-a1a4-be6f7421d937" class="uri">https://openknowledge.worldbank.org/entities/publication/7d290fa9-da18-53b6-a1a4-be6f7421d937</a></li>
</ol>
<p><a href="../assets/image-blog/20230525-impact-of-climate-change-in-cities-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230525-impact-of-climate-change-in-cities-01.jpg" class="img-fluid"></a></p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Research</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20230525-impact-of-climate-change-in-cities.html</guid>
  <pubDate>Thu, 25 May 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230525-impact-of-climate-change-in-cities-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Fuzzy Inference System (FIS) for Flood Risk Assessment</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment.html</link>
  <description><![CDATA[ 





<p><strong>1 Implementation</strong></p>
<p>In the implementation phase of this analysis, we utilized Python and the <code>Scikit-Fuzzy</code> library to develop a fuzzy logic-based flood risk assessment model. This model took into account four essential factors affecting flood risks: precipitation intensity, soil moisture, land cover, and slope. By defining the fuzzy sets and rules for these variables, the model was able to estimate the flood risk for various combinations of input values. The ultimate goal of this implementation was to identify the conditions under which low flood risks could be achieved, even in situations where precipitation intensity was at its maximum.</p>
<p><strong>1.1 How-to?</strong></p>
<p>In the first stage of the analysis, we defined the variables that influence flood risk. These variables include precipitation intensity, soil moisture, land cover, and slope. Each of these variables was represented as a fuzzy variable using the <code>Scikit-Fuzzy</code> library’s <code>Antecedent</code> class. Additionally, we defined the output variable <code>flood_risk</code> using the <code>Consequent</code> class. This stage set the foundation for the fuzzy logic-based flood risk assessment model by establishing the key variables that the model would use to estimate flood risk.</p>
<p>In the second and third stages, we focused on defining the fuzzy sets and their respective membership functions for each of the variables defined in the first stage. We used the <code>automf()</code> function to automatically generate triangular membership functions for precipitation intensity, soil moisture, and flood risk, each with three levels: <code>low</code>, <code>medium</code>, and <code>high</code>. For the land cover and slope variables, we manually defined triangular membership functions, specifying the appropriate ranges for each fuzzy set (<code>urban</code>, <code>vegetation</code>, and <code>bare_soil</code> for land cover, and <code>flat</code>, <code>moderate</code>, and <code>steep</code> for slope). These stages were critical for establishing the relationships between the input variables and the output flood risk, which would later be used to evaluate different combinations of input values in the fuzzy inference process.</p>
<p>In the fourth stage, we defined the rules that describe the relationships between the input variables (precipitation intensity, soil moisture, land cover, and slope) and the output variable (flood risk). We first created a list of classifications for each input variable and the output variable. Using the multiplication principle, we calculated the total number of possible combinations of these classifications, resulting in 81 unique rules.</p>
<p>For each combination of input classifications, we determined the appropriate flood risk level based on a set of predefined conditions. These conditions were based on expert knowledge and domain understanding, considering factors such as high precipitation and soil moisture, bare soil land cover, and steep slopes. After determining the flood risk level for each combination, we created a fuzzy rule using the <code>Scikit-Fuzzy</code> library’s <code>Rule</code> class, linking the input conditions with the corresponding flood risk level. These rules formed the basis of the fuzzy inference system that was used to evaluate different scenarios and estimate the corresponding flood risks.</p>
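<p>The combinatorial construction of the 81 rules can be sketched as follows. The risk-assignment heuristic below is an illustrative stand-in for the expert conditions described above, and the Scikit-Fuzzy <code>ctrl.Rule</code> call each tuple would feed is shown in a comment:</p>

```python
from itertools import product

# Classification names follow the post; the scoring heuristic in
# risk_level() is illustrative, not the author's exact rule table.
precip_levels = ["low", "medium", "high"]
moisture_levels = ["low", "medium", "high"]
cover_levels = ["urban", "vegetation", "bare_soil"]
slope_levels = ["flat", "moderate", "steep"]

def risk_level(p, m, c, s):
    """Heuristic stand-in for the expert rule conditions."""
    score = {"low": 0, "medium": 1, "high": 2}[p]
    score += {"low": 0, "medium": 1, "high": 2}[m]
    score += {"vegetation": 0, "urban": 1, "bare_soil": 2}[c]
    score += {"flat": 0, "moderate": 1, "steep": 2}[s]
    return "low" if score <= 2 else "medium" if score <= 5 else "high"

rules = []
for p, m, c, s in product(precip_levels, moisture_levels,
                          cover_levels, slope_levels):
    out = risk_level(p, m, c, s)
    # With Scikit-Fuzzy, each tuple becomes a rule, e.g.:
    # ctrl.Rule(precipitation[p] & soil_moisture[m] &
    #           land_cover[c] & slope[s], flood_risk[out])
    rules.append((p, m, c, s, out))

print(len(rules))  # 3**4 = 81 combinations
```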
<p>In the fifth stage, we created the control system and simulation by combining the defined rules from the previous stage. The <code>Scikit-Fuzzy</code> library’s <code>ControlSystem</code> and <code>ControlSystemSimulation</code> classes were used for this purpose. The <code>ControlSystem</code> class takes the set of rules as input and initializes the fuzzy inference system, while the <code>ControlSystemSimulation</code> class initializes a simulation environment that can be used to compute the output based on the input values.</p>
<p>In the sixth stage, we provided example input values for each input variable (precipitation, soil moisture, land cover, and slope) to test the fuzzy inference system. The input values were assigned to their corresponding input variables in the simulation, and the compute method of the <code>ControlSystemSimulation</code> object was called to perform the fuzzy inference process and obtain the output flood risk level.</p>
<p>In the final stage, we output the computed flood risk level and visualize the result using the <code>Scikit-Fuzzy</code> library’s built-in plotting capabilities. The flood risk level was displayed as a numerical value, while the visualization provided a graphical representation of the membership functions and the defuzzified output. This allowed us to assess the performance of the fuzzy inference system and analyze the relationships between the input variables and the flood risk.</p>
<p>Running the simulation returns:</p>
<pre><code>Flood Risk Value: 83.33333333333336
Flood Risk Category: high</code></pre>
<p>And a plot below</p>
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-01.jpg" class="img-fluid"></a></p>
<p>The initial implementation of the fuzzy inference system for flood risk assessment has been completed successfully. By providing example input values for precipitation (<code>100</code>), soil moisture (<code>50</code>), land cover (<code>25</code>), and slope (<code>30</code>) in Stage 6, we have demonstrated the functionality of the fuzzy system. The system processes these inputs through the defined membership functions, rules, and defuzzification methods to produce an output flood risk value and the corresponding flood risk category.</p>
<p>Upon evaluating the system with the given input values, a flood risk value is generated, and the <code>flood_risk.view(sim=flood_risk_sim)</code> function provides a visual representation of the output. The plot displays the aggregated output membership functions and indicates the defuzzified crisp value. In this case, the plot reflects the flood risk level based on the provided inputs, and the computed flood risk category helps to understand the risk associated with the given conditions. With this initial implementation, we have set the foundation for further analyses and can adapt or extend the fuzzy system as needed to address specific flood risk assessment scenarios.</p>
<p><strong>1.2 Plot the membership function of the input variables</strong></p>
<p>The provided code visualizes the membership functions for each of the input variables (Precipitation Intensity, Soil Moisture, Land Cover, and Slope) and the output variable (Flood Risk Level) in the fuzzy inference system. Here’s a summary of what each part of the code does:</p>
<ul>
<li><code>precipitation.view(sim=flood_risk_sim)</code>: Plots the membership functions for the Precipitation Intensity variable, displaying how the input values are categorized into low, medium, and high precipitation levels.</li>
<li><code>soil_moisture.view(sim=flood_risk_sim)</code>: Plots the membership functions for the Soil Moisture variable, showing how the input values are categorized into low, medium, and high soil moisture levels.</li>
<li><code>land_cover.view(sim=flood_risk_sim)</code>: Plots the membership functions for the Land Cover variable, illustrating how the input values are categorized into the <code>urban</code>, <code>vegetation</code>, and <code>bare_soil</code> classes.</li>
<li><code>slope.view(sim=flood_risk_sim)</code>: Plots the membership functions for the Slope variable, demonstrating how the input values are categorized into the <code>flat</code>, <code>moderate</code>, and <code>steep</code> classes.</li>
<li><code>flood_risk.view()</code>: Plots the membership functions for the output variable, Flood Risk Level, indicating how the output values are categorized into low, medium, and high flood risk levels.</li>
<li><code>flood_risk.view(sim=flood_risk_sim)</code>: Plots the final Flood Risk Level for given input values, illustrating how the fuzzy inference system computes the flood risk based on the input variable values and the defined fuzzy rules.</li>
</ul>
<div class="image-gallery">
<div class="gallery-item">
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-04.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-04.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-05.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-3"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-05.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-06.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-4"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-06.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-07.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-5"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-07.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-08.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-6"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-08.jpg" class="img-fluid"></a></p>
</div>
<div class="gallery-item">
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-09.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-7"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-09.jpg" class="img-fluid"></a></p>
</div>
</div>
<p>To interpret the plots, observe how each input variable is divided into categories (low, medium, high) based on the membership functions. These categories represent the degree to which an input value belongs to a particular category.</p>
<p>The output variable plot shows how the flood risk levels are determined based on the input variables’ membership values and the fuzzy rules defined in the system. The final plot, Flood Risk Level for Given Input Values, displays the aggregated output membership functions and the computed flood risk level as a single value.</p>
<p><strong>1.3 2D Plot</strong></p>
<p>The provided code generates a 2D contour plot of flood risk as a function of Precipitation Intensity and Land Cover, while fixing the values of Soil Moisture and Slope. Here’s a summary of what each part of the code does:</p>
<ul>
<li>Create grid points for input variables: Define a range of values for each input variable (Precipitation Intensity, Soil Moisture, Land Cover, and Slope) using <code>np.linspace()</code>.</li>
<li>Define <code>compute_flood_risk()</code> function: This function takes Precipitation Intensity (P), Soil Moisture (M), Land Cover (L), and Slope (S) as inputs and computes the flood risk using the fuzzy inference system (<code>flood_risk_sim</code>).</li>
<li>Fix Soil Moisture and Slope values: Assign fixed values to Soil Moisture (M_fixed) and Slope (S_fixed).</li>
<li>Create flood risk matrix: Initialize a matrix with the size of the combination of Precipitation Intensity (P_values) and Land Cover (L_values). Iterate through each combination of these values and compute the flood risk using the <code>compute_flood_risk()</code> function with the fixed values of Soil Moisture and Slope.</li>
<li>Plot the 2D contour plot: Using <code>plt.contourf()</code>, create a contour plot that visualizes the flood risk as a function of Precipitation Intensity and Land Cover. The color map ‘viridis’ is used to represent the flood risk levels, with 20 contour levels.</li>
<li>Add colorbar, labels, and title: Add a colorbar to represent the flood risk values, label the axes, and add a title that includes the fixed values of Soil Moisture and Slope.</li>
</ul>
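<p>A minimal version of this contour-plot scaffold is sketched below. Because the fuzzy simulation itself is not reproduced here, <code>compute_flood_risk()</code> is a simple analytic stand-in; in the actual analysis it would set the inputs on <code>flood_risk_sim</code>, call <code>compute()</code>, and return the defuzzified output:</p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt

def compute_flood_risk(P, M, L, S):
    """Analytic stand-in for the fuzzy simulation (illustrative only)."""
    return 0.4 * P + 0.2 * M + 0.2 * L + 0.2 * (100 * S / 45)

# Grid points for the two free variables
P_values = np.linspace(0, 100, 50)   # precipitation intensity
L_values = np.linspace(0, 100, 50)   # land cover
M_fixed, S_fixed = 50, 20            # fixed soil moisture and slope

# Flood risk matrix over all (P, L) combinations
risk = np.zeros((len(P_values), len(L_values)))
for i, P in enumerate(P_values):
    for j, L in enumerate(L_values):
        risk[i, j] = compute_flood_risk(P, M_fixed, L, S_fixed)

# 2D contour plot with 20 levels on the 'viridis' color map
fig, ax = plt.subplots()
cs = ax.contourf(L_values, P_values, risk, levels=20, cmap="viridis")
fig.colorbar(cs, label="Flood risk")
ax.set_xlabel("Land cover")
ax.set_ylabel("Precipitation intensity")
ax.set_title(f"Flood risk (soil moisture={M_fixed}, slope={S_fixed})")
fig.savefig("flood_risk_contour.png")
```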
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-10.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-8"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-10.jpg" class="img-fluid"></a></p>
<p>To interpret the plot, observe how the flood risk values change as the Precipitation Intensity and Land Cover values vary. The plot shows how the flood risk is influenced by these two input variables while keeping the other two (Soil Moisture and Slope) fixed at specific values.</p>
<p>The contour lines in the plot represent different levels of flood risk, with the color intensity indicating the flood risk level. Darker colors represent lower flood risk, and lighter colors represent higher flood risk.</p>
<p><strong>1.4 3D Plot</strong></p>
<p>The provided code generates a 3D surface plot of flood risk as a function of Precipitation Intensity and Land Cover, while fixing the values of Soil Moisture and Slope. Here’s a summary of what each part of the code does:</p>
<ul>
<li>Create a 3D plot figure: Initialize a new figure using <code>plt.figure()</code> and add a 3D subplot with the <code>projection='3d'</code> argument.</li>
<li>Create the 3D surface plot: Use the <code>ax.plot_surface()</code> function to create a 3D surface plot for Precipitation Intensity (Y-axis) vs Land Cover (X-axis), with the flood risk as the Z-axis. The color map ‘viridis’ is used to represent the flood risk levels.</li>
<li>Add colorbar, labels, and title: Add a colorbar to represent the flood risk values, label the axes, and add a title that includes the fixed values of Soil Moisture and Slope.</li>
</ul>
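<p>The 3D counterpart can be sketched with <code>plot_surface()</code>, again using a simple analytic stand-in in place of the fuzzy simulation output:</p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt

def compute_flood_risk(P, M, L, S):
    """Analytic stand-in for the fuzzy simulation (illustrative only)."""
    return 0.4 * P + 0.2 * M + 0.2 * L + 0.2 * (100 * S / 45)

M_fixed, S_fixed = 50, 20  # fixed soil moisture and slope
L_grid, P_grid = np.meshgrid(np.linspace(0, 100, 50),
                             np.linspace(0, 100, 50))
Z = compute_flood_risk(P_grid, M_fixed, L_grid, S_fixed)

# 3D surface: land cover (X) vs precipitation (Y) vs flood risk (Z)
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surf = ax.plot_surface(L_grid, P_grid, Z, cmap="viridis")
fig.colorbar(surf, shrink=0.6, label="Flood risk")
ax.set_xlabel("Land cover")
ax.set_ylabel("Precipitation intensity")
ax.set_zlabel("Flood risk")
ax.set_title(f"Soil moisture={M_fixed}, slope={S_fixed}")
fig.savefig("flood_risk_surface.png")
```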
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-11.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-9"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-11.jpg" class="img-fluid"></a></p>
<p>To interpret the plot, observe how the flood risk values (Z-axis) change as the Precipitation Intensity and Land Cover values (X and Y axes) vary. The plot shows how the flood risk is influenced by these two input variables while keeping the other two (Soil Moisture and Slope) fixed at specific values. The color intensity on the surface indicates the flood risk level, with darker colors representing lower flood risk and lighter colors representing higher flood risk.</p>
<p>The 3D surface plot provides a more detailed visualization of the relationship between flood risk, Precipitation Intensity, and Land Cover compared to the 2D contour plot. You can observe the shape of the surface to identify areas with high or low flood risk and better understand the interaction between the input variables.</p>
<p><strong>2 Minimizing Flood Risks under Maximum Precipitation</strong></p>
<p>This chapter focuses on reducing flood risks under the most challenging condition (maximum precipitation), examining how the three remaining variables (soil moisture, land cover, and slope) shape the outcome.</p>
<p>Flood risk management is a critical aspect of urban planning and environmental protection. Understanding the factors that contribute to flood risks and identifying strategies to minimize these risks is essential for creating resilient communities. In this analysis, we explore the relationships between four key variables - precipitation intensity, soil moisture, land cover, and slope - to determine their influence on flood risk. <strong>Our goal is to identify the combinations of these variables that result in low flood risks, even under conditions of maximum precipitation.</strong></p>
<p>Using a fuzzy logic-based simulation model, we examine the interactions between these variables and their impact on flood risk. The model incorporates expert knowledge and rule-based systems to predict flood risk levels based on various input scenarios. By analyzing the simulation results, we aim to provide insights into the conditions that can effectively mitigate flood risks, helping policymakers and urban planners make informed decisions for better flood management strategies.</p>
<p>The analysis includes a scatterplot matrix visualization that highlights the relationships between soil moisture, land cover, and slope under maximum precipitation conditions. By interpreting this matrix, we can identify patterns and correlations between these variables that contribute to lower flood risks. These insights will help guide future efforts in designing urban areas and implementing flood management measures that are both effective and sustainable.</p>
<p>The provided code performs a sensitivity analysis to minimize flood risks under maximum precipitation conditions. It evaluates flood risk categories based on all input variables, generates a dataset of data points with different combinations of soil moisture, land cover, and slope values, and finally creates a scatterplot matrix. Here’s a summary of what each part of the code does:</p>
<ul>
<li>Define the maximum precipitation intensity: Set the value of <code>max_precipitation</code> to 100, which is considered high precipitation intensity.</li>
<li>Define the flood risk category function: Create a function <code>get_flood_risk_category()</code> that takes precipitation, soil moisture, land cover, and slope as input variables, and returns the flood risk category using the previously defined <code>categorize_flood_risk()</code> function.</li>
<li>Generate data points for soil moisture, land cover, and slope: Create arrays of evenly spaced values for each of these input variables.</li>
<li>Iterate through all combinations of soil moisture, land cover, and slope: For each combination, use the maximum precipitation value and the <code>get_flood_risk_category()</code> function to obtain the flood risk category. If the category is not <code>None</code>, append the combination to the <code>data_points</code> list.</li>
<li>Create a DataFrame containing the data points: Convert the list of data points into a pandas DataFrame, which makes it easier to analyze and visualize the data.</li>
<li>Create a scatterplot matrix: Use the seaborn library’s <code>pairplot()</code> function to create a scatterplot matrix of the data points, with flood risk categories represented by different colors.</li>
</ul>
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-10"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-02.jpg" class="img-fluid"></a></p>
<p>Here’s how to read the scatterplot matrix:</p>
<ul>
<li>The diagonal plots (from the top-left to the bottom-right) are bar plots showing the distribution of each variable. These plots give an idea of the frequency of different values for each variable when the flood risk is low under maximum precipitation conditions.</li>
<li>The off-diagonal plots are scatter plots showing the relationships between pairs of variables. These plots help identify any patterns or correlations between the variables. The color of the dots indicates the flood risk category associated with each data point.</li>
<li>In the off-diagonal plots, if you see that dots of a specific color (in this case, low flood risk) are clustered in a particular region, it indicates that certain combinations of variables are more likely to result in low flood risk conditions.</li>
</ul>
<p>To interpret the plots, consider the following:</p>
<ul>
<li>In the scatterplot between soil moisture and land cover, if there is a pattern or a specific region where low flood risk dots are clustered, it would suggest that there’s a relationship between these two variables that contributes to lower flood risks under maximum precipitation conditions.</li>
<li>Similarly, in the scatterplot between soil moisture and slope, look for clusters or patterns of low flood risk dots to identify any relationships between these variables that contribute to lower flood risks.</li>
<li>Finally, in the scatterplot between land cover and slope, examine the distribution of low flood risk dots to determine if there’s a connection between these variables that leads to lower flood risks.</li>
</ul>
<p>By analyzing these plots, we can gain insights into the relationships between soil moisture, land cover, and slope that contribute to low flood risks even under maximum precipitation conditions.</p>
<p><strong>3 Sensitivity Analysis</strong></p>
<p>After obtaining the flood risk assessment results from the Fuzzy Inference System (FIS), we can assess the quality of the model by comparing its predictions to observed data. To do this, we’ll need a dataset containing historical flood events along with the corresponding values of the input variables (Precipitation Intensity, Soil Moisture, Land Cover, and Slope).</p>
<p><strong>What if observation data on flood events never exist?</strong></p>
<p>If we don’t have any observation data to compare against the FIS model results, evaluating the model’s performance becomes more challenging. However, we can still follow some steps to ensure that our FIS model is reasonable and plausible:</p>
<ul>
<li>Expert knowledge: Consult with experts in the field of flood risk assessment to ensure that the fuzzy sets, membership functions, and fuzzy rules are realistic and based on sound principles. This can help us refine our FIS model even without actual observation data.</li>
<li>Sensitivity analysis: Perform a sensitivity analysis to understand how the output flood risk varies with changes in input variables. By altering the input variables within their expected range and studying the corresponding changes in flood risk, we can gain insight into the behavior of the model and identify any unrealistic responses.</li>
<li>Comparison with other models: If there are other flood risk assessment models available (either deterministic or statistical), compare our FIS model’s predictions with those from the other models. Although this is not a direct comparison with observed data, it can provide some indication of how our model’s performance compares to alternative approaches.</li>
<li>Simulation data: If we have access to hydrological or hydraulic models that can simulate flood events, we can use the simulated data as a proxy for observed data. Although this approach has its limitations, as the simulated data may not perfectly represent real-world conditions, it can still provide valuable information for evaluating our FIS model.</li>
<li>Temporal validation: If we have historical data for some of the input variables but not for the flood risk, we can still evaluate our FIS model by analyzing its performance over time. For instance, we can assess whether the model’s predictions of high flood risk align with periods of heavy rainfall, high soil moisture, or other conditions known to increase flood risk.</li>
</ul>
<p>Remember that without observed data, it is more challenging to assess the performance of our FIS model accurately. However, following the steps outlined above can help us gain some confidence in the model and identify areas for potential improvement.</p>
<p><strong>Let’s try Sensitivity Analysis</strong></p>
<p>We’ll use the One-at-a-time (OAT) sensitivity analysis method to understand the effect of varying each input variable while keeping the others fixed. Assume that we have the FIS model already built and implemented in Python using the variables and fuzzy rules defined earlier.</p>
<p>This code performs a sensitivity analysis to study the relationship between input variables (Precipitation, Soil Moisture, Land Cover, and Slope) and the output variable (Flood Risk) in a FIS. It evaluates the FIS model for different values of the input variables, keeping the other input variables at their median values.</p>
<p>Here is a summary of the main steps in the code:</p>
<ul>
<li>Define the range and step size for each input variable.</li>
<li>Calculate the flood risk for each input variable using the FIS model. The <code>sensitivity_analysis</code> function iterates over different values of each input variable while keeping the other input variables fixed at their median values.</li>
<li>Categorize the flood risk levels (low, medium, high) based on the computed flood risk values.</li>
<li>Plot the sensitivity analysis results, showing how flood risk varies with changes in the input variables.</li>
</ul>
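<p>A minimal sketch of this OAT loop follows. The <code>compute_flood_risk()</code> function below is a stand-in for the actual FIS, and the variable ranges and step sizes are hypothetical:</p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Stand-in for the FIS: any callable mapping the four inputs to a risk score
def compute_flood_risk(precip, soil_moisture, land_cover, slope):
    return (0.4 * precip + 0.3 * soil_moisture
            + 0.2 * land_cover + 0.1 * (45 - slope) / 45 * 100)

# Range (min, max) and step size for each input variable (hypothetical units)
variables = {
    "precipitation": (0, 100, 5),
    "soil_moisture": (0, 100, 5),
    "land_cover":    (0, 100, 5),
    "slope":         (0, 45, 2.5),
}
medians = {name: (lo + hi) / 2 for name, (lo, hi, _) in variables.items()}

def sensitivity_analysis(varied):
    """Vary one input over its range while holding the others at their medians."""
    lo, hi, step = variables[varied]
    values = np.arange(lo, hi + step, step)
    fixed = dict(medians)
    risks = []
    for v in values:
        fixed[varied] = v
        risks.append(compute_flood_risk(fixed["precipitation"],
                                        fixed["soil_moisture"],
                                        fixed["land_cover"],
                                        fixed["slope"]))
    return values, np.array(risks)

# One subplot per input variable: flood risk vs. the varied input
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, name in zip(axes.flat, variables):
    values, risks = sensitivity_analysis(name)
    ax.plot(values, risks, "o-")
    ax.set_xlabel(name)
    ax.set_ylabel("flood risk")
fig.tight_layout()
fig.savefig("oat_sensitivity.png")
```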
<p>The plot consists of four subplots, one for each input variable, with flood risk on the y-axis and the input variable on the x-axis. The background of each plot is filled with colors corresponding to the flood risk categories (low, medium, and high). The data points are plotted with different markers (‘o’, ‘s’, ‘x’) based on the input variable’s categories (low, medium, and high).</p>
<p><a href="../assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-03.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-11"><img src="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-03.jpg" class="img-fluid"></a></p>
<p>To interpret the plot, observe how the flood risk changes as the input variable value increases or decreases. A steep slope in the plot indicates that the flood risk is highly sensitive to changes in the input variable. If the flood risk remains relatively constant despite changes in the input variable, it suggests that the flood risk is less sensitive to that input variable.</p>
<p>To understand the meaning of the plot, consider that it represents how much the flood risk is affected by each input variable, given that other input variables are kept constant. By analyzing the plot, you can identify which input variables have a more significant impact on flood risk and prioritize interventions or mitigation strategies accordingly.</p>
<p><strong>4 Summary</strong></p>
<p>Flood risk assessment is a critical component of disaster management and urban planning. Accurate and reliable flood risk estimation helps authorities make informed decisions, prioritize resources, and implement effective mitigation strategies. With the increasing impacts of climate change and urbanization, there is a growing need for advanced techniques that can provide better insights into flood risk under varying conditions.</p>
<p>Fuzzy Inference Systems (FIS) offer a robust and flexible approach to model complex relationships between multiple input variables and an output variable, such as flood risk. By incorporating expert knowledge and handling uncertainties, FIS models can capture the intricacies of real-world systems, providing more accurate and reliable estimates of flood risk compared to traditional methods.</p>
<p>FIS models have gained popularity in the field of hydrological modeling and flood risk assessment due to their ability to handle imprecise and incomplete data, as well as their capability to incorporate human reasoning and intuition in the form of linguistic rules. This ability to integrate expert knowledge with quantitative data provides a valuable advantage, especially in situations where data availability is limited or uncertain.</p>
<p>The utilization of FIS in flood risk assessment typically involves defining input variables that influence flood risk, such as precipitation intensity, soil moisture, land cover, and slope. These variables are then used to estimate the flood risk level, which can be categorized into different levels, such as low, medium, or high.</p>
<p>To build an FIS model for flood risk assessment, the first step is to identify relevant input variables and their value domains. Next, fuzzy sets and membership functions are defined for each variable, followed by the formulation of fuzzy rules that describe the relationship between input variables and flood risk. These rules are derived from expert knowledge or empirical data and are used to determine the output flood risk level.</p>
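<p>As an illustration of these steps, here is a toy Mamdani-style FIS in plain NumPy, with triangular membership functions, hypothetical breakpoints (all variables on a 0–100 domain), and a deliberately tiny rule base. It sketches the mechanics only, not the post’s actual model:</p>

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with corners a <= b <= c."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a) if b > a else np.ones_like(x)
    right = (c - x) / (c - b) if c > b else np.ones_like(x)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# Steps 1-2: fuzzy sets low/medium/high for every input (hypothetical breakpoints)
def fuzzify(value):
    return {"low":    trimf(value, 0, 0, 50).item(),
            "medium": trimf(value, 0, 50, 100).item(),
            "high":   trimf(value, 50, 100, 100).item()}

universe = np.linspace(0, 100, 201)          # output universe for flood risk
risk_sets = {"low":    trimf(universe, 0, 0, 50),
             "medium": trimf(universe, 0, 50, 100),
             "high":   trimf(universe, 50, 100, 100)}

def flood_risk(precip, soil_moisture, land_cover, slope):
    p, s, l, sl = map(fuzzify, (precip, soil_moisture, land_cover, slope))
    # Step 3: fuzzy rules (AND = min); an illustrative, tiny rule base
    rules = [("high",   min(p["high"], s["high"])),    # heavy rain on wet soil
             ("medium", min(p["medium"], l["high"])),  # moderate rain, built-up land
             ("low",    min(p["low"], sl["high"])),    # light rain, steep drainage
             ("low",    min(s["low"], l["low"]))]      # dry soil, vegetated land
    # Mamdani inference: clip each consequent at its firing strength, take max
    aggregate = np.zeros_like(universe)
    for label, strength in rules:
        aggregate = np.maximum(aggregate, np.minimum(strength, risk_sets[label]))
    # Step 4: defuzzify by centroid (midpoint fallback if no rule fires)
    if aggregate.sum() == 0:
        return 50.0
    return float((universe * aggregate).sum() / aggregate.sum())
```

<p>A real application would replace the breakpoints and rules with ones elicited from experts or data, as discussed above.</p>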
<p>One of the critical aspects of FIS models is their ability to handle uncertainties and vagueness in the input data. This is particularly important in the context of flood risk assessment, where data can be scarce or subject to significant measurement errors. By using fuzzy sets and membership functions, FIS models can accommodate these uncertainties, providing more reliable and robust estimates of flood risk.</p>
<p>Sensitivity analysis is a valuable tool for evaluating the performance of FIS models in flood risk assessment. By varying input variables within their expected range and studying the corresponding changes in flood risk, modelers can gain insight into the behavior of the model and identify any unrealistic responses or potential areas for improvement.</p>
<p>FIS models can be further enhanced by incorporating optimization techniques to identify the most critical factors contributing to flood risk. This can help decision-makers focus on specific areas or interventions that have the most significant impact on reducing flood risk and improving overall resilience.</p>
<p>One of the challenges in applying FIS models for flood risk assessment is the lack of observed data for model validation. In such cases, the performance of the model can be evaluated using expert knowledge, sensitivity analysis, comparison with other models, or the use of simulated data from hydrological or hydraulic models.</p>
<p>In conclusion, FIS provides a promising approach for flood risk assessment, offering a flexible and robust framework for modeling complex relationships and handling uncertainties. By incorporating expert knowledge and quantitative data, FIS models have the potential to significantly improve our understanding of flood risk and support more effective decision-making in disaster management and urban planning.</p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Data Science</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment.html</guid>
  <pubDate>Sat, 15 Apr 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230415-fuzzy-inference-system-fis-for-flood-risk-assessment-01.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Visualising the WRF output</title>
  <dc:creator>Benny Istanto</dc:creator>
  <link>https://benny.istan.to/site/blog/20230401-visualising-the-wrf-output.html</link>
  <description><![CDATA[ 





<p><strong>1 Introduction</strong></p>
<p>The Weather Research and Forecasting (WRF) model is a powerful numerical weather prediction system used to simulate atmospheric phenomena at various scales. WRF produces a significant amount of output data that can provide valuable information for meteorological and climatological research, weather forecasting, and environmental management.</p>
<p>WRF output data is typically stored in netCDF files, which contain multiple variables with different units and dimensions. However, the netCDF files generated by WRF do not always follow the Climate and Forecast (CF) metadata convention, which can make it difficult to interpret the data.</p>
<p>The CF metadata convention provides a standardized way of describing the metadata and units of variables in netCDF files. Adhering to this convention makes it easier to interpret the data and compare it with other datasets. However, the WRF output files are often not fully CF-compliant.</p>
<p>Visualizing the output from the WRF model can help researchers and practitioners to gain insights into various meteorological and climatological phenomena, including temperature, wind, precipitation, cloud cover, and atmospheric pressure. Visualization is an important step in understanding the data and extracting meaningful insights. Visualization can help identify patterns, trends, and anomalies in the data that might not be apparent from raw numerical output.</p>
<ul>
<li>Time series plots are a common way to visualize the temporal evolution of variables such as temperature, precipitation, and wind speed. These plots can reveal patterns and trends in the data and help to identify anomalies or outliers.</li>
<li>Contour plots can be used to visualize the spatial distribution of variables such as temperature, pressure, and precipitation. These plots can show the magnitude and direction of the variables and help to identify patterns and features such as fronts, ridges, and troughs.</li>
<li>Maps are a common way to visualize the spatial distribution of variables over a region of interest. Maps can be used to display variables such as temperature, precipitation, wind speed, and cloud cover, and can help to identify patterns and features such as mountains, coastlines, and rivers.</li>
<li>Animations can be created from the WRF output data to visualize the temporal and spatial evolution of variables over a specific period. Animations can be useful for identifying trends, patterns, and anomalies in the data and for communicating the results to a wider audience.</li>
<li>3D visualizations can be used to represent the three-dimensional structure of atmospheric phenomena such as clouds and fronts. These visualizations can provide a more detailed and realistic representation of the data and help to identify features such as updrafts, downdrafts, and vortices.</li>
</ul>
<p>There are several tools available for visualizing WRF output, ranging from simple plotting libraries to more advanced graphical user interfaces.</p>
<ul>
<li>One of the most popular tools for visualizing WRF output is the NCAR Command Language (NCL). NCL is a programming language designed specifically for scientific data analysis and visualization. NCL provides a powerful set of tools for working with NetCDF files, including the ability to subset and manipulate data, generate contour plots and maps, and create animations.</li>
<li>Python is a popular programming language for data processing, analysis, and visualization. Python libraries such as Matplotlib, Cartopy, and Basemap can be used to create a wide range of visualizations from the WRF output data.</li>
<li>Xarray is another popular Python library that can be used to handle and visualize multi-dimensional datasets. Xarray provides a powerful set of functions for data manipulation, analysis, and visualization and can be used to create a variety of plots and maps.</li>
<li>R is a statistical programming language that can also be used for data processing, analysis, and visualization. R provides a wide range of packages for creating static and interactive visualizations, including ggplot2, lattice, and Shiny.</li>
<li>Commercial software packages like ArcGIS and open-source Geographic Information System (GIS) software such as QGIS can be used to visualize WRF output data on maps and perform spatial analysis. QGIS provides a user-friendly interface for creating maps, visualizing data, and conducting geospatial analysis.</li>
</ul>
<p>Regardless of the tool used, there are some key considerations to keep in mind when visualizing WRF output. These include choosing appropriate color maps, selecting appropriate contour intervals, and ensuring that the data is presented in a clear and understandable way.</p>
<p>It is also important to consider the spatial and temporal scales of the data when visualizing WRF output. For example, high-resolution data may require different visualization techniques than coarser resolution data, and data spanning multiple time scales may require animations or time series plots.</p>
<p>Visualization of WRF output is not only important for understanding the data but also for communicating results to stakeholders and decision-makers. Effective visualization can help convey complex scientific information in a way that is easily understood by non-experts.</p>
<p>Overall, visualization is a crucial step in the WRF modeling process, enabling scientists and researchers to extract meaningful insights from the vast amounts of data generated by the model. While there are many tools and techniques available for visualizing WRF output, it is important to choose the most appropriate tool for the specific task at hand and to consider best practices for data visualization to ensure clear and accurate representation of the data.</p>
<p><strong>2 New CF based NetCDF files from native WRFOUT NetCDF files</strong></p>
<p>The WRFOUT files created by the WRF model are not the most straightforward to interpret, and pose several challenges. These files include a series of three-dimensional fields that cover a specific region over a specific period of time, which can be used to analyze different meteorological phenomena, such as temperature, pressure, wind speed and direction, clouds, and precipitation.</p>
<p>However, WRF output data does not always follow the CF convention, especially when dealing with large datasets. This makes it difficult to interpret and analyze the data in its raw format, without the ability to visualize it. Additionally, the files can be very large and contain many variables related to the WRF simulation that may not be necessary for a research project.</p>
<p>Modifying the WRF registry can help to address some of these issues by allowing the user to change the variables included in the WRFOUT files. This can be a tedious and potentially messy process though, as some variables may be included in WRFOUT for reasons that the user may never know. As such, it is important to consider the potential unintended consequences of removing certain variables. Furthermore, the variables will still be on the staggered grid and on the model vertical levels, so visualization can still be challenging.</p>
<p>To address this issue, users can use the NCL to translate the netCDF output into a format that follows the CF convention. This can help to standardize the metadata and units of the variables, making it easier to interpret and compare the data.</p>
<p><code>wrfout_to_cf</code> is an NCL-based script designed to create CF-compliant NetCDF files with user-selectable variables, time reference, vertical levels, and spatial and temporal subsetting. This script is designed to be a simple, user-flexible post-processing utility and is particularly useful for research projects because it produces output files that are more convenient to work with.</p>
<p>However, <code>wrfout_to_cf</code> can be a relatively inefficient post-processing utility and may not be suitable for all applications. It is important to weigh the pros and cons of each post-processing utility to determine which one is the best fit for any given research project.</p>
<ol type="1">
<li>After running the simulation, WRF will produce three output files; you can check them by running a command such as <code>ls -l wrfout*</code> in your simulation folder.</li>
</ol>
<p>It will return:</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-01.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-01.jpg" class="img-fluid"></a></p>
<p>Download wrfout_to_cf.ncl from <a href="https://sundowner.colorado.edu/wrfout_to_cf/wrfout_to_cf.ncl" class="uri">https://sundowner.colorado.edu/wrfout_to_cf/wrfout_to_cf.ncl</a> and put it in the same folder as the WRFOUT files. If you inspect one of the output files with <code>ncdump -h</code>, you will see the netCDF structure, but it is not easy to understand.</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-02.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-02.jpg" class="img-fluid"></a></p>
<p>Convert the native WRFOUT netCDF file to a new CF-based netCDF file with a command along the lines of <code>ncl 'file_in="wrfout_d01_&lt;date&gt;"' 'file_out="wrfpost.nc"' wrfout_to_cf.ncl</code>:</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-03.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-3"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-03.jpg" class="img-fluid"></a></p>
<p>You will find a new file <code>wrfpost.nc</code> in the folder. Let’s check it using the <code>ncdump -h wrfpost.nc</code> command:</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-04.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-4"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-04.jpg" class="img-fluid"></a></p>
<p>Once <code>wrfpost.nc</code> is in place, you are ready to visualize it using various tools.</p>
<p><strong>3 Visualizing the Wind</strong></p>
<p>With three days of hourly data, there are several ways to visualize wind speed and direction to best illustrate the information.</p>
<p>Let’s start writing the code using Python.</p>
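<p>A sketch of that setup is below, with synthetic arrays standing in for the contents of <code>wrfpost.nc</code>. The names <code>u10</code>/<code>v10</code> and the array shapes are illustrative; in practice you would read the actual variables with, e.g., <code>xarray.open_dataset("wrfpost.nc")</code>:</p>

```python
import numpy as np

# Synthetic stand-in for the 10 m wind components in wrfpost.nc; in practice
# read them with xarray, e.g. ds = xr.open_dataset("wrfpost.nc")
# (variable names u10/v10 are illustrative, not necessarily the file's names)
rng = np.random.default_rng(0)
u10 = rng.normal(0, 5, size=(72, 20, 20))   # (time, lat, lon): 3 days, hourly
v10 = rng.normal(0, 5, size=(72, 20, 20))

# Wind speed is the magnitude of the (u, v) vector
wspd = np.hypot(u10, v10)

# Meteorological wind direction: degrees clockwise from north, the direction
# the wind blows FROM (0 = northerly, 90 = easterly, ...)
wdir = (270.0 - np.degrees(np.arctan2(v10, u10))) % 360.0
```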
<p>After importing the library and defining the data, we can start visualizing the wind data.</p>
<p>Here are some options:</p>
<p><strong>3.1 Time series plot</strong></p>
<p>We can create a time series plot showing the variation of wind speed and direction over the three-day period. This type of plot is useful for identifying patterns or trends in the data over time. We can use Python libraries such as Matplotlib or Seaborn to create time series plots.</p>
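<p>A minimal time-series sketch on synthetic stand-in data follows (the post’s actual figures come from the notebook linked at the end); it plots the domain-averaged wind speed per hour:</p>

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Synthetic 3-day hourly wind field (replace with values read from wrfpost.nc)
rng = np.random.default_rng(1)
times = pd.date_range("2023-03-01", periods=72, freq="h")
u10 = rng.normal(2, 3, size=(72, 20, 20))
v10 = rng.normal(-1, 3, size=(72, 20, 20))
wspd = np.hypot(u10, v10)

# Domain-averaged wind speed per hour
mean_speed = wspd.mean(axis=(1, 2))

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(times, mean_speed)
ax.set_xlabel("Time (UTC)")
ax.set_ylabel("10 m wind speed (m/s)")
ax.set_title("Domain-averaged wind speed")
fig.autofmt_xdate()
fig.savefig("wind_timeseries.png")
```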
<p>The resulting plot is shown below.</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-05.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-5"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-05.jpg" class="img-fluid"></a></p>
<p>As an alternative, we can plot each grid cell as a separate line over time.</p>
<p>The resulting chart is shown below.</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-06.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-6"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-06.jpg" class="img-fluid"></a></p>
<p><strong>3.2 Wind rose plot</strong></p>
<p>A wind rose plot can be used to show the distribution of wind direction and speed over the three-day period. This type of plot is useful for identifying the prevailing wind direction and speed. You can use Python libraries such as Windrose or Matplotlib to create wind rose plots.</p>
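<p>A wind-rose sketch using only Matplotlib’s polar axes follows (the <code>windrose</code> package adds conveniences such as speed-stacked petals; here we simply count directions per sector on synthetic data):</p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Synthetic hourly winds at one grid point (replace with wrfpost.nc values)
rng = np.random.default_rng(2)
u = rng.normal(3, 2, size=72)
v = rng.normal(1, 2, size=72)
wdir = (270.0 - np.degrees(np.arctan2(v, u))) % 360.0

# Bin directions into 16 compass sectors and count occurrences
n_sectors = 16
edges = np.linspace(0, 360, n_sectors + 1)
counts, _ = np.histogram(wdir, bins=edges)

# Polar bar chart: 0 degrees at the top (north), clockwise like a compass
theta = np.deg2rad(edges[:-1] + 360 / n_sectors / 2)
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
ax.bar(theta, counts, width=np.deg2rad(360 / n_sectors), edgecolor="black")
ax.set_title("Wind direction frequency (72 h)")
fig.savefig("wind_rose.png")
```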
<p>The resulting chart is shown below.</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-07.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-7"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-07.jpg" class="img-fluid"></a></p>
<p><strong>3.3 Contour plot</strong></p>
<p>A contour plot can be used to show the spatial distribution of wind speed and direction at a particular time during the three-day period. This type of plot is useful for identifying areas with high or low wind speed and direction. You can use Python libraries such as Matplotlib, Cartopy, or Basemap to create contour plots.</p>
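<p>A contour-plot sketch with plain Matplotlib on a synthetic snapshot follows; Cartopy or Basemap would add coastlines and a proper map projection. The domain coordinates are hypothetical:</p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Synthetic snapshot of 10 m winds on a lat/lon grid (replace with wrfpost.nc)
lon = np.linspace(106.0, 108.0, 30)   # hypothetical domain
lat = np.linspace(-7.0, -5.0, 30)
LON, LAT = np.meshgrid(lon, lat)
u10 = 5 * np.sin(np.deg2rad(LAT * 30))
v10 = 5 * np.cos(np.deg2rad(LON * 30))
wspd = np.hypot(u10, v10)

fig, ax = plt.subplots(figsize=(7, 6))
cf = ax.contourf(LON, LAT, wspd, levels=10, cmap="viridis")
fig.colorbar(cf, ax=ax, label="wind speed (m/s)")
# Overlay arrows to show direction (thinned for legibility)
step = 3
ax.quiver(LON[::step, ::step], LAT[::step, ::step],
          u10[::step, ::step], v10[::step, ::step], color="white")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("10 m wind speed and direction")
fig.savefig("wind_contour.png")
```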
<p>The resulting map is shown below.</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-08.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-8"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-08.jpg" class="img-fluid"></a></p>
<p><strong>3.4 Streamline plot</strong></p>
<p>A streamline plot can be used to show the wind flow at a particular time during the three-day period. This type of plot is useful for identifying how air moves over a geographic area. You can use Python libraries such as Matplotlib, Cartopy, or Basemap to create streamline plots.</p>
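<p>A streamline sketch with Matplotlib’s <code>streamplot</code> on a synthetic rotational flow follows (again, Cartopy would add map context; the domain coordinates are hypothetical):</p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Synthetic snapshot of 10 m winds (replace with values from wrfpost.nc)
lon = np.linspace(106.0, 108.0, 40)
lat = np.linspace(-7.0, -5.0, 40)
LON, LAT = np.meshgrid(lon, lat)
# A simple rotational flow around (107, -6) to make the streamlines interesting
u10 = -(LAT + 6.0) * 5
v10 = (LON - 107.0) * 5
wspd = np.hypot(u10, v10)

fig, ax = plt.subplots(figsize=(7, 6))
strm = ax.streamplot(LON, LAT, u10, v10, color=wspd, cmap="plasma")
fig.colorbar(strm.lines, ax=ax, label="wind speed (m/s)")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("10 m wind streamlines")
fig.savefig("wind_streamlines.png")
```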
<p>The resulting map is shown below.</p>
<p><a href="../assets/image-blog/20230401-visualising-the-wrf-output-09.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-9"><img src="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-09.jpg" class="img-fluid"></a></p>
<p>Overall, the choice of visualization method will depend on the specific research question or application, and the intended audience. It may be useful to try different visualization methods and compare the results to determine which method is most effective for the task at hand.</p>
<p><strong>4 Notebook</strong></p>
<p>All of the above code is compiled into a notebook hosted here: <a href="https://gist.github.com/bennyistanto/3f7877b44eebaf0db5e37fa8e7b8603a" class="uri">https://gist.github.com/bennyistanto/3f7877b44eebaf0db5e37fa8e7b8603a</a></p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>Remote Sensing</category>
  <category>Research</category>
  <category>Climate</category>
  <guid>https://benny.istan.to/site/blog/20230401-visualising-the-wrf-output.html</guid>
  <pubDate>Sat, 01 Apr 2023 00:00:00 GMT</pubDate>
  <media:content url="https://benny.istan.to/site/assets/image-blog/20230401-visualising-the-wrf-output-01.jpg" medium="image" type="image/jpeg"/>
</item>
</channel>
</rss>
