Week 8 | Exploring Different Deep Network Architectures


Now that we have gathered plenty of information on predicting wildfire spread with Unet, it is time to start exploring other deep learning architectures for this problem.

Published on November 05, 2023 by Bronte Sihan Li

Wildfire Deep Learning machine learning AI UNet Transformer Segmentation

2 min READ

In the past few weeks, we have experimented with different ways to improve the performance of our Unet model. We have tried different loss functions, different optimizers, and different input features. We have also explored the potential of using attention based Unet and residual connections. Now that we have gathered plenty of information on predicting wildfire spread with Unet, it is time to start exploring other deep learning architectures for this problem. From looking at the current benchmarks for semantic segmentation, we have identified a few potential architectures that we can explore for this problem.

Swin-Unet

Swin-Unet[1] is a semantic segmentation model that combines the Swin Transformer with Unet. The Swin Transformer is a transformer based encoder-decoder proposed for medical image segmentation, which has been shown to outperform many other convolution based or hybrid models. This network uses shifted window to adapt Transformer to vision tasks and allows for efficient computation of self-attention within non-overlapping local windows. With the symmetric structure of Swin Transformer, the model can be easily adapted to our problem of modeling the spread of wildfires at t+1. We plan on using the implementation available here.

BEiT2

Another potentially promising network to use is the BEiT2 v2 models that is currently the state-of-the-art in semantic segmentation for ADE2K. The BEiT2 model[3] is a transformer based architecture that leverages semantic rich visual tokenizer that can enhance global semantic representation on images. The idea here is we could utilize the pre-trained models available and attempt to fine-tune the model for wildfire spread prediction.

Diffusion Based Generative Approach

Diffusion models have shown a lot of promise in the past few years especially in SOTA performance on image generation and constructing multi-modal distributions. With this, there are also recent developments in using diffusion for autoencoders as generation is believed to facilitate learning of latent representations and therefore enhance true understanding of image data. We are interested in exploring the potential of using diffusion based generative models for wildfire spread prediction. We hope to start with this paper[2] and assess the feasibility of using this approach for our problem, due to the intuition that the spread of wildfire is a generative process.

References

  1. Wei, C., Mangalam, K., Huang, P.(., Li, Y., Fan, H., Xu, H., Wang, H., Xie, C., Yuille, A.L., & Feichtenhofer, C. (2023). Diffusion Models as Masked Autoencoders. ArXiv, abs/2304.03283.
  2. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., & Wang, M. (2021). Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. ECCV Workshops.
  3. Bao, H., Dong, L., & Wei, F. (2021). BEiT: BERT Pre-Training of Image Transformers. ArXiv, abs/2106.08254.