Commentary - (2022) Volume 15, Issue 11
Received: 07-Nov-2022, Manuscript No. jcsb-22-85004;
Editor assigned: 08-Nov-2022, Pre QC No. P-85004;
Reviewed: 21-Nov-2022, QC No. Q-85004;
Revised: 26-Nov-2022, Manuscript No. R-85004;
Published: 03-Dec-2022, DOI: 10.37421/0974-7230.2022.15.446
Citation: Foster, Evison. “Conflicting Multi Transmitter Configurations for Inter Remote Sensing Information Retrieval.” J Comput Sci Syst Biol 15 (2022): 446.
Copyright: © 2022 Foster E. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In recent years, remote sensing technology has advanced rapidly. It powers a wide range of civilian and military applications due to the deployment of quantitative and qualitative sensors, as well as the evolution of powerful hardware and software platforms. As a result, large data volumes suitable for a wide range of applications, such as monitoring climate change, have become available. However, processing, retrieving, and mining such large amounts of data is difficult. Typically, content-based remote sensing (RS) image retrieval approaches use a query image to find relevant images in a dataset. Cross-modal representations based on text-image pairs are becoming popular for increasing the flexibility of the retrieval experience, and combining the text and image domains is considered one of the next frontiers in RS image retrieval [1-3]. However, due to the visual-semantic disparity between the language and vision worlds, aligning text to the content of RS images is particularly difficult. In this paper, we propose various architectures for text-to-image and image-to-text retrieval based on vision and language transformers. Extensive experimental results are reported and discussed for four different datasets, namely TextRS, Merced, Sydney, and RSICD.
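To make the cross-modal retrieval setting concrete, the sketch below encodes a free-text query and a small image archive with a dual vision-language encoder and ranks the archive by cosine similarity. It uses an off-the-shelf CLIP checkpoint from the Hugging Face transformers library purely as a stand-in for the transformer architectures proposed in this work, and the image file names and query text are hypothetical.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# General-purpose vision-language dual encoder used as a stand-in for the
# RS-specific vision and language transformers discussed in the paper.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Hypothetical RS image archive and a free-text query.
archive_paths = ["scene_airport.jpg", "scene_harbor.jpg", "scene_farmland.jpg"]
archive = [Image.open(p).convert("RGB") for p in archive_paths]
query = "a dense residential area next to a river"

with torch.no_grad():
    # Encode images and text into the shared embedding space.
    image_inputs = processor(images=archive, return_tensors="pt")
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    image_emb = model.get_image_features(**image_inputs)
    text_emb = model.get_text_features(**text_inputs)

# L2-normalise and rank archive images by cosine similarity to the query.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
scores = (text_emb @ image_emb.T).squeeze(0)
for idx in scores.argsort(descending=True).tolist():
    print(archive_paths[idx], float(scores[idx]))
```

Image-to-text retrieval follows the same pattern with the roles of the query and the archive swapped: a query image is scored against a set of candidate captions.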
With rapid advancements in Earth observation sensors, more information about the Earth's surface is now available at higher spatial, spectral, and temporal resolutions, resulting in massive growth of the remote sensing (RS) image archive. This wealth of data has completely altered our perspective on monitoring the Earth's surface and has opened up new possibilities for a wide range of specialised applications. However, as the archive grows, extracting a specific piece of information from it becomes increasingly difficult. As a result, content-based image retrieval (CBIR), the task of retrieving the images that best match a given query, is critical. CBIR systems generally consist of two major steps: feature extraction and similarity matching. The first step extracts useful feature representations from an archive of images, while the similarity measure quantifies the similarity between a query image and the images in this archive. Consequently, the performance of retrieval systems depends heavily on the quality of the extracted features as well as on the similarity measure used for matching. Early research on RS image retrieval focused primarily on the use of handcrafted features to represent the visual content of images.
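A minimal sketch of the two CBIR steps, feature extraction and similarity matching, is given below. It assumes an ImageNet-pretrained ResNet-50 from torchvision as a generic deep feature extractor and cosine similarity as the matching measure; the archive and query file names are placeholders.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# Step 1: feature extraction -- a pretrained ResNet-50 with its
# classification head replaced by an identity, yielding 2048-d descriptors.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(paths):
    """Map each image to an L2-normalised feature vector."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    with torch.no_grad():
        return F.normalize(backbone(batch), dim=-1)

# Hypothetical archive and query tiles.
archive_feats = extract_features(["tile_001.jpg", "tile_002.jpg", "tile_003.jpg"])
query_feat = extract_features(["query_tile.jpg"])

# Step 2: similarity matching -- cosine similarity, then rank the archive.
similarities = (query_feat @ archive_feats.T).squeeze(0)
print(similarities.argsort(descending=True).tolist())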
Manually designed features, however, may fall short of producing a powerful representation capable of describing an image's detailed content. Recent advances based on deep-learning methods have resulted in significant improvements in the accuracy of CBIR systems, as well as of other RS applications such as crop mapping from image time series, tree species classification, and cloud change detection, to name a few. First and foremost, creating an informative modality-specific representation is a critical step in image retrieval. Despite the popularity of retrieval works based on traditional deep learning models, these models produce a global semantic representation that ignores spatial relationships between image regions [4,5].
This is especially important in cross-modal text-to-image retrieval, which requires modelling the image's global semantic concepts as well as the text description that goes with it. Second, one of the most difficult challenges when dealing with cross-modal data is determining how to learn joint representations and close the heterogeneity gap between multi-modal pairs. Text-to-image retrieval necessitates the accurate alignment of visual and textual data representations, as well as the modelling of the relationships between each image and its corresponding text. Third, the quality of the dataset is crucial in training an accurate cross-modal retrieval method. Text-to-image retrieval methods in RS typically repurpose existing image captioning datasets. Unlike natural image datasets, these datasets are relatively small in size and contain more complex, detailed images. Furthermore, each dataset has a different number of captions per image, and many of these captions are redundant and incomplete.
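One common way to close the heterogeneity gap mentioned above is to train the image and text encoders with a symmetric contrastive (InfoNCE) objective that pulls matched image-caption pairs together in a joint embedding space and pushes mismatched pairs apart. The sketch below shows this loss in PyTorch; it is a generic formulation under the assumption of one matching caption per image in each batch, not necessarily the exact objective used by the architectures proposed here.

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style InfoNCE loss over a batch of matched image-text pairs.

    image_emb, text_emb: (batch, dim) outputs of the two encoders;
    row i of each tensor is assumed to describe the same RS scene.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise cosine similarities between every image and every caption.
    logits = image_emb @ text_emb.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Align matched pairs in both retrieval directions:
    # image-to-text (rows) and text-to-image (columns).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.T, targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings standing in for encoder outputs.
loss = symmetric_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```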
Acknowledgement: None.

Conflict of Interest: The authors declare no conflict of interest.