
Synthetic Data as a tool for Earth Observation

Published April 12, 2022; updated August 21st, 2023

With constantly increasing remote sensing observation of the Earth, the ability to use AI to analyze and extract knowledge from 2D and 3D content is critical for optimizing the use and value of the petabytes of sensor data that will accumulate over time. Synthetic data is an important tool in the design, training, and validation of AI to analyze real sensor content.

TLDR version: Sign up to try and experiment with synthetic data for yourself!

Synthetic RGB and IR remote sensing example imagery

In 2020, estimates of the amount of satellite imagery collected ranged around 100TB per day. If a human analyst could process 1GB of data per day, then at that rate it would have taken approximately 400 working years of analyst effort to process a single day of collected global imagery data.
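The workload estimate above is simple arithmetic; a quick back-of-envelope check (assuming roughly 250 working days per year) looks like this:

```python
# Back-of-envelope check of the analyst workload estimate above.
daily_tb = 100            # satellite imagery collected per day, in TB
analyst_gb_per_day = 1    # assumed throughput of one human analyst
working_days_per_year = 250

analyst_days = daily_tb * 1000 / analyst_gb_per_day   # 100,000 analyst-days
working_years = analyst_days / working_days_per_year
print(working_years)  # 400.0
```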

Since then, the expansion of satellite capabilities into lidar, radar, and compound multimodal sensor arrays has dramatically increased the amount of data collected. With an estimated 2000 to 3000 additional Earth Observation (EO) sensors being deployed in space by the end of the decade, the concept of manually processing satellite data, to say nothing of the drone, aerial, and every other type of EO data hitting the market, is not just outdated but absurd.

Limitations of AI

Artificial Intelligence (AI) is the tool most frequently embraced in hopes of automating manually intensive tasks across many data and information domains. However, AI has some unique constraints that limit its effectiveness in the EO domain:

  • Reliance on data — AI requires training, and training requires data. Planning, designing, and creating classification, characterization, and other knowledge-extraction tools for new systems can be difficult or impossible before data is collected, limiting innovation and building risk into an EO collection program.
  • Dataset-driven bias — Regardless of how much data is collected per day, rare events and objects are still rare. AI needs data to recognize rare items, and humans could spend lifetimes combing data libraries for enough examples to train an AI for rare-object detection.
  • Explainability — With increasing data comes increasing pressure to use it for automated decision-making. AI explainability can be problematic when algorithms reduce datasets to billions of features that have no anchor in comprehensible human concepts. Without explainability, using AI to inform or make decisions can be legally and even morally risky.

Opportunity of Synthetic Data

With these limitations in mind, we believe that one of the most significant opportunities to enable AI is the use of synthetic data. Synthetic data is engineered or simulated data that models the technical characteristics of real datasets while allowing the engineer or data scientist to design in the functional characteristics that are desired. Synthetic data is perfectly known and labeled because a human configured it.

Synthetic datasets can include annotations and metadata, and can even be post-processed, enabling wide variation and experimentation with reliable ground truth about the content of each image
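Because the generator configures every scene, labels come for free and are exact. A minimal sketch of what one such annotated record might look like, assuming a COCO-style layout (field names here are illustrative, not any particular platform's schema):

```python
import json

# Illustrative record for one synthetic image: because the scene was
# configured by the generator, every label is known exactly, with no
# human annotation error. Field names follow a COCO-style convention.
def make_record(image_id, sensor, objects):
    return {
        "image_id": image_id,
        "metadata": {"sensor": sensor, "gsd_m": 0.5, "sun_elevation_deg": 35.0},
        "annotations": [
            {"category": name, "bbox_xywh": bbox, "truncated": False}
            for name, bbox in objects
        ],
    }

record = make_record(
    image_id=1,
    sensor="synthetic_rgb",
    objects=[("aircraft", [120, 84, 42, 38]), ("vehicle", [300, 210, 12, 7])],
)
print(json.dumps(record, indent=2))
```

Post-processing or experimentation then amounts to regenerating records with different metadata or object mixes, while the ground truth stays exact.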

This allows users to:

  • Create data when there is none — Synthetic data can be designed to emulate sensors, sensor platforms, and physical scenarios allowing users to plan and create content that will be collected in the future, potentially well before sensors are designed and launched to collect it.
  • Design datasets to reduce or test bias — Rare and unusual objects and events that can be digitally modeled can be used to generate immense amounts of synthetic data, which can then be used for AI training to detect real instances.
  • Experiment with datasets to demonstrate explainability — If dataset generation is configurable, then configuration parameters can be directly modified and tested against training effectiveness or bias, enabling a comprehensible connection between the design and input to the dataset and the resulting impacts on algorithmic performance and effectiveness.
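The second and third points hinge on the class mix being a design parameter rather than an accident of collection. A hedged sketch of that idea, with an invented config format and class names, shows how a rare class can be deliberately oversampled:

```python
import random

# Hypothetical generator config: the class mix is a design parameter,
# so a class that is rare in the real world can be oversampled in the
# synthetic dataset to counter dataset-driven bias.
config = {
    "class_mix": {"ship": 0.10, "rare_radar_site": 0.45, "vehicle": 0.45},
    "seed": 7,
    "n_images": 1000,
}

rng = random.Random(config["seed"])
classes = list(config["class_mix"])
weights = [config["class_mix"][c] for c in classes]
samples = rng.choices(classes, weights=weights, k=config["n_images"])

counts = {c: samples.count(c) for c in classes}
print(counts)  # the rare class appears far more often than it would in real data
```

Because the mix lives in a config, testing a bias hypothesis is just a matter of editing the weights and regenerating, which is exactly the comprehensible design-to-performance connection the explainability point calls for.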

An example: Working with Orbital Insight

If all of this sounds good in concept, we are seeing it happen in real customer cases today. We have partnered with Orbital Insight and UC Berkeley for two phases of a project for the National Geospatial-Intelligence Agency.

Synthetic and real images used to demonstrate output of the Orbital Insight, UC Berkeley, and NGA project

The project team has demonstrated that:

  1. Synthetic data can be used with real data to increase the training performance of AI for rare and unusual objects
  2. Much less synthetic data can be used to train AI than would be required using real datasets
  3. Potentially much less real data may need to be collected to be able to detect rare and unusual objects
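The mixing strategy behind the first result can be sketched in a few lines. The dataset contents below are invented for illustration (a real pipeline would load imagery from disk), but the shape of the idea is simply to augment scarce real examples of a rare class with synthetic ones before training:

```python
# Invented example: 95 common and only 5 rare real samples, padded with
# synthetic rare samples so the training set is balanced for the rare class.
real = [("real_%03d.png" % i, "common") for i in range(95)] + \
       [("real_rare_%d.png" % i, "rare") for i in range(5)]
synthetic = [("synth_rare_%03d.png" % i, "rare") for i in range(95)]

train_set = real + synthetic
rare_fraction = sum(1 for _, y in train_set if y == "rare") / len(train_set)
print(f"{len(train_set)} samples, rare fraction = {rare_fraction:.2f}")
```

The same sketch makes the third result intuitive: once synthetic samples cover the rare class, far fewer real rare examples need to be collected at all.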

Taken together, these results start to paint a picture that synthetic data has the potential to improve training performance, reduce the time required to collect and curate data for training, and even improve the sustainability and overall cost of real sensor-based data collection.

Read more about this project in Trajectory magazine.

How we help

We have built our Platform as a Service (PaaS) on the premise that synthetic data is an essential component of future enterprise AI. We have integrated physics-based simulation tools to create Computer Vision (CV) content and added leading AI-based generative capabilities, such as CycleGANs, for content creation and post-processing. Because enabling customer success requires experimentation and iteration, our platform supports an engineering-centric workflow with features such as the following:

The platform includes a configurable graph editor so that data scientists can control synthetic data generation without software coding
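Under the hood, a graph editor like the one described above amounts to composing configurable nodes into a generation pipeline. A minimal sketch of that pattern, with invented node names (this is not the platform's actual API):

```python
# Minimal sketch of a node-graph generation pipeline: each node takes a
# scene description and returns a transformed one, and the graph is just
# an ordered composition of nodes. Node names are illustrative.
def place_objects(scene):
    scene["objects"] = ["aircraft", "vehicle"]
    return scene

def set_sensor(scene):
    scene["sensor"] = {"band": "RGB", "gsd_m": 0.5}
    return scene

def render(scene):
    scene["rendered"] = True
    return scene

graph = [place_objects, set_sensor, render]

scene = {}
for node in graph:
    scene = node(scene)
print(scene)
```

Editing the graph, rather than the code, is what lets a data scientist swap sensors or object sets without touching the rendering internals.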

We know that we can’t create every dataset that every CV user needs. We believe that we can enable our users to imagine the datasets that they need, design synthetic data applications for their purposes, and then create as much data as they need to be successful.

To support those users who want to go further we offer:

  • Developer tutorials and training
  • Open-source and shared-source starter code covering many important applications
  • Licensable 3rd party simulators
  • Support from our top Synthetic Data Engineers!

How to explore Synthetic Data

With the broad diversity of CV applications and AI at the current state of the market, we find that many users are just beginning to hear about synthetic data. To help users get started, we provide a variety of experiences to learn more.

For those brave enough to sign up, we even have a tutorial that allows you to generate a dataset that you could take and use to train algorithms.

If you have any questions or want to specifically try out an EO-related exercise, let us know.

We look forward to seeing what you can create!

Additional sources:
Future of Satellite Analysis, 2017
Euroconsult Satellite Market Forecast, 2021
