Innovation and Computer Vision: A Discussion with Rendered.ai’s CEO on the Power of Synthetic Data
TL;DR: The push to automation requires us to engineer the data we use for AI. Eventually, AI practitioners will start considering synthetic data as a core corporate capability and incorporate it into their long-term AI strategy. Rendered.ai CEO Nathan Kundtz shares his thoughts on the future of synthetic data and AI.
As the use of artificial intelligence (AI) and machine learning (ML) continues to expand, acquiring and using real-world data to train these systems presents challenges, including high costs and the need for accurately labeled, unbiased data. In this Q&A session, the CEO of Rendered.ai, Dr. Nathan Kundtz, discusses what synthetic data is, why it’s needed, how Rendered.ai is addressing the current challenges in the AI industry, and what trends to expect in synthetic data and AI in the near future.
What is synthetic data and why do we need it?
Dr. Kundtz: With the widespread integration of AI in the technology market, the costs and challenges of acquiring and using real-world data to train AI and ML algorithms present a barrier to innovation. Even in cases where data is abundant, much real-world data is unlabeled, biased, or irrelevant to specific users’ problems. To solve these challenges, data needs to be engineered so that its characteristics can be controlled. Synthetic data is engineered data intentionally designed to have characteristics that address a specific problem with real sensor data or with an AI training or validation process.
What are the current challenges in the artificial intelligence industry and how is Rendered.ai addressing them?
Dr. Kundtz: Before starting Rendered, I was the CEO of the Kymeta Corporation, a satellite communications company. I had many friends in the satellite industry who were working to build businesses based on a novel sensor design, like a new radar sensor or IR sensor, and then launch these in satellite constellations. All of my associates faced a consistent problem. They knew that their proposed business model would require them to sell the insights that could be generated from captured satellite data, usually images, which necessitated introducing some amount of analytics stack or artificial intelligence into that data pipeline. Every entrepreneur in this industry runs into the problem that they need data to build that analytics stack before their satellites are launched.
The logical question here is how do you sell that business model before you’ve spent hundreds of millions of dollars on satellite constellations?
The satellite remote sensing industry isn’t some backward industry that has a weird problem with AI. The data issue is a central feature of artificial intelligence. Until we promote data to the status where it can be engineered, similar to how we engineer software, we’re not going to be able to engineer our artificial intelligence algorithms. By extension, we’re not going to be able to proactively engineer many hardware platforms either. Artificial intelligence is just an extension of software, and to do software engineering in a world that leverages artificial intelligence, we need to be able to do data engineering in a way that gives us control over data. This observation is what led to the creation of Rendered.ai.
What are the big surprises that you’ve seen since starting the company?
Dr. Kundtz: The biggest surprise I’ve seen so far while building the Rendered platform stems from the fact that we approached this as an engineering problem with the expectation that the solution would be dev tools for engineers. As a result, we expected that our users and customers would be, well, engineers. What has been surprising is how interested business development and sales stakeholders are in getting access to this kind of capability because they’re the ones that need to be able to demonstrate the downstream value for their customers. It’s not dev tools for the sake of dev tools, but it’s a tool to facilitate what a business is doing with its technology.
What are the major trends that will impact synthetic data and artificial intelligence in the next year?
Dr. Kundtz: The push to automation and to leveraging new capabilities is driving the adoption of synthetic data: the more we see AI implemented in production, the more important synthetic data will become. The synthetic data industry is emerging at the right time, just as AI moves into live production environments, where you find defects and need to address them as part of your ability to provide quality products. The biggest headwind for synthetic data is the limited number of AI practitioners, so as we see more talent head into the field, we will see the industry grow as well.
We are currently seeing a classic mix of innovators, early adopters, and a few early-majority customers. The innovators are experimenting with synthetic data, and we hear comments like, “I have heard synthetic data is important, but I don’t know why yet, and I want to try it out and see what we can do with it.” Most of our customers are early adopters who are willing to go with technology that isn’t fully mature yet, and that is also where the majority of the market is today. We do occasionally see customers in the early majority: they’ve had some success with a few different synthetic data pipelines, know that they will need to maintain a synthetic data strategy for a long time, and are now looking into tools that can help them make that a corporate capability.
One of the trends in the market over the next five years is that more and more people will display adoption behavior consistent with the early majority phase of a technology market. Those who are currently early adopters will move into the early majority, and the number of companies in the early adopter phase will also continue to grow. As the market shifts, more and more people will consider synthetic data a broad corporate capability and look at how they can handle that long term.
Oh… and ChatGPT, of course. Large Language Models and Diffusion Models are changing the conversation with customers daily. This can also be a source of confusion, but that’s a different topic.
What are the major industries that synthetic data (and Rendered.ai) is currently addressing or will address in the next year?
Dr. Kundtz: Regarding market segmentation, there’s still a lot of building to do within the remote sensing market where we currently support a majority of our customers, so we are certainly committed to continuing to invest in synthetic data for remote sensing and doing that really, really well.
I expect other industries to grow, with security, imaging, and insurance tech being the next markets to mature. Manufacturing and non-destructive testing markets are also on the horizon. Medical AI has the potential to become the largest market because tools built to detect disease using artificial intelligence will impact everybody. When you consider the shortage of providers in the medical industry, there is an even greater need to automate many computer vision applications, from diagnostics to tracking assets, and AI will be central to that. Medical is a great real-world use case: you need to control the data and avoid bias, privacy concerns are everywhere, and everything is an edge case. The medical industry will take a long time to penetrate successfully but will be the largest opportunity for synthetic data.
Notably, a market I haven’t mentioned is autonomous driving, and that’s because it’s a very concentrated market, so it hasn’t been a focus for us. However, it’s still an area I expect to get a lot of attention, and I suspect synthetic data will play a role there in some very specific ways, possibly around novel sensor stacks in that space.
What are the big considerations/messages that you’d like to communicate to customers?
Dr. Kundtz: One of the most exciting aspects of our business is that everywhere we turn, we’re helping address significant challenges and business problems with wide-ranging impact on the environment, health, disaster recovery, and security. One day we may be working with a customer simulating images of weaponry to build a better metal detector for schools. The next day, we will be working with a customer who can’t get enough real hyperspectral imagery in diverse weather conditions to build a greenhouse gas detection system. If customers with these significant problems can turn to Rendered.ai and synthetic data for help, we feel that anyone in the computer vision industry should be able to work with us.
When customers consider synthetic data as a strategy and a solution, I encourage them to ask whether their strategy is robust enough to use synthetic data in a reproducible way: How do we store results over time? How do we come back to them? How do we establish quality across AI training pipelines? Of course, our platform does a great job on those fronts, allowing synthetic data as a capability to be incorporated into just about any AI pipeline.
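To make the reproducibility questions above concrete, here is a minimal Python sketch of one way to record a synthetic data run so it can be regenerated and looked up later. It is an illustration under stated assumptions, not the Rendered.ai platform or its API: `DatasetConfig`, `config_id`, `generate_samples`, and `build_manifest` are hypothetical names, and the "generator" only draws random scene parameters rather than rendering images.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical generation settings; the field names are illustrative only."""
    scene: str
    num_images: int
    seed: int


def config_id(config: DatasetConfig) -> str:
    """Derive a stable identifier from the config so a run can be found again."""
    payload = json.dumps(asdict(config), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def generate_samples(config: DatasetConfig) -> list:
    """Stand-in for a real generator: draws scene parameters from a seeded RNG."""
    rng = random.Random(config.seed)  # private RNG, so global state is untouched
    return [
        {
            "image_index": i,
            "sun_angle_deg": round(rng.uniform(0.0, 90.0), 2),
            "cloud_cover": round(rng.random(), 3),
        }
        for i in range(config.num_images)
    ]


def build_manifest(config: DatasetConfig) -> dict:
    """Bundle the identifier, the exact config, and the generated parameters."""
    return {
        "dataset_id": config_id(config),
        "config": asdict(config),
        "samples": generate_samples(config),
    }


if __name__ == "__main__":
    manifest = build_manifest(DatasetConfig(scene="harbor", num_images=3, seed=7))
    # The manifest is plain JSON, so it can be stored next to training results.
    print(json.dumps(manifest, indent=2))
```

Because the identifier is derived from the configuration and the random generator is seeded, rerunning with the same configuration yields the same manifest, which is one simple way to "come back" to a dataset when auditing a training pipeline.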
Find out more… or generate a dataset for yourself!
There are multiple paths to find out more about synthetic data and Rendered.ai. Our support documentation is open. Anyone can sign up to try out our platform. We also publish numerous blogs, interviews, and videos covering the value of synthetic data and how to use our platform.
Follow these links to find out more:
- Sign up to get started generating your own synthetic data on the Rendered.ai platform
- Check out our learning path on our support site
- Here’s an overview video of the platform
- Sign up for Rendered.ai’s newsletter to stay informed!
And if you have any other questions, you can always contact us here!