Have you ever wondered if you could watch all your favorite videos in your own language as if they were recorded just for you? Thanks to Gary Sztajnman, this vision is becoming a reality. Using machine learning and AI, Gary has developed Hello8, a revolutionary video translation tool that lets you watch videos in your language with voice cloning and lip sync.
Welcome to another edition of our exclusive interview series on AllAboutAI.com. Today, we are excited to bring you insights from Gary, an AI and data science expert with over a decade of experience.
With a solid business and video technology background, Gary has leveraged his expertise to create Hello8, an AI-driven platform that is transforming how we translate and dub videos.
Join us as we delve into Gary’s journey, the innovative features of Hello8, and the future of AI in video translation.
Here’s the complete interview!
Building the Future of AI-Driven Video Translation: Gary Sztajnman’s Vision for Hello8
Gary Sztajnman, co-founder of Hello8, envisions a future in which AI-driven video translation revolutionizes how we consume media in different languages.
With advanced machine learning and AI, Hello8 translates, dubs, and lip-syncs videos, making them appear as if originally recorded in the viewer’s native language. This innovative approach aims to break down language barriers, enhance global communication, and create a seamless, immersive viewing experience.
How did your background and experiences lead you into the AI industry?
Gary Sztajnman: I have worked in AI for the last 10 years. Originally, my background was in business and video. I switched to AI and data science in 2015 while doing my master’s in data science at Columbia University. In 2016 and 2017, I was working on the new chatbots that were emerging. I worked at Verizon, a prominent US telecommunications company, where I focused on chatbots. I also worked on AI-generated music and at other AI video startups. For a long time, I have been passionate about the different use cases of AI.
What inspired you to build Hello8?
Gary Sztajnman: Two or three years ago, I tried to build Hello8. At my previous company, we wanted to know if we could use AI to translate video. By translating video, I mean subtitles, voice, and lip sync. At that time, the technology wasn’t good enough.
The technology has advanced from 30 minutes to 30 seconds; with ChatGPT and new LLMs, what was impossible 3 years ago is now possible!
Why have you chosen AI-driven translation solutions?
Gary Sztajnman: I am French and have lived in the US for a long time, where I have met many people from different backgrounds. I noticed how hard it is to break the language barrier. Translation is a profound thing: it unites people by letting them speak a common language. We are not from the same background, but we can understand each other because we speak English. If we couldn’t speak English, this conversation would be tough. I have experienced this problem myself, as someone trying to understand content in different languages.
How are AI-driven translation solutions helping companies improve their internal training and learning processes?
Gary Sztajnman: I see that others have the same problem, especially in companies and industries. In a company, you usually have to watch many internal videos to learn different things. If you can watch them in your own language, you will learn much more. This is a massive problem for many companies, and I find it super interesting to be able to solve it and help people learn in their own languages.
Have you conducted market research before building Hello8?
Gary Sztajnman: We started the company with a massive mistake. It was more of a B2C move: we thought translation would help people break the language barrier by sending each other videos in their own languages. For example, say I have a friend in Mexico and want to send him a message in Spanish, but I don’t speak Spanish. I would use this tech to send him a video in Spanish. We initially started with a B2C app and quickly began testing it as a proof of concept.
What feedback influenced Hello8’s development into a web app for businesses?
Gary Sztajnman: Throughout this process, we spoke with many people because we were already out there and vocal, saying, “Hey, we’re a startup, we have an app, and we can do translation.” This gave us the chance to speak to companies and agencies. One company said, “Your app is cool, but if you make it a web app designed for companies, we will use it.” On the B2C side, people thought it was cool but weren’t ready to pay. Very few people were willing to pay, as this technology is expensive. You need people who want to invest and genuinely need this solution. Starting as an app allowed us to meet other businesses and companies, which helped us understand the market’s needs and pain points for video translation.
What specific challenges or gaps does Hello8 aim to address in the video translation and dubbing industry?
Gary Sztajnman: You have to understand that there are traditional ways of doing video translation. Today, if I am a company and want to translate content, I might find a translator, find someone to do the dubbing, like an actor, find a studio, and go through this entire process of translating a video into another language. First, it is super expensive because multiple people work together to do this. Second, it takes a very long time. Many companies are unwilling to spend the time and money, and even those that are would love to find a faster, cheaper, and more engaging solution. We met many companies willing either to set up a new workflow for video translation or to change their existing workflow for something faster, better, and more affordable.
How does Hello8 transform the video translation and dubbing industry?
Gary Sztajnman: We are not the only ones doing this. If you search for video translation online, you will find others doing it, too, so it is crucial for us to deliver the highest quality.
The fact that Hello8 does lip sync as well as dubbing creates a wow effect. People are always amazed to see themselves speaking in Japanese, Arabic, or Spanish. This wow effect is super important.
Can you explain the workflow integration process of Hello8?
Gary Sztajnman: We believe it is all about building the proper workflows. When it comes to translation, you need to integrate it into your entire video process. You need to consider how you edit the video, translate it, add subtitles, and decide whether you want the original or translated subtitles. Then, you must send the videos and have someone check the translation. We are not just building a tool for input and output; we are building a solution to help companies translate videos and integrate them into their processes. For example, we do something that only a few others can; if you want to translate a video with us, we show you the transcription—the words being said in the original language—and the translation—the words translated to the new language.
With Hello8, you can change and edit the translation, tailoring it precisely to your desired context and dialect. This is super important if you are specific about the words you use.
How does Hello8 ensure the accuracy and quality of translations?
Gary Sztajnman: It is a challenge. I do not speak Japanese, so if I translate something into Japanese, how can I tell whether the translation is good? That is a really good point. First of all, with Hello8, you are in control. We show you both the original text and the translated text. If you see something wrong in the original text, you can always change it. For example, if I speak in French or English and say, “My name is Gary, I am from Paris,” I will see the transcription, “My name is Gary, and I am from Paris,” in the tool, and then I will see it in Japanese. Of course, I do not understand Japanese. But suppose I notice that the transcription is wrong, which happens in about 0.1% of cases. I can edit the transcription and click retranslate, and the tool will update the Japanese text. I can then retranslate the video with the correct words.
How does Hello8 handle potential inaccuracies in the translated text?
Gary Sztajnman: You might say that even if the transcription is perfect, maybe the translation is not good. In that case, we always allow our customers to bring in human translators to check the text. At the end of the day, if you want the text to be perfect, AI is amazing, but it is only about 99% accurate and can still produce what are called hallucinations, even with the latest models. So, we always want to invite humans into the process. It is a collaboration between AI and humans to get the best out of this tool.
Is Hello8 better than traditional translation methods?
Gary Sztajnman: Today, there are two main methods of translation. Some people use subtitles, which is excellent, and many people use them; others use human dubbing. Both techniques are less engaging than AI video translation with dubbing, subtitles, and lip sync.
The main difference between traditional techniques and what we do at Hello8 is the impact and quality. We aim to be at a much more futuristic level.
How does Hello8 ensure ease of use and customization in its video translation process?
Gary Sztajnman: If you compare it to human dubbing, you will see that human dubbing takes much more time and is much more expensive. We want to create an easy user experience. When you think about AI, ChatGPT, and the latest innovations, you realize how much people expect something easy to use and understand. We want to do the same with video translation: something that is easy to use, delivers great results, and can be tailored and tweaked to achieve your desired quality. Our output is more engaging, faster, and more affordable than other techniques.
Why should I use Hello8 instead of DeepL, Google Translate, or other translation tools?
Gary Sztajnman: When we started with Hello8, we quickly realized that as a young startup, there are things we are going to do and things we are not going to do. When it comes to pure text translation, many companies, like DeepL and Google Translate, have been working on this for years with tons of engineers, and we will not reinvent the wheel. Our business is not to be a competitor to DeepL or Google Translate. Our business is about the entire experience around video translation, dubbing, and lip sync. For the specific part of text translation, we use existing APIs. We are not building this part ourselves. We are building some AI models ourselves, but not this part.
How does Hello8 integrate text translation into the broader video translation experience?
Gary Sztajnman: What we do regarding pure text translation is essential. We are not just translating text without context; we translate it with its context in the video in mind. For example, in English, “My name is Gary, I am from Paris” might take five seconds to say. Ideally, the Japanese translation should also take about five seconds so that the translated video matches the original. We have designed our text translation process around the right prompts so that the AI generates multiple candidate translations and selects the one that matches the duration of the specific video segment. It’s like a fight between different AI models: each provides a different translation, and one model acts as the judge, selecting the candidate with the ideal length so that the translation fits the segment’s duration.
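The duration-matching idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Hello8’s implementation: the candidate translations are hard-coded instead of coming from an LLM, the judging model is replaced by a simple heuristic that estimates speaking time from the count of non-space characters, and the speaking rate (`chars_per_second`) and the 2-second target are hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    est_seconds: float  # estimated time needed to speak `text`


def estimate_seconds(text: str, chars_per_second: float) -> float:
    """Rough speaking-time estimate (assumption): non-space characters / speaking rate."""
    spoken = [c for c in text if not c.isspace()]
    return len(spoken) / chars_per_second


def pick_best_fit(candidates: list[str], target_seconds: float,
                  chars_per_second: float) -> Candidate:
    """Return the candidate whose estimated duration is closest to the segment length.

    Stands in for the 'judge' model: here the judging is a plain distance
    comparison, not an AI model.
    """
    if not candidates:
        raise ValueError("need at least one candidate translation")
    scored = [Candidate(t, estimate_seconds(t, chars_per_second)) for t in candidates]
    return min(scored, key=lambda c: abs(c.est_seconds - target_seconds))


if __name__ == "__main__":
    # Hypothetical Spanish candidates for "My name is Gary, I am from Paris."
    candidates = [
        "Me llamo Gary y soy de París.",
        "Mi nombre es Gary, soy de París.",
        "Hola, mi nombre es Gary y vengo de la ciudad de París.",
    ]
    # Hypothetical 2-second segment and an assumed rate of 14 characters per second.
    best = pick_best_fit(candidates, target_seconds=2.0, chars_per_second=14.0)
    print(best.text, round(best.est_seconds, 2))  # selects the second candidate
```

With these assumed numbers, the second candidate (roughly 1.9 seconds) lands closest to the 2-second target and is chosen. A real system would more plausibly estimate duration with a text-to-speech engine or a learned model, and speaking rates vary widely by language (a single Japanese character usually corresponds to more speech than a single Latin letter), which is why the rate is left as a parameter.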
How do you translate a video with Hello8?
Gary Sztajnman: We have a three-step process.
- First, upload the video and select the target language. For example, if we recorded this interview in English and want to make it available to Spanish speakers, I would upload the video and select Spanish.
- Next, after transcription and translation, you see the original English text next to the Spanish text. Here, you can change words, proofread, and make sure you are happy with every word that will be spoken.
- Finally, when you are ready, launch the dubbing and lip sync. The video’s entire audio is replaced with dubbed audio, and the lips are synced so that it looks as if we are speaking Spanish. At any point in the process, I can review the video, make changes, and share it with my team for their input.
What are the key features of Hello8?
| Feature | Description |
| --- | --- |
| Video Upload | Upload videos and select the target language |
| Transcription & Translation | View original and translated text; proofread and make changes |
| Dubbing & Lip Sync | Replace audio with dubbed audio and sync lips to the new language |
| Subtitles | Add subtitles to videos |
| Collaboration Tools | Collaborate with your team; organize videos in folders |
| Review & Edit | Review and edit translations at any stage of the process |
| Team Sharing | Share videos with your team for feedback and approval |
We have added features such as subtitles and collaboration tools, including folders, to help you organize all your videos and work with your team. Many of our customers are agencies or companies, so collaboration is key. The main concept of Hello8 is to upload, proofread, and then dub with lip sync.
These features make Hello8 a comprehensive video translation, dubbing, and collaboration solution.
How does Hello8 handle the diversity of languages?
Gary Sztajnman: Today, we can translate from 29+ languages. So you can take a video from, let’s say, French into Spanish, Chinese, or Japanese. As of today, however, you cannot request, for example, an English translation with a Canadian or South African accent.
How does Hello8 handle voice cloning to ensure the translated video respects the original speaker’s voice patterns and style?
Gary Sztajnman: We have a process called voice cloning. With voice cloning, the AI learns patterns in your voice, including tone and style. It captures aspects like pauses, speed, loudness, and softness. When you get the translation, the video respects all these aspects of your tone and style, which helps preserve specific cultural nuances. We would love for you to be able to select an accent or dialect with particular nuances, but we are not able to do this today. I want to be honest about that, but we are working on it.
What integrations does Hello8 offer?
Gary Sztajnman: When we launched Hello8, we thought that YouTubers and influencers would be among our main customers. YouTube is working on a new feature that lets viewers choose the subtitle language and select voice tracks in different languages. You can watch a video in English and select the French voice track in the settings. This feature is currently available only to a very small set of YouTubers, such as MrBeast and other big names, and is not widely available. We are talking with many YouTubers who tell us that as soon as they can upload different voice tracks, they will start using Hello8. We will work on this YouTube integration when the feature becomes more widely available.
How does Hello8 plan to integrate with other social media and video platforms?
Gary Sztajnman: When other social media platforms introduce similar features, we will integrate with Vimeo and other platforms. Today, the process is quite simple: you translate your video, download it, and then upload it wherever you want. However, integration is key. For YouTubers, the idea is to translate and push to YouTube. Companies may want the translated video pushed to their internal training systems or to platforms like Learning360 or Coursera. The integration process varies depending on the customer.
Is my data secure with Hello8?
Gary Sztajnman: Yes. We are playing with AI models that are crazy powerful but can be scary, and I understand people who would be scared of an AI that can speak like them. The first thing I want to say is that when it comes to security, we have set the highest standards of encryption at every step of the process and are focused on security.
How does Hello8 ensure user control and privacy in the use of AI technology?
Gary Sztajnman: When it comes to ethics and privacy, we want the user to be in control and able to say, “I agree that my voice is going to be dubbed and cloned.” If you do not want your voice to be cloned, that is okay. We won’t clone your voice, and we won’t teach a model to speak like you. We give you other voices within our platform, such as pre-made voices that you can use. Some of our users come in and say, “I want to translate my video, but I do not want my own voice to be used.” They can then use the voice of John, Paul, or Emily, which are pre-made voices in the software.
What measures does Hello8 take to ensure AI-modified content’s ethical use and authenticity?
Gary Sztajnman: Considering that this entire industry uses these super powerful AI models, we need to set super high standards for tracking what is original content and what has been modified by AI. As a startup, we are joining an industry standard called C2PA. The goal is to integrate an invisible forensic watermark into the video, which allows us to check the video and confirm whether AI has modified it. I need to say that we are not in the business of fake news. We have very strong technology, but we only do translation. Even if we use your voice and modify your video, we will not make you say something else. We will make you say the same thing you said in the original language, but in another language. This is super key.
These AI technologies are scary and powerful, and we need to set the proper boundaries so that within the user experience, you feel confident, secure, and in control.
How does Hello8 handle voice cloning consent and copyright issues?
Gary Sztajnman: The first step is to ask for consent when cloning a voice. Consent is super important if you are going to use someone’s voice. Copyright, on the other hand, is super complicated. Videos can involve different copyrights, and translating them into a new language can involve still others. Right now, many countries have a gap in their regulations and laws on using voices and on copyright. We tell you that you are in charge of and responsible for the content. We do not claim any copyright on our side. The content you upload and translate with the platform is your content, and you own it from A to Z. But if other copyrights are involved, you are responsible for them. We are just the tool that helps you do the translation. If you are training a voice model, make sure you have the consent of the person whose voice is used in the video.
How can Hello8 help increase sales and engagement for e-commerce businesses?
Gary Sztajnman: We have worked with many companies, translating their internal training videos. We have also worked with e-commerce businesses, such as online stores. For example, we worked with a guy who sells iPhone accessories. His primary market is France, and he creates marketing videos in French for social media. These ads lead customers to his website to buy his accessories. One day, he came to Hello8, translated his ad into Italian, and started targeting the Italian market. Suddenly, he saw a massive increase in sales in the Italian market, significantly boosting his revenue. He even told me that the ad in Italian performed better than the one in English. This case shows how translation can open new doors and increase sales. It was a great success story, and I’m excited to help more entrepreneurs and e-commerce businesses use Hello8 to translate their ads into other languages and increase their revenue.
How will AI evolve in the video translation and dubbing industry over the next five years?
Gary Sztajnman: AI will be everywhere, impacting movies, TV shows, NGOs, and more. We will see much more content being translated with AI. One challenge today is live video translation, meaning instant translation. If you upload a one-minute video, it might take one or two minutes to translate it, but soon this will be much faster. We’ll reach a stage where you can go on a Zoom call and speak in French, and the other person will hear it in English or their native language. This will be an exciting moment, allowing real-time multilingual communication. We are working on this technology, and while it is challenging, we know it is achievable. Lip-sync models are very heavy and require refining many models to get them right. However, real-time dubbing, where only the voice is dubbed, will likely happen within one or two years, not five. Technology is advancing rapidly, so these advancements will come sooner than expected. Real-time translation and the sheer volume of translated content will explode.
How will AI-driven translation impact the need to learn new languages?
Gary Sztajnman: We will see two different trends. Translation for business purposes will likely be handled by AI. However, speaking different languages has a cultural aspect involving real-time, real-life connections. People will still speak multiple languages, driven by passion and a desire to learn about other cultures. Those who speak multiple languages today are often passionate about languages. Speaking different languages will remain necessary for cultural understanding and connection. Even if AI is impressive, there will still be a difference between translating words and understanding the nuances of a language and its cultural context. People will continue learning different languages; that will not disappear.
Can AI replace human intelligence?
Gary Sztajnman: No. There is no doubt that some very well-defined tasks are going to be replaced. But I see AI much more as a tool. I do not want AI to replace me; I want to work with AI to get the right results. In video translation, AI does 99% of the work, but I can still edit, change, and control it. I am in control. In creative fields, we’ve seen models like Suno AI for music and Sora for video. These models will not replace us but will change how we work. Some people will lose their jobs, but new jobs will also be created. Is it a one-to-one ratio? We don’t know. We must embrace AI, regulate it, help people transition, and see it as a tool. That’s the only way we will be successful.
Any advice for young entrepreneurs?
Gary Sztajnman: Try to solve a problem; find a problem that you want to solve. Once you find one, try to speak to about 20 people who have the same problem and understand why they have it. This is true for any company or business. What problem are you trying to solve, and who are you trying to help? Now, specifically for AI, if you have a real problem to solve, you will find a technical solution. You will find the people and the tech. That is not the issue. It is hard, but you will find a solution. Think about the risk of being “Sherlocked” in the next year. “Sherlocked” is an expression from the Apple world for when an app dies because Apple introduces a similar feature. Do not build a technology or business that ChatGPT or a similar tool could make obsolete in three months. Ask yourself whether your product is just a feature that ChatGPT could easily replace. What we are building with Hello8 are workflows and the experience of editing a video, which is not something you will do in a chat. We combine multiple AI models, not just one. Speak to people, your customers, and users. Make sure your product cannot be easily replicated by big companies like OpenAI, Apple, or Microsoft, because you will not stand a chance against them.
Any message for viewers/readers?
Gary Sztajnman: I imagine that people watching these videos are interested in AI, as I am. I have been in this field for the last 10 years, and we all want to innovate and define tomorrow’s future. We need to think about what AI for good means. You don’t have to build something that saves lives or the planet (if you can do that, it’s unique), but even a business tool can be valuable. What is the essence of what you do? How is it going to help people? I don’t usually put this forward because I don’t think, “Oh my God, we are doing AI for good.” We are not saving people; I just want to be honest. But when I look at so many wars happening on Earth, I believe that if people could speak a common language and understand each other better, they wouldn’t fight as much. You’re already halfway to peace if you can speak with your enemy. Bringing technology and AI to something essential and good for humanity will make you more passionate about your work. This belief has always been within me: we need to unite people in this world, and AI can help us do this.
Key Takeaways
In our interview with Gary Sztajnman, co-founder of Hello8, he shares his journey from business and video to AI and data science, ultimately leading to the creation of Hello8. The platform aims to revolutionize video translation by providing high-quality dubbing and lip-syncing, making videos accessible in various languages while maintaining the original essence.
Gary highlights Hello8’s unique features, such as seamless integration into video workflows, editable translations, voice cloning, and support for 29+ languages, with accent and dialect selection still in development. He emphasizes the importance of security, consent, and ethical considerations in voice cloning and translation processes.
Gary also discusses the future of AI in the video translation industry, predicting advancements in real-time translation and increased automation in workflows. He advises young entrepreneurs to focus on solving real problems and validating their ideas with potential clients to build successful AI-driven solutions.