Realtime AI video games, AI makes antibodies, video to 3D, colorize animations, new TTS

AI News and Tools: A Summary

Key Concepts:

AI-powered image editing, 360° video generation, animation colorization, real-time video game creation, 3D scene reconstruction, antibody design, reference-to-image generation, depth estimation, multimodal language models, text-to-speech generation.

1. Calligrapher: AI Text Editing in Images

Main Topic: AI tool for editing text within images while preserving the original style and font.
Key Points:
- Can change text in existing images, maintaining font, style, and overall image integrity.
- Allows users to upload a reference image to dictate the desired text style or font.
- Can use abstract images (e.g., lightning bolts, fire) as style references for text generation.
Examples:
- Changing "Groundhog" to "Groundhog Day" while keeping the original font.
- Using a lightning bolt image to style the text "action."
Process: Input image + desired text change + (optional) reference image for style.
Availability: Open-source code and models available on GitHub with a graphical interface.

2. X4D: Image and Video to 360° Video Conversion

Main Topic: AI tool by Pico and ByteDance that converts images or videos into 360° videos.
Key Points:
- Generates 360° videos from input videos, allowing exploration from multiple angles.
- Extrapolates and guesses the appearance of unseen parts of the scene.
- Can create a complete 360° world from a single image.
Examples:
- Converting a video generated with Veo 3 into a 360° scene.
- Generating a 360° world from a single image of a room.
Technical Details: Builds on Alibaba's Wan 2.1 for video generation. Requires a minimum of 48 GB VRAM.
Availability: Open-source code available on GitHub.

3. LongAnimation: Automatic Animation Colorization

Main Topic: AI tool for automatically colorizing long animations while maintaining color consistency.
Key Points:
- Takes a reference colored image and a sketch animation as input.
- Colorizes the sketch animation based on the reference image, saving animators time.
- Can work with long videos (e.g., 27-second clips).
- Can use partially colored images and prompts to create backgrounds.
Examples:
- Colorizing a sketch animation of characters based on a single colored frame.
- Generating a beach background for a partially colored animation with a prompt.
Performance: Outperforms existing colorization methods in terms of color consistency, quality, and sharpness.
Technical Details: Inference tested on one A100 GPU with 80 GB of VRAM.
Availability: Open-source code available on GitHub.
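
The VRAM figures quoted for the 360° video tool (48 GB minimum) and for this colorization tool (tested on one 80 GB A100) can be turned into a quick hardware check. The following is a minimal sketch: the dictionary keys and the helper name `tools_that_fit` are illustrative and do not come from either project, and the 80 GB figure is a tested configuration rather than a stated minimum.

```python
# VRAM figures as quoted in this summary, in gigabytes.
# Keys are descriptive labels chosen for this sketch, not official project names.
QUOTED_VRAM_GB = {
    "360_video_tool": 48,       # stated minimum
    "animation_colorizer": 80,  # tested on one A100; not a stated minimum
}


def tools_that_fit(available_gb: float) -> list[str]:
    """Return the tools whose quoted VRAM figure is at or below the available VRAM."""
    return sorted(
        name for name, needed in QUOTED_VRAM_GB.items() if available_gb >= needed
    )


if __name__ == "__main__":
    for gb in (24, 48, 80):
        print(f"{gb} GB -> {tools_that_fit(gb)}")
```

Under these assumptions, a 24 GB consumer card fits neither tool, a 48 GB card fits only the 360° video tool, and an 80 GB card fits both.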

4. Mirage: Real-Time Video Game Creation

Main Topic: AI tool that allows users to create and play any video game in real time using text prompts.
Key Points:
- Generates video games on the fly without requiring downloads or predefined designs.
- Users can control game elements and character movements with text prompts and standard controls (WASD, arrow keys).
- Can generate various video game styles (e.g., Super Mario, GTA, Forza).
Examples:
- Generating a Super Mario-style game in real time.
- Creating GTA-style gameplay with an interactive map.
Demos: Two playable demos available: "Urban Chaos" (GTA-style) and "Coastal Drift" (Forza-style).
Comparison: More interactive than previous real-time video game generators like Google's GameNGen and Microsoft's DIAMOND.
Status: Currently in research preview.

5. LangScene-X: 3D Scene Reconstruction from Images

Main Topic: AI tool that builds 3D scenes from a few images of the scene.
Key Points:
- Generates a full 3D video from limited views of a scene.
- Can generate a segmentation map of the scene, automatically segmenting objects.
- Can estimate the orientation of surfaces (normal estimation).
- Can detect specific objects in the scene by prompting it.
Process: Input multiple views of a scene -> TriMap video diffusion model -> field constructor component -> 3D scene reconstruction -> normal estimation, segmentation map, or object detection.
Performance: More accurate than leading segmentation and normal estimation techniques.
Technical Details: Uses SAM, SAM 2, and CogVideoX.
Availability: Open-source code available on GitHub.

6. Chai 2: AI-Powered Antibody Design

Main Topic: AI model that designs new antibodies from scratch.
Key Points:
- Can design antibodies specific to certain protein targets (e.g., cancer cells).
- Generates antibodies zero-shot, requiring only a single prompt.
- Reported success rate is about 100 times higher than that of previous computational methods.
- Also designs other proteins, such as miniproteins, with a high success rate (68%).
- Faster and more efficient than previous methods, finding potential antibody candidates in just 2 weeks.
Availability: Open for early access to academic and industry partners.

7. XVerse: Reference-to-Image Generation

Main Topic: Reference-to-image generator by ByteDance that transfers reference images of people or objects into a new photo.
Key Points:
- Accurately transfers faces and objects from reference photos into new scenes.
- Can take in multiple reference images and incorporate them into a single scene.
- Can transfer styles and lighting from reference images.
Technical Details: Features a T-Mod adapter, a text-stream modulation mechanism, and a VAE-encoded image feature module.
Performance: Outperforms other reference-to-image tools like DreamO, OmniGen2, and UNO, especially in multi-object transfer.
Availability: Dataset and models available on GitHub with a graphical interface.

8. Depth Anything at Any Condition: Improved Depth Estimation

Main Topic: AI tool for estimating the depth of objects in images, even in challenging conditions.
Key Points:
- Takes a single image as input and outputs a depth map.
- Generates more detailed and accurate depth maps compared to previous versions.
Performance: Significantly better than the original "Depth Anything" tool, especially in noisy or poorly lit conditions.
Availability: Free Hugging Face space for online use and open-source code on GitHub.
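
To make the term "depth map" concrete: it is a 2D grid holding one distance value per pixel, usually rescaled to grayscale for viewing. The sketch below is a generic illustration in plain Python, not code from this tool. The function name `depth_to_grayscale`, the sample values, and the choice to render near pixels as bright (a common but not universal convention) are all assumptions.

```python
def depth_to_grayscale(depth, near_is_bright=True):
    """Rescale a 2D grid of depth values to integers in 0..255 for display."""
    flat = [d for row in depth for d in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    image = []
    for row in depth:
        new_row = []
        for d in row:
            # t runs from 0 (nearest pixel) to 1 (farthest pixel);
            # a completely flat map has no range, so treat every pixel as nearest.
            t = 0.0 if span == 0 else (d - lo) / span
            if near_is_bright:
                t = 1.0 - t
            new_row.append(round(t * 255))
        image.append(new_row)
    return image


if __name__ == "__main__":
    # Hypothetical 2x3 depth map in metres (made-up values for illustration).
    sample = [[1.0, 2.0, 3.0],
              [1.0, 2.0, 5.0]]
    for row in depth_to_grayscale(sample):
        print(row)
```

With the default setting, the nearest pixels come out as 255 (white) and the farthest as 0 (black); passing `near_is_bright=False` inverts the mapping.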

9. Autonomous Robot Soccer

Main Topic: First fully autonomous robot soccer match.
Key Points:
- Robots played without human control, using visual sensors to track the ball and navigate the field.
- Robots were able to regain their footing after falling.
Outcome: The robots stumbled and moved slowly, but the match demonstrates progress in humanoid robot technology.

10. Ovis-U1: Multimodal Language Model

Main Topic: Multimodal language model that can chat, analyze images, generate images, and edit images.
Key Points:
- Can summarize images, recognize text in images, generate images from text prompts, and edit images with text prompts.
- Can extract subjects from an image onto a white background.
Performance: Competitive with GPT-4o in some areas and outperforms other image generators in certain benchmarks.
Availability: Free Hugging Face space for online use and open-source code on GitHub.

11. Kyutai TTS: Open-Source Text-to-Speech Generator

Main Topic: Open-source text-to-speech generator that clones voices from a few seconds of audio.
Key Points:
- Can clone voices and generate speech with different expressions.
- Can generate long audio clips (minutes long) without significant quality loss.
- Limited to English and French.
Real-Time Platform: Can be used in real time via the online platform "Unmute."
Performance: Claims to outperform ElevenLabs in some benchmarks, but comparisons omit state-of-the-art alternatives.
Availability: Open-source code available on GitHub.

Conclusion:

This week in AI saw significant advancements across various domains, from image and video manipulation to game creation and scientific research. The availability of open-source code and models for many of these tools lets developers and researchers explore and improve upon these technologies. While some tools are still in early stages or have limitations, they offer a glimpse into the future of AI-driven creativity and problem-solving.