New AI video model, AI operating system, self charging robots, ChatGPT Agent, Kimi K2

By AI Search

Key Concepts

Pusa (open-source video generator), SpatialTrackerV2 (camera and object motion tracking), HopeJR (open-source robotic arm), NeuralOS (simulated operating system), ChatLLM (AI platform), Kimi K2 (open-source non-thinking model), Epona (driving scene generator), Digit (Agility Robotics' robot), CL3 (LimX Dynamics' robot), Walker S2 (UBTECH Robotics' robot), PhysX (3D object generator with physical properties), ChatGPT Agent (OpenAI's agentic feature), CLiFT (3D scene reconstruction from photos), MoVieS (4D scene reconstruction from video).

Pusa: Open-Source Video Generator

  • Main Topic: Introduction of Pusa, a new open-source video generator.
  • Key Points:
    • Based on Alibaba's Wan 2.1 but fine-tuned for better performance.
    • Training cost about 200 times less, and used a dataset about 2,500 times smaller, than training Wan 2.1 from scratch.
    • Five times faster than Wan 2.1.
    • Requires fewer inference steps.
    • Flexible: supports image-to-video, video extension, and text-to-video.
    • Uses vectorized timestep adaptation for realistic video generation.
  • Examples:
    • Microscopic view of cells undergoing mitosis forming a smiley face.
    • Ice cream machine extruding a transparent frog.
    • Piggy bank surfing.
    • 360° video of a camel walking in the desert.
    • Car changing from gold to white.
    • Person eating a hot dog.
  • Technical Details: Vectorized timestep adaptation.
  • Availability: Model and code available on Hugging Face and GitHub.
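
The idea behind vectorized timestep adaptation is that each frame gets its own noise level instead of one value shared by the whole clip, which is what lets a single model cover image-to-video, video extension, and text-to-video. A minimal sketch of that idea follows. It assumes a simple linear blend between a clean frame and Gaussian noise; the function name `noise_frames`, the flat-list "frames", and the blend rule are illustrative assumptions, not Pusa's actual code.

```python
import random

def noise_frames(frames, timesteps, rng):
    """Blend each frame with Gaussian noise using that frame's own timestep.

    frames:    list of frames, each a flat list of floats (a stand-in for latents)
    timesteps: one value in [0, 1] per frame; 0 keeps it clean, 1 is pure noise
    """
    if len(frames) != len(timesteps):
        raise ValueError("need exactly one timestep per frame")
    noisy = []
    for frame, t in zip(frames, timesteps):
        # Linear interpolation between the clean frame and Gaussian noise.
        noisy.append([(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in frame])
    return noisy

rng = random.Random(0)
clip = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(4)]

# Image-to-video style conditioning: frame 0 stays clean, later frames are pure noise.
i2v = noise_frames(clip, [0.0, 1.0, 1.0, 1.0], rng)

# Video-extension style conditioning: the first two frames stay clean.
ext = noise_frames(clip, [0.0, 0.0, 1.0, 1.0], rng)
```

With a single shared timestep, every frame would have to be noised equally, so a clean conditioning frame could not be mixed with frames that still need to be generated.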

SpatialTrackerV2: Camera and Object Motion Tracking

  • Main Topic: Introduction of SpatialTrackerV2, an AI tool for tracking camera and object motion in 3D space.
  • Key Points:
    • Tracks camera movement and object trajectories in videos.
    • Reconstructs 3D scenes from videos.
    • Free Hugging Face space available for online testing.
    • GitHub repo with instructions for local installation.
  • Examples:
    • Tracking a person riding a motorcycle.
    • Tracking a car drifting.
    • Tracking a basketball being passed around.
    • Tracking a person breakdancing.
    • Tracking a robot dog.
  • Step-by-step process: Upload a video to the Hugging Face Space, then press "start tracking now".
  • Availability: Free Hugging Face space and GitHub repo.

HopeJR: Open-Source Robotic Arm

  • Main Topic: Introduction of HopeJR, a completely open-source robotic arm.
  • Key Points:
    • Can be 3D printed and assembled for around $500.
    • 23 degrees of freedom, including 16 in the hand.
    • Uses LeRobot, an open-source library, for control.
    • Can be teleoperated via an exoskeleton or sensor gloves.
  • Step-by-step process: Print the parts using the provided templates, assemble the arm, download and set up the LeRobot software, and connect it to the arm.
  • Availability: Comprehensive build guides and documentation on GitHub.

NeuralOS: Simulated Operating System

  • Main Topic: Introduction of NeuralOS, an AI model that simulates an operating system.
  • Key Points:
    • Nothing is predefined or hard-coded; everything is generated on the fly.
    • Responds to mouse movements and key presses in real time.
    • Demonstration includes simulated Firefox browser and file system.
  • Examples:
    • Opening a simulated Firefox browser and typing in a URL.
    • Navigating to the trash and home folder.
  • Availability: Instructions for creating a similar operating system neural network are available.

ChatLLM: AI Platform

  • Main Topic: Promotion of ChatLLM, an all-in-one platform for using AI models.
  • Key Points:
    • Seamlessly switch between different models.
    • Includes image and video generators.
    • Artifacts feature for previewing generations.
    • DeepAgent feature for complex tasks such as creating PowerPoint decks and websites.
    • Subscription costs $10 per month.

Kimi K2: Open-Source Non-Thinking Model

  • Main Topic: Introduction of Kimi K2, a state-of-the-art open-source non-thinking model.
  • Key Points:
    • Developed by Moonshot AI.
    • Excels in frontier knowledge, math, and coding.
    • Mixture-of-experts model with 32 billion activated parameters and 1 trillion total parameters.
    • Two variants: base model for research and instruct variant for general chat.
    • Non-thinking model: responds faster than thinking models.
    • Outperforms other non-thinking models, and even some closed-source models such as Claude Opus 4 and GPT-4.1, on certain benchmarks.
    • Excellent at tool use, including web searching and Python coding.
  • Examples:
    • Creating an interactive visualization website for an NLP lab.
    • Creating a web version of Minecraft.
    • Creating an interactive viewer of planet Earth with dynamic lines of the magnetosphere.
    • Turning a YouTube video into an interactive slide deck.
    • Creating an interactive Pokédex.
    • Creating interactive educational pages for theorems.
  • Technical Details: Mixture-of-experts model, non-thinking model.
  • Availability: Completely open-source; can be downloaded and run locally. API access available for $1.3 per 1 million tokens. GitHub repo with links to Hugging Face.
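
To make the mixture-of-experts figures concrete: only a small share of the model's parameters runs for any given token. The sketch below computes that share from the two numbers in this summary and shows generic top-k expert routing. The expert count, the value of k, and the gate scores are made-up illustration values, not Kimi K2's actual configuration, and `active_fraction` and `top_k_route` are hypothetical helper names.

```python
import math

def active_fraction(active_params, total_params):
    """Share of parameters used per token in a mixture-of-experts model."""
    return active_params / total_params

def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    # Subtract the largest chosen score before exponentiating, for numerical stability.
    exps = [math.exp(gate_scores[i] - gate_scores[chosen[0]]) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Figures from the summary: 32 billion activated out of 1 trillion total parameters.
fraction = active_fraction(32e9, 1e12)  # about 0.032, i.e. roughly 3.2%

# Hypothetical gate scores for 8 experts; route the token to the top 2.
routing = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.7], k=2)
```

Here `routing` holds the indices of the two selected experts with weights that sum to 1; the other experts do no work for this token, which is why per-token compute tracks the activated parameter count rather than the total.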

Epona: Driving Scene Generator

  • Main Topic: Introduction of Epona, an AI tool for generating driving scenes.
  • Key Points:
    • Extends driving videos from a few initial frames.
    • Allows control of the car's movements and trajectories.
    • Generates realistic driving videos for training autonomous driving algorithms.
    • Uses a multimodal spatiotemporal transformer, a trajectory-planning diffusion transformer, and a next-frame-prediction diffusion transformer.
  • Technical Details: Multimodal spatiotemporal transformer, next-frame-prediction diffusion transformer, trajectory-planning diffusion transformer.
  • Availability: GitHub repo with instructions for local installation.
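
Extending a clip from a few initial frames under a chosen trajectory amounts to an autoregressive loop: predict one frame, append it, and feed it back as context. The sketch below shows only that control flow. The names `rollout` and `toy_predictor` are hypothetical, a "frame" is reduced to an (x, y) car position, and the predictor is a trivial stand-in for the learned model, so this is not the tool's actual interface.

```python
from collections import deque

def rollout(initial_frames, trajectory, predict_next, context_len=3):
    """Extend a clip one frame at a time from recent frames and a planned action."""
    context = deque(initial_frames, maxlen=context_len)
    video = list(initial_frames)
    for action in trajectory:
        frame = predict_next(list(context), action)
        video.append(frame)
        context.append(frame)  # the new frame becomes context for the next step
    return video

def toy_predictor(context, action):
    """Stand-in for a learned model: move the last position by the action."""
    x, y = context[-1]
    dx, dy = action
    return (x + dx, y + dy)

start = [(0, 0), (1, 0), (2, 0)]   # a few initial "frames"
plan = [(1, 0), (1, 1), (1, 1)]    # go straight, then steer diagonally
video = rollout(start, plan, toy_predictor)
```

Changing `plan` changes the generated continuation while the initial frames stay the same, which mirrors the trajectory control described in the key points.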

Humanoid Robot News: Digit, CL3, Walker S2

  • Main Topic: Updates on humanoid robots from Agility Robotics, LimX Dynamics, and UBTECH Robotics.
  • Key Points:
    • Digit (Agility Robotics): Regains its balance on its own after running into obstacles. Reverse-knee design. Can carry up to 35 lbs. Partnered with Ford and being tested at Amazon.
    • CL3 (LimX Dynamics): Impressive dance routine. Stands 5'5" tall and weighs 45 kg. Flexible and learns from videos. Uses a proprietary hollow actuator design.
    • Walker S2 (UBTECH Robotics): Autonomously swaps its own battery. Designed for factories, logistics, and manufacturing. Deployed at BYD.

PhysX: 3D Object Generator with Physical Properties

  • Main Topic: Introduction of PhysX, an AI tool for generating 3D objects with real-world physical properties.
  • Key Points:
    • Generates objects with correct size, material, and movement.
    • Determines the material of each object and segments it logically.
    • Can perform wind tests and ball tests on the generated objects.
  • Examples:
    • Generating a knife and identifying its handle and cutting part.
    • Generating a chair and identifying its size and material.
    • Generating a handbag and identifying its material and carrying part.
  • Availability: Dataset available on Hugging Face. Code coming soon on GitHub.

ChatGPT Agent: OpenAI's Agentic Feature

  • Main Topic: Introduction of ChatGPT Agent, a new feature in ChatGPT for autonomous task completion.
  • Key Points:
    • Can autonomously book flights, create spreadsheets, and create slideshows.
    • Integrates with Gmail, Google Docs, Microsoft 365, Notion, and Slack.
    • Comparable to or better than humans in roughly half the cases on complex, economically valuable knowledge-work tasks.
    • Requires a Pro, Plus, or Team subscription.
    • Limited to 400 messages per month for Pro users and 40 messages per month for other tiers.
  • Benchmarks:
    • Humanity's Last Exam: 41.6%
    • SpreadsheetBench: 45.5%
  • Availability: Rolls out to Pro, Plus, and Team users first, followed by other tiers.

CLiFT: 3D Scene Reconstruction from Photos

  • Main Topic: Introduction of CLiFT, an AI tool for reconstructing 3D scenes from photos.
  • Key Points:
    • Creates new views of a scene that aren't in the original images.
    • Uses compressed light field tokens to reduce data storage while maintaining quality.
    • Three-stage process: tokenization, clustering, and compression.
  • Technical Details: Compressed light field tokens, tokenization, clustering, compression.
  • Availability: GitHub repo with code to be released around August 1st.
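
The clustering and compression stages can be pictured as reducing many scene tokens to a few representative ones. The sketch below is a rough illustration under stated assumptions: it uses plain k-means with simple averaging in place of the learned components, starts from hand-written 2-D "tokens" instead of a real image encoder, and the name `compress_tokens` is hypothetical. It is not the released method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def compress_tokens(tokens, k, iters=10):
    """Cluster token vectors with k-means and keep one averaged token per cluster."""
    # Deterministic start: pick k evenly spaced tokens as the initial centroids.
    centroids = [tokens[i * len(tokens) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for tok in tokens:
            nearest = min(range(k), key=lambda c: sq_dist(tok, centroids[c]))
            clusters[nearest].append(tok)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

# Tokenization stand-in: pretend an encoder already turned image patches into vectors.
tokens = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
compact = compress_tokens(tokens, k=2)  # six tokens reduced to two
```

Choosing a smaller `k` stores fewer tokens at the cost of detail, which is the storage-versus-quality trade-off the key points describe.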

MoVieS: 4D Scene Reconstruction from Video

  • Main Topic: Introduction of MoVieS, an AI tool for reconstructing 4D scenes from video.
  • Key Points:
    • Reconstructs a 3D scene from a video, including the motion of the camera.
    • Can stabilize shaky videos.
    • Predicts the depth and motion of the video.
    • Tracks certain parts of the video as they move.
    • Segments different objects in the video.
    • Uses an image encoder and three attention heads to detect depth, appearance, and motion.
  • Technical Details: Image encoder, attention heads for depth, appearance, and motion.
  • Availability: GitHub repo with code to be released.

Synthesis/Conclusion

This week in AI brought advances in video generation, motion tracking, robotics, and 3D modeling. Pusa offers a faster and more efficient open-source video generation solution. SpatialTrackerV2 provides a powerful tool for analyzing motion in videos. HopeJR makes robotics more accessible with its open-source design. Kimi K2 stands out as a highly performant open-source non-thinking model. PhysX brings a new level of realism to 3D object generation by incorporating physical properties. Finally, ChatGPT Agent aims to enhance productivity through autonomous task completion, although its message limits and competition from other models should be considered. These developments highlight the rapid pace of innovation in AI and its potential to transform various industries.
