AI News – All About AI

Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities

The open-source AI landscape has a new entry worth paying attention to. The Qwen team at Alibaba has released Qwen3.6-35B-A3B, the first open-weight model from the Qwen3.6 generation, and it is making a compelling argument that parameter efficiency matters far more than raw model size. With 35 billion total parameters but only 3 billion activated…
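To make the "35 billion total, 3 billion activated" figure concrete, here is a minimal sketch of sparse mixture-of-experts routing. The excerpt does not describe Qwen's actual router, so the top-k scheme, the expert count, and the function names below are illustrative assumptions; only the two parameter counts come from the text.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their weights.

    This is one common MoE routing variant, shown for illustration only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


# Parameter counts quoted in the excerpt.
TOTAL_PARAMS = 35e9
ACTIVE_PARAMS = 3e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # 8.6%

# Hypothetical toy router: 8 experts, 2 used per token (not Qwen's real config).
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(8)]
chosen = route_top_k(logits, k=2)
for expert, weight in chosen:
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total.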

A Hands-On Coding Tutorial on Qualcomm AI Hub Models for Classification, Object Detection, and Hardware-Aware Deployment

AI News, June 6, 2026

In this tutorial, we work through an end-to-end workflow for Qualcomm AI Hub Models. We start by setting up the required package, discovering the available model collection, and loading MobileNet-V2 for local PyTorch inference. We also handle an important input-shape issue by converting NHWC image tensors into the NCHW format expected by the model. From…
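The NHWC-to-NCHW conversion mentioned above amounts to a single axis permutation. The sketch below uses NumPy so it runs without a deep learning framework; on a PyTorch tensor the same axis order would go to `tensor.permute(0, 3, 1, 2)`. The helper name and the 224x224 RGB input shape are assumptions for illustration, not details taken from the tutorial.

```python
import numpy as np


def nhwc_to_nchw(batch: np.ndarray) -> np.ndarray:
    """Reorder a (N, H, W, C) image batch to (N, C, H, W)."""
    if batch.ndim != 4:
        raise ValueError(f"expected a 4-D batch, got shape {batch.shape}")
    # Move the channel axis (index 3) to position 1; H and W shift right.
    # ascontiguousarray makes the result a dense copy rather than a strided view.
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))


# Hypothetical input: one 224x224 RGB image in channels-last layout.
nhwc = np.zeros((1, 224, 224, 3), dtype=np.float32)
nhwc[0, 5, 7, 2] = 1.0  # mark row 5, column 7 in the last channel

nchw = nhwc_to_nchw(nhwc)
print(nchw.shape)  # (1, 3, 224, 224)
```

The marked pixel ends up at `nchw[0, 2, 5, 7]`, which is a quick way to check that the axes were moved rather than the data merely reshaped.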

Google DeepMind Introduces Vision Banana: An Instruction-Tuned Image Generator That Beats SAM 3 on Segmentation and Depth Anything V3 on Metric Depth Estimation

AI News, June 1, 2026

For years, the computer vision community has operated on two separate tracks: generative models (which produce images) and discriminative models (which understand them). The assumption was straightforward — models good at making pictures aren’t necessarily good at reading them. A new paper from Google, titled “Image Generators are Generalist Vision Learners” (arXiv:2604.20329), published April 22,…

Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo

AI News, May 27, 2026

If you’ve ever watched a motion capture system struggle with a person’s fingers, or seen a segmentation model fail to distinguish teeth from gums, you already understand why human-centric computer vision is hard. Humans are not just objects: they come with articulated structure, fine surface details, and enormous variation in pose, clothing, lighting, and ethnicity.…

How to Build a Lightweight Vision-Language-Action-Inspired Embodied Agent with Latent World Modeling and Model Predictive Control

AI News, May 22, 2026

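import random
import numpy as np
import torch, torch.nn as nn, torch.nn.functional as F
import matplotlib.pyplot as plt
from dataclasses import dataclass
from typing import Tuple, Dict, List
from torch.utils.data import Dataset, DataLoader

# Fall back to a no-op progress wrapper when tqdm is not installed.
try:
    from tqdm.auto import tqdm
except Exception:
    def tqdm(x, **kwargs): return x

# Seed every random number generator for reproducibility.
SEED = 7
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)

# Assumption: the excerpt uses `device` without defining it, so the usual
# CUDA-or-CPU selection is filled in here.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    torch.backends.cudnn.benchmark = True

@dataclass
class WorldConfig:
    …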

Microsoft Research’s World-R1 Uses Flow-GRPO and 3D-Aware Rewards to Inject Geometric Consistency Into Wan 2.1 Without Architectural Changes

AI News, May 17, 2026

Video foundation models can paint a beautiful frame. They are still notoriously bad at remembering it. Push the camera through a corridor in Wan 2.1 or CogVideoX and walls warp, objects morph, and details vanish — the giveaway that these models are fitting 2D pixel correlations rather than simulating a coherent 3D scene. A team…

Salesforce AI Introduces FOFPred: A Language-Driven Future Optical Flow Prediction Framework that Enables Improved Robot Control and Video Generation

AI News, January 23, 2026

The Salesforce AI research team presents FOFPred, a language-driven future optical flow prediction framework that connects large vision-language models with diffusion transformers for dense motion forecasting in control and video generation settings. FOFPred takes one or more images and a natural language instruction such as ‘moving the bottle from right to left’ and predicts…

Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence

AI News, January 18, 2026

Black Forest Labs has released FLUX.2 [klein], a compact image model family that targets interactive visual intelligence on consumer hardware. FLUX.2 [klein] extends the FLUX.2 line with sub-second generation and editing, a unified architecture for text-to-image and image-to-image, and deployment options that range from local GPUs to cloud APIs, while keeping…

Thinking Machines Lab Makes Tinker Generally Available: Adds Kimi K2 Thinking And Qwen3-VL Vision Input

AI News, December 19, 2025

Thinking Machines Lab has moved its Tinker training API into general availability and added three major capabilities: support for the Kimi K2 Thinking reasoning model, OpenAI-compatible sampling, and image input through Qwen3-VL vision-language models. For AI engineers, this turns Tinker into a practical way to fine-tune frontier models without building distributed training…

Zhipu AI Releases GLM-4.6V: A 128K Context Vision Language Model with Native Tool Calling

AI News, December 9, 2025

Zhipu AI has open-sourced the GLM-4.6V series, a pair of vision-language models that treat images, video, and tools as first-class inputs for agents rather than as afterthoughts bolted on top of text.

Model lineup and context length

The series has two models. GLM-4.6V is a 106B-parameter foundation model for cloud and…