Gerard Sans

Google Developer Expert

Developer Evangelist

International Speaker

Spoken 199 times in 43 countries

Google AI Learning Path

Vertex AI

Complexity

Features

Google AI Ecosystem

Vertex AI

AI Platform

Gemini for Workspace

AI Assistant

AI Studio

AI Playground

Gemini

Foundational

Models

Gemini

Chatbot

Specialised training

Multimodal medical model

Med-PaLM

Opening the world

Gemini Ultra benchmarks

Gemini for Open Source: Gemma

Responsible AI

Reduce Biases

Safe

Accountable to people

Designed with Privacy

Scientific Excellence

Follow all Principles

Socially Beneficial

Not for Surveillance

Not Weaponised

Not Unlawful

Not Harmful

Vertex AI

Global

Google Cloud AI Platform

Scaling Generative AI

Foundational Models

Voice

text-to-speech, speech-to-text

Medical

medlm-medium medlm-large

Code

code-bison codechat-bison code-gecko

Multimodal

gemini-pro gemini-pro-vision

gemini-ultra*

1 Million tokens

Gemini 1.0 Pro 128K tokens

(100 pages)

Gemini 1.5 Pro 1M

(800 pages)
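The two figures above imply a rough tokens-per-page ratio. A minimal sketch of that arithmetic, using only the numbers quoted on the slide. The helper names `tokens_per_page` and `estimate_pages` are hypothetical, and the ratio is an approximation: real token counts depend on the tokenizer and on the text.

```python
# Context window sizes and page equivalents exactly as quoted on the slide.
CONTEXT_WINDOWS = {
    "Gemini 1.0 Pro": {"tokens": 128_000, "pages": 100},
    "Gemini 1.5 Pro": {"tokens": 1_000_000, "pages": 800},
}


def tokens_per_page(model_name):
    """Rough tokens-per-page ratio implied by the slide's figures."""
    entry = CONTEXT_WINDOWS[model_name]
    return entry["tokens"] / entry["pages"]


def estimate_pages(token_count, ratio=1250):
    """Hypothetical helper: approximate page count for a token budget.

    The default ratio of 1250 is an assumption derived from
    1,000,000 tokens / 800 pages.
    """
    return token_count / ratio


for name in CONTEXT_WINDOWS:
    print(name, tokens_per_page(name))
```

Both rows land close to the same ratio, which suggests the page counts are rounded estimates rather than measurements.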

Gemini for Google Workspace

Protection: IP infringements

All Google Services

Duet AI

Generated Outputs

Vertex AI

Training Data

Stochastic parrot or AGI?

Training: guessing the next word

Adjust model predictions using output

Wikipedia

Christopher is

Christopher Columbus was

Input

Output

Christopher Columbus discovered America

Christopher Columbus discovered America in

Christopher Columbus discovered America in 1492

Christopher Columbus discovered America in 1492.
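The word-by-word build-up above can be imitated with a toy model. This is a minimal sketch, assuming a bigram counter as a stand-in for a real language model: it only counts which word follows which in a tiny made-up corpus and always picks the most frequent follower. The corpus sentences and the function names are illustrative assumptions, not how Gemini is trained.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def generate(counts, prompt, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most frequent next word."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the last word was never seen with a follower
        tokens.append(followers.most_common(1)[0][0])
        if tokens[-1] == ".":
            break  # stop at the end of the sentence
    return " ".join(tokens)


# Made-up toy corpus; punctuation is kept as a separate token.
corpus = [
    "Christopher Columbus discovered America in 1492 .",
    "Christopher Columbus was an explorer .",
    "Christopher Columbus discovered America in 1492 .",
]

model = train_bigrams(corpus)
print(generate(model, "Christopher Columbus"))
```

Because "discovered" follows "Columbus" more often than "was" in this corpus, the greedy choice reproduces the sentence from the slide one token at a time.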

Hyper-dimensional Graph

[Figure: the words "who", "America", "Columbus" and "discover" plotted in a space with axes d_{1} to d_{1408}]

Latent Space

Word Embedding (sample values along dimensions d_{1} to d_{1408})

Columbus: 0.3, 1.2, -1, 0.9

America: 1.5, 0.3, 0.7, 0.1
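One common way to compare two word embeddings is cosine similarity. A minimal sketch, assuming the four values shown for "Columbus" and "America" are treated as complete vectors (the slide indicates 1408 dimensions, so this is only an illustration of the calculation):

```python
import math

# The sample values shown on the slides, treated here as 4-dimensional vectors.
columbus = [0.3, 1.2, -1.0, 0.9]
america = [1.5, 0.3, 0.7, 0.1]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(columbus, america)
print(round(similarity, 3))
```

Words that appear in similar contexts tend to end up with vectors pointing in similar directions, which is what makes distances in the latent space meaningful.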

A prompt will put you in a certain area of the latent space

Note the density of data points and noise ratios

AI generated text is...

Biased

Non-factual

Inaccurate

1+1= 3

Grounding: reducing hallucinations

VectorDB Embeddings

Google Knowledge Graph

Google Search

Fact Checking
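Grounding with a vector database usually means retrieving relevant text first and then placing it in the prompt. A minimal sketch, assuming word counts as a stand-in for real embeddings and an in-memory list as a stand-in for the vector database. The documents, the prompt wording and all function names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag of lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


def grounded_prompt(question, documents):
    """Build a prompt that asks the model to answer from retrieved context."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up knowledge base for illustration.
documents = [
    "Christopher Columbus reached America in 1492.",
    "The shop opens Monday to Friday from 6:30 to 13:00.",
]
question = "When did Columbus reach America?"
print(grounded_prompt(question, documents))
```

In a real system the `embed` step would call an embedding model and the ranking would be done by the vector database; the overall flow of retrieve, then prompt, stays the same.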

From idea to code

AI Studio

AI Playground

Gemini

API

Gemini

Fine-tuning

Digital art for everyone

Imagen 2: unlocking visual creativity

Generative AI for creatives

magazine style 4k photorealistic,

modern red armchair

natural lighting

Portrait of a french bulldog

at the beach,

85mm f/2.8

Assortment

of delicious,

freshly-baked donuts

Prompt

Imagen 2

Image inpainting and upscaling

Original + mask

Imagen 2

Automatic image captioning

Input

Caption

Explore images via chat with VQA

Input

Question

AI-driven interior design

Global

First steps in Generative AI

Paid access for Gemini Ultra

Imagine

Image Generation

Google Lens

Google Search

Learn

Listen Response

+40 Languages

Be Creative

C++, Go, Java, JavaScript, Python and TypeScript

Code

Generate

complex graphics

Plot

Gemini extensions!

Access your Gmail.

Do More

Deep integration with YouTube.

Save Time

US-only

Generative AI for Developers

Your sandbox for prompts

Google AI

Vertex AI

Pro 1.0

Pro 1.5

Ultra 1.0

Nano 1.0

video

11h

audio

30K

LOC

800

pages

1 Million tokens

Foundational Models

Embeddings

models/embedding-001

Gemini Pro

gemini-pro

Gemini Pro Vision

gemini-pro-vision
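The model identifiers above are what a request names when calling the API. A minimal sketch of a text request to `gemini-pro` over REST, using only the standard library. The endpoint path and JSON shape follow the public Gemini API as documented around the time of the talk and may have changed since; `build_generate_request` and the placeholder key are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the Gemini REST API (v1beta at the time of writing).
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(prompt, api_key, model="gemini-pro"):
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


request = build_generate_request("Who discovered America?", api_key="YOUR_API_KEY")
print(request.full_url)

# Sending it requires a valid API key and network access:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The request is built separately from sending so the payload can be inspected or logged before any call is made.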

New multi-modal architecture

Computer Vision tasks

Source: V7 Labs

Visual Training Datasets

A dog running

Digits 0-9

Ant

French cat

MNIST

60K

10 classes

COCO

330K

80 classes

ImageNet

14M

Image + caption

LAION (Web)

Image + text

Granular Visual Patches (ViT)

Query Patch

Detail

Image
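Vision Transformers start by cutting the image into fixed-size patches that are then treated like tokens. A minimal sketch of that first step on a tiny grid of numbers standing in for pixels. The function names and the 4x4 example are illustrative assumptions; real models use much larger images and add a learned projection per patch.

```python
def patchify(image, patch_size):
    """Split a 2D grid (list of rows) into square patches, in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


def flatten(patch):
    """Turn a 2D patch into a flat vector, one entry per pixel."""
    return [pixel for row in patch for pixel in row]


# A 4x4 "image" whose pixel values are simply 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = [flatten(patch) for patch in patchify(image, 2)]
print(len(tokens), tokens[0])
```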

Visual Attention Mechanism

"ginger fur"

"standing on a stone in the garden"

"well-groomed"

Image

Features

Attention
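Attention lets a query, such as a phrase like "ginger fur", weight the image features that matter most for it. A minimal sketch of scaled dot-product attention over three patch features. The vectors are made up for illustration and the function names are assumptions; real models learn the query, key and value projections.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * value[i] for w, value in zip(weights, values)) for i in range(dim)]
    return weights, output


# Made-up features for three image patches.
patch_keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
patch_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [2.0, 0.0]  # stands in for a text query such as "ginger fur"

weights, output = attend(query, patch_keys, patch_values)
print([round(w, 3) for w in weights])
```

The first patch gets the largest weight because its key points in the same direction as the query, so its value dominates the output.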

A new generalist Computer Vision

Visual Chat

VQA

Multi-turn

Reasoning

Extract Data

Handwriting

Data entry

OCR

Metadata

Identify

Recognition

Captioning

Categorising

Structure

Elements

Relationships

Hierarchies

Time/Space

Tracking

Activity

Causality

3D/4D

Computer Vision use-cases

Prompt: Extract the text for the opening hours and consolidate them in a single paragraph in English

Output: Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00
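A prompt like the one above is sent to a multimodal model together with the image. A minimal sketch of how the text and the image could be combined in one request body for `gemini-pro-vision`. The `inline_data` shape follows the public Gemini REST API as documented around the time of the talk and may have changed; the function name and the placeholder image bytes are assumptions.

```python
import base64
import json


def build_vision_payload(prompt, image_bytes, mime_type="image/jpeg"):
    """Combine a text prompt and an image into a single request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


prompt = (
    "Extract the text for the opening hours and consolidate them "
    "in a single paragraph in English"
)
# Placeholder bytes; a real call would read the photo of the sign from disk.
payload = build_vision_payload(prompt, b"placeholder image bytes")
print(json.dumps(payload, indent=2)[:200])
```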

Multimodal example

Image Input / Task / Raw Data

Extract Text (OCR): HORARI DILLUNS A DIVENDRES DE 6'30H. A

Extract Text (Hand-written OCR): 13H 16'30H. A 20'00H

Reason (Mash-up fragments): HORARI DILLUNS A DIVENDRES DE 6'30H. A 13H 16'30H. A 20'00H

Translate (Catalan to English): Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00

Vision Examples

Multimodal: better understanding

Breaking language barriers

Advanced OCR: complex layouts

Emerging features: mirrored text

Gemini models landscape

Access to Gemini

Mini-Gemini Chatbot Demo

Building a mini-Gemini Chatbot

By Gerard Sans | Axiom 🇬🇧

In this talk, you will learn how to build a mini-Gemini Chatbot with Google's latest Generative AI, using Google AI Studio, the Gemini Pro model and Angular. Google AI Studio is a tool for building the new wave of Generative AI applications on Gemini foundational models. We will introduce the Gemini models to build the foundations of a Gemini Chatbot and explore advanced features such as AI Agents (the ability to use tools and call APIs) and RAG, or Retrieval Augmented Generation, which improves grounding and extends Gemini beyond its training data cut-off to include external data, and more! The Google Gemini era is here.

Founder of Axiom Masterclass, professional trainings // Forging skills for the new era of AI. GDE in AI, Cloud & Angular. Building London's tech & art nexus @nextai_london. Speaker | MC | Trainer.