Welcome! This workshop uses Google's Gemini LLM to build a photo to geo location guesser (for the Build With AI event series).
Gemini is a family of multimodal large language models developed by Google DeepMind. Unlike other LLMs, Gemini was said to be unique in that it was not trained on a text corpus alone and was designed to be multimodal, meaning it could process multiple types of data simultaneously, including text, images, audio, video, and computer code. - Source
Vertex AI is a fully-managed, unified AI development platform for building and using generative AI. Access and utilize AI Studio, Agent Builder, and 130+ foundation models including Gemini 2.0 from Vertex AI. - Source
The goal of this workshop is to build a photo to geo location guesser using Google's Gemini LLM that focuses on landmarks. The user will upload a photo of a landmark and the LLM will guess the geo location of the landmark.
To begin, you will need to have the following pre-requisites sorted:
Please be aware of the Vertex AI Pricing as well.
Create a new Google Cloud project called gemini-geo or anything relevant, as seen below.
Type vertex on the search bar, as seen below, and select Vertex AI.
Click Freeform, found on the left menu.
Make sure you have the gemini-2.0-flash-lite-001 model selected. Then, paste the following prompt in the Prompt text box:
You are an OSINT investigator. Your job is to geolocate where the photos are taken.
Provide the country, region, and city name of the location.
Please pinpoint the exact location with latitude and longitude where the photo was taken.
Could you always explain your methodology and how you concluded?
Provide steps to verify your work.
Also, mention the percentage of how sure you are of the place you have identified it to be
and add a Google Maps link to the exact location
Use the statue-of-liberty-1075752_1920.jpg file you find in the repository or in the photos folder of the unzipped photos.zip file. Navigate to the photos folder after clicking the Insert Media option in the middle of the Prompt text box. Then click Upload and upload the Statue of Liberty image.
Click the > button to submit and test out the prompt with the uploaded image. You should get a response similar to the following:
You can also try other photos if you like.
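The prompt asks Gemini for a latitude, a longitude, and a Google Maps link, so one way to verify a response is to pull the coordinates out of the text and build the Maps link yourself. The following is a minimal sketch, not part of the workshop code. It assumes the response contains a decimal pair such as "40.6892, -74.0445" (the model's wording may differ), and the function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Matches a decimal pair such as "40.6892, -74.0445".
# Assumption: the model writes coordinates as "lat, lon" in decimal degrees.
COORD_PATTERN = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def extract_coordinates(text):
    """Return the first plausible (lat, lon) pair found in text, or None."""
    match = COORD_PATTERN.search(text)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values outside the valid latitude/longitude ranges.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


def maps_link(lat, lon):
    """Build a Google Maps search URL for the given coordinates."""
    query = urlencode({"api": 1, "query": f"{lat},{lon}"})
    return "https://www.google.com/maps/search/?" + query
```

Opening the generated link and comparing it with the link in Gemini's answer is a quick sanity check on the coordinates it reported.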
To name the prompt, click the Untitled prompt text, then type image-to-geo-location and click anywhere. It will look like the below while editing.
Click Save on the top left part of the right sidebar, as shown below.
All saved prompts will be accessible in your Prompt management page. You can access it from the Prompt management link on the left sidebar.
Click the Prompt Name if you are on the Prompt management page.
Below is a configuration you can try out. The right settings for this configuration depend on how you want the output to be shaped by Gemini:
Set the Safety Settings correctly as per your use case. For now, we will set them at maximum safety (Responsible AI). As seen below, the safety settings (found on the right sidebar) are self-explanatory.
You can play around with the prompt and make it more flexible or more specific as per your goals.
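For reference, the same kind of settings can be written in Python when calling Gemini through the google-genai package. This is a minimal sketch under stated assumptions: the temperature, top_p, and max_output_tokens values are illustrative and not the workshop's configuration, the helper name is hypothetical, and BLOCK_LOW_AND_ABOVE is used here as the strictest blocking threshold to mirror the maximum safety setting.

```python
# Harm categories that Vertex AI safety settings can be applied to.
HARM_CATEGORIES = [
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_HARASSMENT",
]


def build_generation_config(
    temperature=0.2,            # illustrative value: lower means less random output
    top_p=0.95,                 # illustrative value
    max_output_tokens=1024,     # illustrative value
    threshold="BLOCK_LOW_AND_ABOVE",
):
    """Return a plain dict with generation and safety settings."""
    return {
        "temperature": temperature,
        "top_p": top_p,
        "max_output_tokens": max_output_tokens,
        # Apply the same blocking threshold to every harm category.
        "safety_settings": [
            {"category": category, "threshold": threshold}
            for category in HARM_CATEGORIES
        ],
    }
```

A dict of this shape can be passed as the config argument of client.models.generate_content; types.GenerateContentConfig is the typed equivalent.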
The optional code step is next.
Click the <> Get Code link, which shows a slider on the right side as follows:
For this workshop, you will use the Python code and try it out. To continue, you will use Cloud Shell and Cloud Shell Editor. You can close the sample code slider by clicking Close on the bottom left of the slider.
Click Activate Cloud Shell toward the top right corner of the screen, as seen below. You can also type G then S on your keyboard. You might need to authorise the Cloud Shell to access your Google Cloud resources.
Click Open Editor.
Open a new terminal from Hamburger Menu > Terminal > New Terminal, as follows.
In the terminal, run mkdir projects && cd projects && mkdir gemini-geo && cd gemini-geo and then pip3 install --upgrade google-genai to install the google-genai Python package.
After the google-genai Python package is installed, it will look like the below.
Then open the folder from Hamburger Menu > File > Open Folder
Type projects/gem, select the gemini-geo option, and click OK.
Click the file+ icon beside GEMINI-WORKSHOP and name the new file gemini.py.
Click <>GET CODE on the Vertex AI Editor screen and, while on the Python option, copy the code.
Paste the code into the empty gemini.py file and save it.
Open a new terminal from Hamburger Menu > Terminal > New Terminal, type in python gemini.py, then hit enter. It will ask you to Authorise if you have not already done so.
After authorisation, the code will run and give an output like the below:
Congrats! You are a Gemini and Vertex AI novice now :). You can close the Cloud Shell Editor.
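If you want a feel for what such a script does, here is a minimal sketch of sending the prompt and a photo to Gemini with the google-genai package. It is not the code that Get Code generates, which may be structured differently. The function names are hypothetical, and it assumes your project ID is passed in and that the model is available in us-central1.

```python
import mimetypes

MODEL = "gemini-2.0-flash-lite-001"

# The workshop prompt, as pasted into the Prompt text box earlier.
PROMPT = (
    "You are an OSINT investigator. Your job is to geolocate where the photos are taken.\n"
    "Provide the country, region, and city name of the location.\n"
    "Please pinpoint the exact location with latitude and longitude where the photo was taken.\n"
    "Could you always explain your methodology and how you concluded?\n"
    "Provide steps to verify your work.\n"
    "Also, mention the percentage of how sure you are of the place you have identified it to be\n"
    "and add a Google Maps link to the exact location"
)


def guess_mime_type(path):
    """Return the image MIME type for a file name, or raise ValueError."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {path}")
    return mime_type


def guess_location(image_path, project, location="us-central1"):
    """Send the photo and the prompt to Gemini on Vertex AI and return the text."""
    # Imported here so the helper above works without the package installed.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=project, location=location)
    with open(image_path, "rb") as image_file:
        image_part = types.Part.from_bytes(
            data=image_file.read(),
            mime_type=guess_mime_type(image_path),
        )
    response = client.models.generate_content(
        model=MODEL,
        contents=[image_part, PROMPT],
    )
    return response.text

# Example call (needs Cloud Shell authorisation and a real project ID):
# print(guess_location("photos/statue-of-liberty-1075752_1920.jpg", "your-project-id"))
```

Keeping the API call inside one function makes it easy to swap in another photo or model name while experimenting.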
Locate the Run on Google Cloud button, as seen below.
Then, click on the blue Run on Google Cloud button. The app code is in the streamlit_app.py file. We can discuss the code if you want to.
Check Trust repo and then click on Confirm, as seen below.
Click Authorize.
Wait for the Cloud Shell Machine to start and then run the script. It will ask you for the project. Select the project you created in the first step, gemini-geo, and hit enter. Authorize if prompted.
For the region, type in us-central1 and hit enter.
Then type in gemini-geo or your project name and hit enter. Then again type in us-central1 and hit enter, so that the environment variables are set up correctly for the Cloud Run Service.
To try the app, use the amsteradm-centraal.jpg image from the photos folder. Browse and upload the image, then scroll down and click the red Guess the location! button.
After clicking the Guess the location! button, you will see the geo location guesser in action, as seen below.
Go back to the slides :).