For those who are not familiar with Geoguessr, it’s a simple and fun game in which you are placed at a random location in the world on Google Maps and must guess where you are before a countdown runs out. The goal is to get your guess as close as possible to the true location. After playing this game with some friends, I started to think about how I could use the game concept to build something that would let me practice with generative AI, and that was how this project, “GenAI GeoGuesser”, was born. In my version of the game, you have to guess the country name based on hints generated by AI models. To help with understanding, here are a few screenshots showcasing the game’s workflow.
First, the user selects the desired hint modalities. You can choose any number of options among “Audio”, “Text” and “Image”, and you also have to select the number of hints that will be generated for each modality. For the example above, you will get 1 hint for each of the 3 types.
The text hint will contain a textual description of the country.
The image hint will be images that resemble the country.
Finally, the audio hint should be an audio clip or sound related to the country (in my experience, the audio hints don’t work as well as the other two).
All the models used to generate the hints above have parameters to fine-tune the generation process: you can generate longer text or audio hints, or even change the models. The repository has intuitive parameters to play with.
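A minimal sketch of how the hint selection could be turned into a list of generation jobs, assuming the selection arrives as a modality-to-count mapping. `plan_hints` and `PROMPT_TEMPLATES` are hypothetical names, not the repository’s code; the templates mirror the prompts used later in this article.

```python
from typing import Dict, List, Tuple

# Hypothetical prompt templates, mirroring the prompts used later in this article
PROMPT_TEMPLATES: Dict[str, str] = {
    "Text": "Describe the country {country} without mentioning its name",
    "Image": "An image related to the country {country}",
    "Audio": "A sound that resembles the country of {country}",
}


def plan_hints(selection: Dict[str, int], country: str) -> List[Tuple[str, str]]:
    """Turn the user's selection (modality -> number of hints) into
    (modality, prompt) jobs, one job per hint to generate."""
    jobs: List[Tuple[str, str]] = []
    for modality, count in selection.items():
        if modality not in PROMPT_TEMPLATES:
            raise ValueError(f"Unknown hint modality: {modality!r}")
        if count < 0:
            raise ValueError("The number of hints cannot be negative")
        prompt = PROMPT_TEMPLATES[modality].format(country=country)
        jobs.extend((modality, prompt) for _ in range(count))
    return jobs


# One hint of each type, as in the example screenshot
jobs = plan_hints({"Text": 1, "Image": 1, "Audio": 1}, "Brazil")
for modality, prompt in jobs:
    print(modality, "->", prompt)
```

Keeping the prompts in one table makes it easy to tweak them without touching the generation code for each modality.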
When you finish evaluating all the hints and are ready to guess, type your guess in the “Country guess” field.
If the guess is wrong, you will get the correct country name and the distance between your guess and the correct place.
If the guess is correct, you will receive a congratulations message.
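How the app measures that distance isn’t covered here. A minimal sketch, assuming each country is reduced to a single (latitude, longitude) point (for example the values from countryinfo’s `latlng()`), is the great-circle (haversine) distance. `haversine_km` and `check_guess` are hypothetical helpers, and the points in the usage lines are illustrative, not real country centers.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLng = Tuple[float, float]


def haversine_km(a: LatLng, b: LatLng) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # min() guards against h drifting slightly above 1 due to floating-point error
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def check_guess(guess: str, answer: str, guess_point: LatLng, answer_point: LatLng) -> Optional[float]:
    """Return None for a correct guess, otherwise the distance in km between the two points."""
    if guess.strip().lower() == answer.strip().lower():
        return None
    return haversine_km(guess_point, answer_point)


print(check_guess(" brazil ", "Brazil", (0.0, 0.0), (0.0, 0.0)))  # None (correct guess)
print(check_guess("Chile", "Brazil", (0.0, 0.0), (0.0, 90.0)))  # roughly 10007.5, a quarter of a great circle
```

Comparing case-insensitively and ignoring surrounding whitespace keeps a guess like “ brazil ” from being marked wrong.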
Now that you are familiar with the game’s workflow, let’s look at what is happening under the hood at each step.
The game starts with the country selection. Here I wanted to mimic the original Geoguessr behavior, where you are more likely to be dropped into larger countries. For this reason, simply picking a country uniformly at random would not be enough, since small countries would have the same chance as large ones. Fortunately, I found the countryinfo lib, which provides a list of countries and some metadata such as country area. Below you can see what the code looks like.
Selecting the country
import pandas as pd
from countryinfo import CountryInfo
country_list = list(CountryInfo().all().keys())
# build a dict with country:area pairs
country_areas = {
    country: CountryInfo(country).area() for country in country_list
}
country_df = pd.DataFrame(list(country_areas.items()), columns=["country", "area"])
# pick a random country, with probability proportional to the country's area
country = country_df.sample(n=1, weights="area")["country"].iloc[0]
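The same area-weighted draw can be sketched with only the standard library. This is an illustration, not the app’s code: `pick_country` is a hypothetical helper, the toy areas are made up, and a missing area gets zero weight (pandas’ `sample` also treats missing weights as zero).

```python
import random
from collections import Counter
from typing import Dict, Optional


def pick_country(areas: Dict[str, Optional[float]], rng: random.Random) -> str:
    """Pick one country with probability proportional to its area."""
    countries = list(areas)
    # A missing area (None) counts as zero weight, so that country is never picked
    weights = [areas[c] or 0.0 for c in countries]
    if sum(weights) <= 0:
        raise ValueError("At least one country needs a positive area")
    return rng.choices(countries, weights=weights, k=1)[0]


# Toy, made-up areas: "Bigland" should come up about 90% of the time
toy_areas = {"Bigland": 9.0, "Smallland": 1.0, "Unknownland": None}
rng = random.Random(0)
print(Counter(pick_country(toy_areas, rng) for _ in range(10_000)))
```

Passing a seeded `random.Random` makes the draw reproducible, which is handy when testing the game logic.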
Text hints
For the text hint generation step, I chose a Gemma model: the version with 2 billion parameters is able to generate high-quality text while still running fast enough not to disrupt the user experience. The Gemma models are a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-1.1-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-1.1-2b-it")
prompt = f"Describe the country {country} without mentioning its name"
input_ids = tokenizer(prompt, return_tensors="pt")
# max_new_tokens leaves room for a full description (256 is an illustrative value)
text_hint = model.generate(**input_ids, max_new_tokens=256)
# decode the first (and only) sequence in the batch, then drop the echoed prompt
text_hint = (
    tokenizer.decode(text_hint[0], skip_special_tokens=True)
    .replace(prompt, "")
    .strip()
)
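Decoder-only models like Gemma return the prompt followed by the continuation, which is why the code removes the prompt. A slightly more defensive cleanup might look like the sketch below. `clean_text_hint` is a hypothetical helper, and masking the country name is an assumed extra safeguard, not part of the code above.

```python
import re


def clean_text_hint(decoded: str, prompt: str, country: str, mask: str = "____") -> str:
    """Turn a decoded model output into a text hint."""
    # Decoder-only models echo the prompt before the generated continuation
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    else:
        decoded = decoded.replace(prompt, "")
    # Assumed extra safeguard: mask the country name if the model mentions it anyway
    # (word boundaries keep e.g. "Niger" from matching inside "Nigeria")
    pattern = rf"\b{re.escape(country)}\b"
    return re.sub(pattern, mask, decoded, flags=re.IGNORECASE).strip()


prompt = "Describe the country Peru without mentioning its name"
raw = prompt + "\nIt borders Chile. PERU is in South America."
print(clean_text_hint(raw, prompt, "peru"))  # prints: It borders Chile. ____ is in South America.
```

The case-insensitive match matters because country names may be stored in lowercase while the model writes them capitalized.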
You can also run the text hint generation with Gemini models via Vertex AI to get faster and better-quality outputs (check the configs file).
from vertexai.generative_models import GenerativeModel
model = GenerativeModel("gemini-1.5-pro-preview-0409")
prompt = f"Describe the country {country} without mentioning its name"
responses = model.generate_content(prompt)
# extract the text from the first candidate of the response
text_hint = responses.candidates[0].content.parts[0].text
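Since the text backend is chosen in the configs file, one way to wire that up is a small dispatch table. The `text_backend` key, `select_text_backend`, and the stub functions below are hypothetical placeholders for the Gemma and Gemini snippets above, not the repository’s actual config schema.

```python
from typing import Callable, Dict, Mapping

TextHintFn = Callable[[str], str]


def select_text_backend(config: Mapping[str, str], backends: Dict[str, TextHintFn]) -> TextHintFn:
    """Return the text-hint function named by the hypothetical config key "text_backend"."""
    name = config.get("text_backend", "gemma")  # default to the local Gemma model
    if name not in backends:
        raise ValueError(f"Unknown text backend {name!r}; expected one of {sorted(backends)}")
    return backends[name]


# Stubs standing in for the Gemma (transformers) and Gemini (Vertex AI) snippets above
def gemma_hint(prompt: str) -> str:
    return f"[gemma] {prompt}"


def gemini_hint(prompt: str) -> str:
    return f"[gemini] {prompt}"


BACKENDS = {"gemma": gemma_hint, "gemini": gemini_hint}
generate = select_text_backend({"text_backend": "gemini"}, BACKENDS)
print(generate("Describe the country Japan without mentioning its name"))
```

Because both backends share the same `prompt -> text` signature, the rest of the game does not need to know which model produced the hint.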
Image hints
For the image generation part, I chose the SDXL-Turbo model, a version of the popular Stable Diffusion model that can generate high-quality images with as few as a single inference step.
from diffusers import AutoPipelineForText2Image
model = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
prompt = f"An image related to the country {country}"
# SDXL-Turbo is meant to run with a single step and no classifier-free guidance
img_hints = model(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images
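To tie the image step to the number of hints the user selects, the pipeline call can be wrapped as in the sketch below. `generate_image_hints` is a hypothetical helper; `num_inference_steps=1` with `guidance_scale=0.0` follows the SDXL-Turbo model card’s recommendation, and `num_images_per_prompt` asks the pipeline for one image per hint.

```python
from typing import Any, List


def generate_image_hints(pipe: Any, country: str, n_hints: int = 1) -> List[Any]:
    """Generate `n_hints` images for `country` with an SDXL-Turbo text-to-image pipeline."""
    prompt = f"An image related to the country {country}"
    result = pipe(
        prompt=prompt,
        num_inference_steps=1,  # SDXL-Turbo is distilled for single-step sampling
        guidance_scale=0.0,  # the model card disables classifier-free guidance
        num_images_per_prompt=n_hints,  # one image per requested hint
    )
    return result.images
```

In the app, `pipe` would be the `AutoPipelineForText2Image` loaded above; accepting it as an argument also makes the helper easy to test with a fake pipeline.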
Audio hints
To generate the audio hints, we will use the AudioLDM2 model. In my experiments with different audio generation models, this one had a good trade-off between the speed and quality of the outputs for this specific use case.
from diffusers import AudioLDM2Pipeline
model = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2-music")
prompt = f"A sound that resembles the country of {country}"
audio_hints = model(prompt).audios
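The `.audios` output is an array of waveforms that still has to be saved or played. A standard-library sketch for writing one waveform to a WAV file is below, assuming 16 kHz mono float samples in [-1, 1] (the rate used in the diffusers AudioLDM2 examples). `write_wav` is a hypothetical helper; longer clips can be requested through the pipeline’s `audio_length_in_s` argument.

```python
import io
import math
import struct
import wave
from typing import BinaryIO, Sequence

SAMPLE_RATE = 16_000  # assumption: AudioLDM2 produces 16 kHz mono audio


def write_wav(samples: Sequence[float], out: BinaryIO, sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# A 0.5 s, 440 Hz test tone standing in for one entry of `audio_hints`
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 2)]
buffer = io.BytesIO()
write_wav(tone, buffer)
```

Writing to an in-memory buffer avoids temporary files; the same function works with a file opened in `"wb"` mode.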
With this, we conclude the hint-generation process. As you can see, the HuggingFace libraries make our work quite easy here. The main complexity of this app was related to the actual workflow of the Streamlit app; that part is a bit out of the scope of this article because it’s more technical and specific to that framework, but if you are curious to understand it, you can visit the Git repository of this project.
Keep learning
If you want to explore other fun use cases of generative AI applied to games, you may enjoy reading about my other project, Gemini Hangman.
For another project that uses multiple modalities of generative AI, check out my previous article on generating music clips with AI.