AI Image & Video Creation Guide — Gemini API

1. Two Ways to Create Images

1.1 The Easiest Way — Gemini Chat

The simplest way to create images is by chatting on the Gemini website (gemini.google.com).

Type "Draw an illustration of a cat wearing a spacesuit" and the AI will create an image right away. No setup needed — just log in and start using it.

1.2 Limitations of Chat

Chat is convenient for making a few quick images. But for real-world use, there are some limitations.

Visible watermarks are added. When you download images created through chat, they include a visible watermark. This makes them hard to use in portfolios or actual projects.

No automation. If you need 100 images, you have to enter 100 prompts, download 100 times, and rename 100 files.

Limited settings. You can't directly control detailed options like resolution, aspect ratio, or the number of reference images.

1.3 The More Powerful Way — API

The way to go beyond these limitations is the API. When you use the API directly, you can create images without watermarks, automate the process, and freely adjust detailed settings.

If you're not sure what an API is, read the article "What Is an API?" first. It also covers how to get an API key.


2. Gemini Image Model — Nano Banana

2.1 What Is Nano Banana?

Google's image generation model goes by the codename "Nano Banana".

In August 2025, Google released an AI image generation model anonymously. While its identity was still hidden, it took first place on Chatbot Arena, a platform where people vote on which AI makes the best images, under the codename "Nano Banana".

When its identity was later revealed, it turned out to be Gemini 2.5 Flash Image. Since then, Google has released two successor models: Nano Banana 2 and Nano Banana Pro.

2.2 Three Models

Currently, there are three Gemini image generation models.

| | Nano Banana | Nano Banana 2 | Nano Banana Pro |
|---|---|---|---|
| Official Name | Gemini 2.5 Flash Image | Gemini 3.1 Flash Image | Gemini 3 Pro Image |
| API Model ID | gemini-2.5-flash-image | gemini-3.1-flash-image-preview | gemini-3-pro-image-preview |
| Features | Fast speed, efficient | Fast speed + search integration | Best quality, complex tasks |
| Resolution | Standard | 0.5K / 1K / 2K / 4K | 1K / 2K / 4K |
| Reference Images | Supported | Up to 14 | Up to 11 |
| Text Rendering | Basic | Improved | Accurate even with complex text |
| Best For | Bulk generation, rapid prototyping | Bulk generation + latest features | Final deliverables, high-quality assets |

2.3 Which Model Should You Choose?

  • Want to generate many images quickly? → Nano Banana 2 (latest Flash model)
  • Final quality matters most? → Nano Banana Pro
  • Not sure? → We recommend Nano Banana Pro (the quality difference is noticeable)
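If you end up scripting this choice, the rule of thumb fits in a few lines. A minimal sketch: the dictionary copies the model IDs from the table in section 2.2, while `pick_image_model` and its `"speed"` label are hypothetical names for illustration. IDs ending in `-preview` may change, so check Google's current model list before relying on them.

```python
# API model IDs, copied from the comparison table in section 2.2.
IMAGE_MODELS = {
    "nano-banana": "gemini-2.5-flash-image",
    "nano-banana-2": "gemini-3.1-flash-image-preview",
    "nano-banana-pro": "gemini-3-pro-image-preview",
}


def pick_image_model(priority: str = "unsure") -> str:
    """Apply the rule of thumb above.

    "speed" -> Nano Banana 2, for generating many images quickly.
    Anything else ("quality", "unsure", ...) -> Nano Banana Pro, the recommended default.
    """
    if priority == "speed":
        return IMAGE_MODELS["nano-banana-2"]
    return IMAGE_MODELS["nano-banana-pro"]
```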

3. Asking Claude Code to Build a Program

3.1 Creating an Image Generation Program

Now you're ready. With your API key in hand, you can ask Claude Code to build an image generation program.

Here's what a real conversation looks like:

Designer:

"Build a Python program that generates images using the Gemini API. When I type in a prompt as text, it should save the image as a PNG. Have it read the API key from a .env file."

Claude Code:

"Sure, I'll build that." → Writes code to read the API key from .env → Writes code to send requests to the Gemini API → Writes code to save the resulting image as PNG → Program complete!

The designer doesn't need to write a single line of code. Just describe what you want to build and Claude handles the rest.
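For reference, here is roughly what such a program can look like. This is a sketch, not the code Claude Code will necessarily produce: it assumes the official `google-genai` Python SDK and `python-dotenv` are installed (`pip install google-genai python-dotenv`), that `.env` contains a line `GEMINI_API_KEY=...`, and that the returned image data is PNG. The function names `generate_image` and `extract_image_bytes` are hypothetical.

```python
import os
from pathlib import Path

# Any image model ID from the table in section 2.2 works here.
MODEL_ID = "gemini-3-pro-image-preview"


def extract_image_bytes(response):
    """Return the bytes of the first image part in the response, or None if there is none."""
    for part in response.candidates[0].content.parts:
        if getattr(part, "inline_data", None) is not None:
            return part.inline_data.data
    return None


def generate_image(prompt: str, out_path: str = "output.png") -> Path:
    """Send a text prompt to the Gemini API and save the returned image as a PNG file."""
    # Imported here so extract_image_bytes can be used without these packages installed.
    from dotenv import load_dotenv  # pip install python-dotenv
    from google import genai        # pip install google-genai

    load_dotenv()  # copies GEMINI_API_KEY from .env into the environment
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(model=MODEL_ID, contents=[prompt])

    data = extract_image_bytes(response)
    if data is None:
        raise RuntimeError("The response contained no image (it may have returned only text).")
    path = Path(out_path)
    path.write_bytes(data)  # assumes the API returned PNG data
    return path
```

Calling `generate_image("Illustration of a cat wearing a spacesuit", "cat.png")` writes `cat.png` to the current folder. Keeping the response parsing in its own function makes it easy to check without calling the API.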

3.2 How the Program Works

Here's a simple breakdown of how the program Claude Code created works:

  1. Prompt (text)
  2. Program on your computer (code created by Claude)
  3. Sent to the Gemini API over the internet
  4. Image generated on Google's server
  5. Result image saved to your computer (PNG)

4. Creating Similar Images with Reference Images

4.1 What Are Reference Images?

This is the most useful feature for designers. The Gemini API accepts images along with text, not just text alone, so you can send a reference image and say "Make it feel like this."

Tell Claude Code something like this:

"Write a program that sends this image (reference.png) to the Gemini API along with a prompt to create a city night view image in the same style"

Claude Code will:

  1. Write code to read the image file and attach it to the API call
  2. Write code to send it to Gemini along with the prompt
  3. Write code to save the result as well
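A minimal sketch of those three steps, assuming the `google-genai` SDK and a `GEMINI_API_KEY` environment variable. `types.Part.from_bytes` wraps the image file so it can travel in the same `contents` list as the text prompt; `guess_image_mime` and `generate_in_style` are hypothetical helper names.

```python
import mimetypes
from pathlib import Path


def guess_image_mime(path: str) -> str:
    """Return the MIME type for an image file name, e.g. 'image/png' for 'reference.png'."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"{path!r} does not look like an image file")
    return mime


def generate_in_style(reference_path: str, prompt: str, out_path: str = "styled.png") -> Path:
    """Send a reference image plus a text prompt and save the generated image."""
    from google import genai  # pip install google-genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    # Step 1: read the image file and wrap it so it can be attached to the request.
    reference = types.Part.from_bytes(
        data=Path(reference_path).read_bytes(),
        mime_type=guess_image_mime(reference_path),
    )
    # Step 2: send the image and the prompt together in one request.
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=[reference, prompt],
    )
    # Step 3: save the first image in the response.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            out = Path(out_path)
            out.write_bytes(part.inline_data.data)
            return out
    raise RuntimeError("The response contained no image")
```

For the example above, the call would look like `generate_in_style("reference.png", "Create a city night view image in the same style as this image.", "city_night.png")`.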

4.2 Three Ways to Use References

1. Match the Style

Tell Claude Code:

"Using the colors and textures of this image as reference, create a coffee shop illustration with the same feel"

2. Match the Layout/Composition

Tell Claude Code:

"Using the layout of this infographic (3-column structure, icons+text) as reference, create an infographic on the topic 'AI Image Generation Process'"

3. Match the Character Style

Tell Claude Code:

"Using the design style of this character (round shape, big eyes, minimal) as reference, create a cat character in the same style"

Key Point: Being specific about what to reference yields much better results. Instead of "Use this image as reference," say "Use the color palette of this image as reference."
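One way to make that habit stick is to keep the "what to reference" phrases in one place and build prompts from them. This is a hypothetical helper, not part of any SDK; the three aspects mirror the three uses above, and the returned string is what you would send alongside the reference image.

```python
# Phrases that say exactly which part of the reference image to borrow.
REFERENCE_ASPECTS = {
    "style": "the colors and textures",
    "layout": "the layout and composition",
    "character": "the character design style",
}


def reference_prompt(aspect: str, task: str, details: str = "") -> str:
    """Build a prompt that names what to take from the attached reference image."""
    if aspect not in REFERENCE_ASPECTS:
        raise ValueError(f"aspect must be one of {sorted(REFERENCE_ASPECTS)}")
    extra = f" ({details})" if details else ""
    return f"Using {REFERENCE_ASPECTS[aspect]} of the attached image{extra} as reference, {task}."
```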


5. How to Write Great Prompts

5.1 Write It Like a Design Brief

A prompt is a work order you send to AI.

A design brief is a document you write when commissioning design work. It's a request that outlines "what style, what colors, what purpose." If you just tell a freelance designer "make me a banner," it's hard to get what you want. But if you provide a brief with the subject, style, colors, and mood, you'll get much more accurate results.

AI prompts work exactly the same way. The more specific the brief, the closer the result will be to what you want.

5.2 The 5 Elements of a Prompt

| Element | Description | Example |
|---|---|---|
| Subject | What to depict | "City night view", "Cat character" |
| Style | What feel to go for | "Watercolor", "Flat design", "3D rendering" |
| Composition | How to arrange it | "Front view", "3-column layout", "Close-up" |
| Color | What colors to use | "Pastel tones", "Monotone", "Neon colors" |
| Mood | What vibe to convey | "Warm", "Futuristic", "Cute" |

5.3 Practical Example

Bad prompt:

Make me a banner image

Good prompt:

Social media banner image.
Subject: AI technology introduction
Composition: 16:9 landscape, text on the left, illustration on the right
Left: "The Future with AI" large title
Right: Illustration of a robot and a human shaking hands
Colors: Deep blue + white, cyan accent
Style: Modern and clean tech style, gradient background
Mood: Trustworthy, forward-looking
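The brief format above maps naturally onto code: one labeled line per element. In this sketch, `build_brief` and `generate_banner` are hypothetical helpers. It assumes the `google-genai` SDK, whose `types.ImageConfig` lets you request the aspect ratio (and, for Nano Banana Pro, an output size such as `"2K"`) as a setting rather than only in the prompt text. Treat the exact config values as assumptions to check against the current API reference.

```python
def build_brief(header: str, **fields: str) -> str:
    """Turn design-brief fields into a prompt with one 'Label: value' line per field.

    Keyword names become labels (subject="..." -> "Subject: ..."); empty fields are skipped.
    """
    lines = [header]
    for name, value in fields.items():
        if value:
            lines.append(f"{name.capitalize()}: {value}")
    return "\n".join(lines)


def generate_banner(prompt: str, out_path: str = "banner.png") -> None:
    """Generate a 16:9 image from the prompt and save it."""
    # Imported here so build_brief works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=[prompt],
        config=types.GenerateContentConfig(
            # Aspect ratio and size as settings, not just words in the prompt.
            image_config=types.ImageConfig(aspect_ratio="16:9", image_size="2K"),
        ),
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open(out_path, "wb") as f:
                f.write(part.inline_data.data)
            return
    raise RuntimeError("The response contained no image")
```

For example, `build_brief("Social media banner image.", subject="AI technology introduction", mood="Trustworthy, forward-looking")` produces the first line, the Subject line, and the Mood line of the good prompt above.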

5.4 Style Keywords Designers Can Use

Designers already have a rich visual vocabulary. Just use that knowledge directly in your prompts.

"Flat design, rounded corners, bright pastel colors"
Friendly and modern feel

"Minimal, lots of whitespace, serif font feel"
Luxurious and sophisticated feel

"Neon colors, dark background, glitch effect"
Cyberpunk / tech feel

"Watercolor texture, soft blending, natural colors"
Emotional and analog feel

5.5 Text Rendering Tips

Text rendering in AI-generated images has improved a lot. Nano Banana Pro can render Korean text accurately and handle long paragraphs. It is not perfect, though, so some text may still need post-processing.

Recommended approach:

  • AI handles most text rendering well, but check the results and post-process in Figma/Photoshop if needed
  • Let AI handle the visuals, let designers handle fine text adjustments — this is the most practical workflow

6. Putting It Into Practice

6.1 The Complete Workflow

Step 1: Prepare API Key (one-time setup)
Step 2: Describe the program you want to Claude Code
Step 3: Claude writes the code
Step 4: Claude runs the program
Step 5: Check the results → Use in Figma

6.2 Real Conversation Examples

Example 1 — Creating a Character Series:

"I want to create character images using the Gemini API. Make 10 animal characters in the same style (cat, dog, rabbit, bear, fox, deer, penguin, owl, squirrel, panda). Round shapes with big eyes, pastel backgrounds. Save each as a PNG."

What Claude Code does:

  1. Generates 10 prompts
  2. Writes a program that calls the Gemini API
  3. Automatically generates and saves 10 images
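A rough sketch of the program Claude Code might write for Example 1, assuming the `google-genai` SDK and `GEMINI_API_KEY` in the environment. The fixed style string appended to every prompt is what keeps the ten characters looking like one series; the file-naming scheme, the helper names, and the choice of Nano Banana 2 for bulk work are illustrative.

```python
from pathlib import Path

ANIMALS = ["cat", "dog", "rabbit", "bear", "fox", "deer", "penguin", "owl", "squirrel", "panda"]
# One shared style description keeps all ten characters in the same series.
SHARED_STYLE = "round shapes, big eyes, pastel background, consistent character-series style"


def character_jobs(animals, style=SHARED_STYLE):
    """Return (file_name, prompt) pairs such as ('01_cat.png', 'Cute cat character ...')."""
    return [
        (f"{i:02d}_{animal}.png", f"Cute {animal} character illustration, {style}")
        for i, animal in enumerate(animals, start=1)
    ]


def run_batch(jobs, out_dir="characters", model="gemini-3.1-flash-image-preview"):
    """Generate one image per job and save it under out_dir."""
    from google import genai  # pip install google-genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    folder = Path(out_dir)
    folder.mkdir(exist_ok=True)
    for file_name, prompt in jobs:
        response = client.models.generate_content(model=model, contents=[prompt])
        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:
                (folder / file_name).write_bytes(part.inline_data.data)
                break
        else:  # runs only when no image part was found
            print(f"Skipped {file_name}: no image in the response")
```

Running `run_batch(character_jobs(ANIMALS))` would produce `characters/01_cat.png` through `characters/10_panda.png`.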

Example 2 — Reference-Based Variations:

"Using this image (banner_ref.png) as reference, make 5 banners in a similar style. The themes should be 'Spring Sale', 'Summer Collection', 'Fall Event', 'Winter Discount', and 'New Year Special'."

What Claude Code does:

  1. Writes code to read the reference image
  2. Writes a program that sends 5 topic-specific prompts + the reference to the Gemini API
  3. Automatically generates 5 banners
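For Example 2, the main difference is that the reference image is read once and attached to every request. A sketch under the same SDK assumptions; `slugify` and `make_variations` are hypothetical helpers, and `image/png` is assumed because the reference file is a PNG.

```python
import re
from pathlib import Path

THEMES = ["Spring Sale", "Summer Collection", "Fall Event", "Winter Discount", "New Year Special"]


def slugify(theme: str) -> str:
    """Turn a theme into a file-name-safe slug, e.g. 'Spring Sale' -> 'spring_sale'."""
    return re.sub(r"[^a-z0-9]+", "_", theme.lower()).strip("_")


def make_variations(reference_path="banner_ref.png", themes=THEMES, model="gemini-3-pro-image-preview"):
    """Generate one banner per theme, each in the style of the same reference image."""
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    # Read the reference once and reuse the same Part in every request (PNG assumed).
    reference = types.Part.from_bytes(
        data=Path(reference_path).read_bytes(), mime_type="image/png"
    )
    for theme in themes:
        prompt = f"Create a banner in the same style as the attached image. Theme: '{theme}'."
        response = client.models.generate_content(model=model, contents=[reference, prompt])
        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:
                Path(f"banner_{slugify(theme)}.png").write_bytes(part.inline_data.data)
                break
```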

Example 3 — Auto-Inserting Images into a Document:

"Read this markdown document (README.md), create infographic images matching each section's content, and insert them into the document."

What Claude Code does:

  1. Analyzes the document and generates image prompts for each section
  2. Generates images from each prompt
  3. Automatically inserts image paths into the document
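Step 3 is plain text processing. In this hypothetical sketch, `section_titles` lists the headings you would turn into image prompts (step 1) and `insert_images` does step 3. It assumes sections start with `## ` headings and that images are saved as `images/section_01.png`, `images/section_02.png`, and so on; generating the images themselves (step 2) would follow the earlier examples.

```python
import re


def section_titles(markdown: str) -> list:
    """Return the text of every '## ' heading, in document order (input for step 1)."""
    return re.findall(r"^## +(.+?)[ \t]*$", markdown, flags=re.MULTILINE)


def insert_images(markdown: str, image_dir: str = "images") -> str:
    """Add an image link below every '## ' heading (step 3)."""
    out = []
    count = 0
    for line in markdown.splitlines():
        out.append(line)
        if line.startswith("## "):
            count += 1
            title = line[3:].strip()
            out.extend(["", f"![{title}]({image_dir}/section_{count:02d}.png)"])
    trailing = "\n" if markdown.endswith("\n") else ""  # keep a final newline if there was one
    return "\n".join(out) + trailing
```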

6.3 Summary — A Designer's New Superpower

7. Veo 3.1 — From Images to Videos

7.1 What Is Veo 3.1?

If Nano Banana is an AI that creates images, Veo 3.1 is an AI that creates videos. Made by Google DeepMind, this video generation model takes text or images and creates high-quality videos up to 8 seconds long.

| | Nano Banana 2 | Veo 3.1 |
|---|---|---|
| Creates | Still images (PNG) | Videos (MP4) |
| API Model ID | gemini-3.1-flash-image-preview | veo-3.1-generate-preview |
| Output | 1 image | Up to 8-second video |
| Resolution | Up to 4K | 720p / 1080p / 4K |
| Audio | None | Native audio auto-generated |
| Analogy | Photographer | Film director |

7.2 What Can Veo 3.1 Do?

Veo 3.1 can create videos in three ways.

1. Text → Video

Generate a video from a text prompt alone.

"A person walking along a beach watching the sunset, cinematic tracking shot" → 8-second video generated

2. Image → Video ⭐ Most useful for designers

Feed in a still image, and it becomes a moving video.

T-shirt mockup image + "A model dances and shows off the t-shirt" → 8-second video of the model actually dancing

3. Reference Images → Video ⭐⭐ Most powerful feature

Provide up to 3 reference images, and the AI keeps their details (logos, text, designs) consistent throughout the entire video.

Front photo (logo) + Back photo (text) + "A model spins and dances" → A video where both the front logo and back text are accurately shown
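Unlike image generation, video generation through the API is a long-running job: the request returns an operation that you poll until it finishes, then you download the MP4. A sketch of the first two modes (text to video, image to video), assuming the `google-genai` SDK's `generate_videos` call and a `GEMINI_API_KEY` environment variable; the polling interval, timeout, and helper names are illustrative choices.

```python
import mimetypes
import time
from pathlib import Path


def wait_until_done(operation, refresh, interval=10.0, timeout=600.0, sleep=time.sleep):
    """Poll a long-running operation until operation.done is True.

    refresh takes the operation and returns its latest state
    (with the SDK, pass client.operations.get).
    """
    waited = 0.0
    while not operation.done:
        if waited >= timeout:
            raise TimeoutError("video generation did not finish in time")
        sleep(interval)
        waited += interval
        operation = refresh(operation)
    return operation


def make_video(prompt, out_path="video.mp4", first_frame_path=None, aspect_ratio="16:9"):
    """Text -> Video, or Image -> Video when first_frame_path is given."""
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    extra = {}
    if first_frame_path:
        # The still image becomes the first frame of the video.
        extra["image"] = types.Image(
            image_bytes=Path(first_frame_path).read_bytes(),
            mime_type=mimetypes.guess_type(first_frame_path)[0] or "image/png",
        )
    operation = client.models.generate_videos(
        model="veo-3.1-generate-preview",
        prompt=prompt,
        config=types.GenerateVideosConfig(aspect_ratio=aspect_ratio),
        **extra,
    )
    operation = wait_until_done(operation, client.operations.get)
    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)  # fetch the MP4 data
    video.video.save(out_path)
```

A vertical Reels clip like Example 3 in section 7.5 would be `make_video("The product falls from the sky and lands on a table", "reel.mp4", first_frame_path="product.png", aspect_ratio="9:16")`.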

7.3 Image → Video vs Reference Images — What's the Difference?

| | Image → Video | Reference Images |
|---|---|---|
| Role | Image = First frame | Image = Overall style guide |
| Number of Images | 1 | Up to 3 |
| Pros | Precisely set the starting scene | Logos, text, and details maintained throughout the video |
| Cons | Middle to end is freely generated by AI | Cannot specify the starting scene |
| Best For | "Start from this scene" | "Show this design throughout" |

Designer Tip: When the design needs to be shown accurately, as with t-shirts, packaging, or logos, use the reference image method. Include front, back, and side photos together so the AI can reproduce the design accurately from different angles.
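In code, the reference image method passes the photos through the config instead of as a first frame. A sketch assuming the `google-genai` SDK's `VideoGenerationReferenceImage` type with `reference_type="asset"`; the three-image cap mirrors the limit stated in this guide, and the helper names are hypothetical.

```python
import mimetypes
import time
from pathlib import Path

MAX_REFERENCE_IMAGES = 3  # Veo 3.1 limit stated in this guide


def check_reference_paths(paths):
    """Return the paths as a list, or raise if there are not 1 to 3 of them."""
    paths = list(paths)
    if not 1 <= len(paths) <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"expected 1 to {MAX_REFERENCE_IMAGES} reference images, got {len(paths)}")
    return paths


def video_from_references(paths, prompt, out_path="reference_video.mp4"):
    """Generate a video that keeps the look of the referenced product throughout."""
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    references = [
        types.VideoGenerationReferenceImage(
            image=types.Image(
                image_bytes=Path(p).read_bytes(),
                mime_type=mimetypes.guess_type(p)[0] or "image/png",
            ),
            reference_type="asset",  # treat the image as an object to preserve
        )
        for p in check_reference_paths(paths)
    ]
    operation = client.models.generate_videos(
        model="veo-3.1-generate-preview",
        prompt=prompt,
        config=types.GenerateVideosConfig(reference_images=references),
    )
    while not operation.done:  # video generation is a long-running job
        time.sleep(10)
        operation = client.operations.get(operation)
    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)
    video.video.save(out_path)
```

The t-shirt case from Example 1 in section 7.5 would be `video_from_references(["front.jpg", "back.jpg"], "A model dances, showing the front logo and the back text", "tshirt.mp4")`.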

7.4 Specifying First Frame + Last Frame

You can also specify both the "starting scene" and "ending scene" at the same time. The AI creates a video that smoothly transitions between the two scenes.

First frame: Front view photo + Last frame: Back view photo → A video that starts from the front and naturally turns around to the back

However, while this method nails the start and end, the AI freely fills in the middle, so details may disappear in between. If design details matter, the reference image method is the better choice.
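The choice between these modes can be written down as a small planning function, followed by the first + last frame call. The rules in `plan_video_request` encode this guide's comparison (for example, that the reference method does not fix a starting scene), not documented API validation. The `last_frame` config field is assumed from the `google-genai` SDK, PNG input is assumed for brevity, and all helper names are hypothetical.

```python
import time


def plan_video_request(first_frame=None, last_frame=None, references=()):
    """Pick a Veo 3.1 input mode following sections 7.3 and 7.4 of this guide.

    Returns a dict with the mode name and which image goes into which slot.
    """
    references = list(references)
    if references:
        if first_frame or last_frame:
            raise ValueError("the reference image method does not fix a first or last frame")
        return {"mode": "references", "references": references}
    if last_frame and not first_frame:
        raise ValueError("specify a last frame together with a first frame")
    if first_frame and last_frame:
        return {"mode": "first+last", "image": first_frame, "last_frame": last_frame}
    if first_frame:
        return {"mode": "image-to-video", "image": first_frame}
    return {"mode": "text-to-video"}


def generate_first_last(prompt, first_frame, last_frame, out_path="turnaround.mp4"):
    """Generate a video that starts at first_frame and ends at last_frame (PNG files assumed)."""
    from google import genai
    from google.genai import types

    plan = plan_video_request(first_frame, last_frame)
    if plan["mode"] != "first+last":
        raise ValueError("both a first frame and a last frame are required")

    def load_png(path):
        with open(path, "rb") as f:
            return types.Image(image_bytes=f.read(), mime_type="image/png")

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    operation = client.models.generate_videos(
        model="veo-3.1-generate-preview",
        prompt=prompt,
        image=load_png(plan["image"]),  # starting scene
        config=types.GenerateVideosConfig(last_frame=load_png(plan["last_frame"])),  # ending scene
    )
    while not operation.done:
        time.sleep(10)
        operation = client.operations.get(operation)
    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)
    video.video.save(out_path)
```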

7.5 How to Talk to Claude Code

Example 1 — Product Mockup Video:

"Use this t-shirt front photo (front.jpg) and back photo (back.jpg) as references, and create a video with Veo 3.1 showing a model dancing and showing the front and back."

Example 2 — Package Design Presentation:

"Using these 3 package design images (front, side, back) as references, create a video of the package slowly rotating 360 degrees. Use the Veo 3.1 API."

Example 3 — Social Media Reels Content:

"Using this product photo as the first frame, create a video of the product falling from the sky and landing on a table. 9:16 vertical aspect ratio."

7.6 How to Write Great Video Prompts

Video prompts are similar to image prompts, but you also need to describe movement and camera work.

| Element | Description | Example |
|---|---|---|
| Subject | What appears | "A man wearing a black t-shirt" |
| Style | Video tone | "Cinematic", "Documentary", "Music video" |
| Composition | Aspect ratio and framing | "9:16 vertical", "Full shot", "Close-up" |
| Color | Color tone | "Natural light", "Neon", "Warm tones" |
| Mood | Vibe | "Energetic", "Calm", "Dramatic" |
| Movement | How the subject moves | "Smooth rotation", "Hip-hop dance", "Slow walk" |
| Camera | How the camera moves | "Panning wide shot", "Zoom in", "Static shot" |

Bad prompt:

Make me a t-shirt video

Good prompt:

A man wearing a black t-shirt with a white paper airplane logo
performs a smooth spinning dance, turning around to show the back
of the shirt with 'spacebar' text clearly visible.
Studio background, natural lighting, cinematic motion.
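If you write many video prompts, a small structure with one field per element from the table above helps you notice when movement or camera work is missing. This `VideoBrief` class is hypothetical: `missing()` lists empty elements, and `to_prompt()` simply joins the filled-in elements with commas in a fixed order.

```python
from dataclasses import dataclass, fields


@dataclass
class VideoBrief:
    """One field per element in the video prompt table; fields are joined in this order."""
    subject: str
    movement: str = ""
    camera: str = ""
    style: str = ""
    composition: str = ""
    color: str = ""
    mood: str = ""

    def missing(self) -> list:
        """Names of elements left empty, e.g. to warn when camera work is not described."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_prompt(self) -> str:
        """Join the non-empty elements with commas and end with a period."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p) + "."
```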

7.7 Veo 3.1 Limitations

| Item | Details |
|---|---|
| Max Length | 8 seconds (4s and 6s also available; 1080p, 4K, and reference images require 8s) |
| Resolution | 720p, 1080p (8s only), 4K (8s only) |
| Aspect Ratio | 16:9 (landscape), 9:16 (portrait) |
| Reference Images | Up to 3 |
| Audio | Native audio auto-generated (background music, sound effects) |
| Video Extension | Supported (extensions are 720p only) |
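Because several of these limits depend on each other (1080p, 4K, and reference images all need an 8-second clip), it can help to check settings before spending an API call. A hypothetical validator that encodes the table above; if Google changes the limits, the API's own error messages remain the authority.

```python
ALLOWED_DURATIONS = {4, 6, 8}
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
MAX_REFERENCES = 3


def check_veo_settings(duration=8, resolution="720p", aspect_ratio="16:9", num_references=0):
    """Return a list of problems with the requested settings (an empty list means OK)."""
    problems = []
    res = resolution.lower()
    if duration not in ALLOWED_DURATIONS:
        problems.append(f"duration must be 4, 6 or 8 seconds, got {duration}")
    if res not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be 720p, 1080p or 4K, got {resolution}")
    elif res != "720p" and duration != 8:
        problems.append(f"{resolution} requires an 8-second video")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect ratio must be 16:9 or 9:16, got {aspect_ratio}")
    if num_references > MAX_REFERENCES:
        problems.append(f"at most {MAX_REFERENCES} reference images, got {num_references}")
    elif num_references and duration != 8:
        problems.append("reference images require an 8-second video")
    return problems
```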