Hugging Face Spaces¶
Spaces is Hugging Face's platform for hosting ML demo apps. You write a Gradio or Streamlit app, push it to a Space repository, and it runs on free CPU hardware (or a GPU for $0.40–$2/hour). No Docker, no Kubernetes, no infrastructure — the app is live in minutes.
Learning objectives¶
- Build a Gradio app that calls a model via the Inference API
- Deploy to a Space using Git or the SDK
- Use the Spaces API to call any running demo programmatically
- Understand hardware tiers and when to upgrade
Your first Gradio app¶
A Gradio app needs one file — app.py. The gr.Interface takes a Python function, defines input/output widgets, and generates the full UI automatically.
# app.py — deploy this to a Space
import os
import gradio as gr
from huggingface_hub import InferenceClient
client = InferenceClient(token=os.getenv("HF_TOKEN"))
def classify_customer_query(query: str) -> dict:
"""Classify a customer support query into a department."""
result = client.zero_shot_classification(
text=query,
labels=["billing", "technical support", "account management", "general inquiry"],
model="facebook/bart-large-mnli"
)
return {label: float(score) for label, score in zip(result["labels"], result["scores"])}
demo = gr.Interface(
fn=classify_customer_query,
inputs=gr.Textbox(
label="Customer query",
placeholder="Type a support question...",
lines=3
),
outputs=gr.Label(
label="Department routing",
num_top_classes=4
),
title="Customer Query Router",
description="Routes customer support queries to the right department using zero-shot classification.",
examples=[
["I was charged twice for my subscription."],
["My app crashes when I try to log in."],
["How do I change my password?"],
],
theme=gr.themes.Soft()
)
if __name__ == "__main__":
demo.launch()
Run locally with python app.py — opens at http://localhost:7860.
Multi-component chat interface¶
Gradio's gr.Blocks API gives full layout control. Use gr.ChatInterface for conversational apps.
# app.py — streaming chat with history
import os
import gradio as gr
from huggingface_hub import InferenceClient
client = InferenceClient(
model="mistralai/Mistral-7B-Instruct-v0.3",
token=os.getenv("HF_TOKEN")
)
def chat(message: str, history: list[list[str]]) -> str:
"""Generate a response given a message and conversation history."""
# Build messages list from Gradio history format
messages = []
for human, assistant in history:
messages.append({"role": "user", "content": human})
if assistant:
messages.append({"role": "assistant", "content": assistant})
messages.append({"role": "user", "content": message})
# Format with Mistral chat template
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = ""
for token in client.text_generation(
prompt=prompt,
max_new_tokens=300,
temperature=0.7,
stream=True
):
response += token
yield response
demo = gr.ChatInterface(
fn=chat,
title="Open-Source Chat",
description="Powered by Mistral 7B via Hugging Face Inference API",
examples=["Explain RAG in simple terms.", "Write a Python function to reverse a string."],
type="messages"
)
if __name__ == "__main__":
demo.launch()
Deploying to a Space¶
Option 1 — Git push (recommended)
# Create the Space on the Hub (or via web UI)
huggingface-cli repo create my-demo --type space --space_sdk gradio
# Clone, add files, push
git clone https://huggingface.co/spaces/your-username/my-demo
cd my-demo
# Add your app.py and requirements.txt
cat > requirements.txt << EOF
gradio>=4.0
huggingface_hub>=0.20
transformers>=4.40
EOF
git add app.py requirements.txt
git commit -m "add initial demo"
git push
# Space builds automatically; live at https://huggingface.co/spaces/your-username/my-demo
Option 2 — Python SDK
from huggingface_hub import HfApi, create_repo
api = HfApi(token=os.getenv("HF_TOKEN"))
# Create the Space
create_repo(
repo_id="your-username/my-demo",
repo_type="space",
space_sdk="gradio",
private=False,
exist_ok=True
)
# Upload files
api.upload_file(path_or_fileobj="app.py", path_in_repo="app.py",
repo_id="your-username/my-demo", repo_type="space")
api.upload_file(path_or_fileobj="requirements.txt", path_in_repo="requirements.txt",
repo_id="your-username/my-demo", repo_type="space")
Setting Space secrets¶
API keys should never be in your app code. Use Space secrets.
from huggingface_hub import add_space_secret
add_space_secret(
repo_id="your-username/my-demo",
key="HF_TOKEN",
value=os.getenv("HF_TOKEN")
)
# The secret is now available as os.environ["HF_TOKEN"] in the Space runtime
Via the web UI: Space settings → Variables and secrets → New secret.
Calling any Space via the API¶
Every Gradio Space exposes a REST API. You can call it from other applications.
from gradio_client import Client
# Connect to any public Space
space_client = Client("your-username/my-demo")
# Call the classify function
result = space_client.predict(
"My subscription was charged twice",
api_name="/predict"
)
print(result)
# List available API endpoints
space_client.view_api()
Space hardware tiers¶
| Hardware | VRAM | Use case | Cost/hour |
|---|---|---|---|
| CPU Basic | — | Lightweight models, demo UI | Free |
| CPU Upgrade | — | Faster CPU, more RAM | $0.03 |
| T4 Small | 16 GB | 7B models in 4-bit | $0.40 |
| T4 Medium | 16 GB | 7B models in float16 | $0.60 |
| A10G Small | 24 GB | 13B models | $1.05 |
| A100 Large | 80 GB | 70B models in float16 | $4.13 |
| ZeroGPU | Shared A100 | Free, 60s time limit per call | Free |
ZeroGPU for prototyping
ZeroGPU allocates a shared A100 slice for each request (up to 60 seconds). It's free for Hugging Face Pro subscribers. For model demos that take < 30 seconds per inference, ZeroGPU is the best free option for GPU-requiring models.