06. Hugging Face Endpoints
The Hugging Face Hub is a platform with more than 120,000 models, 20,000 datasets, and 50,000 demo apps, all open source and publicly available. On this online platform, people can easily collaborate and build machine learning together.
The Hugging Face Hub also offers a variety of endpoints for building ML applications. This example shows how to connect to the different endpoint types.
In particular, text generation is powered by Text Generation Inference (TGI): a Rust, Python, and gRPC server purpose-built for very fast text generation inference.
Issuing a Hugging Face Token
After signing up at Hugging Face (https://huggingface.co), you can issue a token at the address below.
Token issuance: https://huggingface.co/docs/hub/security-tokens
Model lists for reference
Hugging Face LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
LogicKor leaderboard: https://lk.instruct.kr/
Using Hugging Face Endpoints
To use Hugging Face Endpoints from Python, you need to install the huggingface_hub package.

Save the already-issued token in a .env file under the key HUGGINGFACEHUB_API_TOKEN, then load HUGGINGFACEHUB_API_TOKEN as shown below before proceeding to the next step.
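A minimal sketch of these setup steps; it assumes the python-dotenv package (not mentioned in the original) is used to read the .env file.

```python
# Install the required package (uncomment when running in a notebook).
# !pip install -qU huggingface_hub

import os

from dotenv import load_dotenv

# Read HUGGINGFACEHUB_API_TOKEN from the .env file into the environment.
load_dotenv()

# Confirm the token was picked up.
print("HUGGINGFACEHUB_API_TOKEN" in os.environ)
```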
Enter the Hugging Face Token
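One way to do this is with the interactive login() helper from huggingface_hub, sketched below; it prompts for the token and stores it for subsequent API calls.

```python
from huggingface_hub import login

# Opens an interactive prompt asking for the Hugging Face token.
login()
```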
Generate a simple prompt.
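A sketch using PromptTemplate from langchain_core; the question-and-answer template text is an illustrative assumption.

```python
from langchain_core.prompts import PromptTemplate

# A simple QA prompt with a single {question} input variable.
template = """Question: {question}

Answer: """
prompt = PromptTemplate.from_template(template)
```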
Serverless Endpoints
The Inference API is free to use, but rate-limited. If you need an inference solution for production, check out the Inference Endpoints service. Inference Endpoints make it easy to deploy any machine learning model on dedicated, fully managed infrastructure: choose the cloud, region, compute instance, auto-scaling range, and security level to match your model, latency, throughput, and compliance requirements.
Here is an example of how to access the Inference API.
Assign the repo ID (repository ID) of the Hugging Face model to the repo_id variable.
Model used: microsoft/Phi-3-mini-4k-instruct (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
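A sketch of querying the model over the serverless Inference API with HuggingFaceEndpoint from the langchain_huggingface package; the sampling parameters and the example question are illustrative assumptions.

```python
import os

from langchain_core.output_parsers import StrOutputParser
from langchain_huggingface import HuggingFaceEndpoint

# Repository ID of the model served through the Inference API.
repo_id = "microsoft/Phi-3-mini-4k-instruct"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=256,  # cap on the number of generated tokens
    temperature=0.1,     # low temperature for near-deterministic output
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"],
)

# Chain the prompt from the previous step with the model.
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"question": "What is the capital of South Korea?"}))
```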
Dedicated Endpoints
The free serverless API lets you implement and iterate on solutions quickly, but it may be rate-limited for high-volume use cases, since the load is shared with other requests.
For enterprise workloads, it is best to use Inference Endpoints - Dedicated. This gives you access to fully managed infrastructure that offers more flexibility and speed.
These resources come with ongoing support and uptime guarantees, as well as options such as autoscaling.
Set the URL of the Inference Endpoint in the hf_endpoint_url variable.
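A sketch of pointing HuggingFaceEndpoint at a dedicated endpoint; the URL is a hypothetical placeholder to be replaced with the one shown on your endpoint's dashboard, and the generation parameters are illustrative.

```python
from langchain_huggingface import HuggingFaceEndpoint

# Hypothetical placeholder URL; copy the real one from the endpoint dashboard.
hf_endpoint_url = "https://your-endpoint-name.region.vendor.endpoints.huggingface.cloud"

llm = HuggingFaceEndpoint(
    endpoint_url=hf_endpoint_url,
    max_new_tokens=512,
    temperature=0.01,
)

print(llm.invoke("What is the capital of South Korea?"))
```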