06. HuggingFace Endpoints


The Hugging Face Hub is a platform with more than 120,000 models, 20,000 datasets, and 50,000 demo apps, all open source and publicly available. On this online platform, people can easily collaborate and build machine learning together.

The Hugging Face Hub also offers various endpoints for building ML applications. This example shows how to connect to the different types of endpoints.

In particular, text generation is powered by Text Generation Inference (TGI): a Rust, Python, and gRPC server purpose-built for very fast text generation inference.

Issuing a Hugging Face Token

After signing up at Hugging Face (https://huggingface.co), issue an access token from the token settings page (https://huggingface.co/settings/tokens).

Reference: Hugging Face model list (https://huggingface.co/models)

Using Hugging Face Endpoints

To use it from Python, you should install the huggingface_hub package.


# !pip install -qU huggingface_hub

Save the issued token in the .env file under the name HUGGINGFACEHUB_API_TOKEN, then proceed to the next step.

Load HUGGINGFACEHUB_API_TOKEN from the .env file.
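
A minimal sketch of loading the token, assuming the python-dotenv package is installed and a .env file containing HUGGINGFACEHUB_API_TOKEN exists in the working directory:

import os

from dotenv import load_dotenv

# Read the .env file and export its entries (including HUGGINGFACEHUB_API_TOKEN)
# into the process environment so downstream libraries can find the token.
load_dotenv()

print("HUGGINGFACEHUB_API_TOKEN" in os.environ)  # Expected to print True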


Enter the Hugging Face Token
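
One way to do this is with the login helper from huggingface_hub, which prompts for the token interactively; this is a sketch of one option (keeping the token in the environment variable, as above, also works):

from huggingface_hub import login

# Prompts for the token (or pass token="hf_..." explicitly) and stores it
# so that subsequent Hugging Face Hub calls are authenticated.
login()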


Generate a simple prompt.
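
A minimal example using LangChain's PromptTemplate; the template wording is illustrative, not prescribed by the original:

from langchain_core.prompts import PromptTemplate

# A simple question-answering template with a single {question} placeholder.
template = """Question: {question}

Answer: """

prompt = PromptTemplate.from_template(template)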


Serverless Endpoints

The Inference API is available for free, but it is rate limited. If you need an inference solution for production, check out the Inference Endpoints service. Inference Endpoints makes it easy to deploy any machine learning model on dedicated, fully managed infrastructure. Choose the cloud, region, compute instance, auto-scaling range, and security level to match your model, latency, throughput, and compliance requirements.

Here is an example of how to access the Inference API.


Assign the repo ID (repository ID) of the Hugging Face model to the repo_id variable.
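
A sketch using the langchain-huggingface integration (assumed to be installed); the repo ID microsoft/Phi-3-mini-4k-instruct and the generation parameters are illustrative choices, not fixed by this tutorial:

import os

from langchain_huggingface import HuggingFaceEndpoint

# Repository ID of the model to call through the serverless Inference API.
repo_id = "microsoft/Phi-3-mini-4k-instruct"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    task="text-generation",  # Text Generation Inference task
    max_new_tokens=256,      # Upper bound on generated tokens
    temperature=0.1,         # Low temperature for more deterministic output
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"],
)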

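Combining the prompt defined earlier with the serverless endpoint into a chain; the question is only an example:

from langchain_core.output_parsers import StrOutputParser

# Prompt -> serverless Inference API endpoint -> plain-string output.
chain = prompt | llm | StrOutputParser()

response = chain.invoke({"question": "What is the capital of South Korea?"})
print(response)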

Dedicated endpoint

The free serverless API lets you implement and iterate on a solution quickly. However, because the load is shared with other requests, it may be rate-limited in high-volume use cases.

For enterprise workloads, it is best to use Inference Endpoints (Dedicated). This gives you access to fully managed infrastructure that offers more flexibility and speed.

These resources include ongoing support and uptime guarantees, as well as options like AutoScaling.

• Set the URL of the Inference Endpoint in the hf_endpoint_url variable, as shown in the sketch below.
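
A sketch assuming an Inference Endpoint has already been created; the URL below is a placeholder and must be replaced with the value shown in your endpoint's console:

import os

from langchain_huggingface import HuggingFaceEndpoint

# Placeholder URL: copy the real value from the Inference Endpoints console.
hf_endpoint_url = "https://YOUR-ENDPOINT-NAME.REGION.aws.endpoints.huggingface.cloud"

llm = HuggingFaceEndpoint(
    endpoint_url=hf_endpoint_url,  # Dedicated endpoint URL instead of repo_id
    task="text-generation",
    max_new_tokens=512,
    temperature=0.01,
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"],
)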

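Once the dedicated endpoint is wrapped as an LLM, it can be called like any other LangChain model; the prompts here are illustrative:

# Single, blocking call to the dedicated endpoint.
answer = llm.invoke("What are Hugging Face Inference Endpoints?")
print(answer)

# Token-by-token streaming, useful for long generations.
for chunk in llm.stream("Summarize the difference between serverless and dedicated endpoints."):
    print(chunk, end="", flush=True)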
