
Image analysis
API Overview
Add image analysis capabilities to any LLM
API Console
Log in to explore more features! Click to Log In
API Reference (1)
| API Description | API Endpoint | Request Method | Stability | Parameter Description |
|---|---|---|---|---|
Chat(Image analysis) | POST | Stable | View Details | |
Document Details adds image recognition capabilities to all models, and there are two ways to enable it, you can choose either one:
Note: If the suffix of the multimodal model has -ocr, it will also use the specified or default OCR model for image analysis, so try to avoid enabling this feature on multimodal models The principle of this function is: before each request, the user’s image is sent to the multimodal model for analysis, and then the analysis results are incorporated into the model context as reference information. The specific process can be viewed in the logs during the API call. The default OCR model currently in use is gpt-4o-mini. Image analysis prompt words:
Price: On the basis of the original model + the cost of the multimodal model Request Parameters Header ParametersContent-TypestringRequired Example Value: application/jsonAcceptstringRequired Example Value: application/jsonAuthorizationstringRequired Example Value: Bearer {{YOUR_API_KEY}}Request Body application/jsonmodelstringRequired ID of the model to be used. For more details on which models are compatible with the chat API, see the model endpoint compatibility table. messagesarray[object]Required Generate chat completion messages in chat format. rolestringOptional contentstringOptional temperatureintegerOptional What sampling temperature is used, between 0 and 2. A higher value (such as 0.8) will make the output more random, while a lower value (such as 0.2) will make the output more centralized and definite. We usually recommend changing this or top_pintegerOptional An alternative to temperature sampling, called kernel sampling, in which the model considers the results of markers with top_p probability mass. So 0.1 means only considering marks that constitute the top 10% probability mass. We usually recommend changing this or nintegerOptional How many chat completion options are generated for each input message. streambooleanOptional If set, partial message increments will be sent, just like in ChatGPT. When tokens are available, tokens will be sent as raw data stopstringOptional API will stop generating a maximum of 4 tokens for more sequences. max_tokensintegerOptional The maximum number of tokens generated when the chat is complete. The total length of input tokens and generated tokens is limited by the model’s context length. presence_penaltynumberOptional Numbers between -2.0 and 2.0. Positive values punish new tags based on whether they appear in the text so far, thereby increasing the possibility that the model will talk about new topics. See more information about frequency and exists penalties. frequency_penaltynumberOptional Numbers between -2.0 and 2.0. Positive values punish new tags based on their existing frequency in the text, reducing the possibility that the model will repeat the same line verbatim. See more information about frequency and exists penalties. logit_biasnullOptional Modify the possibility that the specified mark appears in completion. Accepts a json object that maps the tag (specified by the tag ID in the tagger) to the associated bias value from -100 to 100. Mathematically, bias is added to the model-generated logits before sampling. The exact effect varies by model, but values between -1 and 1 should reduce or increase the likelihood of selection; values like -100 or 100 should result in prohibited or exclusive selection of the relevant token. userstringOptional A unique identifier that represents your end user, which can help OpenAI monitor and detect abuse. Learn more. | ||||
API Pricing
| Model | Description | 302.AI Price |
|---|
Service | Cost of adding a multimodal model on top of the original LLM |
|