Class GPTChatCompletionServiceImpl
- java.lang.Object
-
- com.composum.ai.backend.base.service.chat.impl.GPTChatCompletionServiceImpl
-
- All Implemented Interfaces:
GPTChatCompletionService
public class GPTChatCompletionServiceImpl extends Object implements GPTChatCompletionService
Implements the actual access to the ChatGPT chat API.
- See Also:
- "https://platform.openai.com/docs/api-reference/chat/create", "https://platform.openai.com/docs/guides/chat"
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
protected static class
GPTChatCompletionServiceImpl.EnsureResultFutureCallback
Makes doubly sure that result is somehow set after the call.
static interface
GPTChatCompletionServiceImpl.GPTChatCompletionServiceConfig
protected static class
GPTChatCompletionServiceImpl.RetryableException
Thrown when we get a 429 rate limiting response.
protected class
GPTChatCompletionServiceImpl.StreamDecodingResponseConsumer
-
Field Summary
Fields
Modifier and Type Field Description
protected String
apiKey
The OpenAI Key for accessing ChatGPT; system default if not given in request.
protected org.osgi.framework.BundleContext
bundleContext
protected static String
CHAT_COMPLETION_URL
protected String
chatCompletionUrl
protected int
connectionTimeout
static String
DEFAULT_EMBEDDINGS_MODEL
static String
DEFAULT_HIGH_INTELLIGENCE_MODEL
static String
DEFAULT_IMAGE_MODEL
static String
DEFAULT_MODEL
protected String
defaultModel
protected static int
DEFAULTVALUE_CONNECTIONTIMEOUT
protected static int
DEFAULTVALUE_REQUESTS_PER_DAY
protected static int
DEFAULTVALUE_REQUESTS_PER_HOUR
protected static int
DEFAULTVALUE_REQUESTS_PER_MINUTE
protected static int
DEFAULTVALUE_REQUESTTIMEOUT
protected boolean
disabled
protected RateLimiter
embeddingsLimiter
Rate limiter for embeddings.
protected String
embeddingsModel
protected String
embeddingsUrl
protected com.knuddels.jtokkit.api.Encoding
enc
Tokenizer used for GPT-3.5 and GPT-4.
protected RateLimiter
gptLimiter
If set, this tells the limits of the ChatGPT API itself.
protected static com.google.gson.Gson
gson
protected String
highIntelligenceModel
protected org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient
httpAsyncClient
protected String
imageModel
protected long
lastGptLimiterCreationTime
protected RateLimiter
limiter
Limiter that maps the financial reasons to limit.
protected static org.slf4j.Logger
LOG
protected Integer
maximumTokensPerRequest
protected Integer
maximumTokensPerResponse
static int
MAXTRIES
The maximum number of retries.
static String
OPENAI_API_KEY
Environment variable where we take the key from, if not configured directly.
static String
OPENAI_API_KEY_SYSPROP
System property where we take the key from, if not configured directly.
protected static String
OPENAI_EMBEDDINGS_URL
protected String
organizationId
protected static Pattern
PATTERN_TRY_AGAIN
protected com.knuddels.jtokkit.api.EncodingRegistry
registry
protected AtomicLong
requestCounter
protected int
requestTimeout
protected ScheduledExecutorService
scheduledExecutorService
protected Double
temperature
protected Map<String,GPTChatMessagesTemplate>
templates
static String
TRUNCATE_MARKER
-
Constructor Summary
Constructors
Constructor Description
GPTChatCompletionServiceImpl()
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type Method Description
void
activate(GPTChatCompletionServiceImpl.GPTChatCompletionServiceConfig config, org.osgi.framework.BundleContext bundleContext)
protected static GPTException
buildException(Integer errorStatusCode, String result)
protected void
checkEnabled()
protected void
checkEnabled(GPTConfiguration gptConfig)
protected void
checkTokenCount(String jsonRequest)
int
countTokens(String text)
Counts the number of tokens for the text for the normally used model.
protected String
createJsonRequest(GPTChatRequest request)
void
deactivate()
protected static GPTChatCompletionServiceImpl.RetryableException
extractRetryableException(Throwable e)
List<float[]>
getEmbeddings(List<String> texts, GPTConfiguration configuration)
Calculates embeddings for the given list of texts.
String
getEmbeddingsModel()
Returns the model used for GPTChatCompletionService.getEmbeddings(List, GPTConfiguration).
String
getSingleChatCompletion(GPTChatRequest request)
The simplest case: give some messages and get a single response.
GPTChatMessagesTemplate
getTemplate(String templateName)
Retrieves a (usually cached) chat template with that name.
protected void
handleStreamingEvent(GPTCompletionCallback callback, long id, String line)
Handle a single line of the streaming response.
String
htmlToMarkdown(String html)
Helper for preprocessing HTML so that it can easily be read by ChatGPT.
boolean
isEnabled()
Whether ChatGPT completion is enabled.
boolean
isEnabled(GPTConfiguration gptConfig)
Checks whether GPTChatCompletionService.isEnabled() and whether gptConfig enables executing GPT calls.
boolean
isVisionEnabled()
Returns true if vision is enabled.
protected org.apache.hc.client5.http.async.methods.SimpleHttpRequest
makeRequest(String jsonRequest, GPTConfiguration gptConfiguration, String url)
String
markdownToHtml(String markdown)
Opposite of GPTChatCompletionService.htmlToMarkdown(String).
protected void
performCallAsync(CompletableFuture<Void> finished, long id, org.apache.hc.client5.http.async.methods.SimpleHttpRequest httpRequest, GPTCompletionCallback callback, int tryNumber, long defaultDelay)
Executes a call with retries.
protected <T> long
recalculateDelay(String responsebody, long delay)
If the response body contains a string like "Please try again in 20s." (number varies) we return a value of that many seconds, otherwise just use iterative doubling.
protected static String
retrieveOpenAIKey(GPTChatCompletionServiceImpl.GPTChatCompletionServiceConfig config)
String
shorten(String text, int maxTokens)
Helper method to shorten texts by taking out the middle if too long.
void
streamingChatCompletion(GPTChatRequest request, GPTCompletionCallback callback)
Give some messages and receive the streaming response via callback, to reduce waiting time.
protected CompletableFuture<Void>
triggerCallAsync(long id, org.apache.hc.client5.http.async.methods.SimpleHttpRequest httpRequest, GPTCompletionCallback callback)
Puts the call into the pipeline; the returned future will be set normally or exceptionally when it's done.
protected void
waitForLimit()
-
-
-
Field Detail
-
LOG
protected static final org.slf4j.Logger LOG
-
CHAT_COMPLETION_URL
protected static final String CHAT_COMPLETION_URL
- See Also:
- Constant Field Values
-
OPENAI_EMBEDDINGS_URL
protected static final String OPENAI_EMBEDDINGS_URL
- See Also:
- Constant Field Values
-
PATTERN_TRY_AGAIN
protected static final Pattern PATTERN_TRY_AGAIN
-
OPENAI_API_KEY
public static final String OPENAI_API_KEY
Environment variable where we take the key from, if not configured directly.
- See Also:
- Constant Field Values
-
OPENAI_API_KEY_SYSPROP
public static final String OPENAI_API_KEY_SYSPROP
System property where we take the key from, if not configured directly.
- See Also:
- Constant Field Values
-
DEFAULT_MODEL
public static final String DEFAULT_MODEL
- See Also:
- Constant Field Values
-
DEFAULT_IMAGE_MODEL
public static final String DEFAULT_IMAGE_MODEL
- See Also:
- Constant Field Values
-
DEFAULT_EMBEDDINGS_MODEL
public static final String DEFAULT_EMBEDDINGS_MODEL
- See Also:
- Constant Field Values
-
DEFAULT_HIGH_INTELLIGENCE_MODEL
public static final String DEFAULT_HIGH_INTELLIGENCE_MODEL
- See Also:
- Constant Field Values
-
DEFAULTVALUE_CONNECTIONTIMEOUT
protected static final int DEFAULTVALUE_CONNECTIONTIMEOUT
- See Also:
- Constant Field Values
-
DEFAULTVALUE_REQUESTTIMEOUT
protected static final int DEFAULTVALUE_REQUESTTIMEOUT
- See Also:
- Constant Field Values
-
DEFAULTVALUE_REQUESTS_PER_MINUTE
protected static final int DEFAULTVALUE_REQUESTS_PER_MINUTE
- See Also:
- Constant Field Values
-
DEFAULTVALUE_REQUESTS_PER_HOUR
protected static final int DEFAULTVALUE_REQUESTS_PER_HOUR
- See Also:
- Constant Field Values
-
DEFAULTVALUE_REQUESTS_PER_DAY
protected static final int DEFAULTVALUE_REQUESTS_PER_DAY
- See Also:
- Constant Field Values
-
TRUNCATE_MARKER
public static final String TRUNCATE_MARKER
- See Also:
- Constant Field Values
-
MAXTRIES
public static final int MAXTRIES
The maximum number of retries.
- See Also:
- Constant Field Values
-
apiKey
protected String apiKey
The OpenAI Key for accessing ChatGPT; system default if not given in request.
-
organizationId
protected String organizationId
-
defaultModel
protected String defaultModel
-
highIntelligenceModel
protected String highIntelligenceModel
-
imageModel
protected String imageModel
-
chatCompletionUrl
protected String chatCompletionUrl
-
httpAsyncClient
protected org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient httpAsyncClient
-
gson
protected static final com.google.gson.Gson gson
-
requestCounter
protected final AtomicLong requestCounter
-
limiter
protected RateLimiter limiter
Limiter that maps the financial reasons to limit.
-
lastGptLimiterCreationTime
protected volatile long lastGptLimiterCreationTime
-
gptLimiter
protected volatile RateLimiter gptLimiter
If set, this tells the limits of the ChatGPT API itself.
-
registry
protected com.knuddels.jtokkit.api.EncodingRegistry registry
-
enc
protected com.knuddels.jtokkit.api.Encoding enc
Tokenizer used for GPT-3.5 and GPT-4.
-
bundleContext
protected org.osgi.framework.BundleContext bundleContext
-
templates
protected final Map<String,GPTChatMessagesTemplate> templates
-
requestTimeout
protected int requestTimeout
-
connectionTimeout
protected int connectionTimeout
-
temperature
protected Double temperature
-
disabled
protected boolean disabled
-
scheduledExecutorService
protected ScheduledExecutorService scheduledExecutorService
-
maximumTokensPerRequest
protected Integer maximumTokensPerRequest
-
maximumTokensPerResponse
protected Integer maximumTokensPerResponse
-
embeddingsLimiter
protected volatile RateLimiter embeddingsLimiter
Rate limiter for embeddings. Embeddings are a quite inexpensive service ($0.13 per million tokens), so for now we just introduce a limit that should protect against malfunctions.
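The API of the RateLimiter class is not shown on this page. As a rough illustration of a limit that guards against malfunctions, the following is a minimal fixed-window sketch. The class name WindowLimiterSketch, its constructor parameters and the injected clock are all hypothetical and not taken from the library.

```java
import java.util.function.LongSupplier;

/** Hypothetical fixed-window limiter; not the library's RateLimiter. */
final class WindowLimiterSketch {
    private final int maxRequests;
    private final long windowMillis;
    private final LongSupplier clock; // injected so the behaviour can be tested without sleeping
    private long windowStart;
    private int used;

    WindowLimiterSketch(int maxRequests, long windowMillis, LongSupplier clock) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /**
     * Returns 0 if the request may proceed (and counts it); otherwise returns the
     * number of milliseconds until the current window ends.
     */
    synchronized long tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // The window has elapsed: start a fresh one.
            windowStart = now;
            used = 0;
        }
        if (used < maxRequests) {
            used++;
            return 0L;
        }
        return windowStart + windowMillis - now;
    }
}
```

A caller would sleep for the returned number of milliseconds and then try again. Separate instances with different windows could model per-minute, per-hour and per-day limits.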
-
embeddingsUrl
protected String embeddingsUrl
-
embeddingsModel
protected String embeddingsModel
-
-
Method Detail
-
activate
public void activate(GPTChatCompletionServiceImpl.GPTChatCompletionServiceConfig config, org.osgi.framework.BundleContext bundleContext)
-
deactivate
public void deactivate()
-
retrieveOpenAIKey
protected static String retrieveOpenAIKey(@Nullable GPTChatCompletionServiceImpl.GPTChatCompletionServiceConfig config)
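This method has no description here, but the field documentation for OPENAI_API_KEY and OPENAI_API_KEY_SYSPROP suggests a fallback lookup. The following is a minimal sketch of such a lookup. The order (configuration, then environment, then system property) and the two name constants are assumptions, since the actual constant values and the method body are not shown on this page.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of a key lookup with fallbacks; not the library's actual code. */
final class ApiKeyLookupSketch {

    // Assumed names: the real constant values are not shown on this page.
    static final String ENV_NAME = "OPENAI_API_KEY";
    static final String SYSPROP_NAME = "openai.api.key";

    /** Returns the first non-blank key among configured value, environment and system properties, else null. */
    static String resolveKey(String configuredKey, Map<String, String> env, Properties sysProps) {
        if (isUsable(configuredKey)) {
            return configuredKey.trim();
        }
        String fromEnv = env.get(ENV_NAME);
        if (isUsable(fromEnv)) {
            return fromEnv.trim();
        }
        String fromProps = sysProps.getProperty(SYSPROP_NAME);
        return isUsable(fromProps) ? fromProps.trim() : null;
    }

    private static boolean isUsable(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```

With the real process settings this could be called as ApiKeyLookupSketch.resolveKey(null, System.getenv(), System.getProperties()).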
-
getSingleChatCompletion
public String getSingleChatCompletion(@Nonnull GPTChatRequest request) throws GPTException
Description copied from interface: GPTChatCompletionService
The simplest case: give some messages and get a single response. If the response can be more than a few words, consider using GPTChatCompletionService.streamingChatCompletion(GPTChatRequest, GPTCompletionCallback) instead, to give the user some feedback while waiting.
- Specified by:
getSingleChatCompletion in interface GPTChatCompletionService
- Throws:
GPTException
-
makeRequest
protected org.apache.hc.client5.http.async.methods.SimpleHttpRequest makeRequest(String jsonRequest, GPTConfiguration gptConfiguration, String url)
-
streamingChatCompletion
public void streamingChatCompletion(@Nonnull GPTChatRequest request, @Nonnull GPTCompletionCallback callback) throws GPTException
Description copied from interface: GPTChatCompletionService
Give some messages and receive the streaming response via callback, to reduce waiting time. It possibly waits if a rate limit is reached, but otherwise returns immediately after scheduling an asynchronous call.
- Specified by:
streamingChatCompletion in interface GPTChatCompletionService
- Throws:
GPTException
-
handleStreamingEvent
protected void handleStreamingEvent(GPTCompletionCallback callback, long id, String line)
Handle a single line of the streaming response.
- First message, e.g.: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","created":1686890500,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}
- Data: gather {"id":"chatcmpl-xyz","object":"chat.completion.chunk","created":1686890500,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" above"},"index":0,"finish_reason":null}]}
- End: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","created":1686890500,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}
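To illustrate how the three kinds of chunks above can be processed, here is a minimal, self-contained sketch that gathers the delta content and records the finish reason. It is not the actual implementation: the class StreamChunkSketch and its methods are hypothetical, and it uses regular expressions only to stay dependency-free. A real implementation would more likely use a JSON parser. Unicode escapes (\uXXXX) are not decoded here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified chunk handler; a JSON library would be the robust choice. */
final class StreamChunkSketch {
    // Matches "content":"..." including backslash-escaped characters inside the string.
    private static final Pattern CONTENT =
            Pattern.compile("\"content\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
    // Matches only a quoted finish reason, so "finish_reason":null is ignored.
    private static final Pattern FINISH =
            Pattern.compile("\"finish_reason\"\\s*:\\s*\"([a-z_]+)\"");

    private final StringBuilder gathered = new StringBuilder();
    private String finishReason;

    /** Processes one JSON chunk line of the streaming response. */
    void handleLine(String line) {
        Matcher content = CONTENT.matcher(line);
        if (content.find()) {
            gathered.append(unescape(content.group(1)));
        }
        Matcher finish = FINISH.matcher(line);
        if (finish.find()) {
            finishReason = finish.group(1);
        }
    }

    String text() { return gathered.toString(); }

    String finishReason() { return finishReason; }

    boolean isFinished() { return finishReason != null; }

    // Handles \n, \t and single-character escapes such as \" and \\ only.
    private static String unescape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(++i);
                sb.append(next == 'n' ? '\n' : next == 't' ? '\t' : next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```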
-
waitForLimit
protected void waitForLimit()
-
performCallAsync
protected void performCallAsync(CompletableFuture<Void> finished, long id, org.apache.hc.client5.http.async.methods.SimpleHttpRequest httpRequest, GPTCompletionCallback callback, int tryNumber, long defaultDelay)
Executes a call with retries. The response is written to callback; when it's finished, the future is set, either normally or exceptionally if there was an error.
- Parameters:
finished - the future to set when the call is finished
id - the id of the call, for logging
httpRequest - the request to send
callback - the callback to write the response to
tryNumber - the number of the try - if it's 5, we give up.
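The retry scheme described above can be sketched roughly as follows. This is a simplified illustration, not the actual method. The class RetrySketch, the generic Callable in place of the HTTP request and callback, the assumption that tries are counted from 1, and the plain doubling of the delay are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of retrying a call and reporting the outcome through a future. */
final class RetrySketch {
    static final int MAX_TRIES = 5; // the documentation says we give up at try number 5

    /** Stand-in for an error that is worth retrying, such as a 429 response. */
    static final class RetryableException extends RuntimeException {
        RetryableException(String message) {
            super(message);
        }
    }

    /**
     * Runs the call; on a retryable failure, schedules the next try after delayMillis
     * with a doubled delay. The future is completed normally or exceptionally at the end.
     */
    static <T> void performCall(CompletableFuture<T> finished, Callable<T> call,
                                int tryNumber, long delayMillis,
                                ScheduledExecutorService scheduler) {
        try {
            finished.complete(call.call());
        } catch (RetryableException e) {
            if (tryNumber >= MAX_TRIES) {
                finished.completeExceptionally(e); // out of tries: give up
            } else {
                scheduler.schedule(
                        () -> performCall(finished, call, tryNumber + 1, delayMillis * 2, scheduler),
                        delayMillis, TimeUnit.MILLISECONDS);
            }
        } catch (Exception e) {
            finished.completeExceptionally(e); // non-retryable errors fail immediately
        }
    }
}
```

Completing the future on every path means that a caller waiting on it is never left hanging.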
-
extractRetryableException
protected static GPTChatCompletionServiceImpl.RetryableException extractRetryableException(Throwable e)
-
triggerCallAsync
protected CompletableFuture<Void> triggerCallAsync(long id, org.apache.hc.client5.http.async.methods.SimpleHttpRequest httpRequest, GPTCompletionCallback callback)
Puts the call into the pipeline; the returned future will be set normally or exceptionally when it's done.
-
recalculateDelay
protected <T> long recalculateDelay(String responsebody, long delay)
If the response body contains a string like "Please try again in 20s." (number varies) we return a value of that many seconds, otherwise just use iterative doubling.
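A minimal sketch of this rule follows. The regular expression is an assumption, since the actual PATTERN_TRY_AGAIN is not shown on this page. The sketch also assumes that delays are measured in milliseconds. The class name RetryDelaySketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of deriving the next retry delay from an error response body. */
final class RetryDelaySketch {
    // Assumed pattern; only whole seconds with up to six digits are recognised here.
    private static final Pattern TRY_AGAIN = Pattern.compile("Please try again in (\\d{1,6})s\\.");

    /** Returns the next delay in milliseconds (unit assumed for this sketch). */
    static long recalculateDelay(String responseBody, long delayMillis) {
        if (responseBody != null) {
            Matcher m = TRY_AGAIN.matcher(responseBody);
            if (m.find()) {
                // The server told us how long to wait: use that many seconds.
                return Long.parseLong(m.group(1)) * 1000L;
            }
        }
        // No hint from the server: fall back to iterative doubling.
        return delayMillis * 2;
    }
}
```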
-
createJsonRequest
protected String createJsonRequest(GPTChatRequest request) throws com.fasterxml.jackson.core.JsonProcessingException
- Throws:
com.fasterxml.jackson.core.JsonProcessingException
-
checkEnabled
protected void checkEnabled()
-
isEnabled
public boolean isEnabled()
Description copied from interface: GPTChatCompletionService
Whether ChatGPT completion is enabled. If not, calling the methods that access ChatGPT throws an IllegalStateException.
- Specified by:
isEnabled in interface GPTChatCompletionService
-
isEnabled
public boolean isEnabled(GPTConfiguration gptConfig)
Description copied from interface: GPTChatCompletionService
Checks whether GPTChatCompletionService.isEnabled() and whether gptConfig enables executing GPT calls. (Currently, that means an API key is available either globally or in gptConfig.)
- Specified by:
isEnabled in interface GPTChatCompletionService
-
checkEnabled
protected void checkEnabled(GPTConfiguration gptConfig)
-
isVisionEnabled
public boolean isVisionEnabled()
Description copied from interface: GPTChatCompletionService
Returns true if vision is enabled.
- Specified by:
isVisionEnabled in interface GPTChatCompletionService
-
getTemplate
@Nonnull public GPTChatMessagesTemplate getTemplate(@Nonnull String templateName) throws GPTException
Description copied from interface: GPTChatCompletionService
Retrieves a (usually cached) chat template with that name. Mostly for backend internal use. The templates are retrieved from the bundle resources at "chattemplates/", and are cached.
- Specified by:
getTemplate in interface GPTChatCompletionService
- Parameters:
templateName - the name of the template to retrieve, e.g. "singleTranslation".
- Throws:
GPTException
-
shorten
@Nonnull public String shorten(@Nullable String text, int maxTokens)
Description copied from interface: GPTChatCompletionService
Helper method to shorten texts by taking out the middle if too long. In texts longer than maxTokens tokens we replace the middle with " ... (truncated) ... ", since ChatGPT can only process a limited number of words / tokens, and the introduction and the summary probably contain the most condensed information about the text. The output then has maxTokens tokens, including the ... marker.
- Specified by:
shorten in interface GPTChatCompletionService
- Parameters:
text - the text to shorten
maxTokens - the maximum number of tokens in the output
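The idea of cutting out the middle can be sketched as follows. This is an illustration only: whitespace-separated words stand in for tokens, whereas the actual implementation presumably counts tokens with its tokenizer. The class name ShortenSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: words stand in for tokens, which a real tokenizer would count differently. */
final class ShortenSketch {
    // The marker " ... (truncated) ... " counts as three "tokens" in this word-based sketch.
    static final List<String> MARKER = Arrays.asList("...", "(truncated)", "...");

    /** Keeps the beginning and the end of the text and replaces the middle with the marker. */
    static String shorten(String text, int maxTokens) {
        if (text == null || text.trim().isEmpty()) {
            return "";
        }
        List<String> words = Arrays.asList(text.trim().split("\\s+"));
        if (words.size() <= maxTokens) {
            return text; // short enough: return unchanged
        }
        int keep = Math.max(0, maxTokens - MARKER.size());
        int head = (keep + 1) / 2; // the beginning gets the extra word if keep is odd
        int tail = keep - head;
        List<String> out = new ArrayList<>(words.subList(0, head));
        out.addAll(MARKER);
        out.addAll(words.subList(words.size() - tail, words.size()));
        return String.join(" ", out);
    }
}
```

For example, ShortenSketch.shorten("a b c d e f g h i j", 7) keeps two words on each side of the marker and yields "a b ... (truncated) ... i j".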
-
markdownToHtml
public String markdownToHtml(String markdown)
Description copied from interface: GPTChatCompletionService
Opposite of GPTChatCompletionService.htmlToMarkdown(String).
- Specified by:
markdownToHtml in interface GPTChatCompletionService
-
countTokens
public int countTokens(@Nullable String text)
Description copied from interface: GPTChatCompletionService
Counts the number of tokens for the text for the normally used model. Caution: message boundaries need some tokens, and slicing text might create a token or two as well, so do not rely on the exact count.
- Specified by:
countTokens in interface GPTChatCompletionService
-
checkTokenCount
protected void checkTokenCount(String jsonRequest)
-
htmlToMarkdown
@Nonnull public String htmlToMarkdown(String html)
Description copied from interface: GPTChatCompletionService
Helper for preprocessing HTML so that it can easily be read by ChatGPT.
- Specified by:
htmlToMarkdown in interface GPTChatCompletionService
-
getEmbeddings
@Nonnull public List<float[]> getEmbeddings(List<String> texts, GPTConfiguration configuration) throws GPTException
Description copied from interface: GPTChatCompletionService
Calculates embeddings for the given list of texts.
- Specified by:
getEmbeddings in interface GPTChatCompletionService
- Throws:
GPTException
-
getEmbeddingsModel
public String getEmbeddingsModel()
Description copied from interface: GPTChatCompletionService
Returns the model used for GPTChatCompletionService.getEmbeddings(List, GPTConfiguration).
- Specified by:
getEmbeddingsModel in interface GPTChatCompletionService
-
buildException
protected static GPTException buildException(Integer errorStatusCode, String result)
-
-