Class GPTChatCompletionServiceImpl
- java.lang.Object
  - com.composum.ai.backend.base.service.chat.impl.GPTInternalOpenAIHelper.GPTInternalOpenAIHelperInst
    - com.composum.ai.backend.base.service.chat.impl.GPTChatCompletionServiceImpl
- All Implemented Interfaces:
GPTChatCompletionService, GPTInternalOpenAIHelper
public class GPTChatCompletionServiceImpl extends GPTInternalOpenAIHelper.GPTInternalOpenAIHelperInst implements GPTChatCompletionService, GPTInternalOpenAIHelper
Implements the actual access to the ChatGPT chat API.
- See Also:
  - "https://platform.openai.com/docs/api-reference/chat/create", "https://platform.openai.com/docs/guides/chat"
-
-
Nested Class Summary
Nested Classes

- protected static class GPTChatCompletionServiceImpl.EnsureResultFutureCallback - Makes doubly sure that result is somehow set after the call.
- static interface GPTChatCompletionServiceImpl.GPTChatCompletionServiceConfig
- protected static class GPTChatCompletionServiceImpl.RetryableException - Thrown when we get a 429 rate limiting response.
- protected class GPTChatCompletionServiceImpl.StreamDecodingResponseConsumer
Nested classes/interfaces inherited from interface com.composum.ai.backend.base.service.chat.impl.GPTInternalOpenAIHelper
GPTInternalOpenAIHelper.GPTInternalOpenAIHelperInst
-
-
Field Summary
Fields

- protected GPTBackendsService backendsService
- protected org.osgi.framework.BundleContext bundleContext
- protected int connectionTimeout
- static String DEFAULT_EMBEDDINGS_MODEL
- static String DEFAULT_HIGH_INTELLIGENCE_MODEL
- static String DEFAULT_IMAGE_MODEL
- static String DEFAULT_MODEL
- protected String defaultModel
- protected static int DEFAULTVALUE_CONNECTIONTIMEOUT
- protected static int DEFAULTVALUE_REQUESTS_PER_DAY
- protected static int DEFAULTVALUE_REQUESTS_PER_HOUR
- protected static int DEFAULTVALUE_REQUESTS_PER_MINUTE
- protected static int DEFAULTVALUE_REQUESTTIMEOUT
- protected boolean disabled
- protected RateLimiter embeddingsLimiter - Rate limiter for embeddings.
- protected String embeddingsModel
- protected String embeddingsUrl
- protected com.knuddels.jtokkit.api.Encoding enc - Tokenizer used for GPT-4 variants.
- protected RateLimiter gptLimiter - If set, this reflects the rate limits reported by the ChatGPT API itself.
- protected static com.google.gson.Gson gson
- protected String highIntelligenceModel
- protected org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient httpAsyncClient
- protected String imageModel
- protected long lastGptLimiterCreationTime
- protected RateLimiter limiter - Limiter for the request limits we impose for financial reasons (cost control).
- protected static org.slf4j.Logger LOG
- protected Integer maximumTokensPerRequest
- protected Integer maximumTokensPerResponse
- static int MAXTRIES - The maximum number of retries.
- protected static String OPENAI_EMBEDDINGS_URL
- protected static Pattern PATTERN_TRY_AGAIN
- protected com.knuddels.jtokkit.api.EncodingRegistry registry
- protected AtomicLong requestCounter
- protected int requestTimeout
- protected ScheduledExecutorService scheduledExecutorService
- protected Map<String,GPTChatMessagesTemplate> templates
- static String TRUNCATE_MARKER
Fields inherited from interface com.composum.ai.backend.base.service.chat.GPTChatCompletionService
MARKER_DEBUG_OUTPUT_REQUEST, MARKER_DEBUG_PRINT_REQUEST
-
-
Constructor Summary
Constructors

- GPTChatCompletionServiceImpl()
-
Method Summary
Methods

- void activate(GPTChatCompletionServiceImpl.GPTChatCompletionServiceConfig config, org.osgi.framework.BundleContext bundleContext)
- protected void checkEnabled()
- protected void checkEnabled(GPTConfiguration gptConfig)
- protected void checkTokenCount(String jsonRequest)
- int countTokens(String text) - Counts the number of tokens for the text for the normally used model.
- protected ChatCompletionRequest createExternalRequest(GPTChatRequest request) - Creates the external request.
- void deactivate()
- protected String determineModel(GPTConfiguration configuration, boolean hasImage)
- protected static GPTChatCompletionServiceImpl.RetryableException extractRetryableException(Throwable e)
- List<float[]> getEmbeddings(List<String> texts, GPTConfiguration configuration) - Calculates embeddings for the given list of texts.
- protected List<float[]> getEmbeddingsImpl(List<String> texts, GPTConfiguration configuration, long id)
- protected List<float[]> getEmbeddingsImplDivideAndConquer(List<String> texts, GPTConfiguration configuration, long id)
- String getEmbeddingsModel() - Returns the model used for GPTChatCompletionService.getEmbeddings(List, GPTConfiguration).
- GPTInternalOpenAIHelper.GPTInternalOpenAIHelperInst getInstance() - Returns a helper for implementation in this package.
- protected String getModel(GPTConfiguration gptConfiguration)
- String getSingleChatCompletion(GPTChatRequest request) - The simplest case: give some messages and get a single response.
- GPTChatMessagesTemplate getTemplate(String templateName) - Retrieves a (usually cached) chat template with that name.
- protected void handleStreamingEvent(GPTCompletionCallback callback, long id, String line) - Handle a single line of the streaming response.
- String htmlToMarkdown(String html) - Helper for preprocessing HTML so that it can easily be read by ChatGPT.
- boolean isEnabled() - Whether ChatGPT completion is enabled.
- boolean isEnabled(GPTConfiguration gptConfig) - Checks whether GPTChatCompletionService.isEnabled() and whether gptConfig enables executing GPT calls.
- boolean isVisionEnabled() - Returns true if vision is enabled.
- protected org.apache.hc.client5.http.async.methods.SimpleHttpRequest makeRequest(String jsonRequest, GPTConfiguration gptConfiguration)
- String markdownToHtml(String markdown) - Opposite of GPTChatCompletionService.htmlToMarkdown(String).
- protected void performCallAsync(CompletableFuture<Void> finished, long id, org.apache.hc.client5.http.async.methods.SimpleHttpRequest httpRequest, GPTCompletionCallback callback, int tryNumber, long defaultDelay) - Executes a call with retries.
- protected <T> long recalculateDelay(String responsebody, long delay) - If the response body contains a string like "Please try again in 20s." (number varies) we return a value of that many seconds, otherwise just use iterative doubling.
- String shorten(String text, int maxTokens) - Helper method to shorten texts by taking out the middle if too long.
- void streamingChatCompletion(GPTChatRequest request, GPTCompletionCallback callback) - Give some messages and receive the streaming response via callback, to reduce waiting time.
- void streamingChatCompletionWithToolCalls(GPTChatRequest request, GPTCompletionCallback callback) - Give some messages and receive the streaming response via callback, to reduce waiting time.
- protected CompletableFuture<Void> triggerCallAsync(long id, org.apache.hc.client5.http.async.methods.SimpleHttpRequest httpRequest, GPTCompletionCallback callback) - Puts the call into the pipeline; the returned future will be set normally or exceptionally when it's done.
- protected void waitForLimit()
-
-
-
Field Detail
-
LOG
protected static final org.slf4j.Logger LOG
-
OPENAI_EMBEDDINGS_URL
protected static final String OPENAI_EMBEDDINGS_URL
- See Also:
- Constant Field Values
-
PATTERN_TRY_AGAIN
protected static final Pattern PATTERN_TRY_AGAIN
-
DEFAULT_MODEL
public static final String DEFAULT_MODEL
- See Also:
- Constant Field Values
-
DEFAULT_IMAGE_MODEL
public static final String DEFAULT_IMAGE_MODEL
- See Also:
- Constant Field Values
-
DEFAULT_EMBEDDINGS_MODEL
public static final String DEFAULT_EMBEDDINGS_MODEL
- See Also:
- Constant Field Values
-
DEFAULT_HIGH_INTELLIGENCE_MODEL
public static final String DEFAULT_HIGH_INTELLIGENCE_MODEL
- See Also:
- Constant Field Values
-
DEFAULTVALUE_CONNECTIONTIMEOUT
protected static final int DEFAULTVALUE_CONNECTIONTIMEOUT
- See Also:
- Constant Field Values
-
DEFAULTVALUE_REQUESTTIMEOUT
protected static final int DEFAULTVALUE_REQUESTTIMEOUT
- See Also:
- Constant Field Values
-
DEFAULTVALUE_REQUESTS_PER_MINUTE
protected static final int DEFAULTVALUE_REQUESTS_PER_MINUTE
- See Also:
- Constant Field Values
-
DEFAULTVALUE_REQUESTS_PER_HOUR
protected static final int DEFAULTVALUE_REQUESTS_PER_HOUR
- See Also:
- Constant Field Values
-
DEFAULTVALUE_REQUESTS_PER_DAY
protected static final int DEFAULTVALUE_REQUESTS_PER_DAY
- See Also:
- Constant Field Values
-
TRUNCATE_MARKER
public static final String TRUNCATE_MARKER
- See Also:
- Constant Field Values
-
MAXTRIES
public static final int MAXTRIES
The maximum number of retries.
- See Also:
- Constant Field Values
-
defaultModel
protected String defaultModel
-
highIntelligenceModel
protected String highIntelligenceModel
-
imageModel
protected String imageModel
-
httpAsyncClient
protected org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient httpAsyncClient
-
gson
protected static final com.google.gson.Gson gson
-
requestCounter
protected final AtomicLong requestCounter
-
limiter
protected RateLimiter limiter
Limiter for the request limits we impose for financial reasons (cost control).
-
lastGptLimiterCreationTime
protected volatile long lastGptLimiterCreationTime
-
gptLimiter
protected volatile RateLimiter gptLimiter
If set, this reflects the rate limits reported by the ChatGPT API itself.
-
registry
protected com.knuddels.jtokkit.api.EncodingRegistry registry
-
enc
protected com.knuddels.jtokkit.api.Encoding enc
Tokenizer used for GPT-4 variants.
-
bundleContext
protected org.osgi.framework.BundleContext bundleContext
-
templates
protected final Map<String,GPTChatMessagesTemplate> templates
-
requestTimeout
protected int requestTimeout
-
connectionTimeout
protected int connectionTimeout
-
disabled
protected boolean disabled
-
scheduledExecutorService
protected ScheduledExecutorService scheduledExecutorService
-
maximumTokensPerRequest
protected Integer maximumTokensPerRequest
-
maximumTokensPerResponse
protected Integer maximumTokensPerResponse
-
embeddingsLimiter
protected volatile RateLimiter embeddingsLimiter
Rate limiter for embeddings. Embeddings are quite an inexpensive service ($0.13 per million tokens), so for now we just introduce a limit that should protect against malfunctions.
-
embeddingsUrl
protected String embeddingsUrl
-
embeddingsModel
protected String embeddingsModel
-
backendsService
protected GPTBackendsService backendsService
-
-
Method Detail
-
activate
public void activate(GPTChatCompletionServiceImpl.GPTChatCompletionServiceConfig config, org.osgi.framework.BundleContext bundleContext)
-
deactivate
public void deactivate()
-
getSingleChatCompletion
public String getSingleChatCompletion(@Nonnull GPTChatRequest request) throws GPTException
Description copied from interface: GPTChatCompletionService
The simplest case: give some messages and get a single response. If the response can be more than a few words, do consider using GPTChatCompletionService.streamingChatCompletion(GPTChatRequest, GPTCompletionCallback) instead, to give the user some feedback while waiting.
- Specified by:
  - getSingleChatCompletion in interface GPTChatCompletionService
- Throws:
GPTException
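A minimal usage sketch (not taken from the source): it assumes a no-arg GPTChatRequest constructor and an addMessage(GPTMessageRole, String) method on the request, which may differ from the actual builder API, and that GPTException is unchecked.

import com.composum.ai.backend.base.service.chat.GPTChatCompletionService;
import com.composum.ai.backend.base.service.chat.GPTChatRequest;
import com.composum.ai.backend.base.service.chat.GPTMessageRole;

public class SingleCompletionExample {

    /** Sends one user message and blocks until the complete answer has arrived. */
    static String askOnce(GPTChatCompletionService service, String question) {
        GPTChatRequest request = new GPTChatRequest();      // assumed no-arg constructor
        request.addMessage(GPTMessageRole.USER, question);  // assumed builder-style method
        return service.getSingleChatCompletion(request);    // GPTException is thrown on failure
    }
}

For longer answers the interface recommends streamingChatCompletion(GPTChatRequest, GPTCompletionCallback) instead.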
-
makeRequest
protected org.apache.hc.client5.http.async.methods.SimpleHttpRequest makeRequest(String jsonRequest, GPTConfiguration gptConfiguration)
-
streamingChatCompletion
public void streamingChatCompletion(@Nonnull GPTChatRequest request, @Nonnull GPTCompletionCallback callback) throws GPTException
Description copied from interface: GPTChatCompletionService
Give some messages and receive the streaming response via callback, to reduce waiting time. It possibly waits if a rate limit is reached, but otherwise returns immediately after scheduling an asynchronous call.
- Specified by:
  - streamingChatCompletion in interface GPTChatCompletionService
- Throws:
GPTException
-
streamingChatCompletionWithToolCalls
public void streamingChatCompletionWithToolCalls(@Nonnull GPTChatRequest request, @Nonnull GPTCompletionCallback callback) throws GPTException
Description copied from interface: GPTChatCompletionService
Give some messages and receive the streaming response via callback, to reduce waiting time. This implementation also performs tool calls if tools are given in GPTChatRequest.getConfiguration(). It possibly waits if a rate limit is reached, but otherwise returns immediately after scheduling an asynchronous call.
- Specified by:
  - streamingChatCompletionWithToolCalls in interface GPTChatCompletionService
- Throws:
GPTException
-
handleStreamingEvent
protected void handleStreamingEvent(GPTCompletionCallback callback, long id, String line)
Handle a single line of the streaming response.
- First message, e.g.: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","created":1686890500,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}
- Data to gather, e.g.: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","created":1686890500,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" above"},"index":0,"finish_reason":null}]}
- End, e.g.: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","created":1686890500,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}
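For illustration only, a self-contained sketch of decoding such a chunk line with Gson; the DTO classes below are simplified stand-ins, not the classes this implementation actually uses.

import java.util.List;
import com.google.gson.Gson;

public class ChunkParsingSketch {

    // Simplified stand-ins for the chunk structure shown above.
    static class Delta { String role; String content; }
    static class Choice { Delta delta; int index; String finish_reason; }
    static class Chunk { String id; String model; List<Choice> choices; }

    private static final Gson GSON = new Gson();

    /** Returns the content fragment of a chunk line, or null for the initial role chunk and the stop chunk. */
    static String extractContent(String jsonLine) {
        Chunk chunk = GSON.fromJson(jsonLine, Chunk.class);
        if (chunk == null || chunk.choices == null || chunk.choices.isEmpty()) {
            return null;
        }
        Delta delta = chunk.choices.get(0).delta;
        return delta != null ? delta.content : null;
    }

    public static void main(String[] args) {
        String line = "{\"id\":\"chatcmpl-xyz\",\"object\":\"chat.completion.chunk\","
                + "\"choices\":[{\"delta\":{\"content\":\" above\"},\"index\":0,\"finish_reason\":null}]}";
        System.out.println(extractContent(line)); // prints " above"
    }
}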
-
waitForLimit
protected void waitForLimit()
-
performCallAsync
protected void performCallAsync(CompletableFuture<Void> finished, long id, org.apache.hc.client5.http.async.methods.SimpleHttpRequest httpRequest, GPTCompletionCallback callback, int tryNumber, long defaultDelay)
Executes a call with retries. The response is written to the callback; when it's finished the future is set - either normally or exceptionally if there was an error.
- Parameters:
  - finished - the future to set when the call is finished
  - id - the id of the call, for logging
  - httpRequest - the request to send
  - callback - the callback to write the response to
  - tryNumber - the number of the try - if it's 5, we give up.
-
extractRetryableException
protected static GPTChatCompletionServiceImpl.RetryableException extractRetryableException(Throwable e)
-
triggerCallAsync
protected CompletableFuture<Void> triggerCallAsync(long id, org.apache.hc.client5.http.async.methods.SimpleHttpRequest httpRequest, GPTCompletionCallback callback)
Puts the call into the pipeline; the returned future will be set normally or exceptionally when it's done.
-
recalculateDelay
protected <T> long recalculateDelay(String responsebody, long delay)
If the response body contains a string like "Please try again in 20s." (number varies) we return a value of that many seconds, otherwise just use iterative doubling.
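As an illustration of that strategy, an independent sketch (not the actual implementation; the real PATTERN_TRY_AGAIN may differ):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RetryDelaySketch {

    // Rough equivalent of a "try again in Ns" hint; the real PATTERN_TRY_AGAIN may differ.
    private static final Pattern TRY_AGAIN = Pattern.compile("try again in (\\d+)s", Pattern.CASE_INSENSITIVE);

    /** Returns the delay in seconds suggested by the response body, or twice the previous delay as a fallback. */
    static long recalculateDelay(String responseBody, long previousDelaySeconds) {
        if (responseBody != null) {
            Matcher matcher = TRY_AGAIN.matcher(responseBody);
            if (matcher.find()) {
                return Long.parseLong(matcher.group(1));
            }
        }
        return previousDelaySeconds * 2; // iterative doubling
    }

    public static void main(String[] args) {
        System.out.println(recalculateDelay("Rate limit reached. Please try again in 20s.", 3)); // 20
        System.out.println(recalculateDelay("Internal server error", 3));                        // 6
    }
}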
-
createExternalRequest
protected ChatCompletionRequest createExternalRequest(GPTChatRequest request) throws com.fasterxml.jackson.core.JsonProcessingException
Creates the external request. Caution: the model name has to be processed with GPTBackendsService.getModelNameInBackend(String) - the actual model can only be determined here, but it may still carry a backend prefix.
- Throws:
com.fasterxml.jackson.core.JsonProcessingException
-
determineModel
protected String determineModel(GPTConfiguration configuration, boolean hasImage)
-
checkEnabled
protected void checkEnabled()
-
isEnabled
public boolean isEnabled()
Description copied from interface: GPTChatCompletionService
Whether ChatGPT completion is enabled. If not, calling the methods that access ChatGPT throws an IllegalStateException.
- Specified by:
  - isEnabled in interface GPTChatCompletionService
-
isEnabled
public boolean isEnabled(GPTConfiguration gptConfig)
Description copied from interface: GPTChatCompletionService
Checks whether GPTChatCompletionService.isEnabled() and whether gptConfig enables executing GPT calls. (That is currently whether there is an api key either globally or in the gptConfig.)
- Specified by:
  - isEnabled in interface GPTChatCompletionService
  - isEnabled in interface GPTInternalOpenAIHelper
-
checkEnabled
protected void checkEnabled(GPTConfiguration gptConfig)
-
isVisionEnabled
public boolean isVisionEnabled()
Description copied from interface: GPTChatCompletionService
Returns true if vision is enabled.
- Specified by:
  - isVisionEnabled in interface GPTChatCompletionService
-
getTemplate
@Nonnull public GPTChatMessagesTemplate getTemplate(@Nonnull String templateName) throws GPTException
Description copied from interface: GPTChatCompletionService
Retrieves a (usually cached) chat template with that name. Mostly for backend internal use. The templates are retrieved from the bundle resources at "chattemplates/", and are cached.
- Specified by:
  - getTemplate in interface GPTChatCompletionService
- Parameters:
  - templateName - the name of the template to retrieve, e.g. "singleTranslation".
- Throws:
GPTException
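A minimal usage sketch with the example template name from above (import paths assumed from the surrounding package names):

import com.composum.ai.backend.base.service.chat.GPTChatCompletionService;
import com.composum.ai.backend.base.service.chat.GPTChatMessagesTemplate;

public class TemplateExample {

    /** Retrieves the cached "singleTranslation" template from the bundle's chattemplates/ resources. */
    static GPTChatMessagesTemplate loadTranslationTemplate(GPTChatCompletionService service) {
        return service.getTemplate("singleTranslation"); // may throw GPTException
    }
}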
-
shorten
@Nonnull public String shorten(@Nullable String text, int maxTokens)
Description copied from interface: GPTChatCompletionService
Helper method to shorten texts by taking out the middle if too long. In texts longer than this many tokens we replace the middle with " ... (truncated) ... ", since ChatGPT can only process a limited number of words / tokens, and the introduction or summary probably contains the most condensed information about the text. The output then has maxTokens tokens, including the ... marker.
- Specified by:
  - shorten in interface GPTChatCompletionService
- Parameters:
  - text - the text to shorten
  - maxTokens - the maximum number of tokens in the output
-
markdownToHtml
public String markdownToHtml(String markdown)
Description copied from interface: GPTChatCompletionService
Opposite of GPTChatCompletionService.htmlToMarkdown(String).
- Specified by:
  - markdownToHtml in interface GPTChatCompletionService
-
countTokens
public int countTokens(@Nullable String text)
Description copied from interface: GPTChatCompletionService
Counts the number of tokens for the text for the normally used model. Caution: message boundaries need some tokens, and slicing text might create a token or two as well, so do not rely on the count exactly.
- Specified by:
  - countTokens in interface GPTChatCompletionService
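A small sketch combining countTokens with the shorten method documented above (import path assumed from this package's naming; as noted, the count is only approximate):

import com.composum.ai.backend.base.service.chat.GPTChatCompletionService;

public class TokenBudgetExample {

    /** Returns the text unchanged if it fits the budget, otherwise a middle-truncated version of at most maxTokens tokens. */
    static String fitToBudget(GPTChatCompletionService service, String text, int maxTokens) {
        if (service.countTokens(text) <= maxTokens) {
            return text;
        }
        return service.shorten(text, maxTokens); // replaces the middle with the truncation marker
    }
}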
-
checkTokenCount
protected void checkTokenCount(String jsonRequest)
-
htmlToMarkdown
@Nonnull public String htmlToMarkdown(String html)
Description copied from interface: GPTChatCompletionService
Helper for preprocessing HTML so that it can easily be read by ChatGPT.
- Specified by:
  - htmlToMarkdown in interface GPTChatCompletionService
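A short sketch of the intended round trip: HTML is turned into markdown before prompting, and a markdown answer is turned back into HTML afterwards.

import com.composum.ai.backend.base.service.chat.GPTChatCompletionService;

public class MarkdownRoundTripExample {

    /** Markdown is easier for the model to read than raw HTML. */
    static String toPromptText(GPTChatCompletionService service, String html) {
        return service.htmlToMarkdown(html);
    }

    /** Converts a markdown answer back into HTML. */
    static String toRichText(GPTChatCompletionService service, String markdownAnswer) {
        return service.markdownToHtml(markdownAnswer);
    }
}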
-
getEmbeddings
@Nonnull public List<float[]> getEmbeddings(List<String> texts, GPTConfiguration configuration) throws GPTException
Description copied from interface: GPTChatCompletionService
Calculates embeddings for the given list of texts.
- Specified by:
  - getEmbeddings in interface GPTChatCompletionService
- Throws:
GPTException
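A usage sketch comparing two texts by the cosine similarity of their embeddings (import paths assumed from this package's naming):

import java.util.Arrays;
import java.util.List;
import com.composum.ai.backend.base.service.chat.GPTChatCompletionService;
import com.composum.ai.backend.base.service.chat.GPTConfiguration;

public class EmbeddingsExample {

    /** Embeds both texts in one call and compares them. */
    static double similarity(GPTChatCompletionService service, GPTConfiguration config,
                             String first, String second) {
        List<float[]> embeddings = service.getEmbeddings(Arrays.asList(first, second), config);
        return cosine(embeddings.get(0), embeddings.get(1));
    }

    /** Standard cosine similarity of two equally sized vectors. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}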
-
getEmbeddingsImplDivideAndConquer
protected List<float[]> getEmbeddingsImplDivideAndConquer(List<String> texts, GPTConfiguration configuration, long id)
-
getEmbeddingsImpl
protected List<float[]> getEmbeddingsImpl(List<String> texts, GPTConfiguration configuration, long id)
-
getEmbeddingsModel
public String getEmbeddingsModel()
Description copied from interface: GPTChatCompletionService
Returns the model used for GPTChatCompletionService.getEmbeddings(List, GPTConfiguration).
- Specified by:
  - getEmbeddingsModel in interface GPTChatCompletionService
-
getInstance
public GPTInternalOpenAIHelper.GPTInternalOpenAIHelperInst getInstance()
Description copied from interface: GPTInternalOpenAIHelper
Returns a helper for implementation in this package. We do this indirection to make it only available for this package, since otherwise everything is public in an interface.
- Specified by:
  - getInstance in interface GPTInternalOpenAIHelper
-
getModel
protected String getModel(@Nullable GPTConfiguration gptConfiguration)
-
-