Class HtmlToApproximateMarkdownServicePlugin
- java.lang.Object
  - com.composum.ai.backend.slingbase.impl.HtmlToApproximateMarkdownServicePlugin
- All Implemented Interfaces:
ApproximateMarkdownServicePlugin
public class HtmlToApproximateMarkdownServicePlugin extends Object implements ApproximateMarkdownServicePlugin
A plugin for the ApproximateMarkdownService that transforms the rendered HTML to markdown. This does not work for all components, but it may capture the text content of certain components more easily than the default approach of guessing it from the JCR representation.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | HtmlToApproximateMarkdownServicePlugin.CapturingResponse | We wrap a response to capture the content, forwarding all but modifying methods to the original response.
protected static interface | HtmlToApproximateMarkdownServicePlugin.Config
protected static class | HtmlToApproximateMarkdownServicePlugin.EmptyRequestParameterMap
protected class | HtmlToApproximateMarkdownServicePlugin.NonModifyingRequestWrapper | Wraps the request to make sure nothing is modified.
protected static class | HtmlToApproximateMarkdownServicePlugin.UnsupportedOperationCalled | Thrown when unsupported operation was called that requires blacklisting.
Nested classes/interfaces inherited from interface com.composum.ai.backend.slingbase.ApproximateMarkdownServicePlugin
ApproximateMarkdownServicePlugin.PluginResult
-
Field Summary
Fields
Modifier and Type | Field | Description
protected Pattern | allowedResourceTypePattern
protected Map<String,Long> | blacklistedResourceType | ResourceTypes we ignore since their rendering uses unsupported methods.
protected Long | blacklistedResourceTypeCleanupTime
protected Pattern | deniedResourceTypePattern
-
Constructor Summary
Constructors
Constructor | Description
HtmlToApproximateMarkdownServicePlugin()
-
Method Summary
All Methods Instance Methods Concrete Methods
Modifier and Type | Method | Description
protected void | activate(HtmlToApproximateMarkdownServicePlugin.Config config)
protected void | cleanupBlacklist()
protected void | deactivate()
@Nullable String | getImageUrl(@Nullable org.apache.sling.api.resource.Resource imageResource) | Retrieves the image URL in a way usable for ChatGPT, usually data:image/jpeg;base64,{base64_image}. If the plugin cannot handle this resource, it should return null.
protected boolean | isBecauseOfUnsupportedOperation(Throwable e)
protected boolean | isIgnoredNode(org.apache.sling.api.resource.Resource resource) | We start with depth 3 since the higher nodes often contain headers, navigation and such that don't help for ChatGPT.
@NotNull ApproximateMarkdownServicePlugin.PluginResult | maybeHandle(@NotNull org.apache.sling.api.resource.Resource resource, @NotNull PrintWriter out, @NotNull ApproximateMarkdownService service, org.apache.sling.api.SlingHttpServletRequest request, org.apache.sling.api.SlingHttpServletResponse response) | Checks whether the resource should be handled by this plugin and if so, handles it by printing an appropriate markdown representation to the PrintWriter.
protected String | renderedAsHTML(org.apache.sling.api.resource.Resource resource, org.apache.sling.api.SlingHttpServletRequest request, org.apache.sling.api.SlingHttpServletResponse response) | We render the resource into a mock response and capture and return the generated HTML.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.composum.ai.backend.slingbase.ApproximateMarkdownServicePlugin
cacheMarkdown, getMasterLinks, resourceRendersAsComponentMatching
-
Field Detail
-
allowedResourceTypePattern
protected Pattern allowedResourceTypePattern
-
deniedResourceTypePattern
protected Pattern deniedResourceTypePattern
-
blacklistedResourceType
protected Map<String,Long> blacklistedResourceType
Resource types we ignore because their rendering uses unsupported methods. They are blacklisted for only 1h, since there might be a deployment in the meantime. Maps the resource type to the time (ms) until which it is blacklisted.
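The expiring blacklist described here could look roughly like the following. This is a minimal sketch, not the plugin's actual code: the class name `ResourceTypeBlacklistSketch`, its method names, and the choice of `ConcurrentHashMap` are assumptions made for illustration. Only the one-hour duration and the "resource type to end time in ms" mapping come from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a time-limited blacklist; not the plugin's actual implementation. */
class ResourceTypeBlacklistSketch {

    /** Blacklist duration of one hour, as stated in the field description. */
    static final long BLACKLIST_MILLIS = TimeUnit.HOURS.toMillis(1);

    /** Maps a resource type to the time (ms) at which its blacklisting ends. */
    final Map<String, Long> blacklistedUntil = new ConcurrentHashMap<>();

    /** Blacklists the resource type for one hour, starting at nowMillis. */
    void blacklist(String resourceType, long nowMillis) {
        blacklistedUntil.put(resourceType, nowMillis + BLACKLIST_MILLIS);
    }

    /** True while the blacklist period of the resource type has not yet ended. */
    boolean isBlacklisted(String resourceType, long nowMillis) {
        Long until = blacklistedUntil.get(resourceType);
        return until != null && nowMillis < until;
    }

    /** Drops all entries whose blacklist period has ended. */
    void cleanup(long nowMillis) {
        blacklistedUntil.values().removeIf(until -> until <= nowMillis);
    }
}
```

Passing the current time as a parameter (for example `System.currentTimeMillis()`) keeps the sketch easy to test; the real class may well read the clock internally.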
-
blacklistedResourceTypeCleanupTime
protected volatile Long blacklistedResourceTypeCleanupTime
-
Method Detail
-
maybeHandle
@NotNull public ApproximateMarkdownServicePlugin.PluginResult maybeHandle(@NotNull org.apache.sling.api.resource.Resource resource, @NotNull PrintWriter out, @NotNull ApproximateMarkdownService service, @Nonnull org.apache.sling.api.SlingHttpServletRequest request, @Nonnull org.apache.sling.api.SlingHttpServletResponse response)
Description copied from interface: ApproximateMarkdownServicePlugin
Checks whether the resource should be handled by this plugin and if so, handles it by printing an appropriate markdown representation to the PrintWriter.
- Specified by:
maybeHandle in interface ApproximateMarkdownServicePlugin
- Returns:
what is already handled by this plugin. It is possible to write to the PrintWriter in any case.
-
cleanupBlacklist
protected void cleanupBlacklist()
-
getImageUrl
@Nullable public String getImageUrl(@Nullable org.apache.sling.api.resource.Resource imageResource)
Description copied from interface: ApproximateMarkdownServicePlugin
Retrieves the image URL in a way usable for ChatGPT, usually data:image/jpeg;base64,{base64_image}. If the plugin cannot handle this resource, it should return null.
- Specified by:
getImageUrl in interface ApproximateMarkdownServicePlugin
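To illustrate the `data:image/jpeg;base64,{base64_image}` format mentioned above, here is a minimal sketch that builds such a URL from raw bytes with `java.util.Base64`. The class `ImageDataUrlSketch` and the method `toDataUrl` are hypothetical; how the plugin actually obtains the image bytes and MIME type from the resource is not shown here.

```java
import java.util.Base64;

/** Hypothetical helper that builds a data URL; not part of the plugin's API. */
class ImageDataUrlSketch {

    /**
     * Returns a data URL such as data:image/jpeg;base64,..., or null if the
     * input cannot be handled (mirroring the "return null" contract above).
     */
    static String toDataUrl(String mimeType, byte[] imageBytes) {
        if (mimeType == null || imageBytes == null) {
            return null;
        }
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + base64;
    }
}
```

For example, `ImageDataUrlSketch.toDataUrl("image/jpeg", new byte[] {1, 2, 3})` yields `data:image/jpeg;base64,AQID`.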
-
isBecauseOfUnsupportedOperation
protected boolean isBecauseOfUnsupportedOperation(Throwable e)
-
isIgnoredNode
protected boolean isIgnoredNode(@Nonnull org.apache.sling.api.resource.Resource resource)
We start with depth 3 since the higher nodes often contain headers, navigation and such that don't help for ChatGPT.
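A depth check of this kind might be sketched as follows. The description does not say from where the depth is counted, so this sketch assumes, purely for illustration, that it is the number of path levels below the first `jcr:content` node. The class `NodeDepthSketch` and its methods are hypothetical and work on plain path strings rather than on a Sling `Resource`.

```java
/** Hypothetical sketch of a depth-based filter; the counting rule is an assumption. */
class NodeDepthSketch {

    /** Minimum depth from which nodes are considered, per the description above. */
    static final int MIN_DEPTH = 3;

    /** Number of path levels below the first "jcr:content" segment, or -1 if there is none. */
    static int depthBelowContent(String path) {
        String[] segments = path.split("/");
        for (int i = 0; i < segments.length; i++) {
            if ("jcr:content".equals(segments[i])) {
                return segments.length - 1 - i;
            }
        }
        return -1;
    }

    /** Nodes above the minimum depth (or outside any jcr:content) are ignored. */
    static boolean isIgnored(String path) {
        return depthBelowContent(path) < MIN_DEPTH;
    }
}
```

Under this assumption, `/content/site/page/jcr:content/header` (depth 1) would be ignored, while `/content/site/page/jcr:content/main/row/column/text` (depth 4) would not.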
-
renderedAsHTML
protected String renderedAsHTML(org.apache.sling.api.resource.Resource resource, org.apache.sling.api.SlingHttpServletRequest request, org.apache.sling.api.SlingHttpServletResponse response) throws javax.servlet.ServletException, IOException
We render the resource into a mock response and capture and return the generated HTML. The response is wrapped so that the real response cannot be modified. We don't do that for the request, because that would be more complicated and probably not needed.
- Throws:
javax.servlet.ServletException
IOException
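The capturing idea can be sketched without the servlet API. In the sketch below, `SimpleResponse` is a hypothetical, much smaller stand-in for `SlingHttpServletResponse`, and `CapturingResponseSketch` is an illustrative wrapper, not the actual `CapturingResponse` class: output goes into a buffer, read-only calls are forwarded, and modifying calls never reach the original.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical stand-in for a servlet response; the real plugin uses SlingHttpServletResponse. */
interface SimpleResponse {
    PrintWriter getWriter();
    String getCharacterEncoding();
    void setStatus(int status);
}

/** Captures written content in a buffer so that the wrapped response stays untouched. */
class CapturingResponseSketch implements SimpleResponse {
    private final SimpleResponse original;
    private final StringWriter buffer = new StringWriter();
    private final PrintWriter writer = new PrintWriter(buffer);

    CapturingResponseSketch(SimpleResponse original) {
        this.original = original;
    }

    /** Output goes into the buffer, not to the original response. */
    @Override
    public PrintWriter getWriter() {
        return writer;
    }

    /** Read-only methods are forwarded to the original response. */
    @Override
    public String getCharacterEncoding() {
        return original.getCharacterEncoding();
    }

    /** Modifying methods are not forwarded; this sketch simply ignores them. */
    @Override
    public void setStatus(int status) {
        // intentionally empty
    }

    /** Returns everything written so far, e.g. the rendered HTML. */
    String getCapturedContent() {
        writer.flush();
        return buffer.toString();
    }
}
```

A renderer would write into the wrapper as if it were the real response, after which `getCapturedContent()` returns the generated HTML for conversion to markdown.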
-
activate
protected void activate(HtmlToApproximateMarkdownServicePlugin.Config config)
-
deactivate
protected void deactivate()