Class ApproximateMarkdownServiceImpl
- java.lang.Object
 - com.composum.ai.backend.slingbase.impl.ApproximateMarkdownServiceImpl
- All Implemented Interfaces:
 ApproximateMarkdownService
public class ApproximateMarkdownServiceImpl extends Object implements ApproximateMarkdownService
Implementation for ApproximateMarkdownService.
- 
- 
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
static interface ApproximateMarkdownServiceImpl.Config
 Configuration class Config that allows us to configure TEXT_ATTRIBUTES.
Nested classes/interfaces inherited from interface com.composum.ai.backend.slingbase.ApproximateMarkdownService
ApproximateMarkdownService.Link 
 - 
 
- 
Field Summary
Fields (Modifier and Type, Field, Description):
static Pattern ADMISSIBLE_PATH_PATTERN
 We allow generating markdown for subpaths of /content, /public and /preview.
static Map<String,String> ATTRIBUTE_TO_MARKDOWN_PREFIX
protected com.composum.ai.backend.base.service.chat.GPTChatCompletionService chatCompletionService
protected Set<String> htmltags
protected static Pattern IGNORED_ATTRIBUTE_PATTERN
 Pattern for several kinds of ignored keys.
protected static Pattern IGNORED_NODE_NAMES
 We ignore nodes named i18n or renditions and nodes starting with rep:, dam:, cq:
protected static Pattern IGNORED_VALUE_PATTERN
 Ignored values for labelled output: "true" / "false" / single number (int / float) attributes or array of numbers attributes, or shorter than 3 digits or path, or array or type date or boolean or {Date} or {Boolean}, inherit, blank, HTML tags, target.
protected static Pattern IMAGE_PATTERN
protected Pattern labeledAttributePatternAllow
 A pattern for the attributes that have to be output with a label: the attribute name, a colon and a space, then the trimmed attribute value followed by a newline.
protected Pattern labeledAttributePatternDeny
 A pattern matching exceptions for labeledAttributePatternAllow.
protected List<String> labelledAttributeOrder
 A list of labelled attributes that come first if they are present, in the given order.
protected Pattern PATTERN_HTML_TAG
protected List<ApproximateMarkdownServicePlugin> plugins
protected List<String> textAttributes
 A list of attributes that are output (in that order) without any label, each on its own line.
static Pattern THREE_WHITESPACE_PATTERN
 If this pattern occurs in a string, the string has several words.
protected List<Pattern> urlBlacklist
 Blacklist for URLs we can connect to get the markdown.
protected List<Pattern> urlWhitelist
 Whitelist for URLs we can connect to get the markdown.
protected static Pattern VIDEO_PATTERN
Fields inherited from interface com.composum.ai.backend.slingbase.ApproximateMarkdownService
HEADER_IMAGEPATH 
 - 
 
- 
Constructor Summary
Constructors (Constructor, Description):
ApproximateMarkdownServiceImpl()
- 
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
protected void activate(ApproximateMarkdownServiceImpl.Config config)
protected boolean admissibleKey(String key)
protected boolean admissibleValue(Object object)
 We do not print pure numbers, booleans and some special strings since those are likely attributes determining the component layout, not actual text that is printed.
void approximateMarkdown(org.apache.sling.api.resource.Resource resource, PrintWriter realOutput, org.apache.sling.api.SlingHttpServletRequest request, org.apache.sling.api.SlingHttpServletResponse response)
 Generates a text formatted with markdown that heuristically represents the text content of a page or resource, mainly for use with the AI.
String approximateMarkdown(org.apache.sling.api.resource.Resource resource, org.apache.sling.api.SlingHttpServletRequest request, org.apache.sling.api.SlingHttpServletResponse response)
 Generates a text formatted with markdown that heuristically represents the text content of a page or resource, mainly for use with the AI.
protected String attributeToMarkdown(@NotNull org.apache.sling.api.resource.Resource resource, String attributename, String value)
protected void captureHtmlTags(String value)
protected URI checkUrlAdmissible(URI uri)
protected void collectLinks(@NotNull org.apache.sling.api.resource.Resource resource, List<ApproximateMarkdownService.Link> resourceLinks)
 Collects links from a resource and its children.
protected void deactivate()
protected ApproximateMarkdownServicePlugin.PluginResult executePlugins(org.apache.sling.api.resource.Resource resource, PrintWriter out, org.apache.sling.api.SlingHttpServletRequest request, org.apache.sling.api.SlingHttpServletResponse response)
@NotNull List<ApproximateMarkdownService.Link> getComponentLinks(@NotNull org.apache.sling.api.resource.Resource resource)
 Returns links stored in the component or in siblings of the component that can be proposed to the user as sources for the AI, e.g. via markdown generation.
String getImageUrl(org.apache.sling.api.resource.Resource imageResource)
 Retrieves the image URL in a form usable for ChatGPT - usually data:image/jpeg;base64,{base64_image}
String getMarkdown(String value)
 Returns a markdown representation of an attribute value, which might be plain text or HTML.
@NotNull String getMarkdown(URI uri)
 Retrieves the text content for a URL.
protected boolean handleCodeblock(org.apache.sling.api.resource.Resource resource, PrintWriter out, boolean printEmptyLine)
protected boolean handleLabeledAttributes(org.apache.sling.api.resource.Resource resource, PrintWriter out, boolean printEmptyLine)
protected boolean handleResource(@NotNull org.apache.sling.api.resource.Resource resource, @NotNull PrintWriter out, boolean printEmptyLine)
protected void logUnhandledAttributes(org.apache.sling.api.resource.Resource resource)
protected void traverseTreeForStructureGathering(org.apache.sling.api.resource.Resource resource, PrintWriter out, String outerResourceType, String subpath)
 This is debugging code we needed to gather information for the implementation; we keep it around for now.
 - 
 
- 
- 
Field Detail
- 
IGNORED_VALUE_PATTERN
protected static final Pattern IGNORED_VALUE_PATTERN
Ignored values for labelled output: "true" / "false" / single number (int / float) attributes or array of numbers attributes, or shorter than 3 digits or path, or array or type date or boolean or {Date} or {Boolean}, inherit, blank, HTML tags, target.
- 
IGNORED_ATTRIBUTE_PATTERN
protected static final Pattern IGNORED_ATTRIBUTE_PATTERN
Pattern for several kinds of ignored keys. 
- 
IGNORED_NODE_NAMES
protected static final Pattern IGNORED_NODE_NAMES
We ignore nodes named i18n or renditions and nodes starting with rep:, dam:, cq: 
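A minimal sketch of a pattern with the behaviour described for IGNORED_NODE_NAMES. The regular expression and the class name IgnoredNodeNamesSketch are assumptions made for illustration; the actual value of the constant is not shown in this documentation.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the regex is assumed, not copied from the implementation.
class IgnoredNodeNamesSketch {
    // Matches nodes named i18n or renditions, and names starting with rep:, dam: or cq:
    static final Pattern IGNORED = Pattern.compile("i18n|renditions|(rep|dam|cq):.*");

    static boolean isIgnored(String nodeName) {
        // matches() requires the whole node name to match the pattern.
        return IGNORED.matcher(nodeName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIgnored("rep:policy"));  // true
        System.out.println(isIgnored("jcr:content")); // false
    }
}
```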
- 
IMAGE_PATTERN
protected static final Pattern IMAGE_PATTERN
 
- 
VIDEO_PATTERN
protected static final Pattern VIDEO_PATTERN
 
- 
ADMISSIBLE_PATH_PATTERN
public static final Pattern ADMISSIBLE_PATH_PATTERN
We allow generating markdown for subpaths of /content, /public and /preview . 
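For illustration, a path check of this kind could look roughly as follows. The regular expression and the helper isAdmissible are assumptions for this sketch, not the actual value of ADMISSIBLE_PATH_PATTERN.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the regex is assumed, not copied from the implementation.
class AdmissiblePathSketch {
    // Assumed pattern: any subpath of /content, /public or /preview.
    static final Pattern ADMISSIBLE_PATH = Pattern.compile("/(content|public|preview)/.*");

    static boolean isAdmissible(String path) {
        return path != null && ADMISSIBLE_PATH.matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAdmissible("/content/site/en")); // true
        System.out.println(isAdmissible("/apps/site"));       // false
    }
}
```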
- 
THREE_WHITESPACE_PATTERN
public static final Pattern THREE_WHITESPACE_PATTERN
If this pattern occurs in a string, the string has several words.
- 
textAttributes
@Nonnull protected List<String> textAttributes
A list of attributes that are output (in that order) without any label, each on its own line.
- 
labelledAttributeOrder
protected List<String> labelledAttributeOrder
A list of labelled attributes that come first if they are present, in the given order. 
- 
labeledAttributePatternAllow
@Nullable protected Pattern labeledAttributePatternAllow
A pattern for the attributes that have to be output with a label: the attribute name, a colon and a space, then the trimmed attribute value followed by a newline.
- 
labeledAttributePatternDeny
@Nullable protected Pattern labeledAttributePatternDeny
A pattern matching exceptions for labeledAttributePatternAllow.
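The interplay of the allow and deny patterns with the labelled output format can be sketched as follows. This is not the actual implementation: the helper labeledLine, the whole-name matching and the handling of a missing allow pattern (nothing is labelled) are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of labelled attribute output; names and null handling are assumed.
class LabeledAttributeSketch {
    /**
     * Returns "name: trimmed value\n" if the attribute name matches the allow
     * pattern and does not match the deny pattern, otherwise an empty string.
     */
    static String labeledLine(String name, String value, Pattern allow, Pattern deny) {
        if (allow == null || !allow.matcher(name).matches()) {
            return ""; // assumption: without a matching allow pattern nothing is labelled
        }
        if (deny != null && deny.matcher(name).matches()) {
            return ""; // the deny pattern lists exceptions to the allow pattern
        }
        return name + ": " + value.trim() + "\n";
    }

    public static void main(String[] args) {
        Pattern allow = Pattern.compile(".*title");
        Pattern deny = Pattern.compile("jcr:.*");
        System.out.print(labeledLine("subtitle", "  Hello  ", allow, deny)); // subtitle: Hello
        System.out.print(labeledLine("jcr:title", "Hidden", allow, deny));   // prints nothing
    }
}
```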
- 
urlBlacklist
protected List<Pattern> urlBlacklist
Blacklist for URLs we can connect to get the markdown. The URL must not match any of the patterns. 
- 
urlWhitelist
protected List<Pattern> urlWhitelist
Whitelist for URLs we can connect to get the markdown. Required - the URL has to match one of the patterns. 
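A minimal sketch of how a whitelist and a blacklist of patterns can be combined: a URL is accepted only if it matches at least one whitelist pattern and no blacklist pattern. The helper isAdmissible and the use of full matching against the URL string are assumptions; the real check in checkUrlAdmissible may differ.

```java
import java.net.URI;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a whitelist / blacklist check; not the actual implementation.
class UrlFilterSketch {
    static boolean isAdmissible(URI uri, List<Pattern> whitelist, List<Pattern> blacklist) {
        String url = uri.toString();
        // Required: the URL has to match one of the whitelist patterns.
        boolean allowed = whitelist.stream().anyMatch(p -> p.matcher(url).matches());
        // The URL must not match any of the blacklist patterns.
        boolean denied = blacklist.stream().anyMatch(p -> p.matcher(url).matches());
        return allowed && !denied;
    }

    public static void main(String[] args) {
        List<Pattern> whitelist = List.of(Pattern.compile("https://example\\.com/.*"));
        List<Pattern> blacklist = List.of(Pattern.compile(".*/private/.*"));
        System.out.println(isAdmissible(URI.create("https://example.com/docs/a"), whitelist, blacklist));    // true
        System.out.println(isAdmissible(URI.create("https://example.com/private/a"), whitelist, blacklist)); // false
    }
}
```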
- 
chatCompletionService
protected com.composum.ai.backend.base.service.chat.GPTChatCompletionService chatCompletionService
 
- 
plugins
@Nonnull protected volatile List<ApproximateMarkdownServicePlugin> plugins
 
- 
PATTERN_HTML_TAG
protected Pattern PATTERN_HTML_TAG
 
 - 
 
- 
Method Detail
- 
logUnhandledAttributes
protected void logUnhandledAttributes(org.apache.sling.api.resource.Resource resource)
 
- 
approximateMarkdown
@Nonnull public String approximateMarkdown(@Nullable org.apache.sling.api.resource.Resource resource, org.apache.sling.api.SlingHttpServletRequest request, org.apache.sling.api.SlingHttpServletResponse response)
Description copied from interface: ApproximateMarkdownService
Generates a text formatted with markdown that heuristically represents the text content of a page or resource, mainly for use with the AI. This is only a heuristic: it cannot faithfully represent the page, but will probably be enough to generate summaries, keywords and so forth.
- Specified by:
 approximateMarkdown in interface ApproximateMarkdownService
- Parameters:
 resource - the resource to render to markdown. Caution: if this is not the content resource of a page but the cpp:Page, the markdown will contain all subpages as well!
- Returns:
 the markdown representation
 
 
- 
approximateMarkdown
public void approximateMarkdown(@Nullable org.apache.sling.api.resource.Resource resource, @Nonnull PrintWriter realOutput, @Nonnull org.apache.sling.api.SlingHttpServletRequest request, @Nonnull org.apache.sling.api.SlingHttpServletResponse response)
Description copied from interface: ApproximateMarkdownService
Generates a text formatted with markdown that heuristically represents the text content of a page or resource, mainly for use with the AI. This is only a heuristic: it cannot faithfully represent the page, but will probably be enough to generate summaries, keywords and so forth.
- Specified by:
 approximateMarkdown in interface ApproximateMarkdownService
- Parameters:
 resource - the resource to render to markdown. Caution: if this is not the content resource of a page but the cpp:Page, the markdown will contain all subpages as well!
 realOutput - destination where the markdown rendering will be written.
 
- 
handleResource
protected boolean handleResource(@NotNull org.apache.sling.api.resource.Resource resource, @NotNull PrintWriter out, boolean printEmptyLine) 
- 
attributeToMarkdown
protected String attributeToMarkdown(@NotNull org.apache.sling.api.resource.Resource resource, String attributename, String value)
 
- 
executePlugins
@Nonnull protected ApproximateMarkdownServicePlugin.PluginResult executePlugins(@Nonnull org.apache.sling.api.resource.Resource resource, @Nonnull PrintWriter out, @Nonnull org.apache.sling.api.SlingHttpServletRequest request, @Nonnull org.apache.sling.api.SlingHttpServletResponse response)
 
- 
getMarkdown
@Nonnull public String getMarkdown(@Nullable String value)
Description copied from interface: ApproximateMarkdownService
Returns a markdown representation of an attribute value, which might be plain text or HTML. We determine whether it's HTML heuristically - in that case it's transformed to markdown, otherwise we just return the value.
- Specified by:
 getMarkdown in interface ApproximateMarkdownService
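The heuristic described here (detect HTML, convert it, otherwise return the value unchanged) can be sketched as follows. The tag-detection regex, the helper names and the very small tag-to-markdown conversion are assumptions for illustration only; the actual service is likely to use a fuller HTML-to-markdown conversion, and returning an empty string for null is likewise assumed.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of "is it HTML?" detection with a toy conversion; not the actual implementation.
class HtmlHeuristicSketch {
    // Assumed heuristic: the value contains something that looks like an opening or closing tag.
    private static final Pattern TAG = Pattern.compile("</?[a-zA-Z][a-zA-Z0-9]*(\\s[^>]*)?/?>");

    static boolean looksLikeHtml(String value) {
        return value != null && TAG.matcher(value).find();
    }

    static String toMarkdown(String value) {
        if (value == null) {
            return ""; // assumption: null yields an empty string
        }
        if (!looksLikeHtml(value)) {
            return value; // plain text is returned unchanged
        }
        return value
                .replaceAll("(?i)</?(strong|b)>", "**")   // bold
                .replaceAll("(?i)</?(em|i)>", "*")        // emphasis
                .replaceAll("(?i)<br\\s*/?>|</p>", "\n")  // line and paragraph breaks
                .replaceAll("<[^>]+>", "")                // drop all remaining tags
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(toMarkdown("<p>Hello <b>world</b></p>")); // Hello **world**
        System.out.println(toMarkdown("plain 1 < 2"));               // plain 1 < 2
    }
}
```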
 
- 
getMarkdown
@NotNull public String getMarkdown(@Nonnull URI uri) throws MalformedURLException, IOException, IllegalArgumentException
Description copied from interface: ApproximateMarkdownService
Retrieves the text content for a URL.
- Specified by:
 getMarkdown in interface ApproximateMarkdownService
- Throws:
 MalformedURLException
 IOException
 IllegalArgumentException
 
- 
handleCodeblock
protected boolean handleCodeblock(org.apache.sling.api.resource.Resource resource, PrintWriter out, boolean printEmptyLine) 
- 
handleLabeledAttributes
protected boolean handleLabeledAttributes(org.apache.sling.api.resource.Resource resource, PrintWriter out, boolean printEmptyLine) 
- 
admissibleValue
protected boolean admissibleValue(Object object)
We do not print pure numbers, booleans and some special strings since those are likely attributes determining the component layout, not actual text that is printed. This excludes all "true" / "false" / single number (int / float) attributes or array of numbers attributes, or shorter than 3 digits or path, or array or type date or boolean or {Date} or {Boolean}, inherit, blank, HTML tags, target.
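A rough sketch of such a value filter is given below. It covers only part of the listed rules, and several details are assumptions: the helper name admissible, reading "shorter than 3 digits" as "shorter than 3 characters", and treating any value starting with a slash as a path.

```java
import java.util.Calendar;

// Hypothetical sketch of a subset of the described rules; not the actual implementation.
class AdmissibleValueSketch {
    static boolean admissible(Object value) {
        if (value == null || value instanceof Number || value instanceof Boolean) {
            return false; // pure numbers and booleans are layout settings, not text
        }
        if (value.getClass().isArray() || value instanceof Calendar) {
            return false; // arrays and dates are skipped
        }
        String text = value.toString().trim();
        if (text.length() < 3) {
            return false; // assumption: blank or very short values carry no text
        }
        if (text.equals("true") || text.equals("false") || text.equals("inherit")) {
            return false; // special strings
        }
        if (text.matches("-?\\d+(\\.\\d+)?")) {
            return false; // a single int or float stored as a string
        }
        return !text.startsWith("/"); // assumption: a leading slash marks a path
    }

    public static void main(String[] args) {
        System.out.println(admissible("Welcome to our site")); // true
        System.out.println(admissible("12345"));               // false
        System.out.println(admissible("/content/site/en"));    // false
    }
}
```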
- 
admissibleKey
protected boolean admissibleKey(String key)
 
- 
activate
protected void activate(ApproximateMarkdownServiceImpl.Config config)
 
- 
deactivate
protected void deactivate()
 
- 
getComponentLinks
@NotNull public List<ApproximateMarkdownService.Link> getComponentLinks(@NotNull org.apache.sling.api.resource.Resource resource)
Returns links stored in the component or in siblings of the component that can be proposed to the user as sources for the AI, e.g. via markdown generation. The links are collected heuristically: we traverse the attributes of the resource and all its children and collect every value that starts with /content. If there are fewer than 5 links, we continue with the parent resource until jcr:content is reached. The link title will be the jcr:title or title attribute.
- Specified by:
 getComponentLinks in interface ApproximateMarkdownService
- Parameters:
 resource - the resource to check
- Returns:
 a list of links, or an empty list if there are none.
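The traversal described above can be sketched without the Sling API. The Node class below is a hypothetical stand-in for a Sling Resource, links are reduced to plain path strings (titles are omitted), and the assumption is that jcr:content itself is still searched before the walk stops.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the link collection heuristic; not the actual implementation.
class ComponentLinksSketch {
    /** Minimal stand-in for a resource: a name, a parent, attributes and children. */
    static class Node {
        final String name;
        final Node parent;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Collects every attribute value starting with /content from the node and its descendants. */
    static void collect(Node node, List<String> links) {
        for (String value : node.attributes.values()) {
            if (value.startsWith("/content") && !links.contains(value)) {
                links.add(value);
            }
        }
        for (Node child : node.children) {
            collect(child, links);
        }
    }

    /** Widens the search to the parent while fewer than 5 links are found, up to jcr:content. */
    static List<String> componentLinks(Node start) {
        List<String> links = new ArrayList<>();
        Node current = start;
        while (current != null) {
            collect(current, links);
            if (links.size() >= 5 || "jcr:content".equals(current.name)) {
                break;
            }
            current = current.parent;
        }
        return links;
    }

    public static void main(String[] args) {
        Node content = new Node("jcr:content", null);
        Node teaser = new Node("teaser", content);
        teaser.attributes.put("link", "/content/site/a");
        Node text = new Node("text", content);
        text.attributes.put("ref", "/content/site/b");
        System.out.println(componentLinks(teaser)); // [/content/site/a, /content/site/b]
    }
}
```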
 
 
- 
collectLinks
protected void collectLinks(@NotNull org.apache.sling.api.resource.Resource resource, List<ApproximateMarkdownService.Link> resourceLinks)
Collects links from a resource and its children. The link title will be the jcr:title or title attribute.
- Parameters:
 resource - the resource to collect links from
 resourceLinks - the list to store the collected links
 
- 
getImageUrl
public String getImageUrl(org.apache.sling.api.resource.Resource imageResource)
Description copied from interface: ApproximateMarkdownService
Retrieves the image URL in a form usable for ChatGPT - usually data:image/jpeg;base64,{base64_image}
- Specified by:
 getImageUrl in interface ApproximateMarkdownService
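A data URL of the form mentioned above can be built with the standard Base64 encoder. The helper toDataUrl is hypothetical; how the actual method reads the image bytes and determines the MIME type from the resource is not shown in this documentation.

```java
import java.util.Base64;

// Hypothetical sketch of building a data URL from image bytes; not the actual implementation.
class ImageDataUrlSketch {
    static String toDataUrl(byte[] imageBytes, String mimeType) {
        // Format: data:<mime type>;base64,<base64 encoded bytes>
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, 3}; // placeholder bytes instead of a real image
        System.out.println(toDataUrl(bytes, "image/jpeg")); // data:image/jpeg;base64,AQID
    }
}
```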
 
- 
traverseTreeForStructureGathering
protected void traverseTreeForStructureGathering(org.apache.sling.api.resource.Resource resource, PrintWriter out, String outerResourceType, String subpath)
This is debugging code we needed to gather information for the implementation; we keep it around for now.
 out.println("Approximated markdown for " + path);
 traverseTreeForStructureGathering(resource, out, null, null);
 out.println("DONE");
 out.println("HTML tags found:" + htmltags);
- 
captureHtmlTags
protected void captureHtmlTags(String value)
 
 - 
 
 -