Changes

1,002 bytes removed, 16:27, 18 July 2023
=HTML Parser=
If [http://vrici.lojban.org/~cowan/tagsoup/ TagSoup] is found in the [[Startup#Distributions|classpath]], HTML can be imported in BaseX without any problems. TagSoup ensures that the input is converted to well-formed documents before it arrives at the XML parser (correct opening and closing tags, etc.).
If TagSoup is not available on a system, the default XML parser will be used, and the import will only succeed if the input is already well-formed XML (which is usually the case for XHTML documents).
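The difference between the two cases can be illustrated with a small XQuery sketch. It is only an illustration and not part of the BaseX import code: the two sample strings are made up, and the standard function <code>parse-xml</code> stands in for the default XML parser.
<syntaxhighlight lang="xquery">
(: Hypothetical sample inputs: well-formed XHTML vs. typical "tag soup" :)
let $xhtml := '<html><body><p>Hello<br/>World</p></body></html>'
let $soup  := '<html><body><p>Hello<br>World</body></html>'
for $input in ($xhtml, $soup)
return try {
  (: parse-xml raises an error if the string is not well-formed XML :)
  parse-xml($input) ! 'well-formed'
} catch * {
  'not well-formed: ' || $err:code
}
</syntaxhighlight>
The first string should parse, whereas the second one should end up in the <code>catch</code> branch because of the unclosed <code>br</code> and <code>p</code> elements. Input of the second kind is what TagSoup is meant to repair.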
==Installation==
====Maven====
TagSoup can be added to your project as follows:
1. Visit [https://mvnrepository.com/artifact/org.ccil.cowan.tagsoup/tagsoup/ MVN TagSoup Repository]
</syntaxhighlight>
4. Insert the XML fragment into the <code><dependencies></code> element of your project’s <code>pom.xml</code> file.
====Debian====
With Debian, TagSoup will be automatically detected and included after it has been installed via:
apt-get install libtagsoup-java
==Options==
TagSoup offers a variety of options to customize the HTML conversion. For the complete list, please visit the [http://vrici.lojban.org/~cowan/tagsoup/ TagSoup] website. BaseX supports most of these options, with a few exceptions:
* '''encoding''': BaseX tries to guess the input encoding, but this can be overridden by this option.
* '''files''': not supported as the input documents are piped directly to the XML parser.
* '''method''': set to 'xml' by default. If this is set to 'html', end tags may be missing, for instance.
* '''version''': dismissed, as TagSoup always falls back to 'version 1.0', no matter what the input is.
</syntaxhighlight>
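As a sketch of how such options might be passed on via XQuery, the following query adds all HTML files of a directory to a database and hands two TagSoup options to the HTML parser. The directory name <code>2Bimported</code>, the database name <code>DB</code> and the option values are assumptions chosen for illustration. The database must already exist, TagSoup needs to be in the classpath, and the nested map for <code>htmlparser</code> is assumed to be accepted by the BaseX version in use.
<syntaxhighlight lang="xquery">
(: Assumed input directory; file:list returns paths relative to it :)
let $dir := "2Bimported/"
for $file in file:list($dir, true(), "*.html")
return db:add("DB", $dir || $file, "", map {
  'parser': 'html',
  (: Example values: override the guessed encoding, drop namespaces :)
  'htmlparser': map { 'encoding': 'UTF-8', 'nons': true() }
})
</syntaxhighlight>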
=Text Parser=
Plain text can be imported as well:
==GUI==
Go to Menu ''Database'' → ''New'' and select "TEXT" in the input format combobox. You can set the following options for parsing text documents in the "Parsing" tab:
* '''Encoding''': Choose the appropriate encoding of the text file.
* '''Lines''': Activate this option to create a <code>&lt;line&gt;...&lt;/line&gt;</code> element for each line of the input text file.
==Command Line==
Turn on the text parser before parsing documents and set some optional, parser-specific options and a file filter:
 SET {{Option|PARSER}} text
 SET {{Option|TEXTPARSER}} lines=yes
 SET {{Option|CREATEFILTER}} *
==XQuery==
Similar to the other formats, the text parser can also be specified via XQuery:
<syntaxhighlight lang="xquery">
for $file in file:list("2Bimported", true(), "*.txt")
return db:add("DB", $file, "", map { 'parser': 'text' })
</syntaxhighlight>
=Changelog=
;Version 11.0
* Removed: Text Parser
;Version 7.8