Statistics
Latest revision as of 11:03, 1 March 2016
This article is part of the Advanced User's Guide. It lists statistics on various database instances that have been created with BaseX, with value and full-text indexes turned off. The URLs to the original sources, if available or public, are listed below.
Databases in BaseX are light-weight. If a database limit is reached, you can distribute your documents across multiple database instances and access all of them with a single XQuery expression.
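The distribution step can be planned before any data is imported. The following is a minimal sketch, not part of BaseX: it assumes you already know the node count of each document, and it greedily groups documents so that no group exceeds the node and document limits given in the Limits row of the table below. The function name `partition`, the document names and the inclusive treatment of the limits are assumptions made for illustration.

```python
# Limits taken from the "Limits" row of the statistics table.
# Treating them as inclusive upper bounds is an assumption.
MAX_NODES = 2**31   # nodes per database
MAX_FILES = 2**29   # documents per database


def partition(docs, max_nodes=MAX_NODES, max_files=MAX_FILES):
    """Greedily assign (name, node_count) pairs to databases, in input order.

    Returns a list of databases, each a list of document names.
    """
    databases = []
    current, nodes = [], 0
    for name, count in docs:
        if count > max_nodes:
            # A single document this large cannot be placed by regrouping.
            raise ValueError(f"{name} alone exceeds the node limit")
        # Start a new database when the current one would overflow.
        if current and (nodes + count > max_nodes or len(current) >= max_files):
            databases.append(current)
            current, nodes = [], 0
        current.append(name)
        nodes += count
    if current:
        databases.append(current)
    return databases


# Hypothetical documents with made-up node counts.
plan = partition([("a.xml", 2**30), ("b.xml", 2**30), ("c.xml", 1)])
print(plan)
```

Each resulting group would then be created as its own database, and a single XQuery expression can address all of them.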
Databases
Instances | FileSize | #Files | DbSize | #Nodes | #Attr | #ENames | #ANames | #URIs | Height |
---|---|---|---|---|---|---|---|---|---|
Limits | 512 GiB (2^39 Bytes) | 536'870'912 (2^29) | no limit | 2'147'483'648 (2^31) | no limit | 32768 (2^15) | 32768 (2^15) | 256 (2^8) | no limit |
RuWikiHist | 421 GiB | 1 | 416 GiB | 324'848'508 | 3 | 21 | 6 | 2 | 6 |
ZhWikiHist | 126 GiB | 1 | 120 GiB | 179'199'662 | 3 | 21 | 6 | 2 | 6 |
EnWiktionary | 79 GiB | 1 | 75 GiB | 134'380'393 | 3 | 21 | 6 | 2 | 6 |
XMark | 55 GiB | 1 | 64 GiB | 1'615'071'348 | 2 | 74 | 9 | 0 | 13 |
EnWikiMeta | 54 GiB | 1 | 52 GiB | 401'456'348 | 3 | 21 | 6 | 2 | 6 |
MedLine | 38 GiB | 379 | 36 GiB | 1'623'764'254 | 2 | 84 | 6 | 0 | 9 |
iProClass | 36 GiB | 1 | 37 GiB | 1'631'218'984 | 3 | 245 | 4 | 2 | 9 |
Inex2009 | 31 GiB | 2'666'500 | 34 GiB | 1'336'110'639 | 15 | 28'034 | 451 | 1 | 37 |
CoPhIR | 29 GiB | 10'000'000 | 31 GiB | 1'104'623'376 | 10 | 42 | 42 | 0 | 8 |
EnWikipedia | 26 GiB | 1 | 25 GiB | 198'546'747 | 3 | 24 | 21 | 2 | 6 |
XMark | 22 GiB | 1 | 26 GiB | 645'997'965 | 2 | 74 | 9 | 0 | 13 |
InterPro | 14 GiB | 1 | 19 GiB | 860'304'235 | 5 | 7 | 15 | 0 | 4 |
Genome1 | 13 GiB | 1 | 13 GiB | 432'628'105 | 12 | 26 | 101 | 2 | 6 |
NewYorkTimes | 12 GiB | 1'855'659 | 13 GiB | 280'407'005 | 5 | 41 | 33 | 0 | 6 |
TrEMBL | 11 GiB | 1 | 14 GiB | 589'650'535 | 8 | 47 | 30 | 2 | 7 |
XMark | 11 GiB | 1 | 13 GiB | 323'083'409 | 2 | 74 | 9 | 0 | 13 |
IntAct | 7973 MiB | 25'624 | 6717 MiB | 297'478'392 | 7 | 64 | 22 | 2 | 14 |
Freebase | 7366 MiB | 1 | 10 GiB | 443'627'994 | 8 | 61 | 283 | 1 | 93 |
SDMX | 6356 MiB | 1 | 8028 MiB | 395'871'872 | 2 | 22 | 6 | 3 | 7 |
OpenStreetMap | 5312 MiB | 1 | 5171 MiB | 6'910'669 | 3 | 19 | 5 | 2 | 6 |
SwissProt | 4604 MiB | 1 | 5422 MiB | 241'274'406 | 8 | 70 | 39 | 2 | 7 |
EURLex | 4815 MiB | 1 | 5532 MiB | 167'328'039 | 23 | 186 | 46 | 1 | 12 |
Wikicorpus | 4492 MiB | 659'338 | 4432 MiB | 157'948'561 | 12 | 1'257 | 2'687 | 2 | 50 |
EnWikiRDF | 3679 MiB | 1 | 3537 MiB | 98'433'194 | 1 | 11 | 2 | 11 | 4 |
CoPhIR | 2695 MiB | 1'000'000 | 2882 MiB | 101'638'857 | 10 | 42 | 42 | 0 | 8 |
MeSH | 2091 MiB | 1 | 2410 MiB | 104'845'819 | 3 | 6 | 5 | 2 | 5 |
FreeDB | 1723 MiB | 1 | 2462 MiB | 102'901'519 | 2 | 7 | 3 | 0 | 4 |
XMark | 1134 MiB | 1 | 1303 MiB | 32'298'989 | 2 | 74 | 9 | 0 | 13 |
DeepFS | 810 MiB | 1 | 850 MiB | 44'821'506 | 4 | 3 | 6 | 0 | 24 |
LibraryUKN | 760 MiB | 1 | 918 MiB | 46'401'941 | 3 | 23 | 3 | 0 | 5 |
736 MiB | 1'177'495 | 767 MiB | 15'309'015 | 0 | 8 | 0 | 0 | 3 | |
Organizations | 733 MiB | 1'019'132 | 724 MiB | 33'112'392 | 3 | 38 | 9 | 0 | 7 |
DBLP | 694 MiB | 1 | 944 MiB | 36'878'181 | 4 | 35 | 6 | 0 | 7 |
Feeds | 692 MiB | 444'014 | 604 MiB | 5'933'713 | 0 | 8 | 0 | 0 | 3 |
MedLineSupp | 477 MiB | 1 | 407 MiB | 21'602'141 | 5 | 55 | 7 | 0 | 9 |
AirBase | 449 MiB | 38 | 273 MiB | 14'512'851 | 1 | 111 | 5 | 0 | 11 |
MedLineDesc | 260 MiB | 1 | 195 MiB | 10'401'847 | 5 | 66 | 8 | 0 | 9 |
ZDNET | 130 MiB | 95'663 | 133 MiB | 3'060'186 | 21 | 40 | 90 | 0 | 13 |
JMNEdict | 124 MiB | 1 | 171 MiB | 8'592'666 | 0 | 10 | 0 | 0 | 5 |
XMark | 111 MiB | 1 | 130 MiB | 3'221'926 | 2 | 74 | 9 | 0 | 13 |
Freshmeat | 105 MiB | 1 | 86 MiB | 3'832'028 | 1 | 58 | 1 | 0 | 6 |
DeepFS | 83 MiB | 1 | 93 MiB | 4'842'638 | 4 | 3 | 6 | 0 | 21 |
Treebank | 82 MiB | 1 | 92 MiB | 3'829'513 | 1 | 250 | 1 | 0 | 37 |
DBLP2 | 80 MiB | 170'843 | 102 MiB | 4'044'649 | 4 | 35 | 6 | 0 | 6 |
DDI | 76 MiB | 3 | 39 MiB | 2'070'157 | 7 | 104 | 16 | 21 | 11 |
Alfred | 75 MiB | 1 | 68 MiB | 3'784'285 | 0 | 60 | 0 | 0 | 6 |
University | 56 MiB | 6 | 66 MiB | 3'468'606 | 1 | 28 | 4 | 0 | 5 |
MediaUKN | 38 MiB | 1 | 45 MiB | 1'619'443 | 3 | 21 | 3 | 0 | 5 |
HCIBIB2 | 32 MiB | 26'390 | 33 MiB | 617'023 | 1 | 39 | 1 | 0 | 4 |
Nasa | 24 MiB | 1 | 25 MiB | 845'805 | 2 | 61 | 8 | 1 | 9 |
MovieDB | 16 MiB | 1 | 19 MiB | 868'980 | 6 | 7 | 8 | 0 | 4 |
KanjiDic2 | 13 MiB | 1 | 18 MiB | 917'833 | 3 | 27 | 10 | 0 | 6 |
XMark | 11 MiB | 1 | 13 MiB | 324'274 | 2 | 74 | 9 | 0 | 13 |
Shakespeare | 7711 KiB | 1 | 9854 KiB | 327'170 | 0 | 59 | 0 | 0 | 9 |
TreeOfLife | 5425 KiB | 1 | 7106 KiB | 363'560 | 7 | 4 | 7 | 0 | 243 |
Thesaurus | 4288 KiB | 1 | 4088 KiB | 201'798 | 7 | 33 | 9 | 0 | 7 |
MusicXML | 3155 KiB | 17 | 2942 KiB | 171'400 | 8 | 179 | 56 | 0 | 8 |
BibDBPub | 2292 KiB | 3'465 | 2359 KiB | 80'178 | 1 | 54 | 1 | 0 | 4 |
Factbook | 1743 KiB | 1 | 1560 KiB | 77'315 | 16 | 23 | 32 | 0 | 6 |
XMark | 1134 KiB | 1 | 1334 KiB | 33'056 | 2 | 74 | 9 | 0 | 13 |
The table columns have the following meaning:
- FileSize is the original size of the input documents
- #Files indicates the number of stored XML documents
- DbSize is the size of the resulting database (excluding the value index structures)
- #Nodes represents the number of XML nodes (elements, attributes, texts, etc.) stored in the database
- #Attr indicates the maximum number of attributes stored for a single element
- #ENames and #ANames reflect the number of distinct element and attribute names, respectively
- #URIs represents the number of distinct namespace URIs
- Height indicates the maximum level depth of the stored nodes
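
To see how close a data set comes to the limits, its statistics can be compared with the Limits row of the table. The sketch below is illustrative only: the helper names `parse_count` and `exceeded` are hypothetical, the limit values are copied from the table, and treating a limit as exceeded only when the value is strictly greater is an assumption.

```python
# Limits copied from the "Limits" row of the table.
# Columns with "no limit" (DbSize, #Attr, Height) are omitted.
LIMITS = {
    "FileSize": 2**39,  # bytes (512 GiB)
    "#Files": 2**29,
    "#Nodes": 2**31,
    "#ENames": 2**15,
    "#ANames": 2**15,
    "#URIs": 2**8,
}


def parse_count(text):
    """Convert a count written with apostrophes, e.g. 1'336'110'639, to int."""
    return int(text.replace("'", ""))


def exceeded(stats):
    """Return the names of all columns whose value is above the limit."""
    return [col for col, limit in LIMITS.items()
            if col in stats and stats[col] > limit]


# Values from the Inex2009 row of the table.
inex2009 = {
    "#Files": parse_count("2'666'500"),
    "#Nodes": parse_count("1'336'110'639"),
    "#ENames": parse_count("28'034"),
    "#ANames": 451,
    "#URIs": 1,
}
print(exceeded(inex2009))
```

For the Inex2009 values the returned list is empty, which matches the table: all of its counts are below the corresponding limits.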