Statistics

From BaseX Documentation
Revision as of 10:08, 3 February 2011 by Mroth (talk | contribs) (Removed redundant section (double "Sources" section), did a diff on them ...)

The following tables list statistics on databases that have been created with BaseX from various XML instances and, where the source documents are publicly available, link to them.

The database size does not include any indexes.

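  • #nodes represents the number of XML nodes which have been created in the database
  • #atr is the maximum number of attributes of a single element
  • #eln, #atn, and #uri represent the number of distinct element names, attribute names, and namespace URIs
  • height is the depth of the document tree, and #docs is the number of documents in the database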
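
As a rough illustration of what these columns measure, the following Python sketch computes comparable figures for a single XML document held in memory. It is not how BaseX gathers its statistics. The function names `split_name` and `xml_stats` are hypothetical, and several assumptions are made: #atr is taken to be the largest number of attributes on one element, height is the depth of the deepest element with the root element at depth 1, whitespace-only text is ignored, and comments and processing instructions are not counted because ElementTree discards them by default. Namespaces that are declared but never used are not seen either.

```python
import xml.etree.ElementTree as ET


def split_name(name):
    """Split an ElementTree name of the form '{uri}local' into (uri, local)."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return None, name


def xml_stats(xml_text):
    """Approximate the table's per-database figures for one XML document."""
    root = ET.fromstring(xml_text)
    nodes = 1          # assumption: the document node itself counts as one node
    max_attrs = 0      # assumption: #atr = most attributes on a single element
    height = 0         # depth of the deepest element, root element = 1
    element_names, attribute_names, uris = set(), set(), set()

    stack = [(root, 1)]  # iterative traversal avoids recursion limits
    while stack:
        elem, depth = stack.pop()
        nodes += 1 + len(elem.attrib)          # the element plus its attributes
        max_attrs = max(max_attrs, len(elem.attrib))
        height = max(height, depth)
        element_names.add(elem.tag)
        attribute_names.update(elem.attrib)    # adds the attribute names (keys)
        for name in [elem.tag, *elem.attrib]:
            uri, _ = split_name(name)
            if uri:
                uris.add(uri)
        if elem.text and elem.text.strip():    # text before the first child
            nodes += 1
        for child in elem:
            if child.tail and child.tail.strip():  # text following a child
                nodes += 1
            stack.append((child, depth + 1))

    return {
        "#nodes": nodes,
        "#atr": max_attrs,
        "#eln": len(element_names),
        "#atn": len(attribute_names),
        "#uri": len(uris),
        "height": height,
    }


if __name__ == "__main__":
    sample = '<a xmlns:x="urn:x"><b id="1" x:k="v">hi</b><b/>tail</a>'
    print(xml_stats(sample))
    # {'#nodes': 8, '#atr': 2, '#eln': 2, '#atn': 2, '#uri': 1, 'height': 2}
```

For the sample document, the eight nodes are the document node, three elements, two attributes, and two text nodes. Figures produced this way will generally differ from the values in the table below, since the counting rules are only assumed.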

Databases

Instances      file size  db size    #nodes         #atr  #eln    #atn   #uri  height  #docs
RuWikiHist     421 GiB    416 GiB    324,848,508    3     21      6      2     6       1
ZhWikiHist     126 GiB    120 GiB    179,199,662    3     21      6      2     6       1
EnWiktionary   79 GiB     75 GiB     134,380,393    3     21      6      2     6       1
XMark          55 GiB     64 GiB     1,615,071,348  2     74      9      0     13      1
EnWikiMeta     54 GiB     52 GiB     401,456,348    3     21      6      2     6       1
MedLine        38 GiB     36 GiB     1,623,764,254  2     84      6      0     9       379
iProClass      36 GiB     37 GiB     1,631,218,984  3     245     4      2     9       1
Inex2009       31 GiB     34 GiB     1,336,110,639  15    28,034  451    1     37      2,666,500
CoPhIR         29 GiB     31 GiB     1,104,623,376  10    42      42     0     8       10,000,000
EnWikipedia    26 GiB     25 GiB     198,546,747    3     24      21     2     6       1
XMark          22 GiB     26 GiB     645,997,965    2     74      9      0     13      1
InterPro       14 GiB     19 GiB     860,304,235    5     7       15     0     4       1
Genome1        13 GiB     13 GiB     432,628,105    12    26      101    2     6       1
NewYorkTimes   12 GiB     13 GiB     280,407,005    5     41      33     0     6       1,855,659
TrEMBL         11 GiB     14 GiB     589,650,535    8     47      30     2     7       1
XMark          11 GiB     13 GiB     323,083,409    2     74      9      0     13      1
IntAct         7973 MiB   6717 MiB   297,478,392    7     64      22     2     14      25,624
Freebase       7366 MiB   10 GiB     443,627,994    8     61      283    1     93      1
SDMX           6356 MiB   8028 MiB   395,871,872    2     22      6      3     7       1
OpenStreetMap  5312 MiB   5171 MiB   6,910,669      3     19      5      2     6       1
SwissProt      4604 MiB   5422 MiB   241,274,406    8     70      39     2     7       1
EURLex         4815 MiB   5532 MiB   167,328,039    23    186     46     1     12      1
Wikicorpus     4492 MiB   4432 MiB   157,948,561    12    1,257   2,687  2     50      659,338
EnWikiRDF      3679 MiB   3537 MiB   98,433,194     1     11      2      11    4       1
CoPhIR         2695 MiB   2882 MiB   101,638,857    10    42      42     0     8       1,000,000
MeSH           2091 MiB   2410 MiB   104,845,819    3     6       5      2     5       1
FreeDB         1723 MiB   2462 MiB   102,901,519    2     7       3      0     4       1
XMark          1134 MiB   1303 MiB   32,298,989     2     74      9      0     13      1
DeepFS         810 MiB    850 MiB    44,821,506     4     3       6      0     24      1
LibraryUKN     760 MiB    918 MiB    46,401,941     3     23      3      0     5       1
Twitter        736 MiB    767 MiB    15,309,015     0     8       0      0     3       1,177,495
Organizations  733 MiB    724 MiB    33,112,392     3     38      9      0     7       1,019,132
DBLP           694 MiB    944 MiB    36,878,181     4     35      6      0     7       1
Feeds          692 MiB    604 MiB    5,933,713      0     8       0      0     3       444,014
MedLineSupp    477 MiB    407 MiB    21,602,141     5     55      7      0     9       1
AirBase        449 MiB    273 MiB    14,512,851     1     111     5      0     11      38
MedLineDesc    260 MiB    195 MiB    10,401,847     5     66      8      0     9       1
ZDNET          130 MiB    133 MiB    3,060,186      21    40      90     0     13      95,663
JMNEdict       124 MiB    171 MiB    8,592,666      0     10      0      0     5       1
XMark          111 MiB    130 MiB    3,221,926      2     74      9      0     13      1
Freshmeat      105 MiB    86 MiB     3,832,028      1     58      1      0     6       1
DeepFS         83 MiB     93 MiB     4,842,638      4     3       6      0     21      1
Treebank       82 MiB     92 MiB     3,829,513      1     250     1      0     37      1
DBLP2          80 MiB     102 MiB    4,044,649      4     35      6      0     6       170,843
DDI            76 MiB     39 MiB     2,070,157      7     104     16     21    11      3
Alfred         75 MiB     68 MiB     3,784,285      0     60      0      0     6       1
University     56 MiB     66 MiB     3,468,606      1     28      4      0     5       6
MediaUKN       38 MiB     45 MiB     1,619,443      3     21      3      0     5       1
HCIBIB2        32 MiB     33 MiB     617,023        1     39      1      0     4       26,390
Nasa           24 MiB     25 MiB     845,805        2     61      8      1     9       1
MovieDB        16 MiB     19 MiB     868,980        6     7       8      0     4       1
KanjiDic2      13 MiB     18 MiB     917,833        3     27      10     0     6       1
XMark          11 MiB     13 MiB     324,274        2     74      9      0     13      1
Shakespeare    7711 KiB   9854 KiB   327,170        0     59      0      0     9       1
TreeOfLife     5425 KiB   7106 KiB   363,560        7     4       7      0     243     1
Thesaurus      4288 KiB   4088 KiB   201,798        7     33      9      0     7       1
MusicXML       3155 KiB   2942 KiB   171,400        8     179     56     0     8       17
BibDBPub       2292 KiB   2359 KiB   80,178         1     54      1      0     4       3,465
Factbook       1743 KiB   1560 KiB   77,315         16    23      32     0     6       1
XMark          1134 KiB   1334 KiB   33,056         2     74      9      0     13      1

Sources

Instances      Source
AirBase        http://air-climate.eionet.europa.eu/databases/airbase/airbasexml
Alfred         http://alfred.med.yale.edu/alfred/alfredWithDescription.zip
BibDBPub       http://inex.is.informatik.uni-duisburg.de/2005/
CoPhIR         http://cophir.isti.cnr.it/
DBLP           http://dblp.uni-trier.de/xml
DBLP2          http://inex.is.informatik.uni-duisburg.de/2005/
DDI            http://tools.ddialliance.org/
EnWikiMeta     http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-current.xml.bz2
EnWikipedia    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
EnWikiRDF      http://www.xml-benchmark.org/ (generated with xmlgen)
EnWiktionary   http://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-meta-history.xml.7z
EURLex         http://www.epsiplatform.eu/
Factbook       http://www.cs.washington.edu/research/xmldatasets/www/repository.html
Freebase       http://download.freebase.com/wex
FreeDB         http://www.xmldatabases.org/radio/xmlDatabases/projects/FreeDBtoXML
Freshmeat      http://freshmeat.net/articles/freshmeat-xml-rpc-api-available
Genome1        ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz
HCIBIB2        http://inex.is.informatik.uni-duisburg.de/2005/
Inex2009       http://www.mpi-inf.mpg.de/departments/d5/software/inex
IntAct         ftp://ftp.ebi.ac.uk/pub/databases/intact/current/index.html
InterPro       ftp://ftp.bio.net/biomirror/interpro/match_complete.xml.gz
iProClass      ftp://ftp.pir.georgetown.edu/pir_databases/iproclass/iproclass.xml.gz
JMNEdict       ftp://ftp.monash.edu.au/pub/nihongo/enamdict_doc.html
KanjiDic2      http://www.csse.monash.edu.au/~jwb/kanjidic2
MedLine        http://www.nlm.nih.gov/bsd
MeSH           http://www.nlm.nih.gov/mesh/xmlmesh.html
MovieDB        http://eagereyes.org/InfoVisContest2007Data.html
MusicXML       http://www.recordare.com/xml/samples.html
Nasa           http://www.cs.washington.edu/research/xmldatasets/www/repository.html
NewYorkTimes   http://www.nytimes.com/ref/membercenter/nytarchive.html
OpenStreetMap  http://dump.wiki.openstreetmap.org/osmwiki-latest-files.tar.gz
Organizations  http://www.data.gov/raw/1358
RuWikiHist     http://dumps.wikimedia.org/ruwiki/latest/ruwiki-latest-pages-meta-history.xml.7z
SDMX           http://www.metadatatechnology.com/
Shakespeare    http://www.cafeconleche.org/examples/shakespeare
SwissProt      ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase
Thesaurus      http://www.drze.de/BELIT/thesaurus
Treebank       http://www.cs.washington.edu/research/xmldatasets
TreeOfLife     http://tolweb.org/data/tolskeletaldump.xml
TrEMBL         ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase
Wikicorpus     http://www-connex.lip6.fr/~denoyer/wikipediaXML
XMark          http://www.xml-benchmark.org/ (generated with xmlgen)
ZDNET          http://inex.is.informatik.uni-duisburg.de/2005/
ZhWikiHist     http://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-meta-history.xml.7z
LibraryUKN     generated from university library data
MediaUKN       generated from university library data
DeepFS         generated from filesystem structure
University     generated from students test data
Feeds          compiled from news feeds
Twitter        compiled from Twitter feeds