
GSoC2012 Progress (Jo)

sandroacoelho edited this page Jul 26, 2013 · 1 revision

Development branch


GSoC 2012 Progress Report

Progress

May 21 - May 27, 2012

Jun 1

Jun 6

  • implemented Pablo's indexing interfaces for JDBM and started with in-memory indexing
  • began writing Sources for indexing from TSV
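
As a rough illustration of what a TSV-backed source for indexing could look like, here is a minimal sketch. It assumes a two-column file of the form `name<TAB>count`; the class name `TsvCountSource`, the method `read`, and that column layout are hypothetical and not taken from the project's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the project's actual Source classes.
// Assumes each TSV line has exactly two fields: a name and an integer count.
final class TsvCountSource {
    private TsvCountSource() {}

    /** Sums the counts per name, skipping blank or malformed lines. */
    static Map<String, Integer> read(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                continue; // blank line or wrong number of columns
            }
            try {
                int count = Integer.parseInt(fields[1].trim());
                counts.merge(fields[0], count, Integer::sum);
            } catch (NumberFormatException e) {
                // count column is not an integer: skip the line
            }
        }
        return counts;
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines(path)` for small files; a real indexer would more likely stream the file instead of loading it whole.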

Jun 7

  • started MemoryResourceStore and MemoryTokenStore; extended the ResourceStore interface to support querying by URL; extended MemoryStoreIndexer
  • added DBpediaResourceSource
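
To make the "query by URL" extension concrete, here is a minimal sketch of an in-memory store that supports lookup in both directions (id to URL and URL to id). The class `InMemoryResourceStore` and its methods are hypothetical names chosen for illustration; the actual MemoryResourceStore and ResourceStore interface may be structured differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the project's MemoryResourceStore.
// Resources get dense integer ids; a hash map provides the reverse lookup by URL.
final class InMemoryResourceStore {
    private final List<String> urlById = new ArrayList<>();
    private final Map<String, Integer> idByUrl = new HashMap<>();

    /** Adds a resource URL and returns its id; re-adding a URL returns the existing id. */
    int add(String url) {
        Integer existing = idByUrl.get(url);
        if (existing != null) {
            return existing;
        }
        int id = urlById.size();
        urlById.add(url);
        idByUrl.put(url, id);
        return id;
    }

    /** Returns the URL stored under the given id. */
    String urlOf(int id) {
        return urlById.get(id);
    }

    /** Returns the id for the given URL, or -1 if the URL is unknown. */
    int idOf(String url) {
        Integer id = idByUrl.get(url);
        return id == null ? -1 : id;
    }
}
```

Keeping ids dense means other stores can refer to a resource by a small integer rather than by its full URL string, at the cost of one extra map for the reverse direction.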

Jun 8 - Jun 10

  • finished working versions of FileSources and Storage (in-memory and disk-based) for SurfaceForm, DBpediaResource, and Candidate Map

  • created OntologyTypeStore (part of ResourceStore)

  • started ContextStore

  • efficient (de-)serialization with Kryo

  • Some statistics on the in-memory versions on my MacBook (8 GB RAM):

    Heap space used with all 3 stores in memory: ~1.6-1.7 GB

    Store                                                  Startup   Disk space (not compressed)  Memory usage (no GC)
    MemorySurfaceFormStore (thresh10-TRD)                  9587 ms   139 MB                       782 MB
    MemoryResourceStore (DBpedia, Freebase, Schema types)  18006 ms  165 MB                       508 MB
    MemoryCandidateMapStore (no threshold)                 5831 ms   123 MB                       427 MB

Jun 11 - Jun 12

  • added TokenSource, working version of TokenStore+TSV indexing, added Tokenizer
  • added TokenOccurrenceSource+TSV indexing
  • extended calculations in DBTwoStepDisambiguator

Original plan

Apr 24 – May 21, 2012

  • branching/forking main repository
  • discussion of core architecture changes with Pablo N. Mendes and Max Jakob
  • getting to know and coordinating with other GSoC students

May 21 - May 27, 2012

  • evaluation and comparison of databases

May 27 - June 10, 2012

  • implementation and testing of database-backed storage using the best evaluated system
  • implementation of all probability calculations and smoothing

June 10 - June 24, 2012

  • initial tests and evaluation of the problems resulting from the changes
  • performance evaluations and improvements

June 24 - July 1, 2012

  • evaluation of features used for the entity mention model
  • computation of any required additional counts

July 9, 2012

  • MIDTERM EVALUATION

July 1 - July 13, 2012

  • extension of the new database structure for the language model (LM)
  • (I budgeted more time for this block than I expect to need, so that any complications in finishing the first part before the MIDTERM EVALUATION can be absorbed)

July 17 - July 25, 2012

  • Machine Learning Summer School in Lisbon

July 25 - August 13, 2012

  • smoothing, optimization and evaluation of the new language model