Lightning Talks

From OpenSQLCamp

graph engine

  • a MySQL storage engine that allows for calculated values, i.e. you can select data that was never inserted
  • relies on tricks with MySQL internals to accomplish this

cluster/J

MySQL Cluster is made up of many nodes

  • using Java as an ORM
  • Jtie (a way to map any C++ API to Java)
  • Cluster/J: why not use the existing bindings? Well, that is a lot of code, and Jtie is more automatic.

sphinx 09

  • full text search engine
  • three 'branches': 9.9, 9.10, 1.x
  • talks MySQL
  • 9.10 adds multiprocess modes like Apache's, and string attributes
  • 1.x gains online full text indexes

iibench

  • iibench zipf (Zipfian distribution: the n-th most frequent word occurs with frequency proportional to 1/n)
  • uses Python
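
To make the 1/n rule concrete, here is a minimal sketch of drawing ranks from a Zipfian distribution. It is not taken from iibench; the function names `zipf_weights` and `sample_zipf` are made up for illustration, and it only uses the Python standard library.

```python
import random
from collections import Counter


def zipf_weights(n_items):
    """Unnormalized Zipf weights: the item at rank r gets weight 1/r."""
    return [1.0 / rank for rank in range(1, n_items + 1)]


def sample_zipf(n_items, n_samples, seed=0):
    """Draw n_samples ranks in 1..n_items with probability proportional to 1/rank."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    ranks = list(range(1, n_items + 1))
    return rng.choices(ranks, weights=zipf_weights(n_items), k=n_samples)


# Count how often each rank was drawn; low ranks should dominate.
counts = Counter(sample_zipf(n_items=1000, n_samples=100_000))
```

With enough samples, rank 1 shows up roughly twice as often as rank 2 and roughly ten times as often as rank 10, which is the skew a benchmark wants when it mimics real-world key popularity.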

jjtree in Coco

  • each node knows its parent
  • uses a class structure to build a syntax tree
  • lets you use query syntax to derive the correct values from the input
  • http://code.google.com/p/mod-ndb
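
The "each node knows its parent" idea can be sketched as below. This is a generic illustration, not the JJTree, Coco or mod-ndb API; the `Node` class and the node kinds ("Query", "Where", and so on) are hypothetical.

```python
class Node:
    """A syntax tree node that keeps a link back to its parent."""

    def __init__(self, kind, value=None):
        self.kind = kind
        self.value = value
        self.parent = None
        self.children = []

    def add(self, child):
        """Attach child under this node and record the back-link."""
        child.parent = self
        self.children.append(child)
        return child

    def path_to_root(self):
        """Walk the parent links, returning node kinds from here up to the root."""
        kinds = []
        node = self
        while node is not None:
            kinds.append(node.kind)
            node = node.parent
        return kinds


# Hypothetical tree for something like: ... WHERE id = 42
query = Node("Query")
where = query.add(Node("Where"))
equals = where.add(Node("Equals"))
column = equals.add(Node("Column", "id"))
literal = equals.add(Node("Literal", 42))
```

Because every node can reach its ancestors, a leaf such as `literal` can find out which clause it belongs to without the tree walker passing that context down.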

integrate OSS into Windows

  • supporting the Windows stack
  • looking to integrate existing technology, basically a complete OSS stack on Windows
  • can donate virtual boxes for testing

trainwreck

  • it's an agent for replication
  • replication by pull
  • allows for multiple trainwrecks (so you can offload less-used content onto one box and spool the rest of the content to 20 other boxes)
  • allows for parallel replication from the same server
  • because it's an agent, you can configure it to shove things just about anywhere
  • it is technically a slave, so it manages the binlog layer using existing technology
  • written in C++

column stores

  • column stores reduce I/O operations for columnar queries (aggregates or slices of a column)
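
A toy sketch of why the layout matters, using in-memory Python structures in place of disk pages. The table contents and the helper names (`to_columns`, `row_sum`, `column_sum`) are invented for illustration and do not come from any particular column store.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "price": 10.0},
    {"id": 2, "region": "west", "price": 7.0},
    {"id": 3, "region": "east", "price": 5.0},
]


def to_columns(records):
    """Pivot row records into one list per column."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def row_sum(records, field):
    """Aggregate over the row layout: every whole record is visited."""
    return sum(record[field] for record in records)


def column_sum(columns, field):
    """Aggregate over the column layout: only the one column is read."""
    return sum(columns[field])


columns = to_columns(rows)
```

Both sums give the same answer, but `column_sum` never looks at `id` or `region`. On disk, that is the difference between reading every page of the table and reading only the pages that hold the one column.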

I play with data

  • started out pair programming in Fortran and then other things; now I'm here playing with SQL.
  • looking to do real statistical calculations on the data, but SQL is getting in the way.
  • Currently looking at R as a replacement; it has a built-in column store.
  • R has a SQL layer via DBI or ODBC.
  • question: how do I cleanly map a column store in R to an RDBMS so I don't run out of memory?
    • SQLDF
    • PL/R
    • LucidDB (Java), but it lacks some stats.
  • Thanks for the notes. Just a couple of clarifications. I see three ways of doing real statistics with data stored in an RDBMS:
    • SQL itself, but statistical algorithms implemented in SQL are horrendous, when possible at all
    • [Rstats], which has literally thousands of packages written in R, C and FORTRAN, and recognizes its limitations in managing data, so passes that task off to an RDBMS using either ODBC or DBI
    • Taking advantage of the plug-in architecture of [LucidDB] to add maths/stats libraries and functions from [math.commons.apache.org]
  • Also, since R stores its datasets in a column-store fashion, might there be a way to map R datasets to external column-store databases, such as LucidDB, InfiniDB, Infobright, etc., without the ODBC or DBI overhead?
  • Thank you to those who pointed me to [SQLDF], and to the fact that [PL/R] is actively being developed by Joe Conway again.

Brian Aker guide to No-SQL

  • non-relational data stores that don't need schemas
  • group-by => map reduce
  • join => multiple map reduce
  • table scan => multi-machine map reduce
  • order by ... limit => natural data order => map reduce with lots of memory
  • feature?
    • forward rolling log (thus never need a vacuum)
    • no transactions (fantastic picture of elephant roasting a dolphin)
    • schema free (I know where everything is... don't touch)
    • SQL sucks
    • P.S. 6% of the world's power goes to data centers...
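
The "group-by => map reduce" line above can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce phases, not the API of any particular NoSQL store; the function names and the sample `orders` data are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records, key_field, value_field):
    """Map: emit one (key, value) pair per input record."""
    for record in records:
        yield record[key_field], record[value_field]


def shuffle(pairs):
    """Shuffle: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Reduce: collapse each key's values into a single result."""
    return {key: reducer(values) for key, values in groups.items()}


def group_by(records, key_field, value_field, reducer=sum):
    """Rough equivalent of SELECT key, SUM(value) ... GROUP BY key."""
    pairs = map_phase(records, key_field, value_field)
    return reduce_phase(shuffle(pairs), reducer)


orders = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 7},
    {"region": "east", "amount": 5},
]
totals = group_by(orders, "region", "amount")
```

Here `totals` maps each region to its summed amount. A join would need a further pass of the same shape over the combined outputs, which is the point of the "join => multiple map reduce" line.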

leto puts out the call for PL/Parrot
