Lightning Talks
From OpenSQLCamp
graph engine
- a MySQL storage engine that allows for calculated values, i.e. you can select data that was never inserted
- uses tricks with MySQL internals to accomplish this
cluster/J
- MySQL Cluster spans many nodes
- using Java as an ORM
- Jtie (a way to map any C++ API to Java)
- cluster/J: why not use the existing bindings? Well, it's a lot of code, and Jtie is more automatic.
sphinx 09
- full text search engine
- three 'branches': 0.9.9, 0.9.10, 1.x
- speaks the MySQL protocol
- 0.9.10 adds multi-process modes like Apache's, and string attributes
- 1.x gains online full-text indexes
iibench
- iibench zipf (Zipfian distribution: the n'th most frequent word occurs with frequency proportional to 1/n)
- uses python
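The Zipfian distribution mentioned above can be sketched in a few lines of Python. This is only an illustration of the 1/n weighting, not iibench's actual code; the function names `zipf_weights` and `sample_zipf_words`, the vocabulary size and the seed are all assumptions made for the example.

```python
import random
from collections import Counter


def zipf_weights(vocab_size):
    """Unnormalised Zipf weights: the word of rank n gets weight 1/n."""
    return [1.0 / rank for rank in range(1, vocab_size + 1)]


def sample_zipf_words(vocab_size, count, seed=0):
    """Draw `count` word ranks (1 = most frequent) using the weights above."""
    rng = random.Random(seed)
    ranks = list(range(1, vocab_size + 1))
    return rng.choices(ranks, weights=zipf_weights(vocab_size), k=count)


if __name__ == "__main__":
    words = sample_zipf_words(vocab_size=1000, count=100_000)
    freq = Counter(words)
    # Each doubling of the rank should roughly halve the observed count.
    for rank in (1, 2, 4, 8):
        print(rank, freq[rank])
```

Data generated this way gives a benchmark a few very hot keys and a long tail of rare ones, which is closer to real text than a uniform distribution.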
jjtree in Coco
- each node knows its parent
- using class structure to build a syntax tree
- lets you use query syntax to derive the correct values based on input
- http://code.google.com/p/mod-ndb
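As a rough illustration of the idea in the bullets above (a class per node type, with each node knowing its parent), here is a minimal Python sketch. It is not code from mod-ndb or from any parser generator; the class names `Node`, `Literal`, `Param` and `Equals` and the tiny `id = 42` query are hypothetical.

```python
class Node:
    """Base syntax-tree node; every node keeps a link to its parent."""

    def __init__(self, *children):
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def root(self):
        """Follow parent links up to the top of the tree."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def evaluate(self, inputs):
        raise NotImplementedError


class Literal(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def evaluate(self, inputs):
        return self.value


class Param(Node):
    """A named value that is looked up in the request input."""

    def __init__(self, name):
        super().__init__()
        self.name = name

    def evaluate(self, inputs):
        return inputs[self.name]


class Equals(Node):
    def evaluate(self, inputs):
        left, right = self.children
        return left.evaluate(inputs) == right.evaluate(inputs)


# Tree for the hypothetical query fragment: id = 42
param = Param("id")
tree = Equals(param, Literal(42))
```

Calling `tree.evaluate({"id": 42})` walks the tree and returns `True`; `param.root()` returns `tree` by following the parent links.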
integrate OSS into windows
- supporting the Windows stack
- looking to integrate existing technology, basically a complete OSS stack on Windows
- can donate virtual boxes for testing
trainwreck
- it's an agent for replication
- replication by pull
- allows for multiple trainwrecks (so you can offload less-used content onto one box and have the rest of the content spooled to 20 other boxes)
- allows for parallel replication from the same server
- because it's an agent, you can configure it to push things to just about anywhere
- it is technically a slave, so it manages the binlog layer using existing technology
- written in C++
column stores
- column stores are used to reduce I/O operations for columnar queries (aggregates or slices of a column)
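A toy Python sketch of why a column layout reduces reads for an aggregate over one column. The table, the helper names (`to_columns`, `sum_row_store`, `sum_column_store`) and the "values read" counter are assumptions made for illustration; a real engine counts disk pages, not Python values.

```python
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 25.5},
    {"id": 3, "region": "east", "amount": 4.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, column):
    """Row layout: each whole row is fetched to reach a single field."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # every field of the row comes along
        total += row[column]
    return total, values_read


def sum_column_store(columns, column):
    """Column layout: only the requested column is fetched."""
    data = columns[column]
    return sum(data), len(data)


columns = to_columns(rows)
```

Here `sum_row_store(rows, "amount")` reads nine values to produce the total, while `sum_column_store(columns, "amount")` reads three. The gap grows with the number of columns in the table.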
I play with data
- started pair programming in Fortran and then other things, now I'm here playing with SQL.
- looking to do real statistical calculations on the data, but SQL is getting in the way.
- Currently looking at R as a replacement; it has a built-in column store db.
- R has a SQL layer via DBI or ODBC.
- question... how do I cleanly map a column store in R to an RDBMS so I don't run out of memory?
- SQLDF
- PL/R
- LucidDB (Java), but it lacks some stats.
- Thanks for the notes. Just a couple of clarifications. I see three ways of doing real statistics with data stored in an RDBMS:
- SQL itself, but statistical algorithms implemented in SQL are horrendous, when possible at all
- [Rstats], which has literally thousands of packages written in R, C and FORTRAN, and recognizes its limitations in managing data, so passes that task off to an RDBMS using either ODBC or DBI
- Taking advantage of the plug-in architecture of [LucidDB] to add maths/stats libraries and functions from [math.commons.apache.org]
- Also, since R stores its datasets in a column-store fashion, might there be a way to map R datasets to external column-store databases, such as LucidDB, InfiniDB, Infobright, etc., without the ODBC or DBI overhead?
- Thank you to those who pointed me to [SQLDF], and to the fact that [PL/R] is actively being developed by Joe Conway again.
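One way to approach the "don't run out of memory" question above is to leave the data in the RDBMS and stream it through the statistics code in chunks. The sketch below uses Python's `sqlite3` module as a stand-in for R with DBI/ODBC, which is a substitution made for this example. The table name `measurements`, the function `streaming_mean_std` and the chunk size are hypothetical. The running mean and variance use Welford's online algorithm.

```python
import math
import sqlite3


def streaming_mean_std(conn, query, chunk_size=1000):
    """Mean and sample standard deviation of the single column returned by
    `query`, holding at most `chunk_size` rows in memory at a time."""
    cursor = conn.execute(query)
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        for (x,) in chunk:
            # Welford update: numerically stable running mean and variance.
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    std = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return n, mean, std


# Small in-memory table standing in for a large RDBMS table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?)",
    [(float(v),) for v in range(1, 11)],
)
n, mean, std = streaming_mean_std(
    conn, "SELECT value FROM measurements", chunk_size=3
)
```

This only covers statistics that can be computed in a single pass. Algorithms that need the whole dataset at once would still require something like the column-store mapping asked about above.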
Brian Aker's guide to NoSQL
- non-relational data stores that don't need schemas
- group-by => map reduce
- join => multiple map reduce
- table scan => multi-machine map reduce
- order by ... limit => natural data order => map reduce with lots of memory
- feature?
- forward rolling log (thus never need a vacuum)
- no transactions (fantastic picture of elephant roasting a dolphin)
- schema free (I know where everything is... don't touch)
- SQL sucks
- PS: 6% of the world's power goes to data centers...
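The "group-by => map reduce" correspondence from the list above can be sketched on a single machine in plain Python. This is only an illustration of the pattern, not any particular NoSQL system's API; the function names and the `orders` data are assumptions.

```python
from collections import defaultdict


def map_phase(records, key_field, value_field):
    """Map: emit one (key, value) pair per input record."""
    for record in records:
        yield record[key_field], record[value_field]


def shuffle(pairs):
    """Shuffle: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Reduce: collapse each key's list of values to a single result."""
    return {key: reducer(values) for key, values in groups.items()}


def group_by_sum(records, key_field, value_field):
    """Rough equivalent of SELECT key, SUM(value) ... GROUP BY key."""
    pairs = map_phase(records, key_field, value_field)
    return reduce_phase(shuffle(pairs), sum)


orders = [
    {"customer": "alice", "total": 30},
    {"customer": "bob", "total": 12},
    {"customer": "alice", "total": 8},
]
```

`group_by_sum(orders, "customer", "total")` returns `{"alice": 38, "bob": 12}`. A join, as the list notes, would take more than one such pass.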
