Source:https://stackoverflow.com/questions/11378457/nosql-table-schema
NOSQL Table Schema
I'm trying to plan a NOSQL table schema. There are relationships in my data, but they are mostly what would be N:N in a relational db; there are very few normal 1:N relationships.
So in this case, I'm trying to create implicit relationships that will allow me to browse from both ends of the relationship. I'm using Azure Table Storage, so I understand that full-text searching isn't available; I can only retrieve an "object" by its Partition Key + Row Key combination.
So imagine I have a table called "People" and a table called "Hamburgers" and each object in the tables can be related to multiple objects in the other table. Hamburgers are eaten by many people, people each eat many hamburgers.
Since the relationship is probably weighted to the people side - i.e. there are more people per hamburger than vice-versa, I would handle this in the tables like this:
Hamburger Table
Partition Key: Only 1 partition
Row Key: Unique ID
People Table
Partition Key: Only 1 partition
Row Key: Unique ID
"Columns": an extra value for every hamburger the person eats
Hamburger-People Table
Partition Key: Hamburger Row Key
Row Key: People Row Key
This way, if I'm looking at a hamburger and want to see all the people that eat it, I can go to the Hamburger-People table and use my Hamburger's Row Key to get the partition of all the people that eat the hamburger.
If I'm at a person and want to see all the hamburgers he/she eats, I have the extra values with the Row Keys of the hamburgers the person eats.
When inserting data into the tables, if the data involves a hamburger/person relationship, I would insert both values in the proper tables, then create the Hamburger-People table. If I was trying to keep a duplicate-free list of hamburgers, I would need to search the Hamburger table first to make sure the hamburger wasn't already in there (like "Whopper" - if it's in there, I wouldn't insert it again). Then, I would need to go insert a row in the hamburger's existing partition in Hamburger-People table.
But for the most part, the no-duplicate requirement doesn't exist.
Is this a good best-practices approach to NOSQL schema, or am I going to run into problems later?
UPDATE Also, I would like to be able to partition the data tables later, but I'm not sure how to do so with this structure; adding a 2nd partition to the hamburger table would require me to store an extra value in the hamburger-People table, and I'm not sure if that would start to be too complex.
| ||||
add a comment
|
1
|
Ok, nice questions and I think most of them are the ones each RDMBS developer face as soon as hits NoSQL world:
1. How to group the partitions? To get the best of the partitions you need to think that the load of your database should be distributed across your servers, lets see what will happend with your approach
A person with Key "A" enters to the restaurant you will save it and his burger, which is a Classic Tasty (Key "T") the person record goes to the server X and the Burger goes to server Y, now a new customer goes enters with the Key "B", and wants something different, a burger "W", again the person goes to server X and the burguer to server X, this time the server X is getting all the load, if you repeat this you'll see that the server X becomes a bottle neck, because 75% of the records are going there (all the people and 50% of the burgers), that will create some problems with your load. But... the problem will be better when you try to query because all the queries will hit the server X. To solve this you could use the key of the person as part of the partition for the relationship, so the person will be partitioned in the same server of the burguers relationship, this way your workload will be balanced and you wont have any problems if one of the servers goes down (the person and hamburguers will be "lost" together), this will be a consistence "inconsistency"
2. Should I use a "relationship" in a NoSQL database? Remember that NoSQL means that you are granted to duplicate information anytime your problem requires a solution to avoid "overqueries", so, if you can store the information that will be commonly queried together you will avoid a roundtrip to the database. So, if you store a "transaction" instead of "person and burguers" you will get a better performance and avoid some hits to the database, lets do an example of real data with your approach and compare it with "my" approach:
if you want to list your daily transactions you will need to:
Get all the records from the day in the "table" person-burguer, then go to the person "table" and retrieve the name of the customers and now, go to the hamburguer records and retrieve their names. (you wont be able to do cross-table queries because some records could be in one server and others in the second server)
Ok, what if you create a table "transactions" and store in there the following json:
{ custid: "AAABCCC", name: "Joe", lastName: "Black", date: "2012/07/07", order: { code: "Burger0001", name: "Tasty", price: 3.5 } }
I know you will have several records with the same "tasty" description, that's desnormalization which is very useful when you approach NoSQL solutions to these type of problems, now, how many transactions did you create to store the information to the database? just one! wow... and how many queries will you need to retrieve the information at the end of the day? again... just one, it will create some problems, but will save you a lot of work too, like... could you reprint the order easily? (yes you can!) what if the name of the customer changes? is that even possible?
I hope this help you some way,
I'm the creator of http://djondb.com so I think that having inside knowledge gives me a different approach to the problems according to what the database will be able to do, but I'm not aware of how azure will handle the queries if you are not able to query the document values and just the row keys, but anyway I hope this gives you an insight.
| ||||||||||||||||
|
***