NOSQL Table Schema

Question

I'm trying to plan a NOSQL table schema. There are relationships in my data, but they are mostly what would be N:N in a relational db; there are very few normal 1:N relationships.

So in this case, I'm trying to create implicit relationships that will allow me to browse from both ends of the relationship. I'm using Azure Table Storage, so I understand that full-text searching isn't available; I can only retrieve an "object" by its Partition Key + Row Key combination.

So imagine I have a table called "People" and a table called "Hamburgers" and each object in the tables can be related to multiple objects in the other table. Hamburgers are eaten by many people, people each eat many hamburgers.

Since the relationship is probably weighted to the people side - i.e. there are more people per hamburger than vice-versa, I would handle this in the tables like this:

Hamburger Table

Partition Key: Only 1 partition

Row Key: Unique ID

People Table

Partition Key: Only 1 partition

Row Key: Unique ID

"Columns": an extra value for every hamburger the person eats

Hamburger-People Table

Partition Key: Hamburger Row Key

Row Key: People Row Key

This way, if I'm looking at a hamburger and want to see all the people that eat it, I can go to the Hamburger-People table and use my Hamburger's Row Key to get the partition of all the people that eat the hamburger.

If I'm at a person and want to see all the hamburgers he/she eats, I have the extra values with the Row Keys of the hamburgers the person eats.
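To make the two lookup directions concrete, here is a minimal in-memory sketch of the layout described above. It is not the Azure SDK: the `tables` dictionary, the single partition keys `"h"` and `"p"`, the `burger_` property prefix, and all helper names are assumptions made only for illustration.

```python
from collections import defaultdict

# In-memory stand-in for a table store:
# table name -> partition key -> row key -> entity (a dict of properties).
tables = defaultdict(lambda: defaultdict(dict))

def insert(table, partition_key, row_key, **props):
    """Store one entity under its PartitionKey + RowKey."""
    tables[table][partition_key][row_key] = dict(props)

def get_partition(table, partition_key):
    """Return every entity in one partition, keyed by row key."""
    return tables[table][partition_key]

def link(person_id, burger_id):
    """Record the relationship on both sides, as described above."""
    # Hamburger-People: partition = hamburger row key, row = person row key.
    insert("HamburgerPeople", burger_id, person_id)
    # People: one extra property per hamburger the person eats.
    tables["People"]["p"][person_id]["burger_" + burger_id] = burger_id

def people_for(burger_id):
    """Hamburger -> people: read one partition of the relationship table."""
    return sorted(get_partition("HamburgerPeople", burger_id))

def burgers_for(person_id):
    """Person -> hamburgers: read the extra properties on the person."""
    person = tables["People"]["p"][person_id]
    return sorted(v for k, v in person.items() if k.startswith("burger_"))

insert("Hamburgers", "h", "H1", name="Whopper")
insert("People", "p", "P1", name="Ann")
insert("People", "p", "P2", name="Bob")
link("P1", "H1")
link("P2", "H1")
```

Here `people_for("H1")` reads a single partition and `burgers_for("P1")` reads a single entity, which are the two directions of browsing described above.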

When inserting data into the tables, if the data involves a hamburger/person relationship, I would insert both values in the proper tables, then create the Hamburger-People row. If I were trying to keep a duplicate-free list of hamburgers, I would need to search the Hamburger table first to make sure the hamburger wasn't already in there (like "Whopper": if it's in there, I wouldn't insert it again). Then I would need to insert a row in the hamburger's existing partition in the Hamburger-People table.

But for the most part, the no-duplicate requirement doesn't exist.

Is this a good best-practices approach to NOSQL schema, or am I going to run into problems later?

UPDATE Also, I would like to be able to partition the data tables later, but I'm not sure how to do so with this structure; adding a 2nd partition to the hamburger table would require me to store an extra value in the hamburger-People table, and I'm not sure if that would start to be too complex.
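One way to picture the "extra value" is a composite reference that carries the hamburger's partition key along with its row key. This is only a sketch under assumptions: the `|` separator, the partition name `burgers-2`, and the helper names are hypothetical, and the separator must not occur in either key.

```python
SEP = "|"  # hypothetical separator; must not occur in either key

def make_ref(partition_key, row_key):
    """Pack a hamburger's partition key and row key into one string."""
    return f"{partition_key}{SEP}{row_key}"

def parse_ref(ref):
    """Unpack a reference back into (partition key, row key)."""
    partition_key, row_key = ref.split(SEP, 1)
    return partition_key, row_key

# A Hamburger-People row that can still find its hamburger
# after the Hamburger table gains a second partition.
relationship = {"PartitionKey": make_ref("burgers-2", "H1"), "RowKey": "P1"}
```

Calling `parse_ref(relationship["PartitionKey"])` recovers both keys needed to fetch the hamburger, at the cost of one more convention to maintain.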

**Cross** · Accepted Answer · 2012-07-09 02:05:24Z

OK, nice questions. I think most of them are the ones every RDBMS developer faces on first entering the NoSQL world:

1. How to group the partitions? To get the most out of partitions, the load on your database should be distributed across your servers. Let's see what happens with your approach.

A person with key "A" enters the restaurant, and you save the person and his burger, a Classic Tasty (key "T"). The person record goes to server X and the burger goes to server Y. Now a new customer with key "B" comes in and wants something different, a burger "W". Again the person goes to server X, and this time the burger also goes to server X. Server X is getting most of the load: if you repeat this, you'll see that server X becomes a bottleneck, because 75% of the records go there (all the people and 50% of the burgers). And the problem gets bigger when you query, because all the queries will hit server X.

To solve this, you could use the person's key as part of the partition key for the relationship, so the person is stored on the same server as his burger relationships. This way your workload is balanced, and if one of the servers goes down, the person and his hamburgers are "lost" together: a consistent "inconsistency".
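The imbalance can be illustrated with a toy placement rule. The sketch below assumes two servers and a made-up hash of the partition key; how a real store (Azure included) assigns partitions to servers is not something this example claims to know.

```python
from collections import Counter

SERVERS = 2  # hypothetical number of storage nodes

def server_for(partition_key):
    """Toy placement rule: a deterministic hash of the partition key."""
    return sum(partition_key.encode("utf-8")) % SERVERS

def load(partition_keys):
    """Count how many records land on each server."""
    return Counter(server_for(pk) for pk in partition_keys)

people = ["A", "B", "C", "D"]

# Scheme 1: every person record shares one partition key,
# so every record lands on the same server.
single = load("people" for _ in people)

# Scheme 2: each person's key is the partition key, so a person and
# that person's burger relationships are placed together, and
# different people can land on different servers.
per_person = load(pid for pid in people)
```

With these four keys, `single` has one entry holding all four records, while `per_person` splits them two and two across the servers.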

2. Should I use a "relationship" in a NoSQL database? Remember that with NoSQL you are allowed to duplicate information whenever that avoids extra queries. If you store together the information that is commonly queried together, you avoid a round trip to the database. So if you store a "transaction" instead of "person and burgers", you get better performance and fewer hits to the database. Let's work through an example with real data using your approach and compare it with "my" approach:

Joe Black comes to the restaurant and asks for a Tasty. Here you will perform the following transactions: create a Joe Black record, and create a burger transaction record.

If you want to list your daily transactions, you will need to:

Get all the records for the day from the person-burger "table", then go to the person "table" to retrieve the customers' names, then go to the hamburger records to retrieve their names. (You won't be able to do cross-table queries, because some records could be on one server and others on a second server.)

OK, what if you create a "transactions" table and store the following JSON in it:

{ custid: "AAABCCC", name: "Joe", lastName: "Black", date: "2012/07/07", order: { code: "Burger0001", name: "Tasty", price: 3.5 } }

I know you will have several records with the same "Tasty" description. That's denormalization, which is very useful when you apply NoSQL solutions to this type of problem. Now, how many transactions did it take to store the information in the database? Just one! And how many queries will you need to retrieve the information at the end of the day? Again, just one. It will create some problems, but it will save you a lot of work too. For example: could you reprint the order easily? (Yes, you can!) What if the name of the customer changes? Is that even possible?

I hope this helps in some way.

I'm the creator of http://djondb.com, so I think having inside knowledge gives me a different approach to these problems, based on what the database is able to do. I'm not aware of how Azure handles queries if you can only query by row keys and not by document values, but I hope this gives you some insight anyway.
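A minimal sketch of the "one write, one query" idea, using a plain Python list as a stand-in for the "transactions" table. The function names and the list itself are hypothetical; the document shape follows the JSON above.

```python
transactions = []  # hypothetical stand-in for a "transactions" table

def record_order(custid, name, last_name, date, code, burger_name, price):
    """One write: the customer and burger details are stored together."""
    doc = {
        "custid": custid,
        "name": name,
        "lastName": last_name,
        "date": date,
        "order": {"code": code, "name": burger_name, "price": price},
    }
    transactions.append(doc)
    return doc

def daily_report(date):
    """One query: everything needed for the listing is in each document."""
    return [
        f"{t['name']} {t['lastName']}: {t['order']['name']} ({t['order']['price']})"
        for t in transactions
        if t["date"] == date
    ]

record_order("AAABCCC", "Joe", "Black", "2012/07/07", "Burger0001", "Tasty", 3.5)
```

No lookup in a separate person or hamburger table is needed to build the report, which is the trade made in exchange for repeating the burger name in every record.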

Thanks Cross, good information. I have 2 questions: (1)This application is a social app, so it's less about what burgers did this person order than keeping a list of their favorite burgers. Since the main purpose is to keep track of the relationships, I'm not sure the transaction record would help - would it? Maybe I'm missing some of the benefits. — Andrew B Schultz, Jul 10 '12 at 15:22
(2) The issue of servers - with Azure, I don't have control or knowledge of the server where a table is stored (I don't think). That's abstracted, so I'm not sure thinking about that matters. Unless you're saying that by making sure the person key is part of the Hamburger-Person entity's partition key I would somehow ensure that the tables stay on the same server. But I don't see how that could be possible, because the table won't know that a string I'm inserting in a partition key is part of a row key elsewhere ... — Andrew B Schultz, Jul 10 '12 at 15:26
@AndrewBSchultz if you want to keep track of their favorite burgers, the solution will be the same: store the "name" of the burger instead of its code. That way you won't need an extra query to get the names of their favorite burgers. Something like { "mb": "Tasty" } instead of { "mb": "0001001" }; with the code you would need an extra query to get the description of the product. — Cross, Jul 11 '12 at 15:57
@AndrewBSchultz I'll need to understand more about how Microsoft Azure handles partitions, but what you need to bear in mind is that your data model has to address the problem of having part of the data on one server and part on another. That means hitting two servers, and one of them could become a bottleneck at some point. If you create a data model that ensures you get all the information you need in one step, then you have solved any issues with the partition keys or anything else, meaning you need an approach like the one I described before, where you save everything in one document. — Cross, Jul 11 '12 at 16:01

B4GEN

Thursday, 17 August 2017

BIG DATA: NOSQL TABLE SCHEMA + ORACLE NOSQL DATABASE


ORACLE-NOSQL-DATABASE http://www.oracle.com/technetwork/products/nosqldb/overview/nosqlandsqltoo-2041272.pdf
