KAFKA
What is a high-throughput system?
=> A system that handles a very large number of input/output (I/O) operations per second.
Stock market data is data-intensive, and data arriving out of order can cause a serious blunder.
Ex: if a stock price is 12 and after some time it becomes 45, you cannot emit 45 first and then 12.
Delivering the data is one thing; delivering it in the right sequence is another.
Assume all the data does reach the user: the sequence is still just as important.
If 45 is delivered first and then 12, that is a serious mistake.
2nd ex: Zomato (rider updates). Riders share real-time location updates; many riders are on the road, and every one of them sends data to the backend.
Can I say this data is being delivered to each rider's own customer? Yes: 1. It is delivered to the customer. 2. It is delivered in real time. 3. We cannot afford to have the data out of order.
We also have to track the rider's whole route history, to check whether he halted somewhere when a customer complains about a late delivery.
So we must store the rider's history for future reference.
Hence we can say this is high-throughput data.
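What if you have to build an application like Zomato?
Step 1
socket.on('rider:location', async (loc) => { // the rider sends his location to the server
  customer_emit(loc); // the server forwards the location to the customer
  // and stores it in the DB with the rider id and location (db.query and the column names are placeholders)
  await db.query('INSERT INTO loc_log (rider_id, lat, lng) VALUES (?, ?, ?)', [loc.riderId, loc.lat, loc.lng]);
});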
So the record gets stored, BUT here is the problem.
To see it, you need to understand the DB's internal architecture:
every transaction ends up as a write on the filesystem.
A DB works on a 3-layer architecture [views]:
Application layer (view)
Database engine
Schema (view)
Finally the data reaches the filesystem layer, which stores [writes] it on the OS's secondary storage (disk).
Why such a complex architecture? If I just create file.txt and write to it,
that is damn easy, so why build a complex architecture for a DB?
Because a DB does not just write a file to disk.
It is ACID compliant: ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY.
Before serving data, the database makes sure the copy was written in the right place, so that the next time the data is needed it is the same, consistent data. Ex: 1. the entry is read back from the log 2. two entries arrive at the same time.
That means many things happen behind the scenes on every write. Ex: write operation =>
acquire the lock
check the existing data
insert or update the data
update the changed index
get the acknowledgement
Why Kafka?
Do you think we can update the rider data directly in the DB every second?
Suppose 7 lakh (700,000) riders are active and each sends an update every second: that is 7 lakh writes per second, which will bring the database down.
The DB server will almost certainly crash; it cannot handle that many writes at a time,
because the DB has to run all of those operations on every transaction.
NOW HOW CAN I WRITE TO THE DB
Suppose the riders send data to the server at 1 update per second per rider. The server side can be scaled horizontally, and it runs on Socket.IO.
But writing straight to the DB at that rate is not possible, because the DB needs some time to perform each operation.
So I have to create a bottleneck to cool the traffic down.
What is a bottleneck? The narrow neck at the top of a bottle, which lets water pass slowly and in a controlled way.
Riders send data in a massive way, many requests at a time; I have to control that flow,
read it at a calm pace and insert it into the DB,
and share the location with the customer.
So my bottleneck itself must be high throughput: it has to take in I/O very fast and hand the data out in a controlled manner.
This is basically what Kafka does.
Kafka is a high-throughput system: you can throw huge amounts of data into it, then subscribe and keep reading that data in a controlled fashion.
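A minimal sketch of the bottleneck idea, not of Kafka itself: producers push into an in-memory queue as fast as they like, and the DB writer drains it in small batches at its own pace. The class name LocationBuffer, the batch size of 4 and the insertedBatches array (a stand-in for bulk INSERTs) are all assumptions made for illustration.

```javascript
// Toy buffer between fast producers and a slow database (not Kafka itself).
class LocationBuffer {
  constructor() {
    this.queue = []; // events wait here in arrival order
  }

  // Called for every rider update: just an in-memory push, so it is cheap.
  publish(event) {
    this.queue.push(event);
  }

  // Called by the DB writer at its own pace: take at most `max` events,
  // oldest first, so the ordering is preserved.
  drain(max) {
    return this.queue.splice(0, max);
  }
}

const buffer = new LocationBuffer();
const insertedBatches = []; // stand-in for bulk INSERTs into the DB

// 10 updates arrive in one burst.
for (let seq = 1; seq <= 10; seq++) {
  buffer.publish({ riderId: 'r1', seq });
}

// The DB writer reads them in controlled batches of 4.
let batch;
while ((batch = buffer.drain(4)).length > 0) {
  insertedBatches.push(batch.map((e) => e.seq));
}

console.log(insertedBatches); // [ [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ], [ 9, 10 ] ]
```

The burst of 10 writes becomes 3 DB operations, and the sequence 1..10 is never reordered.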
How does Kafka work?
In Kafka's case there is a producer.
1. The producer produces an event.
2. The event is stored in an application buffer (in memory) [just like Redis]. A database never does this; a database has to persist the data first.
3. The data is copied into the OS buffer (RAM).
4. The data is written to disk asynchronously and periodically, in a non-blocking fashion.
(means => the write happens in the background, and Kafka does not wait for the disk to acknowledge it) [a DB usually writes to disk, gets the acknowledgement and only then serves the next request]
5. At the same time the data is copied into the network interface card buffer (NIC buffer) and sent to the consumer. The consumer can be the DB writer or the socket server, whoever needs that event.
If the Kafka process somehow crashes, the data is still in the OS buffer and can be recovered from there.
One trade-off to consider: if data arrives in Kafka and the crash happens before it has been stored in the buffer (or the whole machine goes down before the buffer is flushed to disk), there is a chance of losing that data, though the chance is small.
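The trade-off in the write path above (acknowledge from memory, flush to disk later) can be sketched with a toy model. This is only an illustration: ToyBroker, memory, disk and crash are hypothetical names, and crash() here models the whole machine going down, which wipes RAM.

```javascript
// Toy model of "acknowledge first, flush to disk later". All names are hypothetical.
class ToyBroker {
  constructor() {
    this.memory = []; // stands in for the application/OS buffers (RAM)
    this.disk = [];   // stands in for the log file on disk
  }

  // Accept an event and acknowledge immediately; nothing touches the disk here.
  append(event) {
    this.memory.push(event);
    return 'ack';
  }

  // Runs periodically in the background: move the buffered events to disk.
  flush() {
    this.disk.push(...this.memory);
    this.memory = [];
  }

  // A machine crash wipes RAM; only what was already flushed survives.
  crash() {
    const lost = this.memory.length;
    this.memory = [];
    return lost;
  }
}

const broker = new ToyBroker();
broker.append({ seq: 1 });
broker.append({ seq: 2 });
broker.flush();            // seq 1 and 2 are now on disk
broker.append({ seq: 3 }); // acknowledged, but not yet flushed
const lost = broker.crash();

console.log(broker.disk.map((e) => e.seq)); // [ 1, 2 ]
console.log(lost);                          // 1
```

The window between append and flush is where data can be lost; making it smaller costs throughput, which is the trade-off the notes describe.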
There are two types of memory: 1. Stack 2. Heap
The stack is very fast. On the heap, the allocator has to search for free space for each allocation and hand a reference back to the stack, which is why heap allocation takes longer.
See the problem: every time I store data I have to find space on the heap, allocate it and pass the reference back to the stack, which is very time-consuming.
The Zero Copy Principle With Apache Kafka
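To solve this problem Kafka uses an APPEND-ONLY LOG: each new event is written right after the previous one, so for the next write I just pick up my last position, with no searching for space, and the events stay in sequence. Zero copy itself means the data goes from the OS buffer to the NIC buffer without being copied through the application's memory.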
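A minimal sketch of an append-only log, assuming a plain in-memory array in place of Kafka's on-disk log files. AppendOnlyLog, append and readFrom are illustrative names, not Kafka APIs. The point is that a write never searches for space, and a reader only has to remember its last offset.

```javascript
// Toy append-only log: an in-memory array stands in for the log file.
class AppendOnlyLog {
  constructor() {
    this.entries = [];
  }

  // New events always go at the end; the returned offset is their position.
  append(event) {
    this.entries.push(event);
    return this.entries.length - 1;
  }

  // Read everything from `offset` onwards without removing anything.
  readFrom(offset) {
    return this.entries.slice(offset);
  }
}

const log = new AppendOnlyLog();
const first = log.append({ price: 12 });
const second = log.append({ price: 45 });

let nextOffset = 0; // the reader remembers where it stopped
const seen = log.readFrom(nextOffset).map((e) => e.price);
nextOffset += seen.length;

log.append({ price: 50 });
const later = log.readFrom(nextOffset).map((e) => e.price);

console.log(first, second); // 0 1
console.log(seen);          // [ 12, 45 ]
console.log(later);         // [ 50 ]
```

Because entries are only ever added at the end, 12 is always read before 45, which is the ordering guarantee the stock price example needs.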
What is Kafka
Kafka is a message broker server. There are 2 sides: 1. Producer - produces the messages 2. Consumer - consumes the messages
Corner cases while building a high-throughput application
Imagine I created an event and by mistake it is delivered to 2 places. Is that a good thing?
PRODUCER SIDE
Kafka says: when you produce a message, give me
Topic : e.g. "rider-updated" (a namespace) - the topic is like a key
Partition : 2 - messages are grouped within this partition
Message : "{}" - your actual message
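A hedged sketch of what such a record could look like. It assumes the partition is derived from the rider id, so that all updates of one rider land in the same partition and stay in order; the notes do not say how the partition is chosen. pickPartition and buildRecord are hypothetical helpers, and the hash is a simple stand-in, not the hash function Kafka's own partitioner uses.

```javascript
// Stand-in hash: maps a key to a partition number in [0, numPartitions).
function pickPartition(key, numPartitions) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return hash % numPartitions;
}

// Build the three things the producer hands over: topic, partition, message.
function buildRecord(topic, riderId, location, numPartitions) {
  return {
    topic,
    partition: pickPartition(riderId, numPartitions),
    message: JSON.stringify({ riderId, ...location }),
  };
}

const a = buildRecord('rider-updated', 'rider-42', { lat: 18.52, lng: 73.85 }, 2);
const b = buildRecord('rider-updated', 'rider-42', { lat: 18.53, lng: 73.86 }, 2);

// Same rider id, so both updates go to the same partition.
console.log(a.partition === b.partition); // true
```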
CONSUMER SIDE
Say 1 consumer comes and asks Kafka: I want the messages of topic [rider-updated].
Kafka agrees to provide all the rider update events.
The consumer (a server) takes these events and inserts them into the DB.
Now, as load grows, the consumer side needs to scale, so a second DB server is added and Kafka sends messages to the newly added server as well.
I have 4 messages for rider-updated and Kafka splits them between the two DB servers: 2 messages to the 1st server and 2 to the 2nd. They process them and insert them into the DB.
Now I add 1 socket server, whose job is to share the data with the customer (who wants the rider updates) instead of inserting it into the DB.
It asks Kafka for the same topic, rider-updated. Kafka is not aware that the socket server runs different logic and the DB servers serve a different purpose. To Kafka they are all just consumers.
So it distributes the messages among all three, and the 3rd message goes to the socket server.
Now there is a problem: the socket server does not insert data into the DB; it only serves rider updates to the customer.
The work is incomplete: the 3rd message never reaches the DB.
What was I expecting? All 4 messages distributed among the DB servers, and all 4 also delivered to the socket server.
Kafka says: when you come with the topic, give the group as well; messages are distributed within each group.
Topic : rider-updated   Group : group - DB server   group - Socket server
So that every message is routed properly.
Now Kafka knows the 2 DB servers are members of one group and the socket server is a member of a different group,
and it delivers the data accordingly, as shown in the diagram below.
Now the DB servers insert the data into the DB and the socket server sends the data to the user in sequence.
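The group behaviour can be sketched with a toy in-memory topic. This is a simplification under stated assumptions: ToyTopic, subscribe and publish are made-up names, and messages are shared round-robin inside a group, whereas real Kafka assigns whole partitions to the members of a group.

```javascript
// Toy topic: every group receives every message; inside a group the
// messages are shared out among the members (round-robin, a simplification).
class ToyTopic {
  constructor() {
    this.groups = new Map(); // groupId -> { members: [], next: 0 }
  }

  subscribe(groupId, handler) {
    if (!this.groups.has(groupId)) {
      this.groups.set(groupId, { members: [], next: 0 });
    }
    this.groups.get(groupId).members.push(handler);
  }

  publish(message) {
    for (const group of this.groups.values()) {
      // Exactly one member of each group handles this message.
      const member = group.members[group.next % group.members.length];
      group.next += 1;
      member(message);
    }
  }
}

const topic = new ToyTopic();
const db1 = [], db2 = [], socket = [];
topic.subscribe('db-server', (m) => db1.push(m));
topic.subscribe('db-server', (m) => db2.push(m));
topic.subscribe('socket-server', (m) => socket.push(m));

for (let i = 1; i <= 4; i++) topic.publish(i);

console.log(db1, db2); // [ 1, 3 ] [ 2, 4 ]
console.log(socket);   // [ 1, 2, 3, 4 ]
```

All 4 messages reach the DB group (2 per server) and all 4 also reach the socket server, which is the outcome expected in the notes.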
