KAFKA


What is a high-throughput system?

=> A system where input and output (I/O) operations run very fast.

Stock market data is intensive data: if updates arrive out of order, it can cause a huge blunder.

For example, if a stock price is 12 and after some time it becomes 45, you cannot emit 45 first and then 12.

Delivering the data is one thing; delivering it in the right sequence is another.

Let's assume all the data does get transferred to the user. The sequence is still very important.

If 45 is transferred first and then 12, it is a serious mistake.
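One way to picture the ordering requirement, as a minimal sketch: assume each price update carries a sequence number, and the receiver ignores anything older than what it has already shown. The `seq` field and the `makePriceReceiver` helper are assumptions made for illustration; they are not part of any real market feed.

```javascript
// Hypothetical receiver that refuses to show price updates out of order.
function makePriceReceiver() {
  let lastSeq = -1;   // highest sequence number seen so far
  const shown = [];   // prices actually shown to the user, in order

  return {
    receive(update) {
      if (update.seq <= lastSeq) return false; // stale or duplicate: drop it
      lastSeq = update.seq;
      shown.push(update.price);
      return true;
    },
    shown,
  };
}

const receiver = makePriceReceiver();
receiver.receive({ seq: 1, price: 12 });
receiver.receive({ seq: 2, price: 45 });
receiver.receive({ seq: 1, price: 12 }); // late copy of the old price: ignored
console.log(receiver.shown);             // 12 first, then 45
```

The user sees 12 and then 45, never 45 followed by 12.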

Second example: Zomato (rider updates). Riders share real-time updates. There are many riders on the road, and every one of them sends data to the backend.

Can we say this about the data being transferred to each rider's customer? 1. It is getting transferred to the customer. 2. It arrives in real time. 3. We cannot afford for the data to arrive out of order.

We also have to track the rider's whole route history, to check whether they halted somewhere when a customer complains about a late delivery.

So we must store the rider's history for future reference.

Hence we can say this is high-throughput data.

What if you have to create an application like Zomato?

Step 1

socket.on('rider:location', (loc) => {
  // The rider sends their location to the server.
  customer_emit(loc); // The server forwards the location to the customer...
  sql('INSERT INTO loc_log (rider_id, location) VALUES (?, ?)', [loc.riderId, loc.location]); // ...and stores it in the DB with the rider id.
});

So the record gets stored. BUT here is the problem.

It helps to understand the internal architecture of a database.

A transaction is ultimately written to the filesystem.

A DB works on a 3-layer architecture [views]:

  1. Application layer view

  2. Database engine

  3. Schema view

Finally the data reaches the filesystem view, where it is stored [written] on the OS's secondary disk.

Why is this architecture so complex? If I just create a file.txt and write to it,

that is damn easy. So why build such a complex architecture for a DB?

Because a DB does not just write a file to disk.

It is ACID compliant: ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY.

Before serving any data, the database makes sure it has written the copy in the right place, so that the next time the data is needed it is the same, consistent data. For example: 1. when data is received from the log, 2. when two entries arrive at the same time.

This means many things happen behind the scenes for a single write operation. For example, write operation =>

  1. Acquire the lock

  2. Check the existing data

  3. Index the data (update or insert)

  4. Update the changed index

  5. Get the acknowledgement

Why Kafka?

Do you think we can update the rider data in the DB directly every second?

Suppose 7 lakh (700,000) riders are active and each sends an update every second. That is 7 lakh writes per second, and it will bring the database down.

The server will definitely crash, because the DB cannot handle that many writes at a time.

The DB has to go through a set of operations on every transaction.
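To put rough numbers on this, here is a small back-of-the-envelope sketch. The 7 lakh updates per second comes from the example above; the database capacity of 50,000 writes per second is purely an assumed figure for illustration, not a measurement of any real database.

```javascript
const incomingPerSec = 700_000; // 7 lakh riders, one update each per second
const dbWritesPerSec = 50_000;  // ASSUMED capacity, illustrative only

// Number of writes still waiting after the given number of seconds.
function backlogAfter(seconds) {
  return Math.max(0, (incomingPerSec - dbWritesPerSec) * seconds);
}

console.log(backlogAfter(1));  // 650000 writes already waiting after one second
console.log(backlogAfter(60)); // 39000000 writes waiting after one minute
```

Under these assumptions the backlog grows by 650,000 writes every second, so the database never catches up on its own.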

NOW HOW CAN I WRITE TO THE DB

Suppose each rider sends data to the server once per second. You can scale the servers horizontally, and the connection is over Socket.IO.

But writing to the DB at that rate is not possible, because the DB needs some time to perform each operation.

So I have to create a bottleneck to cool down the flow of operations.

What is a bottleneck? The narrow neck at the top of a bottle, which lets the water pass slowly and in a controlled way.

Riders will send a massive number of requests at the same time. I have to control that flow and

  1. read it at a calm pace and insert it into the DB

  2. share the location with the customer

So my bottleneck must itself be high throughput: it has to handle I/O very fast and hand the data over in a controlled manner.

This is basically what Kafka does.

Kafka is a high-throughput system into which you can throw huge amounts of data; you can then subscribe to that data and keep reading it in a controlled fashion.
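The bottleneck idea can be sketched in a few lines. This is a toy in-memory queue, not Kafka itself: the `Bottleneck` class, the batch size of 4, and the `inserted` array standing in for the database are all assumptions made for illustration.

```javascript
// Hypothetical in-memory "bottleneck": producers push as fast as they like,
// the consumer drains at its own pace in fixed-size batches.
class Bottleneck {
  constructor() { this.queue = []; }
  push(event) { this.queue.push(event); }                    // fast, no DB work
  drain(maxBatch) { return this.queue.splice(0, maxBatch); } // oldest first
  get size() { return this.queue.length; }
}

const neck = new Bottleneck();
for (let i = 1; i <= 10; i++) neck.push({ riderId: i }); // burst of 10 updates

const inserted = []; // stands in for the database
while (neck.size > 0) {
  const batch = neck.drain(4); // the DB takes at most 4 rows per round
  inserted.push(batch.map((e) => e.riderId));
}
console.log(inserted); // three batches: 1-4, 5-8, 9-10
```

The burst arrives all at once, but the database only ever sees small batches, and the original order is preserved.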

How does Kafka work?

In Kafka's case there is a producer.

1. The producer produces the event.
2. The event is stored in an application buffer (in memory) [just like Redis]. A database never works like this; a database has to persist the data.
3. The data is copied to the OS buffer (RAM).
4. The data is written to disk asynchronously and periodically, in a non-blocking fashion.
(Meaning => the write happens in the background and Kafka does not hold anything up waiting for a disk acknowledgement.) [A DB usually writes to disk, acknowledges, and only then serves the next request.]
5. At the same time, the data is copied to the network interface card buffer (NIC buffer) and sent to the consumer. A consumer can be a DB server or a socket server, whichever needs that event.

If the Kafka process somehow crashes, it can retrieve the data from the OS buffer, as long as the machine itself stays up.

One trade-off to consider: if data arrives in Kafka and Kafka crashes before storing it in the buffer, there is a small chance of losing that data.
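The write path and its trade-off can be modelled with a toy class. This sketch is an assumption-laden simplification: it models only one in-memory buffer and a fake "disk" array, leaves out the OS buffer and NIC buffer steps, and the `BufferedWriter` name and its methods are hypothetical, not Kafka APIs.

```javascript
// Toy model: append to memory, acknowledge immediately, flush to "disk" later.
class BufferedWriter {
  constructor() {
    this.buffer = []; // in-memory events not yet on disk
    this.disk = [];   // stands in for the log file on disk
  }
  write(event) {
    this.buffer.push(event);
    return 'ack'; // returned before anything reaches the disk
  }
  flush() { // would run periodically in the background
    this.disk.push(...this.buffer);
    this.buffer = [];
  }
  crash() { // whatever was only in memory is lost
    const lostCount = this.buffer.length;
    this.buffer = [];
    return lostCount;
  }
}

const writer = new BufferedWriter();
writer.write('loc-1');
writer.write('loc-2');
writer.flush();          // loc-1 and loc-2 are now on disk
writer.write('loc-3');   // acknowledged, but only in memory
const lost = writer.crash();
console.log(writer.disk, lost); // loc-1 and loc-2 survive; 1 event is lost
```

Writes return quickly because they never wait for the disk, and the price is the small window in which an acknowledged event can still be lost.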

There are two types of memory: 1. Stack 2. Heap

The stack is very fast. With the heap, you have to search for free space to allocate and return a reference to the stack, which is why heap allocation takes longer.

You can see the problem: every time, I have to find space in the heap to store the data and pass the reference back to the stack, which is very time consuming.

The Zero Copy Principle With Apache Kafka

To solve this problem, Kafka uses an APPEND-ONLY LOG. Each event is written right after the previous one, so Kafka only has to remember its last position to know where the next event goes, and the events come out in sequence.
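A minimal sketch of an append-only log, assuming an in-memory array in place of a real file. The `AppendOnlyLog` class and its method names are made up for illustration and are not Kafka's API.

```javascript
// Toy append-only log: new events always go at the end, and each one
// gets an offset equal to its position in the log.
class AppendOnlyLog {
  constructor() { this.entries = []; }

  append(event) {
    const offset = this.entries.length; // the next free slot is always the end
    this.entries.push(event);
    return offset;
  }

  // A reader remembers its last offset and continues from there.
  readFrom(offset) { return this.entries.slice(offset); }
}

const log = new AppendOnlyLog();
log.append({ price: 12 });
log.append({ price: 45 });
const lastOffset = log.append({ price: 50 });
console.log(lastOffset);      // 2
console.log(log.readFrom(1)); // the events with price 45 and price 50, in that order
```

Because nothing is ever inserted in the middle, there is no search for free space on write, and reading from any offset returns events in the order they were produced.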

What is Kafka?

Kafka is a message broker server. It has 2 sides: 1. Producer - who produces the messages 2. Consumer - who consumes the messages

Corner cases while building a high-throughput application

Imagine I create an event and by mistake it gets delivered to 2 places. Is that a good thing?

PRODUCER SIDE: Kafka says that when you produce a message, you must give it:

Topic : (namespace) the topic is like a key

Partition : 2 - messages are grouped within this partition

Message : "{}" your actual message
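The three fields above can be sketched as a plain object. The example below also assumes a simple rule for choosing the partition: hash the rider id so that all updates from one rider land in the same partition. The hash function, `choosePartition`, `buildRecord`, and the coordinates are illustrative assumptions, and this is not the hashing Kafka itself uses.

```javascript
const NUM_PARTITIONS = 2; // matches the "Partition : 2" example above

// Hypothetical partitioner: same key always maps to the same partition.
function choosePartition(key, numPartitions) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return hash % numPartitions;
}

// Build the record a producer would hand to the broker.
function buildRecord(riderId, location) {
  return {
    topic: 'rider-updated',
    partition: choosePartition(riderId, NUM_PARTITIONS),
    message: JSON.stringify({ riderId, ...location }),
  };
}

const first = buildRecord('rider-7', { lat: 18.52, lng: 73.85 });
const second = buildRecord('rider-7', { lat: 18.53, lng: 73.86 });
console.log(first.partition === second.partition); // true
```

Keeping one rider's updates in a single partition is what lets them be read back in the order they were sent.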

CONSUMER SIDE
Say 1 consumer comes and tells Kafka: I want the messages for the topic [rider-updated].

Kafka agrees to provide all the rider update events.

The consumer (a server) takes these events and inserts them into the DB.

Now, as usage grows, the consumer needs to scale, so a second DB server is added, and Kafka sends messages to the newly added server as well.

Say I have 4 messages related to rider-updated. Kafka splits them across the two DB servers: 2 messages to the 1st server and the other 2 to the 2nd server. They process them and insert them into the DB.

Now I add 1 socket server, whose job is to share the data with the customer (who wants the rider updates) instead of inserting it into the DB.

It subscribes to the same topic, rider-updated. Kafka is not aware that the socket server runs different logic and the DB servers serve a different purpose. To Kafka, they are all just consumers.

So it distributes the messages among all of them, and the 3rd message goes to the socket server.

Now there is a problem: the socket server does not insert data into the DB. It only serves it to the customers following the rider updates.

The work is incomplete: the 3rd message never reaches the DB.

What was I expecting? All 4 messages should be distributed among the DB servers, and all 4 should also be shared with the socket server.

So Kafka asks you to give a group along with the topic, so that messages are distributed per group.

Topic : rider-updated Group : group - DB server, group - Socket server

That way every message is routed properly.

Now Kafka knows that the 2 DB servers are part of one group and the socket server is part of a different group.

It delivers the data accordingly, as per the diagram below.

Now the DB servers insert the data into the DB, and the socket server sends the data to the user in sequence.
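The group behaviour described above can be simulated with a toy broker. This is only a sketch: `ToyBroker`, the group names, and the per-message round-robin inside a group are assumptions made for illustration. Real Kafka splits work inside a group by assigning partitions to consumers rather than dealing out individual messages.

```javascript
// Toy broker: every group sees every message; within a group, messages are
// split round-robin across that group's consumers.
class ToyBroker {
  constructor() { this.groups = new Map(); } // groupName -> { consumers, next }

  subscribe(groupName, consumerName) {
    if (!this.groups.has(groupName)) {
      this.groups.set(groupName, { consumers: [], next: 0 });
    }
    const received = []; // messages delivered to this consumer
    this.groups.get(groupName).consumers.push({ name: consumerName, received });
    return received;
  }

  publish(message) {
    for (const group of this.groups.values()) {
      const consumer = group.consumers[group.next % group.consumers.length];
      consumer.received.push(message); // exactly one consumer per group gets it
      group.next++;
    }
  }
}

const broker = new ToyBroker();
const db1 = broker.subscribe('db-servers', 'db-server-1');
const db2 = broker.subscribe('db-servers', 'db-server-2');
const sock = broker.subscribe('socket-servers', 'socket-server-1');

['m1', 'm2', 'm3', 'm4'].forEach((m) => broker.publish(m));
console.log(db1, db2, sock);
```

After the four messages are published, the two DB servers hold m1, m3 and m2, m4 between them, while the socket server holds all four in order, which is the outcome expected in the walkthrough above.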