Showing posts with label AI. Show all posts

Wednesday, May 27, 2026

K8sGPT


 

K8sGPT is a tool for scanning your Kubernetes clusters and for diagnosing and triaging issues in plain English. It has SRE experience codified into its analyzers and helps pull out the most relevant information, which it then enriches with AI.

K8sGPT uses AI to analyze your cluster state and provide intelligent insights for troubleshooting.

CNAI - Cloud native combined with AI. Kubeflow is a good example of CNAI.
AICN - Artificial intelligence for cloud native. K8sGPT is a good example of AICN.

WorkFlow:



Tuesday, April 28, 2026

AWS AI Frontier Agent


 

Frontier agents are autonomous systems that work independently to achieve goals, scale massively to tackle concurrent tasks, and run persistently for hours or days without human intervention.

Frontier agent options:



Monday, April 6, 2026

Optimizing AI models for Production Environment


 



We usually use LLMs in one of three ways:

1. Encode text into semantic vectors with little or no fine-tuning.
2. Fine-tune a pre-trained LLM to perform a very specific task by using transfer learning.
3. Query an LLM to solve a task that it was pre-trained on or could intuit.
There are two main types of LLMs:
1) Autoencoding LLMs - Learn an entire sequence by predicting tokens (words) given past and future context. They are best for classification and embedding + retrieval tasks. [Example: BERT]
2) Autoregressive LLMs - Predict the next token given only the preceding tokens.
LLMs excel at tasks that require reasoning over context and input information together to produce a nuanced answer.


AI agents are semi-autonomous systems that interact with their environment, make decisions, and perform tasks on behalf of users.
Autonomy - They can perform tasks without continuous human intervention.
Decision making - They use data to analyze situations and choose actions.
Adaptability - They learn and improve over time with feedback.
Optimizing Models:
Speculative decoding : Using an assistant model to guide next-token prediction.
Caching OS models : Implementing prompt caching with open-source models.
Quantization : Reducing the computation requirements of a neural network by storing its numbers at lower precision.
Distillation : Transferring knowledge from a large model into a small one through targeted fine-tuning.
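To make the quantization idea concrete, here is a minimal sketch that maps float weights to 8-bit integers with a single scale factor and then restores them. It assumes a simple symmetric scheme picked for illustration; the function names are hypothetical, and real toolkits use more elaborate methods.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization with one scale for the whole list."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float weights."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.05, 0.0, 0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now needs one byte instead of four or eight, at the cost of a rounding error of at most half a scale step.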
Speculative Decoding:
The assistant (draft) model calls its forward method over and over to propose several tokens ahead. The main model then simply verifies which of the proposed tokens it agrees with and keeps those.
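The draft-then-verify idea can be sketched with two toy "models" that are just functions from a token list to the next token. This is a greedy simplification under stated assumptions: all names are hypothetical, and a real system verifies the whole proposal in one batched forward pass and works with probabilities rather than exact matches.

```python
def greedy_decode(target_next, prompt, max_new_tokens):
    """Baseline: the main model produces every token itself."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    tokens = list(prompt)
    rounds = 0  # one verification round per loop iteration
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) The draft model proposes k tokens, one call per token.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2) The main model checks each proposed position in order.
        rounds += 1
        for i in range(k + 1):
            expected = target_next(tokens + proposal[:i])
            if i < k and proposal[i] == expected:
                continue  # the main model agrees, keep checking
            # First disagreement (or end of proposal): keep the agreed
            # prefix plus the main model's own token.
            tokens.extend(proposal[:i] + [expected])
            break
    return tokens[:len(prompt) + max_new_tokens], rounds


# Toy models: the main model counts upward modulo 10; the draft model
# makes a mistake whenever the last token is 2.
target_next = lambda ctx: (ctx[-1] + 1) % 10
draft_next = lambda ctx: 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10

baseline = greedy_decode(target_next, [0], 8)
output, rounds = speculative_decode(target_next, draft_next, [0], 8)
print(output, rounds)
```

The output matches the baseline exactly, but the main model is consulted in 2 verification rounds instead of 8 sequential steps.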





Sunday, March 22, 2026

LLM Context management along with Cursor 2.0


 

Cursor 2.0 is an AI editor for production environments. It can run up to 8 agents in parallel.

Context management is like retelling a story when it gets convoluted: it puts the AI back on the right path when it gets confused.

The context window is the portion of the chat between the user and the AI that the model can take into account at once.






AI-driven project initialization:

1) Describe - Describe your problem or expectation
2) Define - Stack, database, auth, and deployment
3) Generate - Cursor takes care of the code



 

Tuesday, March 10, 2026

Infrastructure as Code through AI workloads




A Kubernetes operator is a specialized controller designed to extend the Kubernetes API, enabling the management of complex applications through declarative configuration.
Kubernetes operators work within a continuous reconciliation loop. The cycle begins when a user creates a resource, which prompts the controller to watch for changes and take the necessary actions to reach the desired state. Operators let users define the desired state of an application in custom resources, while the operator's controller continuously reconciles the actual state with this desired state, embodying the operational expertise of a human site reliability engineer.
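The reconciliation loop can be sketched in plain Python, with dictionaries standing in for the desired state (custom resources) and the actual cluster state. This is an illustrative model only: the function names and resource shapes are hypothetical, and a real operator would talk to the Kubernetes API instead.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions, desired, actual):
    """Carry out the actions against the in-memory 'cluster'."""
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])


desired = {"web": {"replicas": 3}, "cache": {"replicas": 1}}
actual = {"web": {"replicas": 1}, "old-job": {"replicas": 1}}

first_pass = reconcile(desired, actual)
apply_actions(first_pass, desired, actual)
second_pass = reconcile(desired, actual)  # nothing left to do
print(first_pass, second_pass)
```

Once the actual state matches the desired state, the next pass produces no actions, which is why the loop is safe to run continuously.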
Kubeflow : It is a primary orchestration tool for ML workflows. It focuses on the training aspect of models.

Most commonly used Kubernetes resources in real-world clusters:
APIService
ClusterRole
ClusterRoleBinding
ConfigMap
CronJob
CSIDriver
CSINode
DaemonSet
Deployment
EphemeralContainers
HorizontalPodAutoscaler
Ingress
IngressClass
Job
Namespace
Node
PersistentVolume
Pod
PodDisruptionBudget
PodTemplate
ReplicaSet
ResourceQuota
Role
RoleBinding
Secret
Service
ServiceAccount
StatefulSet
StorageClass
VolumeAttachment
Binding
CertificateSigningRequest
ComponentStatus
ControllerRevision
CustomResourceDefinition
Endpoints
EndpointSlice
Lease
ReplicationController
LimitRange
LocalSubjectAccessReview
MutatingWebhookConfiguration
NetworkPolicy
PodSecurityPolicy
PriorityClass
RuntimeClass
SelfSubjectAccessReview
SelfSubjectRulesReview
SubjectAccessReview
TokenReview
ValidatingWebhookConfiguration



Saturday, November 8, 2025

MCP - Model Context Protocol


 

MCP - Model Context Protocol:

MCP defines a structured way for an LLM to access external data, tools, and context. MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems and data.

Overview of MCP:

AI applications such as Claude or ChatGPT can connect to data sources, tools [search engines], and workflows [prompts] through MCP and perform tasks.

MCP works like an interface: it communicates with the MCP client, discovers what the client requires, and offers the available services that meet that requirement.
MCP Framework:
  • MCP SDK - The foundation for all MCP development. It is used for production and standard projects, and it can be integrated with any tool or transport (STDIO, SSE).
  • FastMCP 1.0 - Now legacy; it was integrated into the MCP Python SDK.
  • FastMCP 2.0 - The latest, modern toolkit for advanced MCP workflows.
  • Other frameworks - The Java SDK and third-party libraries in other languages.
Agent workflows inside of Memory:


RAG - Retrieval Augmented Generation
It converts data into a numerical representation in which each piece of data carries information about how it relates to the others.
Retrieval - When a user asks a question or runs a search, RAG turns it into its own numerical representation (an embedding) and finds the data with the most similar meaning.
Augmentation - The top search results are then added to the prompt and sent to the LLM.
Generation - The search results give the LLM local context, which it uses to generate the response.
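The retrieval and augmentation steps can be sketched with the standard library alone. The sketch assumes a toy "embedding" made of word counts in place of a trained model, and it stops at building the augmented prompt, since the generation step needs an actual LLM. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a real system would use a trained model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, top_k=1):
    """Retrieval: rank documents by similarity to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def augment(question, context_docs):
    """Augmentation: add the top results to the prompt."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


documents = [
    "kubernetes operators reconcile desired state",
    "ollama runs language models locally",
    "embeddings place similar text close together",
]
question = "how do i run models locally"
top = retrieve(question, documents)
prompt = augment(question, top)
print(prompt)
```

The resulting prompt is what would be sent to the LLM for the generation step.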
Embedding:
Embeddings represent text as numerical data stored in tensors (arrays with one or more dimensions).
Each dimension stores some information about the text's semantic or syntactic meaning.
Words or sentences with similar meanings are stored near each other in the vector space.
Models learn to place similar words or sentences close together in the embedding space.
Common pre-trained models such as BERT and RoBERTa are used to generate embeddings in the vector space.
We can use embeddings for NLP tasks like semantic search, text classification, and sentiment analysis.
Agentic RAG:
It integrates AI agents to enhance the RAG approach. It breaks complex queries down into manageable parts and uses API tools where needed to augment processing and get better results.


Implementation of AI agent

November 08, 2025 0


                                 

Installation of Ollama:
Ollama is an open-source tool that helps us run NLP [Natural Language Processing] models locally.
Step 1) Download the Ollama tool for your operating system and install it.





Sunday, July 20, 2025

AI Basics


 


Supervised Learning:
In supervised learning, the application learns from labeled examples to produce an output for a given input.
Machine Learning Vs Data Science:
Machine Learning:
Field of study that gives computers the ability to learn without being explicitly programmed.
Ex. YouTube advertisements and online shopping. It generates advertisements and shopping notifications based on the user's interests.
Data Science:
It extracts knowledge and insights from data.
Ex. Stock market data. It analyzes the data for insights and estimates the probability of an outcome.
Deep Learning:
It takes multiple inputs and decides on an output, like a human brain, by using an artificial neural network.
Open-source frameworks and tools for machine learning:
  • PyTorch
  • TensorFlow
  • Hugging Face
  • PaddlePaddle
  • Scikit-learn
  • R
How does Alexa work?
Steps to process the command:
1) Trigger word detection to activate the device ["Hi Alexa"]
2) Speech recognition - "What is the weather in Delhi today?" [convert the audio into text]
3) Intent recognition - identify the purpose of the user: "weather in Delhi"
4) Execute the weather query and give the output to the user.
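The four steps above can be sketched as a small text-only pipeline. This is a toy under stated assumptions: speech recognition is assumed to have already produced the transcript, intent recognition is a simple keyword rule, and the weather data is a hard-coded dictionary. None of this reflects how Alexa is actually implemented.

```python
WAKE_WORD = "alexa"


def detect_wake_word(transcript):
    """Step 1: return the command after the wake word, or None if absent."""
    words = transcript.lower().replace(",", "").split()
    if WAKE_WORD not in words:
        return None
    return " ".join(words[words.index(WAKE_WORD) + 1:])


def recognize_intent(command):
    """Step 3: map the recognized text to an intent and its city slot."""
    words = command.rstrip("?").split()
    if "weather" in words and "in" in words:
        position = words.index("in") + 1
        if position < len(words):
            return {"intent": "get_weather", "city": words[position].title()}
    return {"intent": "unknown"}


def execute(intent, weather_table):
    """Step 4: run the query against a stand-in data source."""
    if intent["intent"] == "get_weather":
        forecast = weather_table.get(intent["city"], "unavailable")
        return f"The weather in {intent['city']} is {forecast}."
    return "Sorry, I did not understand that."


# Step 2 (speech recognition) is assumed to have produced this text.
transcript = "Hi Alexa, what is the weather in Delhi today?"
command = detect_wake_word(transcript)
intent = recognize_intent(command)
reply = execute(intent, {"Delhi": "sunny"})
print(reply)
```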
Generative AI:
AI systems that can produce high-quality content such as text, images, and audio. They use a machine learning model that learns from data and generates output content.




 

Friday, March 28, 2025

Large Language Model [LLM] - Introduction


 


LLM stands for Large Language Model. It is a deep learning model trained on massive amounts of text data to understand and generate human language, enabling tasks like text generation and translation. It is usually built on "Transformer" models, which are neural networks that can process relationships within language.


Reasoning LLMs


Traditional LLM workflow



In the traditional LLM workflow, a dataset is refined and fed into pretraining. The pretrained model then goes through fine-tuning so that it gives more precise output. Finally, the output is sent for human feedback, which corrects any mismatch with the fine-tuned model.

Traditional LLMs
  • Direct pattern based prediction
  • Quick but less reliable on complex tasks
  • No explicit reasoning steps

Reasoning LLM:
  • Language models designed for complex, multi-step problems
  • Break down tasks into logical subtasks.
  • Generate intermediate reasoning steps ("thought processes")
Key Capabilities of Reasoning LLMs:
1) Chain-of-Thought Reasoning
        Internal dialogue approach
        step-by-step problem solving
2) Self-consistency
        Verifies its own answers
        Revisits problematic solutions
3) Structured Outputs
        Organized reasoning steps

Practical Applications of Reasoning LLMs
Data Analysis
  • Medical diagnostics
  • Complex data interpretation
  • Anomaly detection
Background Processing
  • Batch processing workflows
  • Overnight analysis jobs
Evaluation Tasks
  • LLM as judge
  • Quality assessment
  • Verification workflows
Limitations of Reasoning LLMs
Performance Trade-offs
* Increased latency: the extended thinking process leads to significantly longer response times
* Higher resource requirements: they often require more computational resources
* Cost implications: more tokens and processing time translate to higher operational costs
DeepSeek:

    DeepSeek applied supervised fine-tuning to refine the models' capabilities. This involved training on datasets containing reasoning and non-reasoning tasks. Notably, reasoning data was generated by specialized "expert models" trained for specific domains such as mathematics, programming, and logic. These expert models were developed through supervised fine-tuning on both original responses and synthetic data generated by internal models like DeepSeek-R1-Lite. The use of expert models allowed DeepSeek to generate high-quality synthetic reasoning data to enhance the primary model's performance.






Friday, January 31, 2025

How to install the DeepSeek R1 model on your local machine





DeepSeek was created by a Chinese AI company of the same name. The DeepSeek model is comparable with the top OpenAI models in areas such as math, coding, general knowledge, and languages.
DeepSeek-R1 is getting popular because it is open source, which allows anyone to download it and run it locally.

DeepSeek-R1 model:
Its built-in chain-of-thought reasoning enhances its efficiency, and it is cheaper compared to OpenAI models.
PS: Responses may be delayed if your system has very limited CPU and memory.

GitHub URL : https://github.com/deepseek-ai/DeepSeek-R1

We will run a DeepSeek-R1 model through Ollama.
Ollama:
Ollama is an open-source tool that helps us run NLP [Natural Language Processing] models locally.
Step 1) Download the Ollama tool for your operating system and install it.



Step 2) Navigate to the DeepSeek-R1 model on the Ollama site.



Hardware requirements for each R1 model:


Step 3) Open a terminal or PowerShell and validate the Ollama status
#ollama list

Step 4) I downloaded the 1.5b model for my testing.

Step 5) Install and run the DeepSeek-R1 model through the Ollama tool
#ollama run deepseek-r1:1.5b
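Besides the interactive terminal, a model that Ollama has pulled can also be queried from a script through Ollama's local REST API. The sketch below assumes Ollama is listening on its default address (localhost, port 11434) and that the `deepseek-r1:1.5b` tag from Step 4 is the one installed; adjust both if your setup differs. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default address


def build_request(prompt, model="deepseek-r1:1.5b"):
    """Build a POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="deepseek-r1:1.5b"):
    """Send the prompt to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


request = build_request("Why is the sky blue?")
# Uncomment once Ollama is running and the model has been pulled:
# print(ask("Why is the sky blue?"))
```

Setting `stream` to false asks for the whole answer in one JSON response instead of a stream of partial chunks.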

You can start asking your questions or coding tasks once the installation completes.




Saturday, September 7, 2024

Different types of AI models



AI - Systems or machines that mimic human intelligence to perform tasks and can iteratively improve themselves based on the information they collect.  Generative AI is artificial intelligence capable of generating text, images, videos, or other data using generative models, often in response to prompts.  Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.

AGI - Artificial General Intelligence is a type of AI that can understand, learn, and apply knowledge across a broad range of tasks, similar to human cognitive abilities.

Example of Cloud Machine Learning:



Different AI Types:

Machine Learning:

Simple Input -> Simple output -> Single topic

Deep Learning:

Complex input -> Simple output -> Single topic

Foundation Model:

Complex inputs -> Complex outputs -> Multiple topics

Foundation Model:

A foundation model is a type of large-scale artificial intelligence model that is trained on a broad range of data at massive scale, allowing it to develop a wide understanding of many topics and tasks.  These models can be adapted or fine-tuned for various specific applications, demonstrating flexibility and efficiency across different domains.

 Parameters of foundation models:

  • Embedding Vectors: When a foundation model deals with categorical variables (like words or user IDs in recommendation systems), embedding vectors are a form of parameter that represents the categories in a continuous vector space and captures semantic similarities among them.
  • Weights: These are the most numerous parameters in a neural network.  Weights are used in the various layers of a neural network to scale the input data in meaningful ways.  For example, in a convolutional layer, commonly used in image processing, weights determine the importance of neighboring pixel values for feature detection.
  • Biases: Bias parameters are added to the output of weighted inputs to shift the activation function curve up or down.  This is crucial for models to accurately represent patterns in data that do not pass through the origin of the coordinate system.
  • Attention Scores: Attention mechanisms use parameters to weigh the significance of different parts of the input data differently.  For instance, in language models, attention scores determine how much emphasis the model places on different words when generating a response or translating text.
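To see how weights and biases work together, consider a single artificial neuron. The sketch below is a minimal illustration with made-up numbers and no activation function: weights scale the inputs, and the bias shifts the result so that an all-zero input no longer has to produce zero.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation, for clarity)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


weights = [2.0, -1.0]  # hypothetical learned weights

# With zero inputs, only the bias decides the output.
without_bias = neuron([0.0, 0.0], weights, bias=0.0)
with_bias = neuron([0.0, 0.0], weights, bias=0.5)

# With real inputs, each one is scaled by its weight, then shifted.
scaled = neuron([1.0, 3.0], weights, bias=0.5)
print(without_bias, with_bias, scaled)
```

Without the bias the neuron's output is pinned to the origin, which is the limitation the Biases bullet describes.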
LLM [Large Language Model]

LLMs are highly specialized for tasks involving human language.  They are optimized for understanding and generating text, which makes them more efficient for language-specific tasks like conversation, translation, or content generation.




Saturday, August 24, 2024

Generative AI with public cloud


 


LLM:

  • A language model (LM) is a probabilistic model of text.

Encoders:

Models that convert a sequence of words to an embedding.

Decoders:

Models that take a sequence of words and output the next word.

Examples: GPT-4, Llama, and BLOOM

Encoder-decoder model:

We pass in English text, and the encoder converts it into tokens and encodes them. The decoder then emits one token at a time.

Hallucination:

It is generated text that is non-factual and/or ungrounded.

LLM application:

Retrieval Augmented Generation (RAG)

  • Primarily used in QA where the model has access to support documents for a query.

Code models:

  • Instead of training on written language, these models train on code and comments.

In-context learning and few shot prompting:

  • In-context learning: Conditions an LLM with instructions and/or demonstrations of the task.
  • K-shot prompting: Explicitly providing k examples of the intended task in the prompt.
  • F-strings, or formatted strings, are a Python feature that can be used to create prompt templates for LLMs.
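As a small illustration of k-shot prompting with f-strings, the sketch below builds a prompt from an instruction, k worked examples, and a new input. The function name, the example data, and the "Input:/Output:" layout are all assumptions chosen for the illustration.

```python
def build_k_shot_prompt(instruction, examples, query):
    """Build a prompt with k worked examples followed by the new input."""
    shots = "\n".join(f"Input: {text}\nOutput: {label}" for text, label in examples)
    return f"{instruction}\n\n{shots}\n\nInput: {query}\nOutput:"


examples = [  # k = 2 demonstrations of the task
    ("The service was excellent", "positive"),
    ("The app keeps crashing", "negative"),
]
prompt = build_k_shot_prompt(
    "Classify the sentiment of each input.",
    examples,
    "Setup was quick and painless",
)
print(prompt)
```

The prompt ends with an open "Output:" so that the model's most natural continuation is the label for the new input.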

Language Agents:

* A Budding area of research where LLM based agents

Some notable work in the space:

* ReAct 

Iterative framework where the LLM emits thoughts, then acts, and then observes the result.

* Toolformer

Pre-training technique where strings are replaced with calls to tools that yield results.

OCI Generative AI service:

* Fully managed service that provides a set of customizable Large Language Models (LLMs), available via a single API, to build generative AI applications.

Generation:

Command -> Command Light -> Llama 2 70B

Dedicated AI cluster:

* A dedicated AI cluster is a GPU-based resource that hosts the customer's fine-tuning and inference workloads.

OCI setup:

configuration file : ~/.oci/config

Model parameters:

Temperature : Determines how creative the model should be; the default temperature is 1 and the maximum temperature is 5.
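One common way temperature is applied is to divide the model's raw token scores (logits) by the temperature before turning them into probabilities. The sketch below shows that effect on made-up scores; it is a general illustration and makes no claim about how the OCI service implements the parameter internally.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores into probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cool = softmax_with_temperature(logits, temperature=0.5)
default = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=5.0)
print(cool, default, hot)
```

A low temperature concentrates probability on the top token (more predictable output), while a high temperature flattens the distribution (more varied, "creative" output).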

Length: Approximate length of the summary; choose from short, medium, and long.

Embeddings:

An embedding is a numerical representation of a piece of text, converted into a sequence of numbers.

A piece of text could be a word, a phrase, a sentence, a paragraph, or several paragraphs.

Some models create a 1024-dimensional vector for each embedding.

Max 512 tokens per embedding.

Other models create a 384-dimensional vector for each embedding.

It is very difficult to fine-tune on 2 billion tokens, so we use in-context learning / few-shot prompting instead.

GPU memory is limited, so switching between models can incur significant overhead due to reloading the full GPU memory.

Dedicated AI cluster units:

* Large Cohere - Dedicated AI cluster units for hosting or fine-tuning the Cohere Command model

* Small Cohere - Dedicated AI cluster units for hosting or fine-tuning the small Cohere Command model

* Embed Cohere - Dedicated AI cluster units for hosting the Cohere Embed models

* Llama2-70 - Dedicated AI cluster units for hosting the Llama 2 models

Fine-tuning requires 2 units, and each cluster is active for five hours.

RAG framework:

Retriever : It acts like a search engine.

Ranker : It evaluates and prioritizes (ranks) the results based on the quality of the data.

Generator : It produces human-like text.

RAG techniques:

RAG sequence 

RAG token

RAG pipelines:

Documents -> Chunks -> Embedding -> Index [database]
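The pipeline above can be sketched end to end with the standard library. The sketch assumes word-based chunks with a small overlap, a hashed word-count vector standing in for a real embedding model, and a plain list standing in for the vector database; every name and size here is an illustrative choice.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into word chunks sharing `overlap` words with the previous one."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks


def toy_embedding(chunk, dims=8):
    """Stand-in for an embedding model: hashed word counts of fixed length."""
    vector = [0] * dims
    for word in chunk.lower().split():
        vector[sum(map(ord, word)) % dims] += 1
    return vector


def build_index(documents):
    """Documents -> chunks -> embeddings -> index (a plain list here)."""
    index = []
    for doc_id, text in documents.items():
        for chunk in chunk_words(text):
            index.append({"doc": doc_id, "chunk": chunk, "embedding": toy_embedding(chunk)})
    return index


documents = {
    "rag": "Retrieval augmented generation adds retrieved context to the prompt before the model answers",
    "ollama": "Ollama runs models locally",
}
index = build_index(documents)
for entry in index:
    print(entry["doc"], "|", entry["chunk"])
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, which helps retrieval find it.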

Vector database:

A vector is a sequence of numbers, called dimensions, used to capture the important "features" of the data.

Semantic search:

It means searching by meaning rather than by matching exact keywords.