Siddartha Pullakhandam · siddartha10.hashnode.dev · Sep 5, 2024
Getting Started with Quantization
What is Quantization? It is the process of reducing/mapping higher-precision weights and activations into lower precision; in simple terms, shrinking a model to a smaller size so that it can run on hardware with limited memory. Linear Quantizatio...
Tags: quantization
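The definition in this excerpt, mapping higher-precision values onto a lower-precision grid, can be sketched in a few lines. This is a minimal illustrative asymmetric 8-bit scheme, not code from the article:

```python
def linear_quantize(xs, bits=8):
    """Map floats onto the integer grid [0, 2^bits - 1] with a scale and zero point."""
    qmin, qmax = 0, (1 << bits) - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(x / scale + zero_point))) for x in xs]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats; error is at most about scale/2 per value."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, scale, zp = linear_quantize(weights)
recovered = dequantize(q, scale, zp)
```

Stored as 8-bit integers plus one scale and zero point, the tensor takes a quarter of the fp32 footprint, at the cost of a small rounding error per weight.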
Kevin Loggenberg · blog.thecodesmith.co.za · Jul 9, 2024
Local LLMs with .NET
Introduction: In this article we will explore performing inference on GGUF models with llama.cpp using the LlamaSharp NuGet package. It sounds like it should take longer than it actually does. GGUF models are probably one of the easiest models to work...
Tags: LlamaSharp
Spheron Network for Spheron's Blog · blog.spheron.network · Jun 5, 2024
Understanding Deep Learning: Training, Inference, and GPU Shortage Challenges
Deep learning has revolutionized numerous fields, including computer vision, natural language processing, and speech recognition. However, the power of deep learning comes at a cost: the computational demands are immense, both during the training an...
Tags: Deep Learning
Venkat Raman · venkat.eu · May 31, 2024
Essential Math & Concepts for LLM Inference
(Image Credit: HF TGI Benchmark) Introduction: As enterprises and tech enthusiasts increasingly integrate LLM applications into their daily workflows, the demand for TFLOPS is ever increasing. Apple, Microsoft, Google, and Samsung have already introdu...
Tags: AI
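The back-of-the-envelope arithmetic that posts like this rely on is easy to script. A hedged sketch of two standard estimates (weight memory and KV-cache size); the Llama-2-7B-like shape numbers below are illustrative assumptions, not taken from the article:

```python
def weight_memory_gib(n_params, bytes_per_param=2):
    """Memory to hold the weights alone, e.g. fp16 = 2 bytes per parameter."""
    return n_params * bytes_per_param / 2**30

def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_val=2):
    """K and V tensors (hence the factor 2) cached for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_val / 2**30

# Illustrative 7B-parameter model: 32 layers, 32 KV heads of dimension 128.
weights = weight_memory_gib(7_000_000_000)    # ~13 GiB in fp16
kv = kv_cache_gib(32, 32, 128, seq_len=4096)  # ~2 GiB at 4k context, batch 1
```

Numbers like these explain why a 7B model in fp16 will not fit on a 16 GiB GPU once a long context and activation overhead are added, and why quantized formats matter.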
Haocheng Lin · haochengcodedev.hashnode.dev · Apr 30, 2024
Understanding and Calculating the Variance of the Sample Mean
Introduction: When working with a set of data points, understanding the variability within the data is crucial for drawing meaningful conclusions. One fundamental measure of variability is the variance, which quantifies how much the values in a datase...
Tags: statistics
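The key fact a post on this topic builds toward, Var(X̄) = σ²/n, can be checked empirically with the standard library alone; the distribution and numbers here are illustrative:

```python
import random
import statistics

def estimated_var_of_sample_mean(pop_sd, n, trials=20_000, seed=0):
    """Draw many size-n samples from N(0, pop_sd^2) and measure how the sample mean varies."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, pop_sd) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)

# Theory: the mean of n draws has variance sigma^2 / n.
sigma, n = 2.0, 10
empirical = estimated_var_of_sample_mean(sigma, n)  # close to 4 / 10 = 0.4
```

Averaging shrinks variance by a factor of n, which is why larger samples give more stable estimates of the population mean.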
RJ Honicky · learning-exhaust.hashnode.dev · Apr 12, 2024
Are All Large Language Models Really in 1.58 Bits?
Introduction: This post is my learning exhaust from reading an exciting preprint titled "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits", about very efficient representations of high-performing LLMs. I am trying to come up to s...
Tags: llm
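The "1.58 bits" in the title comes from ternary weights: each weight takes one of three values {-1, 0, +1}, and log2(3) ≈ 1.58 bits of information per weight. The preprint's "absmean" rounding can be sketched as follows (a simplification of the scheme it describes, with made-up example weights):

```python
def absmean_ternary(ws, eps=1e-8):
    """Scale by the mean absolute weight, then round and clip to {-1, 0, +1}."""
    gamma = sum(abs(w) for w in ws) / len(ws) + eps
    return [max(-1, min(1, round(w / gamma))) for w in ws], gamma

# Large-magnitude weights survive as +/-1; small ones collapse to 0.
ternary, gamma = absmean_ternary([0.9, -0.1, 0.05, -0.8])
```

Multiplication by a ternary weight reduces to addition, subtraction, or skipping, which is the source of the efficiency claims the post examines.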
TECHcommunity_SAG · techcommsag.hashnode.dev · Mar 15, 2024
Leveraging Hyperscaler Clouds for Machine Learning Inferencing on Cumulocity IoT Data
Authors: @kanishk.chaturvedi, @Nick_Van_Damme1. Introduction: In the fast-paced world of IoT, processing and analyzing data in real time is crucial. With billions of devices generating vast amounts of data, leveraging Machine Learning (ML) is key to turn...
Tags: cumulocity
Kaushal Powar · writtenbykaushal.hashnode.dev · Jan 4, 2024
How to convert an HF (safetensors) 🤗 model to GGUF
You want to convert a Hugging Face model to GGUF format? I was struggling with the same problem a few days ago. I had fine-tuned a Llama 7B model, and the model was saved in safetensors format. I wanted to use a GGUF model, so I searched a lot and found a sol...
Tags: LLM, llamacpp
Nosana · nosana.hashnode.dev · Oct 18, 2023
Nosana's New Direction: AI Inference
Today, we're excited to share a significant update about the future of Nosana. After careful consideration, we've decided to pivot away from CI/CD services. Instead, Nosana will now focus on providing a massive GPU-compute grid for AI inference. The ...
Tags: GPU
aansh savla · aanshsavla.hashnode.dev · Aug 9, 2023
Inferring using Prompt Engineering
Inferring means deducing or concluding from evidence or reasoning. In AI terms, inferring means making decisions based on available information or data. A Machine Learning model takes input and performs some analysis, such as extra...
Tags: inference
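The kind of "inferring" this excerpt describes, asking a model to extract a conclusion such as sentiment from text, usually comes down to a prompt template. A minimal illustrative sketch; the wording is invented, not the article's:

```python
def sentiment_prompt(review: str) -> str:
    """Build a prompt that asks an LLM to infer the sentiment of a review."""
    return (
        "What is the sentiment of the following product review? "
        "Reply with a single word: positive or negative.\n\n"
        f"Review: {review}"
    )

prompt = sentiment_prompt("The battery died after two days.")
```

The same template pattern extends to other inference tasks (topic extraction, emotion labeling) by swapping the question and the constrained answer format.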