ConicAI Coding LLM

Model Details

  • Model Name: ConicAI LLM Model
  • Developer: GIRISH KUMAR DEWANGAN
  • Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
  • Architecture: Transformer (Causal LM)
  • Fine-tuning Method: LoRA (PEFT)
  • Task Domain: Code Generation, Debugging, Explanation
  • Primary Language: Python

Model Description

ConicAI Coding LLM is a parameter-efficient fine-tuned model optimized for structured coding tasks. It extends the base model by conditioning generation on explicit instructions and emitting structured, schema-conformant responses.

The model focuses on three key aspects:

  • Accuracy → Correct code generation
  • Interpretability → Explanation + confidence
  • Efficiency → Lightweight fine-tuning

Core Design Philosophy

  1. Instruction Conditioning

    Instruction → Input → Output
    
  2. Structured Output Learning

  3. Post-Generation Validation Awareness
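The repo ships a `build_instruction_prompt` helper in `infer_local` (used in the usage example below); as a rough sketch of what such instruction conditioning might look like (the exact template is an assumption, not the released implementation):

```python
def build_instruction_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Instruction -> Input -> Output style prompt.

    Illustrative sketch only; the template used by the released
    infer_local module may differ.
    """
    parts = [f"Instruction:\n{instruction.strip()}"]
    if input_text:
        parts.append(f"Input:\n{input_text.strip()}")
    parts.append("Output:")
    return "\n\n".join(parts)

prompt = build_instruction_prompt("Write a function that reverses a string.")
```

Conditioning every example on the same three-part frame is what lets the fine-tuned model reliably separate the task description from the data it operates on.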


Capabilities

  • Code generation
  • Code debugging
  • Code explanation
  • Structured output generation
  • Confidence estimation
  • Hallucination detection

Output Schema

{
  "code": "string",
  "explanation": "string",
  "confidence": 0.0,
  "important_tokens": [],
  "relevancy_score": 0.0,
  "hallucination": false,
  "hallucination_check_reason": "",
  "latency_ms": 0
}
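A minimal validator for this schema can live entirely in the standard library. The field names come from the schema above; the per-field type expectations are inferred from the example values and are an assumption:

```python
import json

# Expected type per field, inferred from the example schema values.
SCHEMA = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "important_tokens": list,
    "relevancy_score": (int, float),
    "hallucination": bool,
    "hallucination_check_reason": str,
    "latency_ms": int,
}

def validate_result(raw: str) -> dict:
    """Parse a model response and check it against the output schema."""
    obj = json.loads(raw)
    for field, expected in SCHEMA.items():
        if field not in obj:
            raise ValueError(f"missing field: {field}")
        if not isinstance(obj[field], expected):
            raise TypeError(f"{field}: expected {expected}, got {type(obj[field])}")
    return obj
```

Running generated responses through a check like this is what "Post-Generation Validation Awareness" buys you in practice: malformed outputs fail loudly instead of propagating downstream.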

🧪 How to Use This Model (Colab / Local)

!pip -q install -U transformers peft accelerate huggingface_hub safetensors

from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN') 
model = "girish00/ConicAI_LLM_model"
prompt = input("Please enter your prompt: ")

from huggingface_hub import login, snapshot_download
login(token=HF_TOKEN)

repo = snapshot_download(model, token=HF_TOKEN)

import sys
sys.path.append(repo)

from infer_local import build_instruction_prompt, build_structured_result
from peft import PeftConfig, PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, time, json

cfg = PeftConfig.from_pretrained(repo)
base = cfg.base_model_name_or_path

tokenizer = AutoTokenizer.from_pretrained(base)
base_model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto"
)
llm = PeftModel.from_pretrained(base_model, repo)
llm.eval()

inputs = tokenizer(build_instruction_prompt(prompt), return_tensors="pt").to(llm.device)

start = time.perf_counter()
with torch.no_grad():
    out = llm.generate(
        **inputs,
        max_new_tokens=320,
        output_scores=True,
        return_dict_in_generate=True,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )
latency = int((time.perf_counter() - start) * 1000)

# Slice off the prompt tokens and decode only the new continuation
gen_ids = out.sequences[0][inputs["input_ids"].shape[1]:].tolist()
text = tokenizer.decode(gen_ids, skip_special_tokens=True)

# Per-token confidence: probability the model assigned to each emitted token
conf = []
for tid, score in zip(gen_ids, out.scores):
    probs = torch.softmax(score[0], dim=-1)
    conf.append(float(probs[tid].item()))

print(json.dumps(
    build_structured_result(
        prompt,
        text,
        latency,
        tokenizer=tokenizer,
        generated_ids=gen_ids,
        token_confidences=conf
    ),
    indent=2
))
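The per-token probabilities collected above must be collapsed into the single `confidence` value of the output schema. The geometric mean below is one plausible heuristic; the exact aggregation inside `build_structured_result` is not specified here:

```python
import math

def aggregate_confidence(token_confidences):
    """Collapse per-token probabilities into one score via geometric mean.

    A heuristic: one low-probability token drags the score down sharply,
    matching the intuition that a single uncertain token weakens the output.
    """
    if not token_confidences:
        return 0.0
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_confidences)
    return math.exp(log_sum / len(token_confidences))
```

Compared with an arithmetic mean, the geometric mean penalizes outputs where most tokens are confident but a few are near-guesses.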

Training Details

Dataset

  • Size: ~5K–10K samples
  • Instruction-based coding dataset

Training Procedure

  • Method: LoRA fine-tuning
  • Framework: Transformers + PEFT
  • Precision: FP16 / Mixed

Training Hyperparameters

  Parameter              Value
  Epochs                 1–3
  Batch Size             2
  Learning Rate          2e-4
  Max Sequence Length    512
  LoRA Rank (r)          8
  LoRA Alpha             16
  LoRA Dropout           0.05
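The LoRA rows of the table map directly onto a PEFT configuration. A sketch, with the caveat that `target_modules` is an assumption (typical attention projections for Qwen2.5-style models); the adapter's own `adapter_config.json` is the authoritative source:

```python
from peft import LoraConfig

# r, lora_alpha and lora_dropout mirror the hyperparameter table above.
# target_modules is an ASSUMPTION, not read from the released adapter.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

With rank 8 on a 0.5B base, the trainable adapter is a small fraction of total parameters, which is what keeps the fine-tuning lightweight.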

Inference Configuration

Recommended sampling settings (note that the usage example above decodes greedily with do_sample=False instead):

max_new_tokens = 200
temperature = 0.2
top_p = 0.9
do_sample = True
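For intuition about how temperature and top_p interact, here is a pure-Python sketch of nucleus (top-p) sampling over raw logits. This is an illustration, not the transformers implementation:

```python
import math
import random

def sample_top_p(logits, temperature=0.2, top_p=0.9, rng=None):
    """Sample a token index using temperature scaling + nucleus filtering."""
    rng = rng or random.Random(0)
    # Temperature-scaled softmax (low temperature sharpens the distribution)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of top tokens whose cumulative mass >= top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept set and draw one index
    kept_mass = sum(probs[i] for i in kept)
    r = rng.random() * kept_mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

At temperature 0.2 the distribution is sharp enough that the nucleus often contains a single token, which is why the recommended settings behave nearly deterministically.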

Evaluation

  • Syntax validation
  • Prompt-based testing
  • Relevancy scoring
  • Hallucination detection
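The syntax-validation step can be as simple as attempting to parse generated code with the standard library. A sketch (the actual evaluation harness is not published here):

```python
import ast

def syntax_valid(code: str) -> bool:
    """Return True if the generated Python code at least parses."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False
```

Parsing catches malformed output cheaply, but note it says nothing about runtime correctness; that is what the prompt-based testing and relevancy scoring above are for.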

Strengths

  • Lightweight and efficient
  • Strong performance on structured prompts
  • Generates readable and correct Python code
  • Provides reasoning-aware outputs

Limitations

  • Sensitive to prompt structure
  • Confidence scores are heuristic
  • Limited generalization beyond training dataset

Risks & Considerations

  • Always validate generated code
  • Not suitable for critical production systems
  • May produce incorrect logic

Best Practices

Format prompts with the Instruction → Input → Output template:

Instruction:
Input:
Output:

  • Use clear and specific prompts
  • Keep temperature low for reliability
  • Apply post-processing for cleaner output

Technical Specifications

  • Transformer-based causal language model
  • LoRA adaptation on attention layers
  • Hugging Face Transformers + PEFT

Environmental Impact

  • Uses parameter-efficient fine-tuning
  • Lower compute compared to full fine-tuning
  • Suitable for local deployment

Intended Use

Direct Use

  • Coding assistant
  • Debugging support
  • Learning programming

Out-of-Scope Use

  • Security-critical systems
  • Autonomous production deployment

Author

GIRISH KUMAR DEWANGAN


License

Apache License 2.0

