
# Generative AI: Building with Large Language Models

Build generative AI applications: Azure OpenAI integration, prompt engineering, semantic memory, plugin architecture, RAG patterns, and production best pract...

What you will learn

Practical execution with concise explanations, real implementation patterns, and production-ready recommendations.

```python
# ...
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)
```


## Application Patterns

### Chat Completions





```python
conversation_history = [
    {"role": "system", "content": "You are a Python coding expert."}
]

def chat(user_message):
    conversation_history.append({"role": "user", "content": user_message})

    response = client.chat.completions.create(
        model="gpt-4",
        messages=conversation_history,
        temperature=0.3
    )

    assistant_message = response.choices[0].message.content
    conversation_history.append({"role": "assistant", "content": assistant_message})

    return assistant_message

# Multi-turn conversation
print(chat("Write a function to reverse a string"))
print(chat("Now add type hints and docstring"))
```


## Streaming Responses

```python
def stream_chat(user_message):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_message}],
        stream=True
    )

    for chunk in response:
        # Some chunks arrive with an empty choices list; skip those
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```
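If the complete reply is also needed after streaming (for example, to append it to `conversation_history`), the deltas can be accumulated as they arrive. The following is a minimal sketch, assuming chunks shaped like those in the loop above. `collect_stream` and `fake_chunk` are hypothetical helpers, and the `SimpleNamespace` objects only stand in for real chunks so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Join streamed deltas into one string, optionally echoing each piece."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_text is not None:
                on_text(text)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a real chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_chunks = [
    SimpleNamespace(choices=[]),  # chunk without choices
    fake_chunk("Hello"),
    fake_chunk(None),             # chunk with an empty delta
    fake_chunk(", world"),
]
full_reply = collect_stream(demo_chunks)
```

Passing `on_text=lambda t: print(t, end="", flush=True)` would keep the live output while still returning the full text.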

## Retrieval-Augmented Generation (RAG)

### Vector Database Integration





```python
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="documents",
    credential=AzureKeyCredential("<key>")
)

def rag_query(question):
    # Retrieve relevant documents (select expects a list of field names)
    results = search_client.search(
        search_text=question,
        select=["content"],
        top=3
    )

    context = "\n\n".join([doc["content"] for doc in results])

    # Generate answer with context
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer based on the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
        ]
    )

    return response.choices[0].message.content
```
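Joining every retrieved passage can overflow the prompt when documents are long. One possible refinement, sketched below under the assumption that each result is dict-like with a `content` field, numbers the passages and stops at a character budget. `build_context`, `max_chars`, and the sample documents are illustrative and not part of any SDK.

```python
def build_context(docs, max_chars=4000, field="content"):
    """Concatenate retrieved passages, numbered, within a character budget."""
    pieces = []
    used = 0
    for index, doc in enumerate(docs, start=1):
        text = (doc.get(field) or "").strip()
        if not text:
            continue  # skip results with no usable text
        entry = f"[{index}] {text}"
        # Stop before exceeding the budget; earlier results rank higher.
        if used + len(entry) > max_chars:
            break
        pieces.append(entry)
        used += len(entry) + 2  # account for the "\n\n" separator
    return "\n\n".join(pieces)


sample_docs = [
    {"content": "Azure AI Search supports keyword queries."},
    {"content": ""},
    {"content": "Indexes store searchable documents."},
]
context = build_context(sample_docs, max_chars=200)
```

The numbering keeps the original rank of each passage, so the system message can ask the model to cite passages by number.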

## Semantic Kernel Framework

### Basic Setup





```python
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

# Note: these calls follow the pre-1.0 Semantic Kernel Python API
kernel = sk.Kernel()

kernel.add_chat_service(
    "chat",
    AzureChatCompletion(
        deployment_name="gpt-4",
        endpoint="<endpoint>",
        api_key="<key>"
    )
)

# Define semantic function
prompt = """
Summarize the following text in {{$maxWords}} words or less:

{{$input}}
"""

summarize = kernel.create_semantic_function(prompt, max_tokens=500)

# `await` requires an async context (a coroutine or a notebook cell)
result = await summarize("Very long text here...", maxWords=50)
print(result)
```

### Plugin Architecture

Figure: Plugin Registration Tool – registered steps and message pipeline.

```python
from semantic_kernel.skill_definition import sk_function, sk_function_context_parameter

class MathPlugin:
    @sk_function(
        description="Add two numbers",
        name="add"
    )
    @sk_function_context_parameter(name="num1", description="First number")
    @sk_function_context_parameter(name="num2", description="Second number")
    def add(self, context) -> str:
        num1 = float(context["num1"])
        num2 = float(context["num2"])
        return str(num1 + num2)

# Register plugin
kernel.import_skill(MathPlugin(), "Math")

# Use with LLM
result = await kernel.run_async(
    kernel.skills.get_function("Math", "add"),
    num1="15",
    num2="27"
)
```

Figure: Configuration and management dashboard with status overview.





## Function Calling

Figure: Azure Functions monitor – invocation graph, execution timeline, and bindings.

```python
import json

functions = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    functions=functions,
    function_call="auto"
)

if response.choices[0].message.function_call:
    function_name = response.choices[0].message.function_call.name
    arguments = json.loads(response.choices[0].message.function_call.arguments)

    # Execute function (get_weather is assumed to be defined elsewhere)
    weather_data = get_weather(**arguments)

    # Send result back to model
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": "What's the weather in Seattle?"},
            response.choices[0].message,
            {"role": "function", "name": function_name, "content": json.dumps(weather_data)}
        ]
    )
```
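Because the model chooses both the function name and the argument string, it can help to route calls through an explicit allow-list rather than invoking whatever name comes back. The sketch below is one way to do that. `dispatch_function_call` and `AVAILABLE_FUNCTIONS` are hypothetical names, and `get_weather` here is a stub standing in for a real weather lookup.

```python
import json


def get_weather(location, unit="celsius"):
    """Hypothetical stand-in; a real version would call a weather service."""
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Only functions listed here may be invoked on the model's behalf.
AVAILABLE_FUNCTIONS = {"get_weather": get_weather}


def dispatch_function_call(name, arguments_json):
    """Run a model-requested function and return a JSON string result."""
    handler = AVAILABLE_FUNCTIONS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        arguments = json.loads(arguments_json or "{}")
        if not isinstance(arguments, dict):
            raise TypeError("arguments must be a JSON object")
        return json.dumps(handler(**arguments))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or unexpected parameters: report instead of raising.
        return json.dumps({"error": str(exc)})


result = dispatch_function_call("get_weather", '{"location": "Seattle"}')
```

Returning errors as JSON lets the follow-up request pass the failure back to the model in the `function` message instead of crashing the application.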





## Content Generation

### Code Generation





```python
def generate_code(specification, language="python"):
    prompt = f"""
Generate {language} code for the following specification:

{specification}

Requirements:
- Include type hints
- Add comprehensive docstrings
- Follow PEP 8 style guide
- Include error handling
"""

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )

    return response.choices[0].message.content
```

### Data Transformation

```python
def transform_data(data, target_format):
    prompt = f"""
Transform the following data to {target_format} format:

{data}

Ensure valid syntax and proper formatting.
"""

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    return response.choices[0].message.content
```

## Token Management

```python
import tiktoken

def count_tokens(text, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def truncate_to_token_limit(text, max_tokens=4000, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)

    if len(tokens) <= max_tokens:
        return text

    truncated_tokens = tokens[:max_tokens]
    return encoding.decode(truncated_tokens)
```
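The same budgeting idea can be applied to a whole conversation instead of a single string. The sketch below keeps system messages and then the most recent turns that fit. It assumes every message has string `content`. `trim_history` is a hypothetical helper, and `estimate_tokens` is a deliberately crude placeholder (about four characters per token); a real tokenizer-based counter such as `count_tokens` above could be passed as `count` instead.

```python
def estimate_tokens(text):
    """Crude stand-in: roughly four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens, count=estimate_tokens):
    """Keep system messages plus the newest turns that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(count(m["content"]) for m in system)
    kept = []
    # Walk backwards so the most recent turns are retained first.
    for message in reversed(others):
        cost = count(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a Python coding expert."},
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_history(history, max_tokens=50)
```

Dropping old turns loses information that summarizing them would retain, so this is the simpler of the two options rather than a replacement for summarization.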

## Cost Optimization

```python
class CostTracker:
    PRICING = {
        "gpt-4": {"input": 0.03, "output": 0.06},  # per 1K tokens
        "gpt-35-turbo": {"input": 0.0015, "output": 0.002}
    }

    def __init__(self):
        self.total_cost = 0

    def calculate_cost(self, model, input_tokens, output_tokens):
        pricing = self.PRICING.get(model, self.PRICING["gpt-35-turbo"])
        cost = (input_tokens / 1000 * pricing["input"]) + (output_tokens / 1000 * pricing["output"])
        self.total_cost += cost
        return cost

    def log_request(self, response):
        usage = response.usage
        cost = self.calculate_cost(
            response.model,
            usage.prompt_tokens,
            usage.completion_tokens
        )
        print(f"Request cost: ${cost:.4f} | Total: ${self.total_cost:.4f}")
```
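Tracking spend pairs naturally with avoiding repeat spend. Caching responses for identical prompts is one option, and a minimal in-memory version might look like the sketch below. `ResponseCache` and `fake_generate` are hypothetical; the real completion call would be wrapped in a function with the same keyword arguments. Caching is most defensible at `temperature=0`, where repeated requests are expected to give similar answers anyway.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by model, messages, and sampling parameters."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(model, messages, **params):
        # sort_keys gives a stable serialization for identical requests.
        payload = json.dumps(
            {"model": model, "messages": messages, "params": params},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_create(self, generate, model, messages, **params):
        """Return a cached reply, calling generate() only on a miss."""
        key = self.make_key(model, messages, **params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        reply = generate(model=model, messages=messages, **params)
        self._store[key] = reply
        return reply


calls = []


def fake_generate(model, messages, **params):
    """Stand-in for a real completion call; records each invocation."""
    calls.append(model)
    return f"reply to: {messages[-1]['content']}"


cache = ResponseCache()
question = [{"role": "user", "content": "Define RAG in one sentence."}]
first = cache.get_or_create(fake_generate, "gpt-4", question, temperature=0)
second = cache.get_or_create(fake_generate, "gpt-4", question, temperature=0)
```

Including the sampling parameters in the key means a request with a different temperature is treated as a separate entry rather than served a stale answer.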

## Safety and Content Filtering

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

content_safety = ContentSafetyClient(
    endpoint="<endpoint>",
    credential=AzureKeyCredential("<key>")
)

def safe_generation(prompt):
    # Generate content
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

    generated_text = response.choices[0].message.content

    # Check for harmful content (the request is passed as an options object)
    safety_result = content_safety.analyze_text(
        AnalyzeTextOptions(
            text=generated_text,
            categories=["Hate", "Sexual", "Violence", "SelfHarm"]
        )
    )

    # Filter if unsafe
    for category in safety_result.categories_analysis:
        if category.severity is not None and category.severity >= 4:  # High severity
            return "Content filtered due to safety policies."

    return generated_text
```

## Best Practices

- Use system messages to set consistent behavior
- Implement retry logic with exponential backoff
- Cache responses for identical prompts
- Monitor token usage and costs
- Set appropriate temperature (0-0.3 for factual, 0.7-1.0 for creative)
- Implement content safety checks
- Use streaming for better UX
- Version prompts alongside code
- Test with diverse inputs
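The retry recommendation above can be sketched with the standard library alone. This is an illustrative helper, not an SDK feature: `retry_with_backoff`, `TransientError`, and `flaky_call` are hypothetical, and in practice `retry_on` would be set to the client library's rate-limit or timeout exceptions.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay ceiling doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out retries from concurrent clients.
            sleep(random.uniform(0, delay))


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or timeout error."""


attempts = {"count": 0}


def flaky_call():
    """Fail twice, then succeed, to exercise the retry loop."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("try again")
    return "ok"


delays = []
outcome = retry_with_backoff(flaky_call, retry_on=(TransientError,),
                             sleep=delays.append)
```

The `sleep` parameter is injectable so the loop can be tested without waiting; here the delays are recorded in a list instead of being slept.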





## Troubleshooting

| Issue | Cause | Resolution |
|-------|-------|------------|
| Rate limit errors | Too many requests | Implement retry with backoff |
| High costs | Inefficient prompts | Optimize token usage; use cheaper models |
| Inconsistent outputs | High temperature | Lower temperature; use structured output |
| Context overflow | Long conversations | Implement conversation summarization |
| Hallucinations | Lack of grounding | Use RAG; add verification steps |




## Architecture Decision and Tradeoffs

When designing AI/ML solutions with Azure AI Services, consider these key architectural trade-offs:

| Approach | Best For | Tradeoff |
|----------|----------|----------|
| Managed / platform service | Rapid delivery, reduced ops burden | Less customisation, potential vendor lock-in |
| Custom / self-hosted | Full control, advanced tuning | Higher operational overhead and cost |

> **Recommendation:** Start with the managed approach for most workloads and move to custom only when specific requirements demand it.

## Validation and Versioning

- Last validated: April 2026
- Validate examples against your tenant, region, and SKU constraints before production rollout.
- Keep module, CLI, and SDK versions pinned in automation pipelines and review quarterly.

## Security and Governance Considerations

- Apply least-privilege access using RBAC roles and just-in-time elevation for admin tasks.
- Store secrets in managed secret stores and avoid embedding credentials in scripts or source files.
- Enable audit logging, data protection policies, and periodic access reviews for regulated workloads.
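As a small illustration of keeping credentials out of source files, configuration can be read from environment variables that a secret store or deployment pipeline populates. The sketch below is only one possible pattern. `require_env` is a hypothetical helper, and the variable name is a placeholder set here purely so the example runs.

```python
import os


def require_env(name):
    """Return a required environment variable or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical variable name; the default is set only for this demonstration.
os.environ.setdefault("EXAMPLE_OPENAI_ENDPOINT", "https://example.invalid/")
endpoint = require_env("EXAMPLE_OPENAI_ENDPOINT")
```

Failing fast at startup with the variable name in the message tends to be easier to diagnose than an authentication error from a later API call.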

## Cost and Performance Notes

- Define budgets and alerts, then monitor usage and cost trends continuously after go-live.
- Baseline performance with synthetic and real-user checks before and after major changes.
- Scale resources with measured thresholds and revisit sizing after usage pattern changes.

## Official Microsoft References

- https://learn.microsoft.com/azure/ai-services/
- https://learn.microsoft.com/azure/machine-learning/
- https://learn.microsoft.com/azure/ai-foundry/

## Public Examples from Official Sources

- These examples are sourced from official public Microsoft documentation and sample repositories.
- Documentation examples: https://learn.microsoft.com/azure/ai-services/
- Sample repositories: https://github.com/Azure-Samples?tab=repositories&q=ai&type=&language=&sort=
- Prefer adapting these examples to your tenant, subscriptions, and governance requirements before production use.

## Key Takeaways

Building with generative AI requires prompt engineering, cost management, safety controls, and thoughtful integration patterns for reliable production applications.





## References

- https://learn.microsoft.com/azure/ai-services/openai/
- https://learn.microsoft.com/semantic-kernel/
