CISOSE 2026 · July 27 – 30, 2026 · Fukuoka, Japan

Building and Verifying AI Systems with Agentic AI:
Linking Agentic Development, Data Visualization, and AI Testing

We teach the method, not a showcase — a production AI education platform is only the worked example. The methodology is yours to take to any field.

Generative AI · Agentic AI · AI Testing · LLM-as-Judge · Data Visualization · Big-Data Platforms

Tutorial · Overview
Status: Confirmed
Format: Hands-on (live build · visualize · test)
Duration: 3 hours, single session
Presenters: 3 speakers (NCU · NTHU · Mitsubishi Electric)
Language: English, for an international audience
Conference: IEEE International Conference on Cyber Intelligence and Software-Oriented Service Engineering, July 27 – 30, 2026, Fukuoka, Japan (cisose.fit.ac.jp/2026)
Why this tutorial · Why now

Generative AI changed who can build AI systems — and who must now verify them.

Generative and agentic AI in the development loop have collapsed the cost of building real-world, multi-user AI systems — but not the cost of understanding or trusting them. With agentic coding tools, a small team — even a single lab — can now stand up a multi-tenant big-data platform that once required a large engineering organization.

Building faster does not make a system easier to see into or to trust. This tutorial teaches a method for closing that gap: build it with agentic AI, visualize it, and use that visualization to test it. A production AI education platform is only the running example; the method is domain-agnostic, and the visualization techniques contributed by our visualization researcher apply to models well beyond ours. You take home the method, not a tour of one platform.

Each technical section pairs a methods-first treatment with a worked example from Uedu, a deployed multi-tenant AI tutoring platform built largely through agentic AI development and operated under a single umbrella IRB approval (NTU-REC 202507EM058). Read through a software-engineering lens, every step is a familiar SE practice — requirements, construction, verification, maintenance — applied where AI both builds the system and is the system. The platform is a worked example — not the answer; attendees who prefer to build from scratch take the methodology home.

01

Build it with agentic AI

Stand up a multi-tenant educational big-data platform with spec-driven, agentic AI development: write the spec, let agents implement it. Includes architecture decision flowcharts and explicit checkpoints marking where AI-generated code still needs human verification.

02

See inside it

Make AI systems legible with data visualization: CAM-style (class activation map) contribution maps and visual anomaly detection. Glass-box inspection, not just black-box pass/fail.

03

Trust it with testing

A six-layer AI-testing rubric covering code, pipeline, behavior, guardrail, governance, and drift, with a live LLM-as-Judge harness and a cross-model reproducibility schema.
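A minimal sketch of what a "versioned prompt" and a "reproducibility schema" can mean in practice, assuming a judge that replies in JSON with an integer score from 1 to 5. `JudgeConfig`, `judge`, the model name, and the `call_model` callback are hypothetical; `call_model` stands in for whatever LLM client you use, so no real API is assumed. The idea is that every score is stored next to the pinned model, prompt version, prompt hash, and temperature that produced it.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class JudgeConfig:
    model: str            # pinned judge model identifier (hypothetical)
    prompt_version: str   # human-readable version label for the template
    prompt_template: str  # uses {question} and {answer} placeholders
    temperature: float = 0.0

    def prompt_sha256(self) -> str:
        # Hash the exact template text so a silent edit changes the record.
        return hashlib.sha256(self.prompt_template.encode("utf-8")).hexdigest()


def judge(config: JudgeConfig, question: str, answer: str,
          call_model: Callable[[str, str, float], str]) -> dict:
    prompt = config.prompt_template.format(question=question, answer=answer)
    raw = call_model(config.model, prompt, config.temperature)
    verdict = json.loads(raw)  # expected shape: {"score": int, "reason": str}
    score = int(verdict["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"judge score out of range: {score}")
    return {
        "score": score,
        "reason": verdict.get("reason", ""),
        # Reproducibility record: what is needed to rerun this exact judgment.
        "judge_model": config.model,
        "prompt_version": config.prompt_version,
        "prompt_sha256": config.prompt_sha256(),
        "temperature": config.temperature,
    }


TEMPLATE = (
    "Rate the answer from 1 (wrong) to 5 (fully correct).\n"
    "Question: {question}\nAnswer: {answer}\n"
    'Reply only with JSON: {{"score": <int>, "reason": "<text>"}}'
)


def fake_model(model: str, prompt: str, temperature: float) -> str:
    # Stand-in for a real LLM call; always returns the same verdict.
    return '{"score": 4, "reason": "correct but terse"}'


config = JudgeConfig("judge-model-pinned", "v1.2", TEMPLATE)
record = judge(config, "What is 2 + 2?", "4", fake_model)
print(record["score"], record["prompt_version"])  # 4 v1.2
```

Comparing `prompt_sha256` across runs catches a prompt that was edited without a version bump, which is one way scores can drift between runs.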

§ At a glance

The method, and the worked example.

The method is three repeating moves. The architecture below is one worked example, not the answer; your own system will differ.

The method · build → visualize → test
Figure: SPEC (written first) → BUILD (agentic AI development, spec-driven: agents implement the spec) → VISUALIZE (data visualization: see what the model and data do) → TEST (AI testing · LLM-as-Judge: verify behavior and drift against the spec). Iterate: see it, test it, fix it, repeat.
From intent to tests · the spec is a continuum
Figure: Intent (prose; the goal) → Spec (the contract) → Executable checks (machine-checkable) → Tests (the runnable spec), running from more abstract (the why) to more executable (the proof).

A specification is one artifact at many levels of formality. Translate the intent downward and it becomes executable; at the far end, the spec is the tests — which is why building and verifying can share one source of truth.
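To make the continuum concrete, here is a minimal sketch in which each spec clause carries both its prose intent and an executable check, so the same list can drive implementation and verification. The `TenantStore` class, clause IDs, and tenant names are hypothetical illustrations, not part of Uedu.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


class TenantStore:
    """Toy multi-tenant key-value store: the system under test (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}

    def put(self, tenant: str, key: str, value: str) -> None:
        self._data[(tenant, key)] = value

    def get(self, tenant: str, key: str) -> Optional[str]:
        return self._data.get((tenant, key))


@dataclass
class Clause:
    clause_id: str
    intent: str                           # prose: the why
    check: Callable[[TenantStore], bool]  # executable: the proof


def tenants_are_isolated(store: TenantStore) -> bool:
    store.put("univ-a", "grade:42", "A")
    return store.get("univ-b", "grade:42") is None


def writes_read_back(store: TenantStore) -> bool:
    store.put("univ-a", "note", "hello")
    return store.get("univ-a", "note") == "hello"


SPEC: List[Clause] = [
    Clause("TEN-1", "A tenant can never read another tenant's records.", tenants_are_isolated),
    Clause("TEN-2", "A tenant reads back what it wrote.", writes_read_back),
]


def run_spec(spec: List[Clause], make_system: Callable[[], TenantStore]) -> Dict[str, bool]:
    # A fresh system per clause keeps the checks independent of each other.
    return {c.clause_id: c.check(make_system()) for c in spec}


print(run_spec(SPEC, TenantStore))  # {'TEN-1': True, 'TEN-2': True}
```

An agent could be handed the `intent` strings as its task while the same `SPEC` list serves as the test oracle; a store whose `get` ignored the tenant would fail TEN-1.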

The worked example · a heterogeneous AI platform
Figure: Data sources (LLM dialogue, behavioral traces, CV/images, wearable BBI/HRV, government environmental data, finance/markets, psychometrics such as MBTI/RIASEC) → edge-to-cloud ingestion → multi-tenant big-data platform (the worked example) → VERIFY: VISUALIZE (data viz · CAM · anomaly) and TEST (LLM-as-Judge · rubric).

Dialogue, behavioral, vision, physiological, environmental, financial, and psychometric streams flow into one multi-tenant platform — which you then visualize and test. The same two lenses apply to whatever you build.

§ Learning objectives

What you take home: the methodology.

By the end of the three hours you should be able to apply these five methods to your own project, in your own domain.

01 Objective

A repeatable, spec-driven method for agentic-AI development — write the spec, let agents implement it, and reuse the spec as your test oracle — with a checklist of where AI-generated code still needs human verification, transferable to your own platform in any domain.

02 Objective

A layered AI-testing rubric (code, pipeline, behavior, guardrail, governance, drift) and a runnable LLM-as-Judge harness with versioned prompts and cross-model reproducibility — drop it into your own evaluation pipeline.

03 Objective

Visualization methods (CAM-style contribution maps, visual anomaly detection) for inspecting why a model behaves as it does — to apply to your own models, not just the ones shown here.

04 Objective

A workflow that uses visualization as a tool inside AI testing — a reusable pattern for making any AI system both visible and verifiable.

05 Objective

A redacted, IRB-governed data-handling and SOP template you can adapt to your own institution and jurisdiction.
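Objective 03 mentions CAM-style contribution maps. As a point of reference, the classic class activation map (CAM) for class c is a weighted sum of the last convolutional layer's feature maps, M_c(x, y) = Σ_k w_k^c · f_k(x, y), where w_k^c are the classifier weights for class c that follow global average pooling. The sketch below computes only that classic baseline in plain Python; it is not IntegralCAM, which estimates contributions differently. The final clamp-and-rescale step is a display convention chosen here, not part of the CAM definition, and the feature maps are toy values.

```python
from typing import List

Matrix = List[List[float]]


def class_activation_map(feature_maps: List[Matrix], class_weights: List[float]) -> Matrix:
    """Classic CAM: sum over channels k of w_k^c * f_k(x, y), then rescale for display."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += weight * fmap[y][x]
    # Display convention: keep positive evidence only, then scale the peak to 1.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


f1 = [[1.0, 0.0], [0.0, 0.0]]  # channel 1 fires at the top-left
f2 = [[0.0, 0.0], [0.0, 2.0]]  # channel 2 fires at the bottom-right
print(class_activation_map([f1, f2], [2.0, 0.5]))  # [[1.0, 0.0], [0.0, 0.5]]
```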

§ Apply it to your own project

Two take-home decision tools.

You leave with the methodology — these two tables help you decide which method to reach for, and audit what you have already covered, on your own system.

A · Decision matrix: which method for your situation
If you need to… | Reach for | Taught in | Watch out
Ship a platform fast with a small team | Agentic AI development plus a human-verification checklist | §2 Build | The checklist is non-negotiable: unreviewed AI-generated code is the failure mode.
See why a model focused where it did | CAM-style contribution visualization | §3 Visualize | Designed for CNNs; transformers and LLMs need attention or attribution analogues.
Find outliers and quality problems in image or sensor data at scale | Visual anomaly detection with incremental dimension reduction | §3 Visualize | Needs a baseline of normal examples; rare-but-valid cases can look anomalous.
Judge whether an LLM answer is actually good | LLM-as-Judge with a versioned prompt | §4 Test | Pin the judge model and prompt version, or scores drift between runs.
Catch cost, usage, or behavioral drift over time | The drift / anomaly layer of the six-layer rubric | §4 Test | Set thresholds from real baselines, not guesses.
Handle sensitive data across jurisdictions | IRB-governed data-handling SOP template | §4 Test | Adapt to local law, e.g. Taiwan's PDPA, Japan's APPI, the EU GDPR.
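The decision matrix advises setting drift thresholds from real baselines. One simple way to do that, sketched here under the assumption that you monitor a single daily metric such as LLM API cost, is to accumulate the baseline's mean and standard deviation incrementally with Welford's algorithm and flag new values above mean + k·std. `RunningBaseline` and `flag_drift` are hypothetical names and the cost figures are made-up illustrations. The injected spike mirrors the §5 demo: planting a known anomaly is a quick test that the monitor actually fires.

```python
import math
from typing import List, Sequence


class RunningBaseline:
    """Welford's online mean/variance: one observation at a time, no stored history."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; defined once there are two observations.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def threshold(self, k: float = 3.0) -> float:
        return self.mean + k * self.std


def flag_drift(baseline: Sequence[float], new: Sequence[float], k: float = 3.0) -> List[int]:
    """Indices of new values above the baseline's upper threshold (one-sided: spikes only)."""
    base = RunningBaseline()
    for v in baseline:
        base.update(v)
    limit = base.threshold(k)
    return [i for i, v in enumerate(new) if v > limit]


baseline_cost = [10, 12, 11, 9, 10, 11, 12, 10]  # illustrative daily cost
this_week = [11, 10, 40, 12]                      # index 2 is an injected spike
print(flag_drift(baseline_cost, this_week))        # [2]
```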
B · Six-layer self-assessment: audit your own system
Layer | Ask of your own system | Method
Code | Is the AI-generated code reviewed where it matters? | Human-verification checkpoints (§2)
Pipeline | Do data and model pipelines fail loudly and reproducibly? | Pipeline tests plus a reproducibility schema (§4)
Behavior | Does the model actually answer correctly? | LLM-as-Judge with a versioned prompt (§4)
Guardrail | Are unsafe or out-of-scope outputs blocked? | Guardrail checks and adversarial probes (§4)
Governance | Is sensitive data handled lawfully across jurisdictions? | IRB-governed SOP template (§4)
Drift | Would you notice anomalies or drift after deployment? | Visual anomaly detection on the drift layer (§3 + §4)
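For the Code layer, one lightweight way to make "where AI-generated code still needs human verification" enforceable is a path-based gate in CI. The sketch below is an assumption for illustration, not the presenters' checklist: the glob patterns are hypothetical and should come from your own architecture (authentication, tenant isolation, schema migrations, secrets). It uses Python's `fnmatch.fnmatchcase`, in which `*` also matches `/`.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence

# Hypothetical checkpoint policy: paths where an AI-generated change always
# needs a named human reviewer before merge.
REVIEW_REQUIRED = (
    "auth/*",
    "*/tenancy/*",
    "migrations/*",
    "*.sql",
    "config/secrets*",
)


def needs_human_review(changed_paths: Iterable[str],
                       patterns: Sequence[str] = REVIEW_REQUIRED) -> List[str]:
    """Return the changed files that hit a human-verification checkpoint."""
    return sorted(p for p in changed_paths
                  if any(fnmatchcase(p, pat) for pat in patterns))


changed = [
    "ui/button.tsx",
    "auth/session.py",
    "api/tenancy/filter.py",
    "migrations/0042_add_index.py",
    "docs/readme.md",
]
print(needs_human_review(changed))
# ['api/tenancy/filter.py', 'auth/session.py', 'migrations/0042_add_index.py']
```

A non-empty result would block the merge until a reviewer signs off; an empty one lets routine changes through without slowing the agentic loop.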
§ Schedule · 180 minutes

Three hours, six segments.

Build with generative AI, see inside the model with visualization, and verify behavior with AI testing — each segment pairs methods with a worked example and a take-home artifact.

Tutorial Schedule
3 hours
  • 1
    Opening: build it, visualize it, test it
    10 min
    Why generative AI has changed who can build AI systems — and who must now verify them.
  • 2
    BUILD · Building a multi-user big-data platform with agentic AI (Chang + Li) hands-on
    40 min
    Multi-tenant architecture, developed spec-driven — write the spec, let agents implement it, and verify the result against that same spec. Where the agentic AI coding tools help and where they do not; human-verification checkpoints for AI-generated code. Plus real sensor ingestion at scale — a wearable (Garmin BBI/HRV) edge-to-cloud stream as a worked big-data time-series example.
  • Break
    10 min
  • 3
    VISUALIZE · Data visualization for AI systems (Teng-Yok Lee)
    50 min
    Visualization across the AI lifecycle — model decisions (CAM-style contribution maps), training dynamics (loss-contribution in-situ visualization), and learned-policy behavior (visual analytics of LSTM control policies) — turning a model and its data into something you can see, and a working instrument for the testing that follows.
  • Break
    10 min
  • 4
    TEST · Testing AI-built systems at scale, with visualization as a tool (Chang + Li)
    40 min
    Six-layer rubric. LLM-as-Judge harness with a versioned prompt. The visualization methods from the previous segment, applied to the drift/anomaly layer. A compact data-governance / IRB sub-section.
  • 5
    SYNTHESIS · Live demonstration
    10 min
    Inject a platform usage / cost anomaly, visualize it (the drift/anomaly layer), then score how the system responds with an LLM-as-Judge — see it and test it in one loop.
  • 6
    Open problems and Q&A
    10 min
    Where build, visualize, and test still break — an honest treatment of what is not yet solved.
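Segment 2 uses a wearable BBI/HRV stream as its big-data time-series example. As a hedged illustration of edge-side processing, the sketch below computes RMSSD, a standard time-domain HRV metric (the root mean square of successive differences between beat-to-beat intervals), over fixed-size windows so that a device could upload compact summaries rather than every interval. It assumes intervals arrive as a list of milliseconds; it does not model any Garmin API, and the window size and function names are hypothetical. Real HRV analyses typically window by time (short-term HRV is often computed over five-minute segments) and remove artifact beats first.

```python
import math
from typing import List, Tuple


def rmssd(bbi_ms: List[float]) -> float:
    """Root mean square of successive differences between beat-to-beat intervals (ms)."""
    if len(bbi_ms) < 2:
        raise ValueError("RMSSD needs at least two intervals")
    diffs = [later - earlier for earlier, later in zip(bbi_ms, bbi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def summarize_windows(bbi_ms: List[float], window: int = 4) -> List[Tuple[int, float, float]]:
    """Hypothetical edge step: (beat count, mean interval, RMSSD) per full window.

    A trailing partial window is dropped rather than summarized.
    """
    summaries = []
    for start in range(0, len(bbi_ms) - window + 1, window):
        chunk = bbi_ms[start:start + window]
        summaries.append((len(chunk), sum(chunk) / len(chunk), rmssd(chunk)))
    return summaries


stream = [800, 810, 790, 800, 700, 700, 700, 700]
for count, mean_ms, hrv in summarize_windows(stream):
    print(count, mean_ms, round(hrv, 2))
# 4 800.0 14.14
# 4 700.0 0.0
```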
Hands-on · you do it, not just watch
§2 Build · Write a spec, watch it build

The room proposes (or votes on) a small spec; a presenter builds it live with an agentic tool; together we find where the AI-generated code still needs human verification. A recorded fallback is staged in case live generation misbehaves.

Closing · Apply it to your own project

Take the decision matrix and the six-layer self-assessment from this page and run them against your own system, on the spot — leaving with a filled-in plan, not just notes.
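If you prefer to keep the self-assessment next to your code, the six layers can be encoded directly. This hypothetical helper transcribes the self-assessment's layers and suggested methods, takes your yes/no answers, and returns the uncovered layers as a plan; treating an unanswered layer as a gap is a deliberately conservative choice.

```python
from typing import Dict, List, Tuple

# Layer -> suggested method, transcribed from the six-layer self-assessment.
LAYERS: Dict[str, str] = {
    "Code": "Human-verification checkpoints (§2)",
    "Pipeline": "Pipeline tests plus a reproducibility schema (§4)",
    "Behavior": "LLM-as-Judge with a versioned prompt (§4)",
    "Guardrail": "Guardrail checks and adversarial probes (§4)",
    "Governance": "IRB-governed SOP template (§4)",
    "Drift": "Visual anomaly detection on the drift layer (§3 + §4)",
}


def make_plan(covered: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return (layer, method) for every layer not explicitly marked as covered."""
    return [(layer, method) for layer, method in LAYERS.items()
            if not covered.get(layer, False)]


answers = {"Code": True, "Behavior": True, "Pipeline": False}
for layer, method in make_plan(answers):
    print(f"TODO {layer}: {method}")
```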

Materials

Slides and materials will be posted on this page after the session.

Get in touch

Questions or want to connect? Email us at [email protected].

§ Software-engineering view

Every step is a software-engineering practice.

This is not "AI instead of software engineering" — it is software engineering, at the intersection of two emerging areas: AI4SE (AI building software) and SE4AI (engineering AI-heavy systems). Every part of the tutorial maps onto a classic SE discipline.

Tutorial element → software-engineering discipline
In this tutorial | Software-engineering discipline
Spec-driven development (SDD) | Requirements engineering · executable specifications
Agentic AI development + human-verification checkpoints | Software construction · AI-assisted development · code review · technical-debt control
Six-layer rubric · LLM-as-Judge · reproducibility | Software testing · verification & validation · quality assurance
Drift / anomaly layer | Software maintenance · evolution · runtime monitoring
Multi-tenant, edge-to-cloud platform | Software architecture · service-oriented & distributed systems
Data & model visualization (Teng-Yok Lee) | Program & model comprehension · debugging tools
IRB-governed data handling / SOP | Software process · compliance & governance
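Treating reproducibility as verification and validation needs a number for how far two judge models agree. One option, assumed here rather than taken from the tutorial materials, is Cohen's kappa between two judge models scoring the same answers: observed agreement corrected for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The scores are illustrative. For ordinal 1 to 5 scores a weighted kappa, which penalizes near-misses less, may be a better fit; this unweighted version is the simplest form.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(scores_a: Sequence[int], scores_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters who scored the same items."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(scores_a)
    # p_o: fraction of items where both judges gave the same score.
    p_o = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # p_e: agreement expected if each judge drew scores from its own marginal distribution.
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both judges used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


judge_a = [5, 4, 4, 3, 2, 5]  # scores from judge model A
judge_b = [5, 4, 3, 3, 2, 4]  # same items, judge model B
print(round(cohens_kappa(judge_a, judge_b), 3))  # 0.556
```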
§ CISOSE 2026 federation

How this tutorial maps to eight conferences.

CISOSE 2026 federates eight constituent conferences. This tutorial (Build · See · Trust) substantively spans service-oriented systems, AI testing, big data, and explainable AI, and touches edge computing and IoT. Cross-track integration is the structural reason CISOSE is the right venue for this content.

Tutorial coverage by federated conference
●●● core · ●● substantive · ● adjacent
Federated conference | Tutorial segment
Service-Oriented Systems Engineering | §2 platform architecture · §5 demo
AI Testing & Quality Assurance | §4 six-layer rubric · LLM-as-Judge
Big Data & Machine Learning | §2 big-data backbone · §3 visualization · §4 testing at scale
Cyber-Intelligence (overall) | §3 data visualization / model inspection
Responsible AI | §4 governance / IRB sub-section
Intelligent Mobile Computing | §2 edge-to-cloud wearable ingestion
Smart Cities & IoT | Sensing data sources
Decentralized Apps / Blockchain | Out of tutorial scope
§ Speakers

Three presenters: a platform team and a visualization researcher.

Chia-Kai Chang (National Central University) and Kuei-Hao Li (National Tsing Hua University) build and test a large AI education platform; Teng-Yok Lee (Mitsubishi Electric) contributes the data-visualization methods that power that testing. All three present in person at CISOSE 2026 in Fukuoka.

Lead presenter
Chia-Kai Chang (張家凱)
ORCID: 0000-0003-2575-2738
Assistant Professor, Center for General Education
National Central University, Taiwan
[email protected]

Founder and principal investigator of the Educational Omics Lab. Builds and operates Uedu, a multi-tenant AI tutoring platform deployed across multiple universities and developed largely through agentic AI development. Recent work spans large-scale learning analytics (ACM L@S 2026), educational big-data infrastructure (ICMET 2025), and a short paper in the CISOSE 2026 federation (IEEE BigDataService 2026). Holds an umbrella IRB approval for multimodal educational research.

Leads in this tutorial
§1 Opening §2 Build · GenAI development §4 Test · AI testing §5 Live demo
Co-presenter
Kuei-Hao Li (李奎皓)
ORCID: 0009-0007-3474-8489
Ph.D. Candidate, Interdisciplinary Doctoral Program
National Tsing Hua University, Taiwan

Co-founder of the Uedu platform, with research interests in digital learning, AI-assisted instruction, and agentic AI. Co-presents how the platform was built with agentic AI development tools and how it is tested. Co-author on the team's recent work including ACM L@S 2026 and ICMET 2025 (Educational Omics Data Lake). Focus: pedagogical design, cross-institutional deployment, and agentic development workflow.

Leads in this tutorial
§2 Build · GenAI development §4 Test · AI testing §5 Live demo
Co-presenter
Teng-Yok Lee (李庭育)
Principal Researcher
Mitsubishi Electric, Japan

Principal Researcher at Mitsubishi Electric working on data visualization and visual anomaly detection. PhD in scientific visualization from The Ohio State University, with highly cited work in IEEE TVCG and IEEE PacificVis. Recent work includes IntegralCAM, a method for estimating and visualizing CNN feature contributions (IEEE ICME 2025), and efficient large-scale visual anomaly detection (IEEE AVSS 2025; arXiv 2026), alongside multiple patents on anomaly and object detection. Brings deep visualization and high-performance-computing expertise to the problem of making AI systems and their data legible and inspectable in production.

Leads in this tutorial
§3 Data visualization §5 Live demo
§ Companion publications

Anchor papers behind this tutorial.

The tutorial draws on, and points to, our team's recent peer-reviewed work. Each is positioned alongside the segment in which it is used as a worked example.

ACM L@S 2026
AI Teaching Assistants at Scale: Cross-Disciplinary Patterns of Adoption and Cognitive Engagement Across Hundreds of University Courses
C.-K. Chang, K.-H. Li
Anchor for §2 platform + §4 testing at scale.
ICMET 2025
Designing an Educational Omics Data Lake: A Multimodal Infrastructure for Technology-Enhanced Learning
C.-K. Chang, K.-H. Li
Anchor for §2 big-data backbone.
IEEE ICME 2025
IntegralCAM: Integral-Based Contribution Estimation and Visualization for Convolutional Neural Networks
T.-Y. Lee
Anchor for §3 — visualizing model decisions.
EuroVis 2021
Loss-Contribution-Based In-Situ Visualization for Neural Network Training
T.-Y. Lee et al.
Anchor for §3 — visualizing training to debug a model.
IEEE PacificVis 2020
DynamicsExplorer: Visual Analytics for Robot Control Tasks Involving Dynamics and LSTM-Based Control Policies
T.-Y. Lee et al.
Anchor for §3 — visual analytics of a learned policy’s behavior.
IEEE AVSS 2025
AAAD: Adaptive Activated Anomaly Detection on Varied Backgrounds
K. Miyamoto, T.-Y. Lee, A. Minezawa
Anchor for §3 anomaly detection → §4 drift layer.
§ Tutorial materials

Materials in preparation.

The slide deck, code references, LLM-as-Judge rubric, and IRB SOP template will be released on this page. Several of the underlying AI-testing artifacts already live inside the Uedu codebase and are referenced from our other publications.

Tutorial proposal (6-page IEEE format)

Drafted; held privately while the scope is being revised. Inquiries are welcome by email.

Slide deck

In preparation; to be released on this page.

Code references & LLM-as-Judge rubric

The underlying LLM-as-Judge harness, AI-testing scaffolds, and prompt-versioning schema live inside the Uedu codebase and are referenced from our peer-reviewed publications. A tutorial-specific release will follow.

Redacted IRB SOP template

To be prepared as a tutorial handout. The umbrella IRB approval (NTU-REC 202507EM058) remains active and governs our other published work.
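Because the SOP template is still in preparation, the following is only a generic illustration of one step such templates commonly include: replacing direct identifiers with keyed pseudonyms before data leaves the secure environment. It uses HMAC-SHA256 from Python's standard library rather than a plain hash, because student IDs are short and guessable, so an unkeyed hash of them can be reversed by hashing every candidate ID. The field names and key are hypothetical. Pseudonymized data generally remains personal data under regimes such as the GDPR, so this complements an SOP rather than replacing it.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def pseudonymize(student_id: str, key: bytes) -> str:
    """Stable keyed pseudonym; cannot be recomputed without the key."""
    digest = hmac.new(key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; keep more if collisions matter


def redact_record(record: Dict[str, str], key: bytes,
                  id_field: str = "student_id",
                  drop: Iterable[str] = ("name", "email")) -> Dict[str, str]:
    """Drop direct identifiers and replace the ID with its pseudonym (input is not modified)."""
    dropped = set(drop)
    out = {k: v for k, v in record.items() if k not in dropped}
    out[id_field] = pseudonymize(record[id_field], key)
    return out


STUDY_KEY = b"per-study-secret-kept-out-of-the-dataset"  # hypothetical key
raw = {"student_id": "s1234", "name": "Ada", "email": "ada@example.edu", "course": "CS101"}
print(redact_record(raw, STUDY_KEY))
```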

§ Venue

Fukuoka, Japan.

IEEE International Conference on Cyber Intelligence and Software-Oriented Service Engineering (CISOSE 2026). July 27 – 30, 2026, Fukuoka, Japan.

https://cisose.fit.ac.jp/2026/
Conference info
Conference: CISOSE 2026, IEEE International Conference on Cyber Intelligence and Software-Oriented Service Engineering
Dates: July 27 – 30, 2026
Location: Fukuoka, Japan
Language: English
Registration

Registration is handled through the CISOSE 2026 main conference website. This tutorial is part of the CISOSE 2026 program.