Technical Project Brief

gene-alogy.net is a custom-built genealogical research platform. It was built from the ground up and is not a template, CMS, or hosted service. I created this project to track my genealogical research and to incorporate visualization tools for my DNA research that, unfortunately, are not offered elsewhere. Genetic genealogy deals with massive datasets, so the functional focus of the project is an efficient, dynamic set of tools for evaluating match data and combining matches from numerous platforms. Family members also occasionally ask to view my research, but no popular platform offers a shareable view, let alone one that combines traditional genealogy research with genetic data. This page describes the main aspects of the project's architecture and database design, and the tools it affords me, which have become essential to my research workflow.

Persons in tree: 552
Countries: 14
DNA Matches: 4,999
Segments Mapped: 7,579

Numbers above are live counts queried from the database on each page load.

Technology Stack

Backend: PHP 8
Database: MySQL / MariaDB
Frontend: Vanilla JS + Canvas API
Mapping: Leaflet.js
Family Trees: GoJS + FamilyTree.js
External API: NCBI E-utilities
Authentication: Custom session-based
Theming: Switchable CSS + templates

Dynamic Page Architecture

No page on this site is individually hand-coded (apart from this one, ironically). Every page, from the ancestor profiles to the DNA chromosome visualizations, is instead rendered dynamically by querying the database and assembling the output on page load. A shared include chain handles authentication, theming, and the database connection before any page-specific logic runs. Adding a new ancestor, uploading a source document, or importing a DNA match automatically propagates to every page that uses that data, with no manual page updates.

Request → Response Flow (All Pages)

HTTP Request › include_header.php (entry point) › style_switcher.php, auth_system.php, db_connect.php › Page Logic (DB queries → HTML) › include_footer.php › Rendered Page

Theming & Responsive Design

The site supports multiple switchable visual themes, which I enjoy changing from time to time; at the moment I simply have light and dark versions of the current view. A visitor's preference persists across sessions by way of a cookie. Both themes are fully responsive. A sticky mini-header appears on scroll, rebuilt dynamically from the main navigation without duplicating markup. An AJAX-powered search autocomplete in the sidebar queries the database as you type, returning matching ancestors with birth/death years.
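The server side of the autocomplete can be sketched roughly as follows. This is an illustration in Python with SQLite, not the site's PHP code: the column names `name`, `birth_year`, and `death_year`, the helper names, and the sample rows are all assumptions made for the example.

```python
import sqlite3

def search_ancestors(conn, term, limit=10):
    """Return 'Name (birth-death)' labels for names containing the typed term."""
    # Escape LIKE wildcards so the typed text is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name, birth_year, death_year FROM ancestors "
        "WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (f"%{escaped}%", limit),  # values are bound, never concatenated
    ).fetchall()
    return [f"{name} ({born or '?'}-{died or '?'})" for name, born, died in rows]

def demo_search(term):
    """Build a throwaway in-memory table with made-up rows and search it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ancestors (name TEXT, birth_year INTEGER, death_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ancestors VALUES (?, ?, ?)",
        [("Maria Lopez", 1805, 1870), ("Mario Ruiz", 1830, None),
         ("John Smith", 1900, 1950)],
    )
    return search_ancestors(conn, term)
```

The endpoint would serialize the returned labels as JSON for the sidebar script. The `LIMIT` keeps each keystroke's response small.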

Profile Pages

A single profile.php handles every person in the database. It dynamically assembles its content from whichever tables contain data for the queried individual: birth/death records, parents, spouses, and children from the genealogy tables; military service, religious affiliation, colonial records, enslavement data, and census references from the life event tables; DNA cluster assignments from the segment tables; source documents from the filesystem; photos from a naming-convention-based directory scan. If a table has no row for that person, that section simply doesn't appear — no conditionals scattered across hundreds of hand-coded pages.
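The "no row, no section" behaviour can be illustrated with a short sketch. It is written in Python with SQLite rather than the site's PHP, and the section titles, the `detail` column, the use of `ancnum` as the key column in every table, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical mapping of section titles to tables from the schema list.
SECTIONS = {
    "Military Service": "warfare",
    "Religious Affiliation": "faiths",
    "Colonial Records": "colonists",
}

def build_profile_sections(conn, ancnum):
    """Return {section title: rows} for only those tables that hold data."""
    sections = {}
    for title, table in SECTIONS.items():
        # Table names are fixed constants above, never user input;
        # the person key is still passed as a bound parameter.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE ancnum = ?", (ancnum,)
        ).fetchall()
        if rows:  # an empty result means the section is never emitted
            sections[title] = rows
    return sections

def demo_profile(ancnum):
    """Create throwaway tables with made-up rows and assemble one profile."""
    conn = sqlite3.connect(":memory:")
    for table in SECTIONS.values():
        conn.execute(f"CREATE TABLE {table} (ancnum INTEGER, detail TEXT)")
    conn.execute("INSERT INTO warfare VALUES (1, 'sample service record')")
    conn.execute("INSERT INTO faiths VALUES (2, 'sample affiliation')")
    return build_profile_sections(conn, ancnum)
```

The template then loops over whatever sections came back, so adding a new life-event table only means adding one entry to the mapping.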

Each profile also renders a life-event timeline and a Leaflet map plotting birth, death, and residence locations as geo-coordinates, loaded from a separate timeline_data.php endpoint via fetch. I enter the coordinates myself because of how often I have encountered inaccurate location data, especially for historical places that do not always correspond to locations on modern maps. This way I ensure the accuracy of the location data used in the timeline and map. Individuals in the database can be found not only through the live search but also through a variety of sorting pages listed in the sidebar.

A profile page assembled entirely from database records — birth and death places are automatically linked to country pages, military service is pulled from the warfare table, photos are matched by filename convention, and the timeline is built from life-event records and rendered with Leaflet.

Primary Source Documents

Source files (scans, PDFs, transcriptions) are stored in a sources/ directory and associated with individuals by a filename convention. Profile pages scan the directory at page load and display matching documents automatically, so new sources never need to be linked manually. For significant, difficult-to-read, or non-English records, transcriptions are entered directly into the ancestor's JSON notes file and rendered inline on the profile.
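A directory scan of this kind might look like the sketch below. The actual filename convention is not described here, so the `<ancnum>_` prefix used in this Python illustration is purely an assumption, as are the helper names and sample filenames.

```python
from pathlib import Path
import tempfile

def find_sources(sources_dir, ancnum):
    """List files whose name starts with '<ancnum>_' (assumed convention)."""
    prefix = f"{ancnum}_"
    return sorted(
        p.name for p in Path(sources_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )

def demo_sources(ancnum):
    """Create a temporary sources directory with made-up files and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("42_marriage.pdf", "42_census.jpg",
                     "420_will.pdf", "7_deed.pdf"):
            (Path(tmp) / name).touch()
        return find_sources(tmp, ancnum)
```

Including the underscore in the prefix keeps person 42 from picking up files that belong to person 420.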

An 1827 marriage record from Sonora, Mexico — transcribed from the original Spanish and rendered inline on the ancestor's profile. The source document scan is linked at the bottom of the page alongside all other associated files.

Photo Restoration & Colorization

Many profile photos are digitally restored and colorized. Restoration is done manually in Photoshop; AI enhancement is applied selectively after sufficient manual cleanup. Colorization uses knowledge of the subject's ethnic background, social context, and any available descriptive records to estimate accurate skin, hair, and clothing tones. A modal disclaimer on every page with restored photos explains the methodology and links to the original scans in the sources section.

Database Design

The database is organized into five functional groups. Genealogical relationships, DNA evidence, historical context, and life events are linked by shared identifier keys but stored independently — so the research dataset can grow in any direction without restructuring existing tables.

Database Schema — Functional Groups

DNA & Genetics

individuals
matches
segment_matches
chromosome_map
chromosomes
clusters
snps
snp_data
snp_annotations
snp_notes

Genealogy

ancestors
relatives
associates
ethnicities

Historical

census_us
census_nonfederal
census_national_benchmarks
census_vocab

Life Events

events
residence
places
warfare
colonists
jobs
faiths
enslaved
enslavers

System

users

The full schema as viewed in DataGrip, showing column definitions, data types, and foreign key relationships across all table groups.

The separation between ancestors, relatives, and associates reflects a research distinction: direct-line ancestors, collateral relatives, and associated individuals (neighbors, witnesses, enslaved persons) who appear in documents but whose exact relationship may not yet be established. All three are queryable together through UNION queries wherever a full person lookup is needed — for example, fetching a parent or spouse by ancnum regardless of which table they're in.
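A combined lookup of this kind can be sketched as follows, in Python with SQLite rather than the site's PHP and MySQL. The `name` column, the assumption that an `ancnum` is unique across all three tables, and the sample rows are illustrative; `UNION ALL` is used here because no duplicates need removing under that assumption.

```python
import sqlite3

# Assumed minimal shape: each table has `ancnum` and `name` columns.
PERSON_LOOKUP = """
    SELECT ancnum, name, 'ancestors'  AS source FROM ancestors  WHERE ancnum = :id
    UNION ALL
    SELECT ancnum, name, 'relatives'  AS source FROM relatives  WHERE ancnum = :id
    UNION ALL
    SELECT ancnum, name, 'associates' AS source FROM associates WHERE ancnum = :id
"""

def find_person(conn, ancnum):
    """Fetch one person by ancnum regardless of which table holds them."""
    return conn.execute(PERSON_LOOKUP, {"id": ancnum}).fetchone()

def demo_lookup(ancnum):
    """Create the three tables with made-up rows and look one person up."""
    conn = sqlite3.connect(":memory:")
    for table in ("ancestors", "relatives", "associates"):
        conn.execute(
            f"CREATE TABLE {table} (ancnum INTEGER PRIMARY KEY, name TEXT)"
        )
    conn.execute("INSERT INTO ancestors VALUES (1, 'Sample Ancestor')")
    conn.execute("INSERT INTO associates VALUES (2, 'Sample Witness')")
    return find_person(conn, ancnum)
```

The extra `source` column tells the caller which table the person came from, which is useful when the page needs to label someone as an associate rather than a relative.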

DNA Research Tools

As stated above, the primary purpose of this project is to bring together multiple custom-built DNA analysis tools, developed to support research workflows that commercial testing platforms and third-party tools don't offer together in a single integrated environment, if at all. The visualizations are rendered in real time from the database using the HTML5 Canvas API rather than from pre-generated images. This allows for dynamic updates: when newer, more precise data becomes available, a simple database insert is enough.

Genome Painting

Each individual whose test I manage for research purposes has a full genome view of all 22 autosomes (non-sex chromosomes), which are automatically painted with pre-defined segment colors by ethnicity for both the maternal and paternal copies, with a linked Leaflet map rendering the geographic regions corresponding to the detected ancestries. I define the regions manually using GeoJSON, allowing me to make corrections or precision edits as needed.

Chromosome Browser

Shared DNA segments between each tester and their matches, drawn from all the major DNA testing databases, can be displayed across all 22 autosomal chromosomes for any match or cluster. Ethnicity composition is painted underneath each match segment, showing not just where a segment is shared but which ancestral connections it likely represents. This is invaluable for following match clues down the line being studied, and for confirming the accuracy of the traditional paper trail. Maternal and paternal chromosomes are displayed separately, and segments are clickable in cluster view, allowing navigation to the individual match's detail. A DNA match who matches more than one tester is displayed with each matched segment alongside the corresponding tester's chromosome.
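Painting ethnicity underneath a match segment comes down to intersecting the segment with the painted position ranges on the same chromosome copy. The sketch below shows that intersection in Python. The tuple layout, function name, and the placeholder labels "A" and "B" are assumptions, and the site's own Canvas code may work differently.

```python
def ethnicity_under_segment(painted, seg_start, seg_end):
    """Clip painted (start, end, ethnicity) ranges to one match segment.

    `painted` holds the ranges for a single chromosome copy; positions are
    base-pair coordinates.
    """
    overlaps = []
    for start, end, ethnicity in sorted(painted):
        lo, hi = max(start, seg_start), min(end, seg_end)
        if lo < hi:  # keep only ranges that actually intersect the segment
            overlaps.append((lo, hi, ethnicity))
    return overlaps

def demo_overlay():
    """Intersect a made-up match segment with two made-up painted ranges."""
    painted = [(50_000_000, 120_000_000, "B"), (0, 50_000_000, "A")]
    return ethnicity_under_segment(painted, 40_000_000, 70_000_000)
```

Each returned tuple is one coloured strip to draw beneath the segment, already trimmed to the segment's boundaries.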

Cluster Analysis

DNA matches whose segments overlap on the same region of a tester's chromosome are grouped into clusters, bringing together matches who likely descend from the same ancestral source. A custom distance rule merges nearby segments on the same chromosome into a single logical cluster segment. Each segment is tagged with its Most Recent Common Ancestor (MRCA), if known. This tag is a foreign key linking the DNA evidence directly to a named individual in the genealogy tables, which allows the corresponding genetic data to be displayed on that ancestor's profile page.
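A distance rule of this sort can be sketched as a sorted interval merge. The Python below is an illustration only: the site's actual rule and threshold are not given here, so the `max_gap` parameter and its default value are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    chromosome: int
    start: int  # base-pair position
    end: int

def merge_segments(segments, max_gap=1_000_000):
    """Merge segments on the same chromosome that overlap or lie within max_gap."""
    merged = []
    for seg in sorted(segments, key=lambda s: (s.chromosome, s.start)):
        last = merged[-1] if merged else None
        if (last is not None and last.chromosome == seg.chromosome
                and seg.start - last.end <= max_gap):
            # Close enough to the previous cluster segment: extend it.
            last.end = max(last.end, seg.end)
        else:
            # Start a new cluster segment (copied so the input is not mutated).
            merged.append(Segment(seg.chromosome, seg.start, seg.end))
    return merged

def demo_merge():
    """Merge four made-up segments across two chromosomes."""
    return merge_segments([
        Segment(1, 10_000_000, 20_000_000),
        Segment(1, 20_500_000, 30_000_000),  # 0.5 Mb gap: merged with the first
        Segment(1, 50_000_000, 60_000_000),  # 20 Mb gap: stays separate
        Segment(2, 20_000_000, 25_000_000),  # different chromosome
    ])
```

Sorting first means each segment only has to be compared with the most recently built cluster segment, so the merge runs in a single pass.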

SNP Browser

Raw SNP data from DNA tests is stored at the variant level and is queryable via an AJAX-driven browser. This allows me to search for a specific rsID (the dbSNP identifier for a variant) or browse by category (ancestry markers, phenotype, health, trait). Each variant shows genotype calls for all tested individuals side by side for easy comparison, with custom annotation notes and outbound links to dbSNP and SNPedia.

Auto-Annotation Pipeline

A CLI script queries the NCBI E-utilities API to automatically annotate the curated SNP list with gene symbols, chromosomal positions, and ClinVar clinical significance — keeping research data linked to authoritative genomic databases without manual entry. Variants already annotated are skipped unless a --refresh flag is passed.
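The skip-unless-refreshed behaviour and the request construction might be sketched as below. This Python illustration makes no network calls and is not the site's script. Only the `--refresh` flag comes from the description above. The choice of the ESummary endpoint with `db=snp`, the helper names, and the sample rsIDs are assumptions.

```python
import argparse
from urllib.parse import urlencode

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def esummary_url(rsid):
    """Build an ESummary request URL for one rsID (assumed endpoint choice)."""
    # Assumption: the snp database is queried by the numeric part of the rsID.
    uid = rsid[2:] if rsid.lower().startswith("rs") else rsid
    return ESUMMARY_URL + "?" + urlencode(
        {"db": "snp", "id": uid, "retmode": "json"}
    )

def select_for_annotation(curated, already_annotated, refresh=False):
    """Skip variants that already have annotations unless refresh is set."""
    return [r for r in curated if refresh or r not in already_annotated]

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Annotate the curated SNP list")
    parser.add_argument("--refresh", action="store_true",
                        help="re-annotate variants that are already annotated")
    return parser.parse_args(argv)

def plan_requests(argv, curated, already_annotated):
    """Return the request URLs a run with these arguments would issue."""
    args = parse_args(argv)
    todo = select_for_annotation(curated, already_annotated, args.refresh)
    return [esummary_url(r) for r in todo]
```

Separating the planning step from the fetching step makes the skip logic easy to check without touching the API, and it keeps repeat runs from re-requesting variants that are already annotated.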

Historical Cross-Reference

Census records, colonial settlement records, military service, religious affiliation, and enslavement records are stored in dedicated tables and surfaced on profile pages. The life-event timeline and map on each profile are built from these records dynamically, plotting all known locations across the individual's lifespan.

Leonard's genome: all 22 chromosomes painted by ethnicity for maternal and paternal copies, with a Leaflet map above rendering the corresponding geographic regions. The legend below shows calculated percentages. All data is live from the database; the Canvas and map are drawn fresh on every page load.

Match overview: all 22 chromosomes with clusters and individual matches color-coded and labeled on hover. Clicking a segment navigates to the cluster or match detail.

Match detail view: chromosome segments with ethnicity overlay, shared segment table, and most recent common ancestor link. Match name is obfuscated for non-logged-in visitors (Ab**yz format).

DNA Research Workflow — Raw Data to Visualization

1 — Raw Data Ingestion

DNA test files are parsed and loaded into snp_data and snps, associated with the tested individual in the individuals table.

2 — Match Import

Shared segment data from testing services is loaded into matches and segment_matches with chromosome positions, centimorgan values, SNP counts, and maternal/paternal side assignments.

3 — Ethnicity Mapping

Each chromosome position range is assigned an ethnicity from the ethnicities table and written to chromosome_map. Each ethnicity record stores an associated GeoJSON polygon used to paint the ancestry map.

4 — Cluster Assignment & MRCA Tagging

Matches are grouped into clusters. Each segment is tagged with the Most Recent Common Ancestor (MRCA) — an ancnum linking directly to a named individual in the genealogy tables.

5 — Visualization & Research

All data surfaces through the chromosome browser, genome painting, cluster viewer, and SNP browser — rendered in real time with no pre-generated images or cached flat files.

Security & Privacy

Because the site contains DNA data belonging to other people — matches who have not chosen to make their information public — a tiered access system controls what any given visitor can see. The obfuscation below is live, not illustrative: this is what a DNA match name actually looks like to a non-logged-in visitor.

Match names and tester names are run through a server-side obfuscation function before being serialized into the JavaScript data payload, so the readable names never leave the server for unauthenticated sessions. "Am**da" and "Le**rd" are the actual output, not a client-side hide.
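The Ab**yz format can be reproduced with a few lines. The sketch below is in Python rather than the site's PHP. The function names, the handling of very short names, and the sample payload fields are assumptions. Only the output shape ("Am**da", "Le**rd") comes from the page itself.

```python
import json

def obfuscate_name(name, keep=2, mask="**"):
    """Keep the first and last `keep` characters and mask the middle."""
    name = name.strip()
    if len(name) <= keep * 2:
        # Assumed behaviour: names too short to mask are hidden entirely.
        return mask
    return name[:keep] + mask + name[-keep:]

def match_payload(matches, authenticated):
    """Serialize matches for the page; names are masked before serialization."""
    return json.dumps([
        {
            "name": m["name"] if authenticated else obfuscate_name(m["name"]),
            "cm": m["cm"],
        }
        for m in matches
    ])
```

Because masking happens before `json.dumps`, the readable name is simply absent from the response for an unauthenticated session, so there is nothing for client-side code to reveal.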

Access Control Tiers

Public (Not Logged In)

✓ Ancestor profiles (non-living)

✓ Family tree, blog, gallery

✓ Chromosome & genome visualizations

✓ Segment positions and cM values

✗ Match names → shown as Ab**yz

✗ Match email addresses → masked

✗ Profiles of living individuals

Logged In

✓ Everything above

✓ Full match names and contact info

✓ Living individual profiles

✓ Source documents

✓ SNP browser and raw variant data

✓ Census cross-references

Implementation

All database queries use prepared statements (via PDO and MySQLi) to prevent SQL injection. All output rendered to the browser passes through htmlspecialchars() to prevent cross-site scripting. DNA match names and email addresses are run through a server-side obfuscation function before being serialized to the JavaScript data payload for unauthenticated sessions — the data never leaves the server in readable form, not merely hidden client-side. Individuals born after a threshold year with no recorded death date are automatically treated as potentially living, and their complete profile data is withheld from unauthenticated requests regardless of how the URL is constructed.
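The potentially-living rule can be expressed as a small predicate checked on the server before any profile data is assembled. This Python sketch is illustrative: the threshold year is not stated here, so it is a required parameter, and the treatment of unknown birth years and the function names are assumptions.

```python
def is_potentially_living(birth_year, death_year, threshold_year):
    """True when born after the threshold year with no recorded death."""
    if birth_year is None:
        # Assumption: the rule as described only covers known birth years.
        return False
    return death_year is None and birth_year > threshold_year

def can_view_profile(birth_year, death_year, authenticated, threshold_year):
    """Decide access on the server, independent of how the URL was built."""
    if authenticated:
        return True
    return not is_potentially_living(birth_year, death_year, threshold_year)
```

Running the check inside the page logic, rather than hiding links in navigation, is what makes the restriction hold even when someone requests a profile URL directly.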

Research Purpose

The site exists primarily as a research tool. The goal is to connect every DNA match to a named ancestor, verify those connections across multiple tested individuals in the same family, and build a fully-cited, cross-referenced record of family history spanning fourteen countries and several centuries. The public-facing side makes that research available to family members and others with shared ancestry, while keeping sensitive data — matches, source documents, and living relatives — accessible only where appropriate.

Every structural decision in the schema — the separation of ancestors from relatives, the MRCA foreign key on segment matches, the per-individual chromosome maps, the GeoJSON stored on ethnicity records — exists because the research required it. The architecture follows the research, not the other way around.

