import Hyperlink from '../components/Hyperlink';
import Image from '../components/Image';
import YouTubeVideo from '../components/YouTubeVideo';

export default function OtherDemos() {
    return (
        <div className="flex-none mx-10 sm:mx-20 sm:w-[32rem]">
            <div className="pt-20">
                <p className="font-sans text-2xl font-medium">
                    Note: Some of the GIFs below can take up to half a minute
                    to load. This will be fixed soon.
                </p>
            </div>
            {/* UNCOMMENT BELOW IF DELETE NOTES ABOVE */}
            {/* <div className="pt-20 lg:pt-28"> */}
            <div className="pt-20">
                <h2 className="mb-4 font-sans text-sm font-bold tracking-widest text-orange-600" id="other_form_header">VR & AI</h2>
                <h1 className="font-sans text-3xl font-medium mb-14" id="other_form">
                    Form, 3D design tool
                </h1>
                <div className="font-sans">
                    <p className="mb-5">
                        Studies show that the immersion of 3D experiences
                        improves user engagement, which can help with
                        tasks like learning and entertainment. Why haven’t 3D
                        experiences proliferated then, even with the many
                        headsets available on the market? Yes, the launch of
                        Oculus Quest 2, really the first wireless, affordable
                        VR headset, happened relatively recently. Even so,
                        3D content is sparse outside of experiences provided by
                        larger game studios. In my opinion, this is because
                        creating 3D content is too difficult and cumbersome
                        with current 3D design tools.
                    </p>
                    <p className="mb-5">
                        In this two-month project from November 2022, I set out
                        to accelerate 3D content creation by building a more
                        intuitive tool for designing 3D assets. The
                        overarching belief is that 3D content should be easier
                        to create using 3D interfaces than in
                        industry-standard 2D tools like Maya and Blender.
                    </p>
                    <p className="mb-5">
                        Form is a VR-native design tool that uses hand tracking
                        and voice commands as first-class citizens to allow 3D
                        artists to design 3D assets. Form is tightly integrated
                        with AI to allow these normally limiting input
                        modalities to work seamlessly.
                    </p>
                    <p className="mb-5">
                        Using hands allows users to shape and scale primitive
                        solid objects intuitively. Here, I use pinch gestures
                        to shape a torus.
                    </p>
                    <Image filename="form1.gif" />
                    <p className="mb-5">
                        With the recent development of robust <Hyperlink
                            text="large language models"
                            link="https://blogs.nvidia.com/blog/2023/01/26/what-are-large-language-models-used-for/"
                        /> (LLMs) like <Hyperlink
                            text="ChatGPT"
                            link="https://openai.com/blog/chatgpt/"
                        />, users can control software using natural language,
                        lowering the barrier to entry for complex tools that
                        previously had steep learning curves. Here, I use
                        ChatGPT and speech recognition technologies to parse a
                        natural language command.
                    </p>
                    <Image filename="form2.gif" />
                    <p className="mb-5">
                        If LLMs are so powerful, why not just build an
                        end-to-end AI system that generates 3D assets from text
                        directly then? We already see this in production
                        with <Hyperlink
                            text="Luma"
                            link="https://lumalabs.ai/"
                        /> or in research from institutions like <Hyperlink
                            text="Google"
                            link="https://dreamfusion3d.github.io/"
                        /> and <Hyperlink
                            text="OpenAI"
                            link="https://github.com/openai/point-e"
                        />. However, having interviewed prominent 3D artists
                        in the industry, I noticed that artists, the target
                        audience for Form, are more empowered by tools that
                        accelerate their current workflow, not tools that
                        replace their workflow. That way, they fully control
                        the artistic process, something uniquely human and
                        essential for creatives. So instead of replacing the
                        design process, AI is deeply embedded into Form to
                        automate mundane workflows.
                    </p>
                    <p className="mb-5">
                        For example, say we want to generate a forest given a
                        tree. The tight coupling between AI and Form makes
                        AI-powered <Hyperlink
                            text="procedural generation"
                            link="https://en.wikipedia.org/wiki/Procedural_generation"
                        /> possible. The generated content also remains editable
                        by the user.
                    </p>
                    <Image filename="form3.gif" />
                    <p className="mb-5">
                        Let’s go a step further. Using novel LLMs with a unique
                        scene representation designed for AI allows for
                        surprisingly complex capabilities. Here, I first
                        duplicate an object to build a Rubik’s Cube. Then, I
                        rotate a face of the Rubik’s Cube without having to
                        specify that I want all of the cubes on the right to
                        turn about the global x-axis. From a technical
                        standpoint, this is truly groundbreaking.
                    </p>
                    <Image filename="form4.gif" />
                    <p className="mb-5">
                        Form is browser-based via <Hyperlink
                            text="WebXR"
                            link="https://immersiveweb.dev/"
                        />, meaning Form is available just by opening a link in
                        the headset’s web browser. Though I never got this far,
                        browser-based technologies also allow tools to be
                        cloud-based, making <Hyperlink
                            text="synchronous editing"
                            link="https://en.wikipedia.org/wiki/Collaborative_editing"
                        /> possible (like collaboration in Figma).
                    </p>
                    <p className="mb-5">
                        Form also uses <Hyperlink
                            text="signed distance functions"
                            link="https://jamie-wong.com/2016/07/15/ray-marching-signed-distance-functions/"
                        /> (SDFs) to parametrize primitive solid objects.
                        SDFs can be used for <Hyperlink
                            text="constructive solid geometry"
                            link="https://en.wikipedia.org/wiki/Constructive_solid_geometry"
                        /> (CSG), which makes boolean operations on 3D solids
                        simple. SDFs are widely used as representations for 3D
                        data in 3D deep learning and are probably how 3D data
                        will be represented in the future.
                    </p>
                    <p className="mb-5">
                        I stopped development of Form when I realized that to
                        get more people into AR/VR, content creation is not the
                        problem. Adoption of new technologies throughout the
                        past century has only happened with the ability to
                        connect with others first, which attracted me to the
                        problem of co-presence.
                    </p>
                    <p className="mb-5">
                        If you like this work, feel free to reach out for more
                        demos. <Hyperlink
                            text="Marisa Lu"
                            link="https://twitter.com/lumar_isa"
                        />, whose opinions have guided me throughout, is also
                        a great person to connect with.
                    </p>
                </div>
            </div>
            <div className="pt-28">
                <h2 className="mb-4 font-sans text-sm font-bold tracking-widest text-orange-600" id="other_dictation_header">VR</h2>
                <h1 className="font-sans text-3xl font-medium mb-14" id="other_dictation">
                    Spatial dictation to replace typing in mixed reality
                </h1>
                <div className="font-sans">
                    <p className="mb-5">
                        <a
                            href="#other_form_header"
                            className="text-orange-600 hover:underline"
                        >Form</a> is one of the first truly VR-native design
                        apps and one of the first VR apps to use only hands
                        and voice for control. The design and implementation of
                        Form therefore necessitated the development of novel
                        UIs to support this new medium and these under-explored
                        input modalities.
                    </p>
                    <p className="mb-5">
                        One such UI I developed aims to replace typing with
                        dictation, as typing in mixed reality is awkward and
                        inefficient. Unfortunately, the current UX for
                        dictation is riddled with problems: the speaker must
                        speak fast enough or else the dictation will stop,
                        dictated text cannot be edited, and complex commands
                        are often too long for users to remember, leading to
                        mistakes. This new dictation UI spatializes dictated
                        text and allows users to drag words onto the search
                        prompt. Though this is only a 2D prototype, you can
                        imagine this UI in 3D, where the user drags dictated
                        words around using hands and then submits the prompt
                        when ready.
                    </p>
                    <YouTubeVideo videoId="ZJbctgULde0" />
                    <p className="mb-5">
                        I thought this new UI was also promising for mobile
                        use cases, but upon testing I realized that phone
                        screens are just too small for users to easily
                        manipulate words in space.
                    </p>
                </div>
            </div>
            <div className="pt-28">
                <h2 className="mb-4 font-sans text-sm font-bold tracking-widest text-orange-600" id="other_promptly_header">AI</h2>
                <h1 className="font-sans text-3xl font-medium mb-14" id="other_promptly">
                    Promptly, prompt search UI for text-to-image models
                </h1>
                <div className="font-sans">
                    <p className="mb-5">
                        In the summer of 2022, the AI landscape experienced an
                        explosion of new text-to-image models capable of
                        producing realistic images and art just from a single
                        natural language command. While companies like OpenAI
                        released access to these models slowly, the true
                        turning point was when Stability AI released <Hyperlink
                            text="Stable Diffusion"
                            link="https://huggingface.co/CompVis/stable-diffusion"
                        />, a completely open-source text-to-image
                        model that could run on consumer-grade hardware while
                        still rivaling the performance of <Hyperlink
                            text="Midjourney"
                            link="https://www.midjourney.com/home/"
                        /> and OpenAI’s <Hyperlink
                            text="DALL-E 2"
                            link="https://openai.com/dall-e-2/"
                        />.
                    </p>
                    <p className="mb-5">
                        “Prompt engineering,” or knowing which prompts to feed
                        into these models as inputs, takes some finesse. For
                        artists, using the right prompts comes naturally since
                        they have a wide range of art-specific vocabulary to
                        choose from. For non-artists like me, however, words
                        like “steampunk” or “oil painting” don’t come as
                        easily, and so editing prompts to produce the exact
                        image we want is a painstakingly long process.
                    </p>
                    <p className="mb-5">
                        Promptly is a prompt search UI that I developed in
                        October 2022 to help solve this problem. In the prompt
                        field, the user inputs a prompt, and from there can
                        either generate images from that prompt or generate
                        prompt suggestions. These suggestions are prompts
                        “similar” to the inputted prompt.
                    </p>
                    <YouTubeVideo videoId="7o3AifnfB9k" />
                    <p className="mb-5">
                        I generate these prompt suggestions by essentially
                        searching for neighboring prompts of the inputted
                        prompt in latent space. Specifically, since these
                        models were trained using the <Hyperlink
                            text="LAION 5B"
                            link="https://laion.ai/blog/laion-5b/"
                        /> dataset, I first generate the inputted prompt’s word
                        embedding and then search for its k=1000 nearest
                        neighbors in the embedding space across the entire LAION
                        dataset. The keywords of these neighboring prompts are
                        then extracted using an off-the-shelf keyword extractor
                        and then ranked by frequency. Finally, the top 5
                        keywords are added to the original prompt to become
                        prompt suggestions.
                    </p>
                    <p className="mb-5">
                        Since modifying prompts is an iterative process, I
                        designed the resulting UI of Promptly as an infinite
                        canvas with draggable nodes representing prompts and
                        images. This UI makes it easy to prototype and
                        experiment with similar but distinct prompts so that
                        users converge on the prompt that produces the image
                        they desire.
                    </p>
                    <p className="mb-5">
                        The immediate next step would be to allow images as
                        input so that users can determine which prompts are
                        most likely to produce the inputted image. This project
                        showed great promise during user testing, but I pivoted
                        because I felt like if I didn’t create this tool,
                        someone else would. I realized at this point that to
                        maximize my impact in the world, I need to find an area
                        that needs me and my skills to push the space forward.
                    </p>
                </div>
            </div>
            <div className="pt-28">
                <h2 className="mb-4 font-sans text-sm font-bold tracking-widest text-orange-600" id="other_livedata_header">VR</h2>
                <h1 className="font-sans text-3xl font-medium mb-14" id="other_livedata">
                    Showcasing live data in virtual reality
                </h1>
                <div className="font-sans">
                    <p className="mb-5">
                        In May and June of 2022, I wanted to build a few VR
                        demos that used live data, something that, to my
                        knowledge, hasn’t been done before. I had two goals: (1)
                        to find out if live data is enough for people to want
                        to use VR more regularly, and (2) to look out for any
                        key design principles for building next-generation 3D
                        UIs.
                    </p>
                    <p className="mb-5">
                        From user research after building these demos, I did
                        not find anything conclusive for either (1) or (2).
                        Nonetheless, the demos are cool, so here they are.
                    </p>
                    <p className="mb-5">
                        The first demo I did was a live flight tracker in 3D. I
                        scraped live flight data for Delta Air Lines flights and
                        rendered the flights in 3D above a globe asset.
                    </p>
                    <Image filename="livedata1.gif" />
                    <p className="mb-5">
                        I noticed that users quickly lost interest in the
                        flight tracker because the planes never really moved
                        much; in other words, the live data wasn’t changing
                        fast or “live” enough to be interesting. With that in
                        mind, I built a live F1 race tracker. Here, I’m
                        tracking F1 driver Sergio Perez in the <Hyperlink
                            text="Monaco Grand Prix"
                            link="https://www.formula1.com/en/racing/2022/Monaco.html"
                        />. The GIF is sped up by an order of magnitude.
                    </p>
                    <Image filename="livedata2.gif" />
                    <p className="mb-5">
                        To extend this demo, I could have rendered other cars
                        on the track, displayed more visualizations of race
                        statistics, or built better custom 3D assets. I chose
                        the last option. This led me down a rabbit hole of learning
                        Blender and, eventually, the decision to create my own
                        3D design tool called Form (described <a
                            href="#other_form_header"
                            className="text-orange-600 hover:underline"
                        >above</a>). I’d love to come back one day and recreate
                        these demos for fun with better assets designed in
                        Form.
                    </p>
                </div>
            </div>
            <div className="py-28">
                <h2 className="mb-4 font-sans text-sm font-bold tracking-widest text-orange-600" id="other_explorations_header">MISCELLANEOUS</h2>
                <h1 className="font-sans text-3xl font-medium mb-14" id="other_explorations">
                    Other explorations
                </h1>
                <div className="font-sans">
                    <p className="mb-5">
                        While the projects above have considerable depth, I’ve
                        also made an effort to explore the software landscape
                        in breadth. In this section, I describe some of the
                        projects that I started but stopped before reaching
                        significant results, either because my interests
                        changed or because I realized they lacked business value.
                    </p>
                    <p className="mb-5">
                        Storytelling remains a uniquely human ability, because
                        it requires both a logical construction of the plot as
                        well as artistic nuance to generate suspense and
                        relief. The release of <Hyperlink
                            text="DreamBooth"
                            link="https://dreambooth.github.io/"
                        /> in August therefore excited me, as DreamBooth
                        generates images of an inputted subject in fully novel
                        but contextualized scenes. For example, if I train
                        DreamBooth with ~20 different images of Matt Damon
                        (which takes around 30 minutes), then DreamBooth can
                        generate images of any prompt with Matt Damon, such as
                        “Matt Damon hiking Mt. Everest.” I realized this ability
                        to generate novel images with a persistent subject can be
                        combined with <Hyperlink
                            text="GPT-J"
                            link="https://huggingface.co/docs/transformers/model_doc/gptj"
                        /> to generate picture books, i.e., stories with relevant
                        images of the protagonist. Note that this work, a
                        weekend project, predates the release of <Hyperlink
                            text="ChatGPT"
                            link="https://openai.com/blog/chatgpt/"
                        />, so the generated stories lacked literary
                        consistency and often made no sense. You can access
                        my <Hyperlink
                            text="Colab notebook"
                            link="https://colab.research.google.com/drive/1VLO1CFOpFY2FKvKrnE6uczCiZ7ePuMlk"
                        />, but it no longer runs because GPT-J inference is no
                        longer free on the serverless GPU provider <Hyperlink
                            text="Banana"
                            link="https://www.banana.dev/"
                        />.
                    </p>
                    <p className="mb-5">
                        I’ve come to love the web because of its innate
                        cross-compatibility and availability on most digital
                        devices. When <Hyperlink
                            text="Stable Diffusion"
                            link="https://huggingface.co/CompVis/stable-diffusion"
                        /> was released, it only worked on Linux machines with
                        Nvidia GPUs under the <Hyperlink
                            text="PyTorch"
                            link="https://pytorch.org/"
                        /> framework. I worked on building out a <Hyperlink
                            text="TensorFlow.js"
                            link="https://www.tensorflow.org/js"
                        /> version of the network and the corresponding
                        inference code to bring the network to the web browser.
                        I never ended up finishing the project as the converted
                        TensorFlow.js network, at the time 3.6 GB, was too
                        large and ran into the limitations of the browser’s <Hyperlink
                            text="WebAssembly"
                            link="https://webassembly.org/"
                        /> library. I could not circumvent this issue using
                        other ML frameworks like <Hyperlink
                            text="ONNX"
                            link="https://onnx.ai/"
                        /> either; more information can be found in <Hyperlink
                            text="this"
                            link="https://github.com/microsoft/onnxruntime/issues/13006"
                        /> GitHub issue.
                    </p>
                    <p className="mb-5">
                        At the height of the Web3 hype cycle in March, I
                        wanted to see what the value of decentralization was.
                        My initial project was a PDF minter, which taught me how
                        easy it is to build a <Hyperlink
                            text="dApp"
                            link="https://ethereum.org/en/developers/docs/dapps/"
                        />. I then had a two-week contract at <Hyperlink
                            text="Doppel"
                            link="https://www.doppel.com/"
                        /> in April, working on NFT counterfeit detection.
                        While I turned down a return offer to become their
                        first hire because I lacked passion for the field, I
                        highly recommend working at Doppel if you’re
                        interested. The team is great.
                    </p>
                    <p className="mb-5">
                        Before all of the above, in January and February, I
                        also worked on several fun projects, such as web
                        scrapers, a tool to save and untag Facebook photos
                        automatically, browser automation bots for myself, and
                        my favorite as a resident of the Mission in San
                        Francisco: <Hyperlink
                            text="whohasthebestmissionburrito.com"
                            link="http://whohasthebestmissionburrito.com/"
                        /> (the server has since been taken down, but a
                        screenshot of the website can be viewed <Hyperlink
                            text="here"
                            link="https://github.com/edhyah/sfburrito/blob/main/screenshot_sm.png"
                        />). Before these, I had no experience in web
                        development because I had worked in AI, ML, and robotics.
                    </p>
                    <p className="mb-5">
                        Outside of software, I spent several weeks diving
                        deeply into designing custom 3D models. I’m proud to
                        say I’ve finished all of Blender Guru’s <Hyperlink
                            text="donut tutorials"
                            link="https://www.youtube.com/playlist?list=PLjEaoINr3zgFX8ZsChQVQsuDSjEqdWMAD"
                        /> as well as countless other tutorials, and through
                        the process I’ve come away with a deeper appreciation for
                        3D design and its flaws. This led me to build <a
                            href="#other_form_header"
                            className="text-orange-600 hover:underline"
                        >Form</a>.
                    </p>
                </div>
            </div>
        </div>
    );
};
