Index of a/v recordings


This topic contains 17 replies, has 10 voices, and was last updated by  Alex M 5 years, 7 months ago.

  • #3198

    Alex M
    Member

    Hi all, I’m thinking about creating a web site with a searchable index of Culadasa’s audio and video recordings.

    There are several use cases:

    • From time to time I’d like to refer someone to Culadasa’s words on a topic that interests them, but I can’t find it.
    • In recent Patreon Q&As he gets asked things he has already answered in depth, but nobody knows or remembers.
    • I’d like to be able to return to specific parts later and listen to them again when I’m at the appropriate stage in practice.
    • Generally making the teachings more available. Someone may want to quickly find additional info on how to deal with a specific problem, without listening to everything else. That said, it’s very useful to listen to everything from beginning to end, and it concerns me a bit that this project would make it easy for people to just skim instead.

    Implementation: each recording is manually divided into parts, and each part is marked with tags and/or a topic. There should be an admin-level web interface for that. Tags and topics are searchable. Clicking on a search result starts an audio / video player that is preset to play the corresponding fragment, but allows seeking through the whole recording too. I’ll possibly look into machine speech-to-text to get transcriptions; it may produce enough correct output to make full-text search somewhat useful.
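
    To make the idea concrete, here is a minimal sketch of the data this would need: one record per marked fragment, plus a tag index for lookups. It is an illustration only, not the planned implementation; the class name, field names, file name and timings are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """One manually marked part of a recording (hypothetical schema)."""
    recording: str   # file name or URL of the whole recording
    start_s: int     # where the fragment begins, in seconds
    end_s: int       # where the fragment ends, in seconds
    topic: str       # short free-text topic
    tags: set[str] = field(default_factory=set)


def build_tag_index(fragments):
    """Map each lower-cased tag to the fragments that carry it."""
    index = defaultdict(list)
    for frag in fragments:
        for tag in frag.tags:
            index[tag.lower()].append(frag)
    return index


# Made-up sample data: the file name and timings are placeholders.
fragments = [
    Fragment("qa-session.mp3", 0, 540, "Purifications during stage 4",
             {"stage 4", "purifications"}),
    Fragment("qa-session.mp3", 540, 1260, "Awakening without cessation",
             {"no cessation", "gradual awakening"}),
]
index = build_tag_index(fragments)
hits = index["stage 4"]  # fragments a player could be preset to
```

    Storing start and end offsets next to the recording name is what would let a player jump to the fragment while still being able to seek through the whole recording.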

    Future and plans: possibly add a/v of other teachers if there is interest. There are a lot of ways to make the UI better, both for mods and for users. The project may evolve into a generic platform for indexing lectures, online classes or whatever, but the idea will probably be superseded by advances in machine speech recognition in 10-15 years, or integrated into MOOC platforms. I’ve heard about ongoing efforts to manually transcribe the talks, and when/if that is done this project won’t be as useful anymore. I also may want to make some money off the site to cover development and operational costs and maybe make some profit, though I don’t have much hope for any of that.

    My motivation: I’ve benefited greatly from Culadasa’s audio recordings and feel that they are actually a required complementary part to the TMI. So I’d like to make them more accessible and useful for more people. To make web dev a bit less gruesome I’m planning to use a hipster development stack, learn something new in the process, and do things more “right” this time.

    Current state and commitment level: I’ve already started building a mock-up, mostly to find out if the new tech is any good. It’ll take 1-2 months of work to create a usable site and mark up an initial set of talks. I’m not quite sure I’ll be able to follow through, as the amount of work is quite large and when I get a day job I’ll have much less time available than I do now.

    Why I’m posting this: I’d like to sort things out in advance so no time or effort is wasted.

    • I’d like to get feedback on the idea: is it worth doing? Are there any hidden cons? Maybe somebody is building or has built something like this already?
    • I need permission from Culadasa to use his audio/video recordings, as many of them as possible. If he agrees that this is a good idea, I think he can give permission to use the dharmatreasure and youtube recordings, but I’ll have to ask hosts of podcasts (BATGAP, DY, …) individually. While it’s possible to send users directly to e.g. a youtube video, that likely wouldn’t be ergonomic. And the recordings have to be stored on the site anyway for the markup / indexing stage.
    • This topic was modified 5 years, 8 months ago by  Alex M.
    #3200

    JavaJeff
    Member

    It’s funny you bring this up. I was thinking of a very similar thing not too long ago…what is called a “concordance” in literary circles. It would be very valuable. A helluva lot of work, though.

    One of the main issues I thought of was to agree upon a standard way of transcribing spoken word – do we trim out asides that are not pertinent to the topic? Do we trim out superfluous words/utterances (the “uh”s)? etc.

    #3201

    Alex K
    Member

    I think this would be helpful/useful for many.

    I have only listened to a few of Culadasa’s talks, so I will be slowly working my way through the archive and am happy to help if assistance is wanted.

    • This reply was modified 5 years, 8 months ago by  Alex K.
    #3202

    Alex M
    Member

    JavaJeff: Dealing with utterances, fixing grammar and such are more relevant to manual transcription efforts. I think somebody is working on it now and it’s both very useful and time-consuming. I feel like it’s not possible to transcribe all recordings in the near future, so this project aims to make recordings searchable in the meantime. Also some may prefer audio / video media.

    Here is what I mean, by example: in the recent Patreon Q&A Culadasa gets asked about purifications during stage 4, awakening without cessations, … . The recording is manually preprocessed: an admin marks the beginning of each question and the end of the answer, and adds keywords (tags, labels) like “stage 4”, “purifications” and a short description of what this part is about, like “Experience of stage 4 purifications from meditator’s perspective”. The fragment about cessations may have keywords “gradual awakening”, “gradual stream entry”, “no cessation”, “awakening without cessation” and be described by “What to focus on in practice if partial awakening happened without cessation”.

    Users of the site then can search for specific tags, or use full-text search on descriptions. And then proceed to listen / view actual recordings.
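
    As a rough sketch of those two kinds of search, assuming each fragment is stored as a plain record (the dict layout and function names here are hypothetical; the keywords and descriptions are the ones from the example above):

```python
import re

# Hypothetical records built from the example keywords and descriptions.
FRAGMENTS = [
    {
        "keywords": {"stage 4", "purifications"},
        "description": "Experience of stage 4 purifications "
                       "from meditator's perspective",
    },
    {
        "keywords": {"gradual awakening", "gradual stream entry",
                     "no cessation", "awakening without cessation"},
        "description": "What to focus on in practice if partial "
                       "awakening happened without cessation",
    },
]


def by_tag(fragments, tag):
    """Exact (case-insensitive) match against the keyword set."""
    return [f for f in fragments if tag.lower() in f["keywords"]]


def words(text):
    """Lower-cased word set; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def full_text(fragments, query):
    """Keep fragments whose description contains every query word."""
    wanted = words(query)
    return [f for f in fragments if wanted <= words(f["description"])]


tag_hits = by_tag(FRAGMENTS, "No cessation")
text_hits = full_text(FRAGMENTS, "partial awakening")
```

    A real site would more likely lean on a database or search engine for this; the point is only that tag lookup and description search are separate, simple operations.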

    #3203

    Ted Lemon
    Member

    We’ve actually been working on a wiki to accomplish something like this. Do you think that would work, or are you thinking of something different?

    #3204

    William W
    Member

    I have listened to all of Culadasa’s online audio discourses at Dharma Treasure.org, starting with some short answers done in 2008. There are a number of missing talks from September 2010 to March 17, 2011, because of an error in the link to where they are on the cloud.

    I have been annotating the discourses since April 7, 2011, at varying levels of detail depending on whether I was meditating, annotating more, or lacking mindfulness. I have included the file of annotations as of the date listed at the start of the file. File names are those used on your servers, with the date of the discourse added to the beginning of those needing it, in the format “yymmdd.part”, and part of the annotation added to the end of the file name, after a space dash space dash, as they are saved on my computer. I plan to re-listen to those prior to April 7, 2011, to annotate them as well.
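
    For anyone wanting to process these names programmatically, a small sketch of reading the “yymmdd.part” prefix might look like this. It assumes the prefix is followed by a dash and that two-digit years fall in the 2000s; the sample name is the first annotated file quoted elsewhere in this thread, and the function name is made up.

```python
import re
from datetime import datetime

# Assumed layout: six digits, a dot, a part number, a dash, then the rest.
PREFIX = re.compile(r"^(?P<ymd>\d{6})\.(?P<part>\d+)-(?P<rest>.+)$")


def parse_name(name):
    """Return (date, part, rest) or None if the prefix is missing."""
    m = PREFIX.match(name)
    if m is None:
        return None
    # %y maps two-digit years 00-68 to 2000-2068.
    day = datetime.strptime(m["ymd"], "%y%m%d").date()
    return day, int(m["part"]), m["rest"]


parsed = parse_name("110407.1-tcmc-thursday-04-07-2011a")
```

    Names without the date prefix return None, so they can be listed separately and fixed by hand.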

    I believe this file of annotations attached to the file name could be used to search topics & details of chosen discourses for closer evaluation, to use for writing a book or articles for dissemination of Culadasa’s practical technical insights & his ontological, epistemological framework based on Suttas, other sources & experiences. I’m not sure if this would be more useful integrated into the web site as a search table, looking up key words as topics that refer to the recording, for use by the growing Sangha Community.

    I have begun annotating the Patreon Q&A sessions, almost all of them to date & some of the video talks he has given, eg at the New York International Meditation Center.

    I had previously suggested to Nick Perry and Blake Barton, that if we could have Culadasa do the training of DragonSpeak Premium program, we would have a good chance of using that to transcribe the files. If there is any way that I can provide any further help, please let me know, via email, wmwallin45@sbcglobal.net, or phone 510-932-0038.

    #3206

    Malte Malm
    Member

    I think this would be a great idea, especially since he speaks of so many things not readily found in TMI and as far as I know his next book will have a more mainstream audience and societal/political focus?

    Anyway, my first thought was transcription plus a wiki. That would require a large effort, but perhaps it could be done with a transcription “service” where parts are chunked out to interested people? In any case, a wiki would be super interesting. Publicly available things, for example on youtube, could be referenced by URL + relevant time in the video.
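
    As an illustration of the URL + time idea, a YouTube watch link accepts a `t` parameter with the start offset. The helper name and the video id below are placeholders, not real references:

```python
from urllib.parse import urlencode


def youtube_link(video_id, start_seconds):
    """Build a watch URL that starts playback at the given offset."""
    query = urlencode({"v": video_id, "t": f"{start_seconds}s"})
    return f"https://www.youtube.com/watch?{query}"


# "VIDEO_ID" is a placeholder; 12 min 34 s is an arbitrary offset.
link = youtube_link("VIDEO_ID", 12 * 60 + 34)
```

    A wiki page could then list one such link per topic next to a short description.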

    Maybe even better would be to have someone/a team parse through all of his stuff and drafts and then ghost write a book or maybe post stuff topic-wise in a blog format.

    #3207

    Jamie
    Member

    I think this seems like a good idea in principle. Do you have an estimate of how many hours of audio we’re talking about here?

    I might be interested in helping to annotate the clips, particularly since I haven’t listened to a lot of it myself yet.

    I also have some very limited web development experience and a desire to learn more, if I could be of any use that way.

    #3208

    William W
    Member

    As of 2018-07-30: total of all recordings: 733 (681h 19m); recordings I have listened to: 733 (681h 19m); recordings annotated or indexed: 553 (554h 44m); recordings heard before I started indexing and still to be annotated: 180 (126h 35m). The first annotated audio is 110407.1-tcmc-thursday-04-07-2011a, and I have completed all those listed on the website resources since that date. To see what has been done so far, refer to the attachment to my earlier post today.
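
    As a quick cross-check of these totals, the annotated and not-yet-annotated figures should add up to the overall count and duration. A small sketch of that arithmetic (the helper names are made up):

```python
def to_minutes(hours, minutes):
    """Convert an h/m pair to total minutes."""
    return hours * 60 + minutes


def to_hm(total_minutes):
    """Convert total minutes back to an (hours, minutes) pair."""
    return divmod(total_minutes, 60)


annotated = to_minutes(554, 44)   # 553 recordings
remaining = to_minutes(126, 35)   # 180 recordings
total_h, total_m = to_hm(annotated + remaining)
recordings = 553 + 180

print(f"{recordings} recordings, {total_h}h {total_m}m")
```

    The sums come out to 733 recordings and 681h 19m, matching the stated totals.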
    I had previously suggested to Nick Perry and Blake Barton that if we could have Culadasa do the short voice training of the DragonSpeak Premium program, we would have a good chance of using that to transcribe the files. Dragon Speaking Premium 13 works best, for more accurate transcription, when trained with the microphone he normally uses to give his discourses. A new version of this program costs about $100. I could buy it and send it to Culadasa to train it, and then the program with the training files could be sent to me & I could try using the mp3 files to transcribe the talks. Certainly he has had some incredibly informative and educational discourses and Q & A sessions that, once transcribed, could be edited to provide a resource for future meditators, e.g. as answers to many problems on & off the cushion.

    #3209

    ward
    Member

    Although you requested only recordings, I would like to offer a listing of Culadasa’s posts from the old Jhana and Insight Yahoo group. This is in the spirit of compiling valuable info that is not directly from TMI. I found these posts to have both practical and historical interest. I posted the links here some time ago, but now I am attaching them in the form of a text file you can adapt to your own format. Someone will need to go through them and add brief descriptions and/or keywords.

    #3211

    William W
    Member

    I reviewed the attachment that I included above, & I should warn you that the annotations done for most of 2011 are very brief, listing only the major topics, but since then the annotations have gone into much greater detail, with more paraphrasing of what Culadasa spoke about.

    #3213

    Blake Barton
    Keymaster

    Hi Alex M., William and All,

    I think that there are some good ideas here, and I feel that Dharma Treasure and Culadasa would probably be interested.

    I am serving as the Executive Director of Dharma Treasure, and from my perspective, I would prefer to see an indexed, searchable recording list reside on the main Dharma Treasure web site or DT Wiki, instead of on a separate web site.

    The work that William has done indexing the talks is quite amazing, and would be a great start to this sort of project. Ward’s work from Jhana_Insight would also be great to include.

    I am not sure if any work is currently being done on the transcription project. William, I can talk to Culadasa about training DragonSpeaks. He made most of the recordings with a portable audio recorder. Do you know if it can import an audio file for training or if you need to record directly into the software? Do you know its accuracy, and if it can work with a specialized vocabulary like Dharma terms?

    I will discuss this with Culadasa, and get back to everyone.

    With Gratitude,
    Blake

    #3214

    Alex K
    Member

    Hi,

    Such a great endeavor, I was thinking along some of these lines too… A wiki is a great start, I have already sent Blake a couple of links to talks, articles and podcast interviews outside dharma treasure.

    I second the motion that it would be good to keep things together as much as possible and have the Dharma Treasure website as the focal point.

    I offer to help with time, some web knowledge (I am a software engineer and also do web frontends) and stupid ideas 🙂

    #3215

    Alex M
    Member

    Ted: wiki would work quite fine, I haven’t thought of that.

    Pros:

    • works right now
    • requires general sys admin knowledge to operate and support, if any. No need to look for a programmer that can work on the code, add a feature or fix a problem that popped up, that’s a very important thing.
    • multi-user editing is built-in, including logs and rollbacks
    • easily extended beyond a/v recordings, and there is already a need for including posts from news groups.

    Cons: a wiki is a generic solution, so a lot of things will never be quite right or ergonomic. Here are some things off the top of my head; for some there may be a solution and for some not, depending on the wiki engine:

    • Support for tags may be absent or unsuitable for the task, and cross-referencing has to be done manually (e.g. a page with everything that’s related to stage 4 could be autogenerated by custom code, but may have to be done manually in a wiki, and something will get lost).
    • Referencing a specific part of a recording works for youtube links (the start time can be embedded into the link). It’s not obvious how to do that for soundcloud or raw mp3 files; I haven’t looked deeply into this. Requiring users to seek manually to the specified time is not very convenient, though it won’t stop people who really need the information.
    • Soundcloud may go under, it nearly did about 2 years ago, so if it’s used for audio links, they can become broken.
    • Displaying just one a/v player per page. Embedding a frame with a youtube video for each fragment may be too heavy.
    • Generally, ergonomics for people doing annotations and for end users will suffer.
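
    On the raw mp3 point, one option that may be worth checking is the W3C Media Fragments syntax, where `#t=start,end` (in seconds) appended to a media URL asks a supporting player to begin at that offset. This is only a sketch: support varies between browsers and players, it says nothing about soundcloud, and the helper name and example.org URL are placeholders.

```python
def fragment_url(media_url, start_s, end_s=None):
    """Append a temporal media fragment (#t=start[,end]) to a URL."""
    if start_s < 0 or (end_s is not None and end_s <= start_s):
        raise ValueError("need 0 <= start_s < end_s")
    span = f"{start_s}" if end_s is None else f"{start_s},{end_s}"
    return f"{media_url}#t={span}"


# Placeholder URL and offsets, for illustration only.
url = fragment_url("https://example.org/talk.mp3", 754, 1200)
```

    If that turns out to work for the hosted files, a wiki link per fragment would behave much like the youtube start-time links.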

    William W: thank you, that’s a great amount of work and can be of much use to people even if published right now as-is. The problem is how to make it visible.

    Blake: integrating into an existing web site expands the scope and brings another set of restrictions and tradeoffs. There are several ways to do this, from full integration to proxying content or embedding a single-page app, but it’s substantially easier to implement the thing as a standalone site (or put it on a subdomain of dharmatreasure.org). Another question is how to organize things so the site will continue to work if something happens to the original developer/admin. Also, if the site is integrated into dharmatreasure, it’s much less likely that the scope will ever expand to other meditation schools, e.g. Shinzen, though I don’t know if that’s a good thing or not.

    • This reply was modified 5 years, 8 months ago by  Alex M.
    #3223

    William W
    Member

    I am sorry for the delay in responding to your posts.

    Ward: thank you for sending those links to the conversations from the Jana Yahoo group.

    Blake: I think that the work I have done annotating the talks Culadasa are quite uneven, some are very brief and some are very detailed, but include my particular paraphrasing of what Culadasa said. I will try to generate an Excel type spreadsheet which will have the name of the file, as I have renamed it according to the convention that I may have explained earlier and in the next column the annotations related to that MP3 file name. Without too much difficulty I believe I could go through the website, and discover the file name of the MP3 file as it presently exists or is referenced to on the website, when one asks to download the file. There may be easier ways of accomplishing this task of linking the annotations to the files that exist on the dharma treasure website. That is above my pay grade.

    Another project I would like to work on is to restore the linking that is on the webpage to the MP3 files that exist in the cloud but when one tries to listen or download them only brings up a unworkable link path. This problem applies to the discourses from September 2010 to March of 2011. Then I would be able to listen to those discourses and annotate them as well.

    regarding the program Dragon speak, I have version 11 premium, which I have trained by going through the one-hour or so testing of the microphone and reading some predetermined text for the program to get a beginning start on how one speaks, and then I have used it on and off, less than I had intended since I was not writing as much as my fantasies of writing had misled me to believe. I have trained this program with two different microphones but I also have a portable recorder that has sufficient sound quality to work with this program. The program is always improving in its transcription of what I say since it learns from the mistakes that are made besides which it includes some easy means to correct misspellings or wrong words by simply saying the words correct that, which brings up a list of 10 choices which sound the same. With continued use of the program, its recognition of one’s speech improves. By the use of spelling, I was able to enter Culadasa’s name just once and the program was able to recognize Culadasa’s name after that. The program maintains a correlation between the sound file and the words that have been transcribed so that they can be corrected at later date using voice commands. I think it would be impossible to train it just by importing an audio file. But I do think it would be quite possible to to have Culadasa train the program and then send me a copy of his user profile files which then I could use to further train the program to more accurately transcribe the existing MP3 files. No success is guaranteed as we are all aware of. With regard to its accuracy you be the judge since I have dictated this posting using this program and made very few corrections, leaving most of the errors for you to see. With regard to the specialized vocabulary of the Buddha Dharma I believe that just spelling the correct words initially would be sufficient for the program to recognize them later. 
    In dictating this posting I have only spelled out to words, which did not include Buddha Dharma, that came out without any training of the program.

    Alex K: could you please send me the link to the wiki project. Thank you.

    Alex M: it sounds like you and Alex K have much more programming or software knowledge than I do, and I would be grateful for any suggestions to remedy any errors in my thinking about how I may help put this indexing of his discourses and other source documents and videos to make specific topics more easily query able. Adding the start times to specific topics within a single audio or video file would require someone to go through noting the times when that topic began, and adding that information to the wiki link or the annotation page of the wiki link. You see I’m already out of my depth. With regard to having the wiki page or whatever it will be called incorporated into Dharma treasure or being independent from the Dharma treasure website would also necessitate making a decision of whether or not at this stage we are trying to simply index for easy access to Culadasa’s various discourses and topics discussed, or are we trying to make a general page to index topics in the Buddha Dharma and how various teachers have talked about these topics, which I think would be a much bigger project, and would involve some degree of cooperation with the students of other teachers, in the hopes of a fruitful outcome of inter-pollination. I have dictated this entire posting and it is taken me slightly more than one hour, most of which I spent in thinking about what I wanted to say. Thank you for all your help and your interest in promoting further dissemination of the very valuable practical meditation advice, and very interesting ontological and epistemological theories/interpretations of the Buddha Dharma as discussed by Culadasa. Be well

    • This reply was modified 5 years, 8 months ago by  William W.